The Linux Kernel's AI Contribution Rules Are Really About Code Ownership
Source: hackernews
The Linux kernel doesn’t ban AI coding tools. It doesn’t require special tooling to detect them, and it doesn’t disqualify patches that were written with their assistance. What the new Documentation/process/coding-assistants.rst does is something more pointed: it makes explicit that using an AI assistant does not transfer, dilute, or share the developer’s responsibility for the code.
That sounds obvious until you consider how many teams treat AI-generated code in practice: accepting suggestions without reading them carefully, submitting output with only surface-level testing, and treating plausible-looking code as correct code. The kernel formalizes what serious developers already knew and, in doing so, clarifies what that responsibility actually demands.
The DCO Is Unambiguous
Every patch merged into the Linux kernel carries a Signed-off-by: tag in its commit message. This is not a convention. It is an assertion by the developer that they have read and agreed to the Developer Certificate of Origin, a legal statement certifying that the contribution is either their own original work or that they have the right to submit it, that it does not violate any known third-party rights, and that they understand it will be publicly recorded and licensed under the kernel’s terms.
When a developer submits a patch, they sign off on every line. The AI tool does not sign off. It cannot. The kernel’s guidance on AI assistance does not create a new tag for AI involvement in the way that Co-developed-by:, Reviewed-by:, and Tested-by: formalize other kinds of contribution. Instead, it requires human disclosure in the cover letter or commit message and makes clear that the Signed-off-by: from the human developer covers the whole submission.
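As a rough illustration of what the tag looks like to tooling, the sketch below pulls Signed-off-by: trailers out of a commit message and checks whether a given author is among them. The function names, the sample message, and the example.org addresses are hypothetical, invented for this illustration. It is not how the kernel's own tooling works (scripts/checkpatch.pl performs its own sign-off check), and a script like this can only confirm that the tag is present, not that the person behind it understood the patch.

```python
import re

# Matches a trailer line such as "Signed-off-by: Jane Doe <jane@example.org>".
# The pattern is deliberately simple and is an assumption of this sketch.
SIGNOFF = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>[^<\n]+?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.MULTILINE,
)

# Hypothetical commit message used only for illustration.
EXAMPLE_MESSAGE = """net: example: fix refcount leak on error path

Drop the reference taken earlier when setup fails.

Signed-off-by: Jane Doe <jane@example.org>
"""


def signoffs(commit_message):
    """Return (name, lowercased email) for each Signed-off-by: trailer."""
    return [
        (m.group("name"), m.group("email").lower())
        for m in SIGNOFF.finditer(commit_message)
    ]


def author_signed_off(commit_message, author_email):
    """True if the given author email appears in a Signed-off-by: trailer."""
    wanted = author_email.lower()
    return any(email == wanted for _, email in signoffs(commit_message))
```

Calling author_signed_off(EXAMPLE_MESSAGE, "jane@example.org") returns True, while any address that did not sign the message returns False. The legal weight of the tag comes from the human who added it, which is exactly the part no script can verify.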
This matters because the DCO has teeth. If AI-generated code contains fragments derived from training data with incompatible licensing, the developer who signed off on it is the one who made the legal assertion. The review chain, which might run from a driver or subsystem maintainer through a subsystem tree such as net-next or drm-fixes to Linus Torvalds himself, does not reliably catch licensing contamination. Reviewers are checking correctness and style, not running license provenance analysis on every submitted hunk.
License Contamination Is a Specific Problem in GPL Codebases
The Linux kernel is licensed under GPLv2. Companies ship it, modify it, and distribute it. The licensing constraints are not theoretical: organizations exist to enforce them, and high-profile violations have resulted in legal action.
AI tools trained on public code have ingested repositories with a wide range of licenses: MIT, BSD, Apache 2.0, GPLv2, GPLv3, and many others. When a model reproduces a pattern from a BSD-licensed implementation of, say, a ring buffer or a hash table, the output might look original to a reviewer but carry provenance the developer cannot verify. The kernel’s concern about this is well grounded. Projects under permissive licenses have somewhat less at stake, but the kernel is GPLv2 specifically, and its contributor base includes both individual volunteers and engineers at major corporations with legal departments watching.
The guidance in coding-assistants.rst does not solve this problem. There is no current reliable way to audit AI-generated code for training data provenance. What it does is put the responsibility squarely on the developer to understand and vouch for the code they submit, which is at minimum the correct framing.
Kernel Code Breaks in Ways AI Tools Do Not Anticipate
The subtler technical concern in the guidance is that AI tools produce plausible code, and in kernel development, plausible is not the same as correct.
Kernel code operates under constraints that do not appear in most software. Code running in interrupt context cannot sleep. Spinlocks cannot be held across code that might schedule. RCU read-side critical sections have rules about what can happen inside them. Reference counting disciplines must be followed exactly, or the result is use-after-free bugs that appear rarely and corrupt memory silently. smp_mb(), smp_rmb(), and smp_wmb() are not interchangeable, and the difference matters on POWER and ARM architectures even when x86’s strong memory model papers over the error in testing.
A language model trained on kernel code has seen these patterns. It can reproduce them. But it does not understand the invariants. It will produce a patch that looks like it handles the locking correctly, and a reviewer who is not paying careful attention might agree. The failure surfaces only under specific scheduling interleavings or on architectures with weaker memory ordering guarantees.
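To make the point about interleavings concrete, here is a toy model of the classic message-passing pattern: a writer stores data and then sets a flag, while a reader checks the flag and then reads the data. The sketch enumerates every schedule and reports which (flag, data) pairs the reader can observe. It is a deliberately simplified assumption, not the Linux kernel memory model and not any real CPU: a thread's two operations may take effect in either order unless a barrier separates them, with the wmb and rmb parameters standing in loosely for smp_wmb() and smp_rmb(). All names in the code are hypothetical.

```python
from itertools import permutations


def thread_orders(ops, fenced):
    """Orders in which a thread's two operations may take effect.

    With a barrier between them only program order is allowed; without
    one, this toy model permits either order.
    """
    return [tuple(ops)] if fenced else list(permutations(ops))


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def outcomes(wmb, rmb):
    """Return the set of (flag, data) values the reader can observe."""
    writer = [("store", "data"), ("store", "flag")]  # publish data, then flag
    reader = [("load", "flag"), ("load", "data")]    # check flag, then read data
    seen = set()
    for w in thread_orders(writer, wmb):
        for r in thread_orders(reader, rmb):
            for schedule in interleavings(w, r):
                memory = {"data": 0, "flag": 0}
                observed = {}
                for kind, var in schedule:
                    if kind == "store":
                        memory[var] = 1
                    else:
                        observed[var] = memory[var]
                seen.add((observed["flag"], observed["data"]))
    return seen
```

In this model, outcomes(wmb=True, rmb=True) never contains (1, 0), the broken case where the reader sees the flag set but reads stale data. Dropping either barrier makes (1, 0) reachable, yet only in a few of the enumerated schedules, which is the kind of bug that passes casual review and most test runs.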
This is why the kernel documentation asks developers to disclose AI involvement: disclosure is a signal to reviewers to scrutinize more carefully. Maintainers like Greg Kroah-Hartman, who oversees the stable kernel trees and the driver core, have been direct about rejecting patches that appear to have been generated without the developer properly understanding them. The review process is rigorous precisely because the failure modes are subtle, and AI tools increase the risk of submitting code that passes casual review but breaks in production.
The Patch Process Already Selects for Understanding
The kernel contribution workflow is one of the more demanding in open source. Patches are submitted by email to the relevant maintainers and mailing lists, including the Linux Kernel Mailing List (LKML), formatted with git format-patch and sent with git send-email. Cover letters explain the motivation, the approach, and any open questions. Reviewers reply inline to specific lines of the patch. The developer is expected to respond, address concerns, and resubmit revised versions, sometimes many times.
This process, which many developers find painful, is well-suited to catching the failure mode of AI-assisted patches. A developer who does not understand their own code will struggle to respond to technical questions from a subsystem maintainer. They will give vague answers, miss the point of the concern, or submit a revised patch that addresses the symptom but not the underlying issue. The LKML process surfaces this quickly.
The formal guidance on AI tools extends this naturally. Disclosure in the cover letter signals to reviewers that extra scrutiny is warranted. The developer’s response to review comments reveals whether they understand the code they’re submitting. The bar for acceptance does not change.
What Other Projects Should Take From This
The kernel’s documentation is specific to its own context, but the underlying framework applies broadly. Any codebase where correctness requirements exceed what tests can verify, which includes most systems code, any security-critical path, and any code with complex invariants, has the same fundamental issue with AI-assisted development.
The right frame is not whether to allow AI tools but what disclosure and understanding standards to apply. The kernel’s answer is to disclose usage, keep the developer fully responsible, and leave the quality bar unchanged. That’s a workable policy for any serious project.
Projects that do not address this explicitly are not avoiding the question. They are just leaving it unresolved, which means every team answers it differently and reviewers have no shared frame of reference when they suspect a patch was generated rather than written.
The kernel documented its position. That alone puts it ahead of most of the ecosystem.