The Linux Kernel's AI Policy Is Really About Accountability, Not Tools
Source: lobsters
The Linux kernel recently formalized what many maintainers had been communicating informally for years: if you use an AI coding assistant to help write a patch, you need to say so. The new Documentation/process/coding-assistants.rst document is not a ban, not a manifesto, and not a reaction to hype. It is a policy document that fits precisely into the kernel’s existing contributor accountability framework, and reading it carefully tells you more about how the kernel development process works than it does about AI.
The Developer Certificate of Origin Is the Load-Bearing Beam
To understand why the kernel cares about AI disclosure, you have to understand the Developer Certificate of Origin. Every patch submitted to the Linux kernel must carry a Signed-off-by line with the submitter's real name and email address. That signature is a legal attestation under the DCO. It certifies one of three things: that the signer created the contribution, in whole or in part, and has the right to submit it under the license indicated in the file; that it is based on earlier work covered by an appropriate open-source license that permits the submission; or that it was passed on unmodified from someone who made one of those certifications. It also records that the contribution, sign-off included, is public and kept on record indefinitely.
That provenance attestation is where AI tooling creates friction. When you use GitHub Copilot or any LLM to generate a function body and then sign off on the resulting patch, you are certifying the provenance of code you did not write and whose sources you cannot fully trace. The DCO text itself says nothing about understanding the code, but kernel review assumes that whoever signs off can explain and defend it. The kernel's position is not that AI-generated code is automatically bad, but that the Signed-off-by attestation carries specific meaning, and that meaning depends on the signer actually understanding what they are submitting.
This is a coherent position. The DCO exists precisely because the kernel has been the target of legal challenges around copyright provenance. SCO Group’s litigation in the early 2000s, though ultimately unsuccessful, made the kernel community intensely aware of the importance of clean code lineage. A process that systematically obscures provenance is a problem regardless of what is generating the code.
What the Policy Actually Says
The document takes a pragmatic position. It does not prohibit AI assistance. It requires disclosure when AI tools were used to generate substantial portions of a submission. It also restates the existing expectation that contributors must understand the code they submit and be able to defend it during review.
The practical implication is that using Copilot to autocomplete a simple loop is different from using ChatGPT to generate an entire driver subsystem. The former is a development convenience; the latter produces a patch the submitter may not be able to fully explain when a maintainer asks questions. Kernel maintainers ask a lot of questions.
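The mechanics the policy builds on are simple enough to sketch. The check below parses the trailer block of a commit message, requires a well-formed Signed-off-by from the patch author, and collects any disclosure trailers. It is a minimal illustration under stated assumptions, not kernel tooling (the real first-line check is scripts/checkpatch.pl, which reports a missing Signed-off-by). The `Assisted-by` trailer name and all function names here are hypothetical placeholders, not quotations from coding-assistants.rst.

```python
import re
from dataclasses import dataclass, field

# A trailer line looks like "Key: value".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")
# A sign-off should name a person: "Real Name <user@example.org>".
IDENTITY_RE = re.compile(r"^[^<>]+\s<[^<>@\s]+@[^<>\s]+>$")
# Hypothetical disclosure trailer; the policy's actual wording may differ.
DISCLOSURE_KEY = "assisted-by"


@dataclass
class Review:
    errors: list[str] = field(default_factory=list)
    disclosed_tools: list[str] = field(default_factory=list)


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the commit message's trailer block.

    Simplification of git's rules: the trailer block is the last
    paragraph, and it only counts if every non-blank line is "Key: value".
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:  # a lone subject line has no trailer block
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        if not line.strip():
            continue
        match = TRAILER_RE.match(line.strip())
        if not match:
            return []  # last paragraph is ordinary prose, not trailers
        trailers.append((match.group(1), match.group(2).strip()))
    return trailers


def check_commit(message: str, author: str) -> Review:
    """Require a well-formed author sign-off and collect disclosures."""
    review = Review()
    trailers = parse_trailers(message)
    signoffs = [v for k, v in trailers if k.lower() == "signed-off-by"]
    if not signoffs:
        review.errors.append("missing Signed-off-by")
    for who in signoffs:
        if not IDENTITY_RE.match(who):
            review.errors.append(f"malformed Signed-off-by: {who}")
    if signoffs and author not in signoffs:
        review.errors.append("author has not signed off")
    review.disclosed_tools = [
        v for k, v in trailers if k.lower() == DISCLOSURE_KEY
    ]
    return review


if __name__ == "__main__":
    msg = """mm: fix refcount leak in example_release()

Drop the extra reference taken on the error path.

Assisted-by: ExampleTool v1
Signed-off-by: Jane Developer <jane@example.org>
"""
    result = check_commit(msg, "Jane Developer <jane@example.org>")
    print(result.errors)           # []
    print(result.disclosed_tools)  # ['ExampleTool v1']
```

A check like this can confirm that a sign-off exists, is well formed, and sits next to a disclosure. It cannot confirm that the signer understands the patch, which is precisely the part the policy leaves to human accountability and review.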
There is also a copyright angle the document addresses carefully. AI tools trained on existing code raise unresolved questions about the licensing status of their outputs. The kernel is licensed under GPLv2, and there is ongoing legal uncertainty, reinforced by cases like the GitHub Copilot class action, about whether code generated by models trained on GPL-licensed repositories carries any GPL obligations. The kernel community’s response to unresolved legal uncertainty is characteristically conservative: require disclosure so that the issue can be examined case by case rather than swept under the rug.
Why Kernel Development Is Especially Hostile to LLM Output
The problems that led to this policy were practical before they were philosophical. Starting around 2022, kernel maintainers began receiving patches that exhibited a specific failure mode: syntactically plausible code that referenced functions, structures, or kernel APIs that did not exist, had been removed, or were used in the wrong subsystem context.
Greg Kroah-Hartman and other subsystem maintainers were public about this. The patches often had the surface appearance of legitimate contributions. The commit messages used correct terminology. The code compiled in some configurations. But the underlying logic reflected a model interpolating from training data rather than an engineer who had read the relevant subsystem documentation and understood the call semantics.
This is a particularly acute problem in kernel development for several reasons.
No runtime safety net. In userspace, a hallucinated API call usually fails at compile or link time, and many logic errors end in a crash confined to a single process. In the kernel, code that compiles cleanly but rests on incorrect assumptions about memory ownership, locking discipline, or interrupt context can silently corrupt memory, and the damage may surface hours later as an inexplicable crash in a completely different subsystem.
Subsystem boundaries are not obvious from code. A model trained on kernel source has seen a lot of code, but the rules about which functions can be called from which contexts, which locks must be held, and which memory allocations are permitted under which conditions live in comments, in Documentation/, and in maintainer tribal knowledge. Tools such as lockdep and sparse catch some violations, but many of these rules cannot be inferred mechanically from the source text alone.
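The allocation rule mentioned above makes a good concrete case. The Python below is a toy simulation, not kernel code, and every class and method name in it is hypothetical. It models one real constraint: while a spinlock is held the context is atomic, so a `GFP_KERNEL` allocation, which may sleep, is forbidden there, whereas `GFP_ATOMIC` is allowed. In a real kernel this mistake is typically caught at runtime by `might_sleep()` checks when `CONFIG_DEBUG_ATOMIC_SLEEP` is enabled.

```python
from contextlib import contextmanager


class SleepInAtomicError(RuntimeError):
    """Raised when a call that may sleep happens in atomic context."""


class ToyKernelContext:
    """Toy model of one kernel rule, not real kernel behavior.

    Holding a spinlock makes the current context atomic. A GFP_KERNEL
    allocation may sleep, so it is illegal there; GFP_ATOMIC and
    GFP_NOWAIT do not sleep and are allowed.
    """

    SLEEPING_FLAGS = frozenset({"GFP_KERNEL"})
    NON_SLEEPING_FLAGS = frozenset({"GFP_ATOMIC", "GFP_NOWAIT"})

    def __init__(self) -> None:
        self.spinlock_depth = 0  # number of spinlocks currently held

    @contextmanager
    def spin_lock(self, name: str):
        """Model taking and releasing a spinlock; name is for readability."""
        self.spinlock_depth += 1
        try:
            yield
        finally:
            self.spinlock_depth -= 1

    def in_atomic(self) -> bool:
        return self.spinlock_depth > 0

    def kmalloc(self, size: int, flags: str) -> bytearray:
        """Model an allocation, rejecting sleeping flags in atomic context."""
        if flags not in self.SLEEPING_FLAGS | self.NON_SLEEPING_FLAGS:
            raise ValueError(f"unknown allocation flags: {flags!r}")
        if flags in self.SLEEPING_FLAGS and self.in_atomic():
            raise SleepInAtomicError(
                f"kmalloc({size}, {flags}) may sleep while a spinlock is held"
            )
        return bytearray(size)


if __name__ == "__main__":
    ctx = ToyKernelContext()
    ctx.kmalloc(64, "GFP_KERNEL")          # allowed: no lock held
    with ctx.spin_lock("dev->lock"):
        ctx.kmalloc(64, "GFP_ATOMIC")      # allowed: does not sleep
        try:
            ctx.kmalloc(64, "GFP_KERNEL")  # same call shape, now illegal
        except SleepInAtomicError as err:
            print(err)
```

The two `GFP_KERNEL` calls are textually identical; only the lock state around them decides which one is legal. That is the sense in which these rules are invisible in the source text, and why generated code can compile cleanly and still be wrong. The real kernel also treats interrupt handlers, preemption-disabled regions, and other contexts as atomic, which this toy ignores.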
The review burden is asymmetric. Generating a plausible-looking patch takes seconds. Reviewing it thoroughly enough to catch subtle locking errors or RCU misuse takes significantly longer. A maintainer who receives ten AI-generated patches with subtle bugs in each has had their time consumed by a process that produced no value and potentially introduced regressions.
The UMN Affair Set the Tone
The kernel community's sensitivity to this problem has a specific historical anchor. In April 2021, it emerged that researchers from the University of Minnesota had submitted intentionally flawed patches to the kernel as part of a study on the code review process. Greg Kroah-Hartman responded by banning contributions from the university and proposing to revert its past commits until they could be re-reviewed. The subsequent review found most of those earlier commits to be legitimate, but the episode hardened the community's expectation of good-faith participation from every contributor.
That episode, documented in Greg Kroah-Hartman’s public response on LKML, established a clear norm: the kernel review process depends on every participant acting in good faith, and automated or deceptive submissions break the social contract that makes distributed kernel development function. AI-generated code that the submitter does not understand is not the same thing as intentionally malicious code, but it creates a similar problem for maintainers: they cannot trust the Signed-off-by as a genuine attestation of understanding.
The Policy in Context of Other Open Source Projects
The Linux kernel is not alone in thinking about this. The Python Software Foundation and various Python projects have discussed similar policies. The Apache Software Foundation’s legal team has written about the copyright ambiguity of AI-generated contributions. The FSF has staked out a harder position, arguing that training data provenance issues with current models make AI-generated code risky to include in copyleft projects.
What makes the Linux kernel’s approach notable is its grounding in the existing contribution infrastructure rather than in abstract principles. The DCO already required attestation. The Signed-off-by already created accountability. The new policy extends that existing system to cover a new class of tooling rather than creating a parallel governance structure.
Compare this to how a project without strong contributor provenance requirements might handle the same problem: with an ad-hoc blanket policy or a vague “use your judgment” recommendation. The kernel’s answer is cleaner because the underlying infrastructure was already there.
What This Means for AI-Assisted Systems Programming More Broadly
The kernel’s position represents a coherent middle ground that will probably influence how other mature open-source projects with strong quality and provenance requirements approach the same question. The answer is not “ban AI tools” and it is not “anything goes”; it is “the contributor is still accountable, and they must disclose what they used to produce the work.”
For developers who work in systems programming contexts, this is a reasonable baseline. Using an LLM to explore an unfamiliar part of a codebase, to draft documentation, or to suggest approaches is different from treating generated code as a finished contribution. The kernel’s policy formalizes a distinction that thoughtful developers were already making informally.
The harder question, which the policy does not resolve, is what happens as AI tooling improves. If a model can generate correct, well-attributed, subsystem-aware kernel patches that a maintainer cannot distinguish from human-written code, the disclosure requirement becomes both more important and harder to enforce. The kernel’s answer to this seems to be the same as its answer to everything else: trust the contributor’s attestation, hold them accountable when it fails, and improve the review process continuously.
That is not a technologically sophisticated answer, but it is a socially sustainable one, and the Linux kernel has been running on socially sustainable answers for over thirty years.