Signed-Off-By Means You: The Kernel's Stance on AI-Generated Code

The Linux kernel now has an official document on AI coding assistants, placed under Documentation/process/ next to the patch submission guides and coding style documentation. It sits in the documentation tree maintained by Jonathan Corbet, the kernel’s documentation maintainer and the person behind LWN.net. The document does not prohibit using AI tools to learn the codebase, explore unfamiliar subsystems, or draft exploratory code. Its restriction is narrower: you must be able to fully understand, defend, and certify the provenance of every line you submit upstream, regardless of how it was generated.

The certification half of that requirement is not new. It has existed since 2004, encoded in the Signed-off-by: tag that every kernel patch must carry.

The DCO Already Covered This

The Developer Certificate of Origin, adopted by the Linux kernel in 2004 following the SCO litigation that threatened the project’s legal standing, is a short attestation. When a developer adds Signed-off-by: Name <email> to a commit, they certify that they wrote the code or otherwise have the right to submit it under the project’s open source license, and that they understand the contribution and their sign-off will be kept as a public record. The full text is fewer than 200 words, and it has carried legal weight in kernel contribution workflows ever since.

LLMs make that attestation complicated. The kernel is licensed under GPLv2-only, meaning contributed code must be compatible with that license and must not carry copyright claims from code distributed under incompatible terms. An LLM trained on a mix of GPL, MIT, BSD, and proprietary code without clear provenance tracking produces output whose copyright status is genuinely uncertain. Legal opinions differ on whether LLM outputs are independent works, derivative works, or something else entirely. The courts have not resolved this, and until they do, the safe position for a project as legally scrutinized as Linux is to require that contributors can certify what they are submitting.

Submitting AI-generated code and signing off on it without being able to verify its provenance arguably makes that DCO attestation false. The coding-assistants document does not use that language, but the implication runs throughout.

What Prompted the Document

The document did not emerge from abstract concern. Between 2022 and 2023, kernel subsystem maintainers saw a surge in low-quality, obviously AI-generated patch submissions. Greg Kroah-Hartman, who maintains the stable kernel tree and the USB/driver-core subsystems, was among the first to publicly reject these patches on the Linux Kernel Mailing List. The pattern was consistent: verbose commit messages with no technical substance, code that compiled but violated kernel invariants, patches fixing non-existent bugs, and contributors who could not answer basic follow-up questions about their own submissions.

One recurring class of incident involved students submitting AI-generated patches as part of coursework or open-source contribution assignments, sometimes dozens of patches from the same email domain in a short window. Maintainer review time is scarce, whether it is volunteered or paid for by an employer. Reviewing and rejecting a flood of unworkable patches imposes a real cost on people who contribute their time to one of the most critical software projects in existence. Kroah-Hartman and others were explicit on LKML that this pattern was not acceptable, and the documentation effort formalized that position.

The Technical Reasons Go Deeper Than Copyright

The kernel has hundreds of subsystem-specific invariants that are not expressed in any function signature or type annotation. Code inside an RCU (Read-Copy-Update) read-side critical section must not sleep, except under the sleepable SRCU variant. Certain operations are forbidden in interrupt context. Memory barriers must be placed precisely. Locking order rules exist to prevent deadlocks, and they live in code comments, in maintainers’ heads, and in years of review history, not in any format a language model can cleanly parse.

Consider a simple example of the kind of mistake LLMs reproduce from incorrect patterns in training data:

/* Incorrect: sleeping in an RCU read-side critical section */
rcu_read_lock();
p = rcu_dereference(global_ptr);
if (p) {
    mutex_lock(&p->lock);    /* BUG: mutex_lock() can sleep; illegal under rcu_read_lock() */
    /* ... use p ... */
    mutex_unlock(&p->lock);
}
rcu_read_unlock();

One common fix is to pin the object with a reference count while still inside the read-side section, call rcu_read_unlock(), and only then take the mutex. Getting there requires understanding not just the API surface but the kernel’s execution context model, in which acquiring a sleeping lock is forbidden inside a read-side critical section. LLMs generate code by pattern-matching against training data. That training data includes both correct and incorrect kernel code, outdated API usage, and examples from kernel versions that predate the current codebase by years. The kernel’s internal APIs evolve quickly, and functions that appear frequently in older training data may since have been deprecated or had their semantics changed in the current tree.

The checkpatch.pl script catches style violations and some known-bad idioms, but it does not catch an incorrect memory ordering assumption, a missing rcu_read_lock(), or a sleeping allocation in atomic context. Those require human judgment grounded in genuine understanding of what the code does. The coding-assistants document is asking for that judgment, not forbidding tools that might help build it.

Where AI Tools Can Fit

The document is careful not to prohibit AI tools entirely, and that restraint is sensible. LLMs are useful for several things that do not involve submitting generated code upstream.

Understanding unfamiliar subsystems is one. The kernel is more than 30 million lines of code across hundreds of subsystems. Using an LLM to get oriented in a new area, understand the general purpose of a driver, or decode a dense block of RCU logic before reading the actual documentation is a legitimate use of the technology. The model’s explanation is a starting point for further verification, not a reference to act on directly.

Navigating API history is another. Questions like “what is the current idiom for DMA-coherent allocations” or “when was this locking pattern deprecated” are the kind of lookup tasks where a model’s broad training data has value, provided the answer is verified against current documentation and recent commits before being used.

Generating mechanical boilerplate is a third area. The kernel’s test infrastructure, particularly KUnit and kselftest, involves repetitive setup code. Generating a skeleton and filling in the substance manually is a different operation from generating a core algorithmic change and submitting it verbatim.

The line the document draws is at submission: sending generated code to a project where every merged line carries legal weight, correctness expectations, and long-term maintenance burden, without the submitter being able to stand behind it.

The Broader Signal for Open Source

The kernel is not the only project dealing with this. PostgreSQL core developers have expressed similar concerns on pgsql-hackers. The CPython project saw a surge of AI-generated issues and pull requests in 2023. OpenBSD’s position, consistent with that project’s culture around correctness and audit trails, has been blunt about copyright provenance. Several GNOME maintainers added explicit notes to their contributing guides after their GitLab instances were flooded with AI-generated merge requests.

None of these projects has issued a sweeping prohibition, but the norm across serious infrastructure projects is consistent: AI tools are not the problem. The problem is submitting code that the contributor does not genuinely understand and cannot certify. That norm predates LLMs and applies regardless of what tools were used to write the first draft. What LLMs have done is make it much easier to produce large volumes of plausible-looking code that does not meet that standard.

The kernel’s document is notable primarily because it is written down and lives in the official process documentation alongside submitting-patches.rst. Most communities are still handling this through individual maintainer judgment and implicit expectation. Formalizing it does two things: it gives maintainers a reference to cite when rejecting a patch, and it signals to contributors that describing their workflow is not the same as defending their output.

For developers building on and contributing to open source infrastructure, both the practical and the legal dimensions here are worth taking seriously. The GPLv2 has been litigated. The DCO is a legal certification, not a formality. Kernel maintainers are not being obstructionist about AI tools; they are enforcing a contribution quality standard that the project’s scale and criticality require, and asking that contributors meet it regardless of what helped them write the first draft.