How AI Coding Assistants Strain the Linux Kernel's Review Pipeline

When Greg Kroah-Hartman started receiving a surge of USB and driver patches that looked syntactically correct but were logically broken, he recognized a pattern before the policy caught up to it. The patches were AI-generated, submitted without disclosure by authors who apparently trusted that the tools had produced something good enough to pass review. Maintainers spent their time working out why each patch had to be rejected instead of advancing the kernel.

The Linux kernel project’s formal response is a new document at Documentation/process/coding-assistants.rst, authored primarily by Kroah-Hartman and Jonathan Corbet and merged around the 6.10/6.11 development cycle in 2024. The document establishes disclosure requirements and restates existing quality standards; it does not ban AI tools outright. Reading it carefully reveals something more significant than the surface-level AI debate: the kernel’s review pipeline is a finite resource, and anything that increases patch volume without improving patch quality puts the entire project under pressure.

The Disclosure Requirement

The document establishes two new tags for commit messages:

Generated-by: <tool name and version>

for patches produced primarily by AI tools, and

AI-assisted: <tool name and version>

for patches where AI played a supporting role. Submitters who know a patch is AI-generated and omit these tags are not just violating a formatting convention; the document frames that omission as deception, grounds for rejection and potentially for exclusion from future contributions.
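As a rough illustration of how tooling could act on these tags, the sketch below scans a commit message line by line and reports the strongest disclosure it finds. The tag spellings are the ones quoted above; `classify_commit` and the `ai_disclosure` enum are hypothetical names invented for this example and are not part of any kernel script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Disclosure level found in a commit message (hypothetical enum). */
enum ai_disclosure { AI_NONE = 0, AI_ASSISTED = 1, AI_GENERATED = 2 };

/* Nonzero if the line of length `len` starting at `line` begins with `tag`. */
static int line_has_tag(const char *line, size_t len, const char *tag)
{
    size_t tlen = strlen(tag);
    return len >= tlen && strncmp(line, tag, tlen) == 0;
}

/*
 * Hypothetical helper: walk the message one line at a time and return the
 * strongest disclosure tag seen. Generated-by: outranks AI-assisted:.
 */
enum ai_disclosure classify_commit(const char *msg)
{
    enum ai_disclosure found = AI_NONE;

    assert(msg != NULL);
    while (*msg) {
        const char *eol = strchr(msg, '\n');
        size_t len = eol ? (size_t)(eol - msg) : strlen(msg);

        if (line_has_tag(msg, len, "Generated-by:"))
            found = AI_GENERATED;
        else if (line_has_tag(msg, len, "AI-assisted:") && found < AI_GENERATED)
            found = AI_ASSISTED;

        msg += len;          /* step to the end of this line */
        if (*msg == '\n')
            msg++;           /* and past the newline, if any */
    }
    return found;
}
```

A check like this can only confirm that a tag is present. It cannot detect an undisclosed AI-generated patch, which is why the policy treats omission as a matter of contributor honesty rather than formatting.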

The Signed-off-by: tag, which has underpinned kernel contribution since the SCO litigation era, carries a specific representation under the Developer Certificate of Origin (DCO): that you wrote the code, or otherwise have the right to submit it, under the indicated license. The DCO text itself says nothing about review, but the document makes clear that signing off on AI-generated code without genuinely reviewing it is a misuse of the DCO, not a technicality. The DCO was designed to create an accountable chain of human judgment; using it to launder AI output undermines that purpose.

The Asymmetric Cost Problem

The core concern here is not that AI tools write bad code, although they frequently do for kernel-specific reasons covered below. The concern is that AI tools radically lower the cost of generating a patch while doing nothing to reduce the cost of reviewing one.

Kernel maintainers are the binding constraint on development velocity. Kroah-Hartman maintains the USB subsystem and the stable tree; he reviews hundreds of patches per release cycle. Corbet maintains kernel documentation and edits LWN, the primary publication covering kernel development. When an LLM produces plausible-looking patches at high volume, and each patch still requires minutes of careful review by a human expert, the review queue grows faster than maintainers can drain it.

Kroah-Hartman has said publicly on the kernel mailing list that he was seeing an influx of AI-generated patches that consumed maintainer time without contributing usable code. Some added code that was syntactically valid but could never be reached. Some duplicated existing functionality. Some introduced error paths that appeared correct but failed to undo state changes made before the error, leaving subsystems in an inconsistent state. Each of these required expert diagnosis to reject correctly.

The disclosure requirement functions partly as a triage tool. If a maintainer knows a patch is AI-generated, they can apply an appropriate level of scrutiny rather than treating it as the work of a developer with deep subsystem knowledge.

Why Kernel Code Is Particularly Hard for LLMs

Kernel code has properties that make it a poor fit for LLM-based generation even compared to application code.

Locking discipline in the kernel involves invariants that are rarely written down in the code itself. The rules around spinlocks, mutexes, RCU, and seqlocks include context-specific requirements: you cannot sleep while holding a spinlock, RCU read-side critical sections have specific preemption constraints, and lock ordering must be globally consistent to prevent deadlock. An LLM trained on kernel source learns the surface patterns of how these primitives appear; it does not learn the underlying invariants, because those live primarily in developer knowledge and in documentation written in a different register than the code.
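The lock-ordering rule is easier to see in a small user-space analogue. The sketch below uses POSIX mutexes rather than kernel primitives, and the `account` structure and `transfer` function are hypothetical names chosen for illustration. The invariant, that two locks are always taken lowest address first, appears nowhere in the types; it exists only in `lock_pair` and in the author's head.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Take both locks in one global order: lowest address first. */
static void lock_pair(struct account *a, struct account *b)
{
    assert(a != b);
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

/* Release order does not affect deadlock freedom. */
static void unlock_pair(struct account *a, struct account *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Move `amount` from `from` to `to`; 0 on success, -1 if funds are short. */
int transfer(struct account *from, struct account *to, long amount)
{
    int ret = -1;

    lock_pair(from, to);
    if (from->balance >= amount) {
        from->balance -= amount;
        to->balance += amount;
        ret = 0;
    }
    unlock_pair(from, to);
    return ret;
}
```

A version that simply locked `from` and then `to` would look just as plausible and would pass any single-threaded test, yet two threads transferring in opposite directions could each hold one lock and wait forever for the other. That is the kind of defect a reviewer has to catch by knowing the rule, not by reading the syntax.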

The Linux Kernel Memory Model (LKMM) formalizes the memory ordering semantics that kernel code must respect. Code that works on x86, where memory ordering is relatively strict, may fail silently on ARM or POWER, where reordering is more aggressive. Placing a memory barrier correctly requires understanding the exact ordering guarantees needed, which in turn requires understanding the concurrent readers and writers of the data structure being protected. Pattern matching on syntactically similar code is not sufficient.

Error handling in the kernel uses a goto-based unwinding pattern that must undo operations in reverse order. AI tools frequently generate error paths that look structurally correct but release the wrong resources, release them in the wrong order, or miss a resource acquired in a code path the tool did not fully trace.
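The unwinding pattern can be sketched in ordinary user-space C. Everything here is hypothetical and chosen for illustration: `device_ctx`, `ctx_setup`, and the `fail_at` parameter, which forces a failure at a given step so the error paths can be exercised. Kernel code would use kernel allocators rather than `malloc`, but the shape is the same: each label releases exactly what was acquired before the jump, in reverse order of acquisition.

```c
#include <assert.h>
#include <stdlib.h>

struct device_ctx {
    char *rx_buf;
    char *tx_buf;
    char *stats;
};

/*
 * Acquire three resources in order. `fail_at` (1..3) simulates a failure at
 * that step; 0 means no forced failure. Returns 0 on success, or -1 with
 * every earlier acquisition released and the pointers reset to NULL.
 */
int ctx_setup(struct device_ctx *ctx, int fail_at)
{
    assert(ctx != NULL);
    ctx->rx_buf = ctx->tx_buf = ctx->stats = NULL;

    ctx->rx_buf = (fail_at == 1) ? NULL : malloc(64);
    if (!ctx->rx_buf)
        goto err;

    ctx->tx_buf = (fail_at == 2) ? NULL : malloc(64);
    if (!ctx->tx_buf)
        goto err_free_rx;

    ctx->stats = (fail_at == 3) ? NULL : malloc(64);
    if (!ctx->stats)
        goto err_free_tx;

    return 0;

    /* Labels fall through, so a later failure undoes everything before it. */
err_free_tx:
    free(ctx->tx_buf);
    ctx->tx_buf = NULL;
err_free_rx:
    free(ctx->rx_buf);
    ctx->rx_buf = NULL;
err:
    return -1;
}

/* Release everything a successful ctx_setup() acquired, newest first. */
void ctx_teardown(struct device_ctx *ctx)
{
    free(ctx->stats);
    free(ctx->tx_buf);
    free(ctx->rx_buf);
    ctx->rx_buf = ctx->tx_buf = ctx->stats = NULL;
}
```

The failure modes described above map directly onto this structure: jumping to `err_free_rx` when `tx_buf` is already allocated leaks it, and swapping two labels releases resources in the wrong order. Both variants compile cleanly and look structurally correct at a glance.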

Linus Torvalds put the concern directly at the 2023 Open Source Summit: LLMs can learn what kernel code looks like; they cannot learn why kernel code is written a particular way. The “why” is where the correctness lives.

The Copyright Paradox in a GPL Project

The policy also has a legal dimension. The kernel is licensed under GPL-2.0-only, which requires that derivative works, when distributed, be licensed under the same terms. The DCO requires contributors to certify that they have the right to submit code under that license.

An LLM trained on kernel source code may produce output that is a derivative work of GPL-2.0-only code. Whether model output constitutes a “derivative work” of training data remains an open legal question; the Doe v. GitHub class action addressed this question directly in the context of Copilot and was ongoing as of mid-2025.

A second problem compounds this. The US Copyright Office has consistently held that purely AI-generated content with no human creative authorship is not copyrightable. If a patch is entirely AI-generated with no substantial human creative contribution, the submitter cannot own its copyright and therefore cannot license it under GPL-2.0-only. You cannot grant a license to something you do not own. The GNU project’s position is starker: FSF-affiliated projects require copyright assignment to the FSF, which is impossible for AI-generated code since there is no legal author to make the assignment.

The Generated-by: tag, in this context, is not housekeeping. It is a flag that triggers legal scrutiny of the entire DCO chain.

Comparison Across the Ecosystem

The kernel’s disclosure-plus-accountability model is becoming a reference point for open source projects navigating the same tension. LLVM added an explicit AI policy in 2024 requiring disclosure of AI assistance and allowing maintainers to request human explanation of any AI-generated section. Debian Legal has flagged that AI-generated code with unclear provenance may not satisfy the Debian Free Software Guidelines. The Apache Software Foundation’s ICLA requires contributors to warrant they have the right to submit code, creating the same provenance problem for AI-generated contributions.

The GNU project takes a harder line, consistent with the FSF’s broader position on software freedom: AI contributions are not acceptable in GNU projects, both because copyright assignment to the FSF is required and because the FSF views AI tools as raising deeper ethical concerns. This is a stricter position than the kernel takes, but it follows from the same legal analysis, since no human author means no copyright and no valid license grant.

What sets coding-assistants.rst apart from other project policies is that it makes the cost structure explicit. The document does not argue that AI tools are inherently bad. It argues that maintainer review time is valuable, that increased patch volume without improved quality depletes it, and that contributors bear full responsibility for the code they submit regardless of how it was generated. Hiding the AI provenance of a patch deprives maintainers of information they need to calibrate their review effort.

That framing applies to any project that depends on volunteer maintainer bandwidth, which describes most of open source. The kernel articulated it first because the kernel feels the pressure first, given its scale and the technical depth required to review even a small patch correctly.