The Gap Between How Fast AI Writes Code and How Fast You Can Read It

Addy Osmani has a useful name for something teams are quietly experiencing everywhere: comprehension debt. The idea is straightforward enough. AI coding assistants can produce hundreds of lines of plausible, syntactically correct code in seconds. Your ability to understand what those lines do, why they do it that way, and which edge cases they handle has not changed at all. The gap between generation speed and comprehension speed is the debt.

It sounds almost obvious when stated plainly, but the implications are less obvious, and they connect to a body of software engineering research that predates AI coding tools by decades.

Reading Has Always Been the Hard Part

Robert C. Martin’s observation in Clean Code that developers spend over ten hours reading code for every hour they spend writing it is practitioner wisdom rather than a controlled study result, but empirical research backs up the general shape of it. Andrew Begel and Thomas Zimmermann’s work at Microsoft Research found that developers spend roughly 70% of their time reading and navigating code, compared to around 30% writing new code. The ratio fluctuates by role and task type, but the direction is consistent: comprehension is where the work actually lives.

This matters because most productivity framing around AI tools focuses on generation speed. GitHub’s 2023 Copilot study reported a 55% faster task completion rate in controlled conditions, which generated a lot of headlines. What those conditions did not measure was maintenance burden: what happens six weeks later when someone has to debug the code, extend it, or review a change to it. Generating code faster does not reduce the reading burden on a codebase. In most cases it increases it, because there is now more code, and whoever reads it later does not have the author’s memory of writing it.

Why AI Output Is a Specific Kind of Problem

Code generation is not new. CASE tools in the 1980s and 1990s promised similar productivity gains by generating boilerplate from high-level specifications. Template engines, code scaffolding tools, and macro systems have all generated code that developers did not write by hand. Teams learned to manage the comprehension challenges those tools introduced by treating generated code as a black box with a defined interface, auditing it rarely, and keeping it separate from hand-authored code.

AI-generated code does not behave like that. It looks like idiomatic code written by a competent developer. It uses the same variable naming conventions, the same library idioms, the same structural patterns as hand-authored code. There are no GENERATED_DO_NOT_EDIT headers. It gets committed into the same files, reviewed alongside hand-written changes, and gradually becomes indistinguishable from the rest of the codebase. The signals that previously helped teams identify code requiring extra scrutiny are absent.

This creates a specific failure mode in code review. When you review code another developer wrote, you benefit from an implicit model of that developer’s intent, even if you did not watch them write it. You know their style, you can ask them questions, and the PR description usually explains the reasoning. With AI-generated code, the author frequently cannot explain the specific reasoning behind a particular approach because they did not reason through it themselves. They accepted a suggestion that looked correct. The review process then has to do the work that authorship normally provides for free.

What the Code Churn Data Shows

GitClear’s 2024 analysis of coding patterns during the period of growing AI tool adoption found that code churn, defined as lines reverted or substantially altered within two weeks of being written, was on track to roughly double in 2024 relative to its 2021, pre-AI baseline. The same analysis found increased rates of copy-pasted and duplicated code. Neither finding proves causation, and GitClear’s methodology has been debated, but the correlation is worth taking seriously.

Code churn is a reasonable proxy for code that was accepted without full comprehension. You commit something, it goes to production or into review, and then a problem surfaces that would have been caught earlier if the code had been understood more thoroughly before it was committed. The comprehension debt becomes visible when it comes due.
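As a rough illustration of the churn definition above, the following sketch computes the share of added lines that were reverted or altered within two weeks. The LineChange record and its fields are hypothetical, invented for this example. This is not GitClear's methodology, and a real analysis would have to derive the dates from version-control history.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # "within two weeks", per the definition above


@dataclass
class LineChange:
    """Hypothetical record for one added line; not a real tool's schema."""
    added_on: date
    # Date the line was reverted or substantially altered, or None if it survived.
    modified_on: Optional[date] = None


def churn_rate(changes: list[LineChange]) -> float:
    """Fraction of added lines that were modified or removed within the window."""
    if not changes:
        return 0.0
    churned = sum(
        1
        for c in changes
        if c.modified_on is not None and c.modified_on - c.added_on <= CHURN_WINDOW
    )
    return churned / len(changes)


# Made-up sample data, purely for illustration.
sample = [
    LineChange(date(2024, 3, 1), date(2024, 3, 5)),   # rewritten after 4 days: churn
    LineChange(date(2024, 3, 1), date(2024, 4, 20)),  # changed after 50 days: not churn
    LineChange(date(2024, 3, 2)),                     # never touched again
    LineChange(date(2024, 3, 3), date(2024, 3, 17)),  # exactly 14 days: counted as churn
]
print(churn_rate(sample))
```

Tracking a number like this per team or per repository over time is one way to notice the debt before it shows up as debugging time.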

Security is another place where this shows up concretely. A 2021 study by researchers at New York University found that roughly 40% of the programs GitHub Copilot generated in security-relevant scenarios contained vulnerabilities. The suggestions were syntactically correct and stylistically plausible. A developer accepting them quickly, without careful comprehension, would likely miss the issue.

Cognitive Load and Code Ownership

Felienne Hermans’ book The Programmer’s Brain provides a useful framework for understanding why this problem is structural rather than a matter of developer discipline. Code comprehension relies on three types of memory: working memory for the current context, short-term memory for recently read structures, and long-term memory for recognizing familiar patterns. Experienced developers read quickly because they recognize large chunks of code as known patterns, reducing the working memory load.

AI-generated code disrupts this. Even when it uses familiar idioms at the surface level, the particular combination of choices (how a function is structured, the order in which conditions are checked) reflects a generative process rather than the reasoning of a specific developer. It may not match any familiar chunk in the reader’s long-term memory. The cognitive load of comprehending it can be higher than the cognitive load of reading code written by a developer you have worked with for years, even if the AI output looks cleaner.

Microsoft Research’s work on code ownership, particularly the 2011 paper by Bird et al. examining Windows Vista and Windows 7 data, found that components with low ownership concentration had significantly higher defect rates. Code with many contributors and no clear owner had more bugs, more post-release failures, and harder-to-resolve issues. AI-generated code is not owned by anyone. The person who accepted the suggestion is nominally responsible for it, but they did not reason through it in the way that authorship normally implies. At scale, this looks like a code ownership problem.
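To make "ownership concentration" concrete, here is a minimal sketch that measures ownership as the top contributor's share of commits to a component and counts contributors below a small threshold. It loosely follows the idea in the paper rather than reproducing its exact metrics. The function name, the 5% threshold, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def ownership_summary(commit_authors: list[str], minor_threshold: float = 0.05) -> dict:
    """Summarize ownership of one component from the authors of its commits.

    minor_threshold is an illustrative assumption: contributors with a smaller
    share of commits than this are counted as minor contributors.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return {"top_owner": None, "ownership": 0.0, "minor_contributors": 0}
    top_owner, top_commits = counts.most_common(1)[0]
    minor = sum(1 for n in counts.values() if n / total < minor_threshold)
    return {
        "top_owner": top_owner,
        "ownership": top_commits / total,  # share of commits by the top contributor
        "minor_contributors": minor,
    }


# Hypothetical commit history for a single component.
authors = ["ana"] * 30 + ["ben"] * 8 + ["cy"] + ["dee"]
summary = ownership_summary(authors)
print(summary)
```

Note what a metric like this cannot see: a commit made up largely of accepted suggestions still counts toward its committer's ownership share, even if that person never built a mental model of the code.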

What Actually Helps

The mitigations that matter are not primarily technological. They are process changes that insert comprehension requirements into the workflow.

Treating AI-generated code with the same scrutiny as third-party library code is a useful mental model. When you pull in an external library, you read the documentation, check the license, look at the maintenance status, and test the behavior at your boundaries. You do not simply trust that it does what the function name implies. Applying that standard to AI-generated code means requiring that the person committing it can explain each significant decision, not just assert that the tests pass.

Code review practices need adjustment as well. Reviewers currently rely on PR descriptions and on their model of the author’s intent to interpret ambiguous code. When that intent information is unavailable because the code came from a model, reviewers need to do more of the comprehension work themselves rather than giving the benefit of the doubt. This is slower, which is the right trade-off.

Some teams have experimented with requiring developers to write a brief explanation of AI-generated code before committing it. This is not the usual commit message, but a description of what the code does, why the approach was chosen, and what alternatives were considered. The exercise itself surfaces comprehension gaps before they enter the codebase. If you cannot write the explanation, you have not understood the code.
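One way a team might enforce such a requirement is a small check on the commit or PR description, sketched below. The section labels, the minimum word count, and the function name are all hypothetical choices, not an established convention. A check like this can only confirm that an explanation exists, not that it is correct.

```python
import re

# Hypothetical section labels; a team would pick its own wording.
REQUIRED_SECTIONS = ("What it does", "Why this approach", "Alternatives considered")
MIN_WORDS = 5  # arbitrary floor to reject empty or one-word answers


def missing_sections(description: str) -> list[str]:
    """Return the required sections that are absent or too short."""
    missing = []
    for label in REQUIRED_SECTIONS:
        # Capture the text after "Label:" up to the next required label or the end.
        others = "|".join(re.escape(s) for s in REQUIRED_SECTIONS if s != label)
        pattern = rf"{re.escape(label)}:(.*?)(?=(?:{others}):|\Z)"
        match = re.search(pattern, description, flags=re.DOTALL)
        if match is None or len(match.group(1).split()) < MIN_WORDS:
            missing.append(label)
    return missing


# Made-up descriptions for illustration.
good = (
    "What it does: retries the upload with exponential backoff up to three times.\n"
    "Why this approach: the storage API returns transient 503 errors under load.\n"
    "Alternatives considered: a queue-based retry, rejected as too heavy for now.\n"
)
bad = "What it does: retries.\nWhy this approach: the storage API returns transient errors often."

print(missing_sections(good))
print(missing_sections(bad))
```

A function like this could run in a commit hook or a CI step, with reviewers still responsible for judging whether the explanation actually matches the code.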

The Velocity Trap

The seductive thing about AI coding tools is that the comprehension debt they create is invisible in the short term. Code gets written, tests pass, the feature ships. The debt accumulates in the form of code that nobody owns, bugs that are hard to locate because nobody has a mental model of the relevant subsystem, and reviews that approve changes without catching problems. It shows up in churn rates, in debugging time, in onboarding costs for new developers trying to understand a codebase.

Osmani’s framing of this as a debt is precisely right. Like financial debt, it is not inherently wrong to incur it. Sometimes the short-term velocity is worth the future cost. The problem is incurring it without accounting for it, which is what happens when teams adopt AI coding tools because they make code faster to produce, without thinking carefully about whether that code will be faster to maintain.

The tools will keep getting better at generation. Human comprehension speed will not change much. Managing that gap is a design problem for teams, not just a prompt engineering problem for individual developers.