The Code You Accepted but Never Owned

Source: lobsters

There is a category of cost that software projects absorb slowly and then all at once. Technical debt is well-documented: you make a shortcut, the shortcut calcifies into load-bearing code, and refactoring it eventually costs more than doing it properly would have. Comprehension debt is adjacent but distinct. Addy Osmani’s piece on the hidden cost of AI-generated code puts a name to something that has been building since AI coding assistants went mainstream: the accumulation of code in a codebase that nobody on the team actually understands.

The distinction matters. Technical debt is about code quality: brittle architecture, missing abstractions, duplicated logic. Comprehension debt is about team knowledge. You can have clean, well-structured code that passes every test and still represents a significant comprehension liability if the team maintaining it did not write it and did not read it carefully enough to own it. These two problems compound, but they have different causes and different remedies.

Why Writing Builds Understanding

The cognitive mechanism is worth being precise about. When a developer writes code, they translate a mental model of the problem into a formal representation that a machine can execute. Every decision in that process (which data structure, which abstraction boundary, how to handle the error case) is a small act of understanding. The code that results is a record of those decisions.

This is why the common advice to “read code, not just write it” has always been only partly satisfying. Reading code is harder than writing it because you are working backwards: inferring the mental model from the artifact rather than expressing a mental model as an artifact. Research on code comprehension, including Felienne Hermans’ work in The Programmer’s Brain, points to working memory constraints as the core bottleneck. Developers can hold a limited number of meaningful code chunks in working memory at once, which is why decomposing logic into small, named functions works as a practical aid to comprehension. It reduces chunk size rather than total complexity.
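A minimal sketch of that last point, in Python. The pricing rules, the `Order` type, and every function name here are invented for illustration; they are not from Hermans' book or Osmani's piece. Both versions compute the same total, but the second lets a reader take in one named rule at a time.

```python
from dataclasses import dataclass


@dataclass
class Order:
    subtotal: float
    country: str
    is_member: bool


# Dense version: the reader has to hold all three rules in mind at once.
def total_dense(o: Order) -> float:
    return round(
        (o.subtotal * (0.9 if o.is_member and o.subtotal >= 100 else 1.0))
        * (1.2 if o.country == "GB" else 1.0)
        + (0.0 if o.subtotal >= 50 else 4.99),
        2,
    )


# Decomposed version: each name is one chunk. The rules are unchanged,
# so the total complexity is the same; only the chunk size is smaller.
def member_discount(o: Order) -> float:
    return 0.9 if o.is_member and o.subtotal >= 100 else 1.0


def tax_multiplier(o: Order) -> float:
    return 1.2 if o.country == "GB" else 1.0


def shipping_fee(o: Order) -> float:
    return 0.0 if o.subtotal >= 50 else 4.99


def total(o: Order) -> float:
    return round(o.subtotal * member_discount(o) * tax_multiplier(o) + shipping_fee(o), 2)
```

Neither version is simpler in any absolute sense. The second one just asks less of working memory per line read.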

When an AI generates a block of code and a developer accepts it, that comprehension process is largely skipped. The mental model lives in the prompt, not in the team’s collective head. The code works, the tests pass, the PR gets merged. Nobody has done the work of understanding what the code does or why it does it that way.

The Scale Shift

This is not a new problem in kind. Developers have always copied from Stack Overflow, borrowed algorithms from Wikipedia, or lifted configuration from blog posts. The comprehension debt from those sources was real. What AI coding assistants change is the volume and the confidence.

A developer pasting a regex from Stack Overflow knows they did not write it. They will likely read through the answers, understand at least roughly what it does, maybe run a few test cases against edge cases they can think of. The friction is a signal: this came from somewhere outside my understanding, I should verify it. With an AI assistant, that friction is reduced. The code appears inline, in context, in the style of the surrounding codebase. It feels like something you wrote. That is a feature of good autocomplete, and it is also what makes comprehension debt accumulate faster.

GitHub’s own research on Copilot adoption has found that developers accept a substantial fraction of suggestions without modification, with acceptance rates climbing for longer completions in some categories of work. The simple completions, a loop variable, a function signature, a struct field, carry minimal comprehension risk. The complex ones, full async handlers, serialization logic, error recovery chains, are where the accumulation happens. Those are also the completions that feel most valuable in the moment, because they save the most keystrokes.

Where It Surfaces

The places where comprehension debt reveals itself are predictable in retrospect.

Debugging. When something breaks in AI-generated code, the developer has to reverse-engineer both the code and the intent. They did not write it, so they may not have a working theory of where the bug could be. Debugging becomes archaeology rather than diagnosis. This is particularly costly for asynchronous code and complex state management, where generated logic may have subtle ordering assumptions baked in that the accepting developer never thought through. A race condition in a handler you wrote is something you can reason about. A race condition in a handler the model wrote, that you accepted and moved on from, is something you have to rediscover from first principles.
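To make the "subtle ordering assumptions" concrete, here is a small Python sketch using `asyncio`. The `Account` class and its methods are hypothetical, and the `asyncio.sleep(0)` call stands in for any real I/O between the check and the update. The unsafe handler looks reasonable on a quick read, which is exactly the kind of code that gets accepted without a second thought.

```python
import asyncio


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = asyncio.Lock()

    async def _audit_log(self) -> None:
        # Stand-in for real I/O; it yields control to other tasks.
        await asyncio.sleep(0)

    async def withdraw_unsafe(self, amount: int) -> bool:
        # Hidden assumption: nothing else runs between the check and the update.
        if self.balance >= amount:
            await self._audit_log()  # another withdrawal can pass the check here
            self.balance -= amount
            return True
        return False

    async def withdraw_safe(self, amount: int) -> bool:
        # The lock makes check-then-update a single step from other tasks' view.
        async with self._lock:
            if self.balance >= amount:
                await self._audit_log()
                self.balance -= amount
                return True
            return False


async def run(method_name: str) -> int:
    account = Account(100)
    withdraw = getattr(account, method_name)
    # Two concurrent withdrawals of 80 against a balance of 100.
    await asyncio.gather(withdraw(80), withdraw(80))
    return account.balance
```

Running `asyncio.run(run("withdraw_unsafe"))` leaves the balance at -60, because both tasks pass the check before either subtracts. The locked version leaves it at 20. Someone who wrote the first handler would probably know to look at the gap around the `await`. Someone who accepted it has to find that gap first.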

Onboarding. New developers absorb a codebase’s mental models by reading the code and talking to the people who wrote it. AI-generated code breaks this chain. The code’s author may be genuinely uncertain about some design decisions because the AI made them. “Why is this structured this way?” can become an unanswerable question, or get answered with “that’s what the model suggested and it worked.” That is not an explanation a new developer can build on.

Code review. A reviewer seeing a complex block of AI-generated code faces a subtle incentive problem. Reading it carefully enough to evaluate it properly is expensive. It looks like it works. The tests pass. The author says it came from Copilot or Cursor and they checked it over. Approving without full comprehension is the low-friction path, and low-friction paths accumulate over time into norms. The review process that was meant to be a comprehension checkpoint becomes a rubber stamp.

What Does Not Fix This

The obvious answer is to require developers to read and understand generated code before accepting it. This is correct in principle and largely unenforceable in practice. The value proposition of AI coding tools is speed. Adding a review gate for generated code that approaches the thoroughness of original authorship eliminates most of the productivity gain, and developers under normal project pressure will not consistently make that trade.

Documentation helps at the margins. A comment explaining the intent of a complex generated block is better than nothing, but documentation answers “what does this do” more reliably than “why does it do it this way” or “what are the edge cases,” which are the questions that matter most when something breaks.

Test coverage is a more durable compensation mechanism. A test suite written to document behavior rather than just verify it can serve as executable comprehension, catching regressions when the code’s behavior changes in unexpected ways. The problem is that tests are also frequently generated now, and AI-generated tests tend to verify the happy path rather than the interesting edge cases. Comprehension debt in production logic pairs naturally with comprehension debt in the test suite, and the combination is worse than either alone.
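The difference between verifying and documenting can be sketched with a toy example. The `chunk` helper and the test names below are made up for illustration. The first test is the kind a generator tends to produce; the rest record decisions that a maintainer would otherwise have to rediscover.

```python
def chunk(items: list, size: int) -> list:
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Happy path: confirms the code runs, says little about its contract.
def test_splits_evenly() -> None:
    assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


# Behavior-documenting tests: each name states a decision about an edge case.
def test_last_chunk_may_be_shorter() -> None:
    assert chunk([1, 2, 3], 2) == [[1, 2], [3]]


def test_empty_input_yields_no_chunks() -> None:
    assert chunk([], 3) == []


def test_size_larger_than_input_yields_one_chunk() -> None:
    assert chunk([1, 2], 5) == [[1, 2]]


def test_non_positive_size_is_rejected() -> None:
    for bad_size in (0, -1):
        try:
            chunk([1, 2], bad_size)
        except ValueError:
            continue
        raise AssertionError(f"size={bad_size} should raise ValueError")
```

Read top to bottom, the test names answer "what are the edge cases" without anyone opening the implementation. That only holds if a person chose those cases deliberately.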

Linters and static analysis do not address this either. They can catch structural problems, but comprehension debt is not a structural problem. It is a knowledge problem. No tool can tell you whether the developers maintaining the code understand it.

The Ownership Problem

Comprehension debt is fundamentally an ownership problem. Code that nobody truly understands is code that nobody owns. Ownership in a practical sense means someone can make confident changes, has a working theory of how the code behaves and why it might break, and can explain it to a new developer without hedging. Without that, you have working code and no team behind it.

The teams that navigate this best treat AI-generated code differently at the review stage. Rather than tightening acceptance criteria across the board, they require the author to be able to explain what they accepted. “Walk me through what this does” is a different ask than “looks good,” and it changes the incentive for the developer accepting the suggestion. Knowing they will need to explain it, they read it more carefully before accepting. That is a social norm rather than a technical control, but social norms are what engineering culture runs on in practice.

There is also an argument for being explicit about where AI generation gets used freely and where it gets constrained. Boilerplate, configuration, test fixtures, CLI argument parsing: these are domains where comprehension debt is low because the patterns are well-understood by everyone and the surface area for unexpected behavior is small. Complex business logic, security-sensitive code, distributed system primitives, anything where the edge cases matter: these are domains where skipping the comprehension step has higher consequences. Most teams have not drawn this line explicitly, and “use your judgment” tends to mean no consistent judgment gets applied.
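One way a team might draw that line explicitly is a small, checked-in policy that maps parts of the repository to a review expectation. This is only a sketch: the paths, the two level names, and the `review_level` function are all hypothetical, and nothing here corresponds to a feature of any particular tool.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: first matching pattern wins.
# "free"    = generated code is fine with a normal review
# "explain" = the author must be able to walk a reviewer through it
POLICY = [
    ("tests/fixtures/*", "free"),
    ("config/*", "free"),
    ("src/cli/*", "free"),
    ("src/auth/*", "explain"),
    ("src/billing/*", "explain"),
    ("src/replication/*", "explain"),
]

# Anything not listed falls on the cautious side.
DEFAULT_LEVEL = "explain"


def review_level(path: str) -> str:
    """Return the review expectation for a changed file path."""
    for pattern, level in POLICY:
        if fnmatchcase(path, pattern):
            return level
    return DEFAULT_LEVEL
```

For example, `review_level("config/logging.yaml")` returns `"free"`, while `review_level("src/billing/invoice.py")` and any unlisted path return `"explain"`. The mechanism matters less than the fact that the line is written down somewhere other than in individual judgment.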

The Practical Trade-off

Building things faster has real value. Shipping a feature in a week instead of three weeks is worth something, and AI coding tools genuinely deliver that for many categories of work. The question is whether the speed is being traded against something that is being tracked.

Technical debt gets tracked. It appears in backlogs, gets discussed in architecture reviews, sometimes even gets measured through proxies like change failure rate or time to restore service. Comprehension debt is mostly invisible until it causes a problem: a bug that takes three days to trace because nobody understood the code, an incident that worsens because the responder cannot make confident changes, a rewrite that happens because the codebase has become genuinely unknowable to the people maintaining it.

The DORA research on software delivery performance has consistently found that engineering teams improve not just by shipping faster but by reducing the time between making a change and understanding its consequences. Comprehension debt works against exactly that feedback loop. When you do not understand the code you shipped, you cannot efficiently reason about what it might do next.

The most honest accounting for comprehension debt is probably the simplest: if you accepted generated code, you should be able to explain it to someone who has never seen it. If you cannot, you have not finished the work yet. That is not an argument against using AI coding tools. It is an argument for being clear about what “done” means when you use them.
