
Familiar Code Is Not the Same as Understood Code

Source: simonwillison

Simon Willison’s piece on the cognitive cost of coding agents focuses on the workload of supervision: the attention you pay, the interruptions you manage, the context you reconstruct. There’s another dimension worth pulling out separately, one that’s specifically psychological rather than logistical. It concerns not how much attention you pay to AI-generated code, but how much you think you understand it after you’ve read it.

Processing Fluency and What It Does to Your Judgments

In 2009, Alter and Oppenheimer published a review of research on processing fluency: the subjective ease with which information is processed. Their central finding was that fluency influences a wide range of judgments that should, rationally, be independent of it. Information that is easy to process gets rated as more true, more familiar, more credible, and better understood, relative to equivalent information that is harder to process. The effect is consistent and difficult to override deliberately.

For code, fluency is shaped by familiarity of idioms, readability of variable names, consistency with the surrounding codebase, and clarity of the overall structure. When all of these are high, you read through code without resistance. That reading-without-resistance feels like understanding.

AI-generated code, when it works, produces exactly this condition. The agent draws on your codebase as context, so its output matches your naming conventions, your indentation preferences, your error handling patterns, and the shape of your abstractions. Code produced by Claude Code or Cursor in a well-established project does not read like foreign code; it reads like code a competent colleague wrote. The stylistic fluency is high by design.

The Moses Illusion at Scale

The Moses illusion was first documented by Erickson and Mattson in 1981. They found that when people were asked “How many animals of each kind did Moses take on the Ark?”, most answered “two,” despite Moses having nothing to do with the Ark. The correct name (Noah) is displaced by the wrong one without triggering any rejection signal. The sentence reads fluently; the reading system reports comprehension; the semantic anomaly goes undetected.

Code review is susceptible to exactly this failure mode. When an agent writes code that is structurally familiar, using the right types and function names and error patterns, your reading system reports comprehension. Subtle semantic errors (off-by-one conditions, wrong comparisons, incorrect assumptions about mutability, inverted boolean conditions) can pass through a fluent read without triggering a rejection signal, because the code looks like the code you would have written.
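To make that concrete, here is a small sketch of the kind of code that survives a fluent read. The function names and the event-list scenario are hypothetical, invented for illustration; nothing here comes from a real agent session.

```python
from typing import Sequence


def last_n_events(events: Sequence[str], n: int) -> list[str]:
    """Return the n most recent events, oldest first."""
    # Reads fluently, but -0 == 0, so n == 0 slices from the start
    # and returns every event instead of none.
    return list(events[-n:])


def last_n_events_fixed(events: Sequence[str], n: int) -> list[str]:
    """Same contract, with the start index computed explicitly."""
    start = max(len(events) - n, 0)
    return list(events[start:])
```

For `["a", "b", "c"]` and `n = 2`, both versions return `["b", "c"]`. For `n = 0`, the first returns all three events and the second returns an empty list. Nothing about the first version looks wrong; the idiom is familiar, and that is the problem.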

This is different from the general claim that AI code can have bugs. All code can. The specific point is that AI code in your style exploits the mechanism by which bug-catching normally works. You catch bugs by noticing that something looks wrong. When code is stylistically consistent with your expectations, the threshold for “looks wrong” rises. A bug has to be quite prominent to break through fluency.

The Illusion of Explanatory Depth

Rozenblit and Keil’s 2002 study established what they called the illusion of explanatory depth: people systematically overestimate how well they understand causal and mechanical systems they have not thought through carefully. When asked to rate their understanding of how a toilet works or a bicycle works, most people rate themselves as moderate to high. When then asked to produce a step-by-step causal explanation, they discover large gaps they did not know were there. The subjective sense of understanding collapses once they are asked to produce the explanation.

The test for understanding is production, not recognition. You understand a piece of code when you can produce an equivalent, not when you can read the given one and nod.

I can verify this in my own experience building Ralph. When I review Claude Code’s output and move on, I have a sense of having understood it. When, weeks later, something breaks in that code and I need to debug it, I regularly discover that my understanding was shallower than I thought. The causal model I need to trace the bug is absent; I reconstruct it from the code, slowly, as though reading it for the first time. The fluent read did not leave a usable mental model behind.

Recognition Is Not Recall

The distinction in cognitive psychology between recognition and recall matters here. Recognition is whether you can identify something as correct when presented with it. Recall is whether you can produce or reconstruct it independently. Recognition is much easier; recall is what you need in practice.

When you approve a diff during an agent session, you are performing a recognition task under mild time pressure. The code looks right. It probably is right. You proceed. When a bug appears three weeks later, or when you need to extend the function, or when you need to explain it to a colleague, you need recall, not recognition. The approval required a lighter form of cognitive engagement than the downstream uses will demand.

Two things make this more pronounced with agent-generated code than with ordinary code review. First, the volume is higher. An agent can generate code faster than you can deeply understand it, so the gap between recognition-level and recall-level comprehension accumulates across a session. Second, the stylistic fluency makes it easier to pass on recognition alone; unusual code creates friction that forces deeper engagement, while familiar code offers none.

This dynamic connects to a Stanford study of an AI coding assistant, which found that developers using the assistant were more likely to introduce security vulnerabilities and, critically, more likely to rate their insecure code as secure. The cognitive load reduction was genuine; so was the reduction in scrutiny. Fluency and reduced scrutiny move together.

What Breaks Through the Illusion

What demands genuine comprehension is forced explanation. Not “does this code look right?” but “explain what this function does, including edge cases and failure modes.” Asking yourself this before approving agent output reliably surfaces gaps that a visual scan missed.

Writing specifications before delegating to the agent achieves something similar. Specifying forces you to think through the solution space before seeing the agent’s approach, which gives you an independent model to compare against rather than just reading the output and evaluating it against itself.
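One lightweight way to do this, sketched here under assumptions, is to write the specification as input/output pairs before you delegate, then run whatever the agent produces against them. The slug helper, the `SPEC` table, and `agent_slugify` are all hypothetical stand-ins, not output from any real tool.

```python
import re
from typing import Callable

# Written before delegating: the behavior I expect, decided independently.
SPEC: list[tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    ("a--b", "a-b"),
    ("", ""),
]


def check_against_spec(
    candidate: Callable[[str], str],
) -> list[tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the candidate gets wrong."""
    mismatches = []
    for text, expected in SPEC:
        actual = candidate(text)
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches


def agent_slugify(text: str) -> str:
    """Stand-in for agent output: tidy and idiomatic, but it never trims."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower())
```

Calling `check_against_spec(agent_slugify)` reports the padded case, where the candidate produces `"-padded-"`. The spec existed before the output did, so the comparison is against my model of the problem rather than against the code's own account of itself.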

Code review research points in the same direction. Studies comparing inspection techniques generally find that active, question-driven reading catches more defects than passive reading, even when the readers are experts. Questions like “what happens if this value is null?” or “what are the invariants that need to hold here?” engage the causal reasoning that passive reading skips. AI-generated code raises the stakes for this distinction rather than lowering them.
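Interrogation questions like these can be written down as executable checks rather than left as mental notes. The sketch below assumes a hypothetical agent-written helper, `average_latency`, and pairs each review question with a check that answers it by running the code.

```python
import math
from typing import Callable, Optional


def average_latency(samples: Optional[list[float]]) -> float:
    """Hypothetical agent-written helper under review."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


# Each review question is paired with a check that answers it by running the code.
QUESTIONS: list[tuple[str, Callable[[], bool]]] = [
    ("What happens if the value is None?",
     lambda: average_latency(None) == 0.0),
    ("What happens on an empty list?",
     lambda: average_latency([]) == 0.0),
    ("Invariant: does the result stay within the input's range?",
     lambda: 1.0 <= average_latency([1.0, 2.0, 4.0]) <= 4.0),
    ("Does a NaN sample stay visible in the result?",
     lambda: math.isnan(average_latency([1.0, float("nan")]))),
]


def interrogate() -> dict[str, bool]:
    """Run every question and record whether the expectation held."""
    return {question: check() for question, check in QUESTIONS}
```

Writing the questions forces a decision the fluent read skipped: whether returning `0.0` for missing data is actually what callers want, or whether it hides the difference between "no samples" and "zero latency."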

Putting It Together

Willison’s framing centers on the workload of supervision. What processing fluency adds is a specific mechanism by which that supervision can feel adequate while not being adequate. The code reads like your code; your brain reports comprehension; you proceed; the understanding is not there when you need it.

The mitigation is not to distrust AI-generated code categorically. It is to treat the fluency of a read as a weak signal about comprehension rather than a strong one. The same code that reads easily might or might not be understood at the level you will need later. The relevant test is not whether reading was comfortable, but whether explanation would be easy. That distinction, consistently applied, is what separates productive agent-assisted development from a growing codebase you have reviewed but do not truly own.
