Verification Is Work: The Cognitive Costs That Coding Agents Add While Removing Others

The standard pitch for coding agents is that they reduce cognitive load. You stop holding library APIs in working memory. You delegate the mechanical translation of intent into syntax. The parts of programming that feel like overhead, the parts that tax attention without requiring judgment, get handed off. This framing is accurate, but incomplete.

Simon Willison recently wrote about the cognitive cost of working with coding agents, and his piece raises a question worth pursuing further: if there are real savings, what are the new expenses? Because there are new expenses, and they are not evenly distributed across developers or tasks.

What cognitive load theory actually says

Cognitive load theory, developed by John Sweller in the late 1980s to explain failures in instructional design, divides mental effort into three types. Intrinsic load comes from the inherent complexity of the problem itself. Extraneous load is friction from poor presentation or tooling. Germane load is the useful work of building mental models and long-term understanding.

Most claims about AI coding tools reducing cognitive load are specifically claims about extraneous load: the friction of syntax recall, API signatures, boilerplate construction. Those claims are largely correct. But coding agents also change the intrinsic and germane components in ways that matter, and those changes mostly go undiscussed.

The verification tax

When you write a function yourself, you understand it as a byproduct of writing it. The mental model forms during construction. When an agent writes it, that model does not transfer to you automatically. You have to read the code, trace the logic, identify edge cases, and determine whether the approach matches your intent. This is a separate cognitive step that writing your own code does not require.

The overhead here is real. A 150-line function that an agent generates in ten seconds might take five minutes to properly understand and verify. For code in domains where you are already expert, verification is fast, because the agent tends to produce something close to what you would have written and deviations stand out. For code in domains where you are less expert, which is often precisely where agents are most useful, verification is slow and uncertain. You might not catch what you do not know to look for.
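
The arithmetic behind that is easy to sketch. The snippet below uses the ten-second and five-minute figures from the paragraph above; the thirty-minute hand-writing time and the fourfold slowdown for unfamiliar domains are made-up numbers chosen only for illustration, and net_minutes_saved is a hypothetical helper, not a measurement of anything.

```python
def net_minutes_saved(write_min, generate_min, verify_min):
    """Minutes saved by delegating: hand-writing time minus (generation + verification)."""
    return write_min - (generate_min + verify_min)


# Figures from the text: ten seconds to generate, five minutes to verify.
generate_min = 10 / 60
verify_min = 5.0

# Assumption (not from the text): writing the function by hand takes 30 minutes.
write_min = 30.0

# Familiar domain: verification takes the five minutes quoted above.
familiar = net_minutes_saved(write_min, generate_min, verify_min)

# Assumption (not from the text): in an unfamiliar domain, verification
# takes four times as long because you are learning while you check.
unfamiliar = net_minutes_saved(write_min, generate_min, verify_min * 4)

print(f"familiar domain:   {familiar:.1f} minutes saved")
print(f"unfamiliar domain: {unfamiliar:.1f} minutes saved")
```

Under these assumed numbers the saving shrinks from roughly 25 minutes to roughly 10. Generation time barely registers in either case; verification time is the term that decides the outcome.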

Research from Stanford on an AI code assistant built on OpenAI’s Codex model found that developers using the assistant wrote less secure code and, critically, were more likely to believe their insecure code was secure. The study measured security outcomes rather than cognitive load, but the pattern is suggestive: less effort spent producing the code went along with less scrutiny of it. If those two things move together, that is the concern.

Context management as a new cognitive skill

The mental work of writing code has been partially replaced by the mental work of constructing context. You decide which files to include, which constraints to state explicitly, which prior decisions to surface. You write prompts that constrain the solution space, front-load requirements the agent would otherwise discover by trial and error, and describe intent without over-specifying implementation.

This is a skill, and it is not the same skill as programming. Developers who use agents well report spending real attention on what might be called context hygiene: keeping the context window relevant, pruning stale information, knowing when to start a fresh conversation and when to continue an existing one. Managing what an agent knows about your codebase is a cognitive task that did not exist before agents did.

The learning curve here is underestimated. Most developers who start using coding agents go through a phase of writing prompts that are too vague, getting plausible but wrong output, and then spending more time debugging the output than they would have spent writing the code directly. The agents are not at fault for this; the skill of directing them has to be built deliberately.

Working memory extension and its costs

One of the clearest concrete benefits of coding agents is what they do to working memory. A non-trivial codebase carries thousands of decisions that need to be mutually consistent. Holding that in your head requires familiarity accumulated over months. An agent can take in dozens of files at once and generate code that is often consistent with all of them, which is a genuine capability advantage.

But that context window is now something you have to manage. What goes in it, what to trim when it grows large, when to restart versus when to continue: these decisions consume attention. The extension of working memory comes with its own overhead, smaller than what it saves but present nonetheless. Developers who treat the context window as a passive input rather than something to actively curate get noticeably worse results, and curation is cognitive work.
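
To make "curation" concrete, here is a minimal sketch of one way to think about it, assuming you can attach a rough token count and a relevance score to each candidate piece of context. ContextItem, curate, the file names, and the scores are all hypothetical; real agent tools expose context in their own ways, and the relevance judgment is exactly the part that stays with you.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str         # hypothetical label for a file or conversation chunk
    tokens: int       # rough size estimate
    relevance: float  # your own judgment, 0.0 (stale) to 1.0 (essential)


def curate(items, budget):
    """Keep the most relevant items that fit within a token budget (greedy)."""
    kept = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
    return kept


candidates = [
    ContextItem("billing/invoice.py", 1200, 0.9),
    ContextItem("billing/tax.py", 800, 0.7),
    ContextItem("earlier debugging transcript", 3000, 0.2),
    ContextItem("README.md", 500, 0.4),
]

selected = [item.name for item in curate(candidates, budget=3000)]
print(selected)
```

With a 3000-token budget, the stale transcript is the item that gets dropped. The loop is trivial; assigning the relevance scores is the cognitive work the paragraph above describes.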

The ownership gap in debugging

Code you wrote lives in your head differently from code you reviewed and approved. When you build something yourself, you retain the shape of the logic, the tradeoffs you considered, the edge cases you noticed while writing. When something breaks three weeks later, you have residual context; you know roughly where to look.

Code an agent wrote occupies your memory only as well as you understood it on first review. The debugging-relevant mental model, built from the act of construction, is absent. You reconstruct it from the code itself, which is always the slower path. On a project where a significant fraction of the functions were agent-generated, this compounds. Individual functions may be readable, but the developer’s relationship to the whole system is shallower.

John Ousterhout’s work on software complexity argues that maintainability depends substantially on developers internalizing the design decisions that produced the code, not just the code itself. Agents generate from requirements; they do not transfer the designer’s mental model alongside the output.

Where the net actually lands

The counterargument is valid and worth stating plainly. A lot of the cognitive load that agents reduce is load that does not contribute much to understanding the problem or making good decisions. Holding pandas syntax in working memory while trying to think about data transformation is exactly the kind of extraneous overhead that benefits from offloading. When that drops, attention becomes available for architecture, requirements analysis, and the question of whether you are solving the right problem.

For most developers on most tasks, the net is probably positive. The savings on mechanical implementation work exceed the new costs of verification, context management, and the ownership gap. But the new costs are not uniformly distributed. Junior developers, who have less expertise for rapid verification, carry a heavier verification tax. Developers working in unfamiliar domains, where agent help is most tempting, have the weakest ability to catch errors. The cognitive economics of AI-assisted programming are more favorable for experts working in familiar territory than for the people most likely to reach for an agent precisely because they feel out of their depth.

What follows from this

The practical implication is not to use agents less. It is to be deliberate about verification rather than treating generation as equivalent to understanding. Reading agent output with the same attention you would give a code review, treating unfamiliar output as a hypothesis rather than a solution, maintaining genuine ownership of architectural decisions rather than delegating those too: these habits close most of the gap between the productivity narrative and the cognitive reality.
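
One lightweight way to treat output as a hypothesis is to probe it with the edge cases you would have thought about had you written it yourself. The sketch below is illustrative only: split_batches stands in for a function an agent might produce, and check_hypothesis is a hypothetical name for the handful of checks you would run before accepting it.

```python
def split_batches(items, size):
    """Stand-in for agent-generated code: split items into lists of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_hypothesis():
    """Probe the generated function beyond the happy path and report what was learned."""
    # Happy path: the case the prompt most likely described.
    assert split_batches([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

    # Edge cases the prompt probably never mentioned.
    assert split_batches([], 3) == []
    assert split_batches([1, 2, 3], 2) == [[1, 2], [3]]
    assert split_batches([1], 5) == [[1]]

    # A size of zero: find out what actually happens instead of assuming.
    try:
        split_batches([1], 0)
    except ValueError:
        return "size=0 raises ValueError"
    return "size=0 accepted"


print(check_hypothesis())
```

The last check is the interesting one. Whether a ValueError is the right behavior for a batch size of zero is a design decision, and running the probe is what surfaces the fact that the decision was made for you.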

The picture that Simon Willison’s piece points toward is one of redistribution rather than removal. Some loads decrease substantially. New loads appear. The character of the work changes, which means the skills required to do it well change too. Treating coding agents as pure cognitive offloading mechanisms, rather than as tools that shift what kind of thinking you need to do, is the mistake that produces both the worst output and the worst experience of using them.