The Hidden Tax of Delegating Your Thinking to a Coding Agent

Source: simonwillison

Simon Willison wrote recently about something that has been quietly nagging at developers who work extensively with coding agents: the cognitive cost is real, it is different from what people expected, and it does not simply go away as you get more experience with the tools.

The usual framing of AI coding tools is throughput. You produce more code faster. The mental model most people carry is that the agent does the tedious parts and you do the interesting parts. What Willison is pointing at is that the trade-off is more complicated than that, and I think it is worth pulling apart the specific mechanisms involved.

What actually changes about your thinking

When you write code yourself, your working memory is engaged in a particular way. You hold the problem model, the constraints, and the syntax all at once, and the act of writing forces a kind of continuous verification. You cannot write a function signature without mentally committing to what it accepts and returns. You cannot write a loop body without tracking the loop invariants. The writing is slow, but the slowness is doing cognitive work.

When a coding agent writes the code, that continuous verification loop breaks. You get output to review rather than a process to participate in. This sounds easier, and in terms of raw keystrokes it obviously is, but it introduces a different kind of cognitive demand: you have to reconstruct the intent behind code you did not write, check it for correctness in a much more explicit way than you would your own code, and maintain enough skepticism to catch the subtle errors that agents produce with remarkable consistency.

This is not an argument against using agents. It is an observation about what the job actually becomes.

Automation bias is the baseline risk

There is a well-studied phenomenon in aviation called automation bias, where operators of automated systems tend to over-trust automation outputs and under-invest in independent verification. The effect is strong enough that cockpit automation designers have to actively work against it through interface design and training protocols. The same dynamic appears in radiology with AI-assisted diagnosis, in autonomous driving monitoring, and in any system where a human supervises rather than directly controls.

Coding agents put developers in exactly that supervisory role, and there is no reason to expect developers to be immune to the same bias. The agent produces code that looks correct, compiles, and passes the tests you thought to write. The path of least resistance is to accept it. The cognitive cost of thorough review is real and it competes directly with the productivity gains the agent provides.

What makes this particularly tricky with coding agents is that the errors tend to be plausible. A junior developer’s mistakes are often syntactically or structurally obvious. Agent mistakes often look like code a senior developer might write: they follow conventions, use the right abstractions, and fail in subtle semantic ways that you can only catch by understanding the full context. A classic example is an agent confidently implementing the wrong algorithm for a problem: the result handles the test cases you wrote but fails on edge conditions.
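To make that failure mode concrete, here is a small hypothetical illustration. It is not taken from Willison's piece or from any real agent transcript; the function names and test data are my own. The first version reads cleanly and passes the obvious test, but it silently assumes that a later interval always ends later than the one it overlaps.

```python
def merge_intervals_plausible(intervals):
    """Looks idiomatic and passes the happy-path test, but is subtly wrong."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Bug: overwrites the end, assuming the new interval ends later.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def merge_intervals(intervals):
    """Same structure, but keeps the larger end when intervals overlap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The test most people would think to write passes for both versions.
happy_path = [(1, 3), (2, 6), (8, 10)]
print(merge_intervals_plausible(happy_path) == merge_intervals(happy_path))

# A nested interval exposes the difference.
nested = [(1, 10), (2, 3)]
print(merge_intervals_plausible(nested))
print(merge_intervals(nested))
```

The two functions differ by a single `max` call, which is the kind of difference that is easy to skim past in review when everything around it looks right.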

The context management problem

Long-running agent sessions introduce another cost that is harder to name but easy to feel. When you use a tool like Claude Code or Cursor for an extended session, you are essentially maintaining a shared mental model with the agent across many exchanges. The agent’s context window holds a partial and increasingly compressed version of the conversation. Your mental model has to track what the agent knows, what it has forgotten or summarized away, and where its understanding might have drifted from reality.

This is different from the ordinary cognitive load of working on a complex codebase. When you work alone, your mental model of the code is the ground truth. When you work with an agent over a long session, there are now two models in play and they can diverge. Developers who have spent hours in a deep agent session often describe a specific kind of exhaustion that is not about the volume of code produced but about this continuous tracking work.

The tooling has started to address this. Claude Code’s --resume flag and Cursor’s indexed codebase features both attempt to give agents more stable context. But the fundamental problem, that you are working with a system that has a finite and imperfect memory of the shared state, does not go away.

Skill atrophy is a legitimate concern

Willison’s piece touches on something that gets dismissed too quickly in discussions about AI coding tools: the question of what happens to skills that are not exercised. The argument for dismissal is usually that we do not worry about calculators atrophying arithmetic skills, and that is true as far as it goes. But coding is not arithmetic.

The skills that coding agents replace or reduce include holding a complex algorithmic problem in working memory while constructing a solution, developing fluency with a language or framework through repetitive writing, and building the intuition about failure modes that comes from personally writing and debugging many implementations.

These are not skills that every developer needs at every level of seniority. A principal engineer coordinating a large system probably does benefit from externalizing more implementation work. But a developer still building those foundational intuitions faces a genuine trade-off. Using agents extensively during the period when you would otherwise be grinding through the repetitions that build expertise may produce faster short-term output at the cost of slower long-term skill development.

This is not a reason to avoid agents. It is a reason to be deliberate about when you use them.

What good tooling should do

The interesting design question is whether coding agent interfaces can be built to reduce some of these cognitive costs without sacrificing the productivity gains.

A few patterns seem promising. First, better explanation of agent reasoning: tools like GitHub Copilot Workspace have moved toward showing the plan before executing it, which gives you a chance to verify the intent before you are reviewing a fait accompli. This shifts some of the cognitive work earlier in the process, where it is cheaper.

Second, tighter feedback loops between agent output and tests. Tools that run the test suite continuously and show the agent’s output against it make it harder to miss failures. aider has done good work here: it can run your test command after each change and feed failures back to the model, so the agent iterates until tests pass rather than producing a single output for you to verify.
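The shape of that loop is simple enough to sketch. What follows is a minimal illustration under my own assumptions, not aider's implementation or any tool's real API: `propose` and `run_tests` are hypothetical callables standing in for "ask the agent for a change" and "run the test suite", and the round limit is arbitrary.

```python
from typing import Callable, Optional, Tuple


def iterate_until_green(
    propose: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a candidate, test it, and feed failures back until tests pass.

    Returns (candidate, rounds_used), or (None, max_rounds) if no candidate
    passed within the limit.
    """
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        candidate = propose(feedback)          # hypothetical agent call
        passed, output = run_tests(candidate)  # hypothetical test runner
        if passed:
            return candidate, round_number
        feedback = output  # the failure output becomes the next prompt's context
    return None, max_rounds


# Stub usage: the first attempt fails, the second (given feedback) passes.
def fake_propose(feedback):
    return "v1" if feedback is None else "v2"


def fake_run_tests(candidate):
    return (candidate == "v2", "" if candidate == "v2" else "1 test failed")


print(iterate_until_green(fake_propose, fake_run_tests))
```

Note what the loop does not do: it only guards against failures the test suite can see, so the review burden for everything the tests do not cover stays with you.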

Third, clearer delineation of what the agent changed. Any coding agent that produces a diff you have to review should make that diff as legible as possible. This sounds obvious but the variance in how tools handle this is significant. A hundred-line diff from a tool that highlights only the semantically meaningful changes is much less cognitively expensive than the same diff from a tool that presents every line equally.
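As a rough sketch of what filtering a diff down to its meaningful changes could mean, the snippet below compares two versions of a file after collapsing whitespace, so whitespace-only edits drop out. This is a crude proxy and an assumption on my part, not how any particular tool works: the function name is hypothetical, and in an indentation-sensitive language like Python whitespace can itself be meaningful, so a real tool would more likely compare syntax trees.

```python
import difflib
from typing import List


def meaningful_changes(before: str, after: str) -> List[str]:
    """Return changed lines, ignoring edits that only touch whitespace."""
    old = before.splitlines()
    new = after.splitlines()

    def normalize(line: str) -> str:
        # Collapse all runs of whitespace and strip the ends.
        return " ".join(line.split())

    matcher = difflib.SequenceMatcher(
        a=[normalize(line) for line in old],
        b=[normalize(line) for line in new],
        autojunk=False,
    )
    changes: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical after normalization: not worth a reviewer's time
        changes.extend(f"- {line}" for line in old[i1:i2])
        changes.extend(f"+ {line}" for line in new[j1:j2])
    return changes


before = "x = 1\ny  =  2\nz = 3"
after = "x = 1\ny = 2\nz = 4"
print(meaningful_changes(before, after))
```

Here the reformatted `y` line disappears from the output and only the change to `z` is left for the reviewer.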

The productivity number is incomplete

When people cite productivity gains from coding agents, the numbers being measured are almost always lines of code produced, features shipped, or task completion time. These are real and meaningful. But they do not capture the review overhead, the debugging time when the agent’s subtle errors surface in production, the context management cost during long sessions, or the compounding effect of skill development that did not happen.

None of this means the tools are not worth using. For many tasks and many developers, the productivity gains genuinely outweigh the cognitive costs. What it does mean is that the framing of coding agents as simply making development faster, with no trade-offs in how attention and mental energy are spent, is incomplete. Willison is right to name this, and developers are right to take it seriously rather than assuming the costs will disappear with more practice.

The tools are improving fast enough that some of these problems will get better. The context management issues are fundamentally engineering problems, and they will be engineered around. The automation bias problem is older and harder, because it is about human psychology rather than software architecture. That one is worth watching.
