Simon Willison published a piece on the cognitive cost of coding agents that articulates something I’ve been sitting with for months without quite finding the right frame. The productivity gains from tools like Claude Code, Cursor, and Copilot Workspace are real and measurable. But there’s something else happening alongside those gains, something in the texture of the workday that’s harder to quantify.
The short version: coding agents don’t remove cognitive load. They relocate it.
What Cognitive Load Theory Actually Says
John Sweller introduced cognitive load theory in the late 1980s to explain learning difficulties in educational contexts, but its framework maps cleanly onto software development. The theory splits mental effort into three types:
- Intrinsic load: the irreducible complexity of the task itself
- Extraneous load: overhead imposed by tools, interfaces, and procedures that don’t contribute to understanding
- Germane load: effort that builds lasting schemas and skills
When you write code yourself, intrinsic load scales with problem complexity, extraneous load is mostly your editor and compiler, and germane load is high: you’re building a mental model of the system as you build the system. The two activities are coupled.
Coding agents decouple them. The agent takes on much of the intrinsic load, which sounds like a straightforward win. But extraneous load rises in ways that aren’t immediately obvious, and germane load often collapses entirely.
The Supervisory Control Problem
There’s a field of research in human factors engineering called supervisory control, which studies what happens cognitively when humans oversee automated systems rather than operating them directly. Air traffic control, nuclear plant monitoring, and automated manufacturing all share this structure: the human isn’t doing the primary work but watching something else do it and intervening when needed.
The findings are consistent and sobering. Supervisory control is not rest. It imposes its own cognitive demands, different in kind from direct operation but not lighter in total weight. You have to maintain a mental model of what the automated system is doing, predict where it might go wrong, decide when to intervene, and execute interventions quickly enough to matter. Miss a window and you’re not correcting a mistake; you’re unwinding a thread.
This is now the default mode for developers working with agentic coding tools. You set a task, the agent runs, and you monitor. The monitoring is the job now, and monitoring is not passive.
The specific shape of the monitoring burden depends on the agent and the task, but a few patterns recur. First, you can’t fully disengage. If the agent is refactoring a module or adding a feature across multiple files, you need to be close enough to catch it when it misunderstands the spec, introduces a subtle bug, or makes a structural decision that conflicts with the rest of the codebase. Stepping away for an hour and reviewing the diff afterward is technically possible, but in practice it means you’re reviewing a large block of changes without the context of watching them accumulate. The diff becomes hard to audit.
Second, there’s a decision cadence problem. Modern agents interrupt frequently, asking for clarification, approval, or direction. Each interruption is a small context switch. Joel Spolsky’s classic argument about the cost of task switching for developers was framed around human interruptions: colleagues and managers pulling you off what you were doing. Agent interruptions are more frequent, less predictable, and harder to batch.
The Comprehension Debt
The deeper cost is what happens to your understanding of the codebase over time.
When you write a function yourself, you understand it at the level of the individual decisions: why this loop structure, why this error handling approach, why this abstraction boundary. That understanding doesn’t just help you maintain the function later; it accumulates into a working model of the system that informs every future decision.
When an agent writes the function, you understand it at the level of the spec you provided: the function does X, takes these inputs, returns this output. The implementation is a black box you’ve read but haven’t reasoned through. You know what it does; you have a weaker grip on why it does it that way.
This is fine for any given function. Across a codebase, over months, it compounds. Andrej Karpathy’s “vibe coding” framing captures the extreme version: you describe what you want and accept what you get, building software you don’t really understand. Most developers working with agents aren’t going that far, but they’re moving in that direction by degrees. The comprehension debt is real even if you’re reviewing every diff.
Nicholas Carr argued in The Glass Cage that tools which automate cognitive tasks erode the underlying skills. The GPS navigation example is overused but apt: habitual GPS use has been linked to weaker spatial memory and route learning in studies going back more than a decade. The mechanism is straightforward: skills you stop practicing decay. The question for coding agents is whether the same dynamic applies to implementation skills, and over what timescale.
Automation Complacency
There’s another effect from the human factors literature that’s worth naming: automation complacency. When an automated system is reliable most of the time, human overseers become progressively less vigilant. The system usually works, so close monitoring feels wasteful, and attention drifts. The catch is that complacency hits hardest exactly when it matters: in the novel situations the automation wasn’t designed to handle.
Coding agents are reliable enough to induce complacency. They handle the common cases well. They write straightforward CRUD operations, refactor simple functions, add tests to existing code. The cases where they go wrong tend to be subtler: architectural drift, misunderstanding of invariants that aren’t in the spec, introduction of patterns that are locally reasonable but globally inconsistent. These are also the cases that are hardest to catch in a casual review.
The failure mode looks like this: the agent completes 20 tasks correctly, you lower your guard, and on the 21st it makes a decision that seems fine in isolation but creates a structural problem you won’t notice for weeks.
What Changes in Practice
None of this is an argument against using coding agents. The productivity gains are too large to ignore, and the tools are improving fast. But working effectively with them requires treating supervision as a skill to develop rather than a burden to minimize.
A few things that actually help:
Keep the scope of individual agent tasks small. Not because agents can’t handle large tasks, but because the diff of a large task is genuinely hard to audit. The cognitive overhead of reviewing 800 lines of agent output is higher than the overhead of reviewing four 200-line outputs sequentially, even though the total line count is the same. Incremental changes are easier to hold in working memory.
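One way to make that scope limit concrete is to measure each agent task’s diff before you start reading it. The sketch below is a minimal, hypothetical helper, not an existing tool: it parses the output of `git diff --numstat` (one `<added>\t<deleted>\t<path>` line per file, with `-` in place of counts for binary files) and compares the total against a review budget. The 200-line budget and the names `count_changed_lines`, `check_budget`, and `numstat_against` are assumptions chosen to mirror the numbers above.

```python
import subprocess

# Hypothetical review budget in changed lines; 200 mirrors the chunk size above.
REVIEW_BUDGET = 200


def count_changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>". Binary files
    report "-" for both counts, so they are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total


def check_budget(numstat: str, budget: int = REVIEW_BUDGET) -> tuple[int, bool]:
    """Return (changed_lines, within_budget) for one agent task's diff."""
    changed = count_changed_lines(numstat)
    return changed, changed <= budget


def numstat_against(rev: str = "HEAD") -> str:
    """Run `git diff --numstat <rev>` in the current directory (needs git on PATH)."""
    result = subprocess.run(
        ["git", "diff", "--numstat", rev],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run `check_budget(numstat_against())` after an agent finishes a task. A result over budget is a signal to split the next task, not to skim this one faster.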
Write the spec before you delegate. Writing a clear specification for an agent task is cognitive work that builds much of the mental model you’d otherwise build by implementing the task. The spec forces you to think through the problem. It also gives you a reference point during review: does the implementation actually match what you described?
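A spec does double duty if the same structure that goes to the agent also generates your review questions. The sketch below assumes a hypothetical `TaskSpec` with goal, inputs, outputs, invariants, and out-of-scope fields; none of this is a real agent API, and the field names are illustrative. The invariants field is the place to write down the unstated assumptions an agent can’t infer from the task description alone.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """A hypothetical structure for one delegated agent task."""

    goal: str
    inputs: list[str]
    outputs: list[str]
    invariants: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as plain text to hand to the agent; empty sections are omitted."""
        sections = [
            ("Goal", [self.goal]),
            ("Inputs", self.inputs),
            ("Outputs", self.outputs),
            ("Invariants that must hold", self.invariants),
            ("Do not change", self.out_of_scope),
        ]
        lines = []
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)

    def review_checklist(self) -> list[str]:
        """Turn the same spec into questions to answer while reading the diff."""
        checks = [f"Does it produce: {o}?" for o in self.outputs]
        checks += [f"Does it preserve: {i}?" for i in self.invariants]
        checks += [f"Does it leave untouched: {s}?" for s in self.out_of_scope]
        return checks
```

The point of `review_checklist` is that the questions you ask during review come from what you decided up front, not from whatever the diff happens to show you.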
Stay in the loop on structural decisions. Agents are good at implementation and weak at architecture. If a task requires a structural decision, make it yourself before handing off. Don’t let the agent pick the abstraction boundary and then review it afterward; you’ll tend to accept whatever it chose because refactoring at review time is expensive.
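In a typed codebase, “make the structural decision yourself” can be as literal as writing the interface and its call sites before handing off. The sketch below uses Python’s `typing.Protocol`: you own `RateLimiter` and `handle`, and the agent is asked only to supply a class with a matching `allow` method. The rate-limiter domain and every name here are hypothetical, and `FixedWindowLimiter` stands in for whatever the agent might produce. `Protocol` conformance is checked by a static type checker such as mypy, not at runtime.

```python
from typing import Protocol


class RateLimiter(Protocol):
    """The boundary you decide on before delegating (hypothetical example).

    The agent may implement this however it likes, but not change the
    method name, arguments, or return type.
    """

    def allow(self, key: str, now: float) -> bool:
        """Return True if a request for `key` at time `now` may proceed."""
        ...


class FixedWindowLimiter:
    """A delegated implementation: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # key -> (window start time, requests counted in this window)
        self._counts: dict[str, tuple[float, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window:
            # The window has elapsed: start a new one at `now`.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._counts[key] = (start, count + 1)
        return True


def handle(limiter: RateLimiter, key: str, now: float) -> str:
    """Callers depend only on the Protocol, never on the agent's concrete class."""
    return "ok" if limiter.allow(key, now) else "429"
```

If the agent’s output needs a different signature, that surfaces as a type error at the boundary you chose, rather than as a new abstraction you discover at review time.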
Treat code review skills as a first-class investment. If your role is shifting toward oversight, your ability to read unfamiliar code quickly and catch subtle bugs is now more valuable than your ability to produce code quickly. These are different skills. Reading code carefully is a learnable practice, and it’s one that tends to atrophy in developers who spend most of their time writing.
The Shape of the New Work
The cognitive shift that coding agents introduce is real, and it’s asymmetric in an uncomfortable way. The work that gets easier is the part most developers find satisfying: the flow state of building, the direct connection between thought and code. The work that gets harder is the part that was already the overhead: coordination, review, maintaining a coherent mental model of a growing system.
This isn’t a reason to avoid the tools. It’s a reason to be clear-eyed about what you’re optimizing for. Speed of implementation is one variable. Understanding of the result is another. For most production software, you need both, and the tools as they exist today make the trade-off between them more explicit than it’s ever been.