
The Hidden Overhead of Delegating to a Coding Agent

Source: simonwillison

Simon Willison recently wrote about the cognitive cost of coding agents, and it touches something I have been trying to articulate for months. The conversation around AI coding tools tends to collapse into two camps: productivity maximalists who count lines per hour, and skeptics who worry about deskilling. Neither framing captures what actually changes when you bring an agent into your development workflow.

What changes is the shape of the cognitive work, not just the volume of it.

From Writing to Directing

When you write code yourself, the mental model of the system and the act of expressing it in code are one continuous process. You hold the data structure in your head, then you type it out. The friction of typing is real, but it is not the bottleneck. The bottleneck is understanding. Writing code is, in large part, the process of building that understanding.

Coding agents disrupt that continuity. You describe what you want, the agent produces code, and now you have to understand code you did not write to verify it does what you intended. The cognitive work has not disappeared; it has been reorganized. You are no longer building a mental model incrementally as you type. You are reverse-engineering one from output you did not produce.

This is not inherently worse. Code review is a legitimate skill, and experienced developers do it constantly. But there is a meaningful difference between reviewing a colleague’s code in a shared codebase you both understand and reviewing an agent’s code where the agent has no persistent model of your system, your constraints, or your standards.

The Context Management Tax

With a human collaborator, shared context accumulates over time. You stop explaining background. With a coding agent, context is session-scoped at best. Every invocation, you are deciding how much context to inject: which files to include, which prior decisions to summarize, which constraints to name explicitly.

This is a new cognitive task that did not exist in the pre-agent workflow. It is not especially hard, but it is constant, and it compounds. For a small isolated task, the overhead is negligible. For a multi-session project with architectural constraints and accumulated design decisions, managing what the agent knows, what it needs to know, and what it might misunderstand becomes a significant part of the work.
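The per-invocation decision can be made concrete with a minimal Python sketch. Everything in it is hypothetical. `ContextBundle`, `build_context`, and the character budget are invented for illustration and are not the API of Claude Code, Cursor, or any other tool. Real context windows are also measured in tokens, not characters. The sketch assumes a simple priority order (explicit constraints, then summarized decisions, then raw files) and greedily drops whatever does not fit.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Hypothetical container for what gets injected into one agent session."""
    constraints: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        parts = []
        if self.constraints:
            parts.append("## Constraints\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.decisions:
            parts.append("## Prior decisions\n" + "\n".join(f"- {d}" for d in self.decisions))
        for path, text in self.files.items():
            parts.append(f"## File: {path}\n{text}")
        return "\n\n".join(parts)


def build_context(
    constraints: list[str],
    decisions: list[str],
    files: dict[str, str],
    budget_chars: int,
) -> tuple[ContextBundle, list[str]]:
    """Greedily fill a bundle in priority order; return it plus what was dropped."""
    bundle = ContextBundle()
    dropped: list[str] = []
    used = 0
    # Assumed priority: constraints, then decision summaries, then raw file contents.
    items = (
        [("constraint", c, c) for c in constraints]
        + [("decision", d, d) for d in decisions]
        + [("file", path, text) for path, text in files.items()]
    )
    for kind, key, text in items:
        if used + len(text) > budget_chars:
            dropped.append(f"{kind}: {key}")
            continue
        used += len(text)
        if kind == "constraint":
            bundle.constraints.append(text)
        elif kind == "decision":
            bundle.decisions.append(text)
        else:
            bundle.files[key] = text
    return bundle, dropped


bundle, dropped = build_context(
    constraints=["No new runtime dependencies"],
    decisions=["Auth tokens are validated in middleware, not in handlers"],
    files={"app/models.py": "class User: ...", "app/big_module.py": "x" * 10_000},
    budget_chars=2_000,
)
print(dropped)  # ['file: app/big_module.py']
```

The list of dropped items is the interesting output. It represents the part of the decision a developer otherwise makes silently, every session, by hand.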

Tools like Claude Code address this partly with CLAUDE.md files that persist project context across sessions. Cursor does something similar with .cursorrules. These are good solutions, but they shift the burden from runtime to authorship: now you are maintaining a living document that accurately describes your codebase and its conventions, which itself requires continuous attention.

Verification Without Construction

John Sweller’s cognitive load theory, developed in the context of educational psychology, distinguishes between intrinsic load (inherent complexity of the material), extraneous load (how it is presented), and germane load (the processing that builds understanding). The theory has been applied to software engineering to explain why certain practices, like well-named variables and small functions, reduce cognitive cost.

Coding agents alter the distribution across these categories. They can dramatically reduce extraneous load: you no longer look up API signatures, write boilerplate, or wrestle with syntax. But they can increase what might be called verification load: the cost of confirming that generated code is correct, secure, idiomatic, and fits your architecture.

Verification load is especially high when the agent produces code that looks plausible but has subtle problems. A hallucinated API method, a race condition in async code, a SQL injection vector in a string-interpolated query: none of these are obvious at a glance. Catching them requires the same depth of understanding you would have needed to write the code yourself, but now applied to someone else’s output without the benefit of having constructed it.
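Two of those failure modes are easy to reproduce in a few lines of standard-library Python. The function and class names (`find_user_unsafe`, `Account`, `demo`, and so on) are hypothetical. `asyncio.sleep(0)` stands in for whatever real I/O an async handler would await between checking state and updating it. In both cases the flawed version reads naturally and could survive a quick skim.

```python
import asyncio
import sqlite3


# --- SQL injection via string interpolation ---
def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Looks reasonable, but the f-string splices untrusted input into the SQL text.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the WHERE clause is always true
print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name


# --- Check-then-act race in async code ---
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = asyncio.Lock()

    async def withdraw_racy(self, amount: int) -> bool:
        if self.balance >= amount:   # check...
            await asyncio.sleep(0)   # ...yield to the event loop...
            self.balance -= amount   # ...then act on a check that may be stale
            return True
        return False

    async def withdraw_locked(self, amount: int) -> bool:
        async with self.lock:        # check and act cannot interleave with other tasks
            if self.balance >= amount:
                await asyncio.sleep(0)
                self.balance -= amount
                return True
            return False


async def demo(method_name: str) -> int:
    acct = Account(100)
    # Two concurrent withdrawals of 80 against a balance of 100.
    await asyncio.gather(
        getattr(acct, method_name)(80),
        getattr(acct, method_name)(80),
    )
    return acct.balance


print(asyncio.run(demo("withdraw_racy")))    # -60: both tasks passed the check
print(asyncio.run(demo("withdraw_locked")))  # 20: the second withdrawal is refused
```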

The research here is still thin, but a 2023 study from Stanford found that developers using an AI code assistant (built on OpenAI’s Codex model) wrote less secure code on average and were more likely to be confident that their insecure code was secure. The mechanism matters: it is not that the assistant was particularly bad at security. It is that the act of generating plausible-looking code creates a false sense that it has been verified.

Flow State and the Interruption Problem

Mihaly Csikszentmihalyi’s concept of flow is well-known in developer culture. Deep programming work often involves extended periods of focused attention where the code and the problem feel continuous. Context switches, interruptions, and fragmented tasks are known to break flow and reduce output quality.

Coding agents have a complicated relationship with flow. On one hand, they can handle the mechanical parts of implementation, leaving the developer in a higher-level problem-solving mode that can itself be a kind of flow. On the other hand, they introduce a new rhythm: prompt, wait, review, correct, prompt again. For some tasks this is a fine-grained loop. For others it is a series of interruptions into the agent’s work, each requiring you to rebuild context about where you were.

My own experience building with Claude Code over the past several months is that the rhythm varies enormously by task type. For well-scoped, isolated tasks, the agent handles them and I move on. For tasks that require deep understanding of a system’s state over time, the back-and-forth is genuinely disruptive in a way that solo coding is not.

The Deskilling Concern Is Real but Overstated

The worry that relying on AI coding tools will erode fundamental programming skills is not baseless. If you never write a sorting algorithm, you forget how sorting algorithms work. If you always reach for an agent when you need a parser, you may lose the intuition that comes from having written several parsers.

But the historical pattern with developer tools is more nuanced than straight deskilling. Compilers did not make developers worse at understanding computation. IDEs with autocomplete did not eliminate the ability to type. Higher-level languages did not destroy systems programming knowledge. Each abstraction layer changed the distribution of what skills mattered, not just reduced total skill.

The more plausible risk is not that developers forget how to code, but that they lose touch with the code at a particular level of abstraction. If you never read the generated code carefully, you may write systems that work until they do not, and lack the depth to understand why when things break. This is a professional judgment problem as much as a technical one.

Willison’s framing in the original piece resonates: the costs are real but they are manageable once you name them. The problem is when developers treat coding agents as a productivity tool with no cognitive cost at all, rather than a trade-off with a specific shape.

What Managing This Actually Looks Like

In practice, I have settled into a few habits that seem to reduce the overhead without sacrificing the benefits.

First, before I commit generated code, I read it at the same depth I would read code I wrote myself. This sounds obvious, but the pull to just run it and see if tests pass is strong. The reading is the verification. Tests confirm behavior; reading confirms structure and intent.
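A small hypothetical example of the gap between passing tests and verified code: the first function below is plausible-looking but incomplete, and it passes a test that only samples ordinary years. The function names are invented for illustration; the corrected version follows the standard Gregorian leap-year rule.

```python
def is_leap_year(year: int) -> bool:
    # Plausible-looking but incomplete: ignores the century rules.
    return year % 4 == 0


def is_leap_year_fixed(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries, unless divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A test that only samples "ordinary" years passes for both versions.
for check in (is_leap_year, is_leap_year_fixed):
    assert check(2024) and not check(2023)

# Reading the body reveals the case the test never exercised.
print(is_leap_year(1900), is_leap_year_fixed(1900))  # True False
```

Nothing in the test run distinguishes the two functions; the missing century rule only shows up when someone reads the body.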

Second, I write the context documents that agents need as if they are going to outlive the agent session, because they are. A CLAUDE.md that accurately describes your architecture, your conventions, and your known constraints is also documentation for the next human who works on the project.

Third, for exploratory or architectural work, I write more of the scaffolding myself before involving the agent. The phase of the work that builds understanding should stay with me. The phase that expresses that understanding in code is where the agent adds the most value.

None of this eliminates the cognitive overhead Willison describes. It just makes the trade-off explicit and manageable rather than something that accumulates as invisible technical and cognitive debt.
