The Middle Loop: What Engineers Actually Do When AI Writes the Code

There is a useful mental model for understanding where developer time goes. The inner loop is the tight cycle of writing code, running tests, debugging, and writing more code. It happens dozens of times per hour. The outer loop is slower: committing, opening a pull request, waiting on CI, shipping, observing in production. Together these two loops describe most of what engineers do on a given workday.

Martin Fowler wrote about research from Annie Vella that complicates this picture in an interesting way. Vella studied 158 professional software engineers to understand how AI tools are changing where their effort goes. Her finding is not just that AI speeds up the inner loop. It is that AI is moving engineers out of the inner loop entirely, into something she calls supervisory engineering work.

Supervisory engineering work is the effort required to direct AI, evaluate its output, and correct it when it is wrong. Fowler proposes thinking of this as a middle loop: a new layer inserted between the inner loop and the outer loop, one where engineers supervise AI doing what they used to do by hand.

This framing is worth sitting with, because it captures something that productivity benchmarks miss.

What the Benchmarks Say and What They Leave Out

GitHub’s controlled study on Copilot found roughly a 55% speedup in completing a specific coding task, particularly for newer contributors. Microsoft Research extended this work and found that gains are highly task-dependent: boilerplate and routine code see large improvements, while complex architectural decisions see little benefit and sometimes negative throughput due to the overhead of prompting and reviewing AI output.

The Stack Overflow Developer Survey from 2024 showed around 76% of developers using or planning to use AI tools, with the most common use cases being code completion, explaining unfamiliar code, and writing boilerplate. The distribution is telling: engineers are using AI most heavily for the parts of the inner loop that are most repetitive.

What these numbers do not capture is the cost of the supervisory work itself. When you accept a Copilot suggestion, you are not just saving time writing code. You are taking on a verification obligation. You need to understand what the suggestion does, whether it handles edge cases correctly, whether it fits the codebase’s existing patterns, and whether it introduces security or correctness issues. A 2024 study from Purdue University found that Copilot produced incorrect code in approximately 40% of test cases, while developers accepted suggestions uncritically at high rates. The inner loop got faster; the verification burden stayed constant or grew.

That gap is where Vella’s research sits.

The Cognitive Shape of Supervisory Work

Supervisory engineering is not the same as code review, even though it superficially resembles it. Traditional code review happens after a human has written code, at a defined checkpoint, with context about the author’s intent. Supervisory work happens in a much tighter feedback loop, iteratively, often with partial outputs, and without the implicit contract that comes from knowing a human colleague wrote the code.

The cognitive demands are also different. Writing code is primarily synthesis: you hold a mental model of the desired behavior and translate it into implementation. Supervisory work is primarily evaluation: you hold a mental model of the desired behavior and compare it against what the AI produced. Evaluation is not easier than synthesis. It requires all the same domain knowledge, plus the additional burden of maintaining vigilance against plausible-but-wrong outputs.

Lisbeth Bainbridge’s 1983 paper “Ironies of Automation” described this problem in the context of industrial process control, and her argument maps cleanly onto software engineering. The more capable an automated system becomes, the more critical human oversight becomes when the automation fails, but the less opportunity humans have to practice and maintain the skills needed for that oversight. Automation erodes the competence it requires.

For software engineers, this creates a specific and underappreciated risk. If most of your hands-on coding time goes away, replaced by supervisory work, you still need to recognize subtle bugs in AI-generated code. You need to understand when a function that looks correct is semantically wrong for the context it is being used in. You need to spot when an AI has optimized for test passage rather than correctness. These are skills developed through the practice of writing code, and that practice is what the middle loop is taking away.

What Supervisory Engineering Actually Requires

If supervisory engineering is a real and distinct mode of work, it has a skill profile worth naming explicitly. Based on what Vella’s research describes and what the broader literature on AI-assisted development shows, that profile includes at least the following:

Intent specification. Communicating requirements to an AI system precisely enough that it produces useful output. This is not just prompt engineering in the narrow sense of crafting clever instructions. It requires decomposing ambiguous requirements into concrete, verifiable tasks, understanding what the AI can and cannot reliably do, and maintaining a clear enough mental model of the desired outcome to recognize when the output diverges from it.

Output evaluation. Reviewing AI-generated code with sufficient depth to catch not just syntax errors but semantic errors, security implications, and architectural incompatibilities. Research on automation bias consistently finds that human reviewers are less rigorous when they know output came from an automated system. Effective supervisory work requires actively compensating for this tendency.

Correction and steering. When AI output is wrong, figuring out why and how to redirect it. This is often more time-consuming than writing the code from scratch would have been, particularly when the AI has produced something plausible-but-wrong that requires careful analysis to characterize.

Context maintenance. Keeping track of what has been delegated, what constraints apply, and how partial outputs fit together into a coherent system. This is load-bearing cognitive work that disappears from view in productivity measurements.

None of these are new skills in the abstract. Engineers have always needed to specify requirements, evaluate code, course-correct, and maintain system context. What is new is that these skills are now the primary work, rather than skills that support the primary work of writing code.

The Calculator Argument and Its Limits

The standard counter-argument to concerns about skill atrophy is the calculator analogy. Calculators did not make mathematicians worse at math. They shifted what mathematicians spend time on, moving them away from arithmetic and toward more interesting problems. AI coding tools will do the same thing: shift engineers away from writing boilerplate and toward higher-level design and reasoning.

This argument has merit, but it has a structural limitation. Calculators handle a well-defined subset of mathematical work, and the boundary between what they do and what humans do is clean. AI coding tools handle a less well-defined subset of engineering work, and the boundary is fuzzy. An engineer using a calculator always knows the calculator is handling arithmetic. An engineer using Copilot does not always know whether the generated code is handling the problem correctly, because the AI operates in the same conceptual space as the engineer does.

The calculator analogy also assumes the human maintains independent competence in the automated domain, which is what makes verification possible. A mathematician can spot when a calculator gives a wrong answer because they have enough number sense to recognize the result is implausible. If engineers stop writing code regularly, they may lose the equivalent intuition about code, the sense that a function is probably wrong before they have formally proven it.

What the Middle Loop Changes About the Job

Fowler notes that Vella’s research completed in April 2025, before the latest generation of models significantly improved software development capabilities. His read is that better models have only accelerated the shift to supervisory work, not reversed it.

This seems right. More capable AI means more inner loop automation, which means more supervisory work, which means the middle loop grows. The question is not whether engineers will spend more time in the middle loop. They will. The question is whether the field will treat supervisory engineering as a distinct discipline with its own practices, or whether it will be treated as a natural extension of existing work that requires no particular attention.

The history of other automation transitions suggests the latter is the default, and the default tends to go badly. Industrial automation eroded operator expertise in exactly the way Bainbridge described, and the consequences showed up in incident post-mortems rather than productivity reports. The productivity numbers looked fine right up until they did not.

Vella’s contribution is in naming the problem clearly enough to reason about it. Whether the field takes supervisory engineering seriously as a distinct practice, develops frameworks for it, builds training around it, and measures it honestly, is a separate question. But having a name for it is where that work has to start.