
The Middle Loop: What Supervisory Engineering Demands

Source: martinfowler

The inner loop and outer loop have been useful mental models for software engineering work for at least a decade. The inner loop is tight and local: write code, build, run tests, debug. It lives on your machine and takes seconds to minutes. The outer loop is slower and shared: commit, open a review, wait for CI to pass, merge, deploy, watch production. It involves your team and your infrastructure and takes hours to days.

These two loops shaped how we think about feedback, how we design tools, and how we describe engineering velocity. Fast inner loops mean developers can experiment cheaply. Healthy outer loops mean teams can ship safely. Most productivity tooling since the DevOps era has been an attempt to tighten both loops simultaneously.

AI coding tools are now doing something different. They are not just tightening the inner loop; in many cases they are taking it over entirely. That creates a new tier of work that does not fit neatly into either loop.

The Research

Annie Vella studied 158 professional software engineers using AI tools and found a consistent pattern: participants reported a shift from creation-oriented tasks to verification-oriented tasks. But Vella distinguishes this from the verification engineers already do in code review and testing. In her thesis, she names this new category “supervisory engineering work”: the effort spent directing an AI, evaluating what it produces, and correcting it when it goes wrong.

This framing matters because it names something practitioners are experiencing but struggling to describe. Using an agent to write a function is not the same as writing the function yourself. It is also not the same as reviewing a colleague’s implementation. The cognitive mode is different, the skill set it draws on is different, and the failure modes it introduces are different.

What the Middle Loop Involves

Martin Fowler, writing about Vella’s research, proposes a “middle loop” that sits between the inner and outer loops. It looks roughly like: prompt the AI, read the output, decide whether it is correct, decide whether the approach is right, iterate if not. In agentic tools like Cursor’s composer mode, Aider, or Claude Code, this loop can span dozens of files and several rounds of correction in the space of a few minutes.

The verification work in this loop has distinct properties. First, volume. An AI can generate code faster than any engineer can write it, which means the review surface expands substantially. Second, confidence calibration. AI-generated code tends to look correct even when it is not; it follows conventions and passes superficial pattern-matching in a way that human-written bugs often do not. A variable named user_id that actually holds a session token is a clean example of semantically wrong but syntactically fine output. Third, cold starts. The engineer did not participate in building the mental model that produced the code. Code review means reading a diff from a colleague who worked through the same problem and can explain their reasoning; supervisory review starts with none of that shared context.
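The user_id example above can be made concrete. Here is a minimal, hypothetical sketch (the Session class, USERS table, and function names are invented for illustration) of code that follows naming conventions, runs without error, and still does the wrong thing:

```python
from dataclasses import dataclass

# Hypothetical user store for the example.
USERS = {42: "Ada"}

@dataclass
class Session:
    token: str
    user_id: int

def lookup_display_name(session: Session) -> str:
    # The bug: the variable is named user_id and reads as a plausible
    # lookup key, but it actually holds the session token.
    user_id = session.token  # wrong field; syntactically fine
    return USERS.get(user_id, "unknown")

session = Session(token="tok-abc123", user_id=42)
# No exception, no type error at runtime -- the call quietly returns
# the fallback value instead of the real name.
print(lookup_display_name(session))  # "unknown" instead of "Ada"
```

Nothing here fails loudly; only a reviewer checking the semantics (or a test asserting on real data) catches it, which is exactly the confidence-calibration problem the middle loop creates.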

There is also a design-level problem. AI tools are good at producing locally coherent code and poor at maintaining global architectural consistency. An agent that writes a new API endpoint may produce code that is correct in isolation while violating your existing authentication patterns, error handling conventions, or data model assumptions. Catching that requires the engineer to hold the whole system in mind while reading code they had no hand in writing.
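A toy sketch can show the shape of that failure. Assume, hypothetically, a codebase where every endpoint is wrapped in a require_auth decorator (the decorator, handler names, and request format below are all invented for illustration, not any real framework's API):

```python
# Project-wide convention (hypothetical): all handlers go through require_auth.
def require_auth(handler):
    """Reject requests that lack a valid token before calling the handler."""
    def wrapped(request):
        if request.get("token") != "valid":
            return {"status": 401, "body": "unauthorized"}
        return handler(request)
    return wrapped

# What an agent seeing only this file might produce: locally coherent,
# globally inconsistent -- it works, but skips the auth convention,
# leaving the endpoint open.
def get_report(request):
    return {"status": 200, "body": f"report for {request['account']}"}

# What consistency with the rest of the codebase requires:
@require_auth
def get_report_consistent(request):
    return {"status": 200, "body": f"report for {request['account']}"}

print(get_report({"account": "acme"})["status"])            # 200: no auth check
print(get_report_consistent({"account": "acme"})["status"])  # 401: convention enforced
```

Both versions pass any test that exercises the endpoint in isolation; the defect only appears against a system-level constraint the agent never saw, which is why supervisory review has to read outward from the diff.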

The Skill Trade

The generation effect in cognitive psychology describes why we retain knowledge better when we produce it ourselves rather than simply read it. When engineers write code, they rehearse and deepen procedural knowledge: understanding why a particular loop structure works, what the bounds of an API are, how a data structure behaves under specific conditions. Reviewing AI-generated code does not provide the same rehearsal.

This does not mean engineers will stop understanding code. But the texture of that understanding is likely to change. Engineers who spend more time supervising AI and less time writing code will develop strong pattern-recognition for evaluating outputs and weaker fluency in generating them from scratch. Whether this is an acceptable trade depends on what engineering requires, and that question does not have a clean answer yet.

What the middle loop rewards is a specific set of capabilities: reading unfamiliar code quickly, holding system-level constraints in mind while evaluating local correctness, writing precise specifications that guide an AI toward the intended behavior, and recognizing the failure modes of AI-generated output. These are real engineering skills, not trivial ones; but they differ from what years of writing code from scratch develops.

The Research Gap

Vella’s research concluded in April 2025, before the most capable current models were released. The gap matters because model quality directly affects the character of the middle loop. With weaker models, engineers spend more time in correction cycles and the loop resembles debugging more than reviewing. With stronger models, the outputs are more often correct on the first pass, and the verification work shifts toward catching subtler errors that confident-sounding output produces.

Fowler notes that improvement in model capability has accelerated the shift toward supervisory work rather than reversing it. Stronger models take more of the inner loop, which means more of engineering work migrates to the middle loop. The volume and nature of what needs supervising changes, but the structural role does not disappear.

The Career Implication

The middle loop is a genuine engineering challenge, not a degraded version of the inner loop. It is also a different challenge, one that rewards different strengths and penalizes different weaknesses. Engineers who built their fluency around holding large amounts of code in working memory and generating it quickly are not automatically well-positioned for a workflow that centers evaluation, specification, and correction.

The field has been through structural shifts before. The move from manual memory management to garbage-collected languages changed what system-level knowledge was valuable. The abstraction of infrastructure into cloud services changed what deployment knowledge was required. Each transition left some skills less relevant and made other skills more central.

The middle loop is the current version of that transition. The engineers who navigate it well will understand their tools’ failure modes as clearly as they understand the problem being solved. They will write specifications precise enough to get consistent AI output. They will evaluate generated code against system-level constraints, not just local correctness. These are learnable skills; they are also skills the field has not historically trained for, and that gap is worth closing deliberately.
