The Middle Loop: How AI Is Restructuring What Software Engineers Actually Do
Source: martinfowler
A researcher named Annie Vella spent time studying how 158 professional software engineers use AI tools, and the finding she landed on is one of the more honest framings I’ve seen of what’s happening to the profession right now. She calls it supervisory engineering work: the effort required to direct AI, evaluate its output, and correct it when it’s wrong. Not writing. Not reviewing in the traditional sense. Something new in between.
Martin Fowler’s recent commentary on her research introduced a spatial metaphor that I think clarifies this well. Software developers have long talked about two loops. The inner loop is the tight, fast cycle on your own machine: write code, build, run the tests, see what broke, fix it, repeat. The outer loop is slower and crosses team boundaries: commit, open a pull request, watch CI, deploy, observe in production, respond to what you see. These two loops operate at different speeds and involve different kinds of attention.
Vella’s research suggests AI is carving out a third layer between them. The middle loop, as Fowler frames it, is where you direct AI work, evaluate what it produces, and correct the errors. This is distinct from the inner loop because you’re not writing the code yourself. It’s distinct from the outer loop because you haven’t shipped anything yet. It’s a new category of labor that didn’t have a name until recently.
Why “Verification” Doesn’t Quite Cover It
The shift Vella observed was from creation-oriented work to verification-oriented work, but she’s careful to note it’s not the same kind of verification engineers were already doing. Code review is a form of verification, but it operates under a particular assumption: you’re reviewing a colleague’s considered judgment, built up over time, representing their understanding of the problem. You’re checking reasoning you expect to be mostly sound.
Reviewing AI output is different. The AI doesn’t have judgment in that sense. It has pattern completion. It will produce code that looks right, compiles, passes surface-level tests, and is nevertheless subtly wrong in ways that require domain understanding to catch. The GitClear report on AI-assisted codebases found increased code churn in repositories with heavy AI usage, suggesting that code accepted from AI tools is more likely to be revised or replaced shortly after it lands. Something is passing review that shouldn’t.
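A minimal, hypothetical illustration of that failure mode: a generated interval-overlap check that passes the obvious tests but gets the boundary semantics wrong. The function, the booking-system framing, and the tests are all invented for this sketch; the point is that only someone who knows the domain convention would flag the bug.

```python
# Hypothetical AI-generated helper for a booking system that uses
# half-open intervals [start, end): a slot ending at 10 does NOT
# collide with a slot starting at 10.
def overlaps(a_start, a_end, b_start, b_end):
    # Subtly wrong: <= admits shared endpoints, so adjacent slots
    # are reported as overlapping. Half-open semantics need strict <.
    return a_start <= b_end and b_start <= a_end

# Surface-level tests: both pass, and the code "looks right".
assert overlaps(0, 10, 5, 15)        # genuine overlap: correct
assert not overlaps(0, 10, 20, 30)   # disjoint slots: correct

# The boundary case the tests above never probe:
# overlaps(0, 10, 10, 20) returns True, but these slots are
# adjacent, not overlapping. Nothing crashes; nothing fails.
```

Code like this compiles, satisfies its tests, and silently double-books a room. Catching it requires knowing the interval convention, which is exactly the domain understanding the paragraph above describes.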
This is precisely what makes supervisory engineering work cognitively demanding in its own right. You’re not rubber-stamping. You’re trying to reconstruct intent from output, verify correctness in a system you understand but didn’t write, and do this repeatedly and quickly enough that the AI remains worth using.
This Pattern Is Not New
Automation research has documented this dynamic in other industries for decades. The foundational paper is Lisanne Bainbridge’s “Ironies of Automation” from 1983, published in the journal Automatica. Bainbridge wrote about nuclear plant operators, but her observation generalizes: the more reliable an automated system is, the less practice operators get in manual control, and the less competent they become at the moments when manual control is actually needed.
Aviation encountered the same problem in a highly visible way. Research by Casner, Geven, and Williams published in Human Factors documented measurable degradation in manual flying skills among commercial pilots who relied heavily on autopilot. The FAA has issued guidance on automation dependency for years. The skill that gets automated away is also the skill that would let you catch autopilot errors.
The parallel for software engineering is uncomfortable. If the inner loop, writing code and debugging it, is what builds the foundational understanding of how software behaves, and AI increasingly handles the inner loop, then the engineers doing supervisory work may be drawing down a reservoir of understanding they’re no longer filling. You can supervise AI-generated code competently today because you spent years writing code manually. The question Vella’s research implicitly raises is what happens to engineers who begin their careers in the middle loop, having never built the skills that middle loop supervision depends on.
The Productivity Numbers Are Partial
The most-cited controlled study on AI coding assistance, Peng et al.’s 2023 paper on GitHub Copilot, found roughly 55% faster task completion. That number gets used to justify a lot of tooling decisions. But controlled studies measure the inner loop. They give participants a defined task, a clean environment, and measure time to working solution. They do not measure the cost of reviewing and correcting AI output at scale, the debugging sessions that happen when AI-generated code interacts unexpectedly with the rest of the system, or the long-term effect on codebase maintainability.
Addy Osmani’s piece on the 70% problem in AI-assisted coding makes this concrete: AI can get you to roughly functional very quickly, but the remaining work, the edge cases, the integration, the correctness under real conditions, often costs more time than the initial generation saved. That’s not an argument against using AI tools. It is an argument for being clear-eyed about where the cost has moved rather than assuming it disappeared.
What Supervisory Work Actually Requires
If supervisory engineering is a distinct form of work, it has its own skill requirements. Understanding what those are matters for how teams hire, how engineers develop, and how organizations should think about this transition.
Directing AI well is not trivial. Prompt quality affects output quality in ways that are domain-specific and not always predictable. Knowing when to accept, when to iterate, and when to discard and start manually is a judgment call that requires understanding what correct output looks like. That understanding comes from having written the thing yourself.
Evaluating AI output accurately requires what might be called adversarial reading. You have to approach generated code looking for what it got wrong rather than confirming what looks right. Automation bias, the tendency to under-scrutinize automated output, is well-documented in HCI research and applies here. The engineers who do this well are the ones who maintain active skepticism, which is cognitively expensive and does not come naturally.
Correcting errors efficiently closes the loop. When something is wrong, you need to decide whether to correct the AI’s direction and regenerate, fix the output manually, or understand the problem well enough to know which option is faster. This decision depends on the same domain knowledge that creation work builds.
The Middle Loop Needs Its Own Tooling
One thing that strikes me about this framing is that most of the tooling conversation around AI and development has focused on the inner loop. Better autocomplete. Faster generation. More context in the prompt. The outer loop has seen some movement with AI-assisted code review tools like CodeRabbit. But the middle loop, the supervisory layer, is underserved.
What would tooling built specifically for supervisory engineering look like? Probably something that makes the AI’s reasoning transparent, not just its output. Tools that surface where generated code departs from established patterns in the codebase. Better diff interfaces for comparing AI output against intent. Test harnesses that specifically probe edge cases AI generation tends to miss. The infrastructure for doing supervisory work well does not yet match the infrastructure for generation.
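One way to make the last idea concrete is a sketch, with entirely hypothetical names, of a middle-loop probe: a tiny harness that feeds an AI-generated function the boundary inputs generation plausibly skips. `normalize` stands in for whatever the AI produced; the harness, not the function, is the point.

```python
def normalize(values):
    """Hypothetical AI-generated function: scale values to sum to 1."""
    total = sum(values)
    return [v / total for v in values]

# Boundary inputs that happy-path generation and happy-path tests
# tend to skip: empty, all-zero, single-element, extreme magnitudes.
EDGE_CASES = [
    [],              # empty input
    [0, 0],          # all zeros: a division by zero lurks
    [1],             # single element
    [1e308, 1e308],  # overflow territory: sum becomes inf
]

def probe(fn, cases):
    """Run fn on each edge case and report which ones blow up."""
    failures = []
    for case in cases:
        try:
            fn(case)
        except Exception as exc:
            failures.append((case, type(exc).__name__))
    return failures

failures = probe(normalize, EDGE_CASES)
# The [0, 0] case raises ZeroDivisionError; the happy path never
# showed it. (The overflow case fails differently: it returns
# [0.0, 0.0] without raising, which a smarter probe would flag.)
```

This is deliberately crude; property-based testing tools generalize the same idea. What the middle loop lacks is this kind of probe wired into the review step by default, so that evaluating generated code means more than reading it.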
Vella finished her research in April 2025. Fowler notes that model capabilities have improved considerably since then, and his read is that improvement has accelerated the shift toward supervisory engineering rather than reversing it. Better models generate more plausible code, which means more code that requires careful evaluation, not less. The middle loop gets heavier as the models get better.
The profession is restructuring around a form of work that has its own demands and its own risks, and the skills that make it possible are the same skills that creation work has always built. That tension is the thing worth sitting with.