Martin Fowler recently highlighted research by Annie Vella, who surveyed 158 professional software engineers about how AI tools changed where they spent their time. The finding that stands out is not just that engineers are spending less time creating code, but that the verification work replacing it is structurally different from any verification they did before. Vella’s term for it is supervisory engineering work: directing the AI, evaluating what it produces, and correcting it when it goes wrong.
Fowler proposes a spatial metaphor for this: a middle loop. Most engineers already think in terms of two loops. The inner loop is the tight feedback cycle of writing code, running tests, hitting an error, and fixing it. The outer loop is the slower cadence of committing, opening pull requests, waiting on CI, deploying, and watching metrics. AI is rapidly eating the inner loop. The middle loop is what’s left when you’re no longer the one doing the typing.
Where the Inner/Outer Loop Framing Came From
The inner/outer loop model didn’t originate with AI. Microsoft’s developer experience research team popularized it in the context of cloud development. They used it to explain why slow container builds and mismatches between local and production environments were causing so much friction. The inner loop was local iteration, the outer loop was the cloud deployment pipeline, and the goal was to make the boundary between them as thin as possible.
That framing was useful because it identified where cognitive context got destroyed. Every move from the inner loop to the outer loop was a context switch: from flow state to waiting, from debugging to code review, from implementation detail to system behavior. Each crossing cost time and attention.
The middle loop, as Fowler and Vella describe it, sits at a different kind of crossing. It’s not between your local environment and the cloud. It’s between what you intend and what the machine produces. You’re no longer iterating on the code directly. You’re iterating on your instructions to the system doing the coding, then evaluating whether the output matches what you meant.
What Supervisory Work Actually Requires
The phrase “supervisory engineering” is apt, but it risks making the work sound passive. Supervision in an industrial sense conjures someone watching a dashboard. This is different. Evaluating AI-generated code requires the same understanding of the domain, the architecture, and the edge cases that writing it would require, possibly more.
When you write code yourself, most errors surface through the compiler, the test runner, or the runtime. That feedback loop is tight and mechanical. When you evaluate AI output, the errors that matter most are the ones that pass the compiler and the tests but are still wrong: the wrong algorithm, the wrong abstraction, subtly incorrect behavior under load, a missed security assumption. Catching those errors takes judgment, and judgment requires deep familiarity with the problem space.
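To make that concrete, here is a deliberately simple, hypothetical sketch in Python. It shows a plausible-looking draft of a median function that passes every test its author wrote, because each test input happened to have an odd length. The function names and the tests are illustrative assumptions. They are not taken from Vella’s study.

```python
def median_draft(values: list[float]) -> float:
    """Hypothetical AI draft: sort, then return the middle element."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# The tests that happened to be written: every input has an odd length,
# so the draft passes all of them.
assert median_draft([3, 1, 2]) == 2
assert median_draft([5]) == 5
assert median_draft([9, 7, 8, 1, 2]) == 7


def median(values: list[float]) -> float:
    """Corrected version: average the two middle elements for even-length input."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# The case the original tests never exercised.
print(median_draft([1, 2, 3, 4]), median([1, 2, 3, 4]))  # prints: 3 2.5
```

A compiler or linter would not flag anything here. Catching the bug depends on someone knowing that the median of an even-length list averages the two middle values. That is the kind of domain judgment supervisory work relies on.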
The GitHub Copilot productivity research from 2022 reported that developers completed a coding task 55% faster with the tool. But that study measured completion speed on a well-defined task. It did not measure the downstream cost of validating the output in production systems, where the stakes are higher. More recent practitioner reports have been more nuanced: the first draft arrives faster, but the review burden shifts from reviewing other people’s code to reviewing the machine’s.
This is the part Vella’s research captures that productivity benchmarks miss. The shift isn’t just in speed. It’s in what kind of attention you’re paying.
The Skill Erosion Problem
There’s a structural tension at the core of the middle loop. Supervisory engineering depends on your ability to recognize correct code, which depends on having written a lot of code yourself. If the inner loop is increasingly automated, and engineers spend progressively less time in it, the foundation for good supervisory judgment erodes.
This isn’t a new problem in automation. Aviation researchers have written extensively about automation complacency and skill degradation in the cockpit. Pilots who rely heavily on autopilot can lose instrument proficiency and the manual flying skills they need when automation fails. Aviation’s answer hasn’t been to avoid automation. Instead, it has been to build periodic manual flying into training and operations so the underlying competency is maintained.
The analogy to software isn’t perfect, but the shape of the problem is similar. If you want to maintain the judgment needed to supervise AI effectively, you probably need to keep a hand in the inner loop even when the AI could handle it. The question is how much, and which parts.
The concern also looks different across career stages. Much of what makes senior engineers valuable is pattern recognition built from years of implementation experience. If junior engineers enter the profession primarily doing supervisory work, they’re building a different kind of experience base. It’s not necessarily worse, but it is different, and the profession doesn’t yet have a clear model for what that career development looks like.
The Specification Problem
One underappreciated aspect of the middle loop is how much it amplifies the importance of specification. When you write code yourself, the act of writing often clarifies what you’re actually trying to do. You start with a vague intention, implement it, run into a case you hadn’t considered, and refine your understanding through the implementation. The code and the spec co-evolve.
When you’re directing an AI, you have to externalize the spec before the implementation begins, or at least before you can evaluate whether the implementation is correct. Vague instructions produce plausible-looking code that doesn’t do what you meant. The feedback loop is: write the spec, generate, evaluate the output, refine the spec, repeat. This is closer to requirements engineering than to coding, and it rewards different skills.
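One way to front-load that work is to write the spec as executable cases before any code is generated, then run each AI draft against them. The sketch below is a hypothetical Python illustration of that approach. `SPEC`, `evaluate`, and the two draft functions are invented names. The drafts stand in for output from a vague prompt and from a refined one.

```python
from typing import Callable

# Hypothetical spec, written before any code is generated: input -> expected output.
SPEC: list[tuple[str, str]] = [
    ("alice@example.com", "alice@example.com"),
    ("  alice@example.com\n", "alice@example.com"),  # surrounding whitespace is noise
    ("Alice@Example.COM", "alice@example.com"),      # decision: treat the whole address as case-insensitive
]


def evaluate(candidate: Callable[[str], str],
             spec: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the candidate gets wrong."""
    failures = []
    for raw, expected in spec:
        actual = candidate(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


# Hypothetical draft produced from the vague instruction "clean up the email field".
def draft_v1(raw: str) -> str:
    return raw.strip()


# Hypothetical draft produced after the spec made the case rule explicit.
def draft_v2(raw: str) -> str:
    return raw.strip().lower()


for name, candidate in [("draft_v1", draft_v1), ("draft_v2", draft_v2)]:
    print(name, evaluate(candidate, SPEC))
```

The third case carries the real decision. Whether to lowercase the part of an address before the @ is a product choice, since mail systems are permitted to treat it as case-sensitive, and the vague instruction never made that choice. Writing the case down forces the decision before any draft can be judged.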
This is why prompt engineering, despite the eye-rolls, is a real skill. Not because prompts are hard to write, but because writing a prompt that produces reliably correct output requires you to think precisely about what you want before you’ve seen it. That’s a discipline most development workflows have historically deferred. The inner loop let you discover requirements through implementation; the middle loop forces you to front-load that work.
Some teams are responding to this by investing more heavily in design documents, architecture decision records, and written specifications before any code is generated. This is arguably a healthy shift toward the rigor the industry has always claimed to want. In practice, it also means the bottleneck for AI-assisted development often isn’t the AI’s capability but the engineer’s ability to articulate what they want clearly enough for the output to be trustworthy.
Where the Middle Loop Lives in Team Structure
Fowler notes that Vella’s research finished in April 2025, before the latest generation of models. That might suggest the findings are already outdated. His read, however, is that better models have accelerated the shift rather than reversed it. The more capable the model, the more of the inner loop it automates, and the more engineering work moves into the middle loop.
This has organizational implications that are just beginning to surface. In most teams, the outer loop is heavily structured: pull requests have reviewers, CI pipelines have defined stages, deployments have approval gates. The inner loop is largely individual: each engineer manages their own write-test-debug cycle with minimal coordination overhead.
The middle loop is new enough that most teams haven’t built structure around it. Who reviews the prompts? How do you maintain consistency in AI output across a codebase where different engineers are prompting differently? How do you share supervisory judgment as organizational knowledge rather than letting it remain tacit in individual engineers’ heads?
Some of this will be solved by tooling. AI coding tools are beginning to support project-level context files, shared prompt libraries, and output evaluation frameworks. But tooling solves the mechanical parts. The harder problem is cultural: what does it mean for a team to do supervisory engineering well, and how do you build that as a shared practice rather than an individual skill?
What Changes, What Stays
The middle loop framing is valuable precisely because it doesn’t claim AI replaces engineers. It claims AI replaces a specific category of work that engineers do, and that the work replacing it is real, demanding, and not automatically natural for people trained in the old model.
The skills that transfer cleanly are domain knowledge, systems thinking, and the ability to reason about correctness. The skills that transfer less cleanly are the habits built around direct implementation: the muscle memory of the inner loop, the intuition developed through writing the code yourself.
The upshot is that the middle loop isn’t a demotion. Supervisory engineering at its best is higher-leverage work than implementation at its worst. But it’s a different kind of work, and treating it as the same thing with AI assistance bolted on is a mistake. The engineers who adapt well will be the ones who recognize that distinction early and deliberately build the skills the middle loop actually requires. That starts with an honest account of what those skills are.