The inner loop and outer loop are well-worn concepts in software development. The inner loop is what a developer does alone: write code, run tests, fix bugs, repeat. The outer loop is what happens when work leaves the developer’s desk: commit, code review, CI pipeline, deploy, observe in production. These two loops have different cadences, different feedback timings, and different failure modes.
Annie Vella’s research into how 158 professional software engineers use AI tools proposes something that fits neatly between them: a middle loop. AI is progressively absorbing the inner loop. A developer prompts a coding agent, watches it generate code, runs the tests. But someone has to direct that agent, decide whether the output is acceptable, and intervene when it’s wrong. That work is distinct from original inner loop work, and distinct from outer loop review and deployment work.
Vella calls this “supervisory engineering”: the effort required to direct AI, evaluate its output, and correct it when it’s wrong. Martin Fowler’s write-up of the research surfaces one concern worth sitting with: the study concluded in April 2025, before the most recent wave of model improvements. Fowler’s read is that better models have only accelerated the shift toward supervisory work, not reversed it. That seems right, and the implications are more uncomfortable than most teams are acknowledging.
The Ironies of Automation
The concept of supervisory engineering is not new outside software. Human factors researchers have studied this mode of work for decades in aviation, nuclear power, and industrial process control. Lisanne Bainbridge’s 1983 paper “Ironies of Automation” identified the central paradox: the more capable an automated system becomes, the less practice the human operator gets at the underlying skills, but the more consequential their occasional interventions become.
An autopilot may handle something like 99% of a commercial flight. When something goes wrong and a pilot needs to take manual control, the situation is typically unusual enough that it would challenge even a highly experienced manual pilot. The skill that atrophied during the easy 99% is exactly the skill needed for the hard 1%.
Software is entering a version of this. AI handles routine code generation. The cases where a developer needs to override the model, catch its reasoning errors, or redirect it are precisely the cases requiring deep understanding of the codebase, the domain, and the constraints the model cannot see. That understanding is built through the inner loop work the AI is now automating.
What the Middle Loop Actually Demands
Supervisory engineering sounds lighter than programming. In some workflows it looks lighter: the developer writes a prompt, watches code appear, clicks accept. But the evaluative work underneath that accept decision is not trivial, and it differs in character from both writing code and reviewing a pull request.
Writing code gives immediate feedback. You see whether the logic compiles, whether the tests pass, whether the behavior matches expectations. The inner loop is tight and concrete. You know quickly whether you’re right.
Reviewing a colleague’s pull request is supervisory in a narrow sense, but it’s a bounded, asynchronous task. You read code that another human wrote with intent; you can ask clarifying questions; the author shares context you don’t have.
The middle loop is neither. When supervising an AI coding agent, you need to:
- Specify what you want precisely enough that the agent doesn’t go sideways, but not so prescriptively that you might as well write the code yourself
- Evaluate output against a specification that may only partially exist in writing
- Detect subtle correctness issues that will not surface as test failures
- Decide when a partial result is good enough versus when to redirect
- Accumulate a working model of where the AI will drift and where it will fail
That last point matters most. Good supervisory work requires a theory of your AI’s failure modes. Where does it hallucinate library APIs? Where does it miss edge cases? Where does it over-engineer? Where does it confidently produce plausible-looking but subtly wrong code? Building this mental model requires paying close attention over many sessions, and it’s a form of experience that does not transfer cleanly between models or tools.
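One lightweight way to make that mental model explicit is to keep a running tally of what went wrong, keyed by tool. The sketch below is only an illustration: the class name, the category labels (taken from the questions above), and the tool names are all hypothetical, not part of any existing product or of Vella's research.

```python
from collections import Counter, defaultdict

# Hypothetical category labels, mirroring the questions in the text.
FAILURE_MODES = {
    "hallucinated_api",
    "missed_edge_case",
    "over_engineering",
    "subtly_wrong_logic",
}


class FailureModeNotes:
    """Per-tool tally of failure modes observed while supervising an agent."""

    def __init__(self):
        # Keyed by tool name, because this experience does not transfer
        # cleanly between models or tools.
        self._counts = defaultdict(Counter)

    def record(self, tool, mode):
        """Note one observed failure of the given kind for the given tool."""
        if mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {mode}")
        self._counts[tool][mode] += 1

    def most_common(self, tool, n=3):
        """Return the n most frequent failure modes seen for this tool."""
        return self._counts.get(tool, Counter()).most_common(n)


notes = FailureModeNotes()
notes.record("agent-a", "hallucinated_api")
notes.record("agent-a", "hallucinated_api")
notes.record("agent-a", "missed_edge_case")
notes.record("agent-b", "over_engineering")

top_for_a = notes.most_common("agent-a")
top_for_b = notes.most_common("agent-b")
```

Keeping the tallies separate per tool is the point of the design: a profile built up for one agent says little about where a different one will drift.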
The Feedback Problem
The inner loop produced fast, honest feedback. Write bad code and the compiler tells you. Write a flawed algorithm and the test fails. The feedback is specific, immediate, and cheap to act on.
The middle loop’s feedback is slower and less honest. If a developer accepts AI-generated code without deeply understanding it, and that code contains a subtle bug, the bug may surface in production weeks later. The path from the supervisory decision to the eventual failure is long and indirect, which makes the decision hard to learn from.
Deliberate practice, as understood in the skill acquisition literature, requires a tight feedback loop between action and outcome. The inner loop provided that almost automatically. The middle loop does not. Teams that want engineers to develop strong supervisory skills will need to construct feedback mechanisms that do not exist yet: retrospectives on AI-assisted work, post-mortems that trace bugs back to supervisory decisions, structured review of the accept and reject decisions made during agent sessions.
None of that is standard practice at most organizations. Most teams track whether code shipped on time, not whether the supervisory decisions that produced the code were sound.
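As a rough illustration of what such a feedback mechanism might look like, the sketch below logs accept and reject decisions and lets a post-mortem link an incident back to the decision that admitted the change. Every name here (`SupervisoryDecision`, `link_incident`, `review_summary`, the change and incident IDs) is hypothetical, and the schema is an assumption rather than an established practice.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SupervisoryDecision:
    """One accept/reject call made during an agent session (hypothetical schema)."""
    change_id: str
    decided_on: date
    accepted: bool
    rationale: str
    linked_incident: Optional[str] = None
    incident_found_on: Optional[date] = None


def link_incident(log: List[SupervisoryDecision], change_id: str,
                  incident_id: str, found_on: date) -> SupervisoryDecision:
    """Post-mortem step: attach an incident to the decision that let the change in."""
    for decision in log:
        if decision.change_id == change_id:
            decision.linked_incident = incident_id
            decision.incident_found_on = found_on
            return decision
    raise KeyError(change_id)


def review_summary(log: List[SupervisoryDecision]) -> dict:
    """Summarize how many accepted changes later caused an incident, and how late."""
    accepted = [d for d in log if d.accepted]
    escaped = [d for d in accepted if d.linked_incident is not None]
    lags = [(d.incident_found_on - d.decided_on).days for d in escaped]
    return {
        "accepted": len(accepted),
        "escaped": len(escaped),
        "escape_rate": len(escaped) / len(accepted) if accepted else 0.0,
        "mean_feedback_lag_days": sum(lags) / len(lags) if lags else None,
    }


# Illustrative data only.
log = [
    SupervisoryDecision("chg-1", date(2025, 1, 6), True, "tests pass, small diff"),
    SupervisoryDecision("chg-2", date(2025, 1, 7), False, "calls an API that does not exist"),
    SupervisoryDecision("chg-3", date(2025, 1, 8), True, "matches the written spec"),
]
link_incident(log, "chg-1", "inc-42", date(2025, 1, 27))
summary = review_summary(log)
```

The lag figure is the interesting part: it puts a number on how long the middle loop takes to tell an engineer that an accept decision was wrong, which is the delay the inner loop never had.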
The Skill Atrophy Concern
There is a version of this transition that goes well. Engineers who built deep inner loop skills over years carry those skills into the middle loop. They know what good code looks like because they wrote it themselves for a long time. They can evaluate AI output against a standard that came from experience. The AI handles the volume; the human provides the judgment.
There is another version that goes poorly. Junior engineers entering the field now may spend most of their formative years in the middle loop, accepting or rejecting AI output without the underlying understanding that makes supervisory work reliable. They develop pattern-matching over AI outputs, not over the problem domain itself. The evaluation they perform is shallow, not because they are careless, but because they lack the depth that reliable evaluation requires.
Vella’s framing of the role shift as traumatic reflects this. It is not just that the work is different; it is that the path to competence is different, and no one has fully mapped that path yet. The inner loop had apprenticeship built in: you wrote code, you saw what broke, you got better. The middle loop has no equivalent natural structure.
The Career Ladder Problem
Most engineering career ladders were built around inner loop contribution. Technical progression meant writing more complex code, owning more systems, mentoring others on implementation. The artifacts were legible: pull requests, system designs, shipped features.
The middle loop produces different artifacts. A skilled supervisory engineer might write fewer lines of code than a junior engineer while doing substantially more valuable work: writing tighter specifications that keep the AI on track, catching a class of subtle bugs the AI consistently produces in this codebase, or redirecting an agent that is halfway into an architectural mistake. None of that is easy to see in a diff.
Organizations that do not update their model of what engineering contribution looks like will mismeasure who is doing good work. That’s a slower-burning problem than skill atrophy, but equally real.
The research Vella conducted is a snapshot of a transition that is still early. The models have improved since April 2025; the supervisory frameworks have not kept pace. At some point, the gap between how fast AI is taking over the inner loop and how slowly organizations are building the infrastructure for the middle loop will produce its own class of failures. The question worth asking now, before those failures arrive, is what it means to deliberately train for the middle loop rather than simply fall into it.