The part of Martin Fowler’s recent fragment on Annie Vella’s research that deserves the most scrutiny is not the label “supervisory engineering work,” useful as that label is. It’s the question that motivated the study: whether AI tools are shifting which skills we practice and, ultimately, the definition of the role itself.
Vella studied 158 professional software engineers and found a consistent pattern: the work is moving from creation to verification, from writing code to directing, evaluating, and correcting it. Fowler adds a spatial metaphor that clarifies where this new work lives: a middle loop sitting between inner loop coding and outer loop delivery. Both framings capture how an engineer’s day is changing. Neither addresses what this means for how engineers grow, how seniority gets defined, or how the profession identifies and cultivates the people who do supervisory work well.
Those are harder questions, and the industry has mostly avoided them so far.
How Engineering Careers Were Actually Built
The traditional software engineering career path is built on inner loop accumulation. You get better at programming by programming. Junior engineers write features under supervision, learn to debug by debugging production failures, develop architectural taste by writing and rewriting abstractions until they understand why some hold up and others collapse. The inner loop is not just where work happens; it’s where understanding gets built.
This is why seniority in software engineering correlates so reliably with years spent doing specific kinds of technical work. You know which SQL patterns will destroy performance under load because you’ve debugged enough slow queries. You know which kinds of mistakes come easily in concurrent code because you’ve shipped a few of them yourself. You know when a system design is going to create problems eighteen months from now because you’ve lived through systems that created those problems. None of that knowledge arrives through review alone. It accumulates through construction.
GitClear’s analysis of AI-assisted codebases found that code churn increased alongside heavy AI tool usage: code that gets accepted is rewritten or reverted sooner after it lands. This is partly a quality problem in the generated output. It’s also a signal about the supervisory layer: the engineers reviewing that code often lack the depth to catch what’s wrong, because the errors are subtle, context-dependent, and not apparent when the code is read only against its own local logic.
The Skill That Supervision Requires
Writing code and evaluating code look similar from outside but are different cognitive activities. Writing forces you to reason about invariants as you construct them; you can’t produce correct concurrent code without thinking through the concurrency as you go. Evaluating requires you to reconstruct that reasoning from the artifact after the fact, hold in mind both what the code does and what it should do, and notice the gap.
The second task is harder in a specific way. It requires an internal model of failure modes to search against. You find the bug in the AI-generated authentication flow because you’ve thought carefully about how authentication fails. Without that model, the code looks fine. It compiles, it passes the tests, it reads cleanly. The gap is invisible if you don’t already have a sense of what gaps look like in this class of problem.
Automation bias compounds this. Research published in the journal Human Factors on operators in highly automated environments shows consistent under-scrutiny of automated output, especially when the system is usually correct. AI coding tools are usually correct enough that the bias is easy to rationalize. The engineers who resist it are the ones who maintain active skepticism, which is effortful and does not come naturally. The ones who resist it most reliably are those whose inner loop experience gave them concrete expectations about what failure looks like.
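To make the “invisible gap” concrete, here is a hypothetical Python sketch (not drawn from Vella’s study or Fowler’s piece) of the kind of authentication check that survives a review done only against local logic. Both functions verify an HMAC-SHA256 tag, and both pass the same accept/reject tests. The difference is that the first compares tags with `==`, which may leak timing information about where a mismatch occurs, while the second uses the standard library’s `hmac.compare_digest`, which is designed to avoid content-based short-circuiting. The key, message format, and function names are illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical hard-coded key, for illustration only; never do this in real code.
SECRET_KEY = b"example-secret"


def sign(message: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 tag for the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify_plausible(message: bytes, tag: str) -> bool:
    # Reads correctly and passes functional tests, but ordinary string
    # comparison can stop early on a mismatch, which may leak timing data.
    return sign(message) == tag


def verify_reviewed(message: bytes, tag: str) -> bool:
    # compare_digest avoids short-circuiting on the first differing character.
    return hmac.compare_digest(sign(message), tag)


if __name__ == "__main__":
    msg = b"user=alice"
    good_tag = sign(msg)
    # Both versions accept the valid tag and reject a forged one.
    print(verify_plausible(msg, good_tag), verify_reviewed(msg, good_tag))
    print(verify_plausible(msg, "0" * 64), verify_reviewed(msg, "0" * 64))
```

A test suite that only checks whether valid tags are accepted and forged ones rejected cannot tell these two functions apart. Spotting the difference depends on already knowing that timing side channels are a failure mode for this class of code.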
Supervisory engineering, in other words, depends on inner loop experience even as it replaces inner loop work.
What This Does to Junior Engineers
If junior engineers enter the profession and spend most of their time in the middle loop, directing and evaluating AI that handles code generation, they are not building the reservoir that supervisory work draws on. They’re doing supervisory work without the foundation that makes it possible to do well.
This is the part of the role definition problem that has no clean answer yet. The profession has always trained junior engineers through doing: they write code, get it reviewed, learn from the feedback, and build mental models through that cycle. If AI generates the code and the junior engineer’s job is to review AI output rather than write it, that learning cycle is disrupted at its core. Reviewing teaches different things than writing did.
Fowler notes that Vella’s research concluded in April 2025, before the current generation of models significantly improved at software development, and that his read is that the improvement has accelerated the shift toward supervisory engineering rather than reversed it. More capable models generate more plausible code, which requires more careful evaluation, not less. The middle loop gets heavier as the models improve, and the demand for engineers who can supervise it well grows at the same time that the traditional path for building those engineers narrows.
The Metric Problem
Beyond the pipeline question, there is a measurement problem that career frameworks have not solved.
Seniority in software engineering has historically been evaluated through visible artifacts: the complexity of systems designed, the scope of code reviewed and shipped, the number of incidents resolved, the junior engineers mentored. These are imperfect proxies for engineering judgment, but they are at least observable. Supervisory contribution is harder to observe. The engineer who consistently catches subtle errors in AI-generated authentication code before it ships is providing enormous value, yet that value is invisible to any metric that tracks output rather than harm prevented.
Job descriptions for senior software engineers still emphasize the ability to design and build systems. Leveling frameworks at most companies reward technical depth measured primarily through creation. Performance reviews ask how much shipped, not how well AI output was evaluated before it shipped. The incentive structure has not been updated to match the work structure.
The practical consequence is predictable. When verification quality is not measured and shipping speed is, engineers are rewarded for fast supervisory work even when it lets bad code through. The engineer who confidently accepts AI output and ships quickly outperforms, on paper, the engineer who scrutinizes carefully and ships more slowly. The correct equilibrium, fast shipping of verified code, requires that both dimensions be measured. Right now, most organizations measure only one.
What the Role Definition Actually Requires
Vella’s research identifies a shift in what software engineers do. Fowler’s middle loop framing names where the new work lives. The question neither addresses directly is what the profession needs to do about the gap between the work that is emerging and the career structures that were built for a different version of the work.
The engineers who will be most effective in the supervisory mode are almost certainly the ones who built strong inner loop skills before the mode shifted, because those skills are what make supervisory evaluation reliable. That observation suggests the field should think carefully about preserving space for junior engineers to develop inner loop depth even in AI-saturated environments, not because writing code by hand is inherently valuable as an activity, but because the understanding it builds is what supervisory work requires.
It also suggests that organizations need frameworks for evaluating and rewarding supervisory contribution, not just creation. If the work has changed, the definition of good work needs to follow. The role is shifting whether or not the formal definitions catch up. The engineers and organizations that notice this early will be in a better position than those who recognize it only after the mismatch has become costly.