
Creation Was the Part We Did Not Expect to Automate

Source: martinfowler

Annie Vella’s study of 158 professional software engineers, summarized in Martin Fowler’s March 16 fragments post, gives a name to something that has been happening in plain sight: engineers are shifting from creation-oriented tasks to verification-oriented ones. Vella calls the new mode “supervisory engineering work” — directing AI, evaluating its output, and correcting it when wrong. Fowler adds a structural frame: a middle loop, sitting between the fast inner loop of write/build/test/debug and the slow outer loop of commit/review/CI/deploy/observe.

That frame is accurate. What I want to think through is Fowler’s specific word for describing the shift: traumatic. Previous tool generations did not earn that word.

What Previous Transitions Automated

The history of software tooling is largely a history of automating things developers were happy to be rid of. Manual memory management was tedious and error-prone; garbage collection took it off the plate, and few mourned the loss. Dependency and build configuration were a distraction from actual work; package managers and build tools made them someone else's problem. Manual deployment steps were nerve-wracking and inconsistent; CI/CD pipelines absorbed them, and the profession embraced that transition enthusiastically.

Each of those shifts automated overhead: the surrounding apparatus of work rather than the work itself. The thing engineers were hired to do — understand a problem, design a solution, translate that solution into working code — remained squarely in human hands. The automation created space for more of it. Engineers experienced these transitions as expansions of what they could accomplish, not reductions in what constituted their contribution.

The middle loop is different in structure. AI tools are absorbing the writing itself: the code generation, the test implementation, the first pass at a solution. These are not peripheral activities. For many engineers, they are the reason they became engineers in the first place. The satisfaction of seeing a function work, of translating a mental model into executable form, of debugging a subtle race condition through careful instrumentation — these are the core experiences that shaped how most working practitioners understand their own competence.

When that is what gets automated, the disruption hits closer to the center of professional identity.

The Competence Signal Problem

Engineering identity is partly built on creation as evidence of understanding. When you write a solution, the act of writing demonstrates that you grasp the problem. You cannot implement a correct sorting function without understanding ordering. You cannot build a caching layer without understanding consistency trade-offs. The output is proof of the knowledge, to yourself as much as to anyone else.
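As a toy illustration of that point (my example, not Vella's or Fowler's): even a small LRU cache cannot be written without its author deciding what "recently used" means and when eviction happens. The act of writing it is the proof of that understanding.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache.

    Writing this forces decisions the author must understand:
    what counts as a "use", and which entry eviction removes.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # A read counts as a use: move the key to the fresh end.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # Evict the least recently used entry (the stale end).
            self._data.popitem(last=False)

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touches "a", so "b" is now least recent
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Reviewing a generated version of this code demands the same knowledge; the difference is that nothing in the review act itself proves the knowledge is there.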

Supervisory engineering disrupts this feedback loop in a specific way. An engineer who prompts an AI agent to implement a feature and correctly evaluates the output has demonstrated genuine technical judgment. But from the outside, and sometimes from the inside, it is hard to distinguish that from an engineer who accepted output they did not fully understand. The code looks the same. The productivity numbers look the same. The difference only becomes visible when something breaks in a way the AI could not anticipate, and the supervising engineer either catches it or does not.

This is not a problem of fraud or competence theater. It is a problem of lost signal. The inner loop provided continuous feedback about whether an engineer’s mental model was accurate. The test that fails when you expected it to pass is direct evidence of something you did not understand. The compiler error that does not make sense forces you to confront a gap in your model of the type system. The middle loop attenuates that signal. You can supervise effectively for months on code you partially understand, and the gap only surfaces under stress.

The Stanford research on AI coding assistants (Perry et al., 2022) found that developers with access to an assistant wrote less secure code than a control group, while being more likely to believe their code was secure. That is competence calibration failing under exactly these conditions: the output looks correct, and nothing surfaces the gap until production.

The Institutional Lag

Vella’s research ended in April 2025. Fowler notes that model capabilities have improved substantially since then, which has accelerated the shift rather than reversed it. The implication is that supervisory engineering is becoming more central at roughly the time organizations are least prepared to evaluate it, hire for it, or develop it deliberately.

Engineering organizations are still largely structured around inner loop productivity. Velocity is measured in pull requests merged, features shipped, story points completed. These metrics made sense when friction in the inner loop was the bottleneck. As AI absorbs that friction, what remains is the quality of supervisory judgment, which is harder to measure and currently captured almost incidentally, in defect rates and postmortems and the slow accumulation of technical debt that comes from supervisory failure.

Hiring is similarly misaligned. Technical interviews still heavily weight algorithm implementation and code production under time pressure, which are precisely the skills AI is absorbing. The ability to evaluate whether a complex piece of AI-generated code is secure, architecturally sound, and consistent with the rest of a codebase is not well assessed by asking someone to implement a binary tree in thirty minutes. The GitClear 2024 research on AI-assisted codebases found higher rates of code churn than in non-AI codebases: code written and then significantly modified or reverted within weeks. That is a supervisory quality signal, visible in the aggregate even though individual supervision failures are invisible at review time.
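GitClear's exact methodology is its own, but the idea of a churn rate can be sketched in a few lines. A minimal sketch over a hypothetical commit log; the field names and numbers here are illustrative, not GitClear's.

```python
# Hypothetical per-commit records (field names are my own, not GitClear's):
# "churned" counts lines from a commit that were modified or reverted
# within the churn window (GitClear's published work uses roughly two weeks).
commits = [
    {"file": "api.py",   "added": 120, "churned": 80},
    {"file": "model.py", "added": 60,  "churned": 5},
]

def churn_rate(commits):
    """Share of newly written lines later modified or reverted in-window."""
    added = sum(c["added"] for c in commits)
    churned = sum(c["churned"] for c in commits)
    return churned / added if added else 0.0

print(round(churn_rate(commits), 3))  # 0.472
```

The point of the sketch is that the metric only exists at the aggregate level: no single review comment records that a line was churned, which is why supervisory failure is visible in dashboards long before it is visible in any individual pull request.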

The adjustment will come. Misalignment between what a job requires and how it is assessed tends to resolve over time. But the transition period is the hard part. Current practitioners are being asked to develop supervisory skills that were never explicitly cultivated, against evaluation criteria that still reward creation, with tooling that does not yet make the quality of supervisory work visible.

What the Supervision Depends On

The deepest requirement for effective supervision is a thorough mental model of the systems being built. You cannot catch a concurrency bug you cannot recognize. You cannot evaluate a database migration strategy without understanding the operational implications. You cannot spot a hallucinated API call in generated code without knowing the actual API well enough to notice the discrepancy.
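To make the third point concrete, here is a minimal example of the kind of discrepancy a supervisor has to notice. Python's `json` module exposes `loads`, not `parse`; the broken call is a plausible model output precisely because `JSON.parse` does exist in JavaScript.

```python
import json

raw = '{"retries": 3, "timeout_s": 1.5}'

# A plausible generated line, hallucinated from JavaScript's JSON.parse:
#   config = json.parse(raw)   # AttributeError: module 'json' has no attribute 'parse'
# Catching it before runtime requires knowing the actual API,
# not just reading a plausible-looking diff:
config = json.loads(raw)

print(config["timeout_s"])  # 1.5
```

A linter or test run catches this particular case; subtler hallucinations, such as a real function called with the wrong semantics, survive both and fall entirely on the supervisor's mental model.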

That knowledge comes from implementation experience. This is the uncomfortable part of the current transition: the inner loop, which AI is absorbing, is also where that knowledge was built. Experienced engineers draw on reservoirs accumulated over years of inner loop work. Engineers entering the profession now are building those reservoirs under conditions where inner loop practice is substantially less central than it was.

The aviation comparison appears often in discussions of AI automation, and it is directly applicable here. Instrument ratings and manual flight training requirements exist not because autopilot is unreliable under nominal conditions but because the judgment to recognize when automation is doing something dangerous requires deep understanding of what it should be doing. Startle-and-surprise training requirements were added by aviation regulators specifically because incidents showed that pilots who had not maintained manual skills could not intervene effectively when automation encountered conditions outside its parameters. The manual capability is a prerequisite for effective supervision even when it is rarely exercised in practice.

The middle loop is not a simpler version of software engineering. It is a different mode that requires most of the same foundations, organized toward different ends. The trauma Fowler identifies is real: practitioners who built careers around the act of creation are being asked to step back from it. What the research period ending in April 2025 could not answer is whether the institutions around software engineering (the hiring criteria, the career ladders, the interview formats, the measurement systems) are moving at anything close to the same pace as the models driving the shift.
