
When the Agent Owns the Inner Loop

Source: martinfowler

The debate about AI in software development tends to cluster around tasks: which ones agents can handle, which still require human judgment, and what happens to careers as the list shifts. Kief Morris’s piece on martinfowler.com, published in early March 2026, reframes the question more cleanly. The right unit of analysis is the loop, not the task.

The Loop Structure of Software Work

Software development has always been organized around nested feedback cycles. The terminology got formalized around 2020 through the developer experience research community, but the structure was always there. The inner loop is the tight cycle an individual developer runs constantly: edit code, build, run tests, interpret results, iterate. It is the grain of daily engineering work, and it maps closely to what flow state researchers mean when they discuss uninterrupted focus. The outer loop is broader, spanning everything that happens once code leaves a developer’s local environment: pull requests, code review, CI pipelines, staging deployments, production observability, retrospectives. It operates on hours to days rather than minutes.

The distinction matters because the two loops have different owners, different latencies, and different failure modes. The inner loop is personal and tight. The outer loop is social and structural. Optimizing one does not automatically improve the other, and pressure on one often creates backpressure on the other. The DevEx research by Noda, Storey, Forsgren, and Greiler supports this reading: it treats feedback loops and flow state as distinct dimensions of developer experience. Inner loop friction degrades flow state, while outer loop friction degrades team throughput, and the two require different interventions.

What Agents Actually Change

AI coding tools started by accelerating the inner loop. Autocomplete gave way to suggestion panels, then inline generation, then conversational refactoring. The improvement was real and measurable: developers report spending less time on mechanical translation from intent to syntax. But the current generation of agentic tools does something structurally different. A tool like Claude Code, GitHub Copilot Workspace, or SWE-agent is not faster autocomplete. It can run the inner loop on its own: read a codebase, form a plan, edit multiple files, execute tests, interpret failures, and iterate. The human is no longer inside the loop. The human is, potentially, outside it.
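To make the shape of that loop concrete, here is a minimal sketch in Python. It is not how any of the tools named above are implemented; `run_inner_loop`, `CheckResult`, and the three callables are hypothetical names invented for illustration, and the "codebase" is a single integer so the example stays runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    """Outcome of one test run (hypothetical type for this sketch)."""
    passed: bool
    failures: List[str] = field(default_factory=list)


def run_inner_loop(
    propose_edit: Callable[[List[str]], int],
    apply_edit: Callable[[int], None],
    run_tests: Callable[[], CheckResult],
    max_iterations: int = 5,
) -> Optional[int]:
    """Edit, test, interpret, iterate. Returns the attempt that passed, or None."""
    feedback: List[str] = []
    for attempt in range(1, max_iterations + 1):
        edit = propose_edit(feedback)   # plan the next change from the last failures
        apply_edit(edit)                # edit the code
        result = run_tests()            # build and run the tests
        if result.passed:
            return attempt
        feedback = result.failures      # interpret results and go around again
    return None                         # budget exhausted: hand back to a human


# Toy stand-ins: the "codebase" is one integer and the test wants it to be 3.
state = {"value": 0}


def propose_edit(feedback: List[str]) -> int:
    return 1  # hypothetical agent: always proposes incrementing by one


def apply_edit(edit: int) -> None:
    state["value"] += edit


def run_tests() -> CheckResult:
    ok = state["value"] == 3
    return CheckResult(ok, [] if ok else [f"expected 3, got {state['value']}"])


attempts = run_inner_loop(propose_edit, apply_edit, run_tests)
print(attempts)  # 3
```

Notice where the human shows up in this sketch: nowhere inside the `for` loop. The only human-owned pieces are the test that defines "done" and the iteration budget that decides when the agent stops.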

Morris’s argument is that this is where the framing needs to shift. When an agent can run the inner loop autonomously, the question for humans is no longer “how do I code faster?” but “what do I need to manage so the agent produces good outcomes?” That is a different kind of work, requiring a different kind of infrastructure.

Managing the Loop Is Not Reviewing the Output

There is a failure mode that practitioners are already discovering with autonomous coding agents: treating “manage the loop” as meaning “approve or reject what the agent produces.” That collapses into code review with added latency. It does not capture what loop management actually requires.

Running a working loop means handling the conditions that make the loop productive in the first place. For a human developer, that includes understanding the goal clearly enough to recognize when a solution is wrong even if it passes tests, maintaining context across sessions, managing dependencies between tasks, and deciding when to stop iterating and ship. When an agent runs the inner loop, those responsibilities do not disappear. They migrate upward to whoever is directing the agent.

This is where specification quality becomes load-bearing. An agent working from a vague prompt will produce output that satisfies the prompt, not the actual goal. The gap between those two is exactly what a well-specified task description would close. Writing that specification is not a clerical task. It requires understanding the domain, the constraints, the existing architecture, and the intended user behavior well enough to express them precisely. That is skilled engineering work; it just does not appear in a commit diff.
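One way to picture the difference between a vague prompt and a specification is as a data structure with sections that can be visibly empty. The sketch below is an assumption for illustration only: `TaskSpec` and its field names are hypothetical, not a format that Morris proposes or that any agent tool requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    """Hypothetical structured task description for an agent."""
    goal: str
    constraints: List[str] = field(default_factory=list)
    acceptance_criteria: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """Names of sections left empty: a rough signal the spec is still vague."""
        sections = {
            "goal": self.goal.strip(),
            "constraints": self.constraints,
            "acceptance_criteria": self.acceptance_criteria,
            "out_of_scope": self.out_of_scope,
        }
        return [name for name, value in sections.items() if not value]


vague = TaskSpec(goal="Make invoice generation faster")

sharper = TaskSpec(
    goal="Make invoice generation faster for accounts with many line items",
    constraints=["Do not change the public invoice API"],
    acceptance_criteria=["Existing invoice tests still pass"],
    out_of_scope=["Payment processing"],
)

print(vague.gaps())    # ['constraints', 'acceptance_criteria', 'out_of_scope']
print(sharper.gaps())  # []
```

An empty-section check like this says nothing about whether the filled-in sections are any good. That judgment is the skilled part, and it stays with the person writing the spec.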

Context management is the other dimension. Current agentic tools have finite context windows and imperfect memory across sessions. A long-running agent accumulates decisions that later iterations may contradict. Keeping the working context coherent, through structured files like CLAUDE.md, explicit session state, or deliberate intervention at key decision points, is a real operational concern. It is closer to running a pipeline than writing a function.

The Throughput Problem

The Theory of Constraints, developed by Eliyahu Goldratt for manufacturing and carried over to software delivery by Gene Kim, Kevin Behr, and George Spafford in The Phoenix Project, makes a point that is directly relevant here and that is often discussed alongside the DORA metrics: accelerating a non-bottleneck does not improve overall throughput. If the constraint on your delivery pipeline is code review capacity, faster code generation makes the review queue longer, not shorter.
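The arithmetic is easy to check with a toy two-stage pipeline. The sketch below assumes a fixed number of changes generated per day and a fixed number reviewed per day; the function name `simulate` and all of the rates are made-up numbers for illustration, not measurements from any team.

```python
from typing import Tuple


def simulate(days: int, gen_per_day: int, review_per_day: int) -> Tuple[int, int]:
    """Implementation feeds a review queue; review capacity is fixed.

    Returns (changes_shipped, review_queue_length) after `days` days.
    """
    queue = 0
    shipped = 0
    for _ in range(days):
        queue += gen_per_day                   # new changes arrive for review
        reviewed = min(queue, review_per_day)  # review can only clear so many
        queue -= reviewed
        shipped += reviewed
    return shipped, queue


# Review is already the constraint: 6 changes a day arrive, 5 get reviewed.
print(simulate(days=10, gen_per_day=6, review_per_day=5))   # (50, 10)

# An agent doubles generation. Review capacity is unchanged.
print(simulate(days=10, gen_per_day=12, review_per_day=5))  # (50, 70)
```

Shipped work is identical in both runs, because it is capped by review. The only thing that changed is the backlog, which grew from 10 waiting changes to 70.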

The “idea to outcome” framing Morris uses captures this. The metric that matters for a software organization is not how quickly an agent writes a function. It is the elapsed time from a clear business or product idea to working software that users can interact with. That pipeline includes specification, implementation, review, testing, deployment, and feedback collection. AI tools have, so far, applied most of their force to implementation. The other stages are largely unchanged.

This means teams adopting agentic tools without restructuring the surrounding loop are likely to see a specific pattern: the implementation stage compresses, review backlogs grow, and the overall idea-to-outcome time does not improve proportionally. Kief Morris has the right background to see this clearly. His Infrastructure as Code work is fundamentally about the same problem: replacing manual, human-in-the-loop operations with automated processes that still require careful design and management. The agent running a coding loop has a direct structural parallel to Terraform applying an infrastructure plan. The human’s job in both cases is to define the desired state, constrain the blast radius, and review what diverged from intent.
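"Constrain the blast radius" can be as simple as checking an agent's changed files against the paths the task was allowed to touch, much as one reads an infrastructure plan before applying it. The following is a minimal sketch under that assumption; `out_of_scope`, the glob patterns, and the file paths are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def out_of_scope(changed_files: List[str], allowed_patterns: List[str]) -> List[str]:
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical task: the agent was asked to work only on billing code.
allowed = ["src/billing/*", "tests/billing/*"]
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "src/auth/session.py",
]

print(out_of_scope(changed, allowed))  # ['src/auth/session.py']
```

A non-empty result is not automatically a failure. It is the "what diverged from intent" list, which tells the reviewer where to look first.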

What the Tools Do Not Yet Support

Current agentic coding tools are built around the inner loop. Their interfaces, mental models, and observability affordances are oriented toward the session in which the agent writes code. They do not provide much infrastructure for the human work of loop management.

There is no standard tooling for expressing the kind of structured specification that reliably directs agent behavior across a complex multi-file task. There is no common format for tracking which agent decisions were intentional versus incidental, so that a reviewer can focus attention appropriately. There is no feedback path from outer loop signals, such as production errors, user complaints, or performance regressions, back into the context that shapes agent behavior on the next task. These are infrastructure gaps, and they are the gaps that will determine whether teams using agents actually compress their delivery timelines or just replace one kind of bottleneck with another.

The analogy to early CI/CD is instructive. Automated build and test pipelines compressed parts of the outer loop significantly, but the teams that captured the most value were the ones that also restructured how they wrote tests, how they scoped commits, and how they treated the main branch. The tool was only part of the change. The same logic applies here.

The Skill Set Shift

Morris’s framing carries a practical implication worth naming directly. The skills that make someone effective at managing the working loop are not the same skills that make someone fast at writing code. They are closer to the skills required in platform engineering or site reliability engineering: systems thinking, clear specification, feedback loop design, and recognizing when a process is producing bad outputs before those outputs become expensive to reverse.

This does not mean that coding skill becomes unimportant. Understanding what the agent is producing, recognizing when output is subtly wrong, and knowing which abstractions to reach for all require deep technical knowledge. But the primary interface between a skilled engineer and the delivery pipeline is shifting. The engineer who can design and run the outer loop well, who can keep the working loop pointed at the right outcomes, and who can write the specifications that make agent behavior predictable, is becoming more valuable than one who is primarily fast at the inner loop. That is the shift Morris’s framework is pointing at, and it is worth taking seriously before the tooling catches up.
