Faster Code Writing Just Moves the Queue

Andrew Murphy put up a post arguing that if you thought code-writing speed was your core problem, you have bigger problems. It landed at 303 points on Hacker News and generated 205 comments, which is a reasonable signal that it resonated. The argument is correct, but I think it undersells how predictable this outcome was, and why that predictability matters for how you think about adopting these tools.

Eliyahu Goldratt laid out the core principle in The Goal in 1984. The Theory of Constraints says that every system has one binding constraint at any moment, and that improving non-constraints does not improve throughput. Speeding up a non-constraint only produces more work-in-progress, and that work piles up in front of the constraint. Goldratt’s setting was a physical production line, but software delivery is also a production system with stages, queues, and handoffs, and the same dynamics apply.

The inner loop of software development is the part you control alone: writing code, running local tests, iterating on an implementation. GitHub Copilot, Cursor, and Claude Code have genuinely accelerated this loop. GitHub’s own research measured a 55% reduction in task completion time for isolated coding exercises. For well-defined, self-contained tasks, gains of that kind are plausible in practice. The inner loop is faster.

The outer loop is everything that happens after you push: code review, CI/CD pipeline execution, deployment approvals, staging environment verification, and the coordination overhead of getting multiple people and systems to agree that a change is ready. This is the part that was already the constraint before AI tools existed. Making the inner loop faster increases the rate at which work arrives at the outer loop. If the outer loop’s capacity does not change, latency gets worse.
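A toy model makes the queueing effect concrete. The sketch below assumes a single review stage with a fixed daily capacity, a constant arrival rate, and first-in-first-out order. The function names and the numbers (8 or 12 PRs arriving per day against a capacity of 10) are illustrative assumptions, not measurements.

```python
def simulate_review_queue(arrivals_per_day, reviews_per_day, days):
    """Return the backlog size at the end of each day for one review stage."""
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog += arrivals_per_day               # new PRs pushed today
        backlog -= min(backlog, reviews_per_day)  # reviewers clear what they can
        history.append(backlog)
    return history


def wait_days_for_new_pr(backlog, reviews_per_day):
    """Days a PR pushed now waits before review starts, assuming FIFO order."""
    return backlog / reviews_per_day


# Hypothetical team: review capacity stays at 10 PRs per day in both runs.
before = simulate_review_queue(arrivals_per_day=8, reviews_per_day=10, days=10)
after = simulate_review_queue(arrivals_per_day=12, reviews_per_day=10, days=10)

print(before[-1], wait_days_for_new_pr(before[-1], 10))  # 0.0 0.0
print(after[-1], wait_days_for_new_pr(after[-1], 10))    # 20.0 2.0
```

While arrivals stay below capacity, the queue drains every day. Once arrivals exceed capacity, the backlog grows by the difference each day, and so does the wait for every new PR, even though nothing about the review stage itself changed.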

Where Engineers Actually Spend Their Time

There is decent data on this. A 2021 McKinsey survey of software developers found that developers spend roughly 35% of their time on tasks that directly produce code: writing, debugging, and testing locally. The remaining 65% goes to meetings, requirements gathering, code review, waiting for CI, resolving comments, navigating deployment processes, and coordination work of various kinds.

The DORA State of DevOps Report has been tracking delivery performance since 2014 across thousands of teams. Its four metrics (deployment frequency, lead time for changes, time to restore service, and change failure rate) tell a consistent story. The highest-performing teams deploy on demand, often multiple times per day. The lowest-performing teams deploy monthly or less. The gap between those tiers is not explained by how fast developers write code. It is explained by CI/CD maturity, automated testing coverage, deployment automation, and the cultural norms around code review.

In the 2023 DORA report, the median lead time for changes among elite performers was less than one day. Among low performers, it was between one week and one month. No amount of AI-assisted coding closes that gap, because the gap is not located in the writing stage.

Brooks Was Describing the Same Constraint

Fred Brooks named a related dynamic in No Silver Bullet in 1986. He distinguished between the accidental complexity of software, the difficulty introduced by tools, languages, and processes, and the essential complexity, which is inherent in what the software needs to do. His argument was that most of the remaining difficulty in software development was essential, not accidental, and that no tool would provide an order-of-magnitude productivity improvement because the hard parts were not about the mechanics of writing.

Brooks was not talking specifically about code-writing speed, but the underlying point applies: the difficulty of software development does not live primarily in translating requirements into syntax. It lives in understanding requirements, coordinating across teams, managing change in live systems, and making good decisions under uncertainty. These are coordination and judgment problems, not typing problems.

Brooks’ Law, the separate observation that adding people to a late project makes it later, also has a structural explanation in the Theory of Constraints. Adding developers increases the rate of code production, which increases the load on review, integration, and testing processes. If those processes are not scaled proportionally, the bottleneck worsens. More contributors means more coordination overhead: more PRs to review, more conflicts to resolve, more state to synchronize. The load compounds faster than the benefit.
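One way to see why the load compounds is the pairwise-coordination count Brooks himself used: if every contributor may need to coordinate with every other, the number of pairs is n(n-1)/2. The sketch below is only that upper bound; the function name is hypothetical, and real teams coordinate along far fewer paths than the worst case.

```python
def coordination_pairs(team_size):
    """Distinct pairs of people who may need to coordinate: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


for n in (3, 5, 10, 20):
    # Headcount grows linearly; potential coordination paths grow quadratically.
    print(n, coordination_pairs(n))
# 3 3
# 5 10
# 10 45
# 20 190
```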

AI tools do not add headcount in the traditional sense, but they do increase the rate at which code is produced per developer. The structural effect is similar. More code means more to review, more to test, more to integrate, and more surface area for defects to hide in.

The PR Queue Is Already the Evidence

Anyone who has worked on a team of more than five engineers has observed the PR queue problem directly. A developer opens a pull request. It sits for hours, sometimes days, before receiving a first review. The author context-switches to something else. When review comments arrive, they require switching back, re-reading the original change, and responding. If the review requires significant revision, the cycle repeats. The latency is not in the writing; it is in the waiting.

Research by Laura MacLeod and colleagues, published with Microsoft Research, found that the median time for a code review at Microsoft was around 24 hours for the first response, and that developers cited waiting for review as one of the primary frustrations in their workflow. A separate study by Bernardo et al. examining open-source projects found median review turnaround times ranging from a few hours to several days depending on project size and contributor count.

AI-generated code does not reduce this latency. It may increase it. Reviewers still need to understand what the code does, verify that it handles edge cases correctly, and check that it fits the existing architecture. Code that was generated by an AI but not carefully reviewed by the author is harder to review, not easier: it may be syntactically coherent but architecturally wrong, or it may fix the stated problem while quietly changing adjacent behavior. The METR finding that many SWE-bench-passing patches would not be merged into the projects they target points at exactly this gap between technical correctness and mergeability.

What Shifting the Constraint Looks Like in Practice

I build Discord bots. The bots themselves are not large codebases, and AI tools have meaningfully cut the time I spend on the mechanical parts: command handler boilerplate, embed formatting, permission check scaffolding, the repetitive plumbing between the Discord API and whatever backing store I am using. That part is faster.

The parts that are not faster are: figuring out exactly what behavior a command should have in ambiguous cases, deciding how to handle rate limit backpressure without breaking the user experience, reasoning about what happens when two shards receive conflicting state, and testing interactions that depend on Discord’s actual API behavior in ways that are hard to mock reliably. These require judgment and experimentation, and no amount of faster code generation speeds up the judgment part.

For a team shipping production software, the shape of the problem is similar but scaled. A team that was previously constrained at code writing and adopts Copilot or Cursor will, if the theory holds, find itself with more PRs in flight, a longer review queue, and more pressure on the parts of the pipeline that were not the bottleneck before. The teams that get the most out of AI coding tools are the ones that simultaneously invest in the outer loop: more reviewers, faster CI, better deployment automation, clearer requirements processes. Teams that only invest in the inner loop will see diminishing returns faster than they expect.

The DORA metrics framework is a practical tool for diagnosing where the constraint actually lives. If your deployment frequency is low and your lead time is long, you already know the inner loop is not the binding constraint. Measuring these numbers before and after adopting AI coding tools gives you the evidence to make the investment case for the outer loop improvements that will actually move the numbers.
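As a rough sketch of what that measurement could look like, the snippet below computes two of the four metrics from a list of (commit time, deploy time) pairs. The record shape, the sample timestamps, and the function names are assumptions for illustration; in practice the data would come from your version control and deployment tooling.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (commit time, production deploy time) for each change.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 6, 15, 0)),
    (datetime(2024, 3, 5, 11, 0), datetime(2024, 3, 6, 15, 0)),
    (datetime(2024, 3, 7, 10, 0), datetime(2024, 3, 12, 10, 0)),
    (datetime(2024, 3, 11, 14, 0), datetime(2024, 3, 12, 10, 0)),
]


def median_lead_time_hours(changes):
    """Median hours from commit to production deploy."""
    hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    ]
    return median(hours)


def deploys_per_week(changes, window_days):
    """Distinct production deploys per week over an observation window."""
    deploy_times = {deployed for _, deployed in changes}
    return len(deploy_times) * 7 / window_days


print(median_lead_time_hours(changes))             # 41.0
print(deploys_per_week(changes, window_days=14))   # 1.0
```

Running the same calculation over a window before adoption and a window after gives a like-for-like comparison. If coding got faster but the median lead time did not move, the constraint is somewhere after the push.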

The Productivity Claim and What It Measures

The 55% task completion time improvement from GitHub’s Copilot research is cited frequently. It is worth reading the methodology. The tasks were isolated coding exercises completed in a controlled setting: participants implemented a web server in JavaScript without existing code context. This is approximately the best-case scenario for an AI coding assistant, and it still measures only the inner loop.

What the study did not measure: whether the code was correct beyond the test cases, how long review would take, whether it fit an existing codebase’s patterns, or how often it would need revision before being mergeable. The productivity gain is real within its scope; the scope excludes most of what determines whether a team ships faster.

Goldratt’s point was never that optimizing a non-constraint does nothing. It does something. It makes the non-constraint cheaper and faster, which is useful if you plan to eventually remove the actual constraint. But if your review queue has a median 48-hour turnaround and you cut your coding time by 55%, the net effect on delivery latency is small, because the 55% reduction applies only to the share of delivery time that coding occupies. If that share is close to the 35% that McKinsey estimated, the best case is 0.55 × 0.35, or roughly a 19% reduction in total time, and the hours spent waiting for review do not shrink at all.
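The same arithmetic can be sketched for other coding shares. This assumes, as the paragraph above does, that the share of developer time spent coding is a fair proxy for coding's share of delivery time, and that the rest of the pipeline is unaffected. The function name and the alternative shares are hypothetical.

```python
def remaining_delivery_time(coding_share, coding_reduction):
    """Fraction of the original delivery time left after speeding up only coding."""
    untouched = 1 - coding_share                       # review, CI, deploys, waiting
    faster_coding = coding_share * (1 - coding_reduction)
    return untouched + faster_coding


# 0.35 is the share cited above; 0.20 and 0.50 are made-up comparison points.
for share in (0.20, 0.35, 0.50):
    remaining = remaining_delivery_time(share, coding_reduction=0.55)
    print(f"coding share {share:.0%}: total time falls by {1 - remaining:.0%}")
```

Even if coding were half of all delivery time, a 55% speedup in coding alone would leave most of the total untouched.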

The teams that will get genuine delivery improvements from AI coding tools are the ones treating the tools as a reason to invest in the rest of the pipeline, not as a substitute for that investment. Making the inner loop faster is an opportunity to expose and address the outer loop bottlenecks that were always there. That is what Goldratt would have predicted in 1984, and it is what the evidence shows now.