The Pipeline Stage Your AI Coding Tool Can't Reach

Source: hackernews

The argument in Andrew Murphy’s piece earned over 300 upvotes and 200 comments on Hacker News, which is a reasonable signal that it touched a nerve. The core claim is tight: code writing speed is not what limits software delivery, and organizations that invest in tools to accelerate it without examining the rest of their pipeline are optimizing the wrong stage.

The argument is not new. Fred Brooks made a version of it in The Mythical Man-Month in 1975, arguing that adding engineers to a late project makes it later, largely because communication and coordination overhead grows faster than the capacity the new people add. The DORA research program has been measuring this empirically since 2014. Queuing theory gives the mechanical explanation. And yet each new generation of productivity tooling prompts the same misapplication.

Where Developer Time Actually Goes

The premise that code writing speed is the primary constraint assumes developers spend most of their time writing code. Research consistently shows otherwise. Microsoft Research studies on developer activity and the Stripe Developer Coefficient report from 2018 both found that direct code production accounts for roughly a quarter to a third of a developer’s working time. The rest distributes across code review, debugging, waiting for CI/CD pipelines, meetings, and reading unfamiliar code to understand what it does.

If you halve the time spent writing code, you improve a stage that represents roughly 30% of total cycle time. In the most optimistic case, that produces a 15% reduction in cycle time. In practice, the reduction is smaller, because time saved writing code tends to accumulate in the queue for the next constrained stage rather than translating directly to faster delivery.
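
The arithmetic is easy to check. Here is a minimal sketch, assuming cycle time splits cleanly into the writing stage and everything else; the function name and the 30% share are illustrative, not measurements from any particular team.

```python
def cycle_time_reduction(stage_share: float, stage_speedup: float) -> float:
    """Fraction of total cycle time removed when one stage gets faster.

    stage_share:   fraction of cycle time spent in the stage (0..1)
    stage_speedup: factor by which the stage gets faster (2.0 = time halved)
    """
    if not 0.0 <= stage_share <= 1.0:
        raise ValueError("stage_share must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    # Time outside the stage is untouched; time inside it shrinks by the speedup.
    new_total = (1.0 - stage_share) + stage_share / stage_speedup
    return 1.0 - new_total


# Halving a stage that is 30% of cycle time removes 15% of the total.
print(f"{cycle_time_reduction(0.30, 2.0):.0%}")
# Even a near-infinite speedup cannot remove more than the stage's own share.
print(f"{cycle_time_reduction(0.30, 1e9):.0%}")
```

The second call is the ceiling: no matter how fast the writing gets, the other 70% is still there.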

Little’s Law Makes This Precise

Little’s Law from queuing theory gives this problem a precise formulation: L = λW, where L is the average number of items in the system, λ is the throughput rate, and W is the average time an item spends in the system.

When you increase λ, the rate at which code gets produced and submitted for review, W does not shrink unless the downstream bottleneck also improves; as arrivals approach reviewer capacity, W typically grows. For most software teams, W is dominated by review wait time, not production time. Increasing λ without reducing W means L grows: more work in progress, more branches open simultaneously, more review requests arriving per day than reviewers can process. The queue lengthens even as individual developer output rises.
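
To put rough numbers on this, here is a sketch that treats the review stage as a single-server M/M/1 queue. That model is my simplifying assumption, not something from Murphy's piece, and real review queues are messier. The review capacity of 10 PRs per day is hypothetical. Under this model W is not merely flat: it climbs steeply as the arrival rate approaches capacity, and Little's Law turns that into open PRs.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrivals meet or exceed capacity")
    return 1.0 / (service_rate - arrival_rate)


def items_in_system(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate * time_in_system


REVIEW_CAPACITY = 10.0  # hypothetical: PRs the team can review per day

for prs_per_day in (5.0, 8.0, 9.0, 9.5):
    wait_days = mm1_time_in_system(prs_per_day, REVIEW_CAPACITY)
    open_prs = items_in_system(prs_per_day, wait_days)
    print(f"lambda={prs_per_day} PRs/day  W={wait_days:.2f} days  L={open_prs:.1f} open PRs")
```

Going from 8 to 9.5 PRs per day is a modest bump in output, but in this model it roughly quadruples both the time each PR spends in the system and the number of PRs in flight.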

This is the direct application of Goldratt’s Theory of Constraints from The Goal: improving a non-bottleneck resource does not improve system throughput. It increases the buffer in front of the actual constraint. Gene Kim’s The Phoenix Project dramatized exactly this dynamic applied to software delivery, and the DORA research program has provided empirical evidence for the same conclusion across thousands of teams.

Code Review as the Specific Bottleneck

Code review has structural properties that make it persistently slow at most organizations. It requires context that reviewers may not have immediately available. It demands schedule coordination between author and reviewer. It depends on reviewer capacity, which is constrained by their own active work. These factors compound: a reviewer blocked on their own in-flight PR cannot review yours; a reviewer missing context for your change asks clarifying questions that add a day to the cycle; a change requiring multiple reviewers needs all of them to align.

Review latency also grows with PR size. A large change takes longer to understand, invites more feedback rounds, and cycles through multiple revisions. Teams that batch work into large PRs to reduce submission overhead pay compounded costs at review. The teams with the shortest delivery cycles keep changes small and independently deployable, not for ideological reasons but because the queue dynamics reward it.

The 2023 DORA State of DevOps report found that elite-performing teams had lead times for changes of less than one day, while low performers measured lead time in weeks. The differences between those groups centered on deployment frequency, branch lifetime, and automation coverage, not code writing speed.

What AI Coding Tools Do to the System

GitHub’s 2022 Copilot study found that developers completed a coding task 55% faster when using Copilot compared to a control group. That is a real, significant individual productivity gain. Subsequent research found that Copilot users submitted more pull requests per week than matched non-users.

More PRs submitted per week against fixed review capacity produces a longer review queue. A developer who is individually more productive but whose output accumulates in a review backlog is not delivering faster at the team level. They are accumulating work in progress, which introduces its own costs: context switching when reviews return after delays, rework cycles across multiple open branches, and the overhead of keeping several in-flight changes coherent with one another.
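
A toy day-by-day simulation shows what happens once submissions exceed review capacity outright. All of the numbers are hypothetical, and the model ignores PR size, rework, and weekends.

```python
def simulate_review_backlog(prs_per_day: int, reviews_per_day: int, days: int) -> list[int]:
    """Return the number of PRs still awaiting review at the end of each day."""
    backlog = 0
    history = []
    for _ in range(days):
        backlog += prs_per_day                    # new PRs submitted today
        backlog -= min(backlog, reviews_per_day)  # reviewers clear what they can
        history.append(backlog)
    return history


# Hypothetical team whose reviewers can handle 10 PRs per day.
before = simulate_review_backlog(prs_per_day=9, reviews_per_day=10, days=10)
after = simulate_review_backlog(prs_per_day=12, reviews_per_day=10, days=10)
print("before:", before)
print("after: ", after)
```

At 9 PRs per day the backlog clears every day. At 12 it grows by two PRs daily and never recovers, even though every individual author is shipping more code than before.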

There is an additional effect specific to AI-generated code. Reviewers reading AI-generated output face a different task than they do with hand-written code. AI-generated code is often structurally plausible but may carry semantic errors that require careful inspection to catch. The review burden per line may increase even as the production cost per line decreases. A team that adopts AI coding tools without adjusting its review practices can end up with a queue that grows in both size and per-item cost simultaneously.

I have seen this in my own work building Discord bots. A feature might take an afternoon to write. In a solo project, it ships the same day. In a collaborative one, it waits. No tool that accelerates the writing changes the waiting, and the ratio of writing time to queue time gets worse as the writing gets faster.

The Stages That Actually Constrain Delivery

For teams that have not deliberately optimized their delivery pipeline, the bottlenecks worth addressing are specific.

Review wait time responds to smaller PRs, better PR descriptions that give reviewers the context they need to orient quickly, and automated pre-review checks that catch obvious issues before human review. Strong linting, type checking, and test coverage narrow the surface area a reviewer needs to examine. These investments reduce review latency without requiring any change to code generation speed.

CI/CD pipeline duration is a second consistently underaddressed constraint. A test suite that takes 30 minutes to run adds 30 minutes to every development cycle at minimum, and more when developers wait for results before continuing. Parallelism, test result caching, and selective test execution reduce this without changing the application code. The DORA program’s associated research in Accelerate identified fast automated test feedback as one of the strongest predictors of delivery performance across the dataset.
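
Selective test execution can start out very simple. The sketch below assumes a repository laid out as src/<component>/... with a hand-maintained map from components to test directories. The layout, the map, and the function name are all hypothetical. Real tools usually derive this from the build graph or from coverage data.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from source components to the test directories covering them.
TEST_MAP = {
    "billing": ["tests/billing"],
    "auth": ["tests/auth", "tests/integration/login"],
}
FALLBACK = ["tests"]  # run everything when a change cannot be mapped


def select_tests(changed_files: list[str]) -> list[str]:
    """Pick the test directories to run for a set of changed file paths."""
    selected: set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Assumed layout: src/<component>/...
        if len(parts) >= 2 and parts[0] == "src" and parts[1] in TEST_MAP:
            selected.update(TEST_MAP[parts[1]])
        else:
            # Unmapped file: be conservative and run the full suite.
            return list(FALLBACK)
    return sorted(selected)


print(select_tests(["src/billing/invoice.py"]))
print(select_tests(["src/auth/token.py", "README.md"]))
```

The fallback is the important design choice: anything the map does not recognize triggers the full suite, so the shortcut only saves time on changes it can account for.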

Deployment automation addresses the third major latency source. Manual deployment steps, approval gates for routine releases, and environment configuration requirements add time that compounds with deployment frequency. The highest-performing teams in the DORA data deploy on demand, multiple times per day, without manual intervention. That capability requires investment in deployment infrastructure, not in the speed of writing application code.

Measuring the Right Stages

The practical starting point is measuring lead time end-to-end, from first commit to production deployment. Most teams have good visibility into activity metrics (commits, PR submissions, lines changed) and poor visibility into wait metrics: how long a PR sits before its first review, how long between merge and deployment, how long CI takes on average across all builds.
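
As a sketch of what measuring wait metrics can look like, the snippet below splits each change's lead time into stages and reports the median per stage. The record fields and stage names are hypothetical. In practice the timestamps would come from your version control host and deployment logs, and the sample record here is made up purely to exercise the code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class ChangeRecord:
    """Timestamps for one change; field names are hypothetical."""
    first_commit: datetime
    pr_opened: datetime
    first_review: datetime
    merged: datetime
    deployed: datetime


def stage_durations(change: ChangeRecord) -> dict[str, timedelta]:
    """Split one change's lead time into consecutive stages."""
    return {
        "writing": change.pr_opened - change.first_commit,
        "review_wait": change.first_review - change.pr_opened,
        "review": change.merged - change.first_review,
        "deploy_wait": change.deployed - change.merged,
    }


def median_by_stage(changes: list[ChangeRecord]) -> dict[str, timedelta]:
    """Median duration of each stage across all changes."""
    per_stage: dict[str, list[timedelta]] = {}
    for change in changes:
        for stage, duration in stage_durations(change).items():
            per_stage.setdefault(stage, []).append(duration)
    return {stage: median(values) for stage, values in per_stage.items()}


# Made-up sample: written in 4 hours, then 26 hours waiting for a first review.
t0 = datetime(2024, 1, 8, 9, 0)
changes = [
    ChangeRecord(
        first_commit=t0,
        pr_opened=t0 + timedelta(hours=4),
        first_review=t0 + timedelta(hours=30),
        merged=t0 + timedelta(hours=34),
        deployed=t0 + timedelta(hours=36),
    ),
]
medians = median_by_stage(changes)
constraint = max(medians, key=medians.get)
print(constraint, medians[constraint])
```

The stage with the largest median is the first candidate for the constraint, though a long tail in another stage can matter just as much, so percentiles are worth adding once the basic numbers exist.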

Exposing wait times reveals which stage is the actual constraint. For most teams that have not deliberately optimized their pipeline, that measurement surfaces review latency or CI duration as the binding factor, not code writing time.

Murphy’s piece is right: if you thought writing code faster was the fix, your actual problems are more structural and harder to address. They involve process change, infrastructure investment, and in some cases organizational trust. But they are tractable once you identify which queue is holding things up, and that requires looking past the stage that is easiest to instrument and easiest to sell a product against.
