
The Queue Your AI Tools Don't Reach

Source: hackernews

Goldratt’s Theory of Constraints, developed for factory floors in the 1980s, makes a simple claim: at any given time a production system's throughput is limited by one bottleneck, and improving any step that is not that bottleneck produces zero net throughput gain. Speed up the cutting machine while welding is the limit, and nothing ships faster. You have just made the queue in front of welding longer.

Software teams keep rediscovering this. Andrew Murphy’s post on the limits of coding speed adds to a long line of observations that writing code is rarely what slows a team down. The timing matters because AI coding tools are now genuinely fast, and the implicit pitch behind most of them is that developer velocity is the constraint worth solving. The data suggests otherwise.

Where developer time goes

Surveys of developer time allocation have converged on a consistent picture over the past decade. The Stack Overflow Developer Survey and similar studies find that developers spend roughly 30% of their time writing new code. The rest is split across code review (giving and receiving), debugging, testing, meetings, documentation, and deployment work.

A widely cited study from UC Irvine found that knowledge workers switch tasks roughly every three minutes on average and need about 23 minutes to get back to an interrupted task. For developers, the bottleneck is not the typing; it is the discontinuity.

The clearest empirical picture of what separates fast-shipping teams from slow ones comes from DORA, the DevOps Research and Assessment program that has been tracking software delivery performance since 2014. Their key metrics are: lead time for changes (how long from commit to production), deployment frequency, mean time to recovery from incidents, and change failure rate. None of these measure how fast code is written. The DORA State of DevOps reports show that elite-performing teams deploy multiple times per day; low performers deploy less than once per six months. The gap is not explained by how fast the developers type.
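To make concrete what these metrics measure, here is a minimal sketch that computes three of them from a list of deployment records. The `Deployment` record and its field names are hypothetical, chosen for illustration; they are not a DORA-defined schema, and real data would come from your version control and deploy tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record: one change reaching production.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False  # assumed flag: the deploy caused an incident or rollback


def lead_time_for_changes(deploys: list[Deployment]) -> timedelta:
    """Median time from commit to production."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def deployment_frequency(deploys: list[Deployment], window_days: int) -> float:
    """Average deployments per day over the observation window."""
    return len(deploys) / window_days


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments flagged as failed."""
    return sum(d.failed for d in deploys) / len(deploys)


# Made-up sample data covering one week.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=30)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=5), failed=True),
]

print("lead time:", lead_time_for_changes(sample))
print("deploys/day:", deployment_frequency(sample, window_days=7))
print("failure rate:", change_failure_rate(sample))
```

Nothing in these inputs records how long the code took to write; the clock starts at the commit.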

Little’s Law and the review queue

The mathematical reason this matters comes from queueing theory. Little’s Law states:

L = λW

L is the number of items in the system, λ is the throughput rate, and W is the average time each item spends in the system. Applied to a PR review workflow: L is open PRs, λ is merge rate, and W is lead time.

If you add AI coding tools and developers open PRs twice as fast, the arrival rate rises without a matching increase in reviewer throughput. If reviewers had spare capacity, lead time stays flat. If they did not, the merge rate λ is capped by review capacity, open PRs (L) accumulate, and Little’s Law, rearranged as W = L/λ, says lead time rises with them. You have more work in flight, and none of it moves faster.
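A small simulation shows the effect. This is a sketch under made-up numbers: reviewers merge at most 5 PRs per day, and authors open either 5 or 10 per day. The function names are illustrative, and the model is a deliberately simple deterministic one in which a PR opened today can be merged tomorrow at the earliest.

```python
def simulate_review_queue(arrivals_per_day: float,
                          review_capacity_per_day: float,
                          days: int) -> list[float]:
    """Return the number of open PRs at the end of each day.

    Each day reviewers first merge up to `review_capacity_per_day` PRs
    from the existing backlog, then `arrivals_per_day` new PRs are opened.
    """
    open_prs = 0.0
    history = []
    for _ in range(days):
        merged = min(open_prs, review_capacity_per_day)
        open_prs -= merged
        open_prs += arrivals_per_day
        history.append(open_prs)
    return history


def lead_time_days(open_prs: float, merge_rate_per_day: float) -> float:
    """Little's Law rearranged: W = L / lambda.

    When the queue is still growing this is only an approximation: it is
    the time a PR joining the back of a first-in-first-out queue would wait.
    """
    return open_prs / merge_rate_per_day


CAPACITY = 5  # assumed reviewer throughput, PRs merged per day

for arrivals in (5, 10):
    backlog = simulate_review_queue(arrivals, CAPACITY, days=10)[-1]
    print(f"{arrivals} PRs/day opened -> {backlog:.0f} open after 10 days, "
          f"lead time ~{lead_time_days(backlog, CAPACITY):.0f} days")
```

With arrivals matched to capacity, the backlog holds at 5 open PRs and a one-day lead time. Doubling arrivals leaves 55 open PRs after ten days, and the estimated lead time is eleven days and still climbing.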

This is not a hypothetical. Teams that have adopted AI coding tools without simultaneously changing their review process often report that their review queues have grown. The code arrives faster; the reviewers are the same people moving at the same speed.

The compounding effect of sequential reviews makes this worse. If a PR requires three approvals in sequence and each reviewer has a 24-hour expected turnaround, the expected calendar time for approvals is not 24 hours but 72 hours at best, and longer once you account for requested changes that restart the clock. For a feature a developer implements in four hours, a 72-hour review wait represents an 18:1 ratio of waiting to working. Halving the implementation time to two hours changes the ratio but barely touches the lead time: 76 hours becomes 74. The feature still arrives in production at roughly the same calendar date.
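The arithmetic in this example is easy to check. The sketch below uses the figures from the paragraph above; the function names and the `rework_rounds` parameter are illustrative, and it assumes each round of requested changes restarts the full approval chain.

```python
def sequential_review_wait(approvals: int,
                           turnaround_hours: float,
                           rework_rounds: int = 0) -> float:
    """Expected wait when each approval starts only after the previous one.

    Assumption: every rework round restarts the whole chain of approvals.
    """
    return approvals * turnaround_hours * (1 + rework_rounds)


def lead_time(implementation_hours: float, review_wait_hours: float) -> float:
    """Total hours from starting work to final approval."""
    return implementation_hours + review_wait_hours


wait = sequential_review_wait(approvals=3, turnaround_hours=24)

for impl_hours in (4, 2):
    total = lead_time(impl_hours, wait)
    print(f"implement in {impl_hours}h: wait-to-work ratio {wait / impl_hours:.0f}:1, "
          f"lead time {total:.0f}h")

# One round of requested changes doubles the wait under the restart assumption.
print("with one rework round:", sequential_review_wait(3, 24, rework_rounds=1), "hours")
```

Doubling coding speed here cuts lead time by two hours out of 76, under 3%, while a single rework round adds 72.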

What high-performing teams actually optimize

The DORA research points clearly at the practices that improve delivery speed. Trunk-based development, where engineers commit directly to the main branch or use very short-lived branches (under a day), is one of the strongest predictors of elite performance. The practice enforces small batches by making large changes structurally difficult. A PR that takes 15 minutes to review gets reviewed the same day; one that takes two hours sits in a queue.

Continuous integration with fast pipelines removes another common wait. A 45-minute CI pipeline imposes a 45-minute wait on every review iteration. Teams that bring CI under 10 minutes report meaningful reductions in lead time, not because developers write code faster, but because the feedback cycle compresses.

Review culture also matters in ways that tooling cannot fix directly. Teams that treat code review as a first-class responsibility, rather than something done when there is spare time, have consistently lower lead times. This is an organizational norm, not a software problem, and no amount of AI-assisted code generation changes it.

When inner-loop speed does matter

Coding speed is not irrelevant. For a solo developer or a team of two or three where there is no review queue and the person writing code is the same person deploying it, the inner loop is most of the outer loop. Compressing it has direct throughput impact.

Exploration and prototyping work has similar characteristics. When the goal is to generate multiple candidate solutions quickly, writing code faster lets you cover more options in a session. AI coding tools are genuinely valuable here, and I use them for this constantly when building out bot features.

The issue is that most teams above a handful of engineers are not in this situation. They have review processes, approval chains, deployment gates, and QA cycles. For them, faster code generation produces a faster arrival rate into the review queue without changing what happens next.

The optimization mismatch

Murphy’s post frames this as a diagnostic question: if you believe coding speed is your constraint, that belief points to something about your workflow. Teams where the inner loop is actually the bottleneck are either very small, very autonomous, or have already solved their outer loop problems well enough that the inner loop has become visible.

The tools that would have a larger impact on most teams are less exciting to build and harder to demo. Better requirement specification reduces rework rate. Async review tooling that surfaces context reduces the cognitive cost for reviewers. Deployment automation eliminates manual approval steps. Observability that surfaces production failures quickly reduces debugging time. None of these generate the same kind of visible, session-level speedup that watching an AI complete a function does, which is part of why they keep getting deprioritized.

There is a broader measurement problem underneath all of this. The things that are easiest to measure (lines committed, code coverage, PR count) are mostly inner-loop metrics. The things that predict delivery performance (lead time, batch size, review latency) are harder to instrument and harder to improve without changing how a team works. Velocity metrics measure activity; DORA metrics measure throughput. They are not the same thing, and confusing them is expensive.

Teams that consistently ship fast tend to have optimized their outer loop to the point where the inner loop becomes visible as a constraint. Getting there requires working on the queue before working on the upstream supply rate that fills it.
