The Queue You Are Not Looking At

Andrew Murphy’s post on debugging leadership landed with a familiar kind of resonance: everyone who has sat with a pull request open for three days knows it is true. But the intuition runs ahead of the mechanism, and the mechanism is where the real argument lives.

The claim is not that coding speed is unimportant. It is that coding speed is not the constraint. And in systems thinking, improving a non-constraint does not improve throughput. It fills the queue.

Where Developer Time Actually Goes

The empirical picture has been fairly stable across surveys for years. McKinsey’s 2021 Developer Velocity research found that roughly 35% of developer time goes to tasks that directly produce code: writing, debugging, and testing locally. The remaining 65% accumulates in meetings, requirements gathering, code review (giving and receiving), waiting on CI, resolving comments, deployment coordination, and context-switching overhead.

The Stripe Developer Coefficient report from 2018 found that roughly 42% of developer time went to managing technical debt and maintenance rather than new development. Stack Overflow’s developer surveys have consistently put code writing at around a quarter to a third of working hours.

If writing code is 30% of a developer’s time, and an AI assistant makes that 30% go twice as fast, you have freed up 15% of total time. That is not nothing. But if the constraint is in the other 70%, the constraint is untouched.
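The arithmetic here has the same shape as Amdahl's law: only the share of time that actually speeds up contributes to the saving. A minimal sketch, with function names chosen purely for illustration and the 30% share and 2x speedup taken from the paragraph above:

```python
def total_time_saved(coding_share: float, speedup: float) -> float:
    """Fraction of total working time freed when only coding gets faster.

    coding_share: fraction of total time spent writing code (0 to 1).
    speedup: how many times faster coding becomes (2.0 means twice as fast).
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return coding_share * (1.0 - 1.0 / speedup)


def overall_speedup(coding_share: float, speedup: float) -> float:
    """Amdahl-style speedup of the whole working week, not just the coding part."""
    return 1.0 / (1.0 - total_time_saved(coding_share, speedup))


if __name__ == "__main__":
    saved = total_time_saved(0.30, 2.0)
    print(f"time saved: {saved:.0%}, overall speedup: {overall_speedup(0.30, 2.0):.2f}x")
```

Even an arbitrarily large coding speedup cannot save more than the coding share itself, which is why the other 70% decides the outcome.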

This is Goldratt’s observation from The Goal, applied directly: at any given moment, a system’s throughput is set by its binding constraint. Improving anything that is not that constraint does not improve throughput. It builds the queue. Gene Kim and his co-authors applied the same framework to software organizations in The Phoenix Project and The DevOps Handbook, and the pattern recurs throughout that literature: speed up a step that is not the constraint, and work piles up in front of the step that is. For a team whose developers now write code faster, that pile forms at review.

The Queuing Math Is Not Optional

Most engineering managers have a qualitative sense that slow review is bad. Fewer have worked through how bad it is quantitatively, and the math is surprisingly brutal.

Little’s Law states that for any stable system, the average number of items in the system equals the throughput rate multiplied by the average time each item spends in the system: L = λW. Rearranged, W = L / λ. If your team’s review throughput (λ) holds steady while an AI tool doubles the number of pull requests open at any moment (L), the average time each PR spends in the system (W) doubles. And if PRs keep arriving faster than they can be reviewed, the system is no longer stable and the backlog simply grows. You have more code in flight, but it moves no faster to production.
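Little's Law is easy to apply to a team's own numbers. The sketch below rearranges it to W = L / λ; the figures in the usage lines (10 or 20 open PRs, 5 merges per day) are hypothetical and only there to show the proportionality.

```python
def average_time_in_system(items_in_system: float, throughput: float) -> float:
    """Little's Law rearranged: W = L / lambda.

    items_in_system: average number of open PRs (L).
    throughput: PRs completed per unit time (lambda), for a stable system.
    Returns the average time a PR spends in the system, in the same time unit.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    if items_in_system < 0:
        raise ValueError("items_in_system cannot be negative")
    return items_in_system / throughput


if __name__ == "__main__":
    # Hypothetical team: review capacity fixed at 5 merged PRs per day.
    print(average_time_in_system(10, 5.0))  # days per PR with 10 PRs open
    print(average_time_in_system(20, 5.0))  # days per PR with 20 PRs open
```

With throughput held fixed, doubling the work in flight doubles the time each item spends in the system.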

The M/M/1 queue model describes wait time as a function of utilization. The formula for expected waiting time is:

W_q = ρ / (μ(1 - ρ))

Where ρ is utilization (arrival rate divided by service rate) and μ is the service rate. At 50% utilization, wait time equals service time. At 80%, wait time is four times service time. At 90%, nine times. At 95%, nineteen times. The relationship is hyperbolic: you do not pay linearly for higher utilization.
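The multipliers quoted above follow directly from the formula. A short sketch that evaluates it, with the service rate set to 1 so the result reads as a multiple of the mean service time (the function name is illustrative):

```python
def mm1_expected_wait(utilization: float, service_rate: float = 1.0) -> float:
    """Expected time in queue for an M/M/1 system: W_q = rho / (mu * (1 - rho))."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    return utilization / (service_rate * (1.0 - utilization))


if __name__ == "__main__":
    for rho in (0.50, 0.80, 0.90, 0.95):
        # With service_rate = 1, the wait is expressed in mean service times.
        print(f"utilization {rho:.0%}: wait is {mm1_expected_wait(rho):.1f}x service time")
```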

Kingman’s formula refines this for real-world variability:

W_q ≈ (ρ / (1 - ρ)) × ((C_a² + C_s²) / 2) × S

Here C_a and C_s are the coefficients of variation of interarrival times and service times, and S is the mean service time. When both equal 1, the expression reduces to the M/M/1 result above. Software review is extremely variable: some changes are reviewed in 20 minutes, others sit for days. High variability at high utilization is catastrophically expensive in wait time terms.
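In its standard form, Kingman's approximation averages the squared coefficients of variation of interarrival and service times. A minimal sketch of that form follows; the parameter names are illustrative, and the coefficient of variation of 2 in the usage line is a hypothetical value, not a measurement.

```python
def kingman_wait(
    utilization: float,
    cv_arrival: float,
    cv_service: float,
    mean_service_time: float = 1.0,
) -> float:
    """Kingman's approximation for expected queueing delay in a G/G/1 system.

    W_q ~= (rho / (1 - rho)) * ((c_a^2 + c_s^2) / 2) * S
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2.0
    return (utilization / (1.0 - utilization)) * variability * mean_service_time


if __name__ == "__main__":
    # Exponential arrivals and service (both CVs equal 1) recover the M/M/1 figure.
    print(kingman_wait(0.80, cv_arrival=1.0, cv_service=1.0))
    # Hypothetical: review times twice as variable as exponential.
    print(kingman_wait(0.80, cv_arrival=1.0, cv_service=2.0))
```

Because the service-time term is squared, doubling the variability of review times more than doubles the expected wait at the same utilization.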

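Now add the second reviewer. In the idealized model, serial review stages are additive: if each stage has a four-unit expected wait, two stages cost eight units and three cost twelve. Treat that as the floor, not the estimate. Real review chains do worse than additive for reasons the simple model leaves out. Comments from a later stage send the change back through earlier ones, so every round trip pays the full set of waits again, and every handoff adds a context switch for author and reviewer alike. Each added stage is, at minimum, another full queue sitting on the same punishing utilization curve.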
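How waits combine across serial stages can be checked by simulation rather than taken on faith. The sketch below simulates a tandem of single-server stages, assuming Poisson arrivals, exponential service with a mean of one time unit per stage, first-come-first-served order, and no rework loops. The function name and parameters are illustrative. Under exactly these assumptions, Burke's theorem says each stage sees Poisson arrivals, so the per-stage waits sum; anything worse than that in a real review chain comes from effects the sketch leaves out, such as changes bouncing back to earlier reviewers.

```python
import random


def simulate_tandem(num_stages: int, utilization: float,
                    num_jobs: int = 200_000, seed: int = 1) -> list[float]:
    """Mean queueing wait at each stage, in units of mean service time.

    Assumes Poisson arrivals at rate `utilization` and exponential service
    at rate 1.0 for every stage, so each stage runs at the given utilization.
    """
    if not 0.0 < utilization < 1.0:
        raise ValueError("utilization must be strictly between 0 and 1")
    rng = random.Random(seed)
    arrival = 0.0
    last_departure = [0.0] * num_stages  # when each stage last became free
    total_wait = [0.0] * num_stages

    for _ in range(num_jobs):
        arrival += rng.expovariate(utilization)  # next job enters stage 0
        ready = arrival                          # time the job reaches a stage
        for stage in range(num_stages):
            start = max(ready, last_departure[stage])  # wait if stage is busy
            total_wait[stage] += start - ready
            finish = start + rng.expovariate(1.0)      # service time, mean 1
            last_departure[stage] = finish
            ready = finish                             # hand off to next stage

    return [w / num_jobs for w in total_wait]


if __name__ == "__main__":
    waits = simulate_tandem(num_stages=2, utilization=0.8)
    print("per-stage waits:", [round(w, 2) for w in waits])
    print("total wait:", round(sum(waits), 2))
```

At 80% utilization, each stage should land near the four-unit M/M/1 figure, give or take simulation noise.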

Avery Pennarun worked through a version of this reasoning in a post that circulated widely in March 2026, arguing that every additional review layer makes a team roughly 10x slower, and that this is probably conservative rather than exaggerated. The Lobsters discussion around it split between “yes obviously” and objections about correctness, but the queuing math does not respond to the correctness objection; it just describes where time goes.

What the AI Coding Studies Actually Measure

GitHub’s widely cited Copilot productivity study found that developers completed isolated coding tasks 55% faster with the tool. That number is real. It is also measured in the most favorable possible conditions: controlled exercises, fresh code with no existing context, clear specifications, no downstream review queue.

Apply that to the time distribution above. If coding is 30% of total time and AI cuts coding time by 55%, the reduction in total time is about 16.5%. If the constraint is not in the coding stage, that saving translates to near-zero improvement in lead time from idea to production.

The perverse case is worse. Teams using AI tools are submitting more pull requests per week. GitClear’s analysis of AI-assisted repositories found that code churn rates increased as AI tool adoption rose: code accepted from AI gets replaced at higher rates later, which suggests review is not catching subtle architectural misfit. Against fixed review capacity, more PRs arriving means longer wait times and more context-switching as reviewers juggle more open changes. Lead time from commit to production can increase even as individual developers report feeling more productive. Both things can be true at once.
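A toy model makes the perverse case concrete. The sketch below treats review as a single M/M/1 queue, whose mean time in system is 1 / (μ - λ), and adds coding time on top. Every number in the usage lines is hypothetical: one day of coding per change, a review capacity of 5 PRs per day, and a 30% rise in PR arrivals after AI adoption. Only the 55% reduction in coding time comes from the study cited above.

```python
def lead_time(coding_days: float, pr_arrival_rate: float, review_rate: float) -> float:
    """Coding time plus mean time in an M/M/1 review queue, in days.

    pr_arrival_rate: PRs opened per day (lambda).
    review_rate: PRs the reviewers can complete per day (mu).
    Mean time in an M/M/1 system (waiting plus service) is 1 / (mu - lambda).
    """
    if pr_arrival_rate >= review_rate:
        raise ValueError("arrivals at or above review capacity: the queue never stabilizes")
    time_in_review = 1.0 / (review_rate - pr_arrival_rate)
    return coding_days + time_in_review


if __name__ == "__main__":
    # Hypothetical baseline: review running at 70% utilization.
    before = lead_time(coding_days=1.0, pr_arrival_rate=3.5, review_rate=5.0)
    # Coding time cut by 55%, PR arrivals up 30%, review capacity unchanged.
    after = lead_time(coding_days=0.45, pr_arrival_rate=4.55, review_rate=5.0)
    print(f"before: {before:.2f} days, after: {after:.2f} days")
```

In this toy setup, the half day saved on coding is outweighed by the extra time in review, because the added arrivals push review utilization from 70% to 91%.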

This is not an argument against AI coding tools. It is an argument about what problem they solve. For a solo developer, or a very small team with no review backlog, or a team working on well-specified greenfield code, AI assistance at the coding stage delivers real throughput improvement. The DORA research on software delivery performance tracks the conditions that separate elite performers from low performers, and those conditions are not about tool choice at the coding stage. They are about cycle time, batch size, and automation coverage across the full delivery pipeline.

The Bystander Effect in Your Review Queue

There is a second problem that accumulates on top of the queuing problem: adding more required reviewers does not produce proportional quality improvement.

The SmartBear study of 2,500 code reviews at Cisco found that one to two reviewers produced the highest defect detection rates per reviewer-hour. Above three reviewers, defects found per person declined while cycle time and total review effort both increased. Google’s Modern Code Review research found that most review comments in practice address style, naming, and minor refactoring, not correctness bugs, and that serious defects tended to surface in production or automated tests regardless of review depth.

The mechanism is diffusion of responsibility. The Latané and Darley bystander effect research generalizes: as group size increases, each person’s felt accountability for the outcome decreases. With five required approvers, each one’s implicit reasoning is that the others will catch anything serious. The organization that added each approval requirement was trying to add protection; the behavioral result is that each individual review is less careful.

GitHub’s CODEOWNERS system amplifies this. Entries accumulate after incidents and audits. Removing an entry looks like accepting risk; adding one looks like prudence. The ratchet only turns one direction. Teams end up with changes requiring sign-off from four or five distinct groups, with no single reviewer chartered to examine the interactions between domains.

What Actually Separates Fast From Slow Teams

The Accelerate research (Forsgren, Humble, Kim, 2018) tracked what distinguished high-performing software organizations over years and across thousands of organizations. The predictors of high performance were architectural and cultural, not tooling:

Trunk-based development, with branches measured in hours rather than days
Continuous integration with automated test suites running in under 10 minutes
Deployment automation, decoupled from the code review step
Small batch sizes enforced structurally rather than by policy
Feature flags separating deployment from release
Fast rollback capability reducing the psychological stakes of each deploy

The Ship/Show/Ask framework from Rouan Wilsenach, published on Martin Fowler’s site, captures the calibration problem directly. Not all changes need the same review posture. Routine, well-understood changes that fit established patterns can ship without blocking review. Changes worth communicating but not worth blocking on can merge first and be discussed afterward. Only genuinely novel, high-risk, or cross-domain changes need the full blocking review cycle. Most teams apply the third posture uniformly because it is the safe-looking choice, not because it is the correct calibration.

For the teams that have actually solved the outer loop, AI coding assistance is a genuine multiplier. Short review cycles and fast CI pipelines absorb increased code production rate without building queue. For teams still running multi-day review times and manual deployment steps, the coding speed improvement lands entirely in the queue and produces no delivery improvement.

The Underlying Career Incentive Problem

One thing the original article gestures at but does not fully develop: the review bottleneck persists in part because the individual incentive structure does not align with fixing it. A developer who closes ten tickets a week looks productive by the metrics most engineering organizations track. Whether those tickets spend three days in review before merging is not in the same dashboard.

Annie Vella’s research on supervisory engineering adds a complication from the AI direction: as AI tools handle more of the code writing, the job is shifting toward evaluation and verification. That shift requires the kind of deep architectural judgment that comes from having written code yourself, and it requires fast feedback loops to function well. A developer who spends most of their time managing AI outputs through a slow review pipeline is developing neither set of muscles well.

The DORA research found that elite performing teams deploy multiple times per day with lead times under an hour. Low performers deploy monthly with lead times between one week and one month. That gap is not explained by how fast developers type. It is explained by what happens to code between the moment it is written and the moment it reaches production. AI tools touch the moment of writing. The gap lives elsewhere.

The SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow), proposed by Nicole Forsgren and colleagues in 2021 and drawn on by McKinsey’s 2023 developer productivity work, exists precisely because activity metrics, including commit counts and PR volumes, are the dimension most likely to mislead in isolation. Doubling PR volume while doubling review latency is not a productivity improvement by any measurement that tracks outcomes rather than motion.

Faster code writing is a good thing. It is just not the variable that explains why software takes as long as it does to ship. The variable is the queue you are not looking at.