
Review Overhead Compounds Because Review Is a Queue

Source: hackernews

Avery Pennarun published a piece this week arguing that each additional layer of review doesn’t slow you down by a fixed increment; it multiplies your existing slowdown. The Hacker News thread drew 295 comments and 510 points, which means the post landed somewhere in the overlap between “this resonates immediately” and “I want to argue about this.” Both reactions make sense, because the claim is counterintuitive until you think through the mechanics of how review actually works.

The intuitive model most engineering managers carry is additive. You have a process that takes some time, you add a review step, that step takes some time, total time goes up by that amount. Manageable. The problem is that this model treats review as computation rather than as a queue, and those two things behave very differently.

Queues, Not Computations

A computation takes time proportional to the work. A queue takes time that depends on utilization, and the dependence is sharply nonlinear. Little’s Law supplies the bookkeeping: the average number of items in a queuing system equals the arrival rate times the average time each item spends in the system, so at a steady arrival rate a longer backlog means a longer wait. Standard queueing models supply the nonlinearity: as utilization of a reviewer’s attention approaches 100%, wait time grows without bound. You don’t need a reviewer to be overloaded for queue time to dominate; you just need them to be consistently busy.
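To make the nonlinearity concrete, here is a minimal sketch under the textbook M/M/1 assumptions: random (Poisson) arrivals, exponentially distributed review times, and one reviewer handling one PR at a time. Real review queues don’t satisfy these assumptions exactly, so treat the output as the shape of the curve rather than a prediction. The function name and the 3-PRs-per-hour capacity are illustrative choices, not anything from the original post.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict[str, float]:
    """Steady-state M/M/1 metrics; both rates are in items per hour."""
    if arrival_rate >= service_rate:
        raise ValueError("utilization >= 100%: the queue grows without bound")
    rho = arrival_rate / service_rate                     # utilization
    time_in_system = 1.0 / (service_rate - arrival_rate)  # W, hours (wait + review)
    wait_in_queue = rho / (service_rate - arrival_rate)   # Wq, hours before review starts
    items_in_system = arrival_rate * time_in_system       # Little's Law: L = lambda * W
    return {"utilization": rho, "W": time_in_system, "Wq": wait_in_queue, "L": items_in_system}


if __name__ == "__main__":
    service_rate = 3.0  # hypothetical: a 20-minute review means 3 PRs/hour of capacity
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        m = mm1_metrics(rho * service_rate, service_rate)
        # Wq * service_rate = rho / (1 - rho): queue wait as a multiple of the review itself
        print(f"utilization {rho:.0%}: wait {m['Wq'] * 60:6.1f} min "
              f"({m['Wq'] * service_rate:5.1f}x the review), L = {m['L']:.2f}")
```

Under these assumptions the wait-to-work ratio is ρ/(1−ρ): at 50% utilization a PR waits about as long as its review takes, at 90% about nine times as long, and at 99% about 99 times as long.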

In practice, most reviewers are consistently busy. They have their own work. Pull requests arrive somewhat randomly throughout the day. A reviewer who could theoretically clear a PR in 20 minutes might still leave it sitting for 4 hours because they’re deep in something else. The 20 minutes of actual review work gets dwarfed by queue wait time.
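Inverting the same textbook M/M/1 relationship gives a back-of-envelope check on this paragraph’s example: how busy would a reviewer have to be for a 20-minute review to wait 4 hours on average? The M/M/1 model (random arrivals, one reviewer, exponential review times) is an idealizing assumption, and `implied_utilization` is a hypothetical helper written for this illustration.

```python
def implied_utilization(review_minutes: float, avg_wait_minutes: float) -> float:
    """Utilization rho at which an M/M/1 queue's average wait equals avg_wait_minutes.

    In M/M/1, Wq / service_time = rho / (1 - rho), so rho = r / (1 + r)
    where r = Wq / service_time.
    """
    if review_minutes <= 0 or avg_wait_minutes < 0:
        raise ValueError("review time must be positive and wait non-negative")
    r = avg_wait_minutes / review_minutes
    return r / (1.0 + r)


if __name__ == "__main__":
    rho = implied_utilization(review_minutes=20, avg_wait_minutes=4 * 60)
    print(f"{rho:.1%}")
```

The answer comes out to 12/13, about 92%. Under these assumptions the reviewer doesn’t need to be overloaded at all; being busy a bit more than nine-tenths of the time is enough for queue wait to be twelve times the review itself.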

Now add a second reviewer. That person has their own queue, their own work, their own schedule. The PR has to clear both queues in sequence. If the reviewers’ availability is independent, the probability that the PR finds both of them free is the product of the individual probabilities. Two reviewers each with a 50% chance of being available gives a 25% chance of a fast path, not a flat “50% plus some overhead.” Three reviewers each with a 50% chance: 12.5%. Each added stage shrinks the odds of a quick turnaround multiplicatively.
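The arithmetic above assumes each reviewer’s availability is independent of the others. Under that assumption, the chance that a PR clears every serial stage without waiting is p raised to the number of stages. A small seeded Monte Carlo check agrees with the closed form; `fast_path_probability` and `simulate_fast_path` are hypothetical names used only for this sketch.

```python
import random


def fast_path_probability(p_available: float, reviewers: int) -> float:
    """Chance that every one of `reviewers` serial reviewers is free when the PR
    reaches them, assuming each is independently available with p_available."""
    return p_available ** reviewers


def simulate_fast_path(p_available: float, reviewers: int,
                       trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same independence assumption."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(
        all(rng.random() < p_available for _ in range(reviewers))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3):
        exact = fast_path_probability(0.5, n)
        approx = simulate_fast_path(0.5, n)
        print(f"{n} reviewer(s): exact {exact:.3f}, simulated {approx:.3f}")
```

If availability is correlated (say, every reviewer is in the same planning meeting), the real odds can be better or worse than the product, which is why the independence assumption is worth stating out loud.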

This is the mechanism behind the post’s 10x-per-layer claim. It’s not that each reviewer is 10x slower than the previous one. It’s that the compounding of serial queuing stages produces multiplicative delay even when each individual stage looks cheap in isolation.

The DORA Evidence

This isn’t just theoretical. The DORA State of DevOps research has surveyed tens of thousands of technology professionals about software delivery performance for over a decade. Elite performers, the highest-performing cluster in the data, deploy on demand, often multiple times per day, with lead times for changes that some report years put at under one hour. Low performers deploy far less often, in some years monthly or less, and have lead times measured in weeks or months.

The difference isn’t mainly raw engineering skill. Elite performers cluster around practices like trunk-based development, small batch sizes, and automated verification. Low performers cluster around long-lived feature branches and multi-stage approval processes. The DORA researchers are careful about causal claims, but the association between heavyweight change approval, such as sign-off from an external change advisory board, and slower delivery is one of the most consistent findings in the dataset.

Lead time for changes, measured from commit to production, includes everything that happens in between: review queues, staging environments, manual testing, change approval boards. Every serial stage is a queue. Every queue multiplies the total.
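One way to see where lead time goes is to record, for each serial stage, how long a change waited versus how long someone actively worked on it. The ratio of active time to total time is often called flow efficiency. The stage names and durations below are hypothetical placeholders chosen for illustration, not measurements from DORA or anywhere else.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    wait_hours: float    # time the change sat in this stage's queue
    active_hours: float  # time someone was actually working on it


def lead_time_breakdown(stages: list[Stage]) -> tuple[float, float]:
    """Return (total lead time in hours, flow efficiency = active / total)."""
    total = sum(s.wait_hours + s.active_hours for s in stages)
    active = sum(s.active_hours for s in stages)
    return total, (active / total if total else 0.0)


if __name__ == "__main__":
    # Hypothetical commit-to-production path; every number here is illustrative only.
    path = [
        Stage("code review", wait_hours=4.0, active_hours=0.33),
        Stage("second review", wait_hours=6.0, active_hours=0.25),
        Stage("staging deploy", wait_hours=2.0, active_hours=0.5),
        Stage("change approval board", wait_hours=48.0, active_hours=0.25),
    ]
    total, efficiency = lead_time_breakdown(path)
    print(f"lead time {total:.1f} h, flow efficiency {efficiency:.1%}")
```

With inputs shaped like these, active work is a small sliver of the total and almost all of the lead time is waiting, which is the pattern the queue model predicts.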

The Bystander Effect in PR Queues

There’s a second mechanism that compounds the first. When a pull request has multiple required reviewers, each reviewer has less individual responsibility for clearing it. This is diffusion of responsibility, the same dynamic that makes bystanders less likely to help when a crowd is present. Each reviewer assumes someone else is about to look at it. The PR sits longer than it would with a single, clearly accountable reviewer.

Organizations often add required reviewers precisely because previous reviews missed things. Each addition is reasonable on its own, but the system-level effect runs backward. More reviewers reduce per-reviewer accountability, which increases wait times, which increases the cost of the whole process.

The fix that doesn’t break the accountability goal is to make ownership clearer rather than distributed. One primary reviewer, time-bounded, with explicit escalation paths. Not three reviewers with no coordination and no deadline.
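A minimal sketch of that policy, assuming a hypothetical `ReviewRequest` record, made-up reviewer names, and a made-up four-hour time box: exactly one person is accountable at any moment, and ownership moves along an explicit escalation chain when the time box expires, rather than being shared by several reviewers at once.

```python
from dataclasses import dataclass


@dataclass
class ReviewRequest:
    primary: str
    escalation_chain: list[str]  # hypothetical: who takes over, in order, if the time box lapses
    sla_hours: float             # hypothetical time box each owner gets


def current_owner(req: ReviewRequest, hours_waiting: float) -> str:
    """Return the single person accountable for the review right now.

    Each owner gets `sla_hours`; after that, ownership passes to the next
    person in the chain. The last person in the chain keeps it indefinitely.
    """
    if req.sla_hours <= 0:
        raise ValueError("sla_hours must be positive")
    if hours_waiting < 0:
        raise ValueError("hours_waiting must be non-negative")
    owners = [req.primary, *req.escalation_chain]
    index = int(hours_waiting // req.sla_hours)
    return owners[min(index, len(owners) - 1)]


if __name__ == "__main__":
    req = ReviewRequest(primary="alice", escalation_chain=["bob", "team-lead"], sla_hours=4)
    for hours in (1, 5, 9, 30):
        print(hours, current_owner(req, hours))
```

The design choice is that the answer to “whose turn is it?” is always a single name, which is the opposite of the diffuse responsibility described above.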

What Fast Teams Actually Do

Teams that ship quickly don’t skip verification; they move it earlier in the process and parallelize it with development rather than serializing it after.

Trunk-based development with feature flags is the most common pattern. Changes land on the main branch continuously, hidden behind flags, and get verified in production incrementally rather than accumulated in a branch and reviewed all at once. The review surface for any single commit is small. The time to review each change scales with the size of its diff, not the size of the accumulated feature.
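For readers who haven’t used flags this way, here is a minimal sketch of the idea: new code is merged to main but only runs for a deterministic slice of users, so it can be verified in production a little at a time. The flag name, the in-memory `ROLLOUT` table, and the hash-based bucketing are illustrative assumptions, not any particular flag service’s API.

```python
import hashlib

# Hypothetical rollout table: flag name -> percentage of users who get the new path.
ROLLOUT: dict[str, int] = {"new_checkout_flow": 10}


def flag_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket a user into 0..99 and compare with the rollout %."""
    percent = ROLLOUT.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def checkout(user_id: str) -> str:
    # Both code paths live on main; the flag decides which one runs for this user.
    if flag_enabled("new_checkout_flow", user_id):
        return "new"
    return "old"
```

Because bucketing is deterministic, a given user sees the same path on every request. Raising the percentage widens exposure without another merge, and setting it to 0 acts as a kill switch.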

Automated testing does the work that human review most struggles with: catching regressions, checking edge cases, verifying behavior at the boundary of specifications. A test suite that runs in under 10 minutes and blocks merge on failure can replace a large fraction of what review catches without adding a human queue at all. Where human review remains, the Google Engineering Practices documentation treats one business day as the maximum time to respond to a review request, and asks reviewers to respond shortly after a request arrives unless they’re in the middle of a focused task, because slow reviews reduce the velocity of the whole team.
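The test-gate half of this paragraph can be sketched as a function that runs the test command under a hard time budget and refuses the merge on any failure or timeout. The ten-minute budget comes from the text; the function name and the idea of passing an arbitrary command are assumptions for illustration. In practice this logic usually lives in a CI system’s required checks rather than in hand-written code.

```python
import subprocess

TEST_BUDGET_SECONDS = 10 * 60  # the "under 10 minutes" target from the text


def merge_allowed(test_command: list[str],
                  budget_seconds: float = TEST_BUDGET_SECONDS) -> bool:
    """Run the test suite; allow the merge only if it exits 0 within the budget."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired if the budget is exceeded.
        result = subprocess.run(test_command, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        # A suite that blows its budget blocks the merge just like a failure.
        return False
    except OSError:
        # The command could not be started at all (e.g. not installed): block the merge.
        return False
    return result.returncode == 0
```

A call such as `merge_allowed([sys.executable, "-m", "pytest", "-q"])` would gate on a pytest run, assuming pytest is the project’s runner. Treating a timeout as a failure keeps the suite honest about its speed, which matters because a slow gate quietly becomes a queue of its own.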

Pair programming is the extreme end of this: review happens continuously, in real time, with zero queue. It has costs, primarily in scheduling coordination and cognitive load, but the lead time effect is real. Changes that get reviewed as they’re written don’t need a separate review stage at all.

The AI Coding Acceleration Problem

There’s a dimension to this that Pennarun’s post gestures toward but that’s worth being direct about. AI coding tools have accelerated the rate at which developers produce diffs. A developer using Claude or Copilot might produce three to five times as many pull requests in a day as before. If review capacity hasn’t scaled proportionally, the review queue has gotten longer and reviewers are busier. Because wait time grows nonlinearly as utilization approaches 100%, the 10x multiplier per review layer is now applied to a higher volume of work, and the multiplier itself gets larger.
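A rough illustration of why more diffs hurt more than proportionally, again using the textbook M/M/1 idealization (random arrivals, one reviewer, exponential review times). The capacity and arrival figures are hypothetical: a reviewer who can clear three 20-minute reviews an hour and receives one, two, or three PRs an hour.

```python
def mean_queue_wait_hours(arrivals_per_hour: float, capacity_per_hour: float) -> float:
    """Average time a PR waits before review starts (M/M/1: Wq = rho / (mu - lambda))."""
    if arrivals_per_hour >= capacity_per_hour:
        return float("inf")  # arrivals meet or exceed capacity: the queue never drains
    rho = arrivals_per_hour / capacity_per_hour
    return rho / (capacity_per_hour - arrivals_per_hour)


if __name__ == "__main__":
    capacity = 3.0  # hypothetical: one reviewer, 20-minute reviews
    for arrivals in (1.0, 2.0, 3.0):
        wait = mean_queue_wait_hours(arrivals, capacity)
        print(f"{arrivals:.0f} PR/h -> average wait {wait * 60:.0f} min")
```

Under these assumptions, doubling arrivals from one to two PRs an hour quadruples the average wait, from 10 to 40 minutes, and tripling them saturates the reviewer so the queue never drains.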

This means teams that adopted AI coding tools without rethinking their review process may have made their cycle times worse, not better. The throughput of code generation is no longer the bottleneck. The bottleneck is review, and you can’t solve a queue problem by adding more to the queue.

The organizations that benefit from AI coding acceleration are the ones whose review processes were already thin: small team sizes, high trust, automated verification, short-lived branches, fast iteration loops. AI makes their fast loops faster. For organizations with heavy review overhead, AI just fills the queue with more work.

Where the Pushback Lands

The Hacker News comments predictably split between “yes obviously” and “but what about correctness?” The correctness objection is real but it’s not an argument for serial review layers; it’s an argument for better tooling at the point of development. Static analysis, type systems, fuzzing, formal verification in high-stakes contexts: these catch correctness problems without introducing queues. Review catches things that automated tools can’t, but its marginal value per layer decreases while its marginal cost increases.

The other pushback is domain-specific: regulated industries, safety-critical systems, financial infrastructure. These have genuine requirements for multi-party authorization that aren’t just bureaucratic overhead. The argument isn’t that all review is bad; it’s that each layer has to justify its existence against its real cost, including the queue multiplication effect, not just its face value.

Pennarun’s framing is useful because it makes the cost visible. “We just need one more set of eyes” sounds cheap. “This reviewer’s queue multiplies the cycle time we already have” sounds expensive, because it is. The math doesn’t change when you describe it differently, but being precise about the model changes the decisions you make.

The question for any given team isn’t whether to do review. It’s whether each layer of review is worth the multiplicative cost it actually imposes, and whether that cost could be reduced by moving the verification earlier, automating it, or reducing the queue depth rather than adding another stage.
