The Queue Behind Every Review: Why Approval Chains Cost More Than They Seem To
Source: hackernews
There is a post making the rounds from apenwarr.ca titled “Every layer of review makes you 10x slower,” and the HN comments split predictably between vigorous agreement and lists of counterexamples. That debate tends to collapse into opinions about whether review is good or bad, which misses the more interesting argument: the cost is not in the reviewing, it is in the queueing.
Understanding why requires a brief detour through operations research.
The Queue You Are Not Seeing
When most developers think about the cost of a code review, they think about the time spent reading, commenting, and responding. That is the visible work. What is less visible is the wait time before any of that starts.
Little’s Law, a foundational result in queueing theory, states that the average number of items in a system equals the arrival rate multiplied by the average time each item spends in the system. Applied to software delivery: cycle time scales with work in progress. But the relationship between reviewer utilization and queue length is not linear. For a basic M/M/1 queue, a reasonable rough model for a single reviewer handling a stream of requests, the expected queue wait time is:
W_queue = ρ / (μ(1 - ρ))
where ρ is utilization (arrival rate divided by service rate) and μ is the service rate. As utilization approaches 1, wait time approaches infinity. In intuitive terms: at 80% reviewer utilization, the expected queue wait is four times the actual review time. At 90% utilization, it is nine times. Reviewers do not need to be overloaded for wait times to become severe; they only need to be reasonably busy.
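These ratios are easy to check numerically. Dividing W_queue by the mean review time 1/μ leaves the ratio ρ / (1 − ρ). A minimal sketch (the function name and script are mine, not from the post):

```python
def mm1_queue_wait_ratio(utilization: float) -> float:
    """Expected M/M/1 queue wait as a multiple of the mean review time.

    W_queue = rho / (mu * (1 - rho)); dividing by the mean service
    time 1/mu leaves the ratio rho / (1 - rho).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)

for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.0%}: wait = {mm1_queue_wait_ratio(rho):.0f}x review time")
```

Running this prints 1x at 50%, 4x at 80%, 9x at 90%, and 19x at 95%: the wait explodes long before the reviewer is "fully booked."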
Don Reinertsen’s Principles of Product Development Flow applied this framework rigorously to product development in 2009. He showed that at 80% capacity utilization, average queue size is four times what it is at 50% utilization. At 95%, queue sizes reach 19 times the 50% baseline. Engineers who are nominally productive are simultaneously imposing severe delays on anything waiting for their attention.
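Reinertsen's 1/4/19 progression falls out of the same model, and you do not need the closed-form result to see it: a small discrete-event simulation of a single busy reviewer reproduces the blow-up directly. A sketch under the M/M/1 assumptions above (Poisson arrivals, exponential review times, one reviewer; all names here are mine):

```python
import random

def simulate_mm1_wait(utilization: float, service_rate: float = 1.0,
                      n_jobs: int = 200_000, seed: int = 42) -> float:
    """Estimate mean queue wait, as a multiple of mean review time,
    by simulating Poisson arrivals at a single exponential server."""
    rng = random.Random(seed)
    arrival_rate = utilization * service_rate
    t = 0.0        # arrival clock
    free_at = 0.0  # time at which the reviewer next becomes idle
    total_wait = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)       # next PR arrives
        total_wait += max(0.0, free_at - t)      # time spent in queue
        free_at = max(free_at, t) + rng.expovariate(service_rate)
    return (total_wait / n_jobs) * service_rate  # normalize by mean service time

for rho in (0.5, 0.8, 0.95):
    print(f"utilization {rho:.0%}: simulated wait ~ {simulate_mm1_wait(rho):.1f}x review time")
```

The simulated ratios converge on roughly 1x, 4x, and 19x, matching the analytic values without any queueing theory beyond "arrivals pile up while the reviewer is busy."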
Why Layers Compound
A single review step with a moderately busy reviewer might add a day of wait time to a change that took two hours to write. Irritating, but not catastrophic in isolation. The problem with multiple sequential approval layers is that each one is an independent queue: their wait times add across the chain, and any rework loop routes the change back through every queue again.
Consider three sequential approval steps. If each operates at 80% utilization and each individual review takes 30 minutes, each step adds roughly two hours of queue wait before the review even begins. Three steps add six hours of queue wait alone, before counting any time for revision cycles, context-switch overhead, or the possibility of being routed back to step one after step three surfaces a concern.
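The arithmetic for that example, generalized to any chain of stages, is just the per-stage wait ratio summed across the chain. A sketch (function name and parameters are mine, for illustration):

```python
def chain_queue_wait_hours(utilizations: list[float],
                           review_minutes: float = 30.0) -> float:
    """Total expected queue wait across sequential approval stages,
    each modeled as an independent M/M/1 queue with the same mean
    review time. Wait per stage is service_time * rho / (1 - rho).
    """
    service_hours = review_minutes / 60.0
    return sum(service_hours * rho / (1 - rho) for rho in utilizations)

# Three layers, each 80% busy, each review taking 30 minutes:
print(f"{chain_queue_wait_hours([0.8, 0.8, 0.8]):.1f} hours of pure queue wait")
```

This prints 6.0 hours, before counting any actual review time, revision cycles, or a rejection at step three that restarts the chain.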
The DORA State of DevOps research has collected empirical data on this for over a decade across tens of thousands of teams. Elite performers have lead times under an hour from commit to production. Low performers have lead times of one to six months. One of the strongest predictors of which bucket a team lands in is whether they use external approval chains. The 2022 DORA report found that teams using Change Advisory Boards, which require sign-off from people outside the delivery team, had deployment frequency roughly 3.5 times lower and lead times roughly five times longer than teams using lightweight peer review within the team.
The most striking part of that finding: CABs did not produce lower change failure rates. Teams with external approval gates were slower without being more reliable. The DORA report described them as a form of security theater, which is a pointed conclusion from a large-scale empirical study.
The Context Switch Multiplier
Queue wait is only part of the cost. When a PR sits waiting for review, the author moves on to something else. That is the rational response to an idle queue position. But returning to address review feedback after a day or more away carries a real cognitive reload cost. Research on developer context switching suggests recovery times ranging from 10 to 20 minutes for minor interruptions, but returning to substantive work after several days away involves reconstructing mental state that may take considerably longer.
Multiplied across a team and across many in-flight changes, a significant fraction of every sprint goes to this overhead, and none of it shows up in a ticket tracker or sprint velocity chart. It is invisible until someone starts tracking lead time end to end.
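To make "a significant fraction of every sprint" concrete, a back-of-the-envelope model helps. All the numbers below are illustrative assumptions, not measurements from the studies cited above:

```python
def context_switch_tax_hours(prs_per_week: int, rounds_per_pr: float,
                             reload_minutes: float) -> float:
    """Illustrative weekly team cost of reloading mental context to
    address review feedback. Every parameter is an assumption."""
    return prs_per_week * rounds_per_pr * reload_minutes / 60.0

# A team shipping 20 PRs/week, averaging 2 feedback rounds per PR,
# paying ~25 minutes of mental reload per round:
print(f"{context_switch_tax_hours(20, 2, 25):.1f} hours/week")
```

That hypothetical team loses roughly 16.7 hours a week, close to half an engineer, to context reloads alone, and no tracker records any of it.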
The Abandonment Effect
There is a cost that rarely gets measured: work that starts but never finishes because the review process outlasted the author’s or team’s patience.
Wessel et al. (2020) studied pull requests in active open-source projects and found that PRs left open more than five days had a 35% abandonment rate. The code was written; the change was just never merged. On internal teams, outright abandonment is less common because social pressure is stronger, but delay increases revision churn, and churn increases the probability that a PR will be superseded by someone else’s work or quietly deprioritized when the underlying priority shifts.
LinearB’s engineering benchmarks from 2022 and 2023, drawn from over 2,000 engineering teams, show a median PR pickup time of 13.5 hours and a median total cycle time of 3.5 days. The top quartile of teams achieves pickup under one hour and total cycle under four hours. That is well over a 10x difference in cycle time between top-quartile and median teams, driven primarily by review latency, not by coding speed.
What Review Provides in Practice
None of this argues against review. Bacchelli and Bird’s 2013 study at Microsoft, drawing on surveys of 873 developers and analysis of over 200 code reviews, found that developers expected review to catch defects, but its most common actual outcomes were knowledge transfer and team awareness. Both are legitimate goals, but they suggest that a substantial portion of review overhead is being justified as defect prevention while delivering outcomes that are fundamentally about communication and team coherence.
That distinction matters for how review processes should be optimized. If the primary value is knowledge sharing and catching obvious mistakes, the process should be optimized for low latency and fast iteration, not for thoroughness at the expense of days. Zanaty et al. (2018) found empirical support for this: faster review cycles with fewer back-and-forth iterations produced better defect escape rates than slow, exhaustive ones.
There is also a saturation effect. Baum et al. (2016) found that as review load exceeded roughly 10 PRs per engineer per week, review thoroughness dropped by approximately 50%. Reviewers under high load start rubber-stamping rather than reading carefully. Adding required reviewers to compensate tends to make it worse, since GitHub’s Octoverse data shows PRs with more than two required reviewers take on average twice as long to merge.
The Structural Argument
Code review is the most-discussed case, but the pattern generalizes to every approval step: design reviews, architecture reviews, legal sign-off, security clearance, manager approval. Each is a queue. Each has utilization dynamics. Each layer compounds with every other layer in sequence.
The Reinertsen framework offers a useful lens: the question is not whether a review gate provides value, but whether that value exceeds the full systemic cost of the queue it creates. Not just the reviewer’s time, but the idle time of everything waiting upstream, the context-switch overhead, and the revision cycles triggered by the gate.
Most organizations treat approval steps as low-cost checkpoints because the only visible cost is the reviewer’s time. Queue wait, context-switching tax, rework overhead, and abandonment risk are all invisible unless someone is tracking cycle time end to end. When organizations do look at those numbers, they tend to be worse than intuition suggested.
The 10x claim from the apenwarr.ca piece sounds like a round number chosen for rhetorical effect. Given queue dynamics at normal reviewer utilization, compounding across multiple sequential layers, context-switching costs, and revision overhead, it is closer to a lower bound for organizations with deeply layered approval processes than to an exaggeration.