The Queue Tax: Why Approval Chains Multiply Rather Than Add

Avery Pennarun posted a short argument that each layer of review makes you 10x slower. The instinct is to push back on the number. Ten times? For one extra reviewer? But once you work through the mechanics, the figure stops seeming like hyperbole and starts looking like it might be conservative.

The core mistake most teams make is treating review latency as additive: if one reviewer takes two days on average, two sequential reviewers should take four. That framing misses almost everything interesting about how review actually behaves.

Review is a queue, and queues behave non-linearly

The relevant model here is the M/M/1 queue from queuing theory. In a simple single-server queue, average wait time is not proportional to arrival rate; it is proportional to utilization divided by one minus utilization. When a reviewer is at 50% utilization, the expected wait is about the length of one review, which is manageable. At 80%, the wait is four times that. At 90%, it is nine times. At 95%, it is nineteen times the 50% baseline. The relationship is a curve that bends sharply upward, not a straight line.
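To make the curve concrete, here is a minimal Python sketch of the M/M/1 relationship. It assumes the textbook result that mean time waiting in queue, measured in units of the mean service time, equals utilization divided by one minus utilization. The function name and the sample utilization values are illustrative and do not come from Pennarun’s post.

```python
def mm1_wait_in_service_times(utilization: float) -> float:
    """Mean time spent waiting in an M/M/1 queue, in units of mean service time.

    Textbook M/M/1 result: Wq = (1 / mu) * rho / (1 - rho), so dividing by the
    mean service time (1 / mu) leaves rho / (1 - rho).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return utilization / (1 - utilization)


if __name__ == "__main__":
    # Illustrative utilization levels for a single reviewer.
    for rho in (0.50, 0.80, 0.90, 0.95):
        wait = mm1_wait_in_service_times(rho)
        print(f"{rho:.0%} utilized -> wait is about {wait:.1f} x one review")
```

Evaluating the expression gives waits of roughly 1, 4, 9, and 19 service times at 50%, 80%, 90%, and 95% utilization. Real review queues are not exactly M/M/1 (arrivals are bursty and reviewers batch their work), so the numbers describe the shape of the curve, not a forecast.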

Most engineers who are good enough to be in a mandatory review chain are also busy. They have their own code to write, their own bugs to fix, their own meetings. A reviewer who spends 20% of their time reviewing and operates at 80% total capacity creates very different queue dynamics than one who has dedicated review time. The Accelerate research from DORA repeatedly finds that review wait time is one of the dominant contributors to lead time for changes, and that lead time is one of the four key metrics separating elite engineering organizations from average ones. Elite teams ship changes with lead times under an hour. Low performers measure lead times in weeks or months. The difference is almost entirely in wait states, not in actual coding time.

Add a second mandatory reviewer and you’ve doubled the number of queues your change must pass through in series. Even if both reviewers have identical utilization curves, the compounding effect on total wait time is multiplicative, not additive. If each reviewer adds an expected 1.5 days at their utilization level, two reviewers don’t add just 3 days. They add 3 days plus the expected cost of rework: whenever the first reviewer’s feedback requires changes, the second reviewer has to review the revised diff again after the changes land.

The rework multiplier

That rework cycle is where the math gets genuinely ugly. Suppose each review layer has a 30% probability of requesting at least one round of changes before approving, independently of the other layers. A single layer means you face a 30% chance of at least one extra cycle. Two layers means a 51% chance of at least one rework cycle across either reviewer. Three layers puts you at 66%. By the time you have four or five approval stages (which is not unusual in larger organizations with security review, architecture review, product sign-off, and manager approval), the probability of sailing through without a single rework round drops to about 24% with four stages and 17% with five.
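The percentages in this paragraph follow from one line of arithmetic, sketched here in Python. The sketch assumes every layer requests changes independently and with the same 30% probability, which is a simplification. The function name is hypothetical.

```python
def p_any_rework(layers: int, p_per_layer: float = 0.30) -> float:
    """Probability that at least one of `layers` independent reviews asks for changes."""
    if layers < 0:
        raise ValueError("layers must be non-negative")
    # The change passes cleanly only if every layer approves on the first try.
    p_clean_pass = (1 - p_per_layer) ** layers
    return 1 - p_clean_pass


if __name__ == "__main__":
    for n in range(1, 6):
        rework = p_any_rework(n)
        print(f"{n} layer(s): rework {rework:.0%}, clean pass {1 - rework:.0%}")
```

If reviewers’ objections are correlated (for example, several layers all flag the same missing test), the true probability of rework would be lower than this independent model suggests.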

Each rework cycle isn’t just the time to make the changes. It’s the time to make the changes, re-request review, wait in queue again (with the same utilization dynamics as before), have the reviewer re-read the diff to verify their previous requests were addressed, and potentially discover new issues uncovered by the first round of changes. Rework cycles often take as long as the initial review, sometimes longer because the context is more fragmented.
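One way to see how these cycles compound is a small expected-value sketch in Python. It rests on assumptions that go beyond the text: every review pass costs the same 1.5 days used as an example earlier, each pass independently requests changes with the 30% probability from this section, and any requested change sends the revised diff back to the first layer (as happens when stale approvals are dismissed on new commits). Under those assumptions, the expected number of review passes needed to clear k layers satisfies E_k = (E_{k-1} + 1) / (1 - p). The function and constant names are hypothetical.

```python
DAYS_PER_REVIEW = 1.5  # assumed cost of one pass through one reviewer's queue


def expected_review_passes(layers: int, p_changes: float = 0.30) -> float:
    """Expected review passes to clear `layers` approvals in a row.

    Assumed model: a change request at any layer restarts the chain at layer 1.
    To clear layer k you first clear k - 1 layers, then spend one more pass;
    with probability p_changes that pass fails and everything starts over:
        E_k = E_{k-1} + 1 + p_changes * E_k  =>  E_k = (E_{k-1} + 1) / (1 - p_changes)
    """
    if not 0 <= p_changes < 1:
        raise ValueError("p_changes must be in [0, 1)")
    expected = 0.0
    for _ in range(layers):
        expected = (expected + 1) / (1 - p_changes)
    return expected


if __name__ == "__main__":
    for n in range(1, 6):
        additive = n * DAYS_PER_REVIEW
        with_restarts = expected_review_passes(n) * DAYS_PER_REVIEW
        print(f"{n} layer(s): additive {additive:.1f} d, with restarts {with_restarts:.1f} d")
```

With these inputs, two layers cost about 5.2 days in expectation against the additive estimate of 3, and the gap widens with every added layer because the growth is geometric in the number of layers. A process where rework only repeats the layer that asked for it would grow more slowly, so this is closer to a worst case than a prediction.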

Context loss is not a soft cost

The queuing and rework math alone would get you to 10x. But there’s another factor that doesn’t show up in utilization curves: context loss over time. The Accelerate book and subsequent DORA research are explicit that long lead times are not just a productivity problem — they’re a quality problem. The longer a change sits in review, the more the codebase evolves underneath it. Merge conflicts accumulate. The author’s mental model of the surrounding code drifts from the actual state of the branch they’re about to merge into.

When a change finally gets approved after two weeks of review cycles, the author often has to do a non-trivial rebase, re-run tests, and re-verify assumptions that were valid two weeks ago but may not be now. That rebase might itself surface issues that require another review round. The branch age problem is self-reinforcing: slow review makes branches live longer, which makes merging harder, which increases the probability of problems being found late, which triggers more review rounds.

Google’s internal code review practices, which they’ve written about publicly, address this directly. Their guidance emphasizes small, focused changes reviewed in under 24 hours. Not because large changes are inherently bad, but because the queue dynamics for large, slow-moving changes are dramatically worse. A 50-line change reviewed in two hours has almost no branch age problem and minimal rework risk. A 500-line change sitting in review for a week has compounding problems on every dimension.

The bystander effect in review queues

There’s also an organizational psychology problem that the queuing model doesn’t fully capture. When multiple people are assigned to review the same change, each person’s sense of urgency drops. This is the diffusion of responsibility effect, the same mechanism that makes bystanders less likely to help in emergencies when more witnesses are present. If three people are on a review, each one rationally assumes one of the others will get to it first, that their contribution might be redundant, that their time is better spent elsewhere for now. The result is that adding reviewers often increases wait time rather than decreasing it, the opposite of the intended effect.

This partially explains why the 10x figure per layer isn’t crazy. Each layer doesn’t just add its own wait time. It adds wait time inflated by bystander dynamics, multiplied by rework probability, compounded by context drift, and stacked on top of all the previous layers’ contributions.

What actually reduces risk instead

The reason review layers proliferate isn’t malice. It’s that they genuinely do catch problems, and each individual decision to add a review stage is locally rational even if the global effect is harmful. Security review catches security bugs. Architecture review catches design problems. The error in reasoning is treating review as the only mechanism for quality, rather than one mechanism among several.

The alternative toolkit is well understood at this point. Trunk-based development with feature flags decouples deployment from release, which reduces the perceived stakes of each change and makes it easier to revert quickly if something goes wrong. Automated test coverage narrows the surface area that humans need to review manually. Pair programming replaces asynchronous review with synchronous collaboration, eliminating queue wait time entirely at the cost of requiring both people’s time at once. The ship/show/ask model described by Rouan Wilsenach categorizes changes by their risk profile and routes them to the appropriate level of scrutiny, rather than applying the same heavyweight process to everything.

None of these eliminate human judgment from the process. They concentrate that judgment where it’s most valuable and minimize the time changes spend in queue.

Approval as organizational cover

The harder problem to solve is that many review layers exist not primarily to catch bugs but to distribute blame. If a change passes through five approvals before breaking production, responsibility is diffuse. If one person ships it directly, responsibility is clear. Multi-stage approval processes offer psychological safety to the organization at the cost of velocity.

This is worth naming directly because it means the solution isn’t purely technical. You can instrument everything, show the queue depth data, demonstrate the compounding latency with Little’s Law, and still not change the review process if the underlying incentive is blame distribution rather than quality. The engineering argument is easy to make. The organizational argument is harder.
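For reference, the Little’s Law calculation is short enough to sketch in Python. Little’s Law says the average number of items in a stable system equals the arrival rate times the average time each item spends in it, so average time in review is the number of open changes divided by the rate at which changes are completed. The function name and the numbers in the example are made up for illustration.

```python
def average_days_in_review(open_changes: float, merged_per_day: float) -> float:
    """Little's Law rearranged: W = L / lambda.

    open_changes   -- average number of changes sitting in review (L)
    merged_per_day -- long-run completion rate, which equals the arrival
                      rate in a stable system (lambda)
    """
    if merged_per_day <= 0:
        raise ValueError("merged_per_day must be positive")
    return open_changes / merged_per_day


if __name__ == "__main__":
    # Hypothetical team: 40 changes open on average, 8 merged per day.
    print(average_days_in_review(open_changes=40, merged_per_day=8))  # 5.0
```

The law only holds for long-run averages in a roughly stable system, so it is best applied to a window of weeks, not a single busy day.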

Pennarun’s 10x figure will strike some people as provocative. The mechanisms behind it are not.