
The Compounding Cost of Review: Why Approval Chains Don't Add Up, They Multiply

Source: hackernews

Avery Pennarun’s recent post makes a claim that reads as hyperbole until you sit with the math: each layer of review doesn’t slow a team down by a fixed amount; it multiplies the slowdown that is already there. The title says 10x. The actual number depends on your organization, but the direction is always the same.

Pennarun runs Tailscale and before that spent years at Google, so he’s writing from inside two very different review cultures. That vantage point matters. The argument isn’t that review is useless. It’s that review is expensive in ways that don’t show up in the obvious place, and that organizations systematically undercount those costs while overcounting the benefits.

Why the Cost Compounds

The intuitive model of review overhead is additive. You have a change. It takes two days to write. It waits one day for review A. One day for review B. Total: four days. Review added two days, doubling the elapsed time. That model is wrong, and the error compounds as you add layers.

A better model comes from queuing theory, starting with Little’s Law: the average number of items in a queuing system equals the arrival rate multiplied by the average time an item spends in the system. Turned around, average cycle time equals work in progress divided by throughput, so when changes pile up in front of reviewers, cycle time rises even if nobody works any slower. In practice, cycle time is determined less by how long work takes than by how long work waits. And waiting time does not grow linearly with load.
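As a rough illustration with made-up numbers: if a team opens ten changes a day and each one spends four days between first push and merge, Little’s Law says about forty changes are in flight at any moment. This Python sketch only encodes L = λW and its rearrangement; the function names and figures are hypothetical.

```python
def items_in_system(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: average items in the system, L = lambda * W.

    arrival_rate and time_in_system must use the same time unit.
    """
    return arrival_rate * time_in_system


def cycle_time(items: float, arrival_rate: float) -> float:
    """Rearranged Little's Law: W = L / lambda (cycle time from work in progress)."""
    return items / arrival_rate


# Hypothetical team: 10 changes opened per day, 4 days from push to merge.
in_flight = items_in_system(arrival_rate=10, time_in_system=4)
print(in_flight)
print(cycle_time(items=in_flight, arrival_rate=10))
```

Read the second call the other way: if work in progress climbs while throughput stays flat, average cycle time has to climb with it.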

Each review stage is a separate queue, and each queue has its own utilization: the fraction of time its reviewer is busy. As utilization approaches 100%, wait time grows without bound, and it does so non-linearly. A reviewer who is 80% utilized introduces far more than twice the waiting time of a reviewer who is 40% utilized (six times as much, under the model below). For an M/M/1 queue (random arrivals, random service times, a single server), the mean wait before service starts is wait = service_time * utilization / (1 - utilization); real reviewers only approximate this model. At 50% utilization you wait one service time. At 90% you wait nine service times. At 95% you wait nineteen.
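The formula is easy to tabulate. This Python sketch assumes the M/M/1 model and expresses the wait in multiples of the service time; mm1_wait is a name chosen for illustration.

```python
def mm1_wait(service_time: float, utilization: float) -> float:
    """Mean time a job waits in an M/M/1 queue before service starts.

    W_q = S * rho / (1 - rho), defined only for 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time * utilization / (1 - utilization)


# Wait expressed in multiples of the service time (S = 1).
for rho in (0.4, 0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} utilized: wait = {mm1_wait(1.0, rho):.2f} x service time")
```

The loop prints waits of roughly 0.67, 1, 4, 9, and 19 service times for 40%, 50%, 80%, 90%, and 95% utilization. Doubling utilization from 40% to 80% multiplies the wait by six.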

Most senior engineers who do design review, security review, or architecture review are not lightly loaded. They are the people everyone wants to talk to. Their queues are deep. So a change that requires two of these reviewers doesn’t pay a fixed toll at each stop. The waits add, but each one is already a large multiple of the actual review time because each queue is near saturation, and every round of comments sends the change back through the same queues to pay those multiples again.
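A minimal sketch of that arithmetic, assuming each review stage behaves like an independent M/M/1 queue in series: the mean time spent at one stage, waiting plus being reviewed, is S / (1 - ρ), and the mean time through the chain is the sum over stages. The two-reviewer numbers and the rounds parameter (a crude stand-in for comment-and-resubmit cycles) are hypothetical.

```python
def mm1_sojourn(service_time: float, utilization: float) -> float:
    """Mean time at one M/M/1 stage, waiting plus service: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


def chain_elapsed(stages: list[tuple[float, float]], rounds: int = 1) -> float:
    """Mean elapsed time through review stages in series.

    stages: (service_time, utilization) pairs, one per reviewer.
    rounds: hypothetical number of full passes through the chain,
    e.g. 2 if review comments force one resubmission.
    """
    one_pass = sum(mm1_sojourn(s, rho) for s, rho in stages)
    return rounds * one_pass


# Hypothetical: two reviewers, one hour of actual review each, both 90% busy.
stages = [(1.0, 0.9), (1.0, 0.9)]
work = sum(s for s, _ in stages)
print(chain_elapsed(stages) / work)            # elapsed hours per hour of review work
print(chain_elapsed(stages, rounds=2) / work)  # same, with one resubmission
```

With both reviewers at 90% utilization, the change spends roughly ten hours of elapsed time per hour of review work, and roughly twenty once it needs a second pass. In this model the utilization of each queue, not the count of stages, sets the multiplier.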

The 10x number starts looking conservative.

The Context Restoration Tax

Queuing theory gives you the wait time, but there’s a second cost that doesn’t appear in it: the cost of restoring context.

When a reviewer finally picks up a change, they need to understand it. That takes time that grows with the complexity of the change and shrinks the more familiar they are with the surrounding code. For a domain expert reviewing their own team’s code, that cost is low. For a security team reviewing code across dozens of services, or a legal team reviewing API behavior against a data processing agreement, the cost is high and often uncounted.

The author of the change also pays a context restoration tax. Changes that sit in review for three days are changes the author no longer has fully loaded in working memory. When the review comes back with comments, they have to reconstruct the problem they were solving, the tradeoffs they considered, and why they made the choices they did. The longer the review takes, the higher this cost.

Large batches of changes make this worse. Teams that accumulate review debt tend to send larger and larger changes to review, because smaller changes feel inefficient when the per-review transaction cost is high. Larger changes are harder to review, take longer, generate more comments, require more context restoration, and push the next change to be larger still. This is a self-reinforcing loop that runs in the wrong direction.

The Accountability Diffusion Problem

Organizations add review layers in response to incidents. Something breaks in production. The postmortem identifies a gap in oversight. A new review requirement appears. Over time, these accumulate. The organization feels safer because more eyes are on each change.

What actually happens is more subtle. When multiple people must approve something before it ships, individual accountability for the outcome is diluted across all of them. Diffusion of responsibility is well-documented in social psychology. In software organizations it tends to produce review that is thorough in appearance and shallow in practice: reviewers approve changes they haven’t fully understood, because the social cost of blocking a change is concrete and immediate while the risk of a future incident is abstract and shared.

This means the review theater that accumulates in large organizations doesn’t just add latency. It also provides less safety than advertised, because the incentive structure of multi-approver systems rewards approval over scrutiny.

What High-Trust Teams Do Instead

The alternative isn’t no review. It’s review that is fast, targeted, and accompanied by structural safeguards that don’t live in the review process itself.

Continuous integration gates. Automated tests, linters, type checkers, and fuzz targets catch whole categories of bugs faster and more reliably than human review. A CI pipeline that runs in four minutes and blocks on failure is a review layer that adds four minutes of latency. Unlike a human reviewer, it doesn’t build up a deep queue, because you can add runners as load grows. Teams that invest in fast, comprehensive CI can trade human review time for automated verification time at very favorable rates.
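A CI gate can be as small as a script that runs each check in order and fails on the first nonzero exit. This sketch assumes Python 3.9+; run_gate is a hypothetical helper, and the commands you pass it should be whatever tests, linters, and type checkers your project actually uses.

```python
import subprocess
import sys


def run_gate(checks: dict[str, list[str]]) -> bool:
    """Run each check command in order; return False on the first failure.

    checks maps a human-readable name to an argv list. The commands are
    placeholders for your project's real tests, linters, and type checkers.
    """
    for name, argv in checks.items():
        try:
            result = subprocess.run(argv)
        except FileNotFoundError:
            # A missing tool counts as a failed check, not a pass.
            print(f"gate failed: {name}: command not found: {argv[0]}")
            return False
        if result.returncode != 0:
            print(f"gate failed: {name} exited with {result.returncode}")
            return False
        print(f"gate passed: {name}")
    return True


# Placeholder check that always passes; swap in real commands.
demo_checks = {"smoke test": [sys.executable, "-c", "pass"]}
print(run_gate(demo_checks))
```

A pipeline step would call run_gate with its real commands and exit nonzero when it returns False, for example sys.exit(0 if run_gate(checks) else 1), so the merge is blocked without a human in the loop.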

Trunk-based development with feature flags. Keeping changes small, merging to trunk frequently, and gating incomplete features behind runtime flags reduces the blast radius of any individual change. When the cost of a bad merge is low, the required rigor of pre-merge review is also lower. Google’s internal practices, described in the book Software Engineering at Google, push in this direction: small changes, fast review, fast merge, with automated verification doing most of the load-bearing work.
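The mechanics of a runtime flag are simple. This sketch assumes an in-memory flag store; FlagStore, checkout_total, and the new_discount_logic flag are hypothetical names, and a real system would read flags from configuration or a flag service. Flags default to off, so unfinished code can merge to trunk and stay dark.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store; unknown flags read as off."""
    enabled: set[str] = field(default_factory=set)

    def is_on(self, name: str) -> bool:
        return name in self.enabled

    def turn_on(self, name: str) -> None:
        self.enabled.add(name)

    def turn_off(self, name: str) -> None:
        self.enabled.discard(name)


def checkout_total(prices: list[float], flags: FlagStore) -> float:
    # Hypothetical half-finished feature, merged to trunk behind a flag.
    if flags.is_on("new_discount_logic"):
        return round(sum(prices) * 0.9, 2)
    # Existing behavior stays the default until the flag is turned on.
    return sum(prices)


flags = FlagStore()
print(checkout_total([10.0, 20.0], flags))
flags.turn_on("new_discount_logic")
print(checkout_total([10.0, 20.0], flags))
```

Turning the flag off again is the rollback: the old code path never left trunk, so undoing the change needs no new merge and no new review.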

Post-deployment monitoring with fast rollback. A change that can be rolled back in thirty seconds if metrics degrade has a different risk profile from a change that cannot. Investing in observability and deployment infrastructure changes the acceptable risk threshold for pre-deployment review. You can ship with lower confidence in review outcomes if your production feedback loop is fast enough.
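At the center of "roll back if metrics degrade" is a decision rule. This sketch assumes error rates sampled before and after a deploy and uses an arbitrary multiplicative tolerance; should_roll_back and the 1.5 threshold are hypothetical, and a production system would use more signals and a proper statistical test.

```python
from statistics import mean


def should_roll_back(baseline_error_rates: list[float],
                     post_deploy_error_rates: list[float],
                     tolerance: float = 1.5) -> bool:
    """Return True if the mean post-deploy error rate exceeds the
    baseline mean by more than the given factor (a hypothetical rule)."""
    before = mean(baseline_error_rates)
    after = mean(post_deploy_error_rates)
    if before == 0:
        # With a clean baseline, any new errors count as degradation.
        return after > 0
    return after > before * tolerance


print(should_roll_back([0.01, 0.01], [0.01, 0.012]))  # small wobble
print(should_roll_back([0.01, 0.01], [0.05, 0.04]))   # clear regression
```

This function only makes the decision; the deployment system would run it on a schedule after each release and trigger its own rollback mechanism when it returns True.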

Pair programming and design review upstream. The most effective review happens before code is written, not after. A thirty-minute design conversation between two engineers eliminates entire categories of implementation mistakes. Pair programming catches errors in real time, without the queue, without the context restoration, and without the accountability diffusion of asynchronous review. These are more expensive in scheduled time but cheaper in cycle time.

Why Organizations Don’t Fix This

The mismatch between the felt experience of review and the actual cost is hard to close because the costs are distributed and invisible while the benefits are concentrated and visible.

When a reviewer catches a bug, everyone sees it. It’s concrete evidence that the review process worked. When a change takes two extra weeks to ship because three reviewers are backed up, nobody writes that down as a review cost. It shows up as a vague sense that the team is slow, attributed to unclear causes, and addressed with hiring or process improvements that don’t touch the review structure.

Pennarun’s argument, restated in the terms above, is that organizations are systematically miscounting their own costs because the accounting categories don’t capture queue wait time, context restoration tax, or accountability diffusion. They measure bugs caught per review, not cycle time per change, and they optimize accordingly.
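Measuring cycle time per change needs nothing more than two timestamps per change. This sketch assumes you can export (opened, merged) pairs from your review tool; the helper names and the three-change history are hypothetical.

```python
from datetime import datetime
from statistics import median


def cycle_time_hours(opened: datetime, merged: datetime) -> float:
    """Wall-clock hours from opening a change to merging it."""
    return (merged - opened).total_seconds() / 3600


def summarize(changes: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """Median and worst-case cycle time across (opened, merged) pairs."""
    hours = [cycle_time_hours(opened, merged) for opened, merged in changes]
    return {"median_hours": median(hours), "max_hours": max(hours)}


# Hypothetical change history, as if exported from a code-review tool.
changes = [
    (datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 15)),   # 6 hours
    (datetime(2024, 5, 1, 10), datetime(2024, 5, 3, 10)),  # 48 hours
    (datetime(2024, 5, 2, 9), datetime(2024, 5, 2, 21)),   # 12 hours
]
print(summarize(changes))
```

For this sample the median is 12 hours and the worst case 48. Tracked over time, those two numbers show whether review latency is growing, which a count of bugs caught per review cannot.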

The Structural Intervention

The practical implication, if you accept the argument, is that reducing review layers requires changing the structural conditions that create them, not just deciding to do fewer reviews.

That means investing in automated verification until you trust it enough to replace human sign-off on routine changes. It means building deployment infrastructure that makes rollback fast and cheap. It means creating accountability structures that don’t rely on pre-approval, so that post-incident reviews don’t produce new mandatory gatekeepers.

It also means being honest about what review is actually buying. If the real function of a legal or security review layer is to distribute accountability across multiple people so that no one person can be blamed for a bad outcome, that’s an organizational problem rather than a technical one, and it has different solutions.

Pennarun’s framing is blunt: every layer of review makes you 10x slower. The math behind that claim is real. The difficulty is that the people adding each layer have local incentives that make the addition look rational, even as the accumulation becomes irrational at the system level. That’s the harder problem, and the one that no amount of CI investment or deployment tooling fully solves on its own.
