
The Approval Ratchet: Why Review Requirements Only Ever Grow

Source: hackernews

Avery Pennarun’s recent post makes a claim that most engineers feel in their bones but rarely quantify: each layer of review in a software process doesn’t add to your cycle time, it multiplies it. The title says 10x per layer. That number comes directly from queuing theory, and once you see the math, the organizational pattern that follows is hard to ignore.

The queue you don’t think about

A review step is a queue. You submit a pull request, and it sits in a waiting state until someone picks it up. The reviewer is a server in the queuing sense: they have a service rate, and your PR is waiting for capacity.

For a basic single-server queue with Poisson arrivals and exponentially distributed service times (the M/M/1 model used widely in operations research), the expected total time a job spends in the system is:

W = 1 / (μ - λ)

Where μ is the reviewer’s service rate (reviews per day they can complete) and λ is the arrival rate (PRs per day coming in). The ratio ρ = λ/μ is utilization: the fraction of the reviewer’s capacity being consumed.

Rearranged in terms of utilization:

W = (1/μ) × 1/(1 - ρ)

The first factor is the actual review work time. The second factor is the queue multiplier. At 50% utilization, total time in the system is 2x the bare work time. At 80%, it’s 5x. At 90%, it’s 10x. At 95%, it’s 20x. The curve is nonlinear and it becomes steep fast.
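The two formulas are small enough to sketch directly; a minimal version (the function names are mine, not from the post):

```python
def time_in_system(service_rate: float, arrival_rate: float) -> float:
    """Expected total time a job spends in an M/M/1 system: W = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue is unstable when lambda >= mu"
    return 1.0 / (service_rate - arrival_rate)

def queue_multiplier(utilization: float) -> float:
    """Ratio of total time in system to bare service time: 1 / (1 - rho)."""
    assert 0 <= utilization < 1, "utilization must be in [0, 1)"
    return 1.0 / (1.0 - utilization)

for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.0%}: total time = {queue_multiplier(rho):.0f}x the review work")
```

Running it reproduces the 2x / 5x / 10x / 20x progression above.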

Senior engineers are almost always the reviewers that matter, because junior engineers can’t approve the things that require judgment. Senior engineers are also almost always heavily loaded. It is not unusual for a senior engineer on a medium-sized team to be running at 85-90% utilization across their combined review, meeting, and implementation work. Put them at 90% utilization as a reviewer and every PR they touch spends 10x longer in queue than the review itself takes.

Compounding across layers

This is where the “10x per layer” framing earns its precision. If your process requires two sequential review steps, and each reviewer is at 90% utilization, the multipliers compose: 10x from step one, then another 10x from step two. The result is 100x slower, not 20x.

Three review layers at 90% utilization: 1000x. That is not a dramatic rhetorical figure. That is the formula applied to realistic team conditions.
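The composition is just a product of per-stage multipliers; a sketch, assuming sequential independent stages:

```python
from math import prod

def composed_multiplier(utilizations):
    """Sequential review stages multiply: each contributes 1 / (1 - rho_i)."""
    return prod(1.0 / (1.0 - rho) for rho in utilizations)

print(composed_multiplier([0.9, 0.9]))        # two layers at 90%: ~100x
print(composed_multiplier([0.9, 0.9, 0.9]))   # three layers at 90%: ~1000x
```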

Little’s Law, proven by John D.C. Little in 1961, gives a complementary view. For any stable queue:

L = λ × W

Average number of items in the system equals arrival rate times average time in system. If a team is merging 10 PRs per day and average end-to-end cycle time is 5 days (counting review wait), the team has 50 PRs in flight at any given moment. That number matters beyond just the calendar time. Fifty concurrent PRs means 50 sets of merge conflicts to resolve, 50 contexts to reload when finally addressing review feedback, and 50 opportunities for changes to invalidate each other.
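Little's Law in code, using the numbers above:

```python
def items_in_flight(arrival_rate: float, avg_time_in_system: float) -> float:
    """Little's Law: L = lambda * W, for any stable queue."""
    return arrival_rate * avg_time_in_system

# 10 PRs merged per day, 5-day average cycle time -> 50 PRs in flight
print(items_in_flight(10, 5))
```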

What the DORA research found

The DORA research program, which has tracked software delivery performance across thousands of teams annually since 2014, consistently finds that lead time for changes is the metric that separates elite teams from low performers most sharply. Elite teams measure lead time in hours. Low-performing teams measure it in weeks or months.

Lead time is dominated by wait time in approval queues, not by implementation time. The code itself rarely takes that long to write. It is the journey through mandatory review stages that expands calendar time.

The DORA research identifies “streamlining change approval” as one of the technical capabilities most strongly correlated with software delivery performance. Specifically: replacing sequential manual approval gates with peer review plus automated testing is associated with substantially better performance on all four core DORA metrics. Teams that made this shift did not experience more production incidents. Their change failure rates were comparable or lower, because faster feedback loops caught problems sooner and revert was cheaper.

Why review requirements accumulate

If review layers are this expensive, why do they persist and grow? The mechanism is a ratchet, and it operates on a simple asymmetry between how benefits and costs are perceived.

Every review requirement has a founding incident. The DBA review was added after a production migration corrupted data. The security review was added after a vulnerability was found in prod. The API compatibility review was added after a breaking change took down a partner integration. In each case, the benefits of the new review step are vivid, specific, and tied to a real event that people remember.

The costs are statistical and invisible. When the proposal to add a DBA review is discussed, no one runs the queuing numbers. The question on the table is whether DBA review catches schema mistakes, and the answer is yes. The question not on the table is what happens to average PR cycle time when you insert a queue at this stage backed by a reviewer who is already at 80% capacity. Even if someone raised that question, the answer would require data the team doesn’t have: actual reviewer utilization, actual PR arrival rate, actual service time per review.

So review requirements accumulate after incidents and almost never get removed. Removal requires someone to argue that the review is not worth its cost, which means arguing against a visible safety mechanism, which is politically difficult. Addition requires only that someone argue the review would have caught the recent incident, which is nearly always true.

The ratchet turns one notch at a time, each click individually defensible, each click adding to a compounding multiplier on team throughput.

What high-performing teams do instead

The teams with fast cycle times don’t skip quality controls. They move the control mechanism earlier in the process and replace human queue-points with things that run at machine speed.

Automated gates have zero queue wait time. Static analysis, type checking, test coverage thresholds, dependency audits, schema migration validators, and security scanners all run in CI. They don’t have utilization; they run immediately. The review step they replace had a human reviewer at 85% utilization, which in queuing terms means roughly 7x overhead on the review work time. The automated replacement eliminates that overhead entirely.

Pair programming is worth considering in this framing. A pair produces code that has already been reviewed, in real time, by someone who understands the context. There is no review queue because the reviewer was present during implementation. Laurie Williams’ research on pair programming found that pairs are roughly 15% slower at producing code but generate substantially fewer defects. When you factor in the cost of bugs found after merge, the defect reduction more than compensates for the speed difference. And it entirely removes the queue.
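The break-even arithmetic can be sketched. The 15% slowdown is from Williams’ research as cited above, but the defect counts and fix costs below are hypothetical placeholders, not measured data:

```python
# Illustrative break-even sketch. Only the ~15% slowdown comes from the
# cited research; every other number here is a hypothetical placeholder.
solo_hours_per_feature = 40
pair_hours_per_feature = solo_hours_per_feature * 1.15  # ~15% slower

# Hypothetical: solo work escapes 4 post-merge defects per feature, pairs
# escape 1, and each escaped defect costs 8 engineer-hours to fix.
escaped_defect_cost = 8
solo_total = solo_hours_per_feature + 4 * escaped_defect_cost  # 40 + 32 = 72
pair_total = pair_hours_per_feature + 1 * escaped_defect_cost  # roughly 54

print(f"solo: {solo_total}h, pair: {pair_total:.0f}h")
```

Under these placeholder numbers the pair comes out ahead even before counting the eliminated review queue; the point of the sketch is the shape of the tradeoff, not the specific values.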

Trunk-based development with feature flags changes the risk calculus for review. When you can toggle a feature off instantly in production, the consequence of a bad merge is recoverable in minutes. Teams operating this way can afford lighter pre-merge review because the blast radius of any individual change is bounded. The heavy review requirement exists partly because merging to main feels irreversible; feature flags reduce that irreversibility directly.

Post-merge review is used by some teams for low-risk paths: the change goes to production, monitoring and automated rollback act as the safety net, and a reviewer looks at the code afterward. The reviewer still catches problems; they just don’t block the merge. This is not appropriate for all domains, but for product software with good observability, it removes the queue entirely for the code paths where the cost of a bug is recoverable.

The domains where this doesn’t apply

Medical device firmware, aviation flight control software, financial settlement systems, and cryptographic infrastructure have formal review requirements because the cost of a single error is catastrophic and largely unrecoverable. Audit trails, segregation of duties, and multi-party sign-off are not optional in those contexts. The queuing penalty is worth it.

The mistake is treating regulated, safety-critical domains as the template for all software development. Most product software does not have those properties. A bad config change at a consumer app is recoverable in minutes with a revert; a certification error in a pacemaker is not comparable. Organizations that apply safety-critical review norms to product software are paying safety-critical throughput costs for product-software risk levels.

Making the cost visible

The practical first step is measurement. Most teams know intuitively that PRs “take a while,” but they don’t track median and p95 cycle time broken out by stage: time in implementation, time waiting for review, time in review, time waiting for second review, time waiting for merge. Without that breakdown, it is impossible to locate where the queue is forming or to have a productive conversation about whether a specific review stage is earning its cost.
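A sketch of such a per-stage breakdown, assuming stage durations can be derived from PR event timestamps (the field names and numbers here are hypothetical):

```python
from statistics import median, quantiles

# Hypothetical per-PR stage durations in hours, e.g. computed from PR event
# timestamps (opened, review_requested, approved, merged).
prs = [
    {"implementation": 6, "await_review": 30, "in_review": 2, "await_merge": 10},
    {"implementation": 4, "await_review": 70, "in_review": 1, "await_merge": 3},
    {"implementation": 9, "await_review": 12, "in_review": 3, "await_merge": 20},
    {"implementation": 5, "await_review": 95, "in_review": 2, "await_merge": 6},
]

for stage in prs[0]:
    durations = sorted(pr[stage] for pr in prs)
    p95 = quantiles(durations, n=20)[-1]  # 95th percentile (interpolated)
    print(f"{stage:>15}: median {median(durations):5.1f}h, p95 {p95:5.1f}h")
```

Even on toy data the shape is clear: the waiting stages dominate, which is exactly the breakdown a team needs before debating any individual review gate.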

The DORA framework exists to make this visible. Lead time for changes, measured end-to-end from commit to deploy, aggregates all the queue penalties into a single number. Once a team can see that 70% of their lead time is spent in review wait rather than review work, the conversation about whether each stage is worth its queuing tax becomes grounded in data rather than competing intuitions about risk.

Pennarun’s 10x framing is the directional claim that gets the conversation started. The queuing model behind it is what gives it teeth.
