
The Org Chart Encoded in Your PR Approval Requirements

Source: hackernews

The argument in Avery Pennarun’s recent post is simple enough to state: every layer of review multiplies your delivery latency by roughly 10x. Two layers means 100x slower. Three layers means 1000x. Most engineers nod at this and move on. The ones who stop to work through the math tend to come away more concerned than they expected.

The multiplier is not a metaphor. It follows directly from queueing theory, specifically the behavior of sequential queues under realistic utilization.

The Queueing Math Behind the Claim

For a single-server queue (the M/M/1 model in queueing theory), average wait in queue is Wq = ρ / (μ(1 − ρ)), where ρ is utilization and μ is the service rate; equivalently, wait equals service time multiplied by ρ/(1 − ρ). At 90% utilization, average wait is nine times the service time. At 95%, it is nineteen times. At 50%, wait and service time are equal.
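A minimal sketch of that arithmetic, using the same two-hour review from the example below (the M/M/1 model assumes Poisson arrivals and exponential service times):

```python
# M/M/1 average wait in queue: Wq = rho / (mu * (1 - rho)),
# or equivalently Wq = service_time * rho / (1 - rho).
def mm1_wait(service_time_hours: float, utilization: float) -> float:
    """Expected time a request sits in queue before service begins."""
    assert 0.0 <= utilization < 1.0, "queue is unstable at utilization >= 1"
    return service_time_hours * utilization / (1.0 - utilization)

for rho in (0.50, 0.90, 0.95):
    print(f"utilization {rho:.0%}: a 2h review waits ~{mm1_wait(2.0, rho):.0f}h in queue")
```

At 50% utilization the two-hour review waits two hours; at 90% it waits eighteen; at 95%, thirty-eight.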

This is why the phrase “a review only takes two hours” is almost always misleading. A reviewer running at 90% utilization will, on average, return that two-hour review after an 18-hour wait. String two such reviewers together in sequence, and the expected elapsed time compounds. The Kingman formula, which generalizes this to queues with variable arrival and service times, makes the picture worse: variability in either arrival rate or service time multiplies wait time further on top of the utilization effect.
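Kingman's approximation (often written in its "VUT" form: Variability × Utilization × Time) makes the variability penalty explicit. A sketch, where ca2 and cs2 are the squared coefficients of variation of interarrival and service times:

```python
def kingman_wait(service_time: float, utilization: float,
                 ca2: float, cs2: float) -> float:
    """Kingman's G/G/1 approximation: Wait ~ Variability x Utilization x Time."""
    variability = (ca2 + cs2) / 2.0
    return variability * (utilization / (1.0 - utilization)) * service_time

# With ca2 = cs2 = 1 this reduces to the M/M/1 result: 18h for a 2h review at 90%.
print(kingman_wait(2.0, 0.9, ca2=1.0, cs2=1.0))
# Bursty submissions (ca2 = 4, e.g. everyone pushing before a deadline)
# multiply the wait at identical utilization.
print(kingman_wait(2.0, 0.9, ca2=4.0, cs2=1.0))
```

Same reviewer, same load: bursty arrivals alone take the expected wait from 18 hours to 45.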

The compound effect of serialized queues is not a consequence of dysfunction or bottlenecks. It is the default behavior of sequential queues under realistic load. Teams that are surprised by how long PRs take are usually measuring service time (how long review takes when someone is actively reviewing) rather than lead time (how long the PR sits from submission to merge). Those two numbers can differ by an order of magnitude.

Little’s Law ties this together: the average number of items in a queue equals throughput multiplied by average time in the system. If your security reviewer processes five PRs per day and each sits in queue for an average of two days, you have ten PRs in flight through that one gate at any given time. Add another gate and you have multiplied the in-flight inventory and the average lead time again.
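The same numbers as code, assuming for the second line that an added gate has the same throughput and wait as the first:

```python
def in_flight(throughput_per_day: float, avg_days_in_system: float) -> float:
    """Little's Law: L = lambda * W (items in system = throughput x time in system)."""
    return throughput_per_day * avg_days_in_system

print(in_flight(5, 2))        # the security gate from the text: 10 PRs in flight
print(in_flight(5, 2) * 2)    # a second identical gate: 20 PRs in flight, double the lead time
```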

Why Each New Layer Looks Locally Rational

The insidious part of adding review layers is that each decision is locally justified. A security review requirement gets added after a breach. A staff engineer sign-off gets added after a bad architecture decision ships to production. A compliance check gets added after a legal complaint. Each one makes sense at the moment it is added. None of the people who added them were asked to account for the compound effect on every future change that would pass through that gate.

This is the same dynamic that produces distributed systems with long serialized call chains. Each service team optimizes for their own service’s latency in isolation. Nobody owns the end-to-end p99. By the time the system is in production, you have six or seven serialized network hops, each performing fine in isolation, and a user-visible latency that would embarrass anyone who traced it end to end.

The incentive structure is asymmetric in both cases. The person who adds a review requirement gets credit if that review catches a problem. The latency cost is diffused across hundreds of future PRs, absorbed by the team as a whole, and almost never attributed to the approval requirement that created it. The person who added the gate has no feedback mechanism telling them what their decision cost.

The Org Chart Hidden in Your Review Checklist

Pennarun’s post is ultimately about organizational structure as much as process. Review layers are org chart nodes in disguise. When a PR requires sign-off from a security team, a platform team, a staff engineer in your organization, and a manager, you have four bureaucratic nodes in sequence, each with its own queue depth and its own competing priorities.

This structure encodes a theory of trust, or more precisely, a theory of mistrust. Organizations add review layers when they do not trust the people doing the work to make sound decisions unilaterally. That mistrust can be earned, for genuinely new engineers or genuinely high-stakes changes, but it can also become structural: a culture where sign-off serves as accountability distribution rather than quality control. This is approval theater, where the goal is to have someone else's name on the decision rather than to improve the decision.

The problem with approval theater is that it is indistinguishable from genuine oversight at the moment the layer is added. A manager adding a new approval requirement cannot know whether they are adding necessary oversight or multiplying overhead onto every subsequent change through that path. They are almost never asked to find out.

What Trunk-Based Development Actually Changes

The standard remedy proposed in conversations about review latency is trunk-based development with feature flags. It is worth being specific about why this helps, because “use feature flags” often gets stated as a practice without explaining the mechanism.

Trunk-based development shortens the review unit. Instead of accumulating a week of work into a single PR with broad scope, you review individual commits, each small enough to evaluate in minutes rather than hours. Smaller review units reduce per-review service time, which reduces reviewer utilization for a given throughput, which improves queue dynamics significantly. A reviewer at 70% utilization has dramatically shorter expected wait times than a reviewer at 90% utilization, even if their active review rate is the same.
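The mechanism can be sketched with the same M/M/1 arithmetic. The split sizes below are illustrative, and the sketch assumes review effort grows sublinearly with change size (twelve small commits cost less total review time than three large PRs covering the same work):

```python
def wait_hours(service_h: float, rho: float) -> float:
    return service_h * rho / (1.0 - rho)   # M/M/1 queue wait

def reviewer(reviews_per_day: float, hours_each: float, workday: float = 8.0):
    rho = reviews_per_day * hours_each / workday   # utilization = offered load
    return rho, wait_hours(hours_each, rho)

# Three large PRs a day at 2h each: 75% utilized, ~6h queue wait per PR.
print(reviewer(3, 2.0))
# The same work split into twelve small commits at ~24min each:
# 60% utilized, ~36min queue wait per commit.
print(reviewer(12, 0.4))
```

Both effects compound: shorter service time shrinks the wait directly, and the lower utilization shrinks the ρ/(1 − ρ) multiplier on top of it.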

Feature flags change the risk calculus of each review. A change that sits behind a flag does not need the same level of pre-merge certainty because the cost of a wrong decision is lower. You can disable the behavior in seconds without a revert and redeploy cycle. When rollback is cheap, the trust required to approve a change drops, which means the review can be lighter, which means turnaround is faster.
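A minimal sketch of that mechanism; the flag store and function names here are hypothetical, and a real system would read flags from a config service that can be flipped at runtime:

```python
# Hypothetical in-process flag store; production flags live in a config service.
FLAGS = {"new_checkout_flow": False}

def legacy_checkout(cart: list) -> str:
    return f"legacy checkout, {len(cart)} items"

def new_checkout(cart: list) -> str:
    return f"new checkout, {len(cart)} items"

def checkout(cart: list) -> str:
    # The new path is merged and deployed, but dark until the flag is on.
    # "Rollback" means flipping the flag off: no revert, no redeploy.
    if FLAGS.get("new_checkout_flow", False):
        return new_checkout(cart)
    return legacy_checkout(cart)

print(checkout(["book"]))            # legacy path while the flag is off
FLAGS["new_checkout_flow"] = True
print(checkout(["book"]))            # new path, enabled in seconds
```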

Neither technique eliminates review; they change its shape. Smaller units, lower stakes per unit, faster rollback paths, and consequently lower utilization per reviewer: these are the operational conditions under which review stops being a throughput bottleneck.

Google’s internal engineering practices, described in the Software Engineering at Google book, reflect this approach: most changes are reviewed by a single engineer, pre-submit testing handles automated correctness checking, and the readability review program (which gates full merge rights for new contributors) is scoped narrowly by language and time-bounded. The system treats review latency as a real cost rather than a free safety net.

The Utilization Trap

The most common attempted fix for slow review is assigning dedicated reviewers. This sounds like it should help. In practice, dedicated reviewers get assigned other responsibilities because organizations try to keep people productively busy. Utilization climbs to match available work. Queue dynamics remain poor.

Fast review requires reviewer slack. Slack is expensive to maintain, because an engineer spending 30% of their time waiting for review requests to arrive looks idle from a utilization standpoint. Organizations that want short review cycles must either accept deliberately under-utilized reviewers, narrow the set of changes that require human review, or invest in automated checks that replace the class of problems a human reviewer was catching. The third option is the only one that does not require either paying for idle time or accepting more risk.

Automated security scanning, linting with enforced standards, type checking, and comprehensive test suites all reduce the surface area that human review needs to cover. The teams with the fastest review cycles tend to be the ones that have invested most heavily in automated correctness tools, not the ones with the most thorough manual review processes.

Measuring What You Are Actually Paying

The DORA State of DevOps research has tracked engineering performance metrics across thousands of teams for nearly a decade. Change lead time, the elapsed time from code commit to production deployment, consistently separates high-performing organizations from low-performing ones. Elite performers measure lead times in hours. Low performers measure them in weeks or months. The gap is not primarily explained by CI/CD pipeline speed; it is explained by the time changes spend waiting for human approval at each stage.

That waiting time is rarely measured directly. Teams measure how long their pipelines take. They do not routinely measure how long PRs sit in queue before a reviewer looks at them, or how long they sit after review comments are posted before the author responds, or how many round trips a typical change requires. Without measuring those intervals, the cost of each review layer remains invisible, which is part of why layers accumulate.
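Those intervals are straightforward to compute once you export PR event timestamps. A sketch over hypothetical data; real timestamps would come from your Git host's API:

```python
from datetime import datetime

prs = [  # hypothetical PR event log
    {"opened": "2024-03-01T09:00", "first_review": "2024-03-02T15:00", "merged": "2024-03-04T11:00"},
    {"opened": "2024-03-01T13:00", "first_review": "2024-03-01T16:00", "merged": "2024-03-02T10:00"},
]

def hours(a: str, b: str) -> float:
    return (datetime.fromisoformat(b) - datetime.fromisoformat(a)).total_seconds() / 3600

pickup = [hours(p["opened"], p["first_review"]) for p in prs]  # wait before anyone looks
lead = [hours(p["opened"], p["merged"]) for p in prs]          # submission to merge
print(f"avg pickup wait: {sum(pickup)/len(pickup):.1f}h, avg lead time: {sum(lead)/len(lead):.1f}h")
```

Even this crude cut separates the number teams quote (active review time) from the number users experience (lead time), and makes each gate's queue visible.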

Pennarun’s 10x-per-layer claim is a heuristic calibrated to realistic reviewer utilization, not a precisely measured constant. The actual multiplier depends on utilization, variability, and round-trip frequency. But the directional claim is what matters: review layers compound multiplicatively, not additively. A team trying to understand why their delivery has slowed down should count their mandatory approval steps before looking anywhere else.
