Code Review Was Designed for Strangers

Avery Pennarun made the case this week that each layer of code review doesn’t add latency linearly; it multiplies it. The post hit 500 points on Hacker News almost immediately, which suggests it landed on something people already believed but hadn’t seen stated cleanly. The math behind the claim is real. But the more interesting question is why organizations built review systems that work this way in the first place.

The answer, I think, is that most teams imported a trust model designed for strangers.

Where the PR model actually came from

The pull request as a workflow artifact was popularized by GitHub around 2008, and it perfectly matched the problem GitHub was solving: how do you let an unknown contributor, with no established reputation, submit a change to a codebase maintained by someone who has never met them? The economics of that situation are specific. You don’t know this person. You have no idea if they understand the architecture, follow the conventions, or will still be around when their change causes a regression. The cost of a bad merge is high and visible. The cost of rejecting a patch is low and invisible. Under those conditions, conservative review makes sense.

The Linux kernel workflow, often cited as the gold standard for distributed open-source development, is worth examining carefully here. Linus Torvalds does not review most patches. What looks like a large funnel of review is actually a delegation tree: subsystem maintainers review patches for their domain, and Linus pulls from subsystem trees that have already been filtered. Each layer has genuine domain ownership, not just a bureaucratic checkpoint. The structure scales because each reviewer has a narrow, well-defined scope and real accountability for it.

GitHub’s PR workflow generalized this pattern and made it accessible to everyone. That’s a genuine contribution to how software gets built. But when internal teams adopted it, they didn’t bring along the underlying logic. They brought the mechanics: branch, review, approve, merge. The context that made those mechanics sensible, namely that contributors are strangers to the people reviewing their code, didn’t transfer.

The trust economics don’t match

In an internal team, the contributors are not strangers. They went through a hiring process. They’ve already merged dozens or hundreds of changes. The codebase has automated tests. There’s a staging environment. A bad change can be reverted in minutes, not months. The asymmetry that justified conservative open-source review, where bad merges are expensive and rejection is cheap, is almost exactly reversed: a bad merge is cheap to undo, while every hour a good change sits in a review queue is a real cost.

When you apply high-friction review to a low-risk, high-trust environment, you get the worst of both worlds: the latency of conservative gatekeeping with none of the quality benefits that conservative gatekeeping provides in its original context.

This is the underlying mechanism behind Pennarun’s 10x claim, and basic queuing theory explains it. Little’s Law says the average number of items in a system equals their arrival rate times the average time each spends there, so any growth in waiting shows up directly as work piling up. The M/M/1 queue model shows that as a resource approaches 100% utilization, wait times grow non-linearly. A reviewer who can handle five PRs per day and receives five new PRs per day is at 100% utilization in theory; in practice, with any variance in arrival rate or review complexity, the queue backs up fast. Add a second required reviewer and you’ve doubled the number of queues a PR must pass through sequentially, and if those approvals come from the same pool of engineers, you’ve also raised every reviewer’s utilization. Kingman’s formula, which approximates mean waiting time in a queue, scales with both utilization and variability. Stack more sequential queues on top of rising utilization and you’re not looking at 2x overhead per reviewer; you’re looking at compounding.
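To make the queuing argument concrete, here is a minimal Python sketch built on Kingman’s approximation (which reduces to the exact M/M/1 result when arrivals and review times are both exponential). The function names, the 40% starting utilization, the one-hour review effort, and the assumption that every required approval is drawn from one shared reviewer pool are all illustrative assumptions, not figures from Pennarun’s post.

```python
import math


def kingman_wait(utilization: float, ca2: float, cs2: float, service_time: float) -> float:
    """Kingman's approximation of mean time spent waiting in a single-server queue.

    utilization  -- fraction of the reviewer's capacity in use (rho)
    ca2, cs2     -- squared coefficients of variation of arrivals and of review time
    service_time -- mean time to actually review one PR
    With ca2 == cs2 == 1 this equals the exact M/M/1 waiting time.
    """
    if utilization >= 1.0:
        return math.inf  # demand meets or exceeds capacity: the queue grows without bound
    return (utilization / (1.0 - utilization)) * ((ca2 + cs2) / 2.0) * service_time


def review_latency(approvals: int, base_utilization: float, service_time: float,
                   ca2: float = 1.0, cs2: float = 1.0) -> float:
    """Mean time for one PR to clear `approvals` sequential review stages.

    Assumption: every required approval comes from the same reviewer pool,
    so each extra approval adds a full round of demand and raises utilization.
    """
    rho = base_utilization * approvals
    per_stage = kingman_wait(rho, ca2, cs2, service_time) + service_time
    return approvals * per_stage  # stages happen one after another, so their times add


if __name__ == "__main__":
    # Hypothetical numbers: one hour of review effort per PR, pool 40% busy with one approval.
    for n in (1, 2, 3):
        print(n, "approval(s):", round(review_latency(n, 0.4, 1.0), 2), "hours")
```

Under these assumptions one approval takes about 1.7 hours end to end, two approvals take 10 hours (roughly 6x, not 2x), and a third pushes the pool past capacity so the queue never drains. Raising `cs2` to model uneven review sizes makes every finite number larger still.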

DORA research has been documenting the outcome side of this for years. Elite-performing engineering organizations have lead times from commit to production of under one hour. Low performers measure lead times in weeks or months. The gap isn’t primarily about talent; it’s about how much work-in-progress accumulates in queues, and code review is often one of the largest sources of that accumulation.
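Little’s Law turns the link between work-in-progress and lead time into simple arithmetic: for a stable system, average lead time equals average WIP divided by throughput. A minimal sketch, using a hypothetical team’s numbers rather than any DORA data:

```python
def lead_time_days(avg_wip: float, throughput_per_day: float) -> float:
    """Little's Law: average WIP = throughput * average lead time, so lead time = WIP / throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_day


# Hypothetical team that merges 10 PRs per day on average.
print(lead_time_days(avg_wip=30, throughput_per_day=10))  # 30 PRs in flight -> 3.0 days
print(lead_time_days(avg_wip=5, throughput_per_day=10))   # 5 PRs in flight -> 0.5 days
```

Throughput is identical in both cases; only the amount of work parked in queues changes, and lead time moves in direct proportion.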

What Google does instead

Google’s internal code review practices are publicly documented, and they differ from the default GitHub PR workflow in a few specific ways. OWNERS files establish fine-grained, per-directory ownership, so the right reviewer gets the request and has genuine context on the code being changed. Reviewers are expected to respond quickly; Google’s published guidance treats one business day as the maximum time to respond to a review request. LGTM is treated as a meaningful judgment, not a checkbox. Critically, Google invests heavily in automated testing and presubmit checks so that reviewers spend their attention on things tests can’t catch, like design coherence and maintainability, rather than on things tests can catch.
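The routing idea behind OWNERS files can be sketched in a few lines of Python. This is a simplified, hypothetical model, not Google’s actual file format or tooling: real OWNERS systems have their own syntax and inheritance rules (owners of a parent directory typically cover subdirectories too). Here each changed file is simply routed to the owners listed nearest to it in the directory tree; the table contents and function names are invented for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical ownership table: directory -> people listed in that directory's OWNERS file.
# "" stands for the repository root.
OWNERS: dict[str, set[str]] = {
    "": {"tech-lead@example.com"},
    "payments": {"alice@example.com", "bob@example.com"},
    "payments/fraud": {"carol@example.com"},
    "search": {"dave@example.com"},
}


def nearest_owners(path: str, owners: dict[str, set[str]]) -> set[str]:
    """Walk from the file's directory toward the root and return the first owners found."""
    directory = PurePosixPath(path).parent
    while True:
        key = "" if str(directory) == "." else str(directory)
        if key in owners:
            return owners[key]
        if key == "":
            return set()  # no OWNERS entry anywhere on the path
        directory = directory.parent


def route_review(changed_files: list[str], owners: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each changed file to the owners who should be asked to review it."""
    return {path: nearest_owners(path, owners) for path in changed_files}


if __name__ == "__main__":
    change = ["payments/fraud/rules.py", "payments/api.py", "search/index/ranker.go", "README.md"]
    for path, reviewers in route_review(change, OWNERS).items():
        print(path, "->", sorted(reviewers))
```

A fraud-rules change goes to the fraud owner rather than to whoever happens to be free, and a file in an unowned subdirectory falls back to the closest ancestor. A production tool would also choose a small set of reviewers that covers every file in the change; that step is omitted here.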

This is a different model from the one most teams actually run. Most teams have PR review as the primary quality gate, with tests as a formality and reviewers expected to catch bugs as well as provide architectural feedback. That conflation is expensive. The SmartBear study found that reviewers find defects at a meaningful rate but that the marginal contribution of each additional reviewer drops sharply. A second reviewer might catch 20% of what the first missed. A third reviewer catches less still. Each additional approval layer adds as much waiting as the last, or more, while buying less and less quality.
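The diminishing-returns pattern is easy to see with a toy model in which each reviewer catches some fraction of the defects that everyone before them missed. The 0.2 rate for the second reviewer echoes the “20% of what the first missed” figure above; the 0.6 for the first reviewer and 0.1 for the third are purely illustrative assumptions, not results from the SmartBear study.

```python
def cumulative_detection(catch_rates: list[float]) -> list[float]:
    """Fraction of all defects found after each reviewer.

    Model (an assumption): reviewer i catches catch_rates[i] of the defects
    that every earlier reviewer missed.
    """
    found = 0.0
    history = []
    for rate in catch_rates:
        found += (1.0 - found) * rate
        history.append(found)
    return history


# Illustrative rates only; see the lead-in for which values are assumed.
rates = [0.6, 0.2, 0.1]
previous = 0.0
for i, total in enumerate(cumulative_detection(rates), start=1):
    print(f"reviewer {i}: total {total:.1%}, marginal {total - previous:.1%}")
    previous = total
```

Under these assumed rates the second reviewer adds 8 percentage points of detection and the third about 3, while each extra approval still costs its own full queue wait.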

The ratchet problem

Review requirements are politically stable in one direction only. Adding a new approval requirement is a visible, accountable action that makes the person who added it look like they care about quality. Removing a requirement is a visible, accountable risk: if something goes wrong after you removed a review gate, you own the outcome. The diffusion-of-responsibility effect cuts both ways. More reviewers means lower individual accountability for catching problems, but removing reviewers feels like accepting individual accountability for any future problems.

This is why organizations don’t converge to the optimal number of review gates over time. They ratchet toward the maximum number the team can tolerate before shipping grinds to a halt.

What trust infrastructure looks like instead

The teams that ship quickly without accumulating technical debt aren’t skipping review. They’ve built the conditions where review can be fast and targeted. That means comprehensive automated testing so reviewers can focus on what automation misses. It means trunk-based development with feature flags, so that incomplete changes can be merged early and kept dark behind a flag instead of waiting in a review queue until they are finished enough to be safe. It means observability good enough that a bad production change surfaces immediately and can be reverted in minutes. It means OWNERS structures that route reviews to people with actual domain context, not just whoever is available.
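A minimal sketch of the feature-flag half of that idea, assuming a hypothetical in-process flag table (real systems usually read flags from a configuration service): incomplete code merges to trunk but stays dark until its flag is turned on, and turning it off again is a config change rather than a code revert. The flag name, table layout, and functions are invented for illustration.

```python
import hashlib

# Hypothetical flag table; a real deployment would load this from a config service.
FLAGS: dict[str, dict] = {
    "new_checkout_flow": {"enabled": False, "rollout_percent": 0},
}


def is_enabled(flag: str, user_id: str, flags: dict[str, dict] = FLAGS) -> bool:
    """Return True if the flag is on and this user falls inside its rollout percentage."""
    config = flags.get(flag)
    if config is None or not config["enabled"]:
        return False  # unknown or switched-off flags keep the new code dark
    # Stable bucketing: hashing flag + user puts each user in the same 0-99 bucket every time.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < config["rollout_percent"]


def checkout(user_id: str) -> str:
    if is_enabled("new_checkout_flow", user_id):
        return "new checkout flow"  # merged to trunk while still in progress
    return "existing checkout flow"
```

Merging `checkout` with the flag off is safe even while the new path is unfinished. Raising `rollout_percent` exposes it gradually, and setting `enabled` back to `False` serves as the fast rollback.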

The Accelerate research by Nicole Forsgren, Jez Humble, and Gene Kim found that these practices correlate with both higher delivery performance and higher software stability. The tradeoff between speed and quality that review theater promises to manage is mostly false; the teams shipping fastest also have the lowest change failure rates and recover from incidents the fastest.

Pennarun’s 10x-per-layer framing is a useful way to make the queuing math visceral. But the problem it’s describing isn’t that review is bad. It’s that most teams built review workflows optimized for a trust environment that doesn’t match their actual situation, and then kept adding layers because each individual addition looked safe. Unwinding that requires building the underlying trust infrastructure, not just removing checkboxes.