The Lead Time Tax: How Sequential Approvals Compound Against Engineering Velocity
Source: hackernews
Avery Pennarun’s latest post makes a pointed claim: each sequential review layer doesn’t add linear overhead to your process; it multiplies it. With nearly 500 upvotes and hundreds of comments on Hacker News, the claim has clearly resonated. The intuition is correct. The math behind it is worth understanding precisely, because “review is slow” is easy to say, but “review compounds geometrically with each layer” has specific engineering implications worth tracing.
The Queue Problem
A single reviewer doesn’t block you because they review slowly. They block you because of queuing. Work arrives asynchronously. The reviewer has other work. Your PR lands in a queue. By the time it surfaces, you’ve context-switched twice and the reviewer needs to rebuild their mental model of the change. A single back-and-forth round on a comment doubles the wait.
This is classic single-server queue territory. Kingman’s formula approximates the mean wait time in a G/G/1 queue:
W ≈ (ρ / (1 − ρ)) × ((Cv_a² + Cv_s²) / 2) × E[S]
Where ρ is utilization (how busy the reviewer is), Cv_a is the coefficient of variation of interarrival times, Cv_s is the coefficient of variation of service times, and E[S] is the mean service time — how long a review takes once started. The key result: as utilization approaches 1.0, wait time grows without bound. With both variability terms near 1, a reviewer who is 90% occupied imposes a wait of roughly nine review-lengths, while one who is 10% occupied imposes roughly a tenth of one — an ~80x difference driven entirely by load, independent of how fast they actually review.
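The formula is short enough to transcribe directly into code (E[S] normalized to one “review-length” here):

```python
def kingman_wait(rho, cv_a=1.0, cv_s=1.0, mean_service=1.0):
    """Kingman's G/G/1 approximation for mean time spent waiting in
    queue, in units of mean_service (one 'review-length')."""
    return (rho / (1.0 - rho)) * (cv_a**2 + cv_s**2) / 2.0 * mean_service

busy_wait = kingman_wait(0.9)   # 9.0: nine review-lengths of pure queueing
idle_wait = kingman_wait(0.1)   # ~0.11: effectively no queueing
```

The asymmetry between the two calls is the whole story: same reviewer speed, wildly different waits, purely because of utilization.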
Chain two reviewers in sequence, each at 80% utilization. Wait for layer one: roughly 4x the service time. Wait for layer two: another 4x. Strictly, waits in series add rather than multiply, so the queueing alone costs roughly 8x — already supralinear in utilization, but linear in layer count. The multiplication shows up in practice because every round of comments re-enters a queue: two rounds across two layers means four queue waits, and long delays push authors toward larger batches that generate still more rounds. The per-layer cost depends heavily on utilization, but any organization where reviewers are genuinely busy, which is most organizations, will experience latency growth far steeper than the layer count alone suggests.
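The chained-reviewer numbers can be checked with a small tandem-queue simulation — pure Python, illustrative parameters (Poisson arrivals, exponential service), not a model of any real review pipeline:

```python
import random

def simulate_tandem(n_stages=2, rho=0.8, mean_service=1.0,
                    n_jobs=100_000, seed=1):
    """Mean total queueing delay per job through n_stages single-server
    FIFO queues in series (Poisson arrivals, exponential service)."""
    random.seed(seed)
    t, arrivals = 0.0, []
    for _ in range(n_jobs):
        t += random.expovariate(rho / mean_service)  # interarrival gaps
        arrivals.append(t)
    total_wait = 0.0
    for _ in range(n_stages):
        free_at, departures = 0.0, []
        for a in arrivals:
            start = max(a, free_at)        # wait while the server is busy
            total_wait += start - a
            free_at = start + random.expovariate(1.0 / mean_service)
            departures.append(free_at)
        arrivals = departures  # one stage's output feeds the next
    return total_wait / n_jobs

# Each stage at rho=0.8 contributes ~4x E[S] of waiting; two stages
# give ~8x in total. Waits in series add; the geometric blowup in
# practice comes from repeated comment rounds re-entering each queue.
two_stage_wait = simulate_tandem()
```

Running it confirms the back-of-envelope figures: around 4 service-times of waiting per stage, around 8 for the pair.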
What the DORA Research Shows
The Accelerate research by Forsgren, Humble, and Kim operationalized this through four key metrics: deployment frequency, lead time for changes, mean time to restore, and change failure rate. Lead time, defined as the time from code commit to running in production, is exactly where review latency accumulates.
Elite performers in the 2023 DORA State of DevOps report had lead times under one day. Low performers measured lead times in weeks to months. The difference between those categories isn’t primarily technical capability; it’s process structure. Automated checks run in minutes. Human review gates run in hours to days. The number of required sequential human approvals is one of the strongest predictors of which bucket a team lands in.
The research also found that high performers deploy more frequently and have lower change failure rates. This upends the intuition that more review produces better quality. The mechanism is well-documented: smaller, more frequent changes are easier to review, easier to revert, and surface problems earlier. Serial approval gates push teams toward larger batches, because you want to “make the wait worth it”, and those larger batches are harder to review and riskier to deploy. The review overhead intended to reduce risk ends up increasing it through a batch size effect.
The Ratchet That Only Turns One Way
What Pennarun’s article points to, and what is underappreciated in engineering culture broadly, is the asymmetry in how review requirements accumulate over time.
Review layers get added in response to incidents. Something goes wrong in production, a post-mortem happens, someone proposes that a new class of changes needs sign-off from security, or architecture, or legal. The proposal is reasonable. The risk was real. The layer gets added.
Review layers almost never get removed. There’s no symmetric incident trigger. Nobody holds a post-mortem on “we shipped 40% fewer features this quarter because our approval chain grew by three steps.” That cost is diffuse, invisible in retrospect, and distributed across dozens of slow-moving projects. Organizations optimize for the visible. Over time, approval chains grow monotonically.
Kent Beck identified this pattern in the early XP literature and called it permission-based development. His alternative was trust-based: hire people with good judgment, give them autonomy to use it, fix mistakes quickly rather than preventing them exhaustively. That philosophy went somewhat out of fashion as teams grew larger and compliance requirements became more binding, but the underlying tradeoff it identifies hasn’t changed.
Where the 10x Actually Comes From
Pennarun’s “10x” framing is probably derived from observed latency gaps between submitting an async review request and receiving a first response. A PR submitted at 2pm might not get a first look until the next morning — call it 16 elapsed hours for a change that takes 20 minutes to review. A round of comments, a fix, and a re-review adds another 24 hours. That’s roughly 40 elapsed hours for perhaps an hour of combined author and reviewer attention: a ~40x inflation on the review step alone.
Add a second layer and the elapsed time roughly doubles; add a rebase or an extra comment round at each layer and the inflation climbs into the hundreds. The work itself hasn’t changed in complexity; only the serial blocking structure has.
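A back-of-the-envelope model makes the arithmetic explicit. Every number below is an illustrative assumption (16-hour average queue wait, 30 minutes of actual attention per round), not a measurement:

```python
def elapsed_hours(layers, rounds_per_layer=2, queue_wait_h=16.0,
                  attention_h=0.5):
    """Elapsed time when every review round sits in an async queue.
    Assumed: queue_wait_h average wait and attention_h of hands-on
    work per round -- illustrative numbers, not measured ones."""
    rounds = layers * rounds_per_layer
    return rounds * (queue_wait_h + attention_h)

elapsed_hours(1)                      # 33.0 h for ~1 h of attention
elapsed_hours(2)                      # 66.0 h: layers add queue waits
elapsed_hours(2, rounds_per_layer=4)  # 132.0 h: rework rounds multiply them
```

Layers add waits linearly, but the rounds-per-layer term is what runs away in practice — and it tends to grow with each added layer.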
Google’s engineering practices documentation acknowledges this directly. It recommends reviewers respond within one business day and notes that reducing review latency is one of the highest-leverage changes an engineering organization can make. Google’s internal review tool, Critique, is explicitly designed around fast turnaround rather than thoroughness. The reviewer surface area is constrained, approval scope is narrow, and there are strong norms against over-reviewing. The goal is “good enough to merge” within hours, not “exhaustively analyzed” within days.
What Actually Addresses the Root Cause
The empirical answer from DORA and from high-performing organizations is a combination of approaches that reduce serial human gates without simply eliminating quality checks.
Trunk-based development with feature flags means merging to main continuously, using feature flags to decouple deployment from release. A 50-line diff with a flag wrapping the new behavior is low blast radius. A 2,000-line diff produced after a two-week review cycle is high risk even after thorough review, because no reviewer can hold that much context simultaneously.
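A minimal sketch of the flag pattern — the env-var scheme and function names here are illustrative, not any particular flag system, which would add per-user targeting and kill switches:

```python
import os

def flag(name: str, default: bool = False) -> bool:
    """Minimal env-var-driven feature flag (hypothetical FLAG_* naming
    scheme; a real system would consult a flag service)."""
    raw = os.environ.get(f"FLAG_{name.upper()}")
    return default if raw is None else raw.strip().lower() in ("1", "true", "on")

def greeting(user: str) -> str:
    # Both code paths live on main; the flag, not the merge, releases them.
    if flag("new_greeting"):
        return f"Welcome back, {user}!"  # new behavior, dark by default
    return f"Hello, {user}."             # existing behavior, unchanged
```

The point is the small diff: the new path ships wrapped and dark, so merging it carries near-zero blast radius and the review stays short.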
Automated gates instead of human gates for mechanical checks is the other side of the same coin. Security scanning, static analysis, license compliance, and test coverage can and should run in CI with no human in the loop. Shifting “does this use an approved library?” from a human security reviewer to a dependency policy enforced in CI removes a serial gate without removing the check. The OWASP Dependency-Check project and tools like Snyk or Renovate exist precisely for this reason.
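The “approved library” check can be a few lines of CI script rather than a human reviewer. The allowlist and the requirements-file parsing below are hypothetical stand-ins for whatever policy tooling a team actually runs:

```python
# Hypothetical allowlist the security team maintains once, instead of
# signing off on every PR; CI fails when a requirements file strays.
ALLOWED = {"requests", "flask", "sqlalchemy"}

def disallowed_deps(requirement_lines):
    """Return dependency names not on the approved list."""
    bad = []
    for line in requirement_lines:
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name and not name.startswith("#") and name not in ALLOWED:
            bad.append(name)
    return bad

disallowed_deps(["requests==2.31.0", "leftpad==1.0.2"])  # -> ["leftpad"]
```

Wired into CI as a required check, this converts a serial human gate into a parallel automated one without weakening the policy.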
Narrow, well-scoped review requirements address the human gates that remain. If your security team reviews every PR, they are a bottleneck by definition. If your security team has defined a set of approved patterns that don’t require review, and only novel security-relevant changes route to them, the throughput problem changes. This requires security teams to invest upfront in tooling and documentation, which most don’t prioritize because the output is invisible compared to blocking a bad deploy.
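Routing rules like this are often just path matching. The patterns below are a hypothetical policy, sketched with stdlib globbing:

```python
import fnmatch

# Hypothetical policy: only changes touching these paths queue for
# human security review; everything else relies on the pre-approved
# patterns enforced in CI.
SECURITY_PATHS = ["auth/*", "crypto/*", "*/secrets*", "deploy/iam/*"]

def files_needing_security_review(changed_files):
    """Return the subset of changed files that route to the security queue."""
    return [f for f in changed_files
            if any(fnmatch.fnmatch(f, pat) for pat in SECURITY_PATHS)]

files_needing_security_review(["auth/login.py", "docs/intro.md"])
# -> ["auth/login.py"]
```

Most PRs match nothing and skip the queue entirely; the security team’s attention is reserved for the changes that are actually novel.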
The Organizational Problem Behind the Technical One
The technical solutions above are real, but the organizational dynamics that created the review layers don’t disappear unless they’re explicitly addressed. The approval ratchet keeps turning unless post-mortem processes are modified to also consider “what does this new review requirement cost in lead time?” alongside “what did this incident cost?”
Some organizations do this deliberately. Netflix’s culture of high autonomy with strong observability is a formalized version of the tradeoff: move fast, maintain excellent monitoring, fix things quickly when they break. That’s a conscious choice of recovery speed over prevention layers. It works because the observability investment is genuine.
Most organizations don’t make that tradeoff explicitly. They add prevention layers after incidents and don’t invest proportionally in recovery speed or review tooling. Lead time climbs, deployment frequency falls, and the organization wonders why velocity has declined. The answer is generally visible in the deployment pipeline. Count the required sequential human approvals. Estimate the average latency of each. The product of those estimates is a floor on lead time, before a single test has run or a single line of code has been touched by a reviewer.
The uncomfortable implication of Pennarun’s framing is that most engineering organizations are living well below their potential shipping velocity not because their engineers are slow or their systems are complex, but because their approval structures have grown through an asymmetric process that has no natural check on accumulation. The fix is partly technical and partly a matter of making the cost of each review layer as visible as the incident that created it.