
Why Slow Code Review Keeps Getting Slower

Source: hackernews

The argument from Avery Pennarun’s recent post is grounded in queuing theory: each review layer is an independent queue, reviewer utilization determines wait time non-linearly, and sequential layers compound multiplicatively. The math holds. But the static queuing model treats PR size and revision cycles as fixed inputs. In practice, both respond to review latency, and the feedback mechanisms this creates explain why organizations with slow review tend to keep getting slower.
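The non-linearity is easy to see in the standard M/M/1 result, where mean queue wait is proportional to ρ/(1−ρ) for reviewer utilization ρ. A minimal sketch, assuming each review layer behaves as an independent M/M/1 queue; the utilizations and service times below are illustrative, not measured data:

```python
# Sketch: expected wait in an M/M/1 queue as a function of reviewer
# utilization rho. W = service_time * rho / (1 - rho) is the classic
# M/M/1 waiting-time result.

def expected_wait(service_hours: float, rho: float) -> float:
    """Mean time a PR spends queued before review starts, M/M/1 model."""
    assert 0 <= rho < 1, "utilization must stay below 1 for a stable queue"
    return service_hours * rho / (1 - rho)

# Wait time explodes as utilization approaches 1:
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"rho={rho:.2f}: wait={expected_wait(1.0, rho):.1f}h")
# rho=0.50 -> 1.0h, rho=0.90 -> 9.0h, rho=0.95 -> 19.0h

# Sequential layers compound: each revision cycle re-pays the wait at
# every layer, so total latency scales as rounds * layers.
layers = [0.9, 0.8]     # utilization of two sequential review layers
rounds = 3              # revision cycles through the whole pipeline
per_pass = sum(expected_wait(1.0, r) + 1.0 for r in layers)
print(f"latency per pass={per_pass:.1f}h, over {rounds} rounds={rounds * per_pass:.1f}h")
```

Doubling utilization from 0.5 to near 1 does not double the wait; it multiplies it many times over, which is why a "busy" review rotation feels so much slower than a merely "used" one.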

The PR Size Inflation Loop

When developers learn through experience that reviews take 48 hours or longer, they adapt. Opening a pull request carries overhead: writing a description, linking tickets, getting reviewers assigned, fielding questions about scope. When that overhead is paid once per PR, and when each PR means waiting two days before anything can proceed, the rational adaptation is to batch more work into each change. A week’s worth of changes goes into one PR instead of five.

Larger PRs are harder to review. A reviewer looking at 600 lines of changed code takes longer than one looking at 150 lines, produces more comments because there is more surface area to comment on, and is more likely to surface concerns that require substantive revision. That revision may itself be large. The larger revision takes longer to review. The loop closes.

LinearB’s 2023 engineering benchmarks, drawn from over 2,000 engineering teams, found that teams with review cycle times over 24 hours had a median PR size roughly three to four times larger than teams with cycle times under four hours. This is consistent with adaptive batching behavior. It also means that a team trying to reduce review latency by adding reviewers will see diminishing returns until PR size shrinks, because larger PRs keep per-review work high regardless of how many reviewers are available.

The Context Loss Loop

When a PR sits waiting for review, the author moves to other work. When feedback arrives two or three days later, they reconstruct the relevant context before addressing it. That reconstruction is partial. The code the author reads at revision time is code they wrote while in a mental state they no longer fully have access to.

Gloria Mark’s research at UC Irvine found that it takes an average of 23 minutes to fully re-engage with focused work after an interruption. For software development, returning to a PR after multiple days requires more than re-engagement: it requires rebuilding the design rationale, the constraints that shaped specific decisions, and the connection between code and requirements. That reconstruction takes longer and is less complete than the original.

The practical result is that review comments on delayed PRs get addressed technically but not always thoughtfully. The author makes the change that satisfies the comment without necessarily understanding whether it improves the overall design, because their model of the overall design has decayed. Reviewers who sense this leave more follow-up comments. The revision cycle extends. Each additional round adds more queue wait, more context loss, and more opportunity for the same failure to repeat.

The Merge Conflict Loop

As a PR sits waiting for review, the main branch keeps moving. Other changes land. The longer the wait, the greater the divergence, and divergence creates merge conflicts. Resolving a merge conflict is not purely mechanical. Understanding whether a conflict should favor the incoming change or the change already on main requires understanding the intent of both branches. When a PR has been sitting for days, the author’s recollection of their original intent has degraded, making conflict resolution more error-prone.

Error-prone conflict resolution can introduce new defects. New defects visible in the diff may trigger another round of review, or reopen questions reviewers thought were settled. The worst form of this is rebase churn on active repositories: repeated rebases, each introducing the possibility of subtle merge errors, each requiring re-verification by reviewers who are already context-switching into week-old code.

Some PRs never recover. Wessel et al.’s 2020 study on pull requests in open source projects found that PRs left open more than five days had a 35% abandonment rate. The work was completed; the change was just never merged. On internal teams, social pressure keeps outright abandonment lower, but the equivalent is the PR that gets superseded by a larger refactor, quietly deprioritized when the underlying feature shifts in priority, or merged in degraded form because everyone involved is exhausted by the cycle.

Two Attractors

These mechanisms mean that review speed is not a linear function of reviewer availability. The system has feedback, which means it has dynamics. Organizations with fast review cycles tend to stay there: small PRs are quick to review, quick reviews preserve context, clean merges avoid conflict churn, and short wait times keep PR size small. Organizations with slow review cycles tend to stay there too: large PRs accumulate slow review, slow review leads to context loss and conflict accumulation, each revision adds more queue time, and the wait teaches authors to batch more aggressively.
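The bistability can be demonstrated with a toy model of the wait → PR-size → review-time loop, iterated as a one-dimensional map. All constants here are illustrative assumptions, not figures from the cited studies:

```python
# Sketch: the review feedback loop as an iterated map with two stable
# attractors. Authors batch more per PR as waits grow; bigger PRs take
# longer to review; the new wait feeds back into batching behavior.
import math

def pr_lines(wait_hours: float) -> float:
    """Assumed batching response: PR size ramps smoothly from ~100 to
    ~1000 changed lines as expected wait crosses roughly a day."""
    return 100 + 900 / (1 + math.exp(-(wait_hours - 24) / 5))

def next_wait(wait_hours: float) -> float:
    """Assumed review throughput: ~25 changed lines cleared per elapsed
    hour of queue plus review."""
    return pr_lines(wait_hours) / 25

def settle(wait_hours: float, rounds: int = 50) -> float:
    """Iterate the loop until it settles near a fixed point."""
    for _ in range(rounds):
        wait_hours = next_wait(wait_hours)
    return wait_hours

# Two starting points on either side of the unstable middle,
# two different destinations:
print(f"fast attractor: {settle(8.0):.1f}h")   # small PRs, short waits
print(f"slow attractor: {settle(30.0):.1f}h")  # large PRs, long waits
```

The same mechanism, started from two different initial waits, converges to two widely separated fixed points with an unpopulated middle, which is the shape the DORA data shows.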

The DORA State of DevOps research has accumulated over a decade of data showing that engineering organizations cluster at the extremes. Elite performers have lead times under an hour and deploy frequently. Low performers have lead times of weeks to months and deploy rarely. The middle ground is sparsely populated. That bimodal distribution is consistent with a system that has two stable attractors, not a smooth performance spectrum.

What Changes the Attractor

Adding reviewers alone will not shift an organization from the slow attractor to the fast one. More reviewers reduce individual utilization, which reduces queue wait, but if PR size and revision cycles remain unchanged, the improvement is bounded. You push down one part of the queue and the behavior feeding the queue continues.

The interventions that actually shift the attractor are structural. Trunk-based development with feature flags decouples “merged” from “deployed,” which allows PRs to be small and frequent without deployment risk. When merging a change carries low blast radius because the feature sits behind a flag, the review cycle can be short and iterative rather than thorough-because-blocking. The Google engineering practices guide explicitly recommends keeping changes as small as possible, for exactly this reason: it is not just about reviewer convenience, it is about keeping the feedback loop tight enough that context survives the cycle.
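A minimal sketch of the decoupling, with a hypothetical environment-variable flag store standing in for a real flag service (LaunchDarkly, Unleash, or an internal system):

```python
# Sketch: decoupling "merged" from "deployed" with a feature flag.
# The flag name and discount logic below are hypothetical examples.
import os

def flag_enabled(name: str) -> bool:
    """Read a flag from the environment; off by default, so merging the
    gated code carries no blast radius until the flag is flipped."""
    return os.environ.get(f"FLAG_{name.upper()}", "off") == "on"

def checkout_total(cart: list[float]) -> float:
    if flag_enabled("new_pricing"):
        # New code path: merged in a small PR, dark in production until
        # FLAG_NEW_PRICING=on is set for a slice of traffic.
        return round(sum(cart) * 0.95, 2)  # hypothetical discount logic
    return round(sum(cart), 2)

print(checkout_total([10.0, 20.0]))  # flag off -> 30.0
```

Because the unflipped flag makes the new path unreachable, a reviewer can approve the small PR on logic and design alone, without treating the merge itself as the deployment decision.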

Automated pre-screening through CI, static analysis, and type systems reduces the defect surface that human reviewers need to cover. A codebase where the linter, type checker, and test suite catch the mechanical issues leaves human reviewers focused on design, logic, and intent. Narrower review scope means faster reviews. Faster reviews mean shorter waits. Shorter waits mean smaller PRs. The loop runs in reverse.

Draft or early-opened PRs allow incremental review before a change is complete. A reviewer who has seen the design sketched in an early draft provides more directed feedback and needs less time to orient to the final version. The total review effort may be similar, but it is distributed across smaller, lower-stakes interactions rather than concentrated in a single high-stakes gate.

What the Original Claim Is Actually Saying

The apenwarr.ca piece makes the 10x multiplier argument from a queuing perspective. The feedback loops described here do not change the order of magnitude, but they do change the diagnosis. A static queuing model suggests the fix is lower utilization, achieved through more reviewers or fewer PRs per reviewer. The dynamic picture suggests that adding reviewers without changing PR behavior is treating the symptom.

Organizations that have escaped the slow attractor have generally done it by changing the structure of how code reaches review, not by optimizing the review itself. Smaller changes, automated pre-filtering, and deployment risk reduction through feature flags all push the system toward the fast attractor by directly attacking the inputs that feed the feedback loops. The 10x cost per review layer is real, but the more persistent cost is the self-reinforcing behavior that makes slow review compound over time in ways no ticket tracker or sprint velocity chart will surface.
