The Bystander Effect in Your PR Queue

Avery Pennarun’s recent post on review layers has generated the usual Hacker News debate: is 10x realistic, does this apply to regulated industries, what about security-critical code. The throughput argument is well-grounded in queueing theory, and the case for how sequential review stages compound latency is correct. There is, however, a second failure mode in multi-layer review that gets less attention, one that is less obvious and that social psychologists described long before software teams did: adding reviewers reduces the quality of each individual review. The code review research has since documented the same pattern in specific, quantifiable terms.

The Ringelmann Effect

In 1913, French agricultural engineer Maximilien Ringelmann published research on the productivity of groups pulling ropes. Groups did not scale linearly. A group of eight people did not pull eight times harder than a single person; they pulled with roughly four times the force, which means each member contributed about 49% of what they could pull alone. Each additional person added to the group reduced the per-person contribution.
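The arithmetic behind those figures is easy to check. A minimal sketch, taking the 49% per-person efficiency quoted above as given; the function name is illustrative and not drawn from Ringelmann's paper:

```python
def group_force(members: int, per_person_efficiency: float) -> float:
    """Total pull of a group, measured in units of one person pulling alone."""
    return members * per_person_efficiency


# Eight people, each contributing 49% of their solo effort (figure quoted above).
total = group_force(8, 0.49)
print(f"{total:.2f} person-equivalents")  # prints "3.92 person-equivalents"
```

Eight people deliver a little under four people's worth of pull, which is where the "roughly four times the force" figure comes from.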

Ringelmann attributed this partly to coordination loss, but later researchers confirmed the psychological dimension. Bibb Latané, Kipling Williams, and Stephen Harkins replicated and extended this work in 1979, showing that even in controlled conditions where coordination loss was eliminated, individual effort dropped as group size increased. The mechanism is motivational: when multiple people share responsibility for an outcome, each person’s sense of personal accountability diminishes. The effect has been replicated across many task types, although its size varies with culture and with how identifiable each person’s contribution is, and the same logic plausibly applies whether the group is pulling ropes or reviewing pull requests.

The Bystander Effect at Review Time

Latané and John Darley’s foundational bystander intervention research from 1968 established a closely related dynamic: the more witnesses are present at an emergency, the less likely any individual is to intervene. Even where the chance that someone in the crowd helps holds up as the group grows, the probability that any specific person helps drops sharply. People in groups are worse responders than people alone, because the presence of others creates ambiguity about who bears responsibility for acting.

Code review does not involve emergencies, but the responsibility structure is closely analogous. When a PR requires approval from three teams, each approver operates in an environment where two other teams are also reviewing the change. The implicit reasoning, rarely conscious but structurally present, is that if the other reviewers approve it, the change must be fine, and that if there is a real problem, someone else will catch it before the merge. The individual reviewer’s felt accountability for catching a defect is lower because the responsibility is shared across more people.

This produces the wrong incentive structure for a process whose entire purpose is defect detection.

What the Code Review Research Shows

SmartBear’s analysis of code review practices at Cisco examined over 2,500 code reviews across a development organization and found that review effectiveness dropped with reviewer count. One to two reviewers produced the highest defect detection rates per reviewer hour. Above three reviewers, defects found per person declined while total review time and cycle time both increased.

These findings align with the broader literature on formal software inspection methods. Research on inspection processes has consistently found that adding inspectors beyond a certain threshold yields diminishing returns in defect detection while increasing the cost of each review. The mechanism is consistent with diffusion of responsibility: when more people share accountability for catching defects, each person’s individual engagement decreases.

The organization that adds a third required reviewer to “ensure quality” may end up catching fewer defects than if it had kept the review to one or two people with clear ownership, while simultaneously slowing the delivery cycle. The same decision that was supposed to buy more quality buys less of it at a higher price.
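A toy model shows how a third reviewer could lower total detection rather than raise it. It assumes reviewers miss defects independently and that each reviewer's chance of catching a given defect falls as more reviewers are added. The per-reviewer rates below are invented for illustration and are not taken from the Cisco data:

```python
from typing import Sequence


def team_detection(per_reviewer_probs: Sequence[float]) -> float:
    """Probability that at least one reviewer catches a defect, assuming independence."""
    miss = 1.0
    for p in per_reviewer_probs:
        miss *= 1.0 - p  # the defect survives only if every reviewer misses it
    return 1.0 - miss


# Hypothetical per-reviewer catch rates: engagement falls as the group grows.
HYPOTHETICAL_RATES = {1: 0.60, 2: 0.45, 3: 0.25, 4: 0.15}

for reviewers, rate in HYPOTHETICAL_RATES.items():
    print(reviewers, round(team_detection([rate] * reviewers), 3))
```

Under these made-up numbers, two reviewers give the best combined catch rate, and adding a third pushes the team's rate below what a single engaged reviewer achieves. Whether a real team behaves this way depends entirely on how steeply individual engagement actually falls.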

What CODEOWNERS Encodes

GitHub’s CODEOWNERS mechanism is a direct tooling expression of the multi-reviewer model. A CODEOWNERS file specifies which teams must approve changes to which paths. Over time, these files accumulate entries: security teams added after incidents, platform teams added after stability issues, compliance functions added after audits.

The result is changes that require sign-off from four or five distinct teams. Each reviewer looks at the diff through the lens of their own concern; security looks for vulnerabilities, platform looks for stability risks, architecture looks for design conformance. No single reviewer sees the full picture or takes responsibility for the overall quality of the change. The format of the approval process explicitly divides responsibility by domain, which means no domain owns the whole.

When something ships with a defect that crossed domain boundaries, the diffusion of responsibility in the CODEOWNERS file becomes diffusion of blame in the postmortem. The security reviewer approved it because the security-relevant sections looked fine. The platform reviewer approved it because the infrastructure changes looked fine. The defect was in the interaction between those concerns, which no single reviewer was chartered to examine.
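One rough way to see the accumulation is to count how many distinct teams a single change pulls in. The sketch below uses a simplified prefix-matching model with hypothetical paths and team names. It is not GitHub's actual pattern-matching algorithm, though it borrows the rule that the last matching entry takes precedence:

```python
# Hypothetical ownership rules as (path prefix, owning team), in file order.
RULES = [
    ("", "@org/app-team"),           # default owner for everything
    ("infra/", "@org/platform"),
    ("auth/", "@org/security"),
    ("billing/", "@org/compliance"),
]


def owner_for(path, rules=RULES):
    """Return the team from the last rule whose prefix matches the path."""
    owner = None
    for prefix, team in rules:
        if path.startswith(prefix):
            owner = team  # later rules override earlier ones
    return owner


def required_teams(changed_paths, rules=RULES):
    """Distinct teams whose approval a change touching these paths would need."""
    return {owner_for(path, rules) for path in changed_paths}


change = ["auth/session.py", "infra/deploy.yaml", "billing/invoice.py", "README.md"]
print(sorted(required_teams(change)))
```

A change touching four directories needs four separate approvals here, and nothing in the rules assigns anyone to the interaction between `auth/` and `infra/`.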

The Accountability Distribution Motive

The accumulation of review layers is often motivated less by quality improvement than by accountability distribution. After a production incident, an organization faces pressure to ensure this class of problem gets caught before it ships again. A mandatory review gate by the relevant team is a response to that pressure. It is also, functionally, a mechanism for spreading responsibility for future failures across more people so that no individual or team bears the full accountability weight.

The incentive structure genuinely rewards gate creation. The team that adds a mandatory security review can point to that review when the next security incident occurs, regardless of whether that review process would have caught the specific defect. The team that removes a review gate takes on the risk that the next incident will be attributed to its removal. The asymmetry makes accumulation rational at the individual and team level even when it is harmful at the system level.

This is not cynical speculation about individual motives. It is a straightforward observation about how the incentives are structured. Nobody in the organization has a job that rewards them for removing a review layer. Several people have jobs that reward them for adding one.

What Good Review Looks Like

The code review conditions that produce high defect detection rates have a consistent profile across the research: one or two reviewers with domain authority over the changed code, change sets below 400 lines, review sessions under 90 minutes, and turnaround within a day. These conditions produce engaged reviewers who feel personally responsible for the quality of what they approve. The ownership is unambiguous, which makes the accountability real rather than diffuse.
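Those thresholds are simple enough to check mechanically. A minimal sketch, assuming a hypothetical `Review` record whose fields a team would have to fill in from its own tooling; the cutoffs are the ones listed above:

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewers: int
    changed_lines: int
    session_minutes: int
    turnaround_hours: float


def profile_warnings(review: Review) -> list:
    """List the ways a review departs from the high-detection profile."""
    warnings = []
    if not 1 <= review.reviewers <= 2:
        warnings.append("expected one or two reviewers")
    if review.changed_lines >= 400:
        warnings.append("change set is 400 lines or more")
    if review.session_minutes >= 90:
        warnings.append("review session ran 90 minutes or longer")
    if review.turnaround_hours > 24:
        warnings.append("turnaround exceeded one day")
    return warnings


print(profile_warnings(Review(reviewers=4, changed_lines=1200,
                              session_minutes=120, turnaround_hours=72)))
```

A check like this only flags the shape of a review. It says nothing about whether the reviewer actually felt responsible for the result.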

Organizations that have moved from multi-layer approval toward concentrated, clear ownership typically pair this shift with expanded automated pre-checks. Type checking, linting, security scanning, and test coverage requirements handle the mechanical questions that human reviewers were previously approving by checkbox. Human review shrinks to the surface area where genuine judgment is required, and the reviewer who is covering that surface area knows they are the last line before it ships.

Reducing the number of review layers does not reduce accountability. It concentrates accountability instead of distributing it, which produces more of it in practice even as the headcount of approvers drops.

The Double Indictment

The throughput argument against review layers is well understood in engineering productivity circles: each sequential queue compounds latency, and four approval stages can turn a two-hour change into a two-week cycle. The quality argument runs in the same direction but gets made far less often. More reviewers means slower delivery and, past a small threshold, worse defect detection per reviewer. Both failure modes stem from the same structural problem: review is being used to distribute accountability rather than concentrate expertise.
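The latency side of that argument is plain addition. The sketch below assumes stages run strictly in sequence and that each stage adds a queue wait before a short review. The wait and review times are invented to reproduce the two-hour-to-two-week example and are not measured data:

```python
HOURS_PER_WORKDAY = 8


def cycle_time_hours(work_hours, stages):
    """Elapsed working hours: authoring time plus each stage's wait and review."""
    return work_hours + sum(wait + review for wait, review in stages)


# Four sequential approval stages, each with a hypothetical 19-hour queue wait
# followed by a 30-minute review.
stages = [(19.0, 0.5)] * 4
total = cycle_time_hours(2.0, stages)
print(total / HOURS_PER_WORKDAY, "working days")  # prints "10.0 working days"
```

Only four of those eighty hours involve anyone touching the change. The rest is time spent waiting in queues.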

Pennarun’s framing puts the throughput cost front and center because it is the most legible. The quality cost is harder to measure because you cannot easily count the defects that reviewers missed due to diffusion of responsibility. What the code review research does show, consistently, is that review effectiveness per person falls as reviewer count rises. The organization adding a fourth required approver is not buying twice as much protection as the one with two approvers; it is buying marginal additional coverage at substantially more cost, with each approver individually less engaged than either of the two would have been.

The review layer question is usually framed as a velocity tradeoff: how much slowdown is acceptable in exchange for quality assurance. The research suggests the exchange is less favorable than that framing implies. Past a small number of reviewers, additional approval layers buy marginal quality improvement while imposing substantial latency costs. Concentrated review with clear ownership produces faster cycles and more effective defect detection, because it aligns individual accountability with the outcome the review is supposed to deliver. The DORA research, which draws on survey responses from tens of thousands of engineering professionals, has not found a population of teams that achieved high quality through high review burden. It has found teams that achieved high quality through short feedback loops, small changes, and clear ownership, which are the conditions that make individual accountability legible rather than diffuse.