
The Four-Day PR Cycle With Twenty Minutes of Coding

Source: lobsters

The Debugging Leadership piece that circulated on Lobsters recently opens with a number worth sitting with: across teams surveyed by LinearB, the average PR spends about 4.4 days from open to merge, with the active coding portion totaling roughly 20 minutes. That puts active coding at roughly 0.3 percent of total cycle time in the average case.

This single data point reframes the entire discussion about AI coding tools. GitHub Copilot can meaningfully reduce that 20-minute coding phase. Cursor’s repository-aware context helps developers write code that fits an existing codebase faster. These are genuine improvements to a phase that accounts for roughly 0.3 percent of total cycle time in the LinearB dataset. The rest of the cycle sits in queues: waiting for a first review, waiting for comments to be addressed, waiting for CI, waiting for someone to merge.
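The arithmetic is easy to check. The sketch below uses only the two LinearB figures quoted above; the 50% coding speedup is a hypothetical assumption chosen for illustration, not a measured effect of any tool.

```python
# Figures quoted in the text: about 4.4 days open-to-merge, about 20 minutes of coding.
CYCLE_MINUTES = 4.4 * 24 * 60
CODING_MINUTES = 20.0


def coding_share(coding_minutes: float, cycle_minutes: float) -> float:
    """Fraction of total cycle time spent actively coding."""
    return coding_minutes / cycle_minutes


def cycle_after_speedup(coding_minutes: float, cycle_minutes: float, speedup: float) -> float:
    """Cycle time if only the coding phase shrinks by the fraction `speedup` (0 to 1)."""
    return cycle_minutes - coding_minutes * speedup


share = coding_share(CODING_MINUTES, CYCLE_MINUTES)

# Hypothetical assumption: a tool halves coding time and changes nothing else.
new_cycle = cycle_after_speedup(CODING_MINUTES, CYCLE_MINUTES, 0.5)
saved = 1 - new_cycle / CYCLE_MINUTES

print(f"coding share of cycle time: {share:.2%}")
print(f"cycle time saved by halving coding time: {saved:.2%}")
```

Under that assumption, halving the coding phase removes ten minutes from a cycle of more than six thousand, so the saving is about half of the coding share itself.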

The Math of Sequential Review Queues

The article introduces Little’s Law in its simplified form: Cycle Time = Work In Progress / Throughput. This captures the intuition behind WIP limits but undersells how sharply latency grows once reviewers are busy and arranged in sequence.
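As a quick illustration of the simplified form, here is a minimal sketch. The team size and throughput are hypothetical numbers picked so that one row lands on the 4.4-day figure; they are not taken from the LinearB data.

```python
def cycle_time_days(wip: float, throughput_per_day: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


# Hypothetical team that merges 5 PRs per day on average.
MERGES_PER_DAY = 5.0

for open_prs in (10, 22, 40):
    days = cycle_time_days(open_prs, MERGES_PER_DAY)
    print(f"{open_prs} open PRs -> {days:.1f} days average cycle time")
```

With throughput held fixed, cycle time scales linearly with the number of open PRs, which is the case for WIP limits in one line.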

The M/M/1 queuing model is more precise about what happens at individual review stages. Average wait time in queue is:

W_q = ρ / (μ(1 − ρ))

where ρ is utilization (arrival rate divided by service rate) and μ is the service rate. At 50% reviewer utilization, the expected wait equals the mean service time, 1/μ. At 80%, it is four times the service time. At 90%, nine times. At 95%, nineteen times. Most reviewers at established organizations are operating well above 50% utilization.
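The multipliers follow directly from the formula. A minimal sketch, assuming the M/M/1 model applies (random arrivals, a single reviewer, exponentially distributed review times) and normalizing the service rate to 1 so the result reads as a multiple of service time:

```python
def mm1_wait_in_queue(arrival_rate: float, service_rate: float) -> float:
    """Mean wait in queue for an M/M/1 system: W_q = rho / (mu * (1 - rho))."""
    rho = arrival_rate / service_rate
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1 for a stable queue")
    return rho / (service_rate * (1 - rho))


SERVICE_RATE = 1.0  # one review per unit time, so mean service time is 1.0

for rho in (0.50, 0.80, 0.90, 0.95):
    wait = mm1_wait_in_queue(rho * SERVICE_RATE, SERVICE_RATE)
    print(f"utilization {rho:.0%}: wait = {wait:.1f}x service time")
```

This prints multipliers of 1.0, 4.0, 9.0, and 19.0. Real review queues are not exactly M/M/1, so the numbers are indicative of the shape of the curve rather than a prediction.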

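Sequential reviewers make this worse in two ways. In the simplest tandem-queue model, expected waits at successive stages add rather than multiply: two reviewers in sequence, each at 80% utilization, produce an expected wait of 4x + 4x = 8x the service time, three produce 12x, and four produce 16x. But each term in that sum is itself nonlinear in utilization, and every added approval requirement puts more load on some reviewer. Pushing a single stage from 80% to 90% raises its wait from 4x to 9x, so a chain with one overloaded reviewer is dominated by that stage. Donald Reinertsen’s analysis of product development flow reaches the same conclusion about utilization: high utilization of shared resources is the dominant source of latency in modern development pipelines, and code review is a shared resource by definition.

This is why CODEOWNERS files, added one team at a time after various incidents, function as lead-time multipliers over time. Each additional required approval introduces another queuing stage with its own utilization rate, and because each stage’s wait grows nonlinearly with that rate, total wait climbs faster than the approval count alone suggests.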

Where Brooks’ Law Reappears

Fred Brooks argued in No Silver Bullet (1986) that the difficulty of software development does not live in translating requirements into syntax. It lives in understanding requirements, coordinating across teams, and managing change in live systems. His earlier book, The Mythical Man-Month (1975), popularized the counterintuitive result that adding developers to a late project makes it later: each added developer increases integration overhead and the load on shared coordination stages faster than it increases raw output.

AI coding tools reproduce this dynamic without adding headcount. Each increase in per-developer output raises the arrival rate to the review queue while leaving review capacity unchanged. Swarmia’s research on PR size finds that pull requests with more than 400 lines of changes take roughly three times longer to merge than those under 50 lines. If AI tools help developers write larger changes faster, they compound the size effect on review time in addition to the volume effect.
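A rough sketch of the volume effect, reusing the M/M/1 formula quoted earlier. The review capacity and arrival rates are hypothetical numbers; the point is only that a modest rise in arrivals produces a disproportionate rise in waiting when capacity stays fixed.

```python
def wait_in_queue(arrival_rate: float, service_rate: float) -> float:
    """Mean M/M/1 wait in queue, in the same time unit as the rates."""
    rho = arrival_rate / service_rate
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1 for a stable queue")
    return rho / (service_rate * (1 - rho))


REVIEWS_PER_WEEK = 10.0  # hypothetical, fixed reviewer capacity

before = wait_in_queue(8.0, REVIEWS_PER_WEEK)  # 8 PRs/week arriving: 80% utilization
after = wait_in_queue(9.0, REVIEWS_PER_WEEK)   # 9 PRs/week arriving: 90% utilization

print(f"wait before: {before:.2f} weeks")
print(f"wait after:  {after:.2f} weeks")
print(f"arrivals up {9.0 / 8.0 - 1:.1%}, wait up {after / before - 1:.0%}")
```

In this toy case a 12.5% increase in PR volume more than doubles the expected wait, and that is before any size effect from larger changes.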

The DORA research program has tracked software delivery performance across thousands of organizations since 2014, and its most consistent finding is that elite performers maintain small batch sizes and fast review cycles while low performers accumulate large changes and long approval chains. The reported difference in lead time between elite and low performers is not explained by elite teams writing code faster; it is explained by how their delivery pipelines are structured. Elite teams deploy on demand with lead times in hours. Low-performing teams deploy monthly or less often, with lead times measured in months.

The Diagnostic Value of Faster Writing

There is one thing AI coding tools have done that is genuinely useful, though probably not by design: they have made the actual constraint visible.

When code writing was slow, organizations could attribute slow shipping to slow coding. A feature took three days to implement, so of course it took a week to ship. The review queue, CI pipeline, and deployment window were all obscured by the coding phase in front of them. Now that code generation is substantially faster, the 4.4-day PR cycle is exposed as a distinct problem rather than background noise. Teams that adopt generation tools and find their delivery frequency unchanged are receiving accurate information about their systems: the constraint was somewhere else.

This is the useful version of the DORA finding about observability. You cannot improve what you cannot see. AI coding tools, by eliminating writing time as a credible explanation for slow delivery, function as a crude diagnostic. The bottleneck becomes visible precisely because the step before it stops being an excuse.

Where AI Tooling Does Reach the Queue

Most AI coding tools do not address review latency directly; the exceptions are worth noting specifically.

GitHub’s Copilot PR summary feature and tools like CodeRabbit apply AI to the review stage rather than the writing stage. They remain advisory, requiring human approval, but PR summaries reduce the time a reviewer spends understanding what a change does, which reduces per-review service time, which reduces wait time at the queue. The improvement is smaller than the tools’ marketing suggests, but it is aimed at something real.

Repository-aware context tools that help developers write code matching existing patterns reduce the frequency of pattern-mismatch review comments. Research on code review practices at Google, documented in Software Engineering at Google, found that a substantial portion of review activity involves style, naming, and minor refactoring rather than catching functional bugs. Reducing that category of comment shortens the review cycle indirectly.

The more tractable improvements remain organizational. Swarmia’s data on PR size and merge time is directly actionable: pull requests under 50 lines merge roughly three times faster than those over 400 lines. Acting on that requires no new tooling. It requires discipline about how work is structured before writing starts, which is a different kind of practice entirely.

The Actual Levers

Explicit review SLAs matter more than most teams acknowledge. When there is no shared expectation about when a review should happen, it happens whenever a developer finds a free moment between their assigned work, which compounds the queue utilization problem. Google’s internal research found review latency correlated more strongly with developer satisfaction and throughput than almost any other single factor they measured. Their internal expectation was that reviews should happen within 24 hours, not because that is a special number but because having any explicit expectation forces review to be treated as a first-class task rather than background work.
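An explicit expectation is only useful if it is measured. The following is a minimal sketch of such a check, not any tool’s real API: the `PullRequest` record and its fields are hypothetical, the sample data is invented, and the check counts wall-clock hours rather than business hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVIEW_SLA = timedelta(hours=24)  # the 24-hour expectation mentioned above


@dataclass
class PullRequest:
    """Hypothetical record; field names are illustrative, not a real schema."""
    number: int
    opened_at: datetime
    first_review_at: Optional[datetime]  # None if nobody has reviewed yet


def breaches_sla(pr: PullRequest, now: datetime, sla: timedelta = REVIEW_SLA) -> bool:
    """True if the first review took, or has so far taken, longer than the SLA."""
    reviewed_or_now = pr.first_review_at or now
    return reviewed_or_now - pr.opened_at > sla


now = datetime(2024, 1, 10, 12, 0)
prs = [
    PullRequest(1, datetime(2024, 1, 9, 9, 0), datetime(2024, 1, 9, 15, 0)),  # reviewed after 6h
    PullRequest(2, datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 10, 9, 0)),  # reviewed after 48h
    PullRequest(3, datetime(2024, 1, 9, 10, 0), None),                        # waiting 26h so far
]

late = [pr.number for pr in prs if breaches_sla(pr, now)]
print("PRs past the review SLA:", late)
```

Counting still-unreviewed PRs against the clock matters here: a report that only looks at completed reviews hides exactly the items that are stuck in the queue.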

Mik Kersten’s measurement of flow efficiency in enterprise software organizations, published in Project to Product (2018), found that less than 15% of total lead time was spent on active work. The rest was waiting. DORA’s research shows the same structural pattern across a much larger sample. Teams that optimize lead time do so by attacking review latency, deployment friction, and batch size. Teams that optimize for approval gates end up with both longer lead times and higher failure rates, because large batches create integration complexity that small batches avoid.

The article’s conclusion holds. Writing code faster gets you to the queue sooner, and it makes the queue visible as the problem it already was. The constraint was almost always downstream of the editor, in the handoffs and approval chains that code passes through after writing finishes. AI coding tools have made that more legible than years of process audits and retrospectives managed to.
