AI Coding Tools Are Making the Review Bottleneck Impossible to Ignore

Source: lobsters

The pattern has become common enough to study. Engineering teams adopt AI coding assistants, individual output climbs, and then the release calendar barely moves. More code gets written, the review queue grows, and the time between “done” and “deployed” stays flat or increases. This counterintuitive outcome is exactly what a Debugging Leadership post points toward: most teams were never constrained by how fast engineers write code. The observation itself is not new, but AI coding tools have made it viscerally visible by exposing the mismatch at scale.

What DORA Measures and Why It Matters

The DORA program has tracked software delivery performance since 2014. Their four key metrics, codified in Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim, are deployment frequency, lead time for changes, change failure rate, and time to restore service. None of these measure coding speed.

Lead time for changes is the metric closest to writing speed: the elapsed time from a commit to that commit running in production. But that window encompasses code review wait time, CI pipeline duration, environment provisioning, deployment steps, and manual approval gates. In most teams, time spent writing code is a small slice of the end-to-end total; review wait time and pipeline latency dominate.
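As a rough sketch of how these metrics fall out of delivery data, the Python below computes all four from a list of deployment records. The `Deployment` record shape, its field names, and the sample values are assumptions made for illustration; real data would come from a team’s version control and deployment tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, not taken from any specific tool.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # only meaningful when failed is True


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four key metrics over an observation window of window_days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample: four changes committed at the same moment, deployed later.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=30)),
    Deployment(t0, t0 + timedelta(hours=50), failed=True,
               restored_at=t0 + timedelta(hours=52)),
    Deployment(t0, t0 + timedelta(hours=70)),
    Deployment(t0, t0 + timedelta(hours=90)),
]
metrics = dora_metrics(sample, window_days=7)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Nothing in the inputs records how long the code took to write. Every field is a timestamp or outcome from after the commit.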

The DORA research consistently finds that elite teams achieve lead times measured in hours to days. What separates them from median teams is not faster typing. It is smaller changes, short-lived branches, automated pipelines, and review cycles that close in under a day.

The Queue Has Physics

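There is a mathematical principle worth applying here. Little’s Law from queuing theory states that the average number of items in a system equals the average arrival rate multiplied by the average time each item spends in the system: L = λW. The law holds for any stable queue, so the three quantities cannot move independently. If the arrival rate rises while the service rate stays fixed, more items sit in the system at once and the average wait grows with them, faster than proportionally as the queue approaches capacity. If arrivals exceed the service rate outright, the backlog grows without bound. This is not a hypothesis; it is a mathematical property of queues.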

AI coding tools increase the arrival rate into the review queue. When engineers write more code faster, more pull requests get opened. If review capacity does not scale, each PR waits longer. The delivery calendar gets worse even as individual output metrics improve.
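The effect on a review queue can be sketched numerically. The example below is a minimal illustration that assumes the queue behaves like a single-server M/M/1 queue (random arrivals, exponentially distributed review times), which real teams only approximate. The rates, in PRs per day, are made-up values rather than measurements. Under that assumption the average time in the system is W = 1/(μ − λ), and Little’s Law then gives the number of PRs in flight.

```python
def average_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time a PR spends waiting plus in review, for an M/M/1 queue.

    Both rates are in PRs per day, so the result is in days.
    """
    if arrival_rate >= service_rate:
        raise ValueError("arrivals meet or exceed capacity: the backlog grows without bound")
    return 1.0 / (service_rate - arrival_rate)


def average_items_in_system(arrival_rate: float, service_rate: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate * average_time_in_system(arrival_rate, service_rate)


REVIEW_CAPACITY = 10.0  # hypothetical: PRs the team can review per day

for prs_per_day in (5.0, 8.0, 9.0, 9.5):
    wait = average_time_in_system(prs_per_day, REVIEW_CAPACITY)
    depth = average_items_in_system(prs_per_day, REVIEW_CAPACITY)
    print(f"{prs_per_day:>4} PRs/day -> {wait:.2f} days in system, {depth:.1f} PRs in flight")
```

With these illustrative numbers, going from 5 to 9 PRs per day (an 80% increase in arrivals) raises the average time in the system from 0.2 days to 1 day, a fivefold increase.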

GitClear’s 2024 analysis of GitHub activity across millions of commits found that AI-assisted repositories showed higher PR volumes and increased code churn rates. Merge velocity did not scale proportionally. The review step absorbed the additional output as queue depth rather than as throughput.

The Incentive Mismatch

The reason this persists is structural. Engineers are evaluated on visible output: features shipped, tickets closed, PRs merged. Code review is treated as overhead, something layered on top of primary delivery responsibilities. Reviewers carry their queue alongside their own commitments, which means review quality and cycle time degrade as PR volume grows.

In manufacturing, Eliyahu Goldratt’s Theory of Constraints would identify the review step as the system constraint and direct optimization effort there. The rule is to subordinate everything else to the constraint, because optimizing a non-bottleneck step increases work in progress without increasing throughput.

Software teams routinely invert this. They optimize before the constraint, writing code faster, and watch WIP accumulate downstream. Higher WIP means larger diffs, more context switching for reviewers, more opportunities for merge conflicts, and increasing coordination overhead. The productivity gains from AI tools dissolve into this overhead.

What Elite Teams Do Differently

The DORA elite cohort shares a specific set of practices, and they are mostly about reducing change size and review cycle time rather than writing speed.

Trunk-based development keeps branches short-lived, often under a day. Diffs stay small and reviewable, merge conflicts are rare, and integration overhead stays low. The DORA research consistently identifies this as one of the strongest predictors of delivery performance.

Pair and mob programming eliminate the async review step for complex changes. The review happens inline during writing. This requires coordination but removes the queue entirely for work that benefits from it.

Automated quality gates handle the mechanical checks: linting, type errors, test coverage thresholds, security scanning. These narrow the reviewer’s job to what humans are actually better at: design review, logic verification, and domain correctness.

Feature flags decouple deployment from release. When a change can be deployed dark and toggled on independently, the deployment risk that inflates review scrutiny goes down. Reviewers spend less time on rollback scenarios and more time on correctness.
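A minimal sketch of the idea, assuming an in-memory flag store with percentage rollout: the `FeatureFlags` class, the flag name, and the shipping rule are all hypothetical, and a production setup would more likely read flags from configuration or a dedicated flag service.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self) -> None:
        self._rollout_percent: dict[str, int] = {}

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_percent[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        # Unknown flags default to 0%, so newly deployed code stays dark.
        percent = self._rollout_percent.get(flag, 0)
        # Hash flag and user together so each user lands in a stable bucket.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


def shipping_cost(order_total: float, flags: FeatureFlags, user_id: str) -> float:
    """Example call site: the new rule is deployed but runs only when enabled."""
    if flags.is_enabled("free_shipping_over_50", user_id):
        return 0.0 if order_total >= 50 else 5.0
    return 5.0


flags = FeatureFlags()
print(shipping_cost(80.0, flags, "user-1"))  # flag not set: old behaviour
flags.set_rollout("free_shipping_over_50", 100)
print(shipping_cost(80.0, flags, "user-1"))  # released without a new deployment
```

Because turning the flag back to 0 restores the old path, a reviewer can treat rollback as a configuration change rather than something the diff itself must guarantee.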

None of these practices require engineers to write code faster. The throughput improvement comes from redesigning the system around the constraint, not from accelerating the step before it.

AI Tools as a Forcing Function

The more useful framing for AI coding tools is as a forcing function. For teams that had already optimized review cycles and deployment pipelines, they provide real leverage: more changes get written and flow through a system built to handle them. For teams that had not done that work, AI tools accelerate input into a constrained system and make the constraint impossible to rationalize away.

There is an argument that this is net positive even when delivery does not immediately improve. Teams that could not see their review queue as the primary constraint now have data. The PR backlog is larger and harder to ignore. Lead time numbers are worse in ways that are attributable to a specific step. This creates organizational pressure to fix the right problem rather than the visible one.

Whether teams respond to that pressure productively is a different question, and largely an organizational one. Teams that have never measured lead time by stage will need to instrument their pipelines before they can even identify where time goes. Teams where code review is treated as a favor rather than a shared responsibility will need to restructure who owns it and how.

Where to Start

If a team is shipping slower than it wants to, the diagnostic question is: where does time actually go between starting a change and deploying it? A common breakdown looks something like this: writing takes a few hours, review takes one to three days, the pipeline takes fifteen to thirty minutes, and deployment waits another day because of process gates or release schedules.
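Putting rough numbers on that breakdown shows why writing speed has so little leverage. The sketch below uses assumed values drawn from the ranges above (4 hours for “a few hours” of writing, 48 hours for review, 30 minutes for the pipeline, 24 hours of deployment wait). They are illustrative, not measurements from any team.

```python
# Stage durations in hours. Assumed values picked from the ranges in the
# text ("a few hours" is taken as 4), not measured data.
STAGES = {
    "writing": 4.0,
    "review": 48.0,
    "pipeline": 0.5,
    "deploy_wait": 24.0,
}


def stage_shares(stages: dict[str, float]) -> dict[str, float]:
    """Return each stage's fraction of total lead time."""
    total = sum(stages.values())
    return {name: hours / total for name, hours in stages.items()}


def saving_from_speedup(stages: dict[str, float], stage: str, factor: float) -> float:
    """Fraction of total lead time saved if one stage becomes `factor` times faster."""
    total = sum(stages.values())
    saved_hours = stages[stage] * (1 - 1 / factor)
    return saved_hours / total


for name, share in stage_shares(STAGES).items():
    print(f"{name:<12} {share:6.1%}")

print(f"2x faster writing saves {saving_from_speedup(STAGES, 'writing', 2):.1%}")
print(f"2x faster review saves  {saving_from_speedup(STAGES, 'review', 2):.1%}")
```

Under these assumptions, doubling writing speed trims roughly 2.6% off total lead time, while doubling review speed trims roughly 31%.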

The intervention that moves the lead time number is almost never “write code faster.” It is reducing review cycle time, automating more of the pipeline, shrinking change sizes, or eliminating manual gates that exist for historical reasons rather than current risk.

These are harder problems than installing a coding assistant. They require changing how teams work together, not just how individuals produce output. The Debugging Leadership post is right that this is a bigger problem, and the reason it is bigger is that the solution requires coordination across a team rather than a single engineer updating their toolchain.

That coordination cost is also why the payoff is large. Relieving the code review constraint raises the delivered output of every engineer, not just those who already write quickly.
