The Inner Loop Is Not the Constraint

The article circulating on Lobsters this week makes an argument worth sitting with: if AI tools have made you faster at writing code, and your team is still slow at shipping, you probably identified the wrong problem. The original post puts it plainly: faster coding without addressing what surrounds it is not an improvement to delivery speed. It is an improvement to queue-building speed.

This deserves more than a nod. The reasoning connects to a body of evidence that software teams consistently ignore, and the math showing why ignoring it is self-defeating is worth spelling out.

Where Time Actually Goes

Multiple independent surveys converge on roughly the same picture. A McKinsey 2021 developer velocity survey found that developers spend around 35% of their time on work that directly produces code: writing, debugging, running local tests. The other 65% goes to meetings, requirements clarification, reading existing code to understand it, waiting on CI pipelines, handling review comments, and navigating deployment processes. A 2018 Stripe survey put time spent on technical debt and maintenance at 42% of the working week.

These numbers differ somewhat depending on methodology and sample, but the direction they point is consistent: code authorship is a minority share of the total delivery pipeline. GitHub’s own research on Copilot found developers completing isolated coding tasks 55% faster. That is a real result, but it measures a single self-contained task, not delivery. If coding represents 35% of total delivery time and tools cut that time by 55%, the reduction in overall lead time is roughly 0.55 × 0.35, about 19%. The other 65% of the pipeline is unchanged.

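That 19% is the best case. It assumes every hour saved in coding comes straight off the lead time, which holds only if coding was the constraint. If it wasn’t, the saved time is spent waiting in a downstream queue, and the actual improvement to delivery speed is zero.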
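The arithmetic above can be written out as a short sketch. The function name is illustrative, the 35% and 55% inputs are the survey figures quoted above, and the model assumes strictly sequential stages with no queueing, which is the most favorable case for the tool.

```python
def lead_time_reduction(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total lead time saved when only the coding step gets faster.

    coding_share:   fraction of total delivery time spent on coding (0..1)
    coding_speedup: fraction by which coding time shrinks (0..1)
    Assumes stages run in sequence and every other stage is unchanged.
    """
    if not (0.0 <= coding_share <= 1.0 and 0.0 <= coding_speedup <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return coding_share * coding_speedup


saved = lead_time_reduction(coding_share=0.35, coding_speedup=0.55)
print(f"Best-case lead time reduction: {saved:.0%}")  # about 19%
```

Plugging in your own team's coding share is the quickest way to see how small the ceiling is before any queueing effects are considered.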

The Theory of Constraints Applied Directly

Eliyahu Goldratt’s Theory of Constraints, developed in manufacturing and described at length in his novel The Goal, holds that at any given time a system’s throughput is limited by its binding constraint, and optimizing anything that is not that constraint does not improve system throughput. It only builds inventory. In software, inventory is work in progress: open pull requests, tickets in review, features built but not yet deployed.

If review is the constraint, writing code faster makes the review queue longer. Reviewers face more open PRs in the same amount of time. Cognitive load increases. Quality degrades. The SmartBear code review research found that defect detection drops sharply above around 500 lines reviewed per hour. Reviewers reading fast are mostly not catching the bugs review is supposed to catch; they are producing the feeling of oversight while the actual substance evaporates.

A team adopting AI coding tools without addressing review throughput may end up worse than before, because the PR queue grows faster and reviewer attention per PR shrinks.
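A toy model makes the inventory effect concrete. This is a minimal sketch under strong assumptions: the rates are invented, arrivals and reviews are perfectly steady, and there is no rework. The function names are hypothetical.

```python
def shipped_per_week(author_rate: float, review_rate: float) -> float:
    """Throughput of a two-stage pipeline is capped by its slower stage."""
    return min(author_rate, review_rate)


def simulate_backlog(author_rate: float, review_rate: float, weeks: int) -> list[float]:
    """Open-PR backlog at the end of each week (deterministic, no variability)."""
    backlog = 0.0
    history = []
    for _ in range(weeks):
        backlog += author_rate                # new PRs opened this week
        backlog -= min(backlog, review_rate)  # reviewers clear what they can
        history.append(backlog)
    return history


# Hypothetical team: 10 PRs/week written and reviewed, then authoring rises to 15.
print(shipped_per_week(10, 10), simulate_backlog(10, 10, 4))
print(shipped_per_week(15, 10), simulate_backlog(15, 10, 4))
```

In this model the second team ships exactly as many PRs per week as the first. The only thing that changes is the backlog, which grows by five PRs every week.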

What the DORA Data Shows

The DORA research program, now in its tenth year and maintained by Google, tracks software delivery performance across thousands of teams. Their metrics are not about coding speed: they measure lead time for changes (commit to production), deployment frequency, change failure rate, and mean time to restore service.
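For a sense of how these metrics are computed, here is a minimal sketch over a hypothetical deployment log. The Deployment record, its field names, and the sample data are assumptions made for illustration and are not DORA's own tooling. Time to restore is omitted because it needs incident data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    caused_failure: bool    # whether it needed a rollback or hotfix


def delivery_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Median lead time (hours), deploys per day, and change failure rate."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = sum(d.caused_failure for d in deployments)
    return {
        "median_lead_time_hours": median(lead_hours),
        "deploys_per_day": len(deployments) / window_days,
        "change_failure_rate": failures / len(deployments),
    }


# Invented sample: four deployments in a seven-day window.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11), False),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 15), True),
    Deployment(datetime(2024, 1, 3, 9), datetime(2024, 1, 4, 9), False),
    Deployment(datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 14), False),
]
metrics = delivery_metrics(sample, window_days=7)
print(metrics)
```

Nothing in the calculation looks at how long the code took to write. The clock starts at commit.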

The 2023 and 2024 State of DevOps reports show that elite performers deploy on demand, often multiple times per day, with lead times under one day. Low performers deploy monthly or less with lead times measured in weeks to months. The gap between these groups is not explained by how fast developers write code. It is explained by CI/CD pipeline maturity, automated test coverage, deployment automation, batch sizes, and cultural norms around review and ownership.

From the Accelerate research by Nicole Forsgren, Jez Humble, and Gene Kim, the finding that has held up consistently is that short lead times and high deployment frequency are not achieved by trading off quality. They are correlated with lower change failure rates. Teams shipping fastest are also the most stable. The mechanism is feedback speed: short cycles surface defects quickly, limit the scope of any single failure, and keep engineers from losing mental context across long-lived branches.

Trunk-based development is one of the most consistent differentiators in the DORA data. Committing to a shared branch multiple times daily, or using branches shorter-lived than a day, enforces small batches and keeps review latency structurally low. A deep review queue has little room to form when changes are small and frequent.

Queuing Theory Makes This Precise

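Little’s Law states that the average number of items in a queue equals the arrival rate times the average time each item spends in it. Rearranged, the average wait equals the open PR count divided by throughput. If AI tools double the rate at which PRs are opened without changing reviewer throughput, the extra PRs accumulate as open work, and with review capacity fixed, every added open PR lengthens the average wait in proportion. If arrivals stay above review capacity, the queue never stabilizes; it keeps growing.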
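Little's Law can be turned into a back-of-the-envelope estimate. This is a minimal sketch, assuming a queue in steady state where PRs are merged at the same average rate they arrive. The function name and the numbers are illustrative.

```python
def average_wait_days(open_prs: float, merged_per_day: float) -> float:
    """Little's Law rearranged: W = L / lambda.

    open_prs:       average number of PRs open at any moment (L)
    merged_per_day: long-run throughput, equal to arrival rate when stable (lambda)
    """
    if merged_per_day <= 0:
        raise ValueError("throughput must be positive")
    return open_prs / merged_per_day


# Hypothetical team merging 5 PRs per day.
print(average_wait_days(open_prs=20, merged_per_day=5))  # 4.0 days
print(average_wait_days(open_prs=40, merged_per_day=5))  # 8.0 days
```

Both inputs are easy to read off a repository dashboard, which makes this a cheap check on whether review is where changes spend their time.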

The utilization cliff is sharper than intuition suggests. In an M/M/1 queue model, a reviewer at 80% utilization produces an average wait four times the service time. At 90% utilization the multiplier is nine. At 95%, it is nineteen. Across sequential reviewers these waits add up: in the standard model of independent stages, two reviewers at 80% utilization each leave a change waiting about eight service times in total, and three leave it waiting about twelve. Pushing any one of them from 80% to 90% more than doubles that stage’s wait. Donald Reinertsen covers this in Principles of Product Development Flow: high utilization of shared resources is the dominant source of latency in any development process.
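The single-reviewer multipliers follow from the textbook M/M/1 result that mean time waiting in the queue is rho / (1 - rho) service times, where rho is utilization. The sketch below evaluates that formula. It assumes random (Poisson) arrivals and exponentially distributed review times, which real review queues only approximate.

```python
def mm1_wait_multiplier(utilization: float) -> float:
    """Mean queueing delay in an M/M/1 queue, in units of mean service time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95):
    multiplier = mm1_wait_multiplier(rho)
    print(f"{rho:.0%} busy -> average wait is {multiplier:.1f}x the review time")
```

The curve is nearly flat at low utilization and close to vertical near 100%, which is why a small increase in PR volume can produce a large jump in wait time for a reviewer who was already busy.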

No one designs review chains to be sequential bottlenecks. They accumulate. Each gate was added in response to a past incident; each one is individually defensible; none are systematically removed when the original justification no longer applies. The incentive structure strongly favors addition and offers no mechanism for subtraction.

The Open Source Asymmetry

The problem extends outside team boundaries. Before large language models, the cost of submitting a patch was roughly proportional to the cost of reviewing one. A contributor who understood the code well enough to write a useful fix could usually produce something reviewable without much wasted effort on either side. That symmetry has broken.

A developer with no prior exposure to a project can now generate a syntactically plausible, test-passing patch in minutes. A maintainer reviewing it must still do all the deep contextual work: verifying that the change fits architectural conventions, checking edge cases the tests don’t cover, understanding whether the approach is consistent with ongoing design decisions. The contributor’s cost approached zero. The reviewer’s cost did not move.

At scale, if an open source project’s PR intake doubles because code generation became cheap and maintainer bandwidth stays fixed, the project does not ship twice as much. It ships less, because maintainers spend more time evaluating contributions that don’t pan out. Research on SWE-bench has found that many technically passing patches would not be merged in practice, because passing tests and being mergeable are different conditions.

Where the Leverage Actually Lives

The SPACE framework, published in 2021 by researchers at GitHub, Microsoft, and the University of Victoria, warns that activity metrics (commits, lines added, PR count) are misleading measures of developer productivity when taken in isolation. The dimensions most closely tied to delivery are efficiency and flow, read alongside communication and collaboration.

Empirically, the changes that improve delivery lead time are: reducing PR size so review is tractable, reducing review latency through explicit SLAs or clear ownership models, investing in CI pipelines that complete under ten minutes, deploying feature flags to decouple deployment from release, and building observability sufficient to catch problems quickly in production. Faster code authorship sits somewhere below all of these.

Building Discord bots over the past couple of years, I have been the only developer, so the review queue problem is absent by construction. But even in solo work, the non-coding work consistently dominates: understanding library behavior, debugging async event ordering, waiting on API responses during local testing, figuring out why a rate limiter is triggering in unexpected places. The coding step is fast. Everything adjacent to it is where time goes.

The AI coding tools available today are genuinely useful for the inner loop, and they deliver on the benchmarks they’re measured against. The argument here is not that they underperform their claims. The argument is that those claims measure the wrong portion of the pipeline for most teams. If you can identify your actual constraint and that constraint is code authorship speed, the tools help directly. If the constraint is anywhere else in the delivery pipeline, faster code generation feeds a queue that was already full.