There’s a version of the AI coding productivity story that goes like this: developers spend most of their time writing code, AI tools make writing code faster, therefore AI tools make developers more productive. The conclusion sounds reasonable until you look at where development time actually goes.
Andrew Murphy’s article cuts straight to it: if you thought code-writing speed was your bottleneck, you were already misdiagnosing the problem. With 303 points and over 200 comments on Hacker News, it clearly struck a nerve. The argument isn’t new, but the timing is pointed. Teams are spending real money on Copilot seats, Cursor subscriptions, and Claude Code licenses while their PR queues stretch to three-day turnarounds and their deploy pipelines require four approvals from three time zones.
What the Metrics Actually Show
The DORA (DevOps Research and Assessment) program has been measuring software delivery performance for over a decade. The four key metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. Of these, lead time for changes — measured from the moment a commit is made to when it’s running in production — is the most diagnostic for understanding where delay accumulates.
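Once you have two timestamps per change, lead time for changes is simple arithmetic. Here is a minimal Python sketch. It assumes you can export each change’s commit time and the time it was running in production. The `changes` records are made-up sample data. Summarizing with the median is my own choice, because a few stuck changes would otherwise dominate a mean.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (commit time, time the change was running in production).
changes = [
    (datetime(2024, 5, 6, 9, 0), datetime(2024, 5, 6, 15, 30)),
    (datetime(2024, 5, 6, 11, 0), datetime(2024, 5, 9, 10, 0)),
    (datetime(2024, 5, 7, 14, 0), datetime(2024, 5, 8, 9, 0)),
]

def lead_times_hours(records):
    """Lead time for each change, commit -> production, in hours."""
    return [(deployed - committed) / timedelta(hours=1) for committed, deployed in records]

hours = lead_times_hours(changes)
print(f"median lead time: {median(hours):.1f} h")  # median lead time: 19.0 h
```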
High-performing teams, per the 2024 State of DevOps Report, achieve lead times measured in hours. Low performers measure in weeks to months. The gap isn’t explained by typing speed. It’s explained by how long code sits waiting: waiting for review, waiting for CI to finish, waiting for a deployment window, waiting for sign-off from someone in a meeting.
In most organizations, active coding time accounts for only a minority of total lead time. A developer might spend two hours writing a feature, then watch that feature sit for forty-eight hours in a PR queue. AI tools compress the two hours. They do nothing about the forty-eight.
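The arithmetic is worth spelling out because it has the same shape as Amdahl’s law: speeding up one stage only shrinks that stage’s share of the total. This quick sketch uses the illustrative split of two hours coding and forty-eight hours waiting:

```python
def lead_time_after_speedup(coding_h: float, waiting_h: float, coding_speedup: float) -> float:
    """Total lead time when only the coding stage gets faster; waiting time is untouched."""
    return coding_h / coding_speedup + waiting_h

before = lead_time_after_speedup(2, 48, 1)  # 50.0 h
after = lead_time_after_speedup(2, 48, 2)   # 49.0 h with coding twice as fast
print(f"{before:.0f} h -> {after:.0f} h ({1 - after / before:.0%} faster)")  # 50 h -> 49 h (2% faster)
```

Doubling coding speed buys a 2% shorter lead time. Even infinitely fast coding caps out at 4%, because the forty-eight hours of waiting are still there.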
The Perverse Effect on Review Queues
Faster code generation doesn’t just leave the review bottleneck intact; it puts more pressure on it. When a developer can scaffold a feature in an afternoon that would have taken two days, they open more PRs per week. Those PRs need reviewers. If review capacity hasn’t scaled, the queue gets longer, not shorter.
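A toy model makes the dynamic concrete. This sketch assumes a fixed number of PRs arriving each day and a fixed daily review capacity. Both numbers are hypothetical. Real arrivals are bursty, which generally makes queues worse, not better.

```python
def backlog_over_time(arrivals_per_day: int, reviews_per_day: int, days: int, start: int = 0) -> list[int]:
    """Open-PR backlog at the end of each day with fixed review capacity (deterministic toy model)."""
    backlog = start
    history = []
    for _ in range(days):
        backlog += arrivals_per_day
        backlog -= min(backlog, reviews_per_day)  # reviewers can't clear more PRs than exist
        history.append(backlog)
    return history

print(backlog_over_time(arrivals_per_day=4, reviews_per_day=5, days=5))  # [0, 0, 0, 0, 0]
print(backlog_over_time(arrivals_per_day=7, reviews_per_day=5, days=5))  # [2, 4, 6, 8, 10]
```

While review capacity exceeds arrivals, the backlog stays at zero. Once arrivals pass capacity, the backlog climbs by the difference every day and never recovers on its own.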
This is a throughput problem in the classic queueing-theory sense. Little’s Law states that the average number of items in a system equals the average arrival rate multiplied by the average time each item spends in the system. If PRs start arriving faster than reviewers can clear them, the number of open PRs keeps growing, and by Little’s Law so does the time each one waits. AI tools that increase coding output without addressing review bandwidth are, in the worst case, making shipping slower.
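Rearranged, Little’s Law gives the average time a PR spends in the system: W = L / λ. In a stable system, PRs leave at the same rate they arrive, so measured review throughput can stand in for λ. The numbers below are illustrative, not taken from any real team.

```python
def average_time_in_system(avg_items_in_system: float, throughput_per_day: float) -> float:
    """Little's Law rearranged: W = L / lambda, in days. Assumes a stable, steady-state system."""
    return avg_items_in_system / throughput_per_day

print(average_time_in_system(30, 10))  # 3.0 days: 30 open PRs, 10 reviewed per day
print(average_time_in_system(60, 10))  # 6.0 days: same reviewers, twice the open PRs
```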
I’ve seen this pattern with Discord bot development. Writing the code for a new slash command or a guild event handler is fast, sometimes embarrassingly fast with modern tooling. The delays come from testing against live Discord API rate limits, verifying behavior in actual guild contexts, and coordinating with the server admins who own the test environments. None of those delays shrink when I write the initial implementation faster.
The Actual Bottlenecks
If coding speed isn’t the constraint, what is? The honest answer varies by organization, but a few candidates appear repeatedly.
Review latency is the most common culprit in knowledge-work software teams. PRs sit because reviewers are context-switching, because the codebase is large and review is cognitively expensive, or because ownership is diffuse and nobody feels clearly responsible. Tools like GitHub’s code review assignment and explicit reviewer rotation help, but they’re organizational interventions, not technical ones.
Deployment confidence is the second. Teams that deploy infrequently do so because deployment is expensive: it requires coordination, it risks instability, and it demands monitoring attention. Those costs depend on test suite quality, observability infrastructure, and rollback capability, not on how fast the code was written. Feature flags and trunk-based development address deployment confidence directly, but they require investment in infrastructure and cultural change.
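Feature flags let unfinished or risky code merge to trunk and deploy without being exposed to users. Below is a minimal percentage-rollout sketch. It assumes you roll your own flags rather than use a flag service, and the flag name, user IDs, and bucketing scheme are all illustrative. Hashing the flag name together with the user ID puts each user in a stable bucket, so raising the percentage only ever adds users.

```python
import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministic percentage rollout: a given user always gets the same answer for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent

# The new code path ships dark and is exposed to ~10% of users; rolling back means setting 0.
enabled = sum(flag_enabled("new-checkout", f"user-{i}", 10) for i in range(1000))
print(f"{enabled} of 1000 users see the new checkout")  # close to 10%, depending on the hash
```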
Organizational trust and approval chains are the third. In many companies, deploying to production requires sign-off from people who aren’t engineers. This is sometimes regulatory necessity, sometimes accumulated bureaucracy. Either way, it doesn’t respond to faster code generation. The approval chain has its own latency, and that latency is measured in business hours and meeting schedules.
PR size compounds all of the above. Large PRs are slower to review, more likely to have conflicts, harder to reason about, and more frightening to deploy. The solution is smaller, more frequent PRs, which again is a workflow and culture question, not a tooling question.
What High-Performing Teams Actually Do
Teams that consistently ship fast tend to have a few things in common that have nothing to do with how quickly they write code.
They deploy continuously, often dozens of times per day, using automated pipelines that run a comprehensive test suite and gate deployment on it passing. Google’s published engineering practices documentation describes a review culture in which reviewers respond quickly (within one business day at most) and changes are kept small. Combined with automated deployment, that is how code moves from commit to production in hours, not days.
They invest heavily in test infrastructure. Not test coverage as a metric to hit, but test suites that developers actually trust, that run fast enough to not break flow, and that fail clearly when something is wrong. A test suite that takes forty minutes and produces flaky results doesn’t provide deployment confidence. It provides the appearance of process.
They reduce the cost of reverting changes. If rolling back a bad deploy is a five-minute operation, the fear around each deploy drops substantially. Blue-green deployments, canary releases, and solid observability make reversion cheap. When reversion is cheap, teams are willing to ship more often.
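Canary releases are where cheap reversion gets automated. The sketch below shows one way a pipeline might decide to roll back: compare the canary’s error rate against the baseline fleet’s. The 2x ratio, the 500-request minimum, and the 0.1% baseline floor are hypothetical thresholds, not recommendations. Real canary analysis often weighs latency and other signals too, not just errors.

```python
def should_roll_back(baseline_errors: int, baseline_requests: int,
                     canary_errors: int, canary_requests: int,
                     max_ratio: float = 2.0, min_requests: int = 500) -> bool:
    """Roll back if the canary's error rate exceeds max_ratio times the baseline's.

    All thresholds are illustrative assumptions.
    """
    if canary_requests < min_requests:
        return False  # not enough canary traffic yet to judge; keep watching
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    # Floor the baseline at 0.1% so a zero-error baseline doesn't turn one error into a rollback.
    return canary_rate > max_ratio * max(baseline_rate, 0.001)

print(should_roll_back(10, 10_000, 30, 1_000))  # True: 3% canary errors vs 0.1% baseline
```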
They keep review cycles short through structural means: mandatory response SLAs for reviews, pairing sessions instead of async review for complex changes, and explicit ownership so that “someone will review it” doesn’t become “nobody owns reviewing it.”
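A review SLA is only useful if someone checks it. Here is a minimal sketch that assumes a 24-hour wall-clock SLA on the first review response. The `PullRequest` record is something you would populate from your own code host. The class, its field names, and the sample data are hypothetical, not any platform’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class PullRequest:
    number: int
    reviewer: str                                 # explicit owner of the review
    review_requested_at: datetime
    first_response_at: Optional[datetime] = None  # None means no response yet

def sla_breaches(prs, now: datetime, sla: timedelta = timedelta(hours=24)):
    """Return PRs whose first review response is still missing past the SLA, or came too late."""
    breaches = []
    for pr in prs:
        responded = pr.first_response_at or now  # unanswered PRs are measured up to "now"
        if responded - pr.review_requested_at > sla:
            breaches.append(pr)
    return breaches

now = datetime(2024, 5, 10, 12, 0)
prs = [
    PullRequest(101, "alice", datetime(2024, 5, 9, 9, 0), datetime(2024, 5, 9, 13, 0)),   # 4 h
    PullRequest(102, "bob", datetime(2024, 5, 8, 9, 0)),                                  # 51 h, waiting
    PullRequest(103, "carol", datetime(2024, 5, 7, 10, 0), datetime(2024, 5, 8, 16, 0)),  # 30 h
]
for pr in sla_breaches(prs, now):
    print(pr.number, pr.reviewer)  # 102 bob, then 103 carol
```

A real version would probably count business hours and notify the named reviewer directly, which is the point of explicit ownership.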
The Diagnostic Value of Faster Coding
There’s one way AI coding tools are genuinely useful beyond raw output speed: they make the real constraints visible and undeniable. When code writing was slow, it was easy for organizations to attribute slow shipping to slow coding. The code took three days to write; of course it took a week to ship. Now that the code takes half a day, the three-day PR queue is exposed as the actual problem.
In this sense, AI tools function as a diagnostic. Teams that adopt them and see no improvement in delivery frequency are learning something important: their constraint was never where they thought it was. That’s valuable information, even if it’s uncomfortable. The investment in tooling becomes an investment in organizational clarity.
The teams that will extract the most from faster coding are those that already have the rest of the pipeline working well: short review cycles, automated deployment, feature flags, and small PRs. For them, faster code generation genuinely moves the needle because it removes one of the few remaining delays. For teams still running week-long review cycles and quarterly deploy windows, the marginal value of Copilot for shipping velocity is close to zero.
Where the Leverage Actually Is
If you want to ship faster, the interventions with the most leverage are roughly ordered like this: reduce PR size, reduce review latency, automate testing and deployment, add feature flags for safer deploys, and reduce approval chains. Faster code writing is somewhere below all of these.
This isn’t an argument against AI coding tools. They have real value for code quality, for exploring unfamiliar APIs, for catching issues early, and for reducing the cognitive load of mechanical tasks. But the pitch that they’ll dramatically accelerate team shipping velocity is only true if coding time is actually the bottleneck, and in most organizations it isn’t.
The Accelerate book by Nicole Forsgren, Jez Humble, and Gene Kim covered this ground in 2018 with rigorous data. The conditions that predict high software delivery performance are cultural and architectural, not tooling-dependent. Psychological safety, loose coupling between services, continuous integration, and deployment automation are the predictors. Those haven’t changed because LLMs got good at writing Python.
If your team is slow, the honest diagnostic is to time each stage of your delivery pipeline and find where changes actually wait. The answer is almost certainly not in the editor.
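In practice, that diagnostic can be a short script over timestamps you already have. This sketch assumes your tooling records a time for each event in `STAGES` for every change. The stage names and sample data are hypothetical, so substitute whatever your code host and deploy system actually log.

```python
from datetime import datetime
from statistics import median

# Hypothetical pipeline events, in order; use whatever your tooling actually records.
STAGES = ["committed", "pr_opened", "approved", "merged", "deployed"]

def stage_durations_hours(events: dict[str, datetime]) -> dict[str, float]:
    """Hours spent between each consecutive pair of pipeline events for one change."""
    return {
        f"{a} -> {b}": (events[b] - events[a]).total_seconds() / 3600
        for a, b in zip(STAGES, STAGES[1:])
    }

def slowest_stage(changes: list[dict[str, datetime]]) -> tuple[str, float]:
    """The stage with the largest median duration across all changes."""
    per_change = [stage_durations_hours(c) for c in changes]
    medians = {stage: median(d[stage] for d in per_change) for stage in per_change[0]}
    return max(medians.items(), key=lambda kv: kv[1])

changes = [
    {"committed": datetime(2024, 5, 6, 9), "pr_opened": datetime(2024, 5, 6, 11),
     "approved": datetime(2024, 5, 8, 10), "merged": datetime(2024, 5, 8, 11),
     "deployed": datetime(2024, 5, 8, 15)},
    {"committed": datetime(2024, 5, 7, 10), "pr_opened": datetime(2024, 5, 7, 12),
     "approved": datetime(2024, 5, 9, 12), "merged": datetime(2024, 5, 9, 13),
     "deployed": datetime(2024, 5, 10, 9)},
]
print(slowest_stage(changes))  # ('pr_opened -> approved', 47.5)
```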