
The Bottleneck Was Never the Typing

Source: hackernews

There is a reasonable argument that most software teams have spent years optimizing the wrong thing. The introduction of AI coding assistants has made this more visible than any process audit or retrospective ever could, because now the code arrives faster than the rest of the system can absorb it.

Andrew Murphy’s piece landed with 303 points on Hacker News for a reason. It names something practitioners have sensed for a while but rarely stated plainly: if you thought typing speed was your throughput problem, you were measuring the wrong thing.

Where the Time Actually Goes

Studies of developer time distribution consistently show that writing new code is a minority activity. A 2021 analysis by GitClear found that developers spend substantial time on code review, debugging, and navigating existing codebases. Microsoft Research’s work on developer productivity, published as the SPACE framework, argues that activity metrics like lines of code or commits are poor proxies for actual productivity. The real drag is elsewhere.

The DORA State of DevOps research, which has tracked engineering team performance across thousands of organizations for over a decade, identified four key metrics that distinguish high-performing teams: deployment frequency, lead time for changes, change failure rate, and mean time to recovery. None of these metrics are about how fast someone can write a function. They are about how smoothly code moves through the system after it is written.

The top tier of teams in the DORA data deploys multiple times per day with lead times under an hour. Low performers deploy monthly or less often, with lead times measured in months. The difference is not raw coding speed. It is everything that happens after the code exists.
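Two of these metrics are simple to compute once deployments are recorded. What follows is a minimal Python sketch, not DORA's own tooling: the record shape (a commit time paired with a deploy time), the sample data, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (commit_time, deploy_time) for each change shipped.
deployments = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 40)),
    (datetime(2024, 3, 4, 13, 0), datetime(2024, 3, 4, 14, 30)),
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 5, 10, 20)),
    (datetime(2024, 3, 6, 16, 0), datetime(2024, 3, 7, 11, 0)),
]

def deployment_frequency(records, window):
    """Average number of deployments per day over the observation window."""
    days = window / timedelta(days=1)
    return len(records) / days

def median_lead_time(records):
    """Median time between a change being committed and being deployed."""
    return median(deploy - commit for commit, deploy in records)

print(deployment_frequency(deployments, timedelta(days=4)))  # deployments per day
print(median_lead_time(deployments))                         # a timedelta
```

Neither function looks at how long the code took to write. Both measure only what happened after the commit existed.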

Constraints Shift When You Remove One

Eliyahu Goldratt’s Theory of Constraints, described in The Goal and later applied to software in The Phoenix Project, makes a prediction: fixing any bottleneck reveals the next one. The system does not suddenly flow freely. The queue backs up somewhere else.

For most software teams over the past thirty years, the constraint was a combination of requirements clarity, developer capacity, and implementation speed. These three things competed for attention, and it was genuinely hard to say which mattered most in any given sprint. The ambiguity let teams feel productive even when output was slow, because there was always something plausibly blocking progress.

AI code generation removes implementation speed from that conversation almost entirely. A capable developer working with a modern AI assistant can produce working implementations in a fraction of the time that would have been required two years ago. The effect is to ruthlessly expose whatever the next constraint is. For most teams, it turns out there are several.
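Goldratt's prediction can be sketched as a toy model in Python. The stage names and weekly capacities below are invented for illustration, and treating delivery as a strictly serial pipeline is a simplifying assumption.

```python
# Hypothetical capacity of each stage, in changes per week.
stages = {
    "implementation": 4,
    "review": 6,
    "deployment": 8,
}

def system_throughput(capacities):
    """A serial pipeline delivers no faster than its slowest stage."""
    return min(capacities.values())

def bottleneck(capacities):
    """Name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)

# Assume AI assistance triples implementation capacity and changes nothing else.
faster = {**stages, "implementation": stages["implementation"] * 3}

print(bottleneck(stages), system_throughput(stages))  # implementation 4
print(bottleneck(faster), system_throughput(faster))  # review 6
```

In this toy model, tripling implementation capacity raises delivered throughput from 4 to 6, not from 4 to 12, and the constraint moves to review.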

The Review Queue

Pull request review has been a known bottleneck for years, but it was tolerable when code arrived slowly. When a developer opened one or two PRs per week, a 48-hour review cycle was annoying but manageable. When that same developer can now open five or eight PRs in the same period, the 48-hour cycle becomes a hard ceiling on throughput.

The review queue does not scale linearly with code volume. Reviewers are not faster because there is more to review. They may actually be slower, because larger code surfaces require more cognitive load per change. The GitHub Engineering blog has written about this in the context of their own internal practices, noting that keeping PRs small is a discipline that compounds over time.
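A back-of-the-envelope simulation shows how quickly the queue backs up once PRs arrive faster than they can be reviewed. This is a minimal Python sketch with hypothetical numbers: a team of four developers and a fixed review capacity of ten PRs per week, which generously assumes reviewers do not slow down as volume grows.

```python
def backlog_over_time(opened_per_week, review_capacity_per_week, weeks):
    """Return the number of unreviewed PRs at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # PRs beyond what reviewers can handle carry over to the next week.
        backlog = max(0, backlog + opened_per_week - review_capacity_per_week)
        history.append(backlog)
    return history

# Four developers opening two PRs each per week: reviewers keep up.
print(backlog_over_time(8, 10, weeks=4))   # [0, 0, 0, 0]

# The same four developers opening eight PRs each per week.
print(backlog_over_time(32, 10, weeks=4))  # [22, 44, 66, 88]
```

Below capacity the backlog stays at zero. Above it, the backlog grows every week without bound, and no amount of faster code generation shortens it.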

There is a secondary problem: AI-generated code often looks correct and compiles cleanly while carrying subtle architectural assumptions that are wrong for the specific codebase. Reviewers who might have caught a logic error in hand-written code because they recognized the author’s reasoning patterns now face code that has no fingerprints. It requires more careful reading, not less.

Deployment Infrastructure

Deployment pipelines are the second place the constraint surfaces. Many teams have CI/CD pipelines that were designed for a certain volume of changes per day. They run test suites that take 20 minutes, require manual approval gates, and notify on-call engineers for any production push. These pipelines made sense when changes arrived infrequently. They become a chokepoint when code arrives faster.

The DORA research is specific here: teams that want to improve deployment frequency need to invest in test automation, trunk-based development, and deployment automation, in that order. The test suite has to be fast and trustworthy enough that engineers do not feel compelled to add manual gates. That investment requires time and organizational will that many teams have deferred, because the old pace of development made the deferral survivable.

Accelerating code generation without fixing the pipeline just means the code sits in staging longer.

Requirements Clarity and Coordination Overhead

The third constraint, and the one hardest to fix with tooling, is the clarity of what needs to be built. Writing code fast is only useful if the code is solving the right problem. In most organizations, the gap between what is asked for and what is needed is substantial.

This gap was partially obscured by slow implementation. When a feature took two weeks to build, there were natural checkpoints: design reviews, mid-sprint demos, stakeholder check-ins. The time itself created feedback opportunities. When that same feature can be prototyped in an afternoon, those checkpoints do not automatically appear. The requirement has to be correct before work starts, not corrected iteratively over two weeks.

Product management practices that were borderline adequate for slower delivery become inadequate at higher velocity. The specification needs to be more precise. The acceptance criteria need to be more specific. The ambiguity that was previously resolved during implementation now needs to be resolved before it.

What This Looks Like in Practice

I have seen this play out concretely in bot development, where the iteration cycle is naturally fast. When I started building Discord bots, the bottleneck was genuinely implementation: figuring out the Discord API, handling gateway events correctly, managing state in a stateless environment. Those problems took time.

With better tooling and accumulated knowledge, the implementation time collapsed. What surfaced underneath was requirements drift. The bot’s behavior needed to be specified precisely before building it, because iterating on deployed behavior that users have already learned is costly in a different way than iterating on undeployed code. The bottleneck moved from implementation to specification, and specification is a human problem.

The same dynamic applies at larger scale. When a team’s code output doubles because everyone is using AI assistants, the product backlog, the review queue, and the deployment pipeline all reveal their true capacity. Teams that invested in those areas find the transition smoother. Teams that did not invest in them find they have traded one bottleneck for three.

The Underlying Measurement Problem

Part of why this catches teams off guard is that lines of code, story points closed, and PRs opened are all visible and easy to track. Review latency, deployment lead time, and requirement churn are harder to instrument and easier to wave away as process problems rather than engineering problems.

The SPACE framework from Microsoft Research, published in ACM Queue, proposes measuring developer productivity across five dimensions: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. Activity, the dimension that covers things like code commits and PRs opened, is explicitly the one that becomes misleading when taken in isolation. A team that doubles its PR volume while doubling its review latency has not improved.
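One way to see why doubling both numbers is not progress is Little's Law from queueing theory, which the SPACE paper does not use but which fits the situation: in a stable system, the average number of items in flight equals the arrival rate multiplied by the average time each item spends in the system. A minimal Python sketch with hypothetical figures:

```python
def prs_in_flight(opened_per_day, review_latency_days):
    """Little's Law: average items in a stable system = arrival rate * time in system."""
    return opened_per_day * review_latency_days

# Hypothetical team before AI assistance: 2 PRs a day, 2 days in review each.
before = prs_in_flight(opened_per_day=2, review_latency_days=2)

# Same team after: PR volume doubles, and so does review latency.
after = prs_in_flight(opened_per_day=4, review_latency_days=4)

print(before, after)  # 4 16
```

The activity metric doubled, but the team now carries four times as much unmerged work, and each change takes twice as long to reach users.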

The teams that are navigating AI-assisted development well are the ones that started measuring lead time and deployment frequency before the tools changed. They knew where their constraints were. The teams that were measuring story points and commit counts are discovering their constraints now, the hard way.

The code is not the problem. It never was.
