Reading Before Writing: What Anthropic's Internal Claude Data Reveals
Source: martinfowler
The most interesting part of Anthropic’s internal report on AI-assisted development is not the headline productivity number. It is the breakdown of what developers are doing with the tool.
Most usage at Anthropic goes toward debugging and understanding existing code. Writing new features comes second, though that category has been growing. The headline numbers, Claude accounting for roughly 59% of the code produced internally and a 50% productivity increase, tend to dominate when the report gets summarized. But those numbers only make sense if you first understand what the underlying work looks like.
The Existing Code Problem
Software development in any organization beyond the early startup phase is primarily an archaeology exercise. The codebase accumulated over years of decisions, experiments, and migrations outnumbers any given sprint’s new code by orders of magnitude. A developer picking up a bug report spends more time understanding what the code was supposed to do and what path led to the failure than writing the fix. This is true whether the codebase is 50,000 lines or 5 million.
What this means for AI tools is that the most common daily challenge is not “generate this function for me.” It is “explain what this function does,” “trace where this variable is modified,” and “why does this test fail when I change this config?” Those tasks suit language models better than most people initially assumed, because they do not require the AI to be correct in the same way code generation does. If Claude explains a function slightly imprecisely, the developer corrects their mental model from context. If Claude generates buggy code, you might ship it.
Anthropic’s own developers apparently discovered this organically. The debugging and comprehension use case dominated early on, before the new-feature use case caught up and began growing as a share of overall usage.
Comparing the Productivity Claims
The 50% productivity increase is a large number, and it invites scrutiny. GitHub’s Copilot research from 2022 found a 55% reduction in task completion time for isolated coding exercises, which generated both enthusiasm and substantial methodological criticism. The core objection was straightforward: completing a self-contained algorithm task faster in a controlled study says little about the distributed, communication-heavy, context-switching reality of a development team.
Anthropic’s number comes from a different setting. It is self-reported productivity from developers at the company that builds the model, working on a complex, evolving codebase. That introduces its own biases. These are among the earliest and most skilled users of the tool; they have institutional reasons to report positively; and they work in an environment where the model has been tuned extensively against software development tasks. Treating this figure as a ceiling on what highly skilled teams might achieve seems reasonable. Treating it as representative of what a median engineering organization will see reads too much into the data.
Google’s internal measurements around the same period reported productivity improvements in the 25 to 34% range across broader developer populations, a more conservative but still substantial figure. The variance between studies reflects genuine differences in task type, developer skill level, codebase maturity, and how strictly productivity is defined and measured.
What 59% Code Generation Means
When Anthropic reports that Claude produces 59% of the code written internally, the figure needs some unpacking. “Code written” measures characters appearing in editors, not code that ships or code that is accepted without modification. Language models produce code that developers then read, revise, sometimes discard, and occasionally accept wholesale.
A more useful frame is that Claude handles a large share of the keystroke-level work while the developer retains all the structural decisions: what to build, how to decompose the problem, what the tests should verify, what the interface should look like. Claude fills in the implementation of decisions already made. This is a meaningful shift in where a developer’s time goes, but it is a different claim from the model replacing developer judgment.
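The gap between "code written" and "code shipped" can be made concrete with a toy attribution metric. The sketch below is hypothetical and is not Anthropic's actual methodology: it tags each editor edit with an origin, measures the AI's share by character count, and shows how that share changes once discarded suggestions are excluded. All names (`Edit`, `ai_share`, the sample session) are illustrative inventions.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    origin: str    # "ai" or "human"
    chars: int     # characters inserted in the editor
    shipped: bool  # survived review and merged

def ai_share(edits, shipped_only=False):
    """Fraction of characters attributed to the AI.

    With shipped_only=False this mimics a "characters appearing in
    editors" metric; with shipped_only=True it counts only code that
    actually merged, which is usually a smaller, different number."""
    pool = [e for e in edits if e.shipped] if shipped_only else edits
    total = sum(e.chars for e in pool)
    ai = sum(e.chars for e in pool if e.origin == "ai")
    return ai / total if total else 0.0

# Hypothetical session: the AI drafts heavily, some drafts are discarded.
session = [
    Edit("ai", 1200, shipped=True),     # accepted suggestion
    Edit("ai", 800, shipped=False),     # discarded draft
    Edit("human", 500, shipped=True),   # hand-written glue code
    Edit("human", 100, shipped=False),  # abandoned experiment
]

print(f"editor share:  {ai_share(session):.0%}")                     # 77%
print(f"shipped share: {ai_share(session, shipped_only=True):.0%}")  # 71%
```

The two percentages diverge whenever acceptance rates differ by origin, which is exactly why a keystroke-level figure like 59% cannot be read as a share of production code.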
The growing use of Claude for implementing new features suggests developers are becoming more comfortable with this dynamic over time. Early adoption concentrated on the safer direction, asking for explanations and debugging help, where the cost of an incorrect answer is low. As trust in the model’s outputs builds, more developers are moving toward the riskier direction of generating code that will run in production.
The Self-Reference Loop
There is a recursive element to this report worth keeping concrete. Anthropic is using Claude to build Claude. The model is being applied to the codebase that produces future versions of itself, which means any productivity gains from AI-assisted development compound directly into the development of better AI tools. The feedback loop between model capability and the tooling used to improve model capability is tighter at Anthropic than at most organizations.
This also means Anthropic’s developers have both the strongest institutional familiarity with the model’s failure modes and the clearest view of its improving capabilities across releases. Their reported increase in using the model for new feature implementation over time may reflect genuine model improvements as much as developer comfort. Separating those two variables from the outside is not straightforward.
What the Debugging-First Pattern Suggests
The dominance of debugging and comprehension in Anthropic’s usage data offers a practical starting point for teams thinking about where to introduce AI coding tools. Framing adoption around code generation triggers the most anxiety about correctness and reliability. Starting with comprehension tasks lets developers calibrate the model’s usefulness in lower-stakes contexts before extending trust to code that ships.
Reading code is an underrated productivity bottleneck. Onboarding new developers, investigating production incidents, reviewing unfamiliar services before adding a dependency, understanding what a deprecated API was supposed to do before migrating off it: all of these are common and expensive in engineering time. A tool that reliably helps with comprehension has immediate value before you ever let it write a line of production code.
The Martin Fowler post covering this report frames the data simply, but the underlying pattern it points to carries more weight than the summary conveys. AI tools are seeing their heaviest use on the parts of development that were hardest to automate because they required understanding, not because they required typing. That is a more significant shift than most early AI-coding narratives suggested, and Anthropic’s internal data, retrospective and self-reported as it is, gives that claim more grounding than it previously had.