
AI Tools Are Productive for Contributors and Expensive for Maintainers

Source: martinfowler

Looking back at Martin Fowler’s December 4, 2025 Fragments post, he highlighted a Carnegie Mellon study on AI’s effects on open-source software projects with characteristic caution: the findings shouldn’t be taken as definitive, but they’re worth noting as a data point. The key finding, he wrote, was that AI code “probably reduced” something. The phrasing trails off, but the surrounding research context makes it clear what territory we’re in.

This is a retrospective look at that moment, now a few months on. The CMU study is one point in a cluster of research that’s been accumulating since AI coding tools hit mainstream adoption. What makes it worth dwelling on is the open-source context specifically. The dynamics of AI-generated code in a proprietary codebase and in an open-source project are meaningfully different, and most of the public conversation about AI code quality conflates them.

The Social Contract of Open Source Review

In a company codebase, AI tools and their outputs exist within a shared social and architectural context. The same people who write the code review it; institutional knowledge about system boundaries, naming conventions, and acceptable patterns is held collectively, even if imperfectly. When AI-generated code drifts from those conventions, someone notices, because the reviewer carries the same context the author was supposed to be working from.

Open source breaks that assumption. The distance between a contributor’s mental model of a project and a maintainer’s has always been significant, and bridging it has always been the core challenge of the maintainer role. AI tools don’t create this gap. They widen it in a specific way: they lower the barrier to producing code that passes automated checks while doing nothing to bridge the gap in architectural understanding.

A contributor using an AI coding assistant can produce a working implementation of a feature much faster than before. The implementation will compile, pass the test suite, and handle the documented API surface. What it won’t necessarily do is fit the project’s internal vocabulary, use the right abstraction level, or extend the right component rather than duplicating an existing one. Maintainers have always had to evaluate these things. What changes with high-volume AI-assisted contributions is the rate at which they need to do it.

What the Data Shows

The GitClear 2024 analysis of over 150 million lines of code across repositories with tracked AI tool adoption found that copy-paste and duplicate code patterns nearly doubled year-over-year, correlating with periods of rising AI tool use. Code churn, defined as code committed and then substantially revised or removed within two weeks, also increased over the same period. These metrics matter in the open-source context because they compound the review burden rather than just reflecting it.
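The churn definition above can be made concrete with a small sketch. Everything here is illustrative: the two-week window follows the definition in the text, but the data model (each added line paired with the date it was later revised or removed, if ever) is a simplification of what you would actually extract from git history.

```python
from datetime import datetime, timedelta

def churn_rate(lines, window_days=14):
    """Fraction of added lines revised or removed within `window_days`
    of their original commit. Mirrors the churn definition paraphrased
    in the text; the data model is a deliberate simplification."""
    window = timedelta(days=window_days)
    churned = sum(
        1 for committed, changed in lines
        if changed is not None and changed - committed <= window
    )
    return churned / len(lines) if lines else 0.0

# (commit date, date the line was later changed, or None if untouched)
history = [
    (datetime(2024, 3, 1), datetime(2024, 3, 5)),   # revised after 4 days: churn
    (datetime(2024, 3, 1), None),                   # never touched again
    (datetime(2024, 3, 1), datetime(2024, 4, 20)),  # revised after 50 days: not churn
    (datetime(2024, 3, 2), datetime(2024, 3, 10)),  # revised after 8 days: churn
]
print(f"{churn_rate(history):.0%}")  # -> 50%
```

In a real pipeline the `history` list would come from walking `git log` and blame data; the point of the sketch is only that churn is a ratio over a time window, which is why rising churn directly measures rework rather than output.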

A PR that introduces duplication is harder to review correctly; it requires the reviewer to recognize what’s being duplicated, which means holding a map of the existing codebase in their head. High-churn patches, once merged across the work of multiple contributors, create a noisier baseline for everyone who follows. Each subsequent AI-assisted contribution works from a context that includes the previous one’s drift, and the compounding is not linear.

The pattern Erik Doernenburg documented in his CCMenu experiment, published in Fowler’s Exploring Gen AI series, illustrates the mechanism at the individual PR level. Working on a real macOS Swift application with an AI agent, the agent produced code that compiled and passed tests, but consistently preferred addition over modification, missed existing abstractions, introduced coupling violations, and generated naming drawn from the prompt context rather than the codebase’s established vocabulary. These aren’t correctness failures. The code works. The failure mode is that it works in isolation while degrading the properties that make a codebase maintainable over time.

Where the Costs Land

Most of the evaluation tooling available to open-source maintainers (CI pipelines, automated test suites, static analysis) doesn’t capture the difference between “works” and “fits”. A PR that passes CI has cleared a meaningful bar. But the gap between passing CI and fitting the codebase is where the real review work happens, and that work is manual. AI tools that generate code optimized for passing automated checks shift more of the review burden into the manual layer, where it’s slowest and most expensive.

For popular open-source projects, this creates structural pressure. Contribution volume has increased as AI tools lower the activation energy for sending a PR. Maintainer bandwidth, which scales with maintainer count rather than contributor count, hasn’t grown to match. The result is some combination of longer review queues, more superficial reviews, and an effectively higher acceptance threshold that makes it harder for all contributors, AI-assisted or not.

This is the asymmetry worth naming directly. The productivity gains from AI-assisted contribution accrue primarily to contributors: faster time from idea to working implementation, lower friction on feature exploration, reduced cost of engaging with unfamiliar codebases. The costs (increased review burden, architectural coherence evaluation, duplication detection) accumulate primarily on maintainers. In a model that depends on volunteer maintainer time, that distribution is structurally problematic.

The Epistemic Honesty in “Probably Reduced”

Fowler’s careful phrasing, and his reminder not to treat the CMU study as definitive, is doing real epistemic work. AI code isn’t uniformly lower quality, and the relationship between AI tool adoption and code quality outcomes is entangled with confounders: which projects, which kinds of contributions, which metrics of quality, which maintainer practices. A single study covering a batch of open-source projects can establish correlation and flag patterns; it can’t resolve the causal picture.

What the accumulation of studies does establish is a pattern of concern that’s specific to the open-source context and worth responding to practically, rather than either dismissing or catastrophizing.

What Can Change

On the contributor side, the most direct lever is context. AI coding assistants perform better when they have better context about the project’s internal structure, not just its public API. Detailed CONTRIBUTING documentation that describes abstraction expectations, not just style rules, gives AI tools better material to work from. The same documentation helps human contributors, so there’s no downside to improving it.
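As a hypothetical illustration of what “abstraction expectations, not just style rules” might look like in practice, a CONTRIBUTING section could read something like the following (the paths, terms, and file names are invented for this example):

```markdown
## Architecture expectations for contributions

- Prefer extending an existing component over adding a parallel one.
  Search `src/services/` for related functionality before writing new code.
- Match the project's vocabulary: we say "job", not "task"; "store", not
  "repository". See `docs/GLOSSARY.md`.
- New public APIs need a short note in the PR description explaining why
  no existing abstraction fit.
```

Guidance at this level is exactly the kind of context an AI assistant can consume alongside the code, and it doubles as onboarding material for human contributors.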

On the maintainer side, some projects have started requiring more substantive PR descriptions that address architectural fit explicitly, separate from behavioral correctness. This shifts some of the evaluation burden back to contributors, where the context for generating it is at least partially present.

The harder problem is tooling. Static analysis that detects duplication at the semantic level rather than the syntactic level would catch one of the most common failure modes. Code review tooling that surfaces related implementations when a PR introduces new code would help both contributors working with AI assistants and maintainers evaluating the results. Neither of these is a trivial engineering problem, and the open-source ecosystem hasn’t converged on solutions.
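Fully semantic duplicate detection remains an open problem, but a useful middle ground between raw-text matching and semantic analysis is what the clone-detection literature calls Type-2 matching: normalize identifiers and literals, then compare token sequences. A minimal sketch in Python, using only the standard library (the normalization choices are illustrative, not a production design):

```python
import io
import keyword
import tokenize

def fingerprint(source: str) -> tuple:
    """Collapse identifiers and literals so that two snippets differing
    only in naming and constants produce the same token fingerprint
    (Type-2 clone detection: more than text matching, well short of
    true semantic analysis)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")   # collapse all identifiers
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")  # collapse all literals
        elif tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                          tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
            continue           # ignore layout and comments
        else:
            out.append(tok.string)
    return tuple(out)

# Differently named, structurally identical: raw-text diffing misses this.
a = "def total(xs):\n    return sum(x * 2 for x in xs)\n"
b = "def doubled_sum(values):\n    return sum(v * 2 for v in values)\n"
print(fingerprint(a) == fingerprint(b))  # -> True
```

A review bot built on this idea could fingerprint the functions a PR adds, compare them against an index of the existing codebase, and surface near-duplicates for the reviewer, which is the “surfaces related implementations” capability described above.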

The CMU study Fowler flagged in December 2025 is a data point in a conversation the open-source community will need to keep having deliberately. The contribution bar has changed in ways that benefit contributors. The review infrastructure hasn’t changed to match. Adjusting the balance is the work, and it’s mostly ahead of us.
