The Reviewer Who Wasn't There From the Start

Source: martinfowler

Erik Doernenburg’s experiment with CCMenu, documented on Martin Fowler’s site in January 2026, contributes something specific to the discussion of AI coding agents: a careful, project-level assessment of internal quality conducted by someone who designed the system being modified. That specificity makes the findings credible. It also contains a premise worth naming.

The experiment works because Doernenburg is CCMenu’s original author and long-term maintainer. CCMenu2, a macOS menu bar application written in Swift that monitors CI/CD pipeline statuses, reflects years of deliberate architectural decisions. When the agent-generated implementation fit poorly into the existing structure, Doernenburg could identify the violation. He recognized the missed abstraction. He noticed the duplication. He understood why the agent’s approach was structurally worse than one produced by a developer who had internalized the codebase’s conventions.

For most teams, code review does not happen this way.

The Structural Knowledge Gap

On most teams, review is distributed across whoever has sufficient context and available time. A senior engineer reviews code in a service she hasn’t touched in four months. A new team member reviews against documentation that describes what the code is supposed to do, not the design principles that shaped how it does it. On open source projects, a maintainer reviews contributions from developers who may have spent hours in the codebase rather than years.

The specific kind of quality degradation that Doernenburg describes is exactly the kind that requires contextual knowledge to catch. Duplication is only visible as duplication if you know where the original lives. A layer boundary violation is only recognizable if you know which layers exist and what they contain. An abstraction opportunity is only apparent if you recognize the pattern is recurring. A reviewer who picked up the task yesterday cannot compensate for that gap by reading more carefully.

Coding agents sharpen this problem without creating it. The GitClear analysis from early 2024, examining over 150 million lines of code changes, found that copy-pasted code increased substantially in the years that correlate with AI tool adoption. That accumulation happens in review passes where nobody had the structural knowledge to see the problem, not in review passes where someone looked and accepted it anyway.

What Static Analysis Catches, and What It Does Not

Standard CI tooling provides a partial floor. SwiftLint flags style violations and common code smells. SonarQube reports cyclomatic complexity, within-file duplication, and coupling metrics. These are useful, but they are coarse instruments for the class of violations agents tend to introduce.

SwiftLint’s custom rules get you closer to structural enforcement. A regex-based rule that flags any UIKit import in files under a path matching Model/ catches a specific boundary violation. A rule that prohibits URLSession calls outside files matching *Service.swift enforces a networking boundary. These are blunt, but they run on every commit without requiring any reviewer to remember the constraint.

# .swiftlint.yml
custom_rules:
  no_urlsession_in_models:
    name: "No URLSession in Model layer"
    regex: 'URLSession'
    included: '.*/Model/.*\.swift'
    message: "Network calls belong in the Service layer, not in Model types."
    severity: error
  no_urlsession_outside_services:
    name: "Networking confined to Service types"
    regex: 'URLSession'
    excluded: '.*Service\.swift'
    message: "URLSession usage belongs in *Service.swift files."
    severity: error

The gap these tools leave is the more sophisticated structural violations: a class that now carries two distinct responsibilities, a method placed in the wrong layer because of what it knows and calls, an abstraction that was correct six months ago but now has enough adjacent duplication that it should be extended. No regex catches those.

Module Boundaries as Compiled Enforcement

Neal Ford, Rebecca Parsons, and Patrick Kua’s Building Evolutionary Architectures introduces architecture fitness functions: executable tests that verify structural properties rather than behavioral ones. The canonical example is a test that fails if any class in the service layer imports from the controller layer. This does not test what the code does. It tests how the code is arranged, and it runs automatically.
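A fitness function of this kind needs no special framework; it can be an ordinary test or script that scans source text for forbidden imports. The sketch below is a minimal, hypothetical version in Swift. The layer path and module names are assumptions for illustration, not CCMenu's actual structure, and the sources are in-memory strings to keep the example self-contained; a real version would walk the repository.

```swift
import Foundation

// Fitness function sketch: report every file in a given layer that
// imports a forbidden module. The paths and module names below are
// illustrative assumptions, not CCMenu's actual layout.
func violations(in sources: [String: String],
                layerPath: String,
                forbiddenModule: String) -> [String] {
    sources
        .filter { path, _ in path.contains(layerPath) }
        .filter { _, contents in
            contents.split(separator: "\n").contains { line in
                line.trimmingCharacters(in: .whitespaces) == "import \(forbiddenModule)"
            }
        }
        .map { $0.key }
        .sorted()
}

// Example: one domain-layer file reaches directly into the network layer.
let sources = [
    "Sources/Domain/Pipeline.swift": "import Foundation\nstruct Pipeline {}",
    "Sources/Domain/FeedReader.swift": "import NetworkLayer\nstruct FeedReader {}",
    "Sources/Network/Client.swift": "import Foundation\nfinal class Client {}",
]

let offending = violations(in: sources,
                           layerPath: "Sources/Domain/",
                           forbiddenModule: "NetworkLayer")
// offending == ["Sources/Domain/FeedReader.swift"]
```

Wired into CI as a failing test, the check runs on every commit, which is the property that distinguishes a fitness function from a review convention.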

For Swift projects, the most direct mechanism is Swift Package Manager’s target system. If architectural layers correspond to separate targets with explicit dependency declarations, the compiler enforces boundaries without any human reviewer in the loop.

// Package.swift
.target(
    name: "NetworkLayer",
    dependencies: [],
    path: "Sources/Network"
),
.target(
    name: "DomainLayer",
    dependencies: [],
    path: "Sources/Domain"
),
.target(
    name: "ViewLayer",
    dependencies: ["DomainLayer"],  // intentionally not NetworkLayer
    path: "Sources/Views"
),

If ViewLayer depends only on DomainLayer in the package manifest, any agent-generated code in the view layer that reaches directly into network types will fail to compile. The structural constraint is enforced before any human sees the code. A CI step that checks the package manifest diff can flag new cross-layer dependency additions before they are merged.
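That manifest check can be a few lines in CI. The following sketch assumes GitHub Actions and a main branch, both assumptions rather than details from the experiment, and fails the build whenever a pull request adds lines touching the dependency declarations in Package.swift:

```yaml
# .github/workflows/ci.yml (hypothetical excerpt)
- name: Flag new dependencies in Package.swift
  run: |
    git fetch origin main
    # Fail if the diff against main adds any dependency lines to the manifest.
    if git diff origin/main -- Package.swift | grep -E '^\+.*dependencies'; then
      echo "Package.swift dependency changes require architectural review."
      exit 1
    fi
```

The check is deliberately blunt: it does not try to judge whether a new dependency is legitimate, only to route the change to a human who can.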

This does not catch everything. It does not catch a method placed in the wrong type, or duplication of logic within a single layer. But boundary violations, which are among the most common structural problems in agent-generated code, become compilation errors rather than review observations.

The Open Source Maintainer’s Problem at Scale

For a maintainer in Doernenburg’s position, the structural quality question is not only about agent-generated code. It is about receiving contributions from any developer who may have read the codebase for a day before submitting a PR. The architectural knowledge gap between contributor and maintainer has always existed in open source; AI tools change the volume calculation.

A project that received ten external contributions per month may receive fifty as the barrier to producing working code decreases. The maintainer’s structural knowledge does not scale with contribution volume. The review process that worked at ten PRs per month, relying on the maintainer’s architectural judgment applied to each one, becomes a bottleneck or a quality problem at fifty.

Some open source projects have begun addressing this with Architecture Decision Records, short documents that capture specific architectural decisions, the context that motivated them, and their consequences. An ADR explaining why network calls are isolated in service types, with a reference to why that boundary exists, gives a contributor a concrete reference. A reviewer pointing to an existing ADR rather than re-explaining the design principle creates shared knowledge rather than repeated back-and-forth.
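A minimal ADR for the networking boundary might look like the following. The numbering, wording, and rationale here are illustrative, not taken from any actual CCMenu document:

```markdown
# ADR 4: Network access is confined to *Service types

## Status
Accepted

## Context
Direct URLSession calls scattered across view and model code make request
handling untestable and duplicate error and retry logic.

## Decision
All network access goes through types named *Service, which expose async
methods returning domain types. No other layer imports URLSession.

## Consequences
A new CI provider is integrated by adding a Service type conforming to the
existing integration protocol; view and model code do not change when a
provider's API changes.
```

The value is less in the document itself than in being able to cite it: "see ADR 4" in a review comment carries the full rationale without the maintainer restating it.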

The other tool is contribution guidelines that go beyond style conventions to explain structural expectations. Where should a new feed parser be placed? Which protocol defines the integration point for a new CI provider? What is the expected data flow for a feature that needs both network access and persistence? These are questions a contributor needs answered before writing the code, not discovered during review.

Making Tacit Knowledge Explicit

The deeper issue Doernenburg’s experiment surfaces is that architectural knowledge in most projects is tacit. The design conventions embedded in CCMenu’s module structure and data flow patterns exist primarily in one person’s head, transmitted through code review and occasional documentation. That is defensible in a small, carefully maintained project. It becomes a liability at any scale of collaboration.

What AI coding agents do is sharpen the case for making that knowledge explicit and machine-checkable. Module boundaries in Package.swift encode what documentation describes in prose. Custom lint rules encode review observations that would otherwise depend on the reviewer recognizing a pattern. ADRs encode design decisions that would otherwise require asking the original author.

None of this replaces the architectural judgment that Doernenburg applied when evaluating his own codebase. A reviewer who understands the design will still catch things that no automated tool will. The question is what happens at the volume and pace that agent-assisted development enables, when that reviewer is not always available, and when the contributor is not the person who designed the system. Building structural constraints into the toolchain means the parts that can be automated are automated, leaving human review for the judgment that actually requires it.
