The Amplification Problem: Harness Engineering in an Agent-First World
Source: martinfowler
Birgitta Böckeler’s February 2026 commentary on martinfowler.com draws its frame from OpenAI’s harness engineering post by Ryan Lopopolo, written about Codex in an “agent-first world.” That phrase is doing more work than it first appears. The three components of the harness (context engineering, architectural constraints, and codebase garbage collection) are worth having in any AI-assisted codebase. But their importance, and the consequences of neglecting them, change substantially once your AI coding tool is executing multi-step work rather than producing single-line suggestions.
The Suggestion-to-Agent Shift
For most of the period in which AI coding assistance has been widely used, the interaction model was completion or suggestion. Copilot offers a completion; you accept or decline. The AI proposes; you dispose. The review layer is every keystroke. The cost of a model working from poor context is proportional to suggestion frequency: occasionally wrong, correctable in one interaction, and bounded by the fact that nothing commits itself.
Agentic tools change the cost structure: an agent given a task reads the relevant parts of your codebase, forms a model of which patterns are in use, and implements the feature, writing files, calling tools, running tests, potentially opening a PR. Every decision it makes during that process reflects its model of your codebase. If that model is wrong because your codebase contains contradictory signals, the error is not a single wrong completion. It is a design decision replicated across multiple files, requiring either the agent to redo the work or a reviewer to unwind it.
The harness components make suggestion tools more accurate. For agentic tools, they determine whether the agent’s internal model of the codebase is reliable enough to be trusted with autonomous multi-step work.
The Incomplete Migration Problem
The clearest illustration is a migration left at seventy percent. Suppose a codebase is partway through moving from direct pg client calls to a centralized database package. The majority of the code has been migrated; a significant portion still uses the old pattern. An agent asked to implement a new route reads the relevant modules, sees both patterns present, and makes a probabilistic judgment. In practice, agents in this situation often do something more problematic than picking one pattern: they blend both, because both are represented in nearby files and both appear structurally coherent.
A simplified version of what this looks like:
// A new route handler written by an agent working in a partially migrated codebase.
// The agent used /packages/db for most operations but reached for 'pg' directly
// for the transaction, mirroring a legacy file two directories up.
import { db } from '../../packages/db';
import { Pool } from 'pg'; // architectural violation; missed by the agent

const pool = new Pool(); // should never be instantiated outside packages/db

export async function createOrder(userId: string, items: CartItem[]) {
  const client = await pool.connect(); // old pattern
  try {
    await client.query('BEGIN');
    const order = await db.orders.create({ userId });
    // ...
  } finally {
    client.release();
  }
}
The resulting PR passes the type checker. A reviewer who knows the migration is ongoing will catch it; a reviewer returning from leave may not. The review burden has shifted from correctness verification to migration-state verification, which requires either institutional memory or dedicated tooling. The harness engineering response is to complete the migration and remove the old pattern, converting the problem from probabilistic to structural. When only one pattern exists in the codebase, the agent uses it.
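A completed migration can also be enforced mechanically so the old pattern cannot quietly return. The sketch below assumes the conventions from the example above (only files under packages/db may import 'pg'); the paths and the regex are illustrative, and in a real TypeScript project ESLint's no-restricted-imports rule does this job more robustly:

```typescript
// Flags files that import the legacy 'pg' driver directly instead of going
// through the centralized database package. Intended as a CI gate after a
// migration is declared complete.
// NOTE: text matching is a sketch; it will also match strings and comments.

const DIRECT_PG_IMPORT = /from\s+['"]pg['"]|require\(\s*['"]pg['"]\s*\)/;

export function findPgViolations(
  files: { path: string; source: string }[]
): string[] {
  return files
    .filter((f) => !f.path.startsWith('packages/db/')) // the one allowed home for 'pg'
    .filter((f) => DIRECT_PG_IMPORT.test(f.source))
    .map((f) => f.path);
}
```

Applied to the blended handler above, the route file is flagged while the same import inside packages/db passes, which is exactly the structural guarantee the agent needs.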
What a Harness Pre-Flight Looks Like
The implication is that the right question before deploying an agent on a feature area is not “Is this codebase generally well-maintained?” but “Is this specific area free of the ambiguities that agents propagate?” A useful pre-flight for agentic work covers five things:
Harness pre-flight
- [ ] Only one pattern exists for each primary operation in scope
- [ ] Any active migrations in this area have completed (old pattern removed)
- [ ] CLAUDE.md / AGENTS.md covers the relevant architectural constraints with reasons
- [ ] Key interfaces and types are accurate for the modules the agent will touch
- [ ] Context files reflect the current architecture, not a past state
The last item is worth dwelling on. A context file updated during a 2024 architectural refactor and not revised since directs the agent to build against a model of the system that may no longer match the code. One developer using a stale context file gets one wrong session. An agent using it implements an entire feature against an architecture that no longer exists, and does so with the confidence of something that has read the documentation.
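Part of that staleness is checkable. One cheap signal, sketched below, is a context file that references paths which no longer exist in the repository; the path regex and the directory prefixes are assumptions about layout, not anything from the original post:

```typescript
// Extracts repo paths mentioned in a context file (CLAUDE.md / AGENTS.md)
// and reports the ones that no longer exist. A context file that points at
// deleted modules is describing a past architecture.
// The recognized prefixes (packages/, src/, apps/) are a layout assumption.

const PATH_PATTERN = /(?:^|[\s`(])((?:packages|src|apps)\/[\w./-]*\w)/g;

export function danglingReferences(
  contextFileContent: string,
  existingPaths: Set<string>
): string[] {
  const referenced = new Set<string>();
  for (const m of contextFileContent.matchAll(PATH_PATTERN)) {
    referenced.add(m[1]);
  }
  return [...referenced].filter((p) => !existingPaths.has(p));
}
```

This only catches references to things that were deleted, not descriptions that have drifted while the paths survived, so it is a floor for the last checklist item rather than a substitute for reviewing the file.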
Garbage Collection as a Pre-Condition
The codebase garbage collection pillar is often framed as a code quality practice with gradual benefits. In an agentic workflow, its role is more operational: cleaning a module before deploying an agent is analogous to checking test coverage before merging, a gate rather than an afterthought.
Deprecated utility functions, commented-out implementations, and stale TODO comments referencing migrations that completed two years ago are not inert from an agent’s perspective. They are present in the context window as evidence of valid patterns. The failure mode is subtle: an agent writing a new authentication flow that calls a deprecated token validator, because three legacy files in the same module called it, will produce code that works under current conditions and fails under load. The test suite passes because the deprecated function is still live. Code review catches it only if the reviewer knows the deprecation status of a function they did not write and did not recently touch.
Tooling for this work is mature. ts-prune finds unused TypeScript exports; Knip extends coverage to unused files and dependencies; Python has vulture; Go’s compiler rejects unused local variables and imports at the language level. Running these tools before handing a module to an agent converts what is usually treated as aspirational housekeeping into an automatable, schedulable gate.
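Those tools target unused code. The deprecated-but-still-live case from the authentication example is different, because a function with remaining callers is not unused. A naive sketch of that gate follows; it is text-based and illustrative, and a real implementation would use the TypeScript compiler API, which understands @deprecated JSDoc tags:

```typescript
// Finds names documented as @deprecated in a module's source, then reports
// which of them a new piece of code calls. Deliberately naive: it matches
// text rather than the AST, so treat it as a gate sketch, not a linter.

export function findDeprecatedNames(moduleSource: string): string[] {
  const names: string[] = [];
  // Capture the next function/const name after each @deprecated tag,
  // assuming the JSDoc sits directly above the declaration.
  const pattern = /@deprecated[\s\S]*?(?:function|const)\s+(\w+)/g;
  for (const m of moduleSource.matchAll(pattern)) names.push(m[1]);
  return names;
}

export function deprecatedCallSites(
  newCode: string,
  deprecatedNames: string[]
): string[] {
  return deprecatedNames.filter((name) =>
    new RegExp(`\\b${name}\\s*\\(`).test(newCode)
  );
}
```

An agent-written flow that calls a name on that list gets flagged before review, even though the call still compiles and the test suite still passes.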
The Structural Boundary
The harness determines the boundary inside which an agent’s autonomous judgment can be trusted to produce reviewable output. A suggestion tool operating in a poorly maintained codebase produces a steady stream of mildly wrong proposals that a developer corrects individually. An agent operating in the same codebase produces implementations that look structurally correct and are wrong in ways that require architectural knowledge to detect, not just code-reading ability.
This changes the economics of technical debt in areas where agents operate. The cost of an incomplete migration or a deprecated pattern left in place is no longer a fixed liability charged at code review time; it is a recurring cost charged every time an agent works in that area, proportional to how much autonomous work the agent does. Teams that complete migrations fully and maintain current context files have agents that propagate correct patterns. Teams that do not have agents that preserve and extend their technical debt.
The three pillars Böckeler describes compound in the same direction. Consistent architectural patterns make context engineering easier, because there are fewer edge cases to document. Codebase garbage collection makes both more effective, because the agent’s inferences from reading the code align with the explicit constraints in the context files rather than contradicting them. None of these investments requires changing how you write code; they require being deliberate about when work is actually finished.