
The Engineering Work That Happens Before the AI Writes a Line

Source: martinfowler

The parts of AI-assisted software development that attract attention are the model behaviors: hallucinations, context limits, code that compiles but doesn’t work. The parts that don’t are the structural decisions made before any prompt is written: the engineering work that shapes what the model receives, what constraints it operates under, and how the codebase stays coherent as AI-generated code accumulates over months.

Birgitta Böckeler’s commentary on Martin Fowler’s site names this body of work “harness engineering,” picking up a framing from OpenAI’s internal practices for AI-enabled development teams. The term earns its place because it separates the harness from the model. The coding model is the tool; the harness is everything else: the context it receives, the architectural shape of the system it reads and writes, and the maintenance disciplines that prevent codebase entropy from degrading AI performance over time. These three concerns are distinct, and each requires deliberate engineering effort.

Context Engineering: Systemic Input, Not One-Off Prompts

Context engineering is the work of giving AI models consistent, accurate, high-quality information about your codebase and your intentions. It is distinct from prompt engineering in roughly the way application architecture is distinct from writing a function: prompt engineering concerns the interaction at a given moment; context engineering concerns what persists across all interactions.

In practice, context engineering takes several forms. Configuration files like CLAUDE.md or .cursorrules describe conventions, constraints, and project-specific knowledge that every AI session should know. These files are machine-readable documentation, telling the AI what is true about this codebase in a form it can use when generating code. Writing them well requires the same discipline as writing good documentation, plus awareness of how models weight and use contextual information.
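Because these files are documentation that the AI reads literally, they go stale in the same ways documentation does. One way to catch the most mechanical kind of staleness is to check the file in CI. The sketch below assumes a team convention (not something the tools require) that the context file mentions repository paths inside backticks; the helper `stale_path_references` is hypothetical and simply reports referenced paths that no longer exist.

```python
import re
from pathlib import Path

# Assumption: the context file (e.g. CLAUDE.md) mentions repository paths
# inside backticks, such as `src/db/session.py`. This is a hypothetical team
# convention; the regex only matches backticked tokens containing a slash.
BACKTICKED_PATH = re.compile(r"`([\w./-]+/[\w.-]+)`")


def stale_path_references(context_text: str, repo_root: Path) -> list[str]:
    """Return backticked paths that no longer exist under repo_root, in order of first mention."""
    missing = [
        path
        for path in BACKTICKED_PATH.findall(context_text)
        if not (repo_root / path).exists()
    ]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(missing))
```

A CI step could read the context file, call the function with the repository root, and fail the build when the returned list is non-empty, so a moved or deleted module cannot silently leave the AI with instructions about code that is gone.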

Beyond configuration files, context engineering includes decisions about what the AI retrieves at query time. In RAG-backed tools, the retrieval strategy determines what code samples, documentation fragments, or architectural descriptions land in the context window. The engineering work here involves choosing what to index, how to chunk it, and what metadata makes retrieval useful rather than noisy. A retrieval pipeline that returns the five most lexically similar code snippets, regardless of whether they represent current or deprecated patterns, is worse than one that returns fewer, more relevant results.
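The difference between the two pipelines can be shown with a toy retriever. This is a sketch under stated assumptions, not how any particular RAG tool works: each chunk carries a hypothetical `deprecated` metadata flag set at indexing time, Jaccard token overlap stands in for whatever lexical or embedding similarity a real system uses, and a minimum score lets the function return fewer than `k` results instead of padding the context with weak matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    text: str
    deprecated: bool  # hypothetical metadata, e.g. set by the indexer from a deprecation marker


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a stand-in for a real similarity measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, chunks: list[Chunk], k: int = 5, min_score: float = 0.2) -> list[Chunk]:
    """Rank only current (non-deprecated) chunks and drop weak matches."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(c.text)), c) for c in chunks if not c.deprecated]
    relevant = [(score, c) for score, c in scored if score >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in relevant[:k]]
```

The design choice is to spend metadata, not similarity, on the question "is this still how we do it?": a deprecated snippet can be lexically identical to the query and still be the wrong example to show the model.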

Negative context matters as much as positive context. Dead code, deprecated modules, commented-out experiments, and obsolete test fixtures all consume context budget and introduce false signals. A model that reads twenty examples of database access, fifteen of which reflect three earlier approaches to the same problem, will reproduce that inconsistency in the code it writes. Curating what the model sees is part of context engineering, not an afterthought.
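At the file level, curation can be as simple as an exclusion list applied before anything is indexed or attached to a session. The sketch assumes a hypothetical, team-maintained list of glob patterns and uses a whitespace word count as a rough stand-in for tokens (real tokenizers count differently), so it can also report roughly how much context budget the excluded files would have consumed. It does not handle line-level noise such as commented-out code.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion rules, kept alongside the team's context files.
# Note: in fnmatch patterns, "*" also matches "/".
EXCLUDE_GLOBS = ("legacy/*", "*/deprecated/*", "*_old.py")


def is_excluded(path: str) -> bool:
    return any(fnmatchcase(path, pattern) for pattern in EXCLUDE_GLOBS)


def rough_tokens(text: str) -> int:
    # Crude estimate: whitespace-separated words, not model tokens.
    return len(text.split())


def curate(files: dict[str, str]) -> tuple[dict[str, str], int]:
    """Split files into those worth showing the model and an estimate of budget saved."""
    kept: dict[str, str] = {}
    saved = 0
    for path, source in files.items():
        if is_excluded(path):
            saved += rough_tokens(source)
        else:
            kept[path] = source
    return kept, saved
```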

There is a useful parallel to how teams think about internal documentation. The practices that make documentation valuable for new human engineers (keeping it current, scoping it to what is actually true, removing what has gone stale) are exactly the practices that make context valuable for AI models. The discipline transfers; the tooling is different.

Architectural Constraints as AI Legibility

The second axis of harness engineering is the codebase’s structural shape. Architectural decisions that have always made code easier for humans to read and modify turn out to have a direct relationship to AI tool effectiveness.

AI coding models work best when patterns are consistent, when module boundaries are clean, when the interface to a component is small relative to its implementation, and when concerns are separated. These properties reduce the amount of context a model needs to read before it can safely write correct code. A component that touches the database, the UI state, and the external API in a tightly entangled way requires the model to understand all three domains before it can modify any of them. A well-bounded service with a narrow interface can be understood and modified with much less context.
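A property like "small interface relative to implementation" can be approximated mechanically if a team wants to track it over time. The Python heuristic below is hypothetical, not an established metric: it counts public top-level functions and classes (names without a leading underscore) against the total number of statements in a module. A lower ratio suggests a narrower surface for a reader, human or model, to learn before using the module.

```python
import ast


def interface_ratio(source: str) -> float:
    """Public top-level defs/classes divided by all statements in the module.

    Hypothetical legibility heuristic; 'public' means a top-level function or
    class whose name does not start with an underscore.
    """
    tree = ast.parse(source)
    public = [
        node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    ]
    # Every statement node anywhere in the module, including nested ones.
    statements = sum(1 for node in ast.walk(tree) if isinstance(node, ast.stmt))
    return len(public) / statements if statements else 0.0
```

A module whose single public function delegates to private helpers scores lower than one that exposes every helper, which matches the intuition in the paragraph above; the number is only useful as a trend, not as a threshold.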

The arguments for bounded contexts, single responsibility, and clear interfaces have always included making the codebase easier to understand. The feedback loop has historically been slow: the cost shows up in velocity, bugs, and onboarding time over months. With AI tools, the feedback is more immediate. When a model struggles with a tangled module, it writes incorrect code, reaches for the wrong abstractions, and misses constraints. Treating architectural legibility as a property to maintain for AI readers, not just human ones, changes how teams evaluate design decisions.

Architectural constraints also operate at a policy level: which directories the AI can modify, which external services it can call, which patterns are approved and which are deprecated. Encoding these constraints in the harness rather than repeating them in every prompt is the difference between a durable policy and a fragile reminder. A developer working on a feature shouldn’t need to re-specify “don’t touch the payments module” every session; that constraint belongs in the context infrastructure.
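Once a constraint like "don't touch the payments module" is written down, it can also be enforced, for example by checking the files an AI-assisted change modifies before it is accepted. The directory names and the `violations` function below are hypothetical. The check compares path components rather than string prefixes, so `payments_v2/` is not mistaken for `payments/`; it does not normalize `..` segments, which a production check would need to handle.

```python
from pathlib import PurePosixPath

# Hypothetical policy a team might keep in its harness configuration.
PROTECTED_DIRS = ("payments", "infra/secrets")


def violations(changed_paths: list[str], protected: tuple[str, ...] = PROTECTED_DIRS) -> list[str]:
    """Return the changed paths that fall under a protected directory."""
    blocked = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # is_relative_to (Python 3.9+) compares whole path components.
        if any(path.is_relative_to(p) for p in protected):
            blocked.append(raw)
    return blocked
```

Where the check runs (a pre-commit hook, a CI job, or a tool-specific hook) is a team choice; the point is that the policy lives in one place instead of in each developer's prompts.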

This has implications for teams evaluating or redesigning systems. The cost-benefit calculation for breaking up a large service or introducing a cleaner interface now includes the effect on AI tool effectiveness, not just the human-engineering benefits. Both should factor into the decision.

Codebase Garbage Collection

The third piece of harness engineering is ongoing maintenance aimed at keeping the codebase legible as AI-generated code accumulates. Böckeler names this “garbage collection,” and the analogy holds: without active collection, entropy compounds.

AI-assisted development accelerates code production. The review and understanding work does not scale at the same rate, which means the codebase can grow faster than the team’s collective model of it. Dead branches accumulate; experimental features get merged and forgotten; naming inconsistencies proliferate. Each of these degrades the signal-to-noise ratio for the AI on the next task.

Concrete habits that support garbage collection include running dead code detection as part of CI rather than periodically, treating deletion as a positive contribution rather than a risky one, auditing third-party dependencies that no longer serve their original purpose, and keeping naming conventions synchronized as the codebase evolves. None of these are new engineering practices. What changes is their priority: in an AI-assisted workflow, they are maintenance of the harness, with direct effects on the quality of code the AI produces.
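Dead code detection in CI normally means adopting an established tool for the language in question. The sketch below only illustrates the core idea in Python, under heavy simplifying assumptions: collect top-level function definitions across a set of modules, collect every name referenced anywhere, and report definitions that are never referenced. The helper `unused_functions` is hypothetical; it ignores dynamic dispatch, string-based lookups, and external callers, all of which real tools must account for.

```python
import ast


def unused_functions(modules: dict[str, str]) -> set[str]:
    """Top-level functions defined in any module but never referenced by name in any module.

    Simplified sketch: names used only via getattr(), strings, re-exports, or
    external entry points will be reported as unused (false positives).
    """
    defined: set[str] = set()
    referenced: set[str] = set()
    for source in modules.values():
        tree = ast.parse(source)
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.add(node.name)
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                referenced.add(node.id)
            elif isinstance(node, ast.Attribute):
                referenced.add(node.attr)
    return defined - referenced
```

Run on every pull request, a check like this makes deletion the default response to an unreferenced function, rather than something a team remembers to do during an occasional cleanup.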

There is also a compounding effect worth naming. Technical debt has always compounded, but the rate was limited by how fast humans could write code. AI-assisted development raises that rate, which means entropy accumulates faster if garbage collection doesn’t keep pace. The useful measure for a team is not how much code exists but how legible the codebase remains after six months of AI-assisted development.

The Engineering Discipline Behind “Using AI Tools”

The value of the harness engineering framing is that it makes visible a category of work that teams often do informally or not at all.

Teams that treat AI-assisted development as “install the extension, start writing prompts” will encounter the costs of an unmaintained harness within a few months: inconsistent code generation, AI suggestions that contradict existing patterns, accumulation of technical debt at AI-assisted speed. Teams that invest in context engineering, architectural legibility, and active codebase maintenance will find that the quality of their AI assistance compounds over time.

There is a parallel here to developer experience work. A good developer experience does not happen by accident; it requires investment in tooling, documentation, onboarding, and feedback loops. Harness engineering is, in effect, developer experience work aimed at AI collaborators. The underlying discipline is the same: understand what information the consumer needs, structure the environment so that information is available, and maintain the environment so it stays accurate.

Context files need owners who keep them current. Architectural decisions need to be evaluated for their effect on AI tool effectiveness alongside their effect on human readability. Codebase maintenance needs to be measured against the right outcome: not fewer bugs or faster features in isolation, but a codebase that continues to support high-quality AI assistance over a sustained period.

Naming harness engineering as a distinct activity matters because it separates it from both “using AI tools” and “writing prompts,” making it visible as something that requires dedicated engineering attention. The models will continue to improve; the harnesses that teams build around them will determine how much of that improvement translates into reliable, sustained development velocity rather than inconsistent output that decays as the codebase grows.
