The Codebase is Now Part of the Prompt

Source: martinfowler.com

A few weeks ago, Birgitta Böckeler published a piece on Martin Fowler’s site commenting on OpenAI’s write-up about what they call “Harness Engineering.” Böckeler argues the framing is valuable, and she’s right. The term gives a name to something developers using AI tools have been doing informally for a while, without treating it as a first-class engineering discipline.

The harness, as defined, has three components: context engineering, architectural constraints, and garbage collection of the codebase. These are not independent techniques. They compose into a coherent model of how you shape the environment around an AI agent so it produces output worth keeping. What’s worth sitting with is the implication buried inside that framing: when you work with an AI coding assistant, the entire codebase becomes part of the prompt.

Context Engineering Is Not Prompt Engineering

Most developers who have used AI coding tools understand prompt engineering in the narrow sense: what you type into the chat window, what instructions you put in a system prompt. Context engineering is broader. It is the discipline of controlling what information reaches the model’s context window at all, from any source.

For an AI coding agent operating inside a repository, context includes:

  • The files you explicitly attach or reference
  • Files the agent retrieves automatically based on import graphs or similarity search
  • Configuration files like CLAUDE.md, .cursorrules, or copilot-instructions.md
  • The current conversation history
  • Tool call results (test output, compiler errors, grep results)

A model with a 200k-token context window sounds like it should be able to ingest everything. In practice it can’t, for two reasons. First, retrieval is lossy: agents don’t read entire large codebases, they sample from them. Second, attention degrades with context length. There is a well-documented phenomenon called the “lost in the middle” problem, where models perform worse on information buried in the middle of long contexts than on information at the beginning or end. The practical upshot is that what lands in context matters enormously, and designing that carefully is real engineering work.
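Of these context sources, the instruction files are the one you author directly. A minimal CLAUDE.md might look like this (the contents and conventions are hypothetical, for illustration only):

```markdown
# Project conventions

- TypeScript strict mode; no `any`.
- New commands live in `src/commands/<category>/` and export a `run` function.
- All database access goes through `src/lib/db.ts`; never import a client directly.
- Permission checks use `hasPermission()` from `src/lib/permissions.ts`.
```

The value of such a file is that it lands in context on every request, which is exactly why it should state only current conventions, never historical ones.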

A concrete example. Consider a Discord bot project structured like this:

src/
  commands/
    admin/
      ban.ts
      kick.ts
    moderation/
      warn.ts
      mute.ts
  events/
    messageCreate.ts
    interactionCreate.ts
  lib/
    db.ts
    permissions.ts
  index.ts
CLAUDE.md

If you ask an AI agent to add a new moderation command, it will likely retrieve files from src/commands/moderation/ and src/lib/permissions.ts through semantic or structural search. But if your permissions.ts contains a mix of current patterns and three-year-old helpers that no longer reflect how permissions work in the bot, the model will happily pattern-match off the old helpers. It does not know they are stale. This is the garbage collection problem, and it is much more concrete than it sounds.
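To make the hazard concrete, here is a hypothetical permissions.ts in which two generations of helpers coexist (all names are invented for illustration):

```typescript
// src/lib/permissions.ts (hypothetical) -- current and stale patterns coexist.

// Current approach: role-based checks against the member's role list.
export function hasPermission(roles: string[], required: string): boolean {
  return roles.includes(required) || roles.includes("admin");
}

// Stale helper from an earlier numeric-level system, never deleted.
// Nothing in the file marks it as dead; a retrieval pass sees live code.
export function checkPermissionLevel(userLevel: number): boolean {
  return userLevel >= 5;
}
```

A model asked to gate a new command can pattern-match off either function with equal confidence; only a human who knows the project's history can tell that one of them is obsolete.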

What Architectural Constraints Actually Mean for AI Workflows

Architectural constraints in the harness engineering sense are not just good software design. They are specifically the subset of design choices that affect how well an AI model can navigate and extend the codebase.

Small, single-purpose modules are a well-understood principle. Their value in AI-assisted development is that they produce predictable retrieval results. When a model retrieves src/lib/permissions.ts and that file is 80 lines that do exactly one thing, the model’s context is precise. When that file is 600 lines covering permissions, rate limiting, feature flags, and some utility functions that ended up there by accident, the context is noisy.

Naming conventions matter more than in human-only codebases. Humans tolerate inconsistency because they use surrounding knowledge to disambiguate. A function called handleMsg and another called processIncomingMessage both make sense to a developer who knows the codebase. An AI model retrieving code by semantic similarity will find them both, but will not reliably know they do similar things unless the naming is consistent. This affects the quality of code generation in ways that are subtle but cumulative.

There is also the question of how constraints propagate through agent tool use. A coding agent that can run tests, lint, and compile gets feedback about whether its output is structurally valid. Architectural constraints that are enforced by tooling (strict TypeScript configs, linters with teeth, module boundary rules enforced by something like dependency-cruiser) close the feedback loop. The agent can see when it violated a constraint and self-correct. Constraints that only live in documentation or convention cannot be enforced in the loop.

This points toward a design principle: prefer machine-checkable constraints over documented conventions when you are building for AI-assisted workflows. Not because humans don’t matter, but because the AI needs signals it can actually read.
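As a sketch of what a machine-checkable boundary rule looks like, here is a dependency-cruiser configuration fragment for the bot layout above (rule names and the specific boundaries are illustrative; consult the tool's documentation for the full schema):

```javascript
// .dependency-cruiser.js -- illustrative module boundary rules
module.exports = {
  forbidden: [
    {
      name: "commands-not-reaching-into-events",
      comment: "Commands may use lib/, but never import event handlers",
      severity: "error",
      from: { path: "^src/commands" },
      to: { path: "^src/events" },
    },
    {
      name: "no-circular",
      severity: "error",
      from: {},
      to: { circular: true },
    },
  ],
};
```

Run in CI, a rule like this turns an architectural convention into a signal the agent can see fail and act on.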

Garbage Collection as a Maintenance Discipline

The most interesting component in the harness framework is the one that sounds most mundane. Codebase garbage collection means actively removing the things that mislead AI models: dead code, deprecated patterns that were never deleted, commented-out implementations, outdated inline comments that describe what the code used to do, and stale configuration examples.

This has always been good practice. What changes in an AI-assisted workflow is the feedback mechanism. Before, dead code was a maintenance cost you paid in human confusion and cognitive load. Now it is also a correctness cost you pay in generated code that uses the wrong patterns. The model has no way to distinguish “this function is the current approach” from “this function was how we did it before we switched to the new auth library.” It learns from what is present.

Consider a common scenario: you migrate from one database client to another. You update the active query files, but leave several utility functions in src/lib/db.ts that still use the old client’s API. Three months later, an AI agent tasked with adding a new query generates code that imports the old client. This is not a hallucination. The model found real code in the repository that used that pattern. It extrapolated from genuine examples.
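One way to keep such remnants from lingering is to make the migration itself machine-checkable: a small guard that fails CI while any file still imports the retired client. A sketch (the package names are invented):

```typescript
// Sketch: detect imports of a retired package so the migration can be
// finished and the stale helpers deleted. "old-client" is hypothetical.
function findStaleImports(source: string): string[] {
  const pattern = /import\s+.*\s+from\s+["']old-client["'];?/g;
  return source.match(pattern) ?? [];
}

const file = [
  'import { query } from "old-client";',
  'import { sql } from "new-client";',
].join("\n");

// One line still references the retired client, so CI should fail.
console.log(findStaleImports(file).length); // 1
```

The point is less the regex than the discipline: the stale pattern gets an expiry date enforced by tooling, rather than waiting for someone to notice.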

The “strangler fig” pattern describes how to migrate systems incrementally. In AI-assisted development, you want to complete the strangle as quickly as practical and then delete the strangled code. Leaving the old and new patterns coexisting in the codebase for longer than necessary increases the period during which the model might use the wrong one.

This reframes garbage collection from a nice-to-have into a prompt hygiene practice. The question shifts from “is this code causing bugs?” to “is this code teaching the model something wrong?”

How This Changes the Software Engineering Discipline

The harness engineering frame suggests that software engineers working with AI tools now have two audiences: the human developers who will read and maintain the code, and the AI models that will extend it. These audiences have overlapping but not identical needs.

Human developers benefit from:

  • Comments explaining non-obvious design decisions
  • Historical context about why certain choices were made
  • Breadcrumbs pointing toward related code

AI models benefit from:

  • Present-tense accuracy (no stale context)
  • Consistent patterns with low variance
  • Machine-checkable constraints
  • Clean separation of concerns that maps onto predictable retrieval

Most of the overlap is real. Good code is good for both audiences. But there are tensions. A comment that says “Note: we tried approach X but it caused issue Y” is useful human documentation and potentially confusing AI context, because the model might use approach X from the comment even though the code itself rejected it. Some teams are experimenting with ADR (Architecture Decision Record) conventions specifically structured so AI models parse them correctly, separating decision history from current guidance.
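One possible shape for such an AI-aware decision record, separating history from current guidance (the structure and names here are a sketch, not an established standard):

```markdown
# ADR-012: Permission checks use role lists

## Status
Accepted

## Current guidance
Use the role-based permission helper in `src/lib/permissions.ts`.
All new code follows this pattern.

## Superseded approaches (historical -- do not use)
Numeric permission levels. Recorded only so the decision history is
complete; no live code should reference them.
```

The explicit "do not use" section gives both audiences what they need: the human gets the history, and the model gets an unambiguous label on which pattern is current.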

The other shift is in maintenance prioritization. In a human-only team, refactoring old code competes with feature work and often loses. When that old code is actively degrading AI output quality, the calculus changes. The feedback loop is tighter and more visible: you ask the model to do something, it generates code using the old pattern, you realize the cleanup needs to happen. This could be a forcing function for technical debt reduction that was hard to justify before.

The Meta-Skill

What Böckeler’s commentary highlights is that harness engineering is not a configuration task you do once. It is an ongoing discipline that runs in parallel with feature development. You make architectural decisions with AI context in mind. You run garbage collection passes when you notice generated code degrading in quality. You invest in machine-checkable constraints so the feedback loop stays closed.

This is a real expansion of the software engineering job description. It does not replace writing code or thinking about systems design. It adds a layer of maintenance work whose primary beneficiary is the AI agent working alongside you. The developers who do this well will get noticeably better results from AI coding tools than developers who ignore it, because they will have built a codebase the model can navigate accurately.

The harness is not invisible infrastructure. It is the environment you construct deliberately, and the quality of that environment determines a lot about what the AI can do inside it.
