Martin Fowler’s team published a taxonomy of context engineering options for coding agents in February 2026, cataloging a substantial expansion of configuration surfaces: CLAUDE.md files, MCP server registrations, lifecycle hooks, subagent orchestration, per-directory instruction files, slash command customizations. Each major coding tool has developed its own variant of the same basic insight, which is that what goes into the context window shapes what an agent does as much as the model interpreting it.
Andrej Karpathy argued that “context engineering” is a more accurate term than “prompt engineering” because the real discipline is managing everything the model sees across a session, not crafting the text of a single query. A prompt is a message; a context window is an information architecture decision. And the more revealing question is not which configuration options to use, but what the process of configuring them forces you to figure out about your own codebase.
## What Sits in the Context Stack
The full context assembled before Claude Code processes any user message follows a fixed order:
- Anthropic’s built-in system prompt
- Global user preferences in `~/.claude/CLAUDE.md`
- Project-level conventions in repo-root `CLAUDE.md` and per-directory files
- Tool schemas contributed by registered MCP servers
- Compacted conversation history
- Files and tool results retrieved during execution
- The current message
The ordering is not incidental. Research published as “Lost in the Middle” by Stanford and UC Berkeley researchers established that language models perform measurably worse on information positioned in the middle of long contexts compared to the beginning or end. This effect persists even for 200,000-token models. A critical architectural constraint buried forty thousand tokens deep in a verbose CLAUDE.md, after sections on style preferences, receives less weight than instructions at the top. Dense, prioritized entries at the beginning of static instruction files outperform comprehensive ones distributed throughout.
Cursor uses .cursor/rules/ with glob-scoped files (introduced in Cursor v0.43); GitHub Copilot uses .github/copilot-instructions.md; Aider reads .aider.conf.yml. The mechanism varies, but the underlying problem each tool is solving is the same: the model needs to know things about your codebase that are not visible in the code.
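To make the glob-scoping concrete, here is a sketch of a Cursor rule file (the field names follow Cursor's documented `.mdc` rule format; the path, glob, and rule text are illustrative assumptions, not from any real project):

```
---
description: Database access conventions
globs: ["packages/**/*.ts"]
alwaysApply: false
---
All database access goes through `/packages/db`. Never import `pg` directly.
```

The glob restricts when the rule enters context, which is the same budget-conservation move the per-directory CLAUDE.md files make: instructions load only when the agent is working on files they govern.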
## The Archaeology Problem
The process of writing a good CLAUDE.md reveals what the codebase has never formally documented.
Consider what belongs in one. Coding style belongs elsewhere; a linter handles that. Standard framework patterns belong elsewhere; the model already knows them. What belongs in a CLAUDE.md is the information a senior engineer would deliver in the first five minutes of onboarding: the non-obvious decisions, the constraints invisible in the current implementation, the reason a particular approach was prohibited even though the model would naturally reach for it.
```markdown
# Database access
Do not use the `pg` package directly. All database access goes through
`/packages/db`. This exists because direct `pg` usage caused connection
pool fragmentation in the payments service in 2023 and the centralized
package enforces the pool configuration.
```
That entry needs to be written down because the prohibition is not visible in the code. The implementation shows what was done, not what was considered and rejected. The constraint lives in someone’s memory, or in a Slack thread from two years ago, or nowhere.
Writing a CLAUDE.md for a codebase with several years of production history is archaeological work. You are excavating tacit knowledge: decisions that made sense when made, rationales that were never documented because everyone involved already understood them, patterns that were deprecated but remain visible as negative examples in the existing files. The agent cannot infer any of this from reading the implementation. The model sees current state; the reasoning behind it is not available.
Anthropic’s own guidance on CLAUDE.md design recommends negative examples over prose for suppressing patterns the model learned during training. If you want to prevent the model from reaching for a particular library, a small code snippet showing the pattern to avoid with a comment explaining why outperforms a paragraph of prose. The model generalizes better when it sees the specific form of what not to do rather than a description of it.
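A sketch of what such a negative example might look like inside a CLAUDE.md (the section heading, library, and replacement are illustrative assumptions, not from Anthropic's guidance):

````markdown
## HTTP clients

Do NOT do this:

```ts
import axios from "axios";  // prohibited: we standardized on fetch in 2024
const res = await axios.get(url);
```

Use the wrapper in `/packages/http` instead, which handles retries and auth.
````

The snippet gives the model the surface form it would otherwise produce, paired with the correction, which is closer to how the model matched the pattern in the first place.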
Stale entries compound the problem. The model weights explicit instructions heavily over inferred patterns. A prohibition lifted eighteen months ago but still present in the file generates a correction loop every session until someone notices and removes it. A CLAUDE.md that accurately described your architecture in January but has not been updated after several quarters of refactoring is worse than no file at all, because it actively misdirects. This means CLAUDE.md needs the same maintenance discipline as any other infrastructure component: reviewed in pull requests, updated as part of any architectural change, explicitly owned by the team.
## MCP and the On-Demand Shift
One of the more significant changes in context engineering practice is the move from pre-loading context to retrieving it on demand. The Model Context Protocol, an open standard built on JSON-RPC 2.0 over stdio or HTTP transports, enables agents to query external systems at runtime and receive structured results.
Rather than embedding a database schema in the system prompt (stale as of the last migration) or loading documentation at session start (consuming budget for information the current task may not need), an MCP-connected server returns live data when queried. To the agent, an MCP tool call is syntactically identical to reading a file or running a terminal command:
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "ghp_..." }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost/mydb"]
    }
  }
}
```
An MCP-connected GitHub server can return the issue describing the bug being fixed; a postgres server returns the current schema; a CI server returns the failing test output. All of this arrives fresh at query time rather than stale at session start, and costs token budget only when actually invoked.
MCP servers expose three primitives: tools (callable functions), resources (readable data), and prompts (reusable templates). The practical distinction from pre-loaded context is freshness and precision. A static schema pasted into a system prompt was accurate when it was pasted. An MCP tool returns the schema as it exists right now. For any codebase with an active migration history, the difference is material.
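At the wire level, a tool invocation is an ordinary JSON-RPC 2.0 request using the `tools/call` method from the MCP specification. The tool name and argument shape below are assumptions about the postgres server's interface, shown only to illustrate the envelope:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query",
    "arguments": { "sql": "SELECT table_name FROM information_schema.tables" }
  }
}
```

The response carries the result as structured content, which is why the agent can treat it interchangeably with a file read: both arrive as text in the context window, but the MCP result reflects the system's state at query time.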
## The Budget Arithmetic
A 200,000-token context window feels generous until you account for what fills it. A moderately complex TypeScript monorepo can consume 50,000 to 80,000 tokens in retrieved files during a single task. Active conversation history adds more. Each reasoning step in an agentic loop accumulates.
In a ReAct-style loop where each step adds n tokens of new information, the context at step m is n·m tokens, and the cumulative tokens processed across all m steps is n·m(m+1)/2: quadratic growth. At ten steps, cumulative cost is 55 times the first step’s; at twenty steps, 210 times. Claude Code’s /compact command and automatic compaction address this, but compaction destroys reasoning. What survives is a summary; intermediate observations, considered alternatives, and the path taken do not.
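The arithmetic is easy to verify directly (the per-step token count n is arbitrary here; real steps vary in size):

```python
def cumulative_tokens(n: int, m: int) -> int:
    """Total tokens processed across m loop steps when each step
    appends n new tokens, so step k re-reads a context of n*k tokens."""
    return sum(n * k for k in range(1, m + 1))  # closed form: n * m * (m + 1) // 2

# Relative to the first step's cost of n tokens:
print(cumulative_tokens(1, 10))  # 55
print(cumulative_tokens(1, 20))  # 210
```

The closed form makes the planning implication visible: halving the number of loop steps cuts cumulative cost by roughly a factor of four, which is why decomposing a task into shorter sessions often beats one long agentic run.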
This is why the most important constraints belong in CLAUDE.md rather than in conversation messages. Static instruction files are re-injected into context after compaction. A constraint written in a message earlier in the session may be compressed away; the same constraint in CLAUDE.md is present at the start of every session and every compacted continuation. The persistence hierarchy is not incidental to the design.
Birgitta Böckeler’s companion piece on harness engineering, published shortly after the Fowler taxonomy, frames this around three components: context engineering, architectural constraints (design choices that make the codebase tractable for a model, such as small focused modules and strict static types), and semantic garbage collection (removing dead code, stale comments, and deprecated patterns that actively mislead). The observation underlying all three is that the entire codebase is now part of the prompt. Text in files the agent reads during a session shapes its understanding; contradictory examples or outdated documentation have measurable effects on output quality.
This is why .cursorignore and .aiderignore exist as patterns: excluding build artifacts, generated files, and large fixture files from default retrieval is the negative side of context engineering, preventing the context budget from filling with content that dilutes rather than informs.
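These files use gitignore-style syntax; a minimal sketch (the paths are illustrative assumptions):

```
# Keep generated and bulky content out of default retrieval
dist/
coverage/
*.snap
fixtures/large/
```

The entries mirror a .gitignore but serve a different purpose: a file can be worth versioning and still be noise in a context window.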
## What the Forcing Function Reveals
Teams have always needed to encode architectural decisions somewhere accessible to new contributors. The novel element is that an AI coding agent is an unusually unforgiving reader. A new engineer can ask clarifying questions when something seems inconsistent; a model cannot. It acts on whatever is present in the context window, including the absence of information.
When context is missing, the model falls back on training data defaults, which means open-source ecosystem conventions and common framework patterns, not your codebase’s specific constraints. The gap between those defaults and your actual requirements is the gap that a well-maintained CLAUDE.md closes.
The explosion of context engineering options that the Fowler article documents is a maturation of tooling around a constraint that was always present but previously diffuse. The institutional knowledge problem in software teams, the fact that critical architectural decisions often exist only in the memory of senior engineers and disappear when those engineers leave, has always been costly. Coding agents make it immediately costly in a form that is hard to ignore: the agent does the wrong thing, you correct it, it does the wrong thing again next session, you correct it again.
A CLAUDE.md that works well for a coding agent is a written transcript of the tacit knowledge that senior engineers carry. The agent is a reason to write it down. Not only because it improves agent performance, though that is true, but because the knowledge was always worth encoding and the process of configuring context is a structured forcing function for doing so. Teams that treat these files seriously tend to discover that the exercise clarifies architectural intent for human contributors as much as for the model.
The Fowler article catalogs tools. The more lasting story is what using those tools makes visible.