
The Context Budget Problem: How Coding Agents Decide What to See

Source: martinfowler

A recent article on Martin Fowler’s site from February 2026 captures something that’s been quietly becoming the defining skill for anyone building with or configuring coding agents: the ability to engineer what goes into the context window, not just what you type into the chat box. Andrej Karpathy framed this well when he described context engineering as “the art of filling the context window with exactly the right information at the right time.” That framing distinguishes it from prompt engineering, which was mostly about crafting single instructions. Context engineering is about the ongoing, programmatic curation of what the model can see across an entire agentic session.

The model’s weights are fixed. Every gain you make from here comes from what you put in front of it.

The Layered Context Stack

When Claude Code processes a request, it’s not working from a blank slate. There’s a stack of context sources that get assembled before any user message even lands:

  1. System prompt — Anthropic’s built-in instructions for the agent’s persona and tool usage
  2. Global memory — contents of ~/.claude/CLAUDE.md, injected into every session
  3. Project instructions — CLAUDE.md files found at the repo root and in subdirectories
  4. MCP tool descriptions — registered servers contribute their own tool schemas and descriptions
  5. Conversation history — prior turns in the session, compacted when they grow too large
  6. Retrieved context — files the agent reads during execution, tool call results
  7. Current task — the user’s actual message

Each layer competes for space in a 200K token window. For a moderately complex TypeScript monorepo, the retrieved files alone can consume 50,000 to 80,000 tokens without much effort. The architectural question that nobody in the tooling space has fully answered yet is how to allocate that budget deliberately.
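To make that competition concrete, here is a back-of-the-envelope allocation sketch in TypeScript. Every number below is illustrative, not a measurement of any real session:

```typescript
// Hypothetical allocation of a 200K-token window across the layers above.
// The numbers are illustrative; real sessions vary widely.
const budget = 200_000;
const layers: Record<string, number> = {
  systemPrompt: 3_000,
  globalMemory: 1_000,
  projectInstructions: 2_000,
  mcpToolDescriptions: 8_000,
  conversationHistory: 40_000,
  retrievedContext: 80_000, // dominates: files read during execution
  currentTask: 1_000,
};

const used = Object.values(layers).reduce((a, b) => a + b, 0);
const headroom = budget - used; // what remains before compaction kicks in
console.log({ used, headroom }); // { used: 135000, headroom: 65000 }
```

The point of writing it down is that retrieved context dwarfs everything else, so that is where deliberate curation pays off first.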

The Static Layer: CLAUDE.md and Its Equivalents

The most visible context engineering primitive right now is the project instruction file. Claude Code uses CLAUDE.md, Cursor uses .cursorrules (or the newer .cursor/rules directory), and GitHub Copilot reads .github/copilot-instructions.md. They all serve the same purpose: inject persistent, project-specific context before the model sees anything else.

A well-written CLAUDE.md does several things at once. It tells the agent where things live in the codebase, which commands to run for builds and tests, which patterns to prefer, and which things to avoid entirely. The avoid list matters more than most people realize. Instructions like “never modify schema.sql directly” or “run npm test before marking a task complete” are the kind of guardrails that prevent the agent from making plausible-looking but destructive changes.

```markdown
# Project: Order Service

## Architecture
- `src/handlers/` -- Express route handlers (thin, no business logic)
- `src/services/` -- Business logic layer
- `src/db/` -- Knex query builders only; never raw SQL strings

## Commands
- `npm run build` -- TypeScript compile
- `npm test` -- Jest; must pass before any commit
- `npm run lint` -- ESLint with Airbnb config

## Rules
- All new endpoints require OpenAPI annotations in `docs/openapi.yaml`
- Use `z.infer<typeof schema>` for Zod type inference
- Never commit `.env`
```
The hierarchy matters here. Claude Code reads ~/.claude/CLAUDE.md for global preferences, then the project-root CLAUDE.md, then subdirectory CLAUDE.md files as it navigates the tree. This lets you set global preferences once (your preferred code style, your test philosophy) and override or extend them per project.
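This split is what keeps the global file small. Assuming a preference-only global memory (contents hypothetical), ~/.claude/CLAUDE.md might hold nothing more than:

```markdown
# Global preferences

- Prefer small, focused commits with imperative messages
- Ask before adding a new dependency
- TypeScript: strict mode on; avoid `any`
```

Anything project-specific, like build commands or directory layout, belongs in the repo-level file, where the team can version and review it.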

Cursor’s RAG-based approach makes a different trade-off. Rather than relying on human-authored instructions, it builds a vector index of the whole repo on first open and retrieves chunks semantically. The @codebase mention in Cursor pulls from this index. The upside is that it finds relevant code you didn’t think to mention. The downside is that AST-unaware chunking loses semantic boundaries: a function split across chunk boundaries becomes partially invisible to retrieval, and BM25 still outperforms embeddings for exact identifier matches in most benchmarks.
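A minimal sketch of that failure mode, using a naive fixed-size chunker (the chunker and the sample function are illustrative, not Cursor's actual implementation):

```typescript
// Naive fixed-size chunking, ignoring syntax entirely.
function chunkBySize(source: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < source.length; i += size) {
    chunks.push(source.slice(i, i + size));
  }
  return chunks;
}

// A short function that will straddle a chunk boundary.
const code = [
  "function applyDiscount(order: Order, pct: number) {",
  "  const discounted = order.total * (1 - pct / 100);",
  "  return { ...order, total: discounted };",
  "}",
].join("\n");

// With 60-character chunks, the signature and the return statement land
// in different chunks, so no single chunk holds the whole definition.
const chunks = chunkBySize(code, 60);
const wholeInOneChunk = chunks.some(
  (c) => c.includes("applyDiscount") && c.includes("return"),
);
console.log(wholeInOneChunk); // false: the definition is split
```

A retrieval hit for "applyDiscount" then returns only the fragment containing the name, which is exactly the partial-visibility problem AST-aware chunkers avoid by splitting on function boundaries.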

MCP and the Dynamic Layer

The Model Context Protocol is what makes the dynamic context layer composable. MCP servers expose three primitives to the host: tools (callable functions), resources (readable data), and prompts (reusable templates). Claude Code, Cursor, and a growing number of other tools treat registered MCP servers as native capabilities.

The practical effect is that you can wire your coding agent to your actual systems. A Jira MCP server means the agent can look up the ticket it’s implementing. A Postgres MCP server means it can inspect the live schema before writing migrations. A GitHub MCP server means it can read open PRs and comments as context. None of this requires manual copy-pasting into the chat window.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..." }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```

This configuration lives in .mcp.json at the project root, or in the user-level Claude configuration for servers you want everywhere. The tools appear as first-class capabilities to the agent; it can call them the same way it calls its built-in file-read or terminal tools.

Hooks: Shaping Context at Execution Time

Claude Code’s hook system is the most underappreciated part of its context engineering story. Hooks are shell scripts that fire in response to agent lifecycle events: PreToolUse before any tool call, PostToolUse after, Notification for agent messages, and Stop when a session ends.

A PostToolUse hook on file writes can automatically run a linter and inject the results back into context. A PreToolUse hook on terminal commands can validate against an allowlist before execution. This is the layer where you enforce policies that you can’t reliably encode in natural language instructions alone.

The hooks receive structured JSON via stdin describing the tool call, its arguments, and its result. A simple bash hook that runs ESLint after every file edit and returns the output can prevent an entire class of “the linter found 12 errors after you were done” conversations.
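A sketch of that hook's decision logic in TypeScript, assuming the payload exposes the edited path as tool_input.file_path (a field-name assumption; check your Claude Code version's hook documentation for the exact shape):

```typescript
// Core of a hypothetical PostToolUse lint hook. The payload shape
// (tool_input.file_path) is an assumption about what arrives on stdin,
// not a documented guarantee.
function lintTarget(payload: string): string | null {
  try {
    const event = JSON.parse(payload);
    const file: unknown = event?.tool_input?.file_path;
    // Only surface files ESLint can actually lint.
    if (typeof file === "string" && /\.(ts|tsx|js)$/.test(file)) return file;
  } catch {
    // Malformed payload: stay silent rather than block the agent.
  }
  return null;
}

// The surrounding script would read stdin, call lintTarget, run
// `npx eslint <file>`, and exit non-zero on failures so the lint
// output flows back into the agent's context.
```

Keeping the parsing pure like this makes the hook testable outside the agent, which matters because a buggy hook silently degrades every session.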

The Structural Problem: Lost in the Middle

Packing a context window with relevant information is not sufficient on its own. The position of that information within the window affects how reliably the model uses it. Research on attention in long-context models consistently shows that content in the middle of a long context receives less reliable attention than content at the start or end. This is the “lost in the middle” problem, and it’s not fixed by larger context windows.

Anthropic’s own guidance recommends using XML tags to structure multi-source context:

```xml
<project_context>
  <architecture>...CLAUDE.md content...</architecture>
  <relevant_files>
    <file path="src/services/order.ts">...contents...</file>
  </relevant_files>
</project_context>
<task>
  Add a discount field to the Order model and update the service layer.
</task>
```

The tags give the model labeled regions it can reference by name rather than by position. Combined with putting the most critical constraints near the beginning and the specific task at the end, this structure makes context more reliable than raw concatenation of files.

Subagents and Context Isolation

When a task is complex enough that a single context window is a bottleneck, subagents become the architectural answer. Claude Code’s Task tool spawns isolated agent instances with their own context windows. The orchestrating agent describes a subtask, the subagent executes it with full CLAUDE.md injection and its own tool access, and the result comes back as text to the parent.

This pattern solves two problems. First, it parallelizes work: writing tests and implementing a feature can happen concurrently rather than sequentially within a single context. Second, it keeps the orchestrator’s context clean. A subagent that reads 30 files to accomplish its task does not pollute the parent agent’s context with all that retrieved content.
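The isolation property is easy to state in code. The sketch below is an abstraction of the pattern, not the real Task tool API; every name in it is hypothetical:

```typescript
// A subagent consumes tokens in its own window but returns only text.
type Subagent = (task: string) => {
  summary: string;
  tokensReadInternally: number; // spent inside the subagent's own window
};

// Hypothetical subagent that reads many files to write tests.
const testWriter: Subagent = (task) => ({
  summary: `Wrote 4 tests for: ${task}`,
  tokensReadInternally: 45_000,
});

function orchestrate(task: string, agent: Subagent): string {
  const result = agent(task);
  // Only the summary string enters the parent's context; the 45K tokens
  // the subagent read never do.
  return result.summary;
}

console.log(orchestrate("discount field on Order", testWriter));
```

The parent pays a fixed, small cost per delegation regardless of how much the subagent had to read, which is the whole reason the pattern scales.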

Where This Is Heading

The convergence across tools is visible. Cursor added MCP support in 2025. Copilot has its own extension protocol. Every major coding assistant now has some form of project-level instruction file. The primitives are standardizing.

What’s still tool-specific is the quality of automatic context selection, the depth of the hook and lifecycle system, and how gracefully the tool handles context budget exhaustion. Claude Code’s /compact command and auto-compaction summarize older conversation turns to make room, but the summarization inevitably loses detail. Knowing when to manually clear context versus when to compact is a skill that sits outside any configuration file.

The Martin Fowler article frames this moment as an explosion of options. That’s accurate. The harder problem is developing the judgment to use them selectively. A 200K token context window is large enough that it’s tempting to include everything and trust the model to sort it out. The evidence suggests that deliberate curation still wins over brute-force inclusion, and that the engineers who treat context structure as an engineering concern rather than an afterthought consistently get better results from the same underlying models.
