The Context Stack: How Modern Coding Agents Know What They Need Before Writing Code

Context engineering is the practice of deliberately deciding what information goes into an LLM’s context window, and when. The term got real traction in early 2025 when Andrej Karpathy used it to distinguish the architectural problem of context composition from the textual problem of prompt wording. By the time Birgitta Böckeler’s article landed on Martin Fowler’s site in February 2026, the tooling around it had grown considerably more complex. Looking back from a few weeks out, it’s useful to trace exactly what that stack looks like and why the design decisions in each layer have consequences for actual development work.

The Three-Layer Model

Coding agent context arrives from three distinct places: static files loaded at startup, dynamic content injected by the agent’s tool calls, and external systems connected via protocol integrations. Each layer has different tradeoffs. Static context is cheap and predictable but stale. Dynamic context is fresh and targeted but consumes tokens on every retrieval. External context can be vast and authoritative but introduces latency and reliability dependencies.

Understanding which layer carries which kind of information is the core of context engineering as a discipline.

The Static Layer: Instruction Files

The most visible form of context engineering is the project-level instruction file. Claude Code uses CLAUDE.md, Cursor uses .cursor/rules/ (formerly .cursorrules), Windsurf uses .windsurfrules, and GitHub Copilot reads .github/copilot-instructions.md. These files are loaded unconditionally at the start of every session, which means every token they consume costs you on every request.

A well-structured CLAUDE.md is a compressed representation of things you would otherwise re-explain every session:

# Project: payment-service

## Architecture
Go service using chi router, PostgreSQL via pgx/v5, and Redis for caching.
Do not use ORM libraries. Raw SQL queries go in the `db/` package.

## Testing
Unit tests use testify. Integration tests use testcontainers-go.
Run `make test` for unit tests, `make integration-test` for integration.

## Conventions
- Error messages are lowercase with no trailing punctuation
- Context propagation: always pass ctx as first argument
- No global state outside of main.go

The discipline here is the same as writing good documentation: include what the agent cannot infer from the code itself, and leave out what it can. File structure is inferrable from a directory listing. The reason you chose pgx over database/sql is not.

Claude Code extends this with a tiered file system: a global ~/.claude/CLAUDE.md for user-level preferences, a project-level CLAUDE.md at the repo root, and support for subdirectory-level files that override or extend the parent. This lets organization-wide conventions live in your home directory while project-specific rules travel with the code. Aider takes a similar approach with CONVENTIONS.md, reading it automatically when present in the project root.

The Dynamic Layer: Tool-Driven Context

The more interesting engineering happens at the dynamic layer. A coding agent doesn’t load the entire codebase into context; it uses tools to retrieve specific pieces as needed based on the current task.

Claude Code’s tool set covers the core cases: read_file and write_file for file operations, bash for running arbitrary commands, list_directory and find_files for navigation, and both grep_search and semantic_search for locating relevant code. The gap between those last two matters in practice.

grep_search is fast and deterministic: ask for all usages of a function name and you get exactly those lines. semantic_search uses vector embeddings of the codebase to find code that is conceptually related, even when the exact terms don’t match. Cursor’s codebase indexing is built primarily on this embedding approach, which is why @codebase can surface things that keyword search would miss entirely.

The tradeoff is trust and cost. Semantic search results can be surprising, and every embedding lookup costs tokens to include in context. Grep results are exact but may miss the broader picture. Good agents use both, with the embedding results serving as a discovery layer and the targeted file reads confirming what was found.

Managing Context Window Consumption

With Claude 3.5 Sonnet at 200k tokens and a substantial codebase, it’s easy to assume context window size isn’t the constraint. It often still is, because of cost and latency rather than raw capacity.

Claude Code’s approach to this is compaction: when the conversation grows large, it summarizes earlier portions, collapsing tool call results and prior reasoning into a condensed narrative. This keeps the most relevant recent state in focus while discarding the verbosity of intermediate steps.

Aider’s approach is different: repo maps, a compressed structural representation of the codebase that fits in a relatively small token budget. Rather than including full file contents, you get function signatures, class hierarchies, and import relationships. Aider generates these using tree-sitter to parse the AST, then scores them by relevance to the current task using a graph-based algorithm derived from PageRank. The result is a few thousand tokens that give the model enough navigational context to know where to look next.

Summarization and structural compression are complementary. Summarization manages conversation history. Structural compression manages codebase knowledge. Both are necessary at scale, and different tools have made different bets about which matters more for their target workflows.

MCP: Standardizing External Context

The Model Context Protocol, which Anthropic open-sourced in late 2024, provides a standardized interface for connecting agents to external systems. An MCP server exposes tools and resources that any compatible client can use. The ecosystem has grown to cover databases, documentation systems, issue trackers, CI pipelines, and monitoring platforms.

For context engineering, MCP matters because it moves external data from the “you have to describe it in prose” category to the “the agent can query it directly” category. Instead of pasting a Sentry error into the chat window, you configure an MCP server that Claude Code can query directly for recent errors in the current project. Instead of copying Linear tickets into context, the agent retrieves the relevant ones during its planning phase.

A minimal MCP server configuration in Claude Code’s claude_desktop_config.json looks like this:

{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "@linear/mcp-server"],
      "env": {
        "LINEAR_API_KEY": "${LINEAR_API_KEY}"
      }
    }
  }
}

The server exposes tools like list_issues, get_issue, and create_comment. The agent calls them as needed during a session, pulling in exactly the data it needs rather than requiring manual copy-paste.

This is where context engineering starts to look less like a prompt strategy and more like a systems integration problem. The quality of an agent’s output depends on the quality of its connected data sources, and designing that integration layer involves the same considerations as any other service dependency: reliability, latency budgets, access control, and data freshness.

The Reproducibility Problem

Context engineering introduces a reproducibility challenge that is easy to underestimate. Two developers on the same codebase, with different CLAUDE.md files, different MCP servers configured, and different conversation histories, will get meaningfully different outputs from identical instructions.

This is the same problem that environment configuration has always had, and the solutions are similar: version-control your instruction files, document your MCP dependencies, treat your context stack as part of the project rather than a personal configuration layer. Teams that are handling this well commit CLAUDE.md to the repository root the same way they commit .eslintrc or pyproject.toml. The context layer is part of how the codebase gets modified, so it belongs in version control and deserves the same review process as other configuration.

The analogy holds further than most people initially expect. Just as linting rules and build configuration deserve deliberate discussion when they change, so do the instructions you’re giving your coding agent. When a rule in CLAUDE.md is ambiguous or wrong, the cost compounds across every session every developer runs. A five-minute review of a CLAUDE.md change has returns that are hard to see in any single session but become obvious over weeks of team use.

Where the Ceiling Is

The Fowler article from February frames context engineering as a capability in rapid expansion, which tracks. Claude Code alone has shipped memory features, improved compaction, deeper tool composability, and a growing MCP ecosystem in a short span. Cursor, Continue, and Aider are each pushing on different parts of the same problem.

The ceiling isn’t context window size, at least not primarily. It’s the quality of what you put in the window. A 200k token context full of irrelevant file contents is worse than a 10k token context of precisely the right information. The tools are getting better at the retrieval and compression sides of that problem. The human side, deciding what the agent needs to know about how your team works, what your conventions are, and what external systems matter, remains work that doesn’t automate cleanly. That’s where the investment pays off most clearly right now.