
Context Windows Are Budgets: The Architecture Behind Modern Coding Agents

Source: martinfowler

From Prompt Crafting to Context Architecture

Around mid-2025, Andrej Karpathy described a shift in how practitioners should think about working with language models. His framing was concise: “context engineering is the delicate art and science of filling the context window with just the right information for the task at hand.” The phrase resonated because it named something people were already doing but calling by the wrong name. Prompt engineering implies wordsmithing a clever string. Context engineering implies designing an information architecture.

Martin Fowler’s February 2026 article on context engineering for coding agents maps out how this architecture has developed in the past year. It is worth reading as a snapshot of where tooling has landed. What follows is a closer look at the technical mechanisms underneath that snapshot, and why each design decision reveals something about the fundamental constraints of the problem.

The Hard Constraint

A coding agent running on a model with a 200k-token context window has, at most, roughly 150,000 tokens of usable working space after accounting for the system prompt, tool definitions, and output reservation. A medium-sized codebase measured in megabytes of source text will not fit. Every piece of information that reaches the model has displaced something else. The core of context engineering is resource allocation under a fixed budget.
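The budget arithmetic is simple enough to sketch. The reservation sizes below are illustrative assumptions, not any tool's actual defaults:

```python
# Hypothetical budget split for a 200k-token context window. The category
# names and reservation sizes are illustrative, not any tool's real defaults.
CONTEXT_WINDOW = 200_000

reserved = {
    "system_prompt": 5_000,        # base instructions plus CLAUDE.md content
    "tool_definitions": 10_000,    # JSON schemas for every registered tool
    "output_reservation": 35_000,  # space reserved for the model's response
}

usable = CONTEXT_WINDOW - sum(reserved.values())
print(f"usable working space: {usable:,} tokens")  # 150,000 tokens

# Every retrieved file spends from the same budget.
def can_afford(file_tokens: int, spent: int, budget: int = usable) -> bool:
    return spent + file_tokens <= budget
```

Whatever the exact numbers, the structure of the problem is the same: every file read, tool result, and conversational turn draws down the same fixed balance.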

The approaches that have emerged fall into three categories: static injection, semi-static indexing, and dynamic retrieval. These are not competing strategies; they operate at different timescales and handle different scopes of relevance.

Static Injection: The Instruction Files

The simplest mechanism is a project-level markdown file injected into the system prompt unconditionally on every run. Claude Code uses CLAUDE.md, Cursor uses .cursorrules, and GitHub Copilot reads .github/copilot-instructions.md. All three serve the same purpose: give the agent persistent knowledge about the project that survives conversation resets.

A CLAUDE.md file might look like this:

```markdown
# Project conventions
- Use `pnpm` not `npm`
- Tests live in `src/__tests__`, run with `pnpm test`
- This project targets Node 20; avoid APIs introduced after Node 20
- The `db/` directory uses Drizzle ORM; do not write raw SQL strings

@docs/architecture.md
```

The @ import syntax in Claude Code allows pulling in additional files by reference, which matters for larger projects where conventions span multiple documents. The Anthropic documentation covers this in detail. Claude Code also reads ~/.claude/CLAUDE.md for user-global preferences, giving the system two levels of scope.

The trade-off is predictable. What you put in CLAUDE.md is always present but always consuming tokens. A file listing fifty outdated conventions costs the same as one listing five essential ones. Maintaining these files well (trimming obsolete entries, keeping them accurate as the codebase evolves) is a meaningful part of what context engineering means in practice. The discipline is less about authoring and more about curation.

Semi-Static Indexing: The Repo Map

Aider takes a different approach. Rather than requiring the developer to author a persistent context file manually, it generates a “repo map” automatically, using tree-sitter to extract symbols, function signatures, and class hierarchies from the entire codebase (earlier versions relied on ctags). This condensed structural representation, typically a few thousand tokens, gets injected into every conversation.

The repo map gives the model a navigable overview without loading full file contents. A function that is not in the map will not be in context, but its name, signature, and location will be, which is often enough for the agent to decide whether to fetch the full file. This is closer to a pre-computed index than a context file.
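The idea can be sketched in a few lines. Aider builds its map with tree-sitter across many languages; the illustration below uses Python's standard-library `ast` module and handles only Python source, but the shape of the output is similar: names, signatures, and locations without bodies.

```python
# A minimal repo-map sketch: names, signatures, and line numbers, no bodies.
import ast

def map_source(source: str, path: str) -> str:
    entries = [f"{path}:"]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append(f"  class {node.name}  # line {node.lineno}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"  def {node.name}({args})  # line {node.lineno}")
    return "\n".join(entries)

print(map_source(
    "class Cart:\n    def add(self, item, qty):\n        pass\n",
    "shop/cart.py",
))
```

A few thousand tokens of output in this shape, covering the whole repository, is enough for the model to decide which files are worth fetching in full.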

The approach solves a real problem: developers should not have to manually curate what the agent knows about their own codebase. The costs are that the index can go stale between regenerations and that it is harder to encode the kind of soft conventions that only exist in collective team knowledge. The “we stopped doing it that way after the incident” knowledge lives better in CLAUDE.md than in a symbol table.

Dynamic Retrieval: Tools and MCP

The third approach gives the agent tools for fetching context on demand. Modern coding agents, including Claude Code, treat file reading as a tool call rather than a preload. The agent calls list_directory, invokes read_file on files it judges relevant, runs grep to locate usages, and builds understanding incrementally as it works.

This is, functionally, agentic retrieval-augmented generation. The model decides what to retrieve based on the task rather than having retrieval happen at a fixed point before the task starts. Irrelevant files never enter context; the agent only pays for what it actually needs. The cost is that the agent can make poor retrieval decisions, miss relevant files, or spend tokens on tool calls that return nothing useful.
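The loop itself is structurally simple. In the sketch below, `model_step` stands in for an LLM call that returns either a tool request or a final answer; the dispatch logic and turn limit are illustrative assumptions, not any agent's actual implementation:

```python
# A stripped-down agentic retrieval loop. `model_step` stands in for an LLM
# call that returns either a tool request or a final answer.
import re
from pathlib import Path

def list_directory(path="."):
    return sorted(p.name for p in Path(path).iterdir())

def read_file(path):
    return Path(path).read_text()

def grep(pattern, path):
    lines = Path(path).read_text().splitlines()
    return [line for line in lines if re.search(pattern, line)]

TOOLS = {"list_directory": list_directory, "read_file": read_file, "grep": grep}

def run_agent(model_step, max_turns=10):
    history = []                          # accumulates into the context window
    for _ in range(max_turns):
        action = model_step(history)      # the model chooses what to retrieve
        if action["tool"] == "final":
            return action["answer"]
        result = TOOLS[action["tool"]](*action["args"])
        history.append((action, result))  # every result costs tokens
    raise RuntimeError("no answer within the turn budget")
```

Every tool result appended to `history` spends from the same token budget, which is why poor retrieval decisions are not just slow but expensive.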

The Model Context Protocol, released by Anthropic in November 2024, standardizes this dynamic retrieval layer. MCP defines a client-server protocol where tools, resources, and prompt templates are exposed by external servers and consumed by agents through a uniform interface. A coding agent connected to an MCP server for a database can request a schema on demand; one connected to a documentation server can retrieve specific pages when needed. The protocol decouples “what the agent can know” from “what was configured at startup.”
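Under the hood, MCP messages are JSON-RPC 2.0. A tool invocation against a hypothetical database server might look like this; the method and parameter names follow the specification, but the `get_schema` tool and its arguments are invented for illustration:

```python
# The wire format of an MCP tool invocation: JSON-RPC 2.0, carried over
# stdio or HTTP. The `get_schema` tool and its arguments are hypothetical.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_schema",              # a tool the database server exposes
        "arguments": {"table": "orders"},  # checked against the tool's schema
    },
}
print(json.dumps(request, indent=2))
```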

Claude Code, Cursor, and several other tools have adopted MCP support. The ecosystem of available servers has grown quickly, which means that a well-configured coding environment can now pull live context from databases, internal wikis, issue trackers, and external APIs, all through the same standardized mechanism.

Context Window Economics

Two structural problems emerge as tasks grow longer.

The first is positional degradation. A 2023 Stanford study by Liu et al. found that language models attend less reliably to information positioned in the middle of a long context compared to information near the beginning or end. For coding agents, this means context ordering matters. Critical project conventions should not be buried in the middle of a 150k-token window behind accumulated tool outputs and file contents. The system prompt position, where CLAUDE.md content lands, is the most reliable real estate in a long context.

The second problem is conversation compaction. As a session runs, tool outputs, file contents, and conversational turns accumulate. Claude Code exposes a /compact command that summarizes the existing conversation and replaces it with the summary, freeing space for continued work. This is lossy compression applied to working memory, and it introduces a failure mode: information that was in context but was not included in the summary is gone, silently. The agent will not know to re-fetch it unless the task makes its absence visible.
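Compaction can be sketched as summarize-and-replace. Here `summarize` stands in for an LLM summarization call; anything it drops is gone until re-fetched, which is exactly the failure mode described above:

```python
# Compaction as summarize-and-replace: the entire transcript is swapped
# for a single summary message. `summarize` stands in for an LLM call.
def compact(history: list[str], summarize) -> list[str]:
    return [f"[compacted] {summarize(history)}"]

def token_count(messages: list[str]) -> int:  # crude whitespace-token proxy
    return sum(len(m.split()) for m in messages)
```

After compaction the window is mostly free again, but only what the summary chose to keep survives.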

Effective context engineering means thinking about both problems together. Information that needs to persist across a long session belongs in a position that survives compaction, either in the system prompt via CLAUDE.md or in a format that summarization reliably preserves. Information that is relevant only transiently can live in the working context window and be safely lost when compacted.

What Tool Authors Are Actually Deciding

The proliferation of context configuration options described in Fowler’s article is not accidental. Each option is a different answer to the same allocation problem: given a fixed window and more information than fits, what gets priority?

Static injection answers: project conventions and standards, unconditionally. Semi-static indexing answers: the structural shape of the codebase, precomputed. Dynamic retrieval answers: whatever this specific task needs, determined at runtime. MCP extends dynamic retrieval to external systems outside the repository entirely.

A well-configured coding environment uses all three layers. CLAUDE.md holds the conventions that every task depends on. A repo map or similar index provides structural orientation. Tool calls handle file-level content retrieval. MCP handles external context like database schemas or internal documentation. Each layer operates at a different timescale of relevance, and each has a different cost profile in tokens.

The shift from “prompt engineering” to “context engineering” captures something real about what has changed. A prompt is composed once for a single interaction. A context architecture is maintained over time, calibrated to the constraints of a specific model working on a specific codebase, and updated as both the model and the codebase evolve. The tooling, still maturing, is only beginning to reflect that distinction clearly.
