Context Engineering: The New Discipline Hiding Inside Your Coding Agent
Source: martinfowler
Back in February 2026, Martin Fowler’s team published a piece cataloguing the explosion of options developers now have for configuring a coding agent’s context. Reading it felt like a taxonomy of things I’d been doing haphazardly for months, finally organized into a coherent picture. It’s worth taking that taxonomy and going deeper on the mechanics, because the underlying ideas are reshaping how we think about working with LLMs on real codebases.
What “Context Engineering” Actually Means
Prompt engineering was always a slightly awkward term. It implied that the work was about crafting clever inputs, finding magic phrases, coaxing a model with the right incantation. Context engineering is a more honest framing. The model’s behavior at inference time is entirely determined by what’s in its context window. Everything else, including fine-tuning and RLHF, happened before you called the API. So the entire discipline of “making models do what you want” at runtime is fundamentally about information architecture: what information you put in the context window, in what form, and at what point in the conversation.
For a coding agent working on a non-trivial codebase, this gets complicated fast. A repository might have hundreds of thousands of lines of code, decades of architectural decisions encoded in file structure, team conventions documented somewhere in a wiki, and a current task that requires understanding a slice of all of it. No context window holds all of that, even the generous 200K tokens that Claude offers. So every coding agent is making constant decisions about what to include, what to omit, and how to retrieve what’s missing.
The Layers of Context
It helps to think about this in layers, from most persistent to most ephemeral.
Static project memory is the foundation. Claude Code introduced the CLAUDE.md file as a first-class primitive: a markdown file at the root of your project (and optionally in subdirectories) that the agent reads at the start of every session. Cursor has .cursorrules. GitHub Copilot has .github/copilot-instructions.md. Aider has .aider.conf.yml combined with conventions it learns from your codebase. These files are the place to put information that is always relevant and expensive to re-derive: the build system, coding conventions, the meaning of certain directory structures, things the agent should never do (modify generated files, skip tests, etc.).
The design of these files matters more than people initially realize. A common mistake is writing them like documentation, with everything that might ever be relevant. A better approach is to write them like a senior engineer’s onboarding notes to a peer: what would someone need to know in the first five minutes to avoid making a bad decision? What conventions are not obvious from the code itself?
```markdown
# CLAUDE.md

## Build
Use `just build`, not `cargo build` directly. The justfile handles feature flags.

## Testing
Integration tests in `tests/` require a running Postgres instance. Use `just test-unit` for
fast feedback. Never mock the database layer; we've been burned by divergence before.

## Generated files
Anything under `src/generated/` is produced by `just codegen`. Do not edit these manually.
```
That’s more useful than three pages of architecture documentation.
Dynamic retrieval is the second layer, and it’s where things get interesting. When you ask an agent to implement a feature or fix a bug, it needs more than the project conventions. It needs the specific files that are relevant. Early coding assistants handled this by including entire files, which burned context quickly and often included irrelevant code. Modern agents use a mix of strategies: static analysis to find import graphs, semantic search over embeddings of the codebase, and explicit tool calls to read files on demand.
Claude Code’s approach here is to give the agent tools (file read, directory listing, grep, shell execution) and let it retrieve what it needs. This means the context at any point in a conversation is the result of a series of tool calls the agent made to gather information. The agent is, in effect, engineering its own context dynamically. You can watch this happen in verbose mode, and it’s instructive: a good agent will read the file it needs, then read the files that file imports, then check the test file, building up a picture incrementally rather than dumping everything at once.
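That incremental loop can be sketched in a few lines of Python. This is illustrative only: in a real agent the model drives each tool call, and the naive import-following below is a crude stand-in for static analysis or semantic search.

```python
import re
from pathlib import Path


def gather_context(entry: str, root: Path, max_files: int = 5) -> dict[str, str]:
    """Toy version of incremental retrieval: start from one file and
    follow its local imports, the way an agent chains file-read tool
    calls to build up a picture instead of dumping everything at once."""
    seen: dict[str, str] = {}
    queue = [entry]
    while queue and len(seen) < max_files:
        name = queue.pop(0)
        path = root / name
        if name in seen or not path.exists():
            continue
        text = path.read_text()
        seen[name] = text
        # Follow `import x` / `from x import ...` lines to sibling modules.
        for mod in re.findall(r"^(?:from|import)\s+(\w+)", text, re.M):
            queue.append(f"{mod}.py")
    return seen
```

The `max_files` cap is the crude analogue of a token budget: retrieval stops before the context fills up, and anything beyond the cap has to be fetched on demand later.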
Tool results as context is the third layer, and it’s the one that MCP (the Model Context Protocol) was designed to systematize. MCP defines a standard interface for servers that provide tools and resources to LLM clients. Instead of each coding assistant building its own integrations for GitHub, Jira, database schemas, documentation sites, and everything else a developer might need, MCP lets you run a server that exposes these as tools, and any MCP-compatible client can use them.
For context engineering, this matters because it expands what counts as relevant context. When an agent is fixing a bug, the relevant context might include the GitHub issue that describes it, the PR that introduced it, the CI run that caught it, and the database schema that the buggy code operates on. With MCP servers for GitHub, CI systems, and databases, all of that becomes retrievable within the same tool-call framework the agent uses to read files.
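The pattern is easy to sketch without the protocol machinery. The toy registry below mimics the shape MCP standardizes — tools discovered by name, invoked with structured JSON arguments, returning serialized results — while the real protocol runs over JSON-RPC on a transport. The `get_issue` tool is a hypothetical stand-in for what a GitHub MCP server might expose.

```python
import json
from typing import Callable

# A registry mapping tool names to handlers. The agent never calls these
# functions directly; it emits a structured request and sees only the
# serialized result in its context.
TOOLS: dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Decorator registering a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_issue(number: int) -> dict:
    # Hypothetical stand-in for a GitHub MCP server's issue lookup.
    return {"number": number, "title": f"Issue #{number}", "state": "open"}


def call_tool(request_json: str) -> str:
    """Dispatch a tool call expressed as JSON and return a JSON result."""
    req = json.loads(request_json)
    result = TOOLS[req["name"]](**req["arguments"])
    return json.dumps(result)
```

Because every tool result comes back as serialized text, the GitHub issue, the CI log, and the database schema all enter the context through the same uniform channel the agent already uses for file reads.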
The Budget Problem
More context options create a new problem: budget management. A 200K token context window sounds enormous until you fill half of it with file contents, a quarter with conversation history, and find you have 50K tokens left for the actual task. Agents that don’t manage this carefully end up truncating conversation history, dropping earlier tool results, or hitting limits at critical moments.
The research on this is humbling. Studies on long-context LLM performance have consistently found that models struggle with information placed in the middle of a long context, performing better when relevant information is near the start or end. This “lost in the middle” phenomenon means that naively stuffing context with everything you can find may actually degrade performance compared to a more curated selection.
Practical mitigations include:
- Summarizing earlier parts of a conversation when they become less relevant
- Using structured tool calls to fetch information on demand rather than pre-loading it
- Keeping CLAUDE.md files concise, prioritizing information that can’t be retrieved dynamically
- Breaking long tasks into subtasks with fresh context windows rather than maintaining one very long conversation
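The budgeting idea behind these mitigations can be sketched simply. This assumes a crude characters-divided-by-four token estimate (real agents use the model’s tokenizer) and a hand-assigned priority per section; both are simplifications.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token of English prose.
    return len(text) // 4


def pack_context(sections: list[tuple[int, str]], budget: int) -> list[str]:
    """Greedily keep the highest-priority sections that fit the budget.
    Each section is (priority, text); lower = more important, e.g.
    0 = project rules, 1 = the current task, 2 = older history."""
    by_priority = sorted(range(len(sections)), key=lambda i: sections[i][0])
    kept: set[int] = set()
    used = 0
    for i in by_priority:
        cost = approx_tokens(sections[i][1])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Emit survivors in their original order so the prompt stays readable.
    return [sections[i][1] for i in sorted(kept)]
```

Under pressure, the low-priority middle of the conversation gets dropped first, while the project rules and the current task survive at the edges of the window, which is exactly where the “lost in the middle” research says the model attends best.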
Claude Code has started adding explicit context management features: compacting conversation history, flagging when you’re approaching context limits, and suggesting when to start a new session. This is context engineering becoming a first-class UI concern, not just an implementation detail.
The Comparison Across Tools
Different coding agents make different bets on how to solve the context problem.
Aider takes a file-explicit approach. You tell it which files are in context with `/add` and `/drop` commands. This gives you precise control and predictable token usage, but requires you to do the information architecture work yourself. Experienced Aider users develop good intuitions about what to add, but it’s friction that Claude Code and Cursor try to remove.
Cursor embeds a codebase index and uses semantic search to pull in relevant snippets automatically. The .cursorrules file provides the static layer. Cursor’s “Composer” mode maintains a longer multi-step context similar to Claude Code’s approach. The semantic search is genuinely useful for large codebases where you don’t know which files are relevant, though it can occasionally retrieve plausible-looking but wrong context.
GitHub Copilot has historically been more conservative with context, focusing on the immediate file and open editor tabs. The addition of copilot-instructions.md and agent-mode features with workspace context has pushed it toward richer context, though the integration with VS Code’s extension model means context management works differently than in a terminal-first tool like Claude Code.
Claude Code’s bet is on agentic retrieval: give the model good tools and a capable enough model to use them, and it will retrieve the right context on its own. This works surprisingly well in practice, especially for tasks where the relevant context is hard to predict in advance.
What Developers Actually Need to Invest In
Treating context engineering as a first-class concern means a few concrete things.
Write your CLAUDE.md (or equivalent) like code, not documentation. Keep it in version control, review it periodically, and delete things that are no longer true. A stale instruction that tells the agent to use a library you’ve since removed is worse than no instruction at all.
Be intentional about what you expose via MCP. Running an MCP server that gives the agent read access to your database schema is powerful. Giving it write access to production systems in an agentic loop is something to think very carefully about before configuring. The context engineering surface is also an attack surface; prompt injection via tool results is a real concern when agents are reading content from external systems.
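One common mitigation can be sketched as follows. The delimiter format and the instruction-pattern list are illustrative assumptions, and no regex filter is a complete defense; the point is that untrusted tool output should be explicitly marked as data before it enters the context.

```python
import re

# Illustrative patterns only; a pattern list catches the obvious cases
# and is not a complete defense against prompt injection.
SUSPICIOUS = re.compile(r"ignore (all |previous )*instructions|you are now", re.I)


def wrap_tool_result(source: str, content: str) -> str:
    """Wrap untrusted external content in explicit delimiters before it
    enters the agent's context, flagging instruction-like phrasing so a
    human (or a policy layer) can review it."""
    flag = " [FLAGGED: instruction-like content]" if SUSPICIOUS.search(content) else ""
    return (f"<tool_result source={source!r} untrusted=true{flag}>\n"
            f"{content}\n"
            f"</tool_result>")
```

Pairing this kind of wrapping with least-privilege tool configuration (read-only credentials, no production write access in agentic loops) addresses both halves of the problem: what the agent can see, and what it can do with what it sees.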
Learn your tool’s context window behavior. Know how your agent truncates history, which information it prioritizes, and when to start a fresh session versus continuing a long one. This is the kind of operational knowledge that separates productive agent use from frustrating conversations where the agent seems to have forgotten what you told it three messages ago.
The Fowler article frames this as context engineering “becoming a necessity”, and that’s accurate. The raw capability of the models has reached a point where the bottleneck is often not what the model can do, but whether it has the right information to do it. That’s an information architecture problem, and solving it well is increasingly part of what it means to use these tools seriously.