Context Engineering Is the Discipline Coding Agents Actually Needed

Martin Fowler’s site published a piece back in February 2026 on context engineering for coding agents, and a few weeks later it still reads as one of the cleaner framings of what has quietly become the most important engineering problem in this space. The terminology shift from “prompt engineering” to “context engineering” is not marketing. It reflects a genuine change in what practitioners are actually doing.

Prompt engineering, as most people practiced it, was about finding the right words to elicit a desired response from a model. It was largely a single-turn problem. Context engineering is different in kind. It is about designing the entire informational environment an agent operates in across a multi-step task, where each step potentially transforms what context is available and relevant.

For coding agents specifically, the challenge is acute. A codebase is a large, highly structured, deeply interdependent artifact. The relevant context for any given task might span a type definition three directories away, a migration file, a README explaining an architectural decision, and a configuration value buried in an environment variable. No model can see all of that simultaneously, so the system around the model has to decide what to surface, when, and in what form.

What lives in the context window

Think of the context window as a working scratchpad. Everything the model can reason about has to fit in it. For coding agents, this scratchpad gets filled from several distinct sources: the conversation history, tool call results (file reads, shell output, search results), injected system content such as project conventions, and any memory carried over from prior sessions.

The window is finite, and what fills it is decided by the system around the model rather than by the model alone. What the agent “knows” at any given moment is exactly what made it into that window. This creates a design constraint that every agent framework has to solve: how do you get the right information into the window without flooding it with noise that dilutes the signal?

This is where context engineering diverges from prompt engineering. You cannot solve this by rewording things. You need architectural decisions: which files to index, which tool schemas to expose, what to summarize versus preserve verbatim, and when to compact conversation history.
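To make the budget framing concrete, here is a minimal sketch of assembling a window from competing sources. It is not how any particular agent does it: the `ContextItem` type, the priority numbers, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


@dataclass
class ContextItem:
    source: str    # e.g. "conventions", "tool_result", "history"
    text: str
    priority: int  # lower number = more important (hypothetical scale)


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the highest-priority items that still fit in the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen
```

A greedy pass like this is the simplest possible policy. Real systems would likely summarize an item that does not fit rather than drop it outright, but even this version shows that something has to lose when the budget is tight.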

The CLAUDE.md hierarchy

Claude Code’s approach to this problem is file-based, hierarchical, and composable. At the top level, a ~/.claude/CLAUDE.md file holds global conventions that apply to every project. At the project level, a CLAUDE.md in the repository root holds project-specific context: the tech stack, coding conventions, test commands, deployment notes, anything the agent should carry as background knowledge. Subdirectory CLAUDE.md files let you add context that is scoped to specific parts of the codebase.

The /init command generates a starter CLAUDE.md by analyzing the repository structure, reading package manifests, and inspecting the existing code. It produces something like this:

# Project Context

## Tech Stack
- Node.js 22, TypeScript 5.4
- PostgreSQL 16 via Drizzle ORM
- Vitest for testing

## Commands
- `npm run dev` - start development server
- `npm test` - run test suite
- `npm run db:migrate` - apply pending migrations

## Conventions
- All database queries go through the repository layer in `src/repos/`
- Error handling uses Result types from `neverthrow`
- No raw SQL outside of migrations

This file is loaded into the agent’s context at the start of every session. The agent does not have to discover these conventions through tool calls; they are present from the first token. That matters because tool calls consume context and take time. Front-loading stable, slowly-changing knowledge is cheaper than re-deriving it each session.

The hierarchy also allows for different abstraction levels. A monorepo might have a root CLAUDE.md with global conventions, service-level CLAUDE.md files with service-specific notes, and component-level files for particularly complex subsystems. Each layer composes with the others: files at or above the working directory are loaded at startup, and subdirectory files are pulled in when the agent works in that part of the tree, so the more specific guidance arrives where it applies.
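The layering idea can be sketched in a few lines. This is an illustration of hierarchical composition in general, not Claude Code’s actual loader; the function names and the comment-style header between files are assumptions.

```python
from pathlib import Path


def collect_context_files(repo_root: Path, working_dir: Path,
                          filename: str = "CLAUDE.md") -> list[Path]:
    """Return context files from repo_root down to working_dir, most general first."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside repo_root.
    relative = working_dir.relative_to(repo_root)

    found: list[Path] = []
    current = repo_root
    # Check the root itself, then each directory on the way down.
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
    return found


def compose_context(files: list[Path]) -> str:
    # Label each chunk with its origin so the source of a rule stays visible.
    return "\n\n".join(f"<!-- from {f} -->\n{f.read_text()}" for f in files)
```

Ordering general before specific means the narrower file gets the last word when two layers disagree, which is usually what a team wants from a service-level override.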

The rest of the ecosystem converged on the same pattern

Claude Code is not alone here. Cursor moved from .cursorrules to a .cursor/rules/ directory structure, which allows multiple rule files scoped by glob pattern. Cline uses .clinerules. GitHub Copilot introduced .github/copilot-instructions.md. Aider reads a conventions file such as CONVENTIONS.md, which can be wired in through .aider.conf.yml.

Most of the major coding assistants have arrived at file-based, repository-committed context configuration. This convergence is meaningful. It suggests the pattern is solving a real constraint, not just reflecting one vendor’s design taste.

The practical implication is that context configuration is becoming a first-class repository artifact. It lives in version control, gets reviewed in pull requests, and evolves with the codebase. A new team member cloning the repository gets not just the code but the agent configuration that makes the agent useful on that code. That is a non-trivial shift in how teams think about onboarding and knowledge transfer.
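Glob-scoped rules, as in the .cursor/rules/ layout mentioned above, are easy to picture as a lookup from path patterns to guidance. The sketch below is not Cursor’s implementation; the `RULES` table, its patterns, and its wording are hypothetical, loosely borrowed from the sample conventions earlier in this piece.

```python
from fnmatch import fnmatch

# Hypothetical rule table: glob pattern -> guidance text.
RULES = {
    "src/repos/*.ts": "All queries go through the repository layer.",
    "migrations/*.sql": "Raw SQL is allowed here and nowhere else.",
    "*.test.ts": "Use Vitest for tests.",
}


def rules_for(path: str, rules: dict[str, str] = RULES) -> list[str]:
    """Return the guidance whose glob matches the given repo-relative path.

    Note: fnmatch's `*` also matches path separators, which is looser than
    gitignore-style globs. Good enough for a sketch.
    """
    return [text for pattern, text in rules.items() if fnmatch(path, pattern)]
```

The payoff of scoping is that a file under migrations/ never pays the context cost of rules about test conventions, and vice versa.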

Dynamic context: tools as context generators

Static files handle what you know in advance. The harder problem is the context that only becomes relevant as a task unfolds: which files the agent needs to read, which tests are failing, what the current database schema looks like.

This is where tool design matters enormously. Each tool call the agent makes is both an action and a context injection. When the agent calls a read_file tool, the returned content enters the context window. When it runs npm test, the output enters the window. The tool schema, the output format, and the verbosity of results all shape what the model has to reason with.

Poor tool design produces context bloat. A search_codebase tool that returns 50 matching lines when 5 would do is not neutral; it is consuming context budget that could hold more relevant information. This is why some agent implementations have moved toward summarizing tool results before injecting them, or returning structured metadata instead of raw content where possible.

The tradeoff is fidelity versus efficiency. Summaries are cheaper but lossy. Raw content is faithful but expensive. Where you sit on that tradeoff depends on the task: exploratory navigation benefits from summaries, while precise edits require the verbatim source.
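One way to picture that tradeoff is a search tool that caps its output and offers two levels of detail. This is a sketch under assumptions: the `Match` type, the default cap of five, and the wording of the truncation notice are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    path: str
    line_no: int
    text: str


def format_search_results(matches: list[Match], max_matches: int = 5,
                          verbatim: bool = False) -> str:
    """Render search hits compactly and say how many were left out."""
    shown = matches[:max_matches]
    if verbatim:
        # Faithful but expensive: include the matching source line.
        lines = [f"{m.path}:{m.line_no}: {m.text}" for m in shown]
    else:
        # Cheap but lossy: locations only, enough for navigation.
        lines = [f"{m.path}:{m.line_no}" for m in shown]
    omitted = len(matches) - len(shown)
    if omitted > 0:
        lines.append(f"... {omitted} more matches omitted; narrow the query to see them")
    return "\n".join(lines)
```

Telling the model that results were truncated matters as much as the truncation itself. Without that last line, the agent has no way to know the list is incomplete.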

Memory across sessions

Context engineering gets more complicated when you need continuity across sessions. Within a session, the conversation history is the memory. Across sessions, you need something external.

Within a single session, Claude Code stretches that memory with a compaction mechanism: when a conversation grows long, the system can summarize the prior turns into a compressed representation and replace the full history with that summary. This buys more room in the window for new tool calls without discarding all prior context. The summary is lossy, but for many tasks the high-level arc of what was decided and why matters more than the exact sequence of tool calls.
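The shape of compaction is simple even if the summarization itself is not. The sketch below is not Claude Code’s mechanism: the `summarize` callback stands in for a model call, and the budget, the `keep_recent` parameter, and the token estimate are assumptions.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def compact_history(turns: list[str], budget: int, keep_recent: int,
                    summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace older turns with a single summary once the history exceeds the budget."""
    total = sum(estimate_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns  # nothing to compact
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    # The most recent turns stay verbatim; everything before them is collapsed.
    summary = f"[summary of {len(older)} earlier turns] {summarize(older)}"
    return [summary] + recent
```

Keeping the last few turns verbatim is a common-sense choice here, since the agent’s next step usually depends most on what just happened.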

For longer-lived agent tasks, some frameworks have moved toward explicit memory stores, where the agent writes structured notes to a file or database at the end of each session. These notes become part of the injected context in subsequent sessions. This is essentially the CLAUDE.md pattern applied to episodic memory: persistent, file-based, injected at startup.
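A file-backed note store of that kind might look like the following. Treat it as a sketch: the JSON layout, the function names, and the choice to inject only the last few sessions are assumptions, not any framework’s actual format.

```python
import json
from pathlib import Path


def save_session_notes(store: Path, session_id: str, notes: list[str]) -> None:
    """Append this session's notes to a JSON file, creating it if needed."""
    data = json.loads(store.read_text()) if store.exists() else []
    data.append({"session": session_id, "notes": notes})
    store.write_text(json.dumps(data, indent=2))


def load_memory_preamble(store: Path, last_n: int = 3) -> str:
    """Render the most recent sessions' notes as text to inject at startup."""
    if not store.exists():
        return ""
    data = json.loads(store.read_text())
    lines: list[str] = []
    for entry in data[-last_n:]:
        lines.append(f"Session {entry['session']}:")
        lines.extend(f"- {note}" for note in entry["notes"])
    return "\n".join(lines)
```

Capping the preamble at the last few sessions keeps episodic memory from slowly eating the window, which is the same budget problem in a different place.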

What this means for teams

The shift toward context engineering changes what it means to work well with a coding agent. Getting good results is no longer just about writing clear task descriptions. It involves maintaining the CLAUDE.md files, auditing what context the agent actually has when things go wrong, and designing tool integrations that produce clean output.

Teams that treat context files as an afterthought are leaving significant capability on the table. An agent running against a well-maintained CLAUDE.md with accurate stack documentation, up-to-date command references, and explicit architectural constraints produces meaningfully different results than one running blind.

The tooling for introspecting agent context is still immature. Most agents will show you what they’re doing but not give you a clean view of what is currently in the window, in what proportion, and which parts are actually influencing outputs. That observability gap makes it hard to iterate systematically on context quality. It is the next problem worth solving, and whoever builds the right debugger for agent context windows will have built something genuinely useful.
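Even a crude breakdown would be a start. Assuming you can log each segment that enters the window along with a source label, a proportion report is a few lines. The segment format and the token estimate below are assumptions, and this only answers the “in what proportion” question, not which parts actually influence outputs.

```python
from collections import Counter


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def window_breakdown(segments: list[tuple[str, str]]) -> dict[str, float]:
    """Return each source's share of the estimated token total.

    `segments` is a list of (source_label, text) pairs, e.g.
    ("tool_result", "...file contents...").
    """
    counts: Counter[str] = Counter()
    for source, text in segments:
        counts[source] += estimate_tokens(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}
```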

For now, the discipline is mostly manual: keep the config files accurate, watch what the agent reads before it acts, and treat context budget as a resource to be managed rather than a problem to ignore.