Back in February, Martin Fowler’s team published a piece noting that the options available to configure and enrich a coding agent’s context had “exploded,” with Claude Code leading and others following quickly. That observation is accurate, but it frames context engineering purely as a space problem: what you put in the window, and how much of it fits.
For coding agents doing multi-step autonomous work, that framing is incomplete. Context also has a time dimension. Different pieces of information have different lifetimes, different freshness requirements, and different behaviors as a session progresses. Ignoring that dimension produces setups that work fine for short interactions and break down for anything longer.
Why Agents Are Different From Assistants
An interactive coding assistant responds to a prompt with the full conversation history in context. The session is short, the scope is narrow, and the context window is mostly stable. A coding agent running an autonomous task over twenty or thirty steps is doing something categorically different. It reads files, executes commands, calls tools, accumulates observations, and makes decisions based on what it found. The context at step 25 is a product of prior actions, not just initial configuration.
This matters because language models do not treat all positions in a long context equally. The “lost in the middle” paper from Stanford and UC Berkeley documented measurably worse recall for information placed in the middle of a long context compared to the beginning or end. The effect holds for current models, including those with 200K-token context windows. For a static prompt it is a design consideration. For a growing agent context it is a dynamic problem: information that was near the beginning at step 5 may be in the middle by step 25, and the agent’s effective access to it degrades accordingly.
Three Lifetimes
A useful mental model is to classify context by lifetime rather than by content type.
Persistent context is information that needs to be available throughout the session and across sessions. Project conventions, architectural constraints, build commands, prohibited patterns. This is what lives in CLAUDE.md for Claude Code, .cursor/rules/ for Cursor, and .github/copilot-instructions.md for Copilot. These files are injected at session start and typically survive context compaction, which is what Claude Code does when the conversation approaches its token limit: it summarizes the accumulated exchange but re-injects the static instruction files into the compressed context. The implication is that constraints stated in CLAUDE.md remain enforced through compaction in a way that constraints established mid-conversation do not.
Ephemeral context is information that is needed right now, for the current tool call or reasoning step, and is not useful to carry forward. The current contents of a file being edited. The output of a shell command. A database query result that established a fact the agent needed to confirm. This context should be fetched on demand and allowed to fade. Loading it up front or holding it unnecessarily burns token budget and increases the amount of irrelevant material the model navigates around.
This is the core reason Model Context Protocol matters more for agents than for interactive assistants. MCP defines a standard interface for connecting agents to external systems via JSON-RPC 2.0 over stdio or HTTP transports. An agent with a Postgres MCP server does not need the database schema pre-loaded in the system prompt. It queries the schema when relevant, gets a fresh, accurate result, uses it, and moves on. The context is valid at the moment of use. A static version of the same schema loaded at session start would be stale the moment anyone runs a migration, and the agent would not know.
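In wire terms, a tool invocation is an ordinary JSON-RPC 2.0 request using MCP’s `tools/call` method. A minimal sketch in Python; the tool name `query` and its SQL argument are hypothetical here, since a real Postgres MCP server defines its own tool names and argument schema:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request envelope for an MCP tools/call invocation."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The agent asks for the schema only at the moment it needs it,
# so the answer reflects the database as it exists right now.
request = mcp_tool_call(1, "query", {
    "sql": "SELECT column_name FROM information_schema.columns "
           "WHERE table_name = 'users'",
})
```

The point is not the envelope itself but when it is sent: at the moment of use, not at session start.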
The MCP server ecosystem that grew through 2025 covers GitHub, the Linear issue tracker, filesystem access, browser automation, CI systems, and dozens of domain-specific integrations. From a context lifetime perspective, these are all mechanisms for replacing pre-loaded stale context with on-demand fresh context. That is a meaningful architectural shift, not just a feature expansion.
Transient context is the most overlooked category. It is the intermediate reasoning, the observations accumulated during a task, the half-formed conclusions the agent reaches between tool calls. Unlike persistent context (which outlasts sessions) and ephemeral context (which is valid for one step), transient context is valid for this task but not necessarily useful to preserve afterward.
The compaction problem hits transient context hardest. When Claude Code compacts a long conversation, it summarizes. What survives is what the summarizer determines is important: conclusions reached, files modified, decisions made. What gets dropped is the reasoning path that led there, the dead ends, the observations that turned out to be irrelevant. Usually this is fine; you want the conclusions, not every intermediate step. But it means an agent relying on transient context established twenty steps ago may lose access to it at compaction time, without any explicit signal that this happened.
Practical Implications
Thinking in lifetimes changes what you put where.
For persistent context, the goal is density and accuracy. Every line in CLAUDE.md competes for attention throughout the session. Stale or irrelevant content is not neutral; it adds noise the model must filter. A constraint that was accurate last quarter but has since been superseded will actively misdirect the agent. Persistent context files need to be treated as live infrastructure, reviewed when architectural decisions change, not as documentation that gets written once.
```markdown
# Good: specific, non-inferrable, likely to persist
Do not use the `pg` package directly. All database access goes through
`/packages/db`. This prevents connection pool fragmentation; we found this
out the hard way in the January incident.

# Weak: inferrable from reading the code, adds noise
Use TypeScript for all new files.
```
For ephemeral context, the goal is precision. An MCP tool that returns five thousand tokens of raw JSON when the agent needed three fields wastes token budget and increases compaction pressure on everything else. Designing MCP server responses to be concise and structured is context engineering at the tool layer. A well-designed tool returns exactly what the agent needs in a form the model can use efficiently.
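As a sketch of that tool-layer discipline, a server can project a verbose query result down to just the fields the agent asked for before returning it. The field names below are illustrative:

```python
def project_fields(rows: list[dict], fields: tuple[str, ...]) -> list[dict]:
    """Return only the requested fields from each row, dropping the rest."""
    return [{k: row[k] for k in fields if k in row} for row in rows]

# A raw result carries far more than the agent needs for this step.
raw = [
    {"id": 1, "email": "a@example.com", "created_at": "2025-01-02",
     "metadata": {"ip": "10.0.0.1", "ua": "Mozilla/5.0"},
     "audit": ["login", "update", "login"]},
]

lean = project_fields(raw, ("id", "email"))
# lean == [{"id": 1, "email": "a@example.com"}]
```

Every field the tool withholds is token budget the model never has to navigate around, and compaction pressure it never creates.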
For transient context, the goal is knowing when to promote it. If an agent discovers something important during a task, something that will matter in future sessions, there needs to be a mechanism to write it down before the session ends. Claude Code supports a persistent memory directory (~/.claude/memory/) for exactly this purpose. The equivalent in other tools is often a custom MCP server that writes to a local file or database. The pattern matters more than the implementation: important observations need to move from transient to persistent before they disappear.
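A minimal sketch of that promotion step, assuming a memory directory of dated markdown notes. The file layout here is a hypothetical convention, not a documented Claude Code format; the pattern, not the implementation, is what carries over to other tools:

```python
from datetime import date
from pathlib import Path

def promote(memory_dir: Path, topic: str, observation: str) -> Path:
    """Append a dated observation to a per-topic note so it survives the session."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    note = memory_dir / f"{topic}.md"
    with note.open("a", encoding="utf-8") as f:
        f.write(f"- {date.today().isoformat()}: {observation}\n")
    return note
```

An agent (or a post-task hook) would call this with something like `promote(Path.home() / ".claude" / "memory", "db-conventions", "connection pool fragments under load")` before the observation is compacted away.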
The Compaction Window
One concrete implication of all this that rarely gets discussed: if you are running a long agent task, the information you need the agent to consistently respect belongs in the static instruction file, not in early conversation turns. An explicit reminder in conversation about a constraint the agent should follow works at step 3. By step 40, after compaction, that reminder may be summarized away. The same constraint in CLAUDE.md is re-injected after compaction.
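A deliberately simplified model makes the asymmetry concrete. This is a toy, not Claude Code’s actual implementation: static instruction files pass through compaction verbatim, while conversation turns pass through a summarizer that may or may not keep any given detail:

```python
from typing import Callable

def compact(static_files: list[str], turns: list[str],
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Toy compaction: re-inject static files verbatim, summarize the rest."""
    return static_files + [summarize(turns)]

claude_md = "All database access goes through /packages/db."
history = [
    "user: fix the login bug",
    "agent: reading auth module...",
    "user: remember, never call pg directly",  # mid-conversation reminder
]

after = compact([claude_md], history, lambda ts: f"summary of {len(ts)} turns")
# The CLAUDE.md constraint survives verbatim; the in-conversation reminder
# survives only if the summarizer happened to keep it.
```

The two inputs have structurally different fates, which is the whole argument for putting durable constraints in the static file.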
This is not speculation about edge cases. It is the predictable consequence of how context compaction works. The session-level instruction and the project-level configuration file are not equivalent. They have different lifetimes, and the lifetime of the project-level file is longer.
Where the Discipline Is Going
The Fowler article’s observation that context engineering is becoming a competitive differentiator reflects real movement in the tooling. Claude Code’s explicit context compaction UI, Cursor’s project-scoped rules in .cursor/rules/, Continue.dev’s typed context providers, and the growing MCP server ecosystem all represent tools built with context lifetime as a first-class concern rather than an afterthought.
What lags behind is developer mental models. Most guidance on context setup focuses on what to write, not when that content will be consumed or how long it will remain effective. The teams getting the most out of autonomous coding agents are the ones who have stopped treating context as static configuration and started treating it as infrastructure with states, transitions, and maintenance requirements. The tools are ready for that level of engagement. The practices are catching up.