Context Position Is Architecture: The Attention Problem Inside Your Coding Agent
Source: martinfowler
Most discussion of context engineering for coding agents focuses on what information to include. The more interesting, and underappreciated, question is where that information ends up in the model’s attention.
Martin Fowler’s February 2026 piece on context engineering for coding agents is a useful taxonomy of the options available: static project files like CLAUDE.md, dynamic tool results via MCP, embedding-based retrieval, conversation compaction. It catalogues the container. What it does not fully address is the physics inside the container, specifically that models do not attend to long contexts uniformly, and the position of information relative to the window boundaries matters in ways that should influence how you engineer your context.
The Research Foundation
In 2023, Liu et al. published “Lost in the Middle: How Language Models Use Long Contexts”, examining how accuracy changes based on where relevant information sits in a long context. The experimental setup is straightforward: given a multi-document context where one document contains the answer to a question, how does performance vary when the relevant document is placed first, last, or somewhere in the middle?
The result is a U-shaped performance curve with a trough toward the center. When the relevant document appears first or last, accuracy is measurably higher than when it sits at position 5, 10, or 15 of a 20-document sequence. The effect holds across model families and context lengths. Longer context windows stretch the curve rather than flatten it; more capable models shift the absolute numbers but preserve the shape.
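The setup can be sketched in a few lines. This is a minimal illustration, not the paper's actual evaluation harness; the document text and positions are invented:

```python
# Sketch of the "Lost in the Middle" setup: one gold document holds the
# answer, distractors fill the rest, and the experiment sweeps the gold
# document's position while measuring answer accuracy at each position.

def build_context(gold_doc: str, distractors: list[str], gold_pos: int) -> str:
    """Place the gold document at gold_pos among the distractors."""
    docs = distractors[:gold_pos] + [gold_doc] + distractors[gold_pos:]
    return "\n\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))

gold = "The launch code rotation happens every 90 days."
noise = [f"Filler document {n} about an unrelated topic." for n in range(19)]

# First, middle, and last of a 20-document sequence: identical content,
# three different points on the U-shaped accuracy curve.
contexts = {pos: build_context(gold, noise, pos) for pos in (0, 9, 19)}
```

The model sees the same information in every variant; only its position changes, which is what isolates position as the experimental variable.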
The implication for systems that depend on long contexts is direct: information position has measurable performance consequences, and treating it as incidental is leaving accuracy on the table.
How Agentic Sessions Compound the Problem
In a coding agent session, the context window is not static. It grows. Every tool call appends its result. Every assistant turn adds reasoning. Every file read extends the token count. Instructions you provided at session start are still there at step 25, but they have been pushed deeper into the middle by everything that accumulated after them.
Consider the structure of a multi-step task in Claude Code. The session opens with the system prompt, then the global and project-level CLAUDE.md files, then the user’s task description. By the time the agent has read ten files and run five shell commands, those instruction files are somewhere in the middle of a 60,000-token context. A critical constraint buried in paragraph three of your CLAUDE.md is now in a well-documented attention trough.
For a moderately complex TypeScript monorepo, retrieved file contents alone can consume 50,000 to 80,000 tokens without much effort, which means the growth from initial instructions to degraded middle territory can happen in fewer steps than developers usually anticipate. The issue is not unique to any specific tool; it is a structural property of how transformer attention distributes across long sequences.
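The drift is easy to estimate with back-of-envelope arithmetic. The constants below are illustrative assumptions, not measurements from any specific tool:

```python
# Rough model of context growth in an agent session. All token counts
# are invented round numbers for illustration.

INSTRUCTIONS = 3_000      # system prompt + CLAUDE.md files
TASK = 500                # user's task description
AVG_FILE_READ = 4_000     # tokens appended per retrieved file
AVG_TOOL_RESULT = 800     # tokens appended per shell command

def context_size(files_read: int, commands_run: int) -> int:
    return (INSTRUCTIONS + TASK
            + files_read * AVG_FILE_READ
            + commands_run * AVG_TOOL_RESULT)

# After ten file reads and five commands, the instructions occupy
# roughly the first 6% of a ~47,500-token window.
total = context_size(files_read=10, commands_run=5)
instruction_fraction = INSTRUCTIONS / total
```

Fifteen routine tool calls are enough, under these assumptions, to push the instruction files from "the context" to "the first few percent of the context."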
What Compaction Does and Does Not Fix
Claude Code’s /compact command, and the automatic equivalent that fires when the session approaches context limits, summarizes accumulated conversation history to make room. The mechanism is necessary for long sessions, but it introduces its own positional dynamics.
After compaction, the system prompt and CLAUDE.md files are re-injected at the front of the refreshed context. This is the key structural privilege those files hold: they reliably anchor to high-attention positions after every compaction, regardless of how much accumulated history preceded the reset. The distilled conversation summary comes after them, followed by active tool results and the current task.
Instructions delivered mid-conversation do not share this privilege. Even important clarifications or corrections typed into the chat get absorbed into the compaction summary, and they may emerge with reduced specificity and fidelity. The operational consequence is that constraints you care about need to be in CLAUDE.md before the session starts, not added later as chat messages: not because mid-session instructions will be ignored initially, but because they will not survive compaction in the same form that instructions anchored in the instruction file do.
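The asymmetry can be made concrete as pseudocode for the post-compaction assembly. This is a hypothetical structure inferred from the observable behavior, not Claude Code's actual implementation:

```python
# Sketch of context reassembly after compaction: instruction files are
# re-injected verbatim at the front, while everything typed or produced
# mid-session passes through a lossy summarizer.

def compact(system_prompt: str, claude_md: str,
            history: list[str], summarize) -> list[str]:
    """Rebuild the context window after a compaction event."""
    return [
        system_prompt,        # re-anchored to the highest-attention position
        claude_md,            # same structural privilege, verbatim
        summarize(history),   # mid-session instructions are lossy from here on
    ]

naive_summary = lambda turns: f"[summary of {len(turns)} turns]"
ctx = compact("You are a coding agent.",
              "# CLAUDE.md\nAll queries are raw SQL. No ORM.",
              [f"turn {i}" for i in range(40)],
              naive_summary)
```

Whatever fidelity the summarizer achieves, the instruction files skip it entirely; that is the structural privilege the text above describes.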
Structure as a Partial Mitigation
One mitigation for position-dependent attention degradation is explicit structural labeling. Anthropic’s guidance for complex system prompts recommends XML-tagged sections:
<project_context>
  <architecture>
    Services communicate via gRPC. No REST between internal services.
  </architecture>
  <constraints>
    <constraint id="no-orm">All queries are raw SQL via pgx. No ORM.</constraint>
    <constraint id="test-first">New functions require unit tests before merge.</constraint>
  </constraints>
</project_context>

<relevant_files>
  <file path="src/db/queries.go">...</file>
</relevant_files>

<task>...</task>
The labeling gives the model named anchors it can reference by label rather than by position. A model reasoning about a database change can attend to the <constraints> section by name. This does not eliminate the position gradient, but it reduces the model’s dependence on positional memory alone, because the model can locate a named section even when it cannot reliably attend to text at an arbitrary position in a long sequence.
For CLAUDE.md content, the ordering convention this suggests is simple: most critical constraints near the top, context and background in the middle, examples and supplementary detail at the end. This matches human reading convention and aligns with the attention distribution the research documents. The instruction “never modify generated files” belongs in the first ten lines, not in a section called “Additional Notes.”
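One way to enforce the convention mechanically is to order sections by criticality when generating or maintaining the file. A minimal sketch, with section names and rankings invented for illustration:

```python
# Order instruction-file sections so the most critical constraints land
# in the first lines. Rank 1 = most critical; names are made up.

SECTIONS = [
    ("Additional Notes", 3, "Style nits, links, background reading."),
    ("Hard Constraints", 1, "Never modify generated files. Raw SQL only."),
    ("Project Context", 2, "Services communicate via gRPC."),
]

def render(sections: list[tuple[str, int, str]]) -> str:
    ordered = sorted(sections, key=lambda s: s[1])
    return "\n\n".join(f"## {name}\n{body}" for name, _, body in ordered)

content = render(SECTIONS)
# "Hard Constraints" now opens the file; "Additional Notes" closes it.
```

The point is not the tooling but the invariant it encodes: criticality determines position, not the order in which sections happened to be written.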
The Subagent Pattern as a Position Reset
Claude Code’s subagent mechanism, invoked via the Task tool, spawns an isolated agent instance with its own full context window. The orchestrating agent describes a subtask; the subagent executes with a fresh context, its own copy of the CLAUDE.md files, and full tool access. The result returns as text to the orchestrator.
This pattern has multiple motivations: parallelism, keeping the orchestrator’s context lean, isolating retrieval for focused subtasks. From a position perspective, it also functions as a reset. Each subtask starts with critical instructions at the front of a fresh 200K window rather than buried somewhere in the accumulated middle of a 60,000-token session. For long-running tasks that might push instructions into degraded attention territory over many steps, decomposing into subagents is a structural solution to the position problem, not just an architectural preference.
The tradeoff is overhead: spawning subagents costs latency and re-reads the full instruction set on each invocation. For short tasks that fit comfortably in a session, it is unnecessary. For tasks that require a dozen or more tool calls, the positional insurance may be worth the overhead.
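That tradeoff reduces to a simple break-even heuristic. The constants and threshold below are invented for illustration; neither is a documented tool default:

```python
# Decision heuristic: spawn a subagent when the expected session growth
# would push instructions far deeper into the window than the fixed
# cost of re-reading them in a fresh context.

SUBAGENT_OVERHEAD_TOKENS = 5_000   # instruction set re-read per spawn
TOOL_CALL_COST_TOKENS = 3_000      # average tokens a tool call appends

def should_spawn_subagent(expected_tool_calls: int) -> bool:
    """Positional insurance: reset the window for long subtasks."""
    expected_growth = expected_tool_calls * TOOL_CALL_COST_TOKENS
    return expected_growth > 6 * SUBAGENT_OVERHEAD_TOKENS

# A three-call task stays in-session; a fifteen-call task gets a fresh
# window. Under these constants the crossover sits near a dozen calls.
```

The exact constants matter less than the shape of the decision: fixed overhead per spawn against growth that compounds per tool call.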
A Design Principle
Context engineering for coding agents involves two distinct decisions that most developers conflate. The first is what information the model needs. The second is where that information will sit relative to the context window boundaries at the moments the model needs to use it.
Writing a CLAUDE.md file is not documentation work. It is writing text that will compete for model attention across a session that grows and compacts, where information in the middle of long sequences is measurably less accessible than information at the boundaries. Dense, ordered context that places the most important facts first outperforms comprehensive but uniformly structured context. That ordering is architecture, not style preference.
The point that Martin Fowler’s February piece gestures at, but does not fully develop, is that context engineering inherits many of the same concerns as software architecture at large: the structure matters as much as the content, and the runtime behavior under load differs from the static view of the artifact. An instruction file that looks complete on disk may be systematically underutilized in production sessions where context has grown and the critical constraints have drifted toward the middle. Accounting for that drift is part of the job.