The Position Problem: Why Static Instructions Fade During Long Agent Sessions
Source: martinfowler
The context engineering article published on Martin Fowler’s site in February 2026 named something practitioners had been building around for a while without quite articulating it. Going back to it now, one of its most useful frames is context engineering as information architecture: the ongoing discipline of deciding what information a coding agent sees, in what form, and, critically, where that information sits in the context window.
That third dimension, position, gets less attention than content, and it shapes more of the tool design in this space than most developers realize.
The Shape of Long-Context Recall
The empirical foundation here is a 2023 paper by researchers at Stanford, UC Berkeley, and Samaya AI: “Lost in the Middle: How Language Models Use Long Contexts”. The finding is that model recall follows a U-shaped curve as a function of position. Content near the beginning of the context window and content near the end are retrieved more reliably. Content in the middle, under realistic task conditions, is measurably less accessible.
The effect holds across model families and context lengths. Longer windows stretch the curve but do not flatten it. More capable models shift the absolute performance numbers upward, but the shape persists.
For a document that gets loaded once and queried once, this is a solvable problem: important material goes at the top and bottom. For a coding agent running a 30-step session, it becomes a dynamic constraint. Information at position 0 on step 1 may be in the middle of the window by step 15. A 200k-token context window fills faster than most developers expect: twenty files at 300 lines each is 6,000 lines, which at a rough 10 to 20 tokens per line comes to somewhere between 60,000 and 120,000 tokens in file content alone, before accounting for system prompts, assistant turns, and tool call results. A TypeScript monorepo task that seems modest can push accumulated context past 60,000 tokens within 10 to 15 steps.
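As a rough check on that arithmetic, the sketch below multiplies file count, lines per file, and an assumed tokens-per-line figure. The per-line values of 10 and 20 are illustrative assumptions rather than measurements; real counts depend on the tokenizer and on how dense the code is.

```shell
#!/usr/bin/env bash
# Back-of-envelope context budget. tokens_per_line is an assumed average;
# real counts depend on the tokenizer and on how dense the code is.
estimate_tokens() {
  local files=$1 lines_per_file=$2 tokens_per_line=$3
  echo $(( files * lines_per_file * tokens_per_line ))
}

window=200000
for per_line in 10 20; do
  used=$(estimate_tokens 20 300 "$per_line")
  echo "${per_line} tokens/line: ${used} tokens, $(( used * 100 / window ))% of the window"
done
```

Under these assumptions, file content alone takes 30% to 60% of a 200k-token window, which is why position shifts so quickly in practice.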
Instructions that were at the front of a fresh window are now in the middle of an active one. The model is not ignoring them; they are competing against much more material, and the U-shaped curve assigns them less effective weight than they had when the session started.
Why CLAUDE.md Has Structural Privileges
The project instruction file pattern converged independently across tools: CLAUDE.md in Claude Code, .cursorrules in Cursor’s older versions and .cursor/rules/ from v0.45 onwards, .github/copilot-instructions.md for GitHub Copilot, AGENTS.md for OpenAI’s Codex. Every major coding assistant reached the same solution without coordination, which suggests the problem it solves is genuine and the solution space is constrained.
These files are loaded and injected before the first user message, giving them position 0 in every session. More critically, when Claude Code compacts a session that is growing toward context limits, the project instruction files are re-injected at the front of the refreshed window. The conversation summary comes after them.
Instructions added as chat messages during the session do not share this privilege. They get absorbed into the compaction summary at the same priority as everything else in the conversation, reduced in specificity, and placed after the re-injected instruction files in the refreshed context. Instructions that should survive long sessions belong in the static file before the session starts, not in the conversation itself.
Even with this structural advantage, CLAUDE.md does not fully escape the position problem. A 3,000-token instruction file at the front of a fresh window is at position 0. At step 40 of a complex refactoring session, with 50,000 tokens of accumulated tool results between it and the current turn, it is somewhere in the early-middle of a long context. The re-injection on compaction helps, but it is a periodic reset rather than a continuous guarantee.
Hooks as Position-Independent Enforcement
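Claude Code’s hooks system addresses this at the execution layer. Hooks are shell commands that fire in response to agent lifecycle events: PreToolUse fires before a tool call executes, PostToolUse fires after it completes. A PreToolUse hook that exits with code 2 blocks the tool call and passes its stderr back to the model. A PostToolUse hook cannot undo a call that has already run, but the same exit code feeds its stderr to the model as feedback. Other non-zero exit codes are treated as non-blocking errors.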
The operational difference from CLAUDE.md is that hooks enforce behavior at the execution layer, bypassing the reasoning layer entirely. Position in the context window does not affect a hook because the hook runs regardless of what the model decided to do. The model cannot drift away from a hook-enforced constraint the way it can drift away from an instruction buried in the middle of a long session.
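A minimal sketch of the blocking side, assuming Claude Code’s documented convention that a PreToolUse hook receives a JSON payload on stdin and blocks the call by exiting with code 2. The deny-list, the policy message, and the function names `check_command` and `hook_main` are hypothetical and only illustrate the shape.

```shell
#!/usr/bin/env bash
# PreToolUse sketch: refuse selected shell commands before they run.
# The deny-list is illustrative, not a complete safety policy; note that the
# first pattern also matches "git push --force-with-lease".

# Returns 0 if the command is allowed, 2 if it should be blocked.
check_command() {
  local cmd=$1
  case "$cmd" in
    *"git push --force"*|*"git reset --hard"*)
      echo "Blocked by project policy: $cmd" >&2
      return 2
      ;;
  esac
  return 0
}

# Hook entry point: reads the JSON payload from stdin (requires jq).
hook_main() {
  local input tool cmd
  input=$(cat)
  tool=$(echo "$input" | jq -r '.tool_name')
  [ "$tool" = "Bash" ] || return 0
  cmd=$(echo "$input" | jq -r '.tool_input.command // empty')
  check_command "$cmd"
}

# Demo call. A script installed as a real hook would instead end with:
#   hook_main; exit $?
check_command "git status" && echo "allowed: git status"
```

Keeping the decision in a pure function makes the policy testable without a running agent; the model never sees or reasons about the rule until a call is refused.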
A concrete case: ensuring lint runs after every file modification. Put “run ESLint after editing TypeScript files” in CLAUDE.md, and the agent follows it reliably in step 3. In step 35, after 40,000 tokens of context have accumulated, compliance depends on how prominently that instruction registers against everything else the model is currently attending to. A PostToolUse hook that runs ESLint deterministically and injects the output as context has no such dependency:
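#!/usr/bin/env bash
# PostToolUse hook: lint the TypeScript file the agent just wrote or edited.
input=$(cat)  # hook payload arrives as JSON on stdin
tool=$(echo "$input" | jq -r '.tool_name')
if [ "$tool" != "Write" ] && [ "$tool" != "Edit" ]; then exit 0; fi
file=$(echo "$input" | jq -r '.tool_input.file_path // empty')
case "$file" in
  *.ts|*.tsx) ;;        # lint TypeScript sources only
  *) exit 0 ;;
esac
if ! lint_output=$(npx eslint "$file" 2>&1); then
  # Exit code 2 with the report on stderr feeds it back to the model.
  echo "ESLint found issues in $file:" >&2
  echo "$lint_output" >&2
  exit 2
fi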
This maps to a familiar pattern in software design: advisory behavior belongs in configuration; invariants belong in enforcement mechanisms. Pre-commit hooks in git, middleware in web frameworks, and capability-based security in operating systems all operate on the same logic. The enforcement is position-independent because it is structural, not attentional.
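For a lint hook like the one above to fire, it has to be registered in the project settings. The sketch below prints a fragment in the shape Claude Code’s settings file uses for hooks, to be merged by hand into .claude/settings.json. The script path `.claude/hooks/lint-ts.sh` is a hypothetical location, and the matcher is an assumption about which tools should trigger the hook.

```shell
#!/usr/bin/env bash
# Prints a hooks fragment to merge by hand into .claude/settings.json.
# The script path is hypothetical; point it at wherever the hook is saved.
print_hook_config() {
  cat <<'EOF'
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/lint-ts.sh" }
        ]
      }
    ]
  }
}
EOF
}

print_hook_config
```

Because the registration lives in a settings file rather than in the conversation, it is versioned with the repository and applies to every session regardless of context length.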
Cursor’s background language server integration reaches a similar property through a different mechanism. Running tsserver or rust-analyzer continuously and surfacing exact semantic facts, such as all call sites of a given method, provides information that bypasses the need for the model to search what it read 20 steps ago. Ground-truth facts injected at the moment of relevance do not depend on whether the original source file is still near the front of the attention window. The SWE-agent paper from Princeton made the underlying principle explicit: the design of the agent-computer interface, meaning its tools and how they inject information, substantially affects what an agent can accomplish with the same underlying model.
Dynamic Retrieval and the MCP Layer
The Model Context Protocol, introduced by Anthropic in November 2024, formalizes the dynamic retrieval layer using a JSON-RPC 2.0 interface over stdio or HTTP. From the model’s perspective, an MCP tool call is structurally identical to a built-in tool call; both append results to the context at the current position.
This addresses the position problem for fast-moving information directly: retrieve it at the moment it is needed, placing it near the current turn rather than at a position established 30 minutes ago. A CLAUDE.md entry documenting a database schema reflects what the schema looked like when someone wrote it. An MCP server connected to the actual database returns the current schema at query time.
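To make the wire format concrete, the sketch below builds the kind of JSON-RPC 2.0 `tools/call` request an MCP client sends to a server. The tool name `get_schema` and its `table` argument are hypothetical, and a real client would first complete the protocol’s initialization handshake; values are assumed to be plain identifiers, so no JSON escaping is done.

```shell
#!/usr/bin/env bash
# Builds a JSON-RPC 2.0 "tools/call" request of the kind an MCP client sends.
# The tool name and its "table" argument are hypothetical examples.
# Values are assumed to be plain identifiers; no JSON escaping is performed.
mcp_tool_call() {
  local id=$1 tool=$2 table=$3
  printf '{"jsonrpc":"2.0","id":%d,"method":"tools/call","params":{"name":"%s","arguments":{"table":"%s"}}}\n' \
    "$id" "$tool" "$table"
}

mcp_tool_call 1 get_schema orders
```

The server’s response is appended to the context at the current turn, which is what places the live schema near the end of the window rather than wherever the static file happens to sit.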
The division between static instruction files and MCP retrieval maps to information lifetime:
- Project conventions, prohibited patterns, architectural constraints, and build commands are slow-moving. They belong in CLAUDE.md, re-injected at the front of each session and each compaction reset.
- Database schema state, CI status, issue contents, and external API responses are fast-moving. They belong in MCP retrieval, fetched at the moment they are needed.
Putting fast-moving data in CLAUDE.md is not just inefficient; stale context actively overrides what the model can correctly infer from querying the live system. The static file confidently states something that is no longer true, and the model has no way to know which source to trust.
The Codebase as a Context Source
The Fowler article extends the frame past configuration files to the codebase itself. Every file the model reads during a session is context. Dead code, deprecated patterns left after incomplete migrations, commented-out implementations, and outdated inline comments are all signals the model reads as evidence of convention. If a migration from one library to another left old query patterns in 15 files, the model has statistical evidence in those files that the old approach is acceptable in this codebase.
This reframes routine maintenance. Completing strangler-fig migrations promptly, removing dead code, deleting outdated comments: these reduce noise in the context the model assembles from reading the codebase. The question shifts from “is this code causing bugs” to “is this code teaching the model something wrong.” For AI-assisted development workflows, code that accurately reflects current conventions has value that extends beyond compilation.
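One way to make leftover migration patterns visible is a plain text search that lists the files still using the old form. The sketch below assumes the legacy usage can be identified by a fixed string; the library names `old-orm` and `new-orm` and the helper `legacy_files` are made up for the demo.

```shell
#!/usr/bin/env bash
# Lists files that still contain a legacy pattern, so unfinished migrations
# are visible before an agent reads them as convention.
# Usage: legacy_files <dir> <fixed-string pattern>
legacy_files() {
  local dir=$1 pattern=$2
  grep -rlF -- "$pattern" "$dir" 2>/dev/null | sort
}

# Demo on a throwaway tree; "old-orm" and "new-orm" are made-up library names.
demo=$(mktemp -d)
printf 'import { query } from "old-orm";\n' > "$demo/users.ts"
printf 'import { query } from "new-orm";\n' > "$demo/orders.ts"
legacy_files "$demo" 'from "old-orm"'
echo "remaining: $(legacy_files "$demo" 'from "old-orm"' | wc -l | tr -d ' ')"
rm -rf "$demo"
```

Tracking that count toward zero gives the migration a finish line that is defined in terms of what the model will read, not just what still compiles.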
Naming consistency carries similar weight. Retrieval systems find by similarity; humans disambiguate from surrounding knowledge. If handleSessionExpiry and expireUserSession do the same thing in different modules, semantic search for one may or may not surface the other. Inconsistent naming creates retrieval gaps that produce no error signal; the model works with what it found, and silent omissions in context are harder to diagnose than explicit failures.
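Near-duplicate names can be surfaced mechanically before they become retrieval gaps. The sketch below lists distinct identifiers that mention two concept words in either order; the helper `name_variants` is hypothetical, it only recognizes identifiers made of letters and underscores, and it is a heuristic rather than a semantic analysis.

```shell
#!/usr/bin/env bash
# Lists distinct identifiers that mention both of two concept words, in either
# order, to surface near-duplicate names across modules.
# Usage: name_variants <dir> <word1> <word2>   (words must be plain letters)
name_variants() {
  local dir=$1 a=$2 b=$3
  grep -rEohi -- "[A-Za-z_]*(${a}[A-Za-z_]*${b}|${b}[A-Za-z_]*${a})[A-Za-z_]*" "$dir" \
    2>/dev/null | sort -u
}

# Demo on a throwaway tree using the two names from the text.
demo=$(mktemp -d)
printf 'export function handleSessionExpiry() {}\n' > "$demo/auth.ts"
printf 'export function expireUserSession() {}\n' > "$demo/account.ts"
name_variants "$demo" session expir
rm -rf "$demo"
```

A list with more than one entry for the same concept is a prompt to consolidate, or at least to cross-reference the names where they are defined.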
Writing What Belongs in CLAUDE.md
Given the structural properties of the file, the practical question is what content merits its limited token budget.
The baseline filter: would a capable developer working from standard training data get this wrong without it? “Use TypeScript for all new files” is unnecessary if every file in the codebase is already .ts; the model infers this correctly from what it reads. “Do not instantiate a database connection outside /packages/db” is necessary if nothing in the code structure makes that constraint visible and violating it would cause problems that are not immediately apparent from the code alone.
Rationale matters more than rules. “Do not use X” gives the model a constraint. “Do not use X because Y” gives the model a generative principle it can apply to cases the explicit rule did not anticipate. This is the same logic behind architecture decision records: recording the reasoning, not just the decision, prevents future participants from confident reinvention of what was deliberately rejected.
The file deserves the same maintenance discipline as the codebase. It should be reviewed in pull requests, updated when architectural decisions change, and kept short enough that every line changes model behavior on some nontrivial class of task. A CLAUDE.md that accurately described last year’s architecture is worse than none at all; it confidently overrides what the model would correctly infer from the actual current code. Stale instructions do not just fail to help; they actively misdirect.
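The “kept short enough” part can be checked mechanically in CI. The sketch below approximates token count as bytes divided by four, which is a rough rule of thumb for English text and not a tokenizer result; the helper `check_instruction_budget` and the budget value are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# CI-style size check for an instruction file. Token count is approximated
# as bytes / 4, a rough heuristic for English text, not a tokenizer result.
# Usage: check_instruction_budget <file> <max_tokens>
check_instruction_budget() {
  local file=$1 max_tokens=$2 bytes approx
  bytes=$(wc -c < "$file" | tr -d ' ')
  approx=$(( bytes / 4 ))
  if [ "$approx" -gt "$max_tokens" ]; then
    echo "$file: ~${approx} tokens exceeds budget of ${max_tokens}" >&2
    return 1
  fi
  echo "$file: ~${approx} tokens (budget ${max_tokens})"
}

# Demo on a throwaway 400-byte file; the 3000-token budget is an assumption.
demo=$(mktemp)
printf '%0400d' 0 > "$demo"
check_instruction_budget "$demo" 3000
rm -f "$demo"
```

In a real repository the call would be something like `check_instruction_budget CLAUDE.md 3000`, failing the build when the file grows past the agreed budget.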