
The Living Document Trick: How Context Anchoring Fights Attention Drift in Long Agent Sessions

Source: martinfowler

There is a class of bug that is specific to working with AI coding assistants in long sessions. You describe your architecture at the start, specify constraints, rule out certain dependencies, and an hour later the model proposes exactly what you ruled out. Not because the message is gone from the context window. It is still there. The model just stopped attending to it with the same weight as more recent exchanges. Rahul Garg’s Context Anchoring on Martin Fowler’s site names this problem and gives it a tractable solution. The solution is simpler than most people expect, and the reason it works goes deeper than prompt engineering.

Why Attention Dilutes Early Decisions

The failure is structural. The transformer attention mechanism (Vaswani et al., 2017) computes attention weights over all previous tokens, and those weights are shaped by recency, proximity, and positional encoding. An early constraint does not disappear from the context window as a session grows. It just gets surrounded by an increasingly large body of more recent text, and its fractional share of the model’s attention shrinks accordingly.
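
Under the simplifying assumption of uniform attention, the dilution is easy to quantify: a fixed-size early message's share of total attention falls linearly as the window grows. A back-of-envelope sketch (the 120-token constraint size is illustrative, not from the article):

```go
package main

import "fmt"

func main() {
	const constraintTokens = 120.0 // hypothetical size of an early architecture message
	for _, total := range []float64{2000, 20000, 120000} {
		share := constraintTokens / total
		fmt.Printf("window %6.0f tokens: constraint share %.3f%%\n", total, share*100)
	}
	// last line: window 120000 tokens: constraint share 0.100%
}
```

Real attention weights are not uniform, which is the point of the Lost in the Middle result: position makes the mid-context share even smaller than this linear estimate suggests.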

This effect has been studied empirically. Liu et al.’s *Lost in the Middle* (2023) demonstrated that LLM recall performance degrades for content positioned in the middle of long contexts, even when that content fits comfortably within the window limit. The model attends best to content near the beginning and end. An architecture decision described at message three of a forty-message session is in the worst possible position: it has been displaced from the front by recent context and is nowhere near the end.

The second failure mode is subtler. When the model generates code under a wrong assumption and you correct it, the correction does not fully undo the effect of the wrong version. The wrong code is now part of the conversation history. It modifies the probability distribution for subsequent completions in ways that a single correction message does not fully counteract. The session has been poisoned by its own history.

What Context Anchoring Is

Context anchoring is a discipline, not a feature. The idea is to stop trusting the model’s implicit memory for anything consequential, and instead externalize decisions into a structured document that you actively maintain and re-inject at transition points. Garg calls this the living document.

The minimal form looks like this:

```markdown
# Session Context
_Updated: 2026-03-17_

## Decisions
- Auth: JWT with 7-day expiry; refresh tokens in httpOnly cookies
- Database: PostgreSQL 15, raw SQL via pgx, no ORM
- Error handling: always wrap with fmt.Errorf("...: %w", err)

## Current Task
Implementing the user registration endpoint.

## Out of Scope
- Email verification flow
- OAuth integration
- Rate limiting (tracked in issue #42)

## Completed This Session
- [x] User model and migration
- [x] Password hashing utility
- [ ] Registration endpoint
- [ ] Input validation
```

The document records the decision and its reason, not the deliberation. “PostgreSQL, not SQLite, because we need full-text search” is the right level of detail. The decision is restated at transition points, the document is updated when decisions change, and a stale entry gets corrected immediately, because a stale anchor is worse than none: the model will faithfully respect a constraint that no longer applies.

This is not revolutionary technology. It is disciplined state management applied to a context window. The analogy Garg uses is precise: a stateless HTTP server externalizes session state to a cookie or database rather than keeping it in process memory. The context window is your process memory. The anchor document is your store.

Position Zero Is Special

The living document handles session-specific state, but stable project invariants belong somewhere else: the file that gets auto-injected before any user message. Every major AI coding tool has independently converged on this mechanism.

| Tool | Mechanism | Injection point |
| --- | --- | --- |
| Claude Code | CLAUDE.md | Repository root, position zero |
| Cursor | .cursorrules | Workspace-level, prepended |
| GitHub Copilot | Workspace instructions | Injected before conversation |

Position zero is the highest-attention position in the context window. A constraint stated there is more reliably respected than the same constraint mentioned mid-session. The convergence across tools is not coincidence. Every team maintaining these tools ran into the same attention decay problem and arrived at the same architectural response: inject invariants before the user speaks.

For a Go project, a useful CLAUDE.md looks like:

```markdown
- Never use an ORM; raw SQL via pgx
- Error handling: fmt.Errorf("...: %w", err)
- No external dependencies without discussion
- Build: go build ./...
- Test: go test ./...
```

The separation of concerns is clean. Static files cover invariants that will not change within this project. The living document covers session state that is evolving. Mixing them makes both harder to maintain.

The Token Budget Reality

Context anchoring also forces a practical accounting of window consumption. The numbers accumulate faster than most people intuit:

- System prompt: ~3,000 tokens
- A 400-line TypeScript file: ~8,000 tokens (roughly 20 tokens per line)
- 20 files at 300 lines average: ~120,000 tokens in file content
- A complex multi-file task with test cycles: 25,000-40,000 tokens for the conversation itself

At those rates, a 200,000-token context window fills within a moderately ambitious session. The anchor document must stay compact. Its job is to give the model a high-attention, low-budget summary of what is true right now. If it grows into a comprehensive record of everything that ever happened, it has become part of the problem.
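
Summing the estimates above makes the squeeze concrete. A quick tally, using integer arithmetic on the article's figures:

```go
package main

import "fmt"

func main() {
	budget := 200_000       // typical large context window
	systemPrompt := 3_000   // baseline system prompt
	fileContent := 120_000  // 20 files at ~300 lines each
	conversation := 40_000  // upper end of a complex multi-file task
	used := systemPrompt + fileContent + conversation
	fmt.Printf("used ~%d of %d tokens (%d%% of the window)\n",
		used, budget, used*100/budget)
	// used ~163000 of 200000 tokens (81% of the window)
}
```

At ~81% consumed before the anchor document is even counted, a few hundred tokens of compact anchor is affordable; a sprawling session history is not.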

How This Scales: From Living Document to RAG to Subagents

The living document pattern works well for a single session or a small project. When the decision corpus grows large enough that it cannot fit comfortably in context, the logical extension is retrieval-augmented generation (RAG): a retrieval index over all decisions, architecture documents, and prior context, with relevant fragments fetched before each response. The conceptual move is identical at both scales: do not rely on implicit memory; make ground truth explicit and queryable.
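
The retrieval step can be sketched without any embedding infrastructure. This minimal version uses keyword overlap as a stand-in for embedding similarity (a real RAG pipeline would score with vectors, and every name here is illustrative):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// score counts query words that appear in a decision entry — a crude
// stand-in for embedding similarity in a real retrieval index.
func score(decision, query string) int {
	n := 0
	for _, w := range strings.Fields(strings.ToLower(query)) {
		if strings.Contains(strings.ToLower(decision), w) {
			n++
		}
	}
	return n
}

// topK returns the k highest-scoring decisions for the query,
// i.e. the fragments worth injecting before the next response.
func topK(decisions []string, query string, k int) []string {
	sorted := append([]string(nil), decisions...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return score(sorted[i], query) > score(sorted[j], query)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	decisions := []string{
		"Auth: JWT with 7-day expiry",
		"Database: PostgreSQL 15, raw SQL via pgx, no ORM",
		"Error handling: wrap with fmt.Errorf",
	}
	for _, d := range topK(decisions, "which database and ORM do we use", 1) {
		fmt.Println(d) // Database: PostgreSQL 15, raw SQL via pgx, no ORM
	}
}
```

The conceptual move is unchanged from the living document: ground truth lives outside the model, and the relevant slice is made high-attention just before it is needed.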

A third response to the same problem is subagent context isolation. Instead of fighting attention drift in a long session, you scope each subtask into a separate context window. The parent agent accumulates task outcomes, not raw intermediate transcripts. Each subagent’s early context remains fresh throughout its scoped task because the window does not grow stale within a bounded operation. Claude Code’s Task tool implements this; the OpenAI Agents SDK (released March 2025) formalizes it with Agent.as_tool().
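
The isolation boundary can be shown in a few lines. In this stub (the model call is simulated; a real implementation would invoke an API where the transcript is built), each subtask gets a fresh context and only the outcome crosses back to the parent:

```go
package main

import "fmt"

// runSubagent executes one scoped task in a fresh context and returns
// only a summary; the full transcript never reaches the parent.
func runSubagent(task string) string {
	// Fresh window per subtask — a real implementation would call a
	// model API here; this stub just simulates a short transcript.
	transcript := []string{"task: " + task, "tool call", "tool result"}
	// Only the outcome is reported upward, not the raw transcript.
	return fmt.Sprintf("done: %s (%d transcript entries discarded)", task, len(transcript))
}

func main() {
	outcomes := []string{}
	for _, task := range []string{"write migration", "add endpoint", "write tests"} {
		outcomes = append(outcomes, runSubagent(task))
	}
	// The parent context accumulates three short outcomes,
	// not three full transcripts.
	for _, o := range outcomes {
		fmt.Println(o)
	}
}
```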

The right tool depends on scale:

- Living document: one session, small project, manual control preferred
- Static injection (CLAUDE.md): stable invariants, maximum attention reliability
- RAG: large decision corpus requiring selective retrieval
- Subagent isolation: tasks that exceed reliable single-window scope

The Automated Alternative and Its Trade-Offs

Automated context management exists. MemGPT gives the agent read/write access to a persistent memory store and lets the model decide what to store and retrieve as part of processing each message. Context anchoring is the manual, lightweight version of the same concept, where you are the memory manager and the anchor document is your storage layer.

The trade-off is automation versus auditability. A MemGPT-style system is more seamless but introduces failure modes that are hard to inspect. The model’s decisions about what to memorize or forget can be wrong in ways that are invisible until the wrong assumption surfaces several sessions later. A markdown file you maintain manually is always auditable. You can read it, correct it, and verify what the model is being told about the project state.

For most software development work, where decisions are precise enough that a wrong assumption genuinely matters, the manual approach is worth the maintenance overhead.

The Discipline Part

The pattern requires a habit that does not come naturally. When a decision gets made, you update the document immediately. When a subtask ends, you re-paste the relevant section into the conversation before starting the next one. When a constraint changes, you mark the old entry stale and write the new one before continuing.

What the pattern is compensating for is the mismatch between how LLM conversations feel and how they actually work. A long session feels like a coherent dialogue with a participant who remembers everything that was said. It is actually a series of completions over an increasingly diluted context window, where the relative attention weight of any given message decreases as the session grows. Managing that gap explicitly is the engineering work that Garg’s article formalizes.

The name matters as much as the technique. Naming a pattern makes it teachable. A team that agrees to maintain a session anchor document has a shared vocabulary for a class of problem that previously required each person to rediscover informally. The living document is one file with one job: keep the model’s understanding of current state accurate enough that the next completion is useful.
