· 6 min read ·

The Session-Layer Gap That CLAUDE.md Was Never Designed to Fill

Source: martinfowler

AI coding sessions accumulate context loss in two distinct and separate ways. The first is gradual: decisions made early in a long conversation sink into the middle of the context window, where transformer attention is weakest. The second is complete: a new session starts with zero representation of what the previous one established. Rahul Garg’s article on Context Anchoring addresses both failure modes with the same mechanism, externalizing the decision context into a living document maintained alongside the conversation.

The Attention Degradation Problem

The within-session failure has empirical grounding. A 2023 paper from Stanford and UC Berkeley documented that transformer recall degrades significantly for content placed in the middle of long context windows. The effect holds across model sizes and context lengths: models perform reliably when relevant information is near the start or end of the context, and less reliably when it sits in the middle. In a 60-message coding session, an architectural constraint established at message 5 occupies exactly the worst position by message 40. The constraint is still present in the context, but it competes against many tokens generated since, and that competition is not equal.

Context compaction amplifies the problem. Claude Code, Aider, and other coding agents handle context overflow by summarizing earlier turns to free space. The summarization converts precise conversational constraints into paraphrased prose. A specific instruction like “do not touch the migrations directory” may survive compaction as something closer to “the user deferred changes to certain configuration files,” which is a weaker and less specific binding. Claude Code’s CLAUDE.md content is re-injected after compaction, which is why constraints that live in that file survive a compaction boundary while conversational constraints may not.

The between-session failure is not about attention degradation; it is structural. A language model is stateless between API calls. Every new session begins with no knowledge of what was decided in previous sessions unless something explicitly provides that knowledge at the start of the conversation.

The Convergent Evolution of Static Context Files

Many developers who work with a CLAUDE.md or .cursorrules file have already discovered the partial solution to the between-session problem. The file loads at session start, occupying the highest-attention position in the context before a single line of conversation. The model reads the project’s conventions, stack choices, and architectural rules first.

What is worth noting is that every major AI coding tool arrived at this same mechanism independently. Claude Code uses CLAUDE.md; Cursor uses .cursorrules and the .cursor/rules/ directory; GitHub Copilot uses .github/copilot-instructions.md; OpenAI’s agent frameworks use AGENTS.md; Windsurf uses .windsurfrules. These tools were not designed in coordination. The convergence reflects a real structural constraint: stable project-level context belongs at position zero in every session, and the only way to guarantee that is a file loaded before conversation begins.

But the static context file has a scope limit. It is the right place for tech stack choices, architectural conventions, and code style rules that settled months ago and belong to the entire project. It is the wrong place for decisions made during this specific session, for the scope boundaries of the refactor in progress, for the options explicitly ruled out in the last two hours. Committing every in-session decision to CLAUDE.md causes the file to grow session by session. A long file at position zero is attended to less reliably than a short one. The separation between a static project file and a session-level anchor is attention engineering as much as organization.

Context Anchoring as the Session Tier

Context Anchoring fills the session-tier gap with a short Markdown document, maintained during the conversation, capturing four things: standing constraints (which may overlap with CLAUDE.md), decisions made with their rationale, open questions not yet resolved, and the current scope of work. The document is re-injected at significant decision points, pulling it back to the front of the context window and counteracting the middle-window attention problem.

The rationale section matters in a specific technical way. Consider two versions of the same constraint:

## Constraints
- No ORM

versus:

## Constraints
- No ORM
  - We are actively profiling specific queries; the abstraction
    layer obscures EXPLAIN output in ways that complicate optimization

The first version tells the model what to avoid in cases the rule explicitly covers. The second tells the model why, which allows it to apply the constraint correctly to adjacent cases no one anticipated: evaluating a library with a thin convenience layer over raw SQL, deciding whether a batch insert abstraction qualifies, choosing between query builders. The rationale generalizes where the explicit constraint cannot.

This matters more than it might appear because a model trained on public internet data has strong priors toward conventional choices. Your project’s unconventional decisions need explicit justification, or the model quietly reverts them when the literal constraint is silent on the adjacent case. The rationale does real work in every case the rule itself does not cover.

The “Open Questions” section addresses a failure mode that is easy to overlook. Without a place to record unresolved decisions, the model resolves them silently and embeds the resolution as implementation detail. “We have not decided on the pagination strategy” becomes cursor-based pagination in the next function without the choice ever surfacing. An open question recorded explicitly stays open, becoming a prompt for resolution rather than an invitation for unilateral implementation.

The ADR Connection and the Feedback Loop Difference

Context Anchoring is directly grounded in Architecture Decision Records, the format Michael Nygard described in 2011. ADRs capture a decision, its context, the alternatives considered, and the consequences. A context anchor is doing the same work at a finer time scale.

The key difference is the feedback loop. An ADR written for a decision made today may benefit the team months from now when a new engineer encounters the code and wonders why it is structured a particular way. The benefit is real but deferred, and deferred benefits are notoriously hard to maintain discipline around. A context anchor updated now produces a better model response within the hour. The person maintaining the document experiences the benefit immediately, in the same session, which is a fundamentally different incentive structure than anything formal ADR practice has historically offered.

This also means a well-maintained context anchor converts naturally into permanent documentation when a session ends. The rationale is already written. The alternatives ruled out are already listed. Promoting it to a formal ADR or PR description requires removing session-specific scope, not reconstructing reasoning from memory.

Maintaining It

The honest cost of the practice is discipline during the session. Someone must update the document at each decision point. The most reliable approach is to instruct the model to update the anchor before proceeding with implementation. The model makes the edit; you review for accuracy. This adds a few seconds per decision and saves a variable amount of re-explanation when the session drifts or a new one begins. The accounting becomes favorable quickly in sessions longer than an hour, where the alternative is spending the back half of the conversation reorienting the model toward constraints it agreed to early on.

The document should stay short by design. When a decision completes and is no longer relevant to the current session, move it to a permanent ADR and remove it from the anchor. A long anchor is a signal that accumulated context should have been archived.

Plain Markdown is the right format for reasons beyond simplicity. An anchor file committed to the repository travels with the code, not with a vendor’s account. It is readable by Cursor, Claude Code, Aider, Copilot, and any future tool without conversion. Human reviewers can read it. New collaborators can read it without knowing which AI assistant the author used. The context lives in the repository, which is where it needs to be when the next session, the next engineer, or the next tool needs to build on what was decided.

Was this interesting?