
The Two-Tier Context Architecture That Every AI Coding Tool Independently Built

Source: martinfowler

Rahul Garg’s context anchoring, published as part of Martin Fowler’s series on reducing friction in AI-assisted development, describes a practice: externalize the decisions from your AI session into a living document, maintain it explicitly, and re-inject it at the start of each new session. The technique is sound and the explanation is clear.

What the article does not say, but what becomes obvious once you look at how the major AI coding tools are built, is that this pattern has already been implemented at the project level by every significant tool in the space. Claude Code, Cursor, GitHub Copilot, and Aider each shipped a mechanism for injecting stable project context before any user message. None of them coordinated this. They each arrived at the same architecture.

Context anchoring formalizes the second tier, the one all of those tools left developers to figure out on their own.

The Static Tier

Every major AI coding tool includes a mechanism for project-level context injection:

  • Claude Code reads CLAUDE.md at the repository root (and optionally per-directory) before any session begins
  • Cursor reads .cursorrules (superseded by project rules in .cursor/rules) and makes it available to every query
  • GitHub Copilot reads .github/copilot-instructions.md and appends it to every request
  • Aider generates a repo map: a structural skeleton of the codebase showing signatures without bodies, supplemented by any custom conventions file

The content differs across tools, but the structure is identical. Project invariants, coding standards, dependency constraints, naming conventions, and things the model must never do are written once to a file that travels at the front of every session. The file serves as a persistent, high-attention prefix.

That placement matters mechanically. The Lost in the Middle paper from Liu et al. (2023) demonstrated empirically that language model recall degrades for content placed in the middle of long contexts: models attend most reliably to content near the start and end of the context window. Content at the very beginning of a session, before any user turns accumulate, sits in the highest-attention position that exists. Every tool puts its persistent context there.
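The assembly step is simple to picture. The sketch below shows one way a tool might compose its request so the static file occupies that front position; the file name and message shape are illustrative, not any tool's actual internals:

```python
from pathlib import Path


def build_messages(history: list[dict], static_file: str = "CLAUDE.md") -> list[dict]:
    """Prepend the static tier so it occupies the highest-attention position."""
    messages = []
    path = Path(static_file)
    if path.exists():
        # The project file is read once and placed first,
        # before any conversational turns.
        messages.append({"role": "system", "content": path.read_text()})
    messages.extend(history)  # user/assistant turns follow the stable prefix
    return messages
```

Whatever the internal details, the invariant across tools is the ordering: project context first, conversation after.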

A well-designed CLAUDE.md for a Go service might look like this:

# Project Context

## Build and Test

go build ./...
go test ./...


## Conventions
- Error wrapping: `fmt.Errorf("context: %w", err)` always
- Database access through the repository layer only; no direct queries in handlers
- Use `pgx/v5` directly; no ORM

## Hard Constraints
- Do not add new third-party dependencies without explicit instruction
- Generated files in `internal/gen/` are never edited manually
- `internal/auth/` requires a security review before changes

This is the static tier. It encodes the invariants of the project, the things that are true across every session, every developer, every task. It gets committed to version control, treated as code, and reviewed when it changes.

The Gap the Tools Left Open

The static tier handles project invariants, but sessions are not just about invariants. They accumulate decisions. You specify that the current feature excludes email verification. You agree to use bearer tokens for this particular endpoint. You defer rate limiting to a later sprint. These decisions are real and consequential within the session, but they are not project invariants. They belong to this task, this week, this scope.

None of the tools handle this automatically. Claude Code does not maintain a session log of decisions. Cursor does not track what you scoped out. Copilot does not record what you deferred. At the end of a session, or after sixty messages within one, those decisions exist only in the conversation history, spread across thousands of tokens at varying distances from the current response.

This is the dynamic tier, and it is what context anchoring formalizes. Garg’s living document is a manual, session-level implementation of what CLAUDE.md provides at the project level.

A minimal session anchor carries the structure the static tier already established, applied to the current task:

# Session Context
_Updated: 2026-03-17_

## Current Task
User registration endpoint. Returns 201 on success, 409 on duplicate email.

## Active Constraints
- No email verification this sprint; deferred to issue #47
- Auth method: bearer tokens, not session cookies
- Response format matches the existing `/users/{id}` shape

## Deferred
- Rate limiting (#52)
- OAuth integration (post-launch)

## Done This Session
- [x] User model and schema migration
- [x] Password hashing utility (bcrypt, cost 12)

The document is not long. It does not summarize the conversation. It records the decisions and their scope boundaries so that the model can attend to them at a consistent position rather than fishing for them in the middle of a long context.
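Maintaining the anchor explicitly is the part that takes discipline, and it is mechanical enough to script. The sketch below appends a decision under a named section of an anchor laid out like the example above and refreshes the date stamp; `record_decision` and the file layout are illustrative assumptions, not part of any tool:

```python
import re
from datetime import date
from pathlib import Path


def record_decision(anchor_path: str, section: str, entry: str) -> None:
    """Append a bullet under the given '## section' and refresh the date stamp."""
    text = Path(anchor_path).read_text()
    # Refresh the _Updated: stamp so staleness is visible at a glance.
    text = re.sub(r"_Updated: \d{4}-\d{2}-\d{2}_",
                  f"_Updated: {date.today().isoformat()}_", text)
    lines = text.splitlines()
    out, in_section = [], False
    for line in lines:
        if in_section and line.startswith("## "):
            out.append(f"- {entry}")  # insert before the next heading
            in_section = False
        if line.strip() == f"## {section}":
            in_section = True
        out.append(line)
    if in_section:  # the named section was last in the file
        out.append(f"- {entry}")
    Path(anchor_path).write_text("\n".join(out) + "\n")
```

The point is not the tooling; it is that the update happens at decision time, not reconstructed from memory at the end of the session.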

Designing Each Tier

The convergence across tools is informative, but the practical question is what goes in which tier. The failure mode of conflating the two is common: project invariants end up mixed with session-specific state in the same document, the document grows, the signal-to-noise ratio drops, and the anchoring stops working.

A useful heuristic is temporal stability. If a constraint has been true across three consecutive sessions without modification, it belongs in the static tier: CLAUDE.md or its equivalent. If it has held for a quarter unchanged and represents a significant architectural choice, it belongs in a proper Architecture Decision Record. The session anchor is for decisions that are real but transient: decisions made today for this task that may not survive the next sprint.
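The heuristic is concrete enough to express as a routing rule. This is a sketch under assumed thresholds (three sessions, one quarter); the `Constraint` record and its fields are hypothetical bookkeeping, which in practice might live in the anchor's git history rather than a data structure:

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    text: str
    sessions_unchanged: int  # consecutive sessions held without edits
    days_unchanged: int      # age of the constraint in its current form
    architectural: bool = False


def tier_for(c: Constraint) -> str:
    """Route a constraint to the tier its temporal stability suggests."""
    if c.days_unchanged >= 90 and c.architectural:
        return "adr"      # a quarter unchanged and significant: write it up
    if c.sessions_unchanged >= 3:
        return "static"   # promote to CLAUDE.md or its equivalent
    return "session"      # keep in the session anchor for now
```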

Another design consideration is the token budget. A 200,000-token context window sounds generous until you account for the actual composition of a working session. A 400-line TypeScript file costs roughly 8,000 tokens. Twenty files at 300 lines each add up to around 120,000 tokens. A multi-file feature with tests can consume 25,000 to 40,000 tokens of active context. The session anchor competes for space with the actual work. Every redundant entry in the anchor is a token that could be spent on the code.
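The arithmetic is worth making explicit. A rough heuristic is around 20 tokens per line of source code (roughly four characters per token on lines of typical length); the sketch below is an estimate, not tokenizer output, and a real tokenizer should be used where precision matters:

```python
TOKENS_PER_LINE = 20  # rough average for source code; varies by language


def estimate_tokens(lines_of_code: int) -> int:
    """Back-of-the-envelope token cost of a source file."""
    return lines_of_code * TOKENS_PER_LINE


def remaining_budget(window: int, file_line_counts: list[int],
                     anchor_tokens: int) -> int:
    """Context left for the model's actual work after files and anchor."""
    used = sum(estimate_tokens(n) for n in file_line_counts) + anchor_tokens
    return window - used
```

Twenty 300-line files plus a 1,000-token anchor leave well under half of a 200,000-token window before the conversation itself has consumed anything.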

Density in the anchor is a virtue. Record the decision and its scope, not the reasoning that led to it. The reasoning belongs elsewhere, either in comments, commit messages, or an ADR if it warrants that level of permanence. An anchor entry like:

- Auth: bearer tokens, no session cookies (aligns with existing API contract)

is sufficient. A paragraph explaining the session in which that decision was reached adds length without improving the model’s ability to apply the constraint.

The Scaling Path

Manual context anchoring is the right starting point, but it is a point on a spectrum. The living document is, structurally, a manually curated retrieval system. You are selecting the relevant context from your project’s decision history and prepending it to each session. This is exactly what retrieval-augmented generation does, except that RAG does it automatically from a larger corpus.

For a solo developer working on a single service, a living document is sufficient. The decision corpus is small enough to curate by hand, and the overhead of building a retrieval system outweighs the benefit. For a larger team accumulating dozens of architectural decisions across multiple services, the manual approach stops scaling. A retrieval index over the decision corpus, fetching relevant fragments before each query rather than requiring the developer to preselect them, is the natural evolution.
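The retrieval step itself has a simple shape. The sketch below uses plain keyword overlap as a stand-in for the embedding similarity a production system would use; the decision fragments and the function are illustrative:

```python
def retrieve(query: str, decisions: list[str], k: int = 3) -> list[str]:
    """Return the k decision fragments sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(decisions,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]
```

Whatever the scoring function, the output is the same artifact the living document provides by hand: a short, relevant prefix assembled before the session starts.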

Beyond that, long-running agent sessions that exceed what any single context window can reliably hold benefit from a different structural response: subagent decomposition. Tools like Claude Code’s Task tool and OpenAI’s Agent.as_tool() (released with the Agents SDK in March 2025) allow breaking complex tasks into subtasks, each with its own fresh context window. The anchoring problem does not go away, but it resets at the start of each subtask rather than accumulating indefinitely. The parent agent collects task outcomes rather than raw transcripts, keeping the coordination layer compact.
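The coordination shape is easy to sketch: the parent hands each subtask a fresh start and keeps only the outcome. Here `run_subtask` is a hypothetical stand-in for a real subagent invocation, such as a Task-tool call:

```python
from typing import Callable


def decompose(subtasks: list[str],
              run_subtask: Callable[[str], str]) -> list[str]:
    """Run each subtask in isolation and collect compact outcomes."""
    outcomes = []
    for task in subtasks:
        # Each invocation starts from a fresh context window: only the
        # task description goes in, only the outcome summary comes out.
        outcomes.append(run_subtask(task))
    return outcomes  # the parent keeps summaries, not raw transcripts
```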

What the Convergence Tells You

The fact that four independently developed tools arrived at the same architecture, without coordinating, provides reasonable confidence that the architecture reflects something real about the problem. The problem is not tool-specific; it is a consequence of how transformer attention distributes over long sequences. The solution, in each case, is to externalize state that the model cannot reliably maintain internally and inject it at a predictable, high-attention position.

Context anchoring at the session level is the same operation at a finer granularity. The tools handle the static tier automatically; the dynamic tier requires the developer to implement the same pattern manually. A session anchor is not a different kind of artifact from a CLAUDE.md file. It is the same idea applied to a shorter timescale, covering the decisions that are too transient to commit and too important to leave distributed across a conversation that the model is increasingly failing to attend to with full weight.
