
From CLAUDE.md to Repo Maps: How AI Coding Tools Solve the Context Problem

Source: martinfowler

Martin Fowler’s piece on knowledge priming identifies something most AI coding users learn through repetition rather than intention: the quality of output is determined mostly by context provided before you type the first prompt, not by how that prompt is phrased.

That observation holds up well, but it leaves the more practical question open. The tools in the ecosystem have each taken architecturally distinct approaches to the same problem, and those differences shape what good practice actually looks like in each environment.

The Underlying Problem

When you send a question to a coding assistant without establishing context, the model works from whatever it learned during pretraining plus the tokens you just submitted. It does not know your project uses zod for validation but banned joi three years ago. It does not know your database connection lives in lib/db.ts and must never be instantiated directly in route handlers. It does not know you migrated off class components and consider any suggestion that includes one a regression.

So it produces code that is technically coherent and contextually wrong. You correct it. It adjusts. You correct it again. Eventually you get something usable, but you have spent several exchanges reconstructing context the model should have had from the start.

Fowler frames this as the difference between reactive and proactive context management. Reactive: you explain your codebase mid-conversation, repeatedly, one session at a time. Proactive: you explain it once, systematically, and it is available for every session automatically. The ROI comparison is straightforward once you think about it across a team rather than a single developer.

The Architectural Split

The tools have arrived at meaningfully different implementations of proactive context.

Claude Code uses CLAUDE.md, a markdown file in the repo root read automatically at session start. You can layer them: a global ~/.claude/CLAUDE.md for environment preferences, a repo-level file for project conventions, and per-directory files for module-specific rules. A practical repo-level file tends to look something like this:

```markdown
# payments-service

## Architecture
Monorepo. Services in /services, shared libs in /packages.
DB access only through /packages/db — never raw pg client imports elsewhere.

## Commands
- Build: `pnpm build`
- Test: `pnpm test --filter=payments-service`

## Conventions
- Use zod for all input validation
- Typed errors from /packages/errors only, caught at route level

## Do Not
- Do not use dotenv directly; use the config package
- Do not add new npm dependencies without updating this file
```

The explicit, human-readable format means the context is auditable and version-controlled alongside the code.

Cursor takes a different approach. .cursorrules (and the newer .cursor/rules/ directory with glob-scoped rules introduced in 0.43) provides the static layer, but Cursor also indexes the entire codebase with embeddings and does RAG retrieval at query time. When you ask a question, it pulls semantically relevant chunks from your codebase without requiring you to specify which files. You can also reference files explicitly with @filename or @Codebase. The tradeoff is that you do not always know exactly what the model is working from on any given response.
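A glob-scoped rule file under .cursor/rules/ might look roughly like this. The frontmatter field names follow Cursor's rule format at the time of writing (check current docs), and the paths and rules here are hypothetical:

```markdown
---
description: Database access conventions for service code
globs: services/**/*.ts
---

- All DB access goes through /packages/db; never import pg directly.
- Validate query inputs with the shared zod schemas.
```

Because the globs scope the rule to matching files, it is injected only when you work on service code, which keeps the static layer out of the token budget elsewhere.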

Aider is the most explicit of the group. You manually add files to context with /add and remove them with /drop, so you always know what the model can see. Its signature feature is the repo map: a compressed structural skeleton generated from the codebase showing file names, class names, and function signatures without bodies, included automatically in every prompt. Rather than retrieving code by similarity, it gives the model an architectural overview and lets the model determine what it needs. For codebases under a few hundred files, this approach often outperforms RAG outright.
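Aider's actual repo map is built with tree-sitter plus a graph-ranking pass over the whole repo; the core idea, though, can be approximated for a single Python file with nothing but the stdlib ast module. A sketch, not Aider's implementation:

```python
import ast


def repo_map(filename: str, source: str) -> str:
    """Emit a repo-map-style skeleton: names and signatures, no bodies."""
    tree = ast.parse(source)
    lines = [f"{filename}:"]
    # ast.walk flattens nesting, so methods appear alongside top-level
    # functions; a real repo map preserves the hierarchy.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"  def {node.name}({args})")
    return "\n".join(lines)
```

Run over every file and concatenated, a skeleton like this gives the model the architectural overview described above at a fraction of the token cost of full file contents.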

GitHub Copilot added .github/copilot-instructions.md in 2024, which slots into every Copilot Chat context in a repository. Beyond that, Copilot relies heavily on implicit context from open tabs and recently edited files. This is convenient but difficult to reason about precisely, which makes systematic coverage harder to verify.

Continue.dev offers the most configurable model. Context providers are a first-class concept: @codebase for embedding-based retrieval, @git diff, @terminal, @docs, and custom TypeScript providers you can write yourself. The configuration lives in config.json per project, making the full context strategy auditable. It also supports a reranker pass after initial retrieval to improve precision before tokens hit the context window.
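A minimal config.json sketch showing the context-provider concept. The provider names and the retrieval parameters match Continue's documented schema at the time of writing, but versions change, so treat the exact fields as an approximation:

```json
{
  "contextProviders": [
    {
      "name": "codebase",
      "params": { "nRetrieve": 25, "nFinal": 5, "useReranking": true }
    },
    { "name": "diff" },
    { "name": "terminal" },
    { "name": "docs" }
  ]
}
```

The nRetrieve/nFinal pair expresses the rerank pattern directly: retrieve broadly, then let the reranker cut the set down before it reaches the context window.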

Why RAG for Code Is Harder Than It Sounds

The tools betting on embedding-based retrieval are solving a genuinely difficult problem. Code relevance is structural, not just semantic. Two functions can share a domain (say, user authentication) without being contextually related for a specific task. The function that matters for your current change might be three hops away in the call graph from any superficially similar code.

Standard cosine similarity on dense embeddings struggles with this. Identifier names often carry the most signal, and exact string matching via BM25 regularly outperforms dense vector search for cases like “find the function named handleSessionExpiry.” Current best practice is hybrid retrieval: BM25 plus semantic embeddings, merged with a reranker. This is what Continue.dev supports and what several internal implementations approximate.
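One common way to merge the BM25 and dense rankings is reciprocal rank fusion. A minimal sketch; the file names and rankings below are hypothetical stand-ins for real retrieval results:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results: BM25 favors the exact-name match,
# dense retrieval favors semantically similar code.
bm25 = ["auth/session.ts", "auth/expiry.ts", "util/time.ts"]
dense = ["auth/expiry.ts", "auth/login.ts", "auth/session.ts"]
fused = reciprocal_rank_fusion([bm25, dense])
```

A chunk ranked well by both retrievers rises to the top even if neither ranked it first, which is exactly the behavior you want when lexical and semantic signals disagree.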

Tree-sitter parsing has become a common chunking strategy as well. Rather than splitting code at arbitrary token boundaries, you split at function or class boundaries, so retrieved chunks are semantically coherent units. The improvement in recall is significant enough that it has become near-standard in serious implementations.
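For Python sources the same boundary-aware chunking can be approximated without tree-sitter, using the stdlib ast module. A sketch of the idea, not any particular tool's implementation:

```python
import ast


def chunk_at_boundaries(source: str) -> list[str]:
    """Split source into chunks, one per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno delimit the full definition (Python 3.8+),
            # so each chunk is a complete, coherent unit.
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```

Each chunk embeds and retrieves as a whole definition, so a hit never returns half a function body with its signature cut off.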

Aider’s repo map sidesteps retrieval quality problems by taking the structural route. If every function signature is visible in the map, the model can identify which files contain relevant code and reason about what to examine. The cost is that the map occupies token budget even when large portions are irrelevant to the current task. For large repos this becomes expensive; for mid-size codebases it tends to work well.

SWE-bench results, which measure AI agents solving real GitHub issues, show a 40-60 percentage point difference in task completion rates between agents with good contextual coverage and those working from minimal context. Context quality matters more than context quantity; giving the model ten relevant functions outperforms giving it one hundred loosely related files.

Where You Put Things Matters

One aspect that receives less attention: position within the context window affects how well information is used. The “lost in the middle” paper from Stanford and UC Berkeley demonstrated that LLMs perform measurably worse on information placed in the middle of long contexts compared to the beginning or end. The effect is real enough to influence how you structure priming files.

For CLAUDE.md and similar files, this means the highest-value content belongs early. Prohibited patterns, unusual architectural choices, and non-obvious constraints are most effective at the top, not after lengthy architecture descriptions that push them toward the middle. The practical rule: if a constraint needs to be respected in almost every response, put it first.

This also argues against extremely long context files. A concise file where every item is high-signal, placed at the top of the injected context, performs better than an exhaustive catalog where critical rules are buried after pages of less important material.
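Applied to a context file, the ordering rule looks like this (a compressed illustration, not a template):

```markdown
## Do Not
- Never instantiate the db client outside /packages/db
- No class components

## Commands
- Build: `pnpm build`

## Architecture notes
Longer descriptive material goes last, where partial attention costs least.
```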

Context Files Need Maintenance

Fowler mentions the staleness problem briefly. It deserves more emphasis.

A CLAUDE.md that accurately described your architecture in January but has not been updated after three quarters of refactoring is not neutral. It actively misdirects the model, which will generate code consistent with the stated architecture rather than the actual one. Stale context is often worse than no context because it is confident and wrong.

This means context files need to be treated as code, not documentation. They should be reviewed in pull requests, updated when architectural decisions change, and owned by the team rather than maintained by whoever wrote the first version. Some teams add a checklist item to their definition of done: does this change require updating the context file? The overhead is small relative to the cost of every subsequent AI interaction producing slightly wrong suggestions.
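Part of that checklist item can be automated. A hedged sketch of a CI-style guard; the trigger paths are hypothetical, and a real version would feed it the output of git diff --name-only:

```python
# Paths that, when changed, suggest the context file should be reviewed too.
# These prefixes are illustrative; tailor them to your repo's layout.
TRIGGER_PREFIXES = ("services/", "packages/db/", "packages/errors/")
CONTEXT_FILE = "CLAUDE.md"


def needs_context_review(changed_files: list[str]) -> bool:
    """True if architecture-relevant paths changed but the context file did not."""
    touched_arch = any(f.startswith(TRIGGER_PREFIXES) for f in changed_files)
    touched_context = CONTEXT_FILE in changed_files
    return touched_arch and not touched_context
```

A check like this cannot verify the content is accurate, but it reliably surfaces the moments when drift is most likely to begin.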

The analogy to CI configuration is useful. You would not let your pipeline config drift from the actual project requirements. Context files belong in the same category of infrastructure artifacts.

What Actually Belongs in a Context File

Based on how the tools work mechanically, the highest-signal content includes:

Prohibited patterns. Things the model would otherwise generate that are wrong for your project. Explicit prohibitions cut correction loops faster than anything else. If you have banned a library, a pattern, or an approach, state it clearly.

Non-obvious conventions. Deviations from common practice. If you follow standard conventions the model already knows, specifying them adds noise. If you have made unusual choices, document those specifically.

Architectural entry points. The files or modules that matter most for understanding the codebase, not a complete listing. The load-bearing files that anyone modifying the system needs to understand.

Build and test commands. Practically useful and often gets the model to produce runnable suggestions without asking.

Dependency choices. Which library to use when multiple options exist. “Use zod not joi for validation” saves a back-and-forth.

What to leave out: style preferences enforced by the linter, conventions the model handles correctly by default, and granular rules that generate noise without adding constraint. The test is whether a capable new developer reading the file would learn something non-obvious about the codebase. If yes, it belongs. If it is standard practice, skip it.

The Convergence

Despite different implementations, the tooling ecosystem has arrived at the same core conclusion from different directions: the model needs project-specific context, and it needs it reliably, not reconstructed from scratch in every conversation. The choice between CLAUDE.md, RAG retrieval, structural repo maps, or explicit file pinning is secondary to whether the team actually maintains good context coverage.

The correction loop that frustrates most AI coding users is not primarily a model quality problem. It is a context problem. The model is capable enough; it just does not know your codebase. Treating context as infrastructure to maintain, rather than a one-time setup task, is what changes the calculus from tool novelty to sustained leverage.
