
What the Model Actually Does with Your CLAUDE.md

Source: martinfowler

Martin Fowler’s site recently published Rahul Garg’s piece on knowledge priming, which makes the case for front-loading project-specific context before asking an AI coding assistant to generate anything. The argument is sound and the practice is correct. What the article does not cover in depth is the underlying reason it works, which turns out to matter quite a bit for how you actually structure these files. If you understand the mechanics, you can make deliberate choices. If you do not, you will write context files that feel complete and still leave the model underperforming.

Why Context Files Work at the Model Level

An LLM processes your entire context window in a single forward pass. When Claude Code starts a session and reads your CLAUDE.md, those tokens are not stored in a separate “instructions” register with privileged access. They are flattened into the same sequence as your actual prompt, the conversation history, and any code the tool has retrieved. The model attends across all of it simultaneously.

This has a concrete implication. The model’s behavior is shaped by the statistical relationship between everything in context, not by a rule engine that applies your stated constraints with guaranteed fidelity. When your CLAUDE.md says “do not use the pg package directly,” the model is not parsing a prohibition and storing it in a blacklist. It is updating its prediction of what a competent developer working on your project would write. If the rest of the context window contains enough evidence that pg imports are normal for your codebase, that prohibition will compete with that evidence and sometimes lose.
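
The flattening can be pictured with a toy sketch. The structure and names below are hypothetical, not Claude Code's actual internals; the point is only that the priming file, conversation history, and prompt become one undifferentiated sequence before the forward pass.

```python
# Toy illustration: how a session's context reaches the model.
# Everything is concatenated into a single sequence -- there is no
# separate "instructions" register that grants CLAUDE.md tokens
# privileged enforcement.

def assemble_context(claude_md: str, history: list[str], prompt: str) -> str:
    # The priming file is prepended, but after assembly it is just
    # more tokens among all the others.
    parts = [claude_md] + history + [prompt]
    return "\n\n".join(parts)

context = assemble_context(
    claude_md="Do not use the pg package directly.",
    history=["User: fix the login bug", "Assistant: reading auth.ts ..."],
    prompt="Add a query to fetch the user's sessions.",
)

# The model attends over this single flat string; the prohibition
# competes statistically with everything else in it.
assert context.startswith("Do not use the pg package directly.")
```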

This is why reasoning matters in a context file, not just rules. A rule says “do not do X.” Reasoning says “do not do X because Y,” which gives the model a generative principle it can apply to situations the rule did not anticipate. The model predicts what a competent developer would write; a developer who understands the constraint and its rationale makes better decisions at the edges than one who only knows the rule. The context file should provide the same advantage.

The Context Window Is Not Uniform

Transformer attention is theoretically capable of attending to any position in the context with equal weight, but empirical research shows that is not how it behaves in practice. The Lost in the Middle paper from Stanford and UC Berkeley measured performance on tasks requiring recall of information placed at different positions in long contexts. Performance was highest for information near the beginning and end of the context, and measurably lower for information placed in the middle.

The mechanism is not definitively established, but the pattern is consistent enough to be practically significant. For a CLAUDE.md injected at session start, the file’s contents will appear early in the context window, which is favorable. But within the file itself, the same positional gradient applies. A critical prohibition buried after three paragraphs of architecture overview occupies a worse position than if it had been placed first.

The practical implication: whatever constraint needs to be respected in nearly every response belongs at the top of the file. Not after the project overview. Not after the tech stack listing. First. Architectural context belongs after constraints, not before them.
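
A file ordered to respect that gradient might look like this. The section names and entries are illustrative, not a prescribed template:

```markdown
## Critical Constraints
<!-- first: needed in nearly every response -->
- Do not instantiate a database connection outside /packages/db.
- All HTTP requests go through the wrapper in /lib/http.ts.

## Architecture
<!-- second: shapes design decisions, with reasons -->
Brief module boundaries and data flow.

## Project Overview
<!-- last: lowest-stakes background -->
Stack listing and conventions the model could not infer from the code.
```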

The Signal-to-Noise Problem

There is a mistake that developers who understand the value of priming reliably make: they write too much. The logic is intuitive. More context means the model knows more about your project, which means better output. But the model is allocating attention across the entire context window, and a longer priming file does not automatically mean more useful priming.

Every token in your context file competes with every other token for model attention. A file that contains a hundred entries, sixty of which describe conventions the model already handles correctly from training data, is burying the forty entries that actually matter. You have added noise (content the model already knew), and that noise dilutes the signal of the entries that represent genuine, non-obvious constraints.

The test for any candidate entry is whether a capable model working from standard training data would get it wrong without that entry. If the answer is no, the entry is noise. “Use TypeScript for all new files” is noise if the codebase already has a tsconfig.json and every file is .ts. The model will infer this. Writing it down does not change behavior; it just adds tokens that displace something more useful.

High-signal entries look like this:

## Critical Constraints

Do not instantiate a database connection outside /packages/db.
All DB access goes through the pool manager in that package.
This was enforced after a connection pool exhaustion incident in 2024
where connections opened directly from route handlers could not be
diagnosed or limited without a centralized pool.

Do not use node-fetch or axios for HTTP requests. The project
uses the built-in fetch with the wrapper in /lib/http.ts, which
handles retry logic, timeout configuration, and request tracing
uniformly. Inconsistent HTTP clients caused tracing gaps that made
a production outage take four hours to diagnose instead of one.

Low-signal entries look like this:

## Style Guidelines

Use meaningful variable names.
Keep functions focused on a single responsibility.
Write clear comments for complex logic.
Follow the principle of least privilege for permissions.

The low-signal version describes practices every competent developer follows and that the model will apply without being told. These entries consume budget without changing behavior.

Token Budgeting Is a Real Constraint

Context windows have grown substantially. Claude’s context is now measured in hundreds of thousands of tokens, and GitHub Copilot works within its own context budget when assembling what to send the model. This might suggest that token efficiency in your priming file is not a meaningful concern.

The reality is more nuanced. A very long CLAUDE.md competes with the actual code the model needs to see to do useful work. When Claude Code is investigating a bug or generating a non-trivial feature, it needs to read relevant source files, understand related modules, and hold the task context simultaneously. Every token consumed by priming content is a token not available for that code.

The tradeoff becomes visible on larger codebases with complex tasks. A 10,000-token CLAUDE.md on a 200,000-token context budget looks manageable until you add in the retrieved source files, the conversation history, and the output space the model needs. On a task that requires reading a dozen non-trivial files, the priming file’s share of the budget is no longer negligible. The right size for a context file is the size where every entry changes behavior on some nontrivial class of prompts. Garg’s article recommends systematic priming, which is correct, but the discipline should include pruning, not just addition.
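
Rough arithmetic for the scenario above makes the tradeoff concrete. All numbers are illustrative:

```python
# Budget arithmetic for a large task in a 200k-token context window.
context_budget = 200_000    # total context window, tokens
priming_file   =  10_000    # CLAUDE.md
source_files   = 12 * 8_000 # a dozen non-trivial files at ~8k tokens each
conversation   =  20_000    # accumulated history
output_reserve =   8_000    # space the model needs to respond

used = priming_file + source_files + conversation + output_reserve
remaining = context_budget - used

print(f"priming share of budget: {priming_file / context_budget:.0%}")  # 5%
print(f"tokens left for further reading: {remaining:,}")                # 66,000
```

Five percent of the raw window sounds negligible; measured against the 66,000 tokens actually left free after the task's own needs, the priming file is a much larger fraction of the headroom.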

RAG vs. Static Priming

Cursor and Continue.dev take a different architectural approach to the same problem. Rather than requiring developers to maintain a static context file, they use embedding-based retrieval to pull relevant chunks of the codebase into context at query time. A developer asking about authentication code gets authentication-related modules retrieved automatically. The context is dynamic rather than declared.

This approach has real advantages. It does not require manual maintenance, it scales with codebase size without growing a static file, and it can surface code the developer did not think to reference explicitly. For general-purpose contextual awareness of what already exists in the codebase, RAG-based retrieval is often more complete than any static file a developer would write.

The limitations are specific but significant. RAG retrieval for code is harder than it looks because code relevance is structural, not just semantic. Two functions can share a domain without being relevant to a particular task. The function you need might be three hops away in the call graph from anything superficially similar to your query. Standard cosine similarity on dense embeddings handles semantic similarity reasonably well and structural dependency poorly; hybrid retrieval combining BM25 lexical search with dense embeddings, plus a reranker pass, is closer to production quality, but it still cannot find what is not in the codebase.
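
The hybrid scoring idea can be sketched in a few lines. This is a toy: the lexical score is a crude token-overlap stand-in for BM25, and the "embeddings" would come from a real model in practice, with a reranker pass on top.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Dense similarity over embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical(query: str, doc: str) -> float:
    # Crude stand-in for BM25: fraction of query tokens present in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query: str, doc_text: str,
                 query_vec: list[float], doc_vec: list[float],
                 alpha: float = 0.5) -> float:
    # Linear interpolation of lexical and dense scores. Neither
    # component sees the call graph, which is why structural
    # relevance still slips through.
    return alpha * lexical(query, doc_text) + (1 - alpha) * cosine(query_vec, doc_vec)
```

Even this combination only ranks what exists as text in the repository, which sets up the deeper limitation below.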

More critically, RAG retrieval does not capture the things most valuable in a priming file: prohibitions, incident history, the reasons a library was abandoned, architectural decisions not visible in the current code. These are not in the codebase. No retrieval system can surface them because they are not stored anywhere retrieval can reach.

The practical answer is that static priming and RAG retrieval are complementary rather than alternatives. The static priming file captures organizational context: decisions, prohibitions, reasoning, non-obvious constraints. RAG retrieval captures structural context: which code is relevant to the current task. Claude Code uses static priming via CLAUDE.md. Cursor uses both .cursorrules (or the newer .cursor/rules/ directory with glob-scoped rules introduced in 0.43) for static constraints and embedding retrieval for codebase search. Continue.dev makes the combination explicit through its context provider configuration in config.json. GitHub Copilot reads .github/copilot-instructions.md for static instructions and uses open-tab context for implicit retrieval. The tools are converging on a hybrid model because neither approach alone is sufficient.

The Staleness Problem Is Worse Than It Looks

Garg’s article notes that priming files need to be maintained as the codebase evolves. The severity of this failure mode is worth examining carefully, because stale context is not neutral.

When a model receives a context file containing a stated constraint, it treats that statement as authoritative over what it might otherwise infer from the code. The stated constraint competes with and often overrides the model’s interpretation of what it actually sees in the files. This is normally a feature: you want the model to follow your explicit conventions even when they deviate from common practice. But it becomes a liability when the stated constraint describes a system that no longer exists.

A CLAUDE.md that describes your 2023 module boundaries, written by a developer who left the team in mid-2024 and not updated through two subsequent architectural refactors, is not neutral documentation. It actively misrepresents the system. The model will generate code consistent with the stated architecture, which may conflict with the actual code at every module boundary it touches. You then need to correct it, and because the stated constraint is explicit, the model may produce inconsistent output as it tries to reconcile what the file says with what the code shows.

Stale context is often harder to debug than no context, because the problem does not look like missing information. It looks like the model being confident and specific about something wrong. The correction loops are also resistant to resolution in-session: you can tell the model the architecture has changed, but if your CLAUDE.md is still read at the start of every subsequent session, the problem recurs.

The structural fix is ownership and explicit triggers for updates. A change that refactors a module boundary should include a CLAUDE.md update in its definition of done. An architectural decision that deprecates a library should be accompanied by removal of any context file entry still recommending it. The .github/copilot-instructions.md, the .cursorrules, and the CLAUDE.md all need the same treatment as the CI configuration: reviewed in the pull request that makes them obsolete, not later.
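
One way to wire that definition of done into CI is a check that flags structural changes unaccompanied by a context-file update. A minimal sketch, where the trigger paths and file names are assumptions for illustration, not a standard:

```python
# CI-style check: if a change touches files that tend to invalidate
# the priming file, require that a priming file changed in the same PR.

CONTEXT_FILES = {"CLAUDE.md", ".cursorrules", ".github/copilot-instructions.md"}
TRIGGER_PREFIXES = ("packages/", "lib/")  # where module boundaries live

def needs_context_update(changed_paths: list[str]) -> bool:
    touches_structure = any(p.startswith(TRIGGER_PREFIXES) for p in changed_paths)
    updates_context = any(p in CONTEXT_FILES for p in changed_paths)
    return touches_structure and not updates_context

# Flags: a boundary moved, CLAUDE.md untouched.
assert needs_context_update(["packages/db/pool.ts"])
# Passes: the refactor shipped with its context update.
assert not needs_context_update(["packages/db/pool.ts", "CLAUDE.md"])
```

A check like this does not guarantee the update is correct, but it makes forgetting it a visible failure instead of a silent drift.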

Putting It Together

A well-maintained priming file is short, high-signal, reason-annotated, and positioned with the most critical constraints first. It describes things the model cannot infer from the code, not things it can.

Here is what a high-signal .cursorrules looks like against a low-signal version covering the same project:

// LOW SIGNAL: verbose, redundant, no reasoning

This is a TypeScript React application. Use functional components.
Follow React best practices. Use hooks for state management.
Keep components small and focused. Use CSS modules for styling.
Write unit tests for all new components. Use async/await for
asynchronous operations. Handle errors appropriately.

// HIGH SIGNAL: specific, non-obvious, annotated

Stack: Next.js 14 app router, React Query v5, Zod, Prisma.

Critical:
- Server components by default. Only add 'use client' for browser
  APIs or event handlers. We had a bundle size regression from
  overusing client components; this is enforced in code review.
- All API calls go through /lib/api.ts, never raw fetch in components.
  This ensures request deduplication and error boundary compatibility.
- Zod schemas from /lib/schemas for all form validation.
  Do not write inline validation logic.

Not enforced by config but required by team convention:
- Mutations use React Query useMutation with optimistic updates.
  Server actions are not used; the team decided against them in ADR-012
  due to caching complexity with our CDN setup.

The low-signal version describes generic React conventions the model knows well. The high-signal version tells the model things it could not know: that server actions are specifically off the table, that there is a designated API module, that the bundle size incident shaped component boundaries. Every entry changes behavior on some class of prompt that the model would otherwise get wrong.

The knowledge priming practice Garg describes is worth adopting seriously. Understanding why it works at the model level, where the budget constraints are real, and where the failure modes live means you can build context files that actually perform rather than ones that feel thorough but deliver noise.
