
The Institutional Knowledge Your Coding Agent Can't Absorb

Source: martinfowler

The Martin Fowler piece on context engineering from February 2026 catalogued the explosion of mechanisms available to configure a coding agent’s context: CLAUDE.md files, MCP server integrations, dynamic tool-based retrieval, context compaction strategies. All of that is real and worth understanding. But the hardest part of making a coding agent genuinely useful on a production codebase is not selecting the right retrieval architecture or configuring the right MCP servers. It is answering the question: what would a senior developer know that is not written down anywhere?

That question belongs to a research tradition older than AI coding tools. Organizational researchers have been studying how knowledge is distributed across teams for decades. Michael Polanyi’s framing remains accurate: we can know more than we can tell. A developer who has worked on a system for two years holds knowledge that does not live in the code, the documentation, or the commit history. It lives in their intuitions about which abstractions hold up under pressure, which components have surprising interactions, which conventions look optional but are actually load-bearing. New team members absorb this through code review, pairing sessions, incident retrospectives, and the slow accumulation of context that comes from reading git blame at the right moments.

Language models do not accumulate tacit knowledge through osmosis; they only know what has been made explicit. This creates a gap that no retrieval architecture fully closes, and CLAUDE.md (or whatever equivalent your tool of choice uses) is where you have to try to close it.

What High-Value Entries Look Like

The best entries in a project instruction file share a common structure: they state a constraint, identify the mechanism that makes it non-obvious, and include enough reasoning that the agent can generalize correctly to situations the rule writer did not anticipate.

Compare these two entries:

# Low value
Do not use the `pg` package directly.

# High value
Do not use the `pg` package directly. All database access goes through
`/packages/db`. This wrapper handles connection pool management centrally;
we found in the January incident that connection pools opened outside it
caused exhaustion that was impossible to diagnose across services.

The first version tells the agent what to do. The second tells it why, which matters when the agent encounters an edge case the rule does not cover. An agent that understands the reasoning can recognize when a situation is exceptional and flag it, rather than silently violating the constraint or refusing to proceed.
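The shape of the wrapper the high-value entry defends can be sketched in a few lines. This is an illustrative reconstruction, not the real `/packages/db`: the `Pool` stub below stands in for the `pg` pool so the pattern is self-contained, and the point is only that the module owns exactly one pool and callers never construct their own.

```typescript
type Row = Record<string, unknown>;

// Stub standing in for pg.Pool; a real pool would talk to Postgres.
class Pool {
  static instances = 0;
  constructor() { Pool.instances++; }
  async query(_sql: string, _params: unknown[] = []): Promise<Row[]> {
    return [];
  }
}

// The module holds the only pool in the process; this is the invariant
// the CLAUDE.md entry exists to protect.
let sharedPool: Pool | null = null;

function getPool(): Pool {
  if (sharedPool === null) sharedPool = new Pool();
  return sharedPool;
}

// Exported as `query` in the real module; every service imports this
// instead of opening its own connection pool.
async function query(sql: string, params: unknown[] = []): Promise<Row[]> {
  return getPool().query(sql, params);
}
```

An agent that has only the prohibition knows to avoid `pg`; an agent that has the reasoning can also recognize that a new `Pool` constructed anywhere else violates the same invariant, even if `pg` never appears by name.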

This pattern draws from architecture decision records, documents that capture a decision alongside the context that produced it, the alternatives considered, and the reasoning behind the choice. The value of a good ADR comes from the context that makes the decision legible over time, not from the decision statement alone. The same applies here. A constraint without context produces compliance without comprehension.

Prohibitions are the highest-leverage category. A list of things the agent must never do, each entry grounded in the incident or principle behind it, provides more value per token than an equivalent length of positive guidance about preferred patterns. Existing code usually signals preferred patterns clearly; prohibitions leave no trace the model can read, because the absence of a pattern in a codebase looks identical to its irrelevance.
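A prohibitions section in this spirit might look like the following. The entries and incidents here are invented for illustration; the structure — constraint, mechanism, grounding — is the point.

```markdown
<!-- Illustrative entries, not from a real project. -->
## Never do these

- Never use the `pg` package directly. All database access goes through
  `/packages/db`, which centralizes connection pool management. Pools
  opened outside it caused the cross-service exhaustion in the January
  incident.
- Never catch and swallow errors in job handlers. The queue retries on
  throw; a swallowed error silently drops work. This is why the retry
  dashboard exists.
- Never add a new environment variable without registering it in the
  config schema. Unregistered variables are invisible to the deploy
  validation step and fail only at runtime.
```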

What Low-Value Entries Actually Cost

A common mistake is writing CLAUDE.md like documentation: comprehensive, organized, exhaustive. The impulse is understandable. The effect is counterproductive.

Low-value entries include anything the model can infer from reading the code. If every file in the repository is TypeScript, “use TypeScript for all new files” adds noise without information. If the project uses Prettier with a committed configuration, “format code with Prettier” is inferrable from the configuration file. If every test file follows a naming convention visible in the directory structure, documenting the convention is redundant. These entries do not help the agent; they give it more material to process before reaching the instructions that matter.

The cost is not zero. The “lost in the middle” research from Stanford and UC Berkeley established that language models attend less reliably to information placed mid-context than to information near the beginning or end. In a long CLAUDE.md, a critical constraint buried after several pages of inferrable convention may receive less reliable attention than the same constraint in a shorter file where it sits near the top. Comprehensive files often dilute the signal and bury what matters.

The practical test for any entry: would a competent developer inferring conventions from the code reach the wrong conclusion without this? If yes, the entry earns its place. If no, it is noise.

The Organizational Forcing Function

Writing a good CLAUDE.md is harder than it appears, and that difficulty is diagnostic.

Teams that sit down to write project instruction files frequently discover that they cannot agree on what the current conventions are. The linting rules say one thing; the existing code reflects a pattern from two refactors ago. Two senior engineers hold different intuitions about the preferred abstraction for a shared concern. The prohibited pattern has legitimate exceptions that no one has ever formalized.

This difficulty does not indicate a problem with the format or the tooling. It reveals something about how the team holds knowledge. Uncodified conventions are implicit contracts: everyone roughly follows them, but no one has verified that everyone’s understanding matches. When a new team member deviates from a convention, a code review comment surfaces the disagreement. When an agent deviates, the deviation shows up as generated code that passes tests but violates maintainer expectations.

METR’s early 2026 analysis of SWE-bench patches documented exactly this gap: many patches that passed automated test suites would have been rejected by actual project maintainers, because tests capture functional behavior but not architectural philosophy, idiom preference, or the reasoning behind structural decisions that accumulated over years. The context engineering stack cannot fully bridge that gap, but writing explicit context files at least forces the team to confront how much of their shared understanding is unwritten.

The process of writing CLAUDE.md forces answers to questions that implicit conventions defer: the canonical approach, the legitimate exceptions, the behavior at the boundary of the rule. These questions need to be answered at some point. Context files for coding agents surface the need sooner than it might otherwise become visible, which tends to benefit the team independent of any particular AI tooling.

Maintenance as Engineering Discipline

The most common failure mode for project instruction files is stale content. An entry added when the team used one library persists after a migration to another. A convention documented for one architectural era survives into the next. A prohibition established after an incident becomes gradually less meaningful as the team changes and the incident recedes.

Stale entries are worse than missing entries. A missing entry leaves the agent to infer, and inference from existing code is often correct. A stale entry actively misdirects. An agent told to use a library the team replaced three months ago will generate code that fails to build, and the error is instruction-level rather than logic-level, which makes it harder to diagnose.

CLAUDE.md files need the same discipline as production configuration: reviewed when architectural decisions change, pruned when the conditions that produced an entry no longer hold, committed with meaningful messages explaining why entries were added or removed, and verified periodically by running the agent on representative tasks to confirm that stated constraints are respected in practice.

This last point matters more than it might seem. A CLAUDE.md file that has never been tested is documentation by another name. Running the agent on a task that should trigger a constraint, and observing whether the constraint actually holds, is the difference between a policy and an aspiration. The file should fail in observable ways when it goes stale, which means building in enough coverage of stated rules to notice when something has drifted.
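One way to make a stated constraint fail observably is to encode it as a check that runs over agent-generated changes. The sketch below enforces the `pg` example from earlier in this piece; the paths, the regex, and the function name are assumptions to adapt, not a prescribed tool. In practice it would read files from a diff; here it takes a path-to-source map so the logic stands alone.

```typescript
// Matches a direct dependency on the pg package, in either module style.
const DIRECT_PG = /from\s+['"]pg['"]|require\(\s*['"]pg['"]\s*\)/;

// Returns the paths that import pg directly from outside packages/db —
// i.e., the files that violate the stated CLAUDE.md constraint.
function findViolations(files: Record<string, string>): string[] {
  return Object.entries(files)
    .filter(([path]) => !path.startsWith("packages/db/"))
    .filter(([, source]) => DIRECT_PG.test(source))
    .map(([path]) => path);
}
```

Wired into CI over the agent's output, a check like this turns the constraint from an aspiration into a policy: when the rule goes stale, or the agent ignores it, the build fails instead of the violation drifting into the codebase.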

The Gap That Remains

Even a well-maintained, incident-grounded, concise CLAUDE.md does not fully solve the tacit knowledge problem. There is a category of institutional knowledge that resists encoding in any rule: the reasoning behind architectural choices made under constraints that no longer exist, the judgment calls that reflect a philosophy about how the codebase should evolve, the shared understanding that comes from having worked through a difficult refactor together.

The Fowler article frames the expansion of context engineering options as an emerging frontier. That framing is accurate. But the foundational constraint beneath all the tooling is the same one that has always applied to software teams: knowledge that is not externalized is not reliably shared. Context engineering for coding agents makes this concrete in a way that creates pressure to address it. A codebase that has never had a strong forcing function for surfacing institutional knowledge now has one, and the teams that treat that seriously tend to find value in directions they did not originally anticipate.
