
How Coding Agents Learn a Codebase Without Reading Every File

Source: simonwillison

The fundamental challenge for any coding agent operating on a real codebase is not intelligence but orientation. A model with 128,000 tokens of context can hold roughly 10-15 large source files in working memory at once. A production codebase has thousands. The agent must figure out which files matter for the current task before it reads them, which requires knowing something about the codebase before it has read any of it.
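The arithmetic behind that estimate is easy to check. The sketch below only divides the figures quoted above; the helper names and the reserved-token parameter are illustrative assumptions, not part of any agent’s API.

```python
# Back-of-envelope budget check using the figures quoted in the text.
CONTEXT_TOKENS = 128_000


def tokens_per_file(context_tokens: int, files_in_context: int) -> int:
    """Average token budget per file if the window held nothing but those files."""
    return context_tokens // files_in_context


def files_that_fit(context_tokens: int, avg_file_tokens: int, reserved: int = 0) -> int:
    """How many files of a given size fit after reserving tokens for other uses.

    `reserved` is a hypothetical allowance for system prompt and conversation history.
    """
    return max(0, (context_tokens - reserved) // avg_file_tokens)


# "10-15 large source files" implies files of roughly this many tokens each.
low = tokens_per_file(CONTEXT_TOKENS, 15)
high = tokens_per_file(CONTEXT_TOKENS, 10)
```

Any tokens set aside for the system prompt and conversation history shrink the count further, which is why the number of files an agent can actually hold is usually lower than the raw division suggests.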

Simon Willison’s guide on how coding agents work covers the four main navigation strategies: iterative grep-and-glob, repository maps, embedding-based retrieval, and LSP queries. The repository map approach, pioneered by Aider, is worth understanding in detail because it solves the orientation problem in a way that reveals something general about how to represent large artifacts efficiently in a context window.

The Cost of Iterative Navigation

The default approach in agents like Claude Code is iterative: start from an anchor, trace outward through imports and references, read each file you find, repeat. Starting from an error message mentioning AuthMiddleware, you read the middleware file, find it imports from jwt.py, read that, notice a dependency on users/models.py, read that too. Counting the searches that locate each file alongside the reads, that is roughly five tool calls and several thousand tokens consumed before you have located the right place to write a fix.

This works reliably. The problem is sequential commitment. If the model takes a wrong turn early, following the wrong import or reading a helper module that turns out to be irrelevant, it burns context on dead ends before correcting course. Each wrong turn is not just wasted tokens; it is context that will no longer be available for reading the files that matter. On a moderately large codebase, top coding agent systems average 20-30 tool calls per resolved issue on SWE-bench Verified. A significant fraction of those calls is navigation rather than editing.
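The tracing loop described in this section can be sketched in a few lines. This is a toy under stated assumptions: the repository is a hypothetical in-memory dict, imports are resolved with Python’s standard-library ast module, and each file read stands in for one tool call. Real agents interleave searches and model reasoning between reads.

```python
import ast
from collections import deque

# Hypothetical in-memory repository: path -> source. File names mirror the example above.
REPO = {
    "src/auth/middleware.py": "from src.auth import jwt\n\nclass AuthMiddleware:\n    pass\n",
    "src/auth/jwt.py": "from src.users import models\n\ndef decode_token(token):\n    return {}\n",
    "src/users/models.py": "class User:\n    pass\n",
}


def imported_paths(source: str, repo: dict[str, str]) -> list[str]:
    """Resolve the imports in one file to paths that exist in the repo."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from pkg import name` may refer to pkg.py or to pkg/name.py.
            candidates = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for dotted in candidates:
            path = dotted.replace(".", "/") + ".py"
            if path in repo and path not in found:
                found.append(path)
    return found


def navigate(anchor: str, repo: dict[str, str], max_reads: int = 10) -> list[str]:
    """Read files breadth-first from the anchor; each read simulates one tool call."""
    queue, visited = deque([anchor]), []
    while queue and len(visited) < max_reads:
        path = queue.popleft()
        if path in visited:
            continue
        visited.append(path)  # one simulated file read
        queue.extend(p for p in imported_paths(repo[path], repo) if p not in visited)
    return visited


reads = navigate("src/auth/middleware.py", REPO)
```

The order of `reads` is fixed by the import graph, not by relevance to the task. That is the sequential commitment problem in miniature: a cap on `max_reads` cuts off whichever files happen to sit furthest from the anchor.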

The Repository Map

Aider’s repository map is a different allocation of the context budget. Instead of reading files during navigation, Aider uses tree-sitter to parse the entire codebase at session start and extract a compact representation of its structure. Tree-sitter is a parser generator that produces concrete syntax trees for over 100 languages, and Aider uses it not to analyze semantics but to extract surface-level signatures: function names, class definitions, method signatures, and module-level declarations, with file paths and line numbers.

The output for a Python codebase looks something like this:

src/auth/jwt.py:
  def encode_token(payload: dict, expiry: int = 3600) -> str
  def decode_token(token: str) -> dict
  class JWTError(Exception)

src/auth/middleware.py:
  class AuthMiddleware
    def __init__(self, app: ASGIApp, secret: str)
    async def __call__(self, scope, receive, send)

src/users/models.py:
  class User
    id: int
    email: str
    def verify_password(self, password: str) -> bool
    def generate_token(self) -> str

For a 50,000-line codebase, this representation is typically 1,000 to 8,000 tokens. It fits in context alongside the conversation history and still leaves substantial budget for reading and editing. The model can see the shape of the entire codebase without reading a single file in full.
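To make the extraction step concrete, here is a minimal sketch that produces a similar outline for Python files. It uses Python’s standard-library ast module as a stand-in for tree-sitter, so it handles Python only and is not how Aider does it. The function names (`signature`, `outline`, `repo_map`) and the sample file are hypothetical.

```python
import ast


def signature(node) -> str:
    """Render a function header from its AST node, without the body."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}"


def outline(source: str) -> list[str]:
    """Top-level functions and classes, plus one level of class members."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(signature(node))
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            lines.append(f"class {node.name}({bases})" if bases else f"class {node.name}")
            for member in node.body:
                if isinstance(member, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("  " + signature(member))
                elif isinstance(member, ast.AnnAssign) and isinstance(member.target, ast.Name):
                    # Annotated attributes such as `id: int`.
                    lines.append(f"  {member.target.id}: {ast.unparse(member.annotation)}")
    return lines


def repo_map(files: dict[str, str]) -> str:
    """Join per-file outlines into one compact text block, sorted by path."""
    parts = []
    for path in sorted(files):
        parts.append(f"{path}:")
        parts.extend("  " + line for line in outline(files[path]))
        parts.append("")
    return "\n".join(parts)


# Hypothetical one-file repository for illustration.
SAMPLE = {
    "src/auth/jwt.py": (
        "def decode_token(token: str) -> dict:\n"
        "    return {}\n"
        "\n"
        "\n"
        "class JWTError(Exception):\n"
        "    pass\n"
    ),
}
print(repo_map(SAMPLE))
```

Function bodies never reach the output, which is where the compression comes from: the size of the map grows with the number of declarations, not with the number of lines of code.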

Dynamic Sizing and Relevance Scoring

The map is not a static snapshot. As conversation history grows, context budget shrinks, and Aider dynamically resizes the map to fit. The algorithm prioritizes files that have appeared recently in the conversation or that the model has already read or edited, and deprioritizes files that have had no relevance to the session. A session focused on authentication for 20 turns will show the auth module in full detail and compress unrelated modules down to bare module names or drop them entirely.

The relevance heuristic is straightforward: files mentioned by path in recent messages rank higher, as do files that were recently modified or read. Aider’s documentation also describes a graph ranking step over the dependencies between files, which decides which parts of the map earn a place in the budget. Aider regenerates the map after edits so that newly created functions appear in subsequent turns. This creates a feedback loop where the map tracks the session’s working set without requiring the model to manage any of this explicitly.
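A minimal sketch of budget-aware sizing follows. It is not Aider’s implementation: the recency score, the four-characters-per-token estimate, and every name in it are assumptions chosen only to illustrate the idea of ranking files and then degrading gracefully from full outline to bare path to omission.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def score(path: str, last_touched: dict[str, int], current_turn: int) -> float:
    """Higher for files mentioned, read, or edited recently; zero if never touched."""
    if path not in last_touched:
        return 0.0
    return 1.0 / (1 + current_turn - last_touched[path])


def fit_map(
    outlines: dict[str, str],
    last_touched: dict[str, int],
    current_turn: int,
    budget: int,
) -> str:
    """Add files in relevance order: full outline if it fits, else bare path, else skip."""
    ranked = sorted(outlines, key=lambda p: (-score(p, last_touched, current_turn), p))
    parts, used = [], 0
    for path in ranked:
        full = f"{path}:\n{outlines[path]}"
        for candidate in (full, path):
            cost = estimate_tokens(candidate)
            if used + cost <= budget:
                parts.append(candidate)
                used += cost
                break
    return "\n".join(parts)


# Hypothetical session state: the auth file was touched on turn 9 of 10.
OUTLINES = {
    "src/auth/jwt.py": "  def encode_token(payload: dict, expiry: int = 3600) -> str",
    "src/billing/invoice.py": "  class Invoice\n    def total(self) -> int",
}
LAST_TOUCHED = {"src/auth/jwt.py": 9}
```

Calling `fit_map(OUTLINES, LAST_TOUCHED, current_turn=10, budget=...)` with a shrinking budget keeps the recently touched auth file in full for longest, while the untouched billing file collapses to its path and then disappears.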

What the Map Changes About a Navigation Task

Consider the same AuthMiddleware error from before, and suppose the underlying bug involves JWT expiry. This time a repository map is loaded from the start. The model sees in its first turn that AuthMiddleware exists in middleware.py, that User.generate_token exists in users/models.py, and that encode_token in jwt.py takes an expiry parameter. The relevant call path is visible from the signatures alone. The model can form a hypothesis about where the bug lives without any intermediate search calls, then read exactly the right file sections to confirm before editing.

This is the difference in practical terms: the map converts orientation from a sequential search process into a single-pass inspection. Aider typically resolves issues in fewer tool calls than agents relying purely on iterative navigation, which matters because each tool call extends the conversation history and consumes tokens that cannot be used for anything else.

The benchmark support for this is partial and indirect. The SWE-agent paper from Princeton, which introduced the Agent-Computer Interface concept, showed that tool design choices rather than model capability explain most of the performance gap between scaffolding implementations. Navigation efficiency is one of the largest contributors to that gap. Fewer navigation calls means more budget for the editing work that actually resolves issues.

The Limitations

Tree-sitter parsing is syntax-level, not semantic. It can tell you that generate_token exists on the User class and what its signature is, but it cannot tell you whether that method is the one actually responsible for JWT creation or just a thin wrapper around something else. For tasks requiring understanding of data flow across several abstraction layers, the map provides orientation, not understanding; the model still needs to read relevant file sections.

Repository maps also assume the codebase is coherent at the surface level. Projects with inconsistent naming conventions, deeply nested forwarding functions, or heavy use of dynamic dispatch gain less from the map than projects with clear module boundaries and self-documenting function names. The semantic gap between a function’s name and its purpose is invisible to tree-sitter.

The embedding-based approach used by Cursor and GitHub Copilot’s indexing layer handles the semantic gap better. It finds conceptually related code even when naming does not match. The trade-offs are query-time retrieval latency, false positives from domain-terminology overlap, and the infrastructure overhead of maintaining a vector index on a changing codebase. The repository map works entirely at parse time with no index to maintain.

The General Principle

The repository map is a specific instance of a broader design problem: how do you represent a large artifact in a form compact enough to fit in a context window while preserving enough structure for the model to reason about which parts are relevant to the current task?

The answer tree-sitter provides is to extract the interface rather than the implementation: function signatures, class hierarchies, module boundaries, and parameter types where they appear at the declaration level. This is the information a developer uses when navigating an unfamiliar codebase for the first time, the same mental model that makes it possible to answer “where does this logic live” without reading every line.

For anyone designing context representations in AI systems more broadly, this points to a reusable principle: prefer representations that preserve addressable structure over representations that preserve content. A symbol map with file paths and line numbers gives the model enough to form a retrieval plan. Full file contents give it more information but at a cost that often exceeds the benefit until the model knows it is in the right place. Compact structure first, detail on demand, is a better use of a fixed context budget than trying to load everything upfront and hoping the model attends to the right parts.
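The “compact structure first, detail on demand” pattern can be sketched as two steps: build an index that maps symbols to addresses, then fetch only the addressed span. The example again uses Python’s ast module as a stand-in parser; `symbol_index`, `read_span`, and the sample source are hypothetical names for illustration.

```python
import ast


def symbol_index(path: str, source: str) -> dict[str, tuple[str, int, int]]:
    """Map qualified symbol names to (path, first_line, last_line)."""
    index = {}

    def visit(body, prefix=""):
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = prefix + node.name
                index[name] = (path, node.lineno, node.end_lineno)
                if isinstance(node, ast.ClassDef):
                    visit(node.body, prefix=name + ".")

    visit(ast.parse(source).body)
    return index


def read_span(source: str, first: int, last: int) -> str:
    """Return only the requested 1-indexed, inclusive line range."""
    return "\n".join(source.splitlines()[first - 1:last])


# Hypothetical file contents for illustration.
MODELS_SRC = (
    "class User:\n"
    "    id: int\n"
    "\n"
    "    def verify_password(self, password):\n"
    "        return True\n"
    "\n"
    "    def generate_token(self):\n"
    "        return 'token'\n"
)

index = symbol_index("src/users/models.py", MODELS_SRC)
_, first, last = index["User.generate_token"]
snippet = read_span(MODELS_SRC, first, last)  # only the two lines of that method
```

The index is the cheap part that stays in context; `read_span` is the expensive part that runs only once the model has decided where to look.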