Before the First Edit: How Coding Agents Orient Themselves to a Codebase
Source: simonwillison
Every coding agent runs the same basic loop: send a message to the model, dispatch tool calls, append results, repeat. Simon Willison’s guide on how coding agents work covers this structure clearly. What the loop description elides is everything that happens before the model makes its first move, specifically how the agent orients the model to the codebase it is about to work in.
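The loop itself fits in a few lines. This sketch uses generic stand-ins for the model and tools rather than any provider's actual API, just to make the send/dispatch/append structure concrete:

```python
# Minimal agent loop: send messages, dispatch any tool calls the model
# requests, append the results, repeat until the model answers without
# requesting a tool. Model and tools are toy stand-ins, not a real API.

def run_agent(model, tools, messages):
    while True:
        reply = model(messages)                 # one model request
        messages.append(reply)
        if not reply.get("tool_calls"):         # no tools requested: done
            return reply["content"]
        for call in reply["tool_calls"]:        # dispatch each tool call
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})

# Toy model: first asks to read a file, then answers.
def toy_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "read", "args": {"path": "a.py"}}]}
    return {"role": "assistant", "content": "done", "tool_calls": []}

tools = {"read": lambda path: f"<contents of {path}>"}
print(run_agent(toy_model, tools, [{"role": "user", "content": "fix a.py"}]))
```

Everything this article discusses happens in the `messages` list before `run_agent` is ever called.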
This orientation step is the context loading problem, and the three approaches that major coding agents take to it explain more about their behavior in practice than the loop mechanics do.
The Problem
A model starting a coding task knows nothing about the codebase. It needs to understand where files are, what they contain, how components relate to each other, and which files are likely relevant to the task at hand. There are two broad ways to acquire that knowledge: load it upfront before the first tool call, or acquire it on demand through the tool loop.
Each approach encodes a bet. Loading upfront assumes the model needs a map before it can plan effectively. Loading on demand assumes the model can navigate effectively without a map, discovering what it needs as the task develops. Neither assumption holds universally, which is why different agents have converged on different answers.
Aider: Upfront Symbolic Loading
Aider generates a repository map before the model sees the user’s task. The map is built from tree-sitter parse output across all tracked files: it lists every symbol (function, class, method, variable) alongside its file path and a small amount of surrounding code context.
A representative slice looks something like this:
src/auth/session.ts:
  class SessionManager:
    constructor(config: SessionConfig)
    createSession(userId: string): Promise<Session>
    expireSession(sessionId: string): void

src/auth/middleware.ts:
  function authMiddleware(req, res, next): void
  function requireRole(role: Role): Middleware
For a medium-sized project with a few hundred files, the complete map runs to roughly 5,000-15,000 tokens. Aider pays this cost at the start of every session, before the user’s first message even reaches the model.
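Producing a map like this does not require tree-sitter; the shape of the output is the point. A minimal sketch using Python's stdlib ast module (so it only handles .py files, unlike Aider's multi-language tree-sitter parsing) looks like:

```python
# Sketch of a repo-map-style symbol listing. Aider uses tree-sitter
# across many languages; this stand-in uses Python's stdlib ast module,
# so it only covers Python source.
import ast

def map_source(path, source):
    lines = [f"{path}:"]
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}:")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args})")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"  def {node.name}({args})")
    return "\n".join(lines)

src = "class SessionManager:\n    def create_session(self, user_id): ...\n"
print(map_source("src/auth/session.py", src))
```

Run over every tracked file and concatenated, this is the artifact whose token cost the session pays upfront.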
The payoff is reduced turn count on tasks requiring cross-file awareness. A model with the repo map already loaded can see that expireSession exists in SessionManager and is referenced from middleware.ts before making a single tool call. It can form a plan, identify all the files that need changes, and execute them without an exploratory phase. For tasks touching multiple interrelated modules, this compression of the exploration phase is meaningful: fewer turns, lower latency, less risk of the model losing the thread across many iterations.
Aider also pairs the repo map with model-specific edit formats. For models that handle unified diffs reliably, it uses the udiff format; for models where whole-file replacement produces cleaner output, it uses the whole format. The choice happens automatically based on which model is active. This detail does not appear in any description of the agent loop, but it affects edit quality considerably, since LLMs vary substantially in their ability to produce syntactically valid diffs without off-by-one errors or whitespace mismatches.
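The selection mechanism amounts to a lookup keyed on the active model. The model names and format assignments below are illustrative assumptions, not Aider's actual mapping table:

```python
# Illustrative per-model edit-format selection. The entries here are
# hypothetical examples, not Aider's real configuration.
EDIT_FORMATS = {
    "diff-capable-model": "udiff",   # assumed to emit valid unified diffs
    "small-model": "whole",          # whole-file replacement is safer
}

def pick_edit_format(model_name, default="whole"):
    # Fall back to the most forgiving format for unknown models.
    return EDIT_FORMATS.get(model_name, default)

print(pick_edit_format("diff-capable-model"))  # udiff
print(pick_edit_format("unrecognized-model"))  # whole
```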
The limitation is upfront token cost. Single-file tasks, tasks in small repositories, and tasks where relevant code is concentrated in one module all pay the repo map overhead without benefit.
Cursor: Semantic Retrieval
Cursor maintains a continuous embedding index of project files and retrieves semantically relevant chunks per prompt. When a task description arrives, it queries the index, pulls the highest-scoring file chunks, and injects them into context before handing off to the model. The model starts with what the retrieval system judges to be relevant, not a symbolic overview of the entire codebase.
This scales differently than the repo map approach. For a large codebase, indexing every symbol is not viable; token cost grows linearly with codebase size and can exhaust the context window before the model processes the task. Semantic retrieval keeps initial context proportional to the task’s semantic footprint rather than to repository size, which makes it practical for codebases where a full repo map would crowd out everything else.
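The retrieval step reduces to scoring indexed chunks against the task and injecting the top few. A real system uses learned embeddings; this self-contained sketch substitutes a toy bag-of-words vector so the mechanics are visible without an embedding model:

```python
# Sketch of retrieval-before-prompt: score each indexed chunk against
# the task description and keep the top-k. Bag-of-words cosine stands
# in for a real embedding model.
import math
import re
from collections import Counter

def embed(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task, chunks, k=2):
    q = embed(task)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "def create_session(user_id): ...",
    "def render_invoice(order): ...",
    "def expire_session(session_id): ...",
]
print(retrieve("fix session expiry bug", chunks, k=1))
```

Note that the top hit matches on "session" but not "expiry" versus "expire"; exactly this kind of lexical near-miss is where real embeddings earn their keep, and where the failure mode discussed below begins.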
Cursor also has access to signals no standalone agent can replicate: the currently open file, the cursor’s position within it, recently edited files, and recently viewed files. These IDE-context signals supplement semantic retrieval with behavioral evidence about what the developer is working on right now, giving the retrieval step a head start that static embedding similarity alone cannot provide.
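One plausible way to fold those signals in is a post-retrieval re-rank that boosts the open and recently touched files. The weights and signal names here are assumptions for illustration, not Cursor's actual scoring:

```python
# Sketch of blending behavioral IDE signals into retrieval ranking.
# Boost values are arbitrary illustrative weights.
def rerank(scored_chunks, open_file, recent_files):
    boosted = []
    for path, score in scored_chunks:
        if path == open_file:
            score += 0.3   # strongest signal: the file under the cursor
        elif path in recent_files:
            score += 0.1   # weaker signal: recently edited or viewed
        boosted.append((path, score))
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

ranked = rerank(
    [("src/billing.py", 0.52), ("src/auth/session.py", 0.48)],
    open_file="src/auth/session.py",
    recent_files={"src/auth/middleware.py"},
)
print(ranked[0][0])  # src/auth/session.py
```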
The failure mode is missed context. Semantic similarity does not reliably surface code that is architecturally related but terminologically distant from the task description. A bug in a public API method that traces back to a private utility function whose name shares no terms with the failure message may not appear in the retrieval results. Cursor mitigates this by allowing the model to issue explicit search calls during the loop, but the quality of the initial retrieval determines how often that fallback is needed and how many extra turns it costs.
Claude Code: Explicit Tool Exploration
Claude Code starts with almost no codebase context. The system prompt describes available tools and behavioral guidelines; the tool definitions explain how to navigate and modify files. The model is expected to orient itself by using those tools, the same way an engineer new to a codebase would read the directory structure, search for relevant symbols, and open files based on what they find.
The available tools are Read, Write, Edit, Bash, Glob, Grep, WebFetch, and Task for sub-agent delegation. The Edit tool takes old_string and new_string pairs rather than line numbers, a design decision that follows directly from the context mechanics: if the model reads a file in turn five and decides to edit it in turn fifteen, line numbers may have shifted due to intermediate edits, so string matching against existing content is more robust than line-indexed replacement.
This approach allocates context budget based on what the task requires rather than a fixed upfront investment. A task touching one function in one file may require reading two or three files total, keeping context manageable. A broad refactoring task will use more turns for exploration, but the model can prioritize which files to read based on what it discovers, rather than loading everything at the start.
The cost is exploratory overhead. The first several turns of any Claude Code session tend to be orientation: listing directory contents, searching for relevant symbols, reading files that look promising. Each of those turns consumes a model request. For tasks where Aider’s repo map would have immediately identified the relevant symbol locations, Claude Code uses several turns to acquire equivalent information.
The permission model adds a second dimension. Claude Code separates its tools by consequence: Glob, Grep, and Read are read-only; Edit and Write mutate files; Bash executes commands. This distinction is partly a safety boundary and partly behavioral specification, making the consequences of different actions visible in the tool schema so the model can incorporate them into its planning.
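A harness can encode that consequence boundary directly in the tool registry, so the gating logic never has to special-case tool names. The field names below are illustrative, not Claude Code's actual schema:

```python
# Sketch of consequence-aware tool metadata: mutating or executing
# tools are gated behind approval, read-only tools are not.
TOOLS = {
    "Read":  {"mutates": False, "executes": False},
    "Glob":  {"mutates": False, "executes": False},
    "Grep":  {"mutates": False, "executes": False},
    "Edit":  {"mutates": True,  "executes": False},
    "Write": {"mutates": True,  "executes": False},
    "Bash":  {"mutates": True,  "executes": True},
}

def needs_approval(tool_name):
    spec = TOOLS[tool_name]
    return spec["mutates"] or spec["executes"]

print(needs_approval("Grep"))  # False
print(needs_approval("Bash"))  # True
```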
Where Each Approach Wins
SWE-bench, which measures coding agent performance on real GitHub issues against project test suites, provides a concrete comparison surface. Top agents consistently score in the 45-55% range on the Verified subset, with the gap between approaches narrowing as underlying models improve. The benchmark does not decompose cleanly by context loading strategy, but the task structures that prove hardest correlate with the failure modes of each approach.
Tasks requiring broad cross-file awareness, where a fix involves modifying an interface and updating all callers, tend to benefit from the repo map approach. The model can plan the full scope of changes before making any, rather than discovering new files to edit mid-task. Tasks in large codebases where most of the code is irrelevant to any given task tend to benefit from semantic retrieval; the upfront context stays small and the model receives only what matters. Tasks with unpredictable relevance patterns or strong dependency on execution feedback tend to suit the explicit exploration model, because the model can verify its understanding by running tests rather than relying on static analysis from a map or retrieval index.
None of these approaches wins on all task types. The structure of the work (how many files it touches, how semantically coherent the relevant code is, and whether correctness requires running code) determines which context loading strategy produces the best results.
Building for a Known Codebase
For teams building a custom coding agent against a specific codebase rather than using an existing product, the context loading decision cannot be deferred. The tool loop is well-documented in every LLM provider’s API and takes a few dozen lines to implement. How the model gets oriented before its first action is the decision with lasting consequences for token costs, turn counts, and task reliability.
A practical starting point for most custom agents is a lightweight hybrid: a compact repo map covering just the public interfaces of core modules, combined with explicit tools for deeper investigation on demand. This gives the model enough orientation to plan without the full upfront cost of symbolizing every file in the project, and preserves the flexibility to follow the task into unexpected corners of the codebase.
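The compact-map half of that hybrid can be as simple as filtering a symbol listing down to public names in a chosen set of core modules. This sketch uses Python's stdlib ast and assumes the "public means no leading underscore" convention; both the convention and the module selection are choices the team would make, not fixed rules:

```python
# Sketch of the hybrid's upfront half: map only the public interface
# of designated core modules, leaving everything else to on-demand
# tool exploration.
import ast

def public_interface(path, source):
    lines = [f"{path}:"]
    for node in ast.parse(source).body:
        is_def = isinstance(node, (ast.FunctionDef, ast.ClassDef))
        if is_def and not node.name.startswith("_"):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            lines.append(f"  {kind} {node.name}")
    return "\n".join(lines)

src = "def connect(): ...\ndef _retry(): ...\nclass Pool: ...\n"
print(public_interface("src/db.py", src))
```

Private helpers like `_retry` stay out of the upfront map; if a task needs them, the model finds them through its search tools instead.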
Aider’s documentation on repository maps covers the tree-sitter-based symbolic loading approach in detail. The Anthropic tool use documentation and Claude Code tool reference are practical references for the exploration-first model. The loop is the same in all of them; what the model sees before it starts is where the architectural differences live.