How Coding Agents Edit Files and Find Their Way Around a Codebase
Source: simonwillison
The basic architecture of a coding agent is well documented. Give a language model access to tools, let it call them, feed the results back, repeat until done. Simon Willison’s guide on agentic engineering patterns covers this clearly. What gets less attention is the layer below: the specific implementation choices inside the tools that coding agents use most, and what those choices mean for reliability.
The two most consequential tools in any coding agent are file editing and codebase navigation. Both look straightforward, but the way each one is implemented determines how often the agent edits the right lines and finds the right files.
Three Ways to Edit a File
Full-file rewrite is the simplest approach. The model produces the entire file content and the scaffolding writes it atomically. Implementation is a single file write, with no string matching and no diff parsing.
The failure modes are predictable. For large files, the model regenerates content it was never asked to touch, often introducing subtle changes to formatting or whitespace on unchanged lines. Token cost scales with file size. Beyond a few hundred lines, models start pulling content from imprecise memory rather than the actual file, hallucinating lines that are close but not quite right. This approach works as a fallback for short files; it degrades badly for anything longer.
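To make "writes it atomically" concrete, here is a minimal sketch of one common way to do it: write to a temporary file in the same directory, then rename over the target. The function name `write_file_atomically` is hypothetical and not taken from any particular agent.

```python
import os
import tempfile


def write_file_atomically(path: str, content: str) -> None:
    """Replace the file at `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the model's line endings exactly as produced.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # swaps the new file in as a single step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The write itself is trivial; the cost of this strategy is entirely on the model side, which has to reproduce every unchanged line correctly.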
String replacement is what Claude Code uses for its primary Edit tool. The model provides old_string, new_string, and a target file path:
{
  "file_path": "/absolute/path/to/file",
  "old_string": "exact text to find in the file",
  "new_string": "replacement text",
  "replace_all": false
}
Scaffolding performs a literal string match, replaces the single matching occurrence (or every occurrence when replace_all is true), and writes the file back. If the string is not found, the tool returns an explicit error. If the string appears more than once and replace_all is false, the tool rejects the call rather than making an ambiguous edit.
This is precise and deterministic. The model cannot accidentally overwrite lines it did not intend to touch. Self-correction works well: the model gets a clear error, adjusts the string, and retries.
The brittleness is proportional to that precision. A single incorrect character breaks the match. Models that summarize code rather than quoting it verbatim fail consistently on this tool. Trailing whitespace, different line endings, or any other difference between what the model recalls and what the file actually contains causes the match to fail. For files read many turns ago, the probability of quoting accurately drops as the model’s attention shifts to more recent context.
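The matching rules described above fit in a few lines. The following is a sketch under stated assumptions, not Claude Code's actual implementation: `apply_edit` and `EditError` are hypothetical names, and the parameters simply mirror the fields of the JSON call.

```python
class EditError(Exception):
    """Raised with a message the model can read and act on."""


def apply_edit(file_path: str, old_string: str, new_string: str,
               replace_all: bool = False) -> int:
    """Literal string replacement; returns the number of replacements made."""
    if old_string == "":
        raise EditError("old_string must not be empty")
    # newline="" preserves the file's own line endings on read and write.
    with open(file_path, encoding="utf-8", newline="") as f:
        text = f.read()
    count = text.count(old_string)
    if count == 0:
        raise EditError("old_string not found in file")
    if count > 1 and not replace_all:
        raise EditError(
            f"old_string matches {count} locations; add surrounding "
            "context to make it unique or set replace_all"
        )
    # At this point either there is exactly one match or replace_all is set.
    updated = text.replace(old_string, new_string)
    with open(file_path, "w", encoding="utf-8", newline="") as f:
        f.write(updated)
    return count
```

Every failure path raises with a specific reason, which is what makes the retry loop work: the model learns whether it misquoted the text or quoted something ambiguous.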
SEARCH/REPLACE blocks with fuzzy matching is Aider’s approach. The model emits structured blocks in its response:
<<<<<<< SEARCH
old code here
=======
new code here
>>>>>>> REPLACE
Scaffolding tries an exact match first. On failure, it falls back to Levenshtein distance to find the closest approximation of the SEARCH block in the file. Minor whitespace differences and slightly misremembered variable names no longer cause hard failures.
The resilience comes with reduced auditability. Fuzzy matching is less predictable than literal matching: the user cannot tell from the output whether scaffolding found the intended location or a nearby one. In files with repeated patterns, such as boilerplate test setup or similar class definitions, a fuzzy match picks the wrong location more often than you would expect. Aider reports low-confidence matches explicitly, but the outcome still depends on a similarity score, whereas literal string replacement either matches exactly or fails.
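A rough sketch of the exact-then-fuzzy idea follows. It is not Aider's code. It swaps in `difflib.SequenceMatcher` from the Python standard library as the similarity measure instead of Levenshtein distance, and the function name `find_best_window` and the 0.8 threshold are assumptions made for illustration.

```python
import difflib


def find_best_window(file_text: str, search_text: str,
                     threshold: float = 0.8):
    """Return (start_line, end_line, score) for the closest match, or None.

    Line indices are zero-based and end_line is exclusive.
    """
    file_lines = file_text.splitlines()
    search_lines = search_text.splitlines()
    n = len(search_lines)
    if n == 0 or n > len(file_lines):
        return None
    target = "\n".join(search_lines)
    best = None
    # Slide a window of n lines over the file and score each position.
    for start in range(len(file_lines) - n + 1):
        window = "\n".join(file_lines[start:start + n])
        if window == target:
            return (start, start + n, 1.0)  # exact match wins immediately
        score = difflib.SequenceMatcher(None, window, target).ratio()
        if best is None or score > best[2]:
            best = (start, start + n, score)
    if best is not None and best[2] >= threshold:
        return best
    return None
```

Returning the score alongside the location is what lets scaffolding flag low-confidence matches. Note that two near-identical blocks in the same file will score almost equally, which is exactly the repeated-pattern risk described above.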
Most production agents implement at least two of these three and choose based on context. Full rewrite for new files or short ones, string replacement as the default, fuzzy diff as a fallback when repeated string replacement failures suggest the model is misquoting the target.
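The selection policy can be written as a small decision function. This is a sketch only: the function name, the strategy labels, and both thresholds (`short_file_lines`, `max_failures`) are illustrative assumptions, not values taken from any real agent.

```python
def choose_edit_strategy(file_exists: bool, line_count: int,
                         failed_replacements: int,
                         short_file_lines: int = 50,
                         max_failures: int = 2) -> str:
    """Pick an edit strategy from the three described above."""
    # New or short files: regenerating everything is cheap and safe.
    if not file_exists or line_count <= short_file_lines:
        return "full_rewrite"
    # Repeated exact-match failures suggest the model is misquoting.
    if failed_replacements >= max_failures:
        return "fuzzy_search_replace"
    # Default: precise, auditable literal replacement.
    return "string_replace"
```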
Four Ways to Navigate a Codebase
Before editing anything, an agent needs to know what to edit. Reading the entire codebase upfront is too expensive in tokens and time. Reading nothing leads to edits built on false assumptions. The four navigation strategies used in production each make a different tradeoff.
Iterative grep-and-glob is Claude Code’s default. The agent starts from a known anchor (a failing test, an error message, or a file mentioned in the task) and traces outward through imports and references. Each step is a tool call; each result narrows or expands the search area. This reliably locates relevant code and requires no upfront infrastructure.
The cost is latency. Tracing through five layers of imports in a moderately large codebase can take fifteen to twenty tool calls before the model has enough context to edit confidently. For tasks requiring a broad surface area, this latency compounds across the session.
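A single step of that loop looks roughly like the sketch below: glob for candidate files, search them with a regular expression, and return file, line number, and text so the model can decide what to open next. The function name `grep` and its parameters are hypothetical; real agents typically shell out to a tool such as ripgrep instead.

```python
import re
from pathlib import Path


def grep(root: str, pattern: str, glob: str = "**/*.py",
         max_hits: int = 50) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for lines matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # cap output to protect the context window
    return hits
```

The `max_hits` cap matters more than it looks: an uncapped search on a common identifier can flood the context with results and cost more tokens than the files themselves.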
Repository maps are Aider’s approach. At session start, Aider parses the entire codebase using tree-sitter to extract function signatures, class definitions, and method names from every source file. No code execution, just syntax tree analysis. The resulting map runs 1,000 to 8,000 tokens and gets included in every prompt, giving the model a structural overview before any search.
Aider trims this map dynamically as conversation history grows and the context budget shrinks, prioritizing files touched recently in the session. This is an explicit design tradeoff: as you work deeper into a task, you see less of the broader codebase. The assumption is that later edits are more likely to be local to what you have already been working on.
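The idea can be illustrated with a small sketch. It is not Aider's implementation: it uses Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python sources), it assumes the caller passes files ordered by priority, and it estimates tokens as characters divided by four. The names `outline` and `build_repo_map` are hypothetical.

```python
import ast


def outline(source: str) -> list[str]:
    """List class and function signatures without executing any code."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append("  " * depth + f"class {child.name}")
                visit(child, depth + 1)  # descend to pick up methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                lines.append("  " * depth + f"def {child.name}({args})")

    visit(ast.parse(source), 0)
    return lines


def build_repo_map(files: dict[str, str], token_budget: int) -> str:
    """Build a map from {path: source}, highest-priority files first."""
    parts: list[str] = []
    used = 0
    for path, source in files.items():
        entry = [path + ":"] + ["  " + line for line in outline(source)]
        block = "\n".join(entry)
        cost = len(block) // 4  # crude token estimate (assumption)
        if used + cost > token_budget:
            break  # lower-priority files fall off as the budget shrinks
        parts.append(block)
        used += cost
    return "\n".join(parts)
```

Calling `build_repo_map` with a smaller `token_budget` later in a session reproduces the trimming behavior: the files at the front of the ordering survive and the rest of the codebase disappears from view.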
Embedding-based retrieval is the approach Cursor and GitHub Copilot use. Source files are chunked, embedded with a text-embedding model, and stored in a vector index. At query time, the agent retrieves the top-K semantically similar chunks. This finds conceptually related code even when naming conventions differ from the query, and it scales to codebases too large to represent in any prompt.
The infrastructure requirement is the barrier. Local tools like Aider run entirely on your machine. Embedding search requires maintaining a synchronized vector index and an embedding model. Retrieval also introduces false positives: code that shares domain terminology with the query appears in results regardless of whether it is actually relevant to the task.
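The retrieval mechanics (chunk, embed, rank by cosine similarity, take the top K) can be shown without any infrastructure. In this sketch the `embed` function is a deliberate stand-in: it counts words rather than calling a real text-embedding model, so it matches on shared vocabulary only and does not capture the semantic similarity that real embeddings provide. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(source: str, lines_per_chunk: int = 20) -> list[str]:
    """Split a source file into fixed-size line chunks."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system calls a model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3):
    """Return the k (score, chunk) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Even this toy version shows the false-positive problem: any chunk that shares vocabulary with the query scores above zero, whether or not it has anything to do with the task.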
LSP queries give exact go-to-definition and find-all-references results for typed languages, without the false positives that plague text search on common identifiers. Claude Code exposes this as an LSP tool. For TypeScript or Go codebases, it is substantially more precise than grep: you get exact definition sites, not every file containing a common string.
The practical limitation is language server availability. Tree-sitter has parsers for most languages and works anywhere. LSP requires a working language server configured for the project. Well-configured projects benefit significantly; others fall back to grep.
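For reference, a go-to-definition query is a JSON-RPC 2.0 message with a `Content-Length` header, as defined by the Language Server Protocol. The sketch below only builds the request bytes; the helper names are hypothetical, and a real session would first need to start the server and complete the `initialize` handshake.

```python
import json


def lsp_request(request_id: int, method: str, params: dict) -> bytes:
    """Frame a JSON-RPC 2.0 request the way LSP expects it on the wire."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, file_uri: str,
                       line: int, character: int) -> bytes:
    """Build a textDocument/definition request (positions are zero-based)."""
    return lsp_request(request_id, "textDocument/definition", {
        "textDocument": {"uri": file_uri},
        "position": {"line": line, "character": character},
    })
```

The query is addressed by position rather than by name, which is why the result is exact: the server resolves the symbol under the cursor instead of searching for text that looks like it.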
Why These Choices Compound
A typical agent task involves several file reads, multiple edit attempts before a successful one, a few shell commands, and a verification pass. In SWE-bench evaluations, top systems average 20 to 30 tool calls per resolved issue.
Each implementation decision compounds across those calls. An agent using string replacement gets precise error messages on every failed edit. One using fuzzy matching might silently apply an edit to the wrong location and produce broken code without an obvious failure signal. An agent with a repository map has structural codebase context from the first prompt. One using iterative grep builds that context over fifteen or more tool calls.
Princeton’s SWE-agent project formalized this as the Agent-Computer Interface, by analogy with Human-Computer Interface: tool design shapes model behavior. Deliberately designed tools with line numbers, surrounding context, and explicit error messages improved resolve rates by several percentage points over naive bash access. On SWE-bench Verified, where top systems score 50 to 60 percent, that margin is not small.
The same underlying model with different tool implementations lands at different points on that benchmark. The model is a fixed component. The implementation of the tools around it is where the engineering happens, and it is where the meaningful differences between coding agents come from.