Minimal Footprint Is the Design Principle Behind Good Subagent Boundaries
Source: simonwillison
The move to subagents starts from two concrete limits in the single-agent tool loop. Context exhaustion is the first: Claude’s 200k-token context window fills faster than people expect in practice. Twenty files averaging 300 lines each consume roughly 120,000 tokens in file reads before the system prompt, tool call history, or reasoning steps are counted. Add multiple test runs, stack traces, and the re-reading that happens as earlier observations fall below the model’s effective attention range, and a moderately complex refactor fills the window before it finishes. Serial execution is the second limit: tasks that are structurally parallel (ten independent modules that each need test coverage, five service directories that each need documentation updates) are bottlenecked by the single agent’s sequential loop.
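The back-of-envelope arithmetic is worth making explicit. A minimal sketch, assuming roughly 20 tokens per line of code (an assumed average; real figures vary by language and style):

```python
# Rough context budget for the refactor scenario described above.
TOKENS_PER_LINE = 20      # assumption: average tokens per line of code
CONTEXT_WINDOW = 200_000  # the 200k-token window from the text

files, avg_lines = 20, 300
file_read_tokens = files * avg_lines * TOKENS_PER_LINE
print(file_read_tokens)                            # 120000
print(f"{file_read_tokens / CONTEXT_WINDOW:.0%}")  # 60% of the window gone
# ...before the system prompt, tool history, test output, or any re-reads.
```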
Subagents address both. Each runs in a separate context window, does its work, and returns a result to the orchestrator. The parent accumulates task outcomes rather than raw tool call transcripts, staying oriented while each subagent handles deep work in isolation. Simon Willison’s agentic engineering patterns guide on subagents covers the mechanics and introduces one principle that most of the agentic engineering literature treats as a security note: minimal footprint. Give each subagent only the tools it needs for its assigned task.
The security reading is well-founded. A subagent with broad tool access that encounters injected instructions in file contents or external data can cause more damage than one restricted to a specific operation type. The InjecAgent benchmark found that injection attacks against GPT-4-turbo succeeded roughly 24% of the time under single-agent conditions, and that rate compounds across each hop in a multi-agent chain. Restricting subagent tool access is straightforward blast radius reduction.
But minimal footprint as a design principle goes further than security. The tool set you assign to a subagent is a precise statement about what the task actually is. If you can enumerate the tools a subagent needs, the task is well-scoped. If you cannot, it is not.
Tool Access as Task Scope Signal
Consider two versions of the same subagent task.
Version A: “Refactor the authentication module to use the new token format.” Tools assigned: read, write, bash, create_pull_request, send_notification.
Version B: “Update the token validation logic in src/auth/validator.py and src/auth/middleware.py to accept tokens following the format in docs/token-spec.md. Run pytest tests/auth/ and return the test output along with the diff.” Tools assigned: read, write, bash (scoped to the test command).
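The contrast can be made concrete as data. A minimal sketch, assuming a hypothetical SubagentSpec structure (the field names are illustrative, not any particular framework’s API):

```python
from dataclasses import dataclass, field

@dataclass
class SubagentSpec:
    """Hypothetical subagent definition: a task plus an explicit tool allowlist."""
    task: str
    tools: list[str] = field(default_factory=list)

version_a = SubagentSpec(
    task="Refactor the authentication module to use the new token format.",
    tools=["read", "write", "bash", "create_pull_request", "send_notification"],
)

version_b = SubagentSpec(
    task=(
        "Update the token validation logic in src/auth/validator.py and "
        "src/auth/middleware.py per docs/token-spec.md. "
        "Run pytest tests/auth/ and return the test output along with the diff."
    ),
    tools=["read", "write", "bash"],  # bash scoped to the test command
)

# The tools Version A holds beyond Version B mark orchestrator-level concerns.
extra = set(version_a.tools) - set(version_b.tools)
print(sorted(extra))  # ['create_pull_request', 'send_notification']
```

Diffing the two tool lists is the minimal footprint exercise in miniature: the extra tools in Version A are exactly the work that belongs at the orchestrator level.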
The second version is not just better specified; it is differently scoped. The tool list forced the refinement. When you ask what tools this subagent needs, you quickly discover that “refactor the auth module” includes implicit tasks (notification, PR creation) that belong at the orchestrator level, not the implementation subagent. The implementation subagent does not need to know what happens after its work passes tests.
This is the architectural value of minimal footprint: it forces separation of concerns at the task level. An orchestrator that manages the PR creation workflow knows about branches, reviews, and CI status. An implementation subagent that knows only about file edits and test execution is simpler, more predictable, and easier to verify. Giving the implementation subagent access to create_pull_request conflates these concerns in ways that surface as hard-to-debug behaviors when something goes wrong.
Three Subagent Profiles
Most subagent tasks in a coding workflow fall into a small number of tool profiles, and identifying which profile a task belongs to is a useful first step in scoping it.
Read-only analysis: The subagent reads files, runs grep or glob queries, maybe executes a read-only command like git log. No writes. It returns a structured finding: a list of issues, a summary of a codebase region, a list of files matching a pattern. These subagents are trivially safe to retry, can run in parallel without conflict, and their outputs can be mechanically validated against a schema. The injection attack surface is smaller because even a compromised read-only subagent cannot modify state.
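Mechanical validation of a read-only subagent’s output can be as simple as a shape check. A minimal sketch, with illustrative field names:

```python
# Expected shape of one finding from a read-only analysis subagent.
# The field names here are illustrative assumptions, not a standard schema.
FINDING_SCHEMA = {"file": str, "line": int, "issue": str}

def validate_finding(finding: dict) -> bool:
    """True if the finding has exactly the expected keys and value types."""
    if set(finding) != set(FINDING_SCHEMA):
        return False
    return all(isinstance(finding[k], t) for k, t in FINDING_SCHEMA.items())

ok = validate_finding({"file": "src/auth/validator.py", "line": 42,
                       "issue": "token expiry not checked"})
bad = validate_finding({"file": "src/auth/validator.py", "detail": "???"})
print(ok, bad)  # True False
```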
Scoped write: The subagent reads and writes within a defined set of files or a specific directory subtree, runs tests, and returns a structured result: files modified, test output, diff. The scope constraint is ideally enforced through the tool implementation, not just the prompt. If the write tool accepts a path parameter and you configure it to allow only paths under src/auth/, the subagent literally cannot write to src/payments/ regardless of what instructions it receives.
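Enforcing that constraint in the tool implementation rather than the prompt might look like this sketch (ScopedWriteTool is a hypothetical name, not a real framework class):

```python
from pathlib import Path

class ScopedWriteTool:
    """Write tool that enforces a directory allowlist in the tool itself,
    so prompt-level instructions cannot widen the scope. Sketch only."""

    def __init__(self, allowed_root: str):
        self.allowed_root = Path(allowed_root).resolve()

    def write(self, path: str, content: str) -> None:
        # Resolve first, so "../" tricks cannot escape the allowed subtree.
        target = Path(path).resolve()
        if not target.is_relative_to(self.allowed_root):
            raise PermissionError(f"{path} is outside {self.allowed_root}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
```

Because the check runs inside the tool, injected instructions can change what the subagent asks for, but not what the tool permits: configured with src/auth/ as its root, it cannot write to src/payments/ no matter what the prompt says.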
Verification: The subagent receives a description of what should be true and uses read and execution tools to confirm it. It does not write. It returns a structured verdict: tests pass or fail, invariant holds or not, schema matches or diverges. The orchestrator runs this after incorporating subagent results, treating it as a check against the principal-agent problem: the parent assumed the subagent fulfilled the task; the verifier confirms this before the orchestrator moves on.
The three profiles stack naturally. An orchestrator delegates scoped-write work to implementation subagents, then runs a verification subagent against the combined result before accepting it and proceeding to the PR creation step.
Shared State Is the Hard Case
The minimal footprint principle creates clean boundaries when subagents work on independent things. Shared state is where it strains.
A refactor that changes a shared type definition used across multiple modules cannot be cleanly decomposed into independent subagents, each touching only their module, because the type change is a dependency. Naive parallelization here produces incompatible changes: subagent A updates the type definition and all callers in its scope; subagent B simultaneously assumes the old type definition while modifying callers in its scope. The parent receives two results that cannot both be correct.
The minimal footprint analysis reveals this dependency before it becomes a runtime problem. If two subagent tasks require write access to the same file, they are not independent. The dependency must be sequenced explicitly: the type definition change happens in one subagent first, and the downstream subagents receive the updated definition as part of their context.
This means the orchestrator’s primary job before delegating is dependency graph analysis, not just task enumeration. Which tasks share write access to the same files? Which tasks produce outputs that other tasks consume as inputs? The answers define the sequencing constraints. Tasks with no write-access overlap and no input-output dependencies can run in parallel. Everything else requires ordering.
In practice this analysis is often done informally and incompletely, which is why multi-agent refactors frequently produce merge-conflict-like inconsistencies that the orchestrator has to reconcile after the fact. Building the tool access matrix, which tasks write to which files, surfaces the dependency graph cheaply before execution starts. It is also a good test for whether a proposed multi-agent decomposition is coherent: if every subagent needs write access to the same central file, the decomposition is wrong.
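The matrix check itself is a few lines. A sketch with illustrative task names and file sets:

```python
from itertools import combinations

# Declared write access per proposed subagent task (illustrative example).
writes = {
    "update_type_def":  {"src/types.py"},
    "migrate_auth":     {"src/auth/validator.py", "src/types.py"},
    "migrate_payments": {"src/payments/charge.py"},
}

# Any pair of tasks whose write sets intersect is not independent.
conflicts = [
    (a, b, writes[a] & writes[b])
    for a, b in combinations(writes, 2)
    if writes[a] & writes[b]
]
print(conflicts)
# [('update_type_def', 'migrate_auth', {'src/types.py'})]
```

An empty conflict list means the tasks can run in parallel; each non-empty intersection is a sequencing constraint the orchestrator must honor before delegating.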
Hierarchy Depth
The orchestrator-subagent pattern implies a two-level hierarchy. The question of when a third level adds value, rather than coordination overhead without proportional benefit, is answered by the same minimal-footprint reasoning.
A coordinator level is warranted when the orchestrator’s work of managing subagent results becomes as cognitively expensive as the subagent work itself. An orchestrator managing twenty parallel subagents (reconciling results, detecting conflicts, ordering dependent work) may benefit from an intermediate coordinator that handles a cluster of related subagents and presents the orchestrator with a coherent summary. The coordinator absorbs the intermediate context the orchestrator does not need to see.
But adding a level adds a delegation boundary, and each boundary compounds the information compression problem. The orchestrator sees the coordinator’s summary; the coordinator saw the worker’s outputs; the worker saw the file contents. By the time an anomaly in the worker’s file read reaches the orchestrator, it may have been summarized away across two compression steps. The lost-in-the-middle effect documented for long contexts applies at boundary crossings too: information that mattered for the worker’s decision but was not salient enough to include in the summary is gone from the orchestrator’s reasoning.
Debugging scales accordingly. A two-level system has two context trees to traverse when something goes wrong. A three-level system has a tree of trees. The practical heuristic is to start with two levels and add a coordinator only when the orchestrator is spending significant context on result reconciliation that could be delegated, not to design hierarchy preemptively.
Starting from the Agent That Already Exists
Minimal footprint suggests a bottom-up approach: start with a single agent doing the whole task, observe where it consistently fails or stalls, then introduce subagents at those points.
The alternative, designing a multi-agent system upfront, tends to produce subagents whose tasks reflect how humans expect the work to decompose rather than where the model actually hits limits. The model may handle what you expected to be two subagent tasks in one pass, or need to split what you expected to be one task into three. Designing the hierarchy before observing the model’s natural breakpoints adds scaffolding that may not correspond to real constraints.
Running the single-agent version first also produces the minimal footprint analysis as a byproduct: what tools did the agent actually use? Which of those were needed for which portions of the work? That usage trace gives you the natural task boundaries, because each boundary is where the tool set changes. The implementation phase uses read and write. The test phase uses bash. The PR creation phase uses the GitHub tool. These phases are natural subagent candidates with already-known tool sets.
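One crude way to read those boundaries out of a usage trace is to cut wherever the current segment’s tool set stops recurring. A sketch over an illustrative trace, not a real transcript:

```python
def phase_boundaries(trace: list[str]) -> list[int]:
    """Indices where the tool set changes for good: cut when a new tool
    appears and none of the current segment's tools occur again."""
    cuts, segment = [], set()
    for i, tool in enumerate(trace):
        if segment and tool not in segment and segment.isdisjoint(trace[i:]):
            cuts.append(i)
            segment = set()
        segment.add(tool)
    return cuts

trace = ["read", "write", "read", "write",   # implementation phase
         "bash", "bash",                     # test phase
         "create_pull_request"]              # PR phase
print(phase_boundaries(trace))  # [4, 6]
```

The cuts at indices 4 and 6 recover the implementation/test/PR phase boundaries, because those are exactly the points where the tool set changes cleanly.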
Building multi-agent systems is mostly building good single-agent loops and then deciding where to draw delegation boundaries. Minimal footprint tells you when you have found a real boundary: the tool set changes cleanly, the scope can be stated precisely, and success can be verified without visibility into intermediate steps. When those three conditions hold, you have a well-scoped subagent. When they do not, the task probably belongs in the orchestrator’s loop rather than delegated out.