
The Coordination Tax: Knowing When a Single Agent Is Enough

Source: simonwillison

Spawning a subagent is not free. Every delegation to a fresh context window carries a cost: an additional API call, a context window bootstrapped from scratch, and an orchestrator that must integrate a result without having seen the subagent’s intermediate reasoning. Simon Willison’s guide on agentic engineering patterns lays out the mechanics of how subagents work and what the minimal footprint principle demands. What it does not give you is a decision procedure for when the overhead is justified versus when a single agent would handle the job more cleanly, and that gap is where many multi-agent systems accumulate unnecessary complexity.

The literature on multi-agent systems spends most of its time on how to build them. The implicit frame is that more agents equals more capability. In practice, introducing subagents into a system that would have worked as a single agent adds coordination tax without proportional payoff: more failure modes, harder debugging, and context that gets compressed at every boundary crossing.

The Context Window Arithmetic

A single-agent loop becomes impractical when a task requires reading more source material than fits comfortably in one context window. The numbers here are concrete enough to reason about.

Claude’s context window is 200,000 tokens. In a typical coding agent, this budget breaks down as follows: the system prompt and tool definitions consume 2,000 to 5,000 tokens. Tool call history grows by 200 to 500 tokens per completed round-trip. File reads consume roughly 750 to 1,000 tokens per kilobyte of source code. A 300-line Python file is roughly 4,000 to 6,000 tokens.

If a task requires reading 20 files of that size, the file content alone consumes 80,000 to 120,000 tokens before system prompt, reasoning, and accumulated tool history are counted. At this scale, an agent operating in a single context window will exhaust its budget before finishing, often without a clean signal that context pressure is affecting its reasoning. It might handle 14 files coherently and then start producing degraded output as it loses effective access to observations from the early rounds.
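Plugging in midpoints of those ranges makes the pressure concrete. A back-of-the-envelope sketch; the per-item costs are illustrative estimates from the ranges above, not measurements:

```python
# Rough context budget for the 20-file task described above.
CONTEXT_WINDOW = 200_000

system_and_tools = 5_000       # upper end of the 2k-5k range
tokens_per_file = 5_000        # ~300-line Python file (4k-6k range)
files_to_read = 20
round_trips = 40               # assume ~2 tool calls per file
history_per_round_trip = 350   # midpoint of the 200-500 range

used = (system_and_tools
        + files_to_read * tokens_per_file
        + round_trips * history_per_round_trip)

print(f"consumed: {used:,} tokens")                       # consumed: 119,000 tokens
print(f"left for reasoning: {CONTEXT_WINDOW - used:,}")   # left for reasoning: 81,000
```

Note that file content dominates: the 100,000 tokens of reads dwarf the system prompt and tool history combined.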

Subagents address this by distributing file reads across isolated context windows. Each subagent handles a bounded scope, finishes, and returns a compact result. The orchestrator accumulates task outcomes rather than raw content, keeping its context manageable.

The practical threshold for introducing subagents on context grounds is roughly 50,000 to 80,000 tokens of source material input, which is approximately 10 to 15 medium-sized source files. Below that threshold, a single agent can usually work through the task without meaningful context pressure. Above it, context-driven degradation becomes likely enough to justify decomposition.
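That threshold can be turned into a quick pre-flight check. A sketch, using the midpoint of the tokens-per-kilobyte estimate above; the function names and constant are illustrative, not from any SDK:

```python
TOKENS_PER_KB = 875  # midpoint of the 750-1,000 tokens/KB estimate

def estimate_source_tokens(file_sizes_kb: list[float]) -> int:
    """Rough token cost of reading these files into context."""
    return int(sum(file_sizes_kb) * TOKENS_PER_KB)

def should_decompose(file_sizes_kb: list[float], threshold: int = 80_000) -> bool:
    """True when input material alone exceeds the single-agent threshold.
    Pass threshold=50_000 for a more conservative cutoff."""
    return estimate_source_tokens(file_sizes_kb) > threshold

# 12 medium files (~4 KB each): single-agent territory
print(should_decompose([4.0] * 12))   # False (42,000 tokens)
# 25 such files: decompose
print(should_decompose([4.0] * 25))   # True (87,500 tokens)
```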

The Parallelism Assessment

The second case for subagents is parallelism: tasks that are structurally independent can run concurrently in separate context windows, reducing total wall-clock time.

The key constraint is shared write targets. Two subagents that need to write to the same file are not independent, regardless of how their tasks are framed. If subagent A updates a shared type definition in types.py and subagent B simultaneously modifies a module that imports from types.py with the old signature, the parent receives two results that cannot both be applied without conflict. This is a task decomposition failure, not a model failure.

Before committing to parallel subagents, build the tool access matrix: for each proposed subagent, list the files it needs to write. Any file that appears in two or more subagent write lists is a shared dependency, and the tasks sharing that dependency must be sequenced. The subagent that owns the shared file runs first; the others receive its output as part of their context.
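This check is mechanical enough to script. A sketch of the shared-dependency detection; the function name and plan format are my own, not from any framework:

```python
from collections import defaultdict

def shared_write_targets(plan: dict[str, set[str]]) -> dict[str, list[str]]:
    """Given each proposed subagent's write list, return every file claimed
    by two or more subagents, mapped to the subagents that claim it."""
    claimants: dict[str, list[str]] = defaultdict(list)
    for agent, files in plan.items():
        for path in sorted(files):
            claimants[path].append(agent)
    return {path: agents for path, agents in claimants.items() if len(agents) > 1}

plan = {
    "agent_a": {"types.py", "service_a.py"},
    "agent_b": {"types.py", "service_b.py"},
    "agent_c": {"service_c.py"},
}
print(shared_write_targets(plan))  # {'types.py': ['agent_a', 'agent_b']}
```

Any non-empty result means those subagents must be sequenced, with the owner of the shared file running first.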

Many tasks that appear parallelizable have hidden shared dependencies. A “refactor 10 service handlers” task looks parallel until you check whether all 10 handlers import from a common utility module that also needs updating. The parallelism analysis is quick to do on paper and expensive to discover at runtime when you receive conflicting results from two subagents that both modified callers of a type signature that only one of them actually updated.

The Specialization Case

A third justification for subagents is when different phases of a task require genuinely different tool sets. An analysis phase that needs only read access and a different system prompt from an implementation phase that needs write access and test execution is a natural subagent boundary.

This is the minimal footprint principle from the other direction. Minimal footprint says: assign each subagent only the tools it needs. The corollary is: when two phases of a task need different tools, that boundary is a natural split point. You get the security benefit of tool isolation and the organizational benefit of cleaner task scope.

In the Anthropic Python SDK, this looks like nested agent loops, each with a restricted tool list and a specialized system prompt:

# Assumes run_agent_loop(system, task, tools) drives the Claude tool-use loop
# to completion, and that the *_tool objects are pre-built tool definitions.
def run_analysis_subagent(codebase_paths: list[str]) -> dict:
    tools = [read_file_tool, glob_tool, grep_tool]  # read-only
    system = "You are a code analysis agent. Identify all callers of the deprecated API."
    task = f"Analyze these paths: {codebase_paths}"
    return run_agent_loop(system, task, tools)

def run_implementation_subagent(analysis: dict, target_files: list[str]) -> dict:
    tools = [read_file_tool, write_file_tool, run_tests_tool]  # scoped write
    system = "You are an implementation agent. Update the code per the provided analysis."
    task = f"Update {target_files} based on this analysis: {analysis}"
    return run_agent_loop(system, task, tools)

The split is justified here: the analysis subagent cannot write files even if injected instructions tell it to, and the implementation subagent has no reason to be running broad search operations. Each tool list is an accurate statement of what that phase of work actually requires.

The Debugging Cost

None of the justifications above apply to short, self-contained tasks. A bug fix in a well-understood function, a one-file refactor, a targeted API update across two related files: these do not benefit from subagent delegation. The overhead adds an API round-trip and a fresh context to bootstrap. The orchestrator’s context is consumed by the tool call and result. The subagent’s intermediate reasoning is invisible, making debugging harder if something goes wrong.

The debugging cost is the most underestimated part of the coordination tax. A single-agent failure has one context window to inspect: you can see exactly what the model read, what it decided, and where it went off course. A multi-agent failure requires traversing multiple context windows that were never connected by shared state. A subagent that received ambiguous instructions and produced the wrong result looks, from the orchestrator’s perspective, like a tool result with unexpected content. Tracing back to understand why the subagent made the decision it did requires having logged that subagent’s full context at invocation time.
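One mitigation, short of adopting a full tracing tool, is to persist each subagent's complete input context at invocation time. A sketch of a logging wrapper; `run_fn` stands in for a helper like `run_agent_loop` above, and the log format is an assumption, not a standard:

```python
import json
import os
import time
import uuid

def invoke_subagent_logged(run_fn, system: str, task: str, tools: list,
                           log_dir: str = "agent_logs") -> tuple[dict, str]:
    """Run a subagent, persisting its full input context and final result
    so a bad result can be traced back to what the subagent actually saw."""
    os.makedirs(log_dir, exist_ok=True)
    call_id = uuid.uuid4().hex[:8]  # correlation id for this invocation
    record = {
        "call_id": call_id,
        "invoked_at": time.time(),
        "system": system,
        "task": task,
        "tools": [getattr(t, "name", str(t)) for t in tools],
    }
    result = run_fn(system, task, tools)
    record["result"] = result
    with open(os.path.join(log_dir, f"{call_id}.json"), "w") as f:
        json.dump(record, f, indent=2)
    return result, call_id
```

When a tool result comes back wrong, the correlation id points you at the exact system prompt, task framing, and tool list the subagent received.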

LangSmith and similar tracing tools address this by maintaining a trace tree across agent invocations with correlation identifiers. Without equivalent instrumentation, a two-level agent system is already substantially harder to debug than a single-agent loop, and difficulty scales with depth. A three-level hierarchy where the orchestrator sees only coordinator summaries and the coordinator saw only worker outputs creates two compression points between the root cause and the place where you are looking.

A Decision Procedure

The decision to introduce subagents should follow from observable constraints, not from a preference for more sophisticated architecture.

Introduce subagents when the task requires reading more material than fits in a single context window without degradation. The rough threshold is 50,000 to 80,000 tokens of input material, which is 10 to 15 medium source files, before accounting for tool history overhead.

Introduce subagents when the task has multiple genuinely independent subtasks, meaning no shared write targets and no producer-consumer dependencies between them. Verify this by building the write-access matrix before decomposing. If every subagent needs write access to the same central file, the decomposition is wrong.

Introduce subagents when different phases require genuinely different tool sets or model configurations. The tool boundary is the natural subagent boundary.
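The three positive criteria collapse into a short checklist. A sketch; the function and signal names are mine, and the threshold is the one argued above:

```python
def subagent_justifications(input_tokens: int,
                            shared_write_conflicts: bool,
                            independent_subtasks: int,
                            distinct_tool_phases: bool) -> list[str]:
    """Return every justification that applies; an empty list means
    stay with a single agent."""
    reasons = []
    if input_tokens > 80_000:
        reasons.append("context: input exceeds the single-window threshold")
    if independent_subtasks >= 2 and not shared_write_conflicts:
        reasons.append("parallelism: independent subtasks, disjoint writes")
    if distinct_tool_phases:
        reasons.append("specialization: phases need different tool sets")
    return reasons

# A one-file refactor: no justification applies, stay single-agent.
print(subagent_justifications(6_000, False, 1, False))  # []
```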

Do not introduce subagents to make the system resemble a team of specialized workers. That framing applies organizational intuition to a technical system where model constraints, not human role specialization, define the actual decomposition points. Start with a single agent, run it, observe where it fails or stalls, and introduce delegation at those specific points. The failure modes of the single agent tell you where the real subagent boundaries are far more accurately than upfront architecture does.

Building a multi-agent system is mostly a matter of building a good single-agent loop and then deciding, based on evidence, where it actually needs help.
