The loop before everything else
Every coding agent runs the same skeleton. The model receives a list of messages and a list of tool descriptions. It responds with either a final answer or one or more tool calls. If it calls tools, the host process executes them, appends the results to the message list, and calls the model again. The cycle continues until the model stops calling tools.
Here is that skeleton in Python using the Anthropic SDK:
```python
import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user", "content": task}]

while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=8096,
        tools=tool_schemas,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = dispatch(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    messages.append({"role": "user", "content": tool_results})
```
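The skeleton leaves `dispatch` to the host process. A minimal sketch of what it might look like, with a hypothetical `read_file` handler wired into a name-to-function registry (the handler, registry, and error format are illustrative, not any provider's API):

```python
from pathlib import Path

def read_file(file_path: str, offset: int = 0, limit: int = 2000) -> str:
    # Hypothetical handler: return a slice of the file's lines.
    lines = Path(file_path).read_text().splitlines()
    return "\n".join(lines[offset:offset + limit])

# Registry mapping tool names (as they appear in the schema) to handlers.
TOOL_HANDLERS = {
    "read_file": read_file,
}

def dispatch(name: str, tool_input: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Returning the error as text lets the model see it and recover
        # on the next turn instead of crashing the loop.
        return f"Error: unknown tool {name!r}"
    try:
        return handler(**tool_input)
    except Exception as exc:
        return f"Error: {exc}"
```

Returning errors as tool-result text rather than raising keeps the loop alive: the model reads the failure and tries again.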
The OpenAI version uses finish_reason: "tool_calls" and a tool_calls array; Gemini uses functionCall parts inside a STOP response. The loop logic is identical across all three providers. What differs between a minimal demo and a production coding agent is everything surrounding that loop: which tools exist, how they are described, and how the message list is managed as it grows.
What the tool schema is
Each tool is described as a JSON Schema object. For the Anthropic API:
```json
{
  "name": "read_file",
  "description": "Read the contents of a file. Prefer this over bash cat. Returns truncated output for large files.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Absolute path to the file."
      },
      "offset": {
        "type": "integer",
        "description": "Line number to start reading from."
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of lines to return."
      }
    },
    "required": ["file_path"]
  }
}
```
The description fields are not documentation for the developer; they are part of the model’s effective prompt. The sentence “Prefer this over bash cat” in that description changes model behavior. The description field is where you encode usage policy, safety constraints, and behavioral nudges. Claude Code’s published system prompt instructs the model to prefer dedicated tools over shell equivalents, and that instruction appears both in the system prompt prose and in the individual tool descriptions. Both layers reinforce each other.
Simon Willison’s guide to agentic engineering patterns frames the tool schema as the primary design surface of a coding agent, and that framing is correct. The loop itself is five lines of code. The design work lives in the schema.
The standard toolkit
Most coding agents converge on the same set of primitives:
- Read, write, and edit tools for file system access
- Glob for pattern-based file discovery
- Grep for content search
- Bash for everything else: running tests, compiling, checking git status
The Edit (or str_replace) tool deserves specific attention because it is a more principled design than write_file for most modifications. Rather than sending the full file content back to the model and having it reproduce everything with changes, Edit takes an old_string and a new_string. The model identifies and restates only the exact lines being changed. Context budget is preserved, and the tool fails if old_string is not found or not unique in the file. That failure mode is a useful safety constraint: if the model’s mental model of the file is stale, the edit fails rather than silently corrupting the content.
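The core check can be sketched in a few lines, assuming the tool operates on file content as a string (the function name and error messages are illustrative):

```python
def apply_edit(content: str, old_string: str, new_string: str) -> str:
    # Fail unless old_string appears exactly once, so a stale mental
    # model of the file cannot silently corrupt it.
    count = content.count(old_string)
    if count == 0:
        raise ValueError("old_string not found; the file may have changed")
    if count > 1:
        raise ValueError(
            f"old_string is not unique ({count} matches); "
            "include more surrounding context"
        )
    return content.replace(old_string, new_string, 1)
```

The non-unique error is as important as the not-found error: it pushes the model to quote enough surrounding lines to pin down a single location.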
Aider takes the opposite architectural choice. It does not use structured tool calls. The model emits a custom text format:
```
src/utils.py
<<<<<<< SEARCH
def parse_date(s):
    return datetime.strptime(s, "%Y-%m-%d")
=======
def parse_date(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d")
>>>>>>> REPLACE
```
Aider’s host parses these blocks with regex and applies edits using Python’s difflib. The benefit is portability: this approach works with any instruction-following model, including local models via Ollama, because it depends only on the model’s ability to follow a text formatting convention. The cost is that parsing is fragile when models produce malformed blocks. Structured tool calls push parsing responsibility onto the model’s native JSON generation, which is more reliable in frontier models.
The Bash tool and its design tradeoffs
Bash is the highest-leverage tool in a coding agent and the highest-risk. Claude Code runs shell commands in a persistent session, so state accumulates across calls within a conversation: environment variables, the current directory, background processes. The default timeout is two minutes; the maximum is ten. Stdout, stderr, and exit code are returned as separate fields.
The persistence is convenient but creates subtle hazards. If an early tool call changes the working directory and the model fails to track it, subsequent relative paths resolve against the wrong location. Claude Code addresses this by instructing the model to use absolute paths wherever possible. Other agents solve the problem by running each command in a fresh subprocess with the working directory set explicitly from the conversation state.
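The fresh-subprocess approach can be sketched as follows, with the working directory passed explicitly on every call (the function name and return shape are illustrative):

```python
import subprocess

def run_command(command: str, cwd: str, timeout: int = 120) -> dict:
    # A fresh shell per call: no environment variables, cd, or background
    # state survives between invocations. The working directory comes from
    # conversation state, not from whatever the last command left behind.
    proc = subprocess.run(
        command,
        shell=True,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Stdout, stderr, and exit code as separate fields, mirroring the
    # structure described above.
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
```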
Permission gating matters here. Claude Code gates side-effecting tools behind an interactive prompt by default. The model must request permission before running shell commands that were not pre-approved in the session’s allow list. Cursor’s agent mode gates terminal commands similarly. This is the right default: the blast radius of certain shell commands is large and irreversible, and the user should see what is about to run before it does.
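An allow-list gate can be sketched as follows; the glob-style pattern matching is an assumption here, since real agents define their own matching rules:

```python
import fnmatch
from typing import Callable

def is_allowed(command: str, allow_list: list[str]) -> bool:
    # A command runs without prompting only if it matches a
    # pre-approved pattern from the session's allow list.
    return any(fnmatch.fnmatch(command, pattern) for pattern in allow_list)

def gate(command: str, allow_list: list[str], confirm: Callable[[str], bool]) -> bool:
    # confirm() shows the user exactly what is about to run and
    # returns their decision; pre-approved commands skip the prompt.
    if is_allowed(command, allow_list):
        return True
    return confirm(command)
```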
Context window management
The context window is the agent’s only working memory. Every message, every tool call, every tool result accumulates in the message list. On a long task, this fills the window.
Three strategies are common:

1. Truncation with summarization: the agent summarizes early turns into a compact block and drops the originals, preserving task state in condensed form. Claude Code’s /compact command does this on demand.
2. Selective tool results: rather than returning full file contents from every read call, return only the relevant section using offset and limit parameters. Claude Code’s Glob and Grep tools return file paths by default; the agent reads files selectively on a second pass.
3. Subagents: spawn a new API call with a fresh context window for a bounded subtask, and return the result as a single tool result to the parent. This parallelizes work and keeps subtask detail from exhausting the parent context.
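The first strategy can be sketched as a pure function over the message list, with `summarize` standing in for a model call that condenses the old turns (the summary framing and cutoff heuristic are assumptions):

```python
def compact(messages: list[dict], summarize, keep_recent: int = 6) -> list[dict]:
    # Collapse everything except the most recent turns into one
    # summary block; summarize() is assumed to be a model call.
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    compacted = {"role": "user", "content": f"[Conversation summary]\n{summary}"}
    return [compacted] + recent
```

A real implementation would trigger on token count rather than message count, and would cut at a turn boundary so tool_use blocks stay paired with their tool_result blocks.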
Context seeding, meaning what goes into the system prompt before the conversation starts, determines the agent’s ground state. Claude Code’s system prompt is roughly 10,000 tokens. It includes tool descriptions, behavioral rules, and a CLAUDE.md injection mechanism: if a CLAUDE.md file exists in the project root, its contents are included verbatim. This is the mechanism for per-project customization: coding conventions, architectural constraints, and environment configuration are injected at the system level rather than repeated in every user message.
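The injection itself is simple; a sketch, assuming the file is appended verbatim to the base system prompt (the header line is illustrative):

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, project_root: str) -> str:
    # If CLAUDE.md exists in the project root, include its contents
    # verbatim so project conventions ride along on every API call.
    claude_md = Path(project_root) / "CLAUDE.md"
    if claude_md.exists():
        return (
            base_prompt
            + "\n\n# Project instructions (CLAUDE.md)\n"
            + claude_md.read_text()
        )
    return base_prompt
```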
How the differences show up in practice
Cursor adds a semantic layer that purely file-based agents lack. It maintains a Language Server Protocol connection to the open project, so the model can receive type errors, symbol definitions, and cross-reference graphs as structured tool results rather than raw file content. When Cursor tells the model that parseDate is referenced in fourteen files, that information is precise; a grep-based agent would need to run the search and parse the output itself.
GitHub Copilot’s agent mode builds a vector index of the workspace incrementally and injects top-K relevant chunks into context automatically on each turn. The agent does not need to call Grep to find related code; it arrives pre-loaded. The tradeoff is that retrieved chunks are approximate, based on embedding similarity, rather than exact, based on the model’s own search queries.
Aider’s architect mode separates planning from execution using two different models. A reasoning-capable model produces the plan in text. A faster model produces the SEARCH/REPLACE blocks. The planner does not write code; the coder does not plan. This separation reduces cognitive load on each model and lets you optimize cost independently at each stage, which matters when running many sessions.
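The split can be sketched as a two-stage pipeline, with both model functions standing in for API calls to two differently priced models (the prompts and signatures here are illustrative, not Aider's implementation):

```python
def architect_edit(task: str, plan_model, code_model) -> str:
    # Stage one: a reasoning-capable model plans in prose only.
    plan = plan_model(f"Describe the change to make, in prose only:\n{task}")
    # Stage two: a faster, cheaper model turns the plan into
    # SEARCH/REPLACE blocks; it never has to plan, only translate.
    return code_model(f"Emit SEARCH/REPLACE blocks implementing this plan:\n{plan}")
```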
Where the complexity lives
The agentic loop is not the hard part. The hard part is the combination of context budget pressure, tool schema design, permission semantics, and the unpredictable way model behavior shifts as context fills. A coding agent that works on a fresh 50-token task may fail on a 30,000-token task not because the loop broke but because the model started making different choices under long-context conditions.
The practical implication for anyone building on top of these systems, whether extending Claude Code with custom tools via the Model Context Protocol or building a new agent from scratch, is that the tool schema and context management strategy are the primary engineering decisions. Getting the description text right, choosing where to truncate tool output, deciding which operations require user confirmation before execution: these are what determine whether an agent is reliable in practice.
Tool count also has a ceiling. Claude Code ships roughly fifteen tools. Past around twenty, model tool-selection accuracy degrades because more schema tokens compete for attention in the same context window. Fewer, well-designed tools with precise descriptions outperform larger tool sets with vague ones. The impulse when adding capability is to add a new tool, but extending an existing tool’s description often produces better results with less overhead.