The Execution Layer That Agent APIs Have Always Needed

Back in March 2025, OpenAI shipped the Responses API as a more capable successor to Chat Completions. The headline features were built-in tools, a unified response object, and first-class multi-turn state via previous_response_id. It was a cleaner developer experience, but it was still fundamentally a model endpoint with some extra capabilities bolted on. The model could search the web, search uploaded files, and, through the computer use preview, drive a screen by issuing clicks and keystrokes, but it still had no persistent place to actually do work.

The announcement published on March 11, 2026, completes that picture. The Responses API now has a hosted container environment and a shell tool, giving agents a real execution surface with persistent file state, arbitrary command execution, and the infrastructure scaffolding to support genuinely stateful agent loops. It is worth looking at what that actually means in architectural terms.

The Gap Between Tool Calling and Agent Infrastructure

Function calling, which goes back to mid-2023 in OpenAI’s API, gave models a structured way to request external actions. You define a schema, the model emits a structured tool call, and your code handles the execution and feeds the result back. This works well for bounded operations: look something up, call an API, query a database.
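
A minimal sketch of that round trip, with the model's side stubbed out. The tool definition follows the Responses API's function-tool shape, and the `function_call` / `function_call_output` items mirror how that API represents a call and its result; the `lookup_order` tool, its data, and the hardcoded call are hypothetical stand-ins.

```python
import json

# Function-tool definition in the Responses API shape.
# "lookup_order" and its backing data are hypothetical, for illustration only.
LOOKUP_ORDER_TOOL = {
    "type": "function",
    "name": "lookup_order",
    "description": "Return the status of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

# Stand-in for a real database or external API.
_ORDERS = {"A-100": "shipped", "A-101": "processing"}


def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": _ORDERS.get(order_id, "unknown")}


HANDLERS = {"lookup_order": lookup_order}


def handle_function_call(call: dict) -> dict:
    """Execute one model-emitted function call and build the item to send back."""
    handler = HANDLERS[call["name"]]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    result = handler(**args)
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],
        "output": json.dumps(result),
    }


# A hardcoded call standing in for what the model would emit.
fake_call = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "lookup_order",
    "arguments": '{"order_id": "A-100"}',
}
print(handle_function_call(fake_call))
```

In a live integration, the returned item is passed as input to the next request so the model can read the result and continue.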

What it does not handle well is the kind of work a coding agent actually needs to do: write a file, run tests against it, read the output, patch the file, run the tests again. This loop involves stateful side effects across multiple turns. In the function calling model, you own all of that state. You persist the file, you run the subprocess, you format the stdout back into the context. For a one-off tool call, that is fine. For an agent loop running fifty turns on a non-trivial task, it becomes the majority of the engineering work.
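
That ownership is easier to see in code. Below is a minimal local sketch of the harness a developer writes when the model only emits tool calls. The `Workspace` class and the buggy `mathx.py` module are hypothetical; a temporary directory and `subprocess` stand in for whatever execution environment a real deployment would use, and the "model" turns are hardcoded.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class Workspace:
    """State the developer owns in a function-calling loop: files plus command output."""

    def __init__(self) -> None:
        self._tmp = tempfile.TemporaryDirectory()
        self.root = Path(self._tmp.name)

    def write_file(self, name: str, content: str) -> None:
        (self.root / name).write_text(content)

    def run(self, *args: str, timeout: float = 30.0) -> str:
        """Run a command in the workspace and format the result for the model's context."""
        proc = subprocess.run(
            list(args), cwd=self.root, capture_output=True, text=True, timeout=timeout
        )
        return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"

    def close(self) -> None:
        self._tmp.cleanup()


ws = Workspace()
# Turn 1: the "model" writes a buggy function and a test for it.
ws.write_file("mathx.py", "def add(a, b):\n    return a - b\n")
ws.write_file("test_mathx.py", "from mathx import add\nassert add(2, 3) == 5, add(2, 3)\nprint('ok')\n")
# -B skips .pyc caching, so a same-size patch written within the same second
# is not masked by a stale bytecode cache on the re-run.
first = ws.run(sys.executable, "-B", "test_mathx.py")
# Turn 2: it reads the failure and patches the file; the test file is still on disk.
ws.write_file("mathx.py", "def add(a, b):\n    return a + b\n")
second = ws.run(sys.executable, "-B", "test_mathx.py")
print(first.splitlines()[0], "->", second.splitlines()[0])
ws.close()
```

Every piece of state here (the directory, the files, the formatted output) is something the harness has to create, carry between turns, and clean up.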

The Assistants API had a Code Interpreter tool that solved a narrow version of this problem: sandboxed Python execution with file I/O. Developers used it heavily, but it had significant constraints. You could not install packages beyond what was pre-installed, you could not run arbitrary shell commands, and you could not do things like commit to a git repository or invoke build tools. It was a Python notebook, not an execution environment.

Hosted containers with a shell tool are the general solution to the same problem.

What the Container Model Provides

The shell tool in the Responses API executes commands inside a hosted container that OpenAI manages. When an agent turn starts, the model has access to a file system it can read and write, a shell it can invoke, and state that persists across the tool calls within a session. From the model’s perspective, it is operating inside a machine.

A simplified interaction looks like this:

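from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Simplified sketch: the exact shell-tool configuration for hosted containers,
# and which models support it, should be checked against the current API reference.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "shell"}],
    input=(
        "Create a small sample CSV, write a Python script that parses it, "
        "run the script, and show me the output."
    ),
)

# The output list interleaves tool-call items with the final message;
# output_text concatenates only the text the model returned.
for item in response.output:
    print(item.type)
print(response.output_text)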

The model can write the script to disk, invoke the Python interpreter, and return the stdout to the caller, all within a single response. The file it wrote persists in the container for the duration of the session. On a subsequent turn, it can modify that file, re-run it, or reference it in a different tool call.

This is the architecture that agentic coding tasks actually require. The model is not just generating text that describes what code to run. It is writing code, running it, observing the result, and iterating. The state of the execution environment is part of the agent’s working memory.

Security and Isolation

Hosted containers introduce a trust boundary that function calling sidesteps entirely. With function calling, you control what code runs. You validate inputs, you scope permissions, you handle errors. With a shell tool in a hosted container, the model is running arbitrary commands on OpenAI’s infrastructure.

The security model here matters. Each container is isolated per session, with no network access to OpenAI’s internal systems and constrained egress to the public internet depending on configuration. The container is ephemeral unless the session specifically provisions persistent storage. Commands cannot escape the container boundary in ways that affect other users’ sessions.
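
The announcement does not describe how that boundary is enforced, and in any real system it comes from kernel- or VM-level isolation rather than application code. As a loose local analogy for one piece of the idea, a file tool that confines each session to its own root can refuse any path that resolves outside it. `resolve_in_session` is a hypothetical helper, and a path check like this is no substitute for OS-level isolation.

```python
import tempfile
from pathlib import Path


def resolve_in_session(session_root: Path, requested: str) -> Path:
    """Resolve a requested path, rejecting anything that escapes the session root.

    Existing symlinks and '..' segments are resolved before the check;
    absolute paths are rejected unless they land inside the root.
    """
    root = session_root.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes session root: {requested}")
    return candidate


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    session_a, session_b = Path(a), Path(b)
    print(resolve_in_session(session_a, "src/main.py").name)  # main.py
    try:
        # Attempt to reach another session's directory via '..'
        resolve_in_session(session_a, f"../{session_b.name}/secret.txt")
    except PermissionError as exc:
        print("blocked:", exc)
```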

This is broadly analogous to what services like E2B and Modal have been offering as standalone products: sandboxed, on-demand execution environments for AI agents. The difference is that OpenAI is embedding this directly into the API, removing the need to provision, authenticate, and wire together a separate service. For many developers, that integration is worth the trade-off in flexibility.

The trade-off is real. A self-hosted approach using E2B or a custom Docker setup gives you full control over the execution environment. You can install any package, configure network access precisely, mount secrets, and tune resource limits. OpenAI’s hosted containers will have constraints on what is pre-installed and what network access is permitted. That is probably fine for the majority of coding tasks, but it matters for anything that depends on specialized toolchains or external service access.
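
For comparison, here is roughly what that control looks like in a self-hosted setup: a sketch that builds a `docker run` invocation with networking disabled and explicit resource limits. The image name, host path, and limit values are placeholders rather than recommendations; the flags themselves are standard Docker CLI options. The function only builds the argument list, so it runs without Docker installed.

```python
import shlex


def sandbox_command(
    image: str,
    workdir_host: str,
    command: str,
    *,
    network: str = "none",
    memory: str = "1g",
    cpus: str = "1.0",
    pids_limit: int = 256,
) -> list:
    """Build a `docker run` argv for a throwaway, resource-limited agent sandbox."""
    return [
        "docker", "run",
        "--rm",                              # remove the container when the command exits
        f"--network={network}",              # "none" disables networking entirely
        f"--memory={memory}",                # hard memory cap
        f"--cpus={cpus}",                    # CPU quota
        f"--pids-limit={pids_limit}",        # guard against fork bombs
        "--read-only",                       # read-only root filesystem...
        "--tmpfs", "/tmp",                   # ...with a writable scratch area
        "-v", f"{workdir_host}:/workspace",  # the agent's persistent files
        "-w", "/workspace",
        image,
        "sh", "-c", command,
    ]


argv = sandbox_command("python:3.12-slim", "/srv/agent/session-42", "python -m pytest -q")
print(shlex.join(argv))
# subprocess.run(argv, capture_output=True, text=True) would execute it.
```

Each of those knobs is a decision the hosted service makes for you; self-hosting means making (and maintaining) them yourself.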

How State Flows Through the API

The Responses API’s previous_response_id mechanism handles conversational context across turns. Container state is a separate dimension: the files on disk, the processes that ran, the environment variables that were set. These persist in the container itself, not through the response chain.

This separation is important. The conversation context (what was said, what tools were called, what results came back) lives in the response chain and is managed by the API. The execution state (what files exist, what was installed, what output is on disk) lives in the container and is managed by the container lifecycle. Keeping these cleanly separated makes the system easier to reason about, even if it means there are two kinds of state to track.

From a developer integration standpoint, this also means you can resume a session’s execution state without necessarily replaying the entire conversation. If you know the session ID, you can issue a new response against the same container and the file system will be where you left it.
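
One way to keep the two kinds of state straight in client code is to track them separately. The sketch below only builds request payloads and sends nothing. `previous_response_id` is the real Responses API parameter for chaining turns, but the `container` field on the shell tool, the `MODEL_NAME` placeholder, and the `resp_`/`cntr_` identifiers are hypothetical stand-ins for however the hosted shell tool actually references an existing container, which should be checked against the current API reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSession:
    # Conversation state: lives in the API's response chain.
    last_response_id: Optional[str] = None
    # Execution state: lives in the hosted container (hypothetical identifier).
    container_id: Optional[str] = None

    def next_request(self, user_input: str, *, fresh_context: bool = False) -> dict:
        """Build the payload for the next turn.

        fresh_context=True drops the conversation chain but keeps the container,
        so the model starts a new conversation against the same file system.
        """
        shell_tool: dict = {"type": "shell"}
        if self.container_id:
            shell_tool["container"] = self.container_id  # hypothetical field name
        payload: dict = {"model": "MODEL_NAME", "input": user_input, "tools": [shell_tool]}
        if self.last_response_id and not fresh_context:
            payload["previous_response_id"] = self.last_response_id
        return payload


session = AgentSession(last_response_id="resp_001", container_id="cntr_abc")
print(session.next_request("Re-run the tests."))
print(session.next_request("Summarize what is in the working directory.", fresh_context=True))
```

The design choice is simply that neither identifier implies the other: dropping the conversation chain leaves the container reference untouched, and vice versa.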

The DIY Alternative and When It Still Makes Sense

Before hosted containers, the standard approach to giving an agent a real execution environment looked like this: provision a container yourself (or use E2B/Modal), define a function calling schema for shell execution and file operations, handle the execution on your side, format results back into the context, and manage session lifecycle yourself. There is a lot of plumbing in that description.
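
Compressed into code, the core of that plumbing is a function-tool schema for shell execution plus a handler that runs the command and packages the result. This is a minimal sketch: `run_shell`, the 30-second timeout, and the 4,000-character cap are hypothetical choices, and here the command runs on the local machine, which in practice would be a container you provision rather than your application host.

```python
import json
import subprocess

# Hypothetical function-tool definition the model would call instead of a hosted shell.
RUN_SHELL_TOOL = {
    "type": "function",
    "name": "run_shell",
    "description": "Run a shell command in the agent workspace and return its output.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
        "additionalProperties": False,
    },
}

MAX_OUTPUT_CHARS = 4000  # keep long logs from flooding the context window


def truncate(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the head and tail of long output, which usually hold the useful parts."""
    if len(text) <= limit:
        return text
    half = limit // 2
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n...[{omitted} chars omitted]...\n{text[-half:]}"


def handle_run_shell(call_id: str, arguments: str, cwd: str = ".", timeout: float = 30.0) -> dict:
    command = json.loads(arguments)["command"]
    try:
        # shell=True is intentional: running shell commands is the tool's whole job.
        proc = subprocess.run(
            command, shell=True, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        result = {
            "exit_code": proc.returncode,
            "stdout": truncate(proc.stdout),
            "stderr": truncate(proc.stderr),
        }
    except subprocess.TimeoutExpired:
        result = {"exit_code": None, "error": f"timed out after {timeout}s"}
    return {"type": "function_call_output", "call_id": call_id, "output": json.dumps(result)}


print(handle_run_shell("call_1", json.dumps({"command": "echo hello"})))
```

Session lifecycle, retries, and provisioning the container itself still sit outside this snippet, which is the point of the paragraph above: the handler is the easy part.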

For teams already running infrastructure, this is not prohibitive. If you already have containers, secrets management, and monitoring in place, adding an agent execution environment is incremental work. You also get full observability into what the model is running, which matters for production deployments where audit trails are required.
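
On the audit-trail point, owning the execution path means every command can be recorded as it runs. A minimal sketch, assuming a JSON Lines file is an acceptable audit sink; `audited_run` and its record fields are hypothetical, and a production system would more likely ship these records to existing logging infrastructure.

```python
import json
import subprocess
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


def audited_run(command: str, session_id: str, log_path: Path, timeout: float = 30.0):
    """Run a model-requested command and append one JSON Lines audit record.

    Returns (record, stdout) so the caller can still feed the output back to the model.
    """
    started_at = datetime.now(timezone.utc).isoformat()
    t0 = time.monotonic()
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        exit_code, stdout, stderr, timed_out = proc.returncode, proc.stdout, proc.stderr, False
    except subprocess.TimeoutExpired:
        exit_code, stdout, stderr, timed_out = None, "", "", True
    record = {
        "session_id": session_id,
        "started_at": started_at,
        "command": command,
        "exit_code": exit_code,
        "timed_out": timed_out,
        "duration_s": round(time.monotonic() - t0, 3),
        "stdout_bytes": len(stdout.encode()),
        "stderr_bytes": len(stderr.encode()),
    }
    # Append-only log: one record per command, written even when the command fails.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record, stdout


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "audit.jsonl"
    audited_run("echo hi", "session-42", log)
    audited_run("false", "session-42", log)
    for line in log.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        print(entry["command"], entry["exit_code"])
```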

Hosted containers are better for smaller teams or prototyping work where the infrastructure overhead is the main friction. They are also better for use cases where OpenAI’s security guarantees are sufficient and the pre-installed toolchain covers the task. A lot of software engineering work falls into that category.

The choice is less about capability and more about where you want the complexity to live. OpenAI’s hosted approach absorbs the container management complexity in exchange for reduced flexibility and vendor coupling. That is a reasonable trade for many use cases and a poor trade for others.

A Platform, Not Just an API

What the Responses API has become over the past year is a platform for building agents, not just an endpoint for calling a model. The web search tool, the file search tool, the computer use preview, and now the shell tool and hosted containers are all components of a runtime environment. You write the agent loop logic; the API provides the compute substrate.

This is a meaningful shift in how OpenAI is positioning the API. Compare it to how Anthropic’s computer use API works: Anthropic provides the tool interface and the model; you provide the actual computer being controlled. The execution environment stays on your side. OpenAI is moving in the other direction, absorbing the execution environment into the hosted service.

Neither approach is universally correct. Anthropic’s model gives developers full control and makes the trust boundary obvious. OpenAI’s model reduces friction and gives the API a more coherent story around what agents can do without external dependencies.

For developers building agents that do software engineering tasks, the shell tool and hosted containers are a substantial improvement over what was available in the Assistants API. The Code Interpreter was useful but constrained. A real shell, persistent file state, and a sensibly scoped security model together make up the execution environment that agentic coding actually requires. That it is now integrated directly into the Responses API, without a separate service to provision, is the part that will matter most to people who have tried to build this plumbing themselves.