
The Model as Runtime: What OpenAI's Hosted Containers Actually Change


There is a pattern that keeps showing up in serious agentic systems: the client application becomes a thin wrapper around a loop. The model calls a tool, the client executes it, the client feeds results back, repeat. The orchestration logic lives in application code, the compute lives somewhere else, and the model just emits instructions. OpenAI’s decision to fold a container runtime directly into the Responses API is a direct architectural rebuttal to that pattern.
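That loop reads almost directly as pseudocode. A minimal sketch of the client-side pattern, where `call_model` and `run_tool` are hypothetical stand-ins for a real model API and a real tool executor (the point is the shape of the loop, not the plumbing):

```python
def call_model(messages):
    # Stand-in for a model API request that may return a tool call.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "shell", "args": "ls"}}
    return {"content": "done"}

def run_tool(call):
    # Stand-in for executing the tool on your own infrastructure.
    return f"ran {call['name']} {call['args']}"

def agent_loop(task):
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_model(messages)            # hop 1: client -> model API
        if "tool_call" not in reply:
            return reply["content"]
        output = run_tool(reply["tool_call"])   # hop 2: client -> tool
        messages.append({"role": "tool", "content": output})

print(agent_loop("list files"))  # the client ferries every tool result
```

Every iteration crosses the API boundary twice, and all of the orchestration state lives in `messages`, held by the client. That is the architecture the hosted container runtime is arguing against.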

The announcement describes an agent that can use a shell tool to run commands inside a hosted Linux container, with state persisting across multiple tool calls within a session. That description undersells what is actually being proposed. The model is no longer just generating instructions for a client to execute. The model, the tool executor, and the sandboxed compute environment are all managed by OpenAI’s infrastructure as a single unit. The client submits a task and polls for results; the actual agent loop runs entirely on the other side of the API boundary.

What the Architecture Looks Like

The Responses API, which OpenAI introduced in March 2025, was already a step toward native agentic support. It maintains conversation state server-side via a previous_response_id chain, so clients do not need to replay full message history on every turn. Hosted tools like web_search, file_search, and code_interpreter can execute within a single response without a client round-trip.

The computer environment extension takes this further by adding a general-purpose shell tool and tying it to a persistent container session. A minimal invocation looks like this:

from openai import OpenAI

client = OpenAI()

# One create() call; the model may run many shell commands inside it.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "shell"}],
    input="Clone the repo, run the test suite, and summarize any failures."
)
print(response.output_text)

The model can issue multiple shell commands inside that single responses.create call. Each command’s stdout and stderr feed back into the model’s context automatically. If the agent needs to continue work across a second call, passing previous_response_id=response.id reattaches the same container state. Files written during the first call are still there.
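The continuation semantics are easy to picture with a toy model of the server side. `FakeResponses` below is a hypothetical stand-in for illustration, not the OpenAI SDK: each `create()` starts a fresh container "filesystem" unless `previous_response_id` points it at an earlier response, in which case that response's state carries over.

```python
import itertools

class FakeResponses:
    """Toy model of server-side session state (not the real SDK)."""

    def __init__(self):
        self._containers = {}         # response id -> container filesystem
        self._ids = itertools.count(1)

    def create(self, input, previous_response_id=None):
        # A fresh container, unless we reattach an earlier one by id.
        fs = dict(self._containers.get(previous_response_id, {}))
        fs[f"/tmp/step_{len(fs) + 1}.log"] = input  # pretend a command wrote a file
        rid = f"resp_{next(self._ids)}"
        self._containers[rid] = fs
        return {"id": rid, "files": fs}

api = FakeResponses()
first = api.create(input="clone repo and run tests")
second = api.create(input="summarize failures",
                    previous_response_id=first["id"])

# Files written during the first call are still visible in the second.
assert "/tmp/step_1.log" in second["files"]
```

The detail doing the work is that the client never holds the filesystem; it only holds an opaque response id that names server-side state.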

This is meaningfully different from how the older Assistants API handled code execution. The code_interpreter tool in the Assistants API was purpose-built for data analysis: Python execution, DataFrame manipulation, chart generation. The shell tool is a general Linux environment. There is no semantic constraint on what you run. You can install packages with apt or pip, invoke arbitrary binaries, interact with the filesystem, make network requests. The code_interpreter was a calculator. The shell tool is a workstation.

The Co-Location Argument

When you build an agent with client-side orchestration, every tool call involves at minimum two network hops: one to the model API to get a tool call instruction, and one from your server to wherever the tool lives. The model does not see tool output until your code collects it and sends another request. For a task that involves fifty shell commands, that is a hundred network hops, all serialized.
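The arithmetic is worth making concrete. A back-of-envelope sketch, where the 60 ms per-hop figure is an illustrative assumption rather than a measured number:

```python
# Serialized network cost of client-side orchestration for the
# fifty-command task described above.
commands = 50
hops = commands * 2                    # one model hop + one tool hop per command
per_hop_ms = 60                        # assumed round-trip latency per hop
transit_s = hops * per_hop_ms / 1000
print(hops, transit_s)                 # 100 hops, 6.0 s spent purely in transit
```

None of that transit time does useful work; it is pure overhead that co-location removes from the loop.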

With hosted tool execution, the feedback loop between a shell command completing and the model reasoning about its output happens within OpenAI’s infrastructure. The latency per tool call drops substantially, and more importantly, it drops predictably. This matters less for low-frequency tasks and matters enormously for agents that operate in tight command-feedback loops, the kind of agents that debug software, iterate on failing tests, or navigate a filesystem.

The comparison that comes to mind is E2B, which offers per-session sandboxed microVMs explicitly designed for AI agent workloads. E2B solves the isolation and lifecycle problem well, but the orchestration is still client-side. Your code must ferry outputs from the E2B sandbox to the model API and back. OpenAI’s approach eliminates that intermediary, at the cost of locking both the model and the compute to a single vendor.

Security Model of the Containers

Sandboxing agent compute is not a solved problem. Agents that can execute arbitrary code need to be isolated from host infrastructure, from each other, and from resources they were not explicitly given access to. OpenAI’s containers appear to use namespace-level isolation similar to Docker or gVisor-based runtimes, with one container per session and no shared filesystem between sessions.

This is the right baseline, but the interesting security question is network access. The shell tool default appears to allow outbound internet connectivity, which means a sufficiently determined or sufficiently deceived agent can exfiltrate data, fetch additional payloads, or interact with external services. For many legitimate workflows (cloning repos, installing packages, calling APIs) this is necessary. For sensitive enterprise workloads it is a meaningful attack surface, particularly given ongoing research into prompt injection in agentic systems.

Anthropic’s approach with computer use is instructive here: Anthropic provides the model capability but explicitly does not host the compute environment. The developer runs their own browser or desktop, sends screenshots to the API, and applies their own network controls. That is more work for the developer, but it keeps the security boundary under developer control. OpenAI is making the opposite bet: that developers want the ease of hosted compute more than they want fine-grained network isolation controls.

The Portability Trade-Off

Frameworks like LangGraph take the opposite architectural position. The graph definition, the orchestration logic, the tool registry, the state checkpointing, all of it lives in your code. You choose your own model, your own compute, your own storage backend. LangGraph Cloud adds hosted execution for the graph itself, but even then you are not locked to a specific model provider. The model is a node in the graph, not the runtime.

This matters for a few categories of use case. If you need to run agents against a fine-tuned model, or against a locally hosted model for data residency reasons, or against multiple models in parallel for ensemble reasoning, a framework-based approach gives you that flexibility. OpenAI’s hosted container runtime gives you none of it. The model and the container are the same product.

For teams that are already committed to GPT-4o and do not have specialized needs, the trade-off probably lands in OpenAI’s favor. The reduction in infrastructure code is real. You do not need to manage sandbox provisioning, lifecycle, network policies, or output routing. For teams that want portability or have specific compliance requirements, the lock-in cost is high.

What This Means for Agent Infrastructure as a Category

The deeper shift here is that “agent runtime” is becoming a distinct infrastructure category, separate from the model API and separate from general-purpose serverless compute. The requirements are specific: isolated execution environments, session-scoped state, tight feedback loops between model and tools, structured observability for debugging multi-step trajectories, and some mechanism for human-in-the-loop interruption.

General-purpose platforms like AWS Lambda and Google Cloud Run meet some of these requirements but were not designed for the agent use case. Lambda is stateless by design; state persistence requires external storage that you wire up yourself. There is no concept of a session spanning multiple invocations with shared filesystem state.

Modal is a closer match for the compute layer: persistent containers, GPU support, reasonably low cold-start latency. But Modal is compute infrastructure, not an agent runtime. You still write the orchestration.

OpenAI’s play is to collapse model API, orchestration, and compute into a single managed surface. That is a significant product bet, and it will likely attract the class of developers who want to build agents without wanting to build the infrastructure underneath them. Whether it also captures the developers building production agentic systems with serious operational requirements depends heavily on how much control OpenAI exposes over networking, observability, container configuration, and pricing for the compute component.

The Responses API plus hosted containers is not primarily a technical announcement. It is a statement about which layer of the stack OpenAI intends to own.
