The Agent Loop as a Protocol: What OpenAI Got Right with the Codex App Server
Source: openai
Back in February, OpenAI published a technical walkthrough of the Codex App Server, the internal component that mediates between the Codex agent’s reasoning loop, its tool execution environment, and whatever client UI is talking to it. It is worth revisiting now because the architectural decision at its center is more significant than the announcement framing suggested: the agent harness is an externalized, language-agnostic protocol rather than in-process glue code.
The Codex CLI was open-sourced in May 2025 as a terminal-native coding agent built on the codex-1 model and the OpenAI Responses API. The App Server is its backbone: a bidirectional JSON-RPC 2.0 channel that carries streaming progress events, tool call notifications, file diffs, and human approval requests between the agent loop and any connected client.
The LSP Parallel
The design closest in spirit to this is the Language Server Protocol, which Microsoft introduced in 2016 for VS Code. Before LSP, every editor had to implement language-specific intelligence (autocomplete, go-to-definition, rename symbol) from scratch or rely on language-specific plugins with proprietary APIs. LSP solved this by defining a standard JSON-RPC protocol over stdio: one server per language, any number of compliant editors. The explosion of editor tooling that followed is largely attributable to that design choice.
OpenAI is applying the same logic to AI agents. Before a well-defined harness protocol, building a custom client for a coding agent meant either forking the agent’s source code, wrapping it in brittle subprocess hacks, or reimplementing the agent loop yourself. The Codex App Server sidesteps all of that. If your client speaks JSON-RPC 2.0, it can drive a Codex agent session, regardless of what language the client is written in.
Why JSON-RPC 2.0
JSON-RPC 2.0 has three properties that make it well-suited here. It is transport-agnostic, running cleanly over stdio, Unix domain sockets, WebSockets, or HTTP with SSE. It natively supports both request/response pairs and fire-and-forget notifications. And critically, it is bidirectional: both endpoints can initiate requests, not just the client.
That last property matters a great deal for the approval flow, which I will get to shortly. But even for the simpler streaming case, notifications are the right abstraction. A client polling a REST endpoint for agent progress introduces latency and wastes cycles. Server-sent notifications arrive the moment the agent emits them, with no polling overhead.
A streaming progress notification looks like this:
{
  "jsonrpc": "2.0",
  "method": "agent/progress",
  "params": {
    "sessionId": "sess-abc123",
    "type": "tool_call",
    "callId": "call-7",
    "tool": "shell",
    "args": { "command": "npm test", "cwd": "/project" }
  }
}
No id field, so no response is expected. The server fires these continuously as the agent reasons, calls tools, reads files, and generates diffs. The client can render them in real time, feed them to a log, or discard irrelevant types depending on what the UI needs.
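A client's receive loop can use exactly that id-presence rule to route messages. The sketch below is a minimal dispatcher assuming the `agent/progress` method and params shape shown above; the handler table and its output formatting are illustrative, not part of any published Codex API.

```python
import json

# Per-type handlers. A real client would render to a UI or log; here each
# handler just produces a display string. Unlisted types are dropped.
HANDLERS = {
    "tool_call": lambda p: f"[tool] {p['tool']} {p['args']}",
    "patch": lambda p: f"[patch] {p['path']}",
}

def handle_message(raw: str):
    msg = json.loads(raw)
    # Notifications carry no "id"; anything with an "id" is a request that
    # expects a response and belongs to a different code path.
    if "id" in msg or msg.get("method") != "agent/progress":
        return None
    params = msg["params"]
    handler = HANDLERS.get(params.get("type"))
    return handler(params) if handler else None
```

The discard-by-default behavior matches the article's point: a log viewer and a diff UI can consume the same stream and each ignore the types it does not care about.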
Approval as a First-Class RPC
The human approval mechanism is where the bidirectional design becomes essential. When the agent wants to execute a destructive shell command, apply a patch, or do anything that crosses an approval threshold, the server sends a request to the client:
{
  "jsonrpc": "2.0",
  "id": "approval-55",
  "method": "agent/approval_request",
  "params": {
    "callId": "call-7",
    "tool": "shell",
    "args": { "command": "rm -rf dist/", "cwd": "/project" },
    "reason": "Cleaning build artifacts before rebuild"
  }
}
The client responds with a standard JSON-RPC response, either approving the action or rejecting it with feedback:
{
  "jsonrpc": "2.0",
  "id": "approval-55",
  "result": { "approved": false, "feedback": "Don't delete dist, just empty it" }
}
If the client rejects with feedback, the harness injects that feedback back into the model context and the loop continues. The agent can adjust its plan and propose a different action without starting over.
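The client side of that round-trip can be sketched in a few lines. The `decide` policy below is a stand-in: a real client would prompt a human or apply its own rules before answering. The method name and message shapes follow the examples above.

```python
import json

def decide(params: dict):
    # Placeholder policy: reject recursive deletes, approve everything else.
    command = params.get("args", {}).get("command", "")
    if command.startswith("rm -rf"):
        return False, "Don't delete dist, just empty it"
    return True, None

def answer_approval(raw: str) -> str:
    req = json.loads(raw)
    approved, feedback = decide(req["params"])
    result = {"approved": approved}
    if feedback is not None:
        result["feedback"] = feedback
    # Echoing the request id back is what ties this response to the
    # server's outstanding approval request.
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Because the correlation happens through the id, the client is free to hold the request open while it waits for a human, answer several pending approvals out of order, or time out and reject.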
This is qualitatively different from how most frameworks handle human-in-the-loop. In LangChain, for instance, approval is typically implemented as a Python callback, which means the “client” must be Python code running in the same process. You cannot delegate approval to a separate UI process, a web interface, or a different service without substantial wrapper engineering. Because the Codex approval flow is a proper JSON-RPC request/response, it naturally supports timeout semantics, retry logic, and delegation to any client that can speak the protocol.
The Codex CLI offers three approval tiers: suggest mode, where the user approves every action; auto-edit mode, where file edits are approved automatically but shell commands still require confirmation; and full-auto mode, where everything runs without approval inside an OS-level sandbox. On macOS that sandbox uses sandbox-exec with a Seatbelt profile; on Linux it uses Landlock for filesystem restrictions combined with network namespace isolation. The model does not get to decide whether it is sandboxed. The harness enforces it at the OS level.
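The tier logic reduces to a small decision table. This is an illustrative sketch of that table, not the harness's actual implementation; mode and action names mirror the description above.

```python
def requires_approval(mode: str, action: str) -> bool:
    """Decide whether an action must be routed through agent/approval_request."""
    if mode == "suggest":
        return True               # every action needs a human yes
    if mode == "auto-edit":
        return action == "shell"  # file edits pass, shell commands gate
    if mode == "full-auto":
        return False              # nothing gates; the OS sandbox is the backstop
    raise ValueError(f"unknown approval mode: {mode}")
```

The key property is that the last row is only safe because the sandbox exists independently of this table: relaxing the approval policy never relaxes the OS-level restrictions.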
Diffs Over File Replacement
When the agent modifies files, it transmits changes as unified diffs rather than complete file contents. This has several practical advantages. Diffs are smaller to transmit. They are human-readable in the UI, so users can review exactly what is changing before approving. They fail explicitly when the underlying file has changed unexpectedly, surfacing conflicts that a full replacement would silently clobber. And they are applied atomically: the harness writes the patched content to a temporary file, validates the patch, then renames it into place. A process interruption between write and rename leaves the original file untouched.
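The write-then-rename sequence can be sketched as follows (patch validation omitted, and the function takes already-patched content). The temp file is created in the target's own directory so the final `os.replace` is a same-filesystem rename, which is atomic on POSIX: a crash at any point leaves the original file intact.

```python
import os
import tempfile

def apply_patched_content(path: str, new_content: str) -> None:
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".codex-patch-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_content)
            f.flush()
            os.fsync(f.fileno())  # make the new content durable first
        os.replace(tmp, path)     # atomic rename into place
    except BaseException:
        os.unlink(tmp)            # clean up the temp file on any failure
        raise
```

The fsync-before-rename ordering matters: without it, a crash just after the rename could leave the destination pointing at a file whose contents never reached disk.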
A patch notification carries the diff inline:
{
  "jsonrpc": "2.0",
  "method": "agent/progress",
  "params": {
    "type": "patch",
    "path": "src/auth/jwt.ts",
    "diff": "--- a/src/auth/jwt.ts\n+++ b/src/auth/jwt.ts\n@@ -12,7 +12,7 @@\n-import { sign } from 'old-jwt';\n+import { sign } from 'jsonwebtoken';\n"
  }
}
What This Means for the Ecosystem
Compare the Codex harness design against the dominant alternatives. LangChain’s AgentExecutor and LlamaIndex’s AgentRunner are Python-in-process abstractions. They are powerful within their ecosystem, but you cannot run them from a Rust CLI or a Go service or a browser without wrapping them in a subprocess and inventing a communication protocol, at which point you have essentially reimplemented what the Codex App Server provides. AutoGen is closer, with a more explicit message-passing model and WebSocket support in AutoGen Studio, but the protocol is proprietary rather than built on a standard.
The Codex approach has the same properties that made LSP transformative. Any language can implement a client. The protocol is versioned and machine-readable. Tooling that understands JSON-RPC 2.0 generically (loggers, proxies, debuggers) can be inserted into the pipeline without understanding the agent domain at all.
The implications extend past IDE plugins and terminal UIs. A CI pipeline can run Codex in full-auto mode and receive structured progress events that feed directly into build dashboards. A code review tool can intercept patch notifications and annotate diffs before they are applied. A second AI system can act as the “client,” programmatically approving or rejecting tool calls based on its own analysis of the proposed action.
The Friction That Remains
The externalized protocol does not eliminate all integration complexity. The Codex model itself (codex-1) is not open-weights; it runs through the OpenAI Responses API, which means the agent loop depends on a paid external service. Custom tool definitions require modifying the harness source or waiting for an extension API. The session model supports multiple concurrent sessions, but session state is in-memory, so long-running agent tasks do not survive process restarts without additional persistence work.
These are addressable problems, not design flaws. The core architectural choice, bidirectional JSON-RPC 2.0 over standard transports, holds up regardless of how the surrounding ecosystem evolves. If OpenAI or the open-source community ships an alternative model that speaks the same Responses API, the harness works without modification. If a new transport becomes preferable, the protocol layer is already decoupled from the transport layer.
The Codex App Server is, in the end, an argument that the right way to build an AI agent harness is the same as the right way to build a language server: define the protocol, publish the spec, and let the ecosystem do the rest. It took a few years for LSP to reach that outcome in the language tooling world. The agent tooling world is moving faster.