
The Codex App Server Treats AI Agents Like Language Servers, and That's the Right Call

Source: openai

Back in February 2026, OpenAI published a retrospective on how they built the Codex App Server, the embedding harness for their Codex coding agent. The post frames it as an explanation of JSON-RPC streaming and tool approvals, but the more interesting story is architectural: they essentially built a language server for an AI agent, and the design decisions that follow from that choice are worth examining carefully.

What the App Server Actually Is

The Codex CLI was open-sourced in April 2025 in two flavors: a TypeScript/Node.js implementation (codex-cli/) and a Rust rewrite (codex-rs/). The TypeScript version runs the agent loop directly in the terminal using Ink. The Rust version does something more interesting: it exposes the agent through a socket-based bidirectional JSON-RPC 2.0 API called the App Server, so that external processes, GUIs, editors, or other tools can drive the agent without being written in Rust or knowing anything about its internals.

The App Server lives in the codex-app-server crate within the codex-rs workspace. It spins up a Unix domain socket (or a named pipe on Windows), accepts a single connection from the embedding host, and then acts as a relay between the host and the agent core in codex-core. The agent loop itself is an async Tokio event loop that calls the OpenAI API with function calling, executes tools in a sandbox, and feeds results back to the model.

The Protocol: Bidirectional JSON-RPC 2.0

The wire format is newline-delimited JSON, with each message being a complete JSON-RPC 2.0 object on a single line. The host submits tasks:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "codex/submit",
  "params": {
    "input": "Refactor the login function to use async/await",
    "workdir": "/home/user/myproject"
  }
}

The agent responds with notifications (no id field, so fire-and-forget) as work progresses:

{ "jsonrpc": "2.0", "method": "codex/agentMessage", "params": { "text": "Reading src/auth.rs..." } }
{ "jsonrpc": "2.0", "method": "codex/toolOutput", "params": { "chunk": "running cargo test...\n" } }

When the agent wants to apply a patch, it emits a codex/diffProposal notification. When it needs to run a shell command and approval mode is not auto, it emits codex/toolUseRequest. The host responds to these with explicit codex/approveAction or codex/denyAction calls, referencing the action by ID.

This is bidirectional in the JSON-RPC sense: both sides can initiate messages. The host sends codex/submit; the agent sends codex/toolUseRequest and expects the host to answer with its own codex/approveAction or codex/denyAction call, correlated by action ID. Neither side is purely a server or purely a client.
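A host has to tell these message kinds apart before it can do anything else. A minimal sketch in Python (the helper names are mine, not the protocol's; the method strings follow the examples above): classify each incoming line by the presence of method and id, and build the approval call that answers a pending action by its actionId.

```python
import json

def classify(line: str) -> str:
    """Classify one newline-delimited JSON-RPC 2.0 message.

    "method" plus "id" is a request expecting a response; "method"
    without "id" is a fire-and-forget notification; "id" without
    "method" is a response to an earlier request.
    """
    msg = json.loads(line)
    if "method" in msg:
        return "request" if "id" in msg else "notification"
    return "response"

def approve_action(req_id: int, action_id: str) -> dict:
    """Build the host's codex/approveAction call for a pending action."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "codex/approveAction",
        "params": {"actionId": action_id},
    }
```

Because approvals reference an actionId rather than a JSON-RPC id, the agent can keep streaming progress notifications while an approval is still pending.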

LSP Did This First

The Language Server Protocol, designed by Microsoft in 2016, established this exact pattern. LSP uses JSON-RPC 2.0 over stdio (with Content-Length HTTP-style framing rather than newlines), and both directions can originate requests. A language server can send window/showMessageRequest to pop a dialog in the editor and wait for the user’s choice. An editor can send textDocument/didChange as a notification the server doesn’t need to acknowledge.

The Debug Adapter Protocol follows the same model for debuggers. At this point, the pattern has been stress-tested at massive scale across thousands of language server implementations and editor integrations.

Codex’s deviation from LSP framing, using newlines instead of Content-Length headers, makes the messages easier to process with standard line-reading tools and simpler to implement in any language without a full LSP library. That’s a reasonable trade: Content-Length framing exists so that message bodies can contain newlines, but newline-delimited JSON sidesteps the need by requiring each message to be serialized on a single line, which is what JSON encoders emit by default anyway.
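The difference between the two framings is small in code. A sketch of both readers over a byte stream (simplified: the LSP reader assumes a single Content-Length header, whereas real LSP permits additional header fields):

```python
import json

def read_ndjson(stream):
    """Newline framing: every non-empty line is one complete JSON-RPC message."""
    for line in stream:
        if line.strip():
            yield json.loads(line)

def read_lsp(stream):
    """LSP framing (simplified): a Content-Length header, a blank
    separator line, then exactly that many bytes of JSON body."""
    while True:
        header = stream.readline()
        if not header:
            return
        length = int(header.split(b":")[1])
        stream.readline()  # consume the blank line after the header
        yield json.loads(stream.read(length))
```

The newline reader is what any language's standard library gives you for free; the LSP reader needs stateful byte counting, which is exactly the implementation burden Codex avoids.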

The Approval Loop Is the Hard Problem

The technically interesting part of the Codex App Server is not the streaming, which is just a sequence of notifications. It’s the approval mechanism.

When an AI agent wants to run rm -rf dist/, something needs to decide whether that’s acceptable. The naive options are to ask the human every time (tedious) or to allow everything (dangerous). Most agent frameworks punt on the problem entirely, leaving the developer to implement whatever they want. LangGraph has interrupt nodes for human-in-the-loop, but they’re a framework primitive you wire up yourself, not a protocol-level concept.

Codex makes approval a first-class message in the protocol. The codex/toolUseRequest notification carries enough information for any host, whether a terminal UI, a web interface, or an IDE, to render a meaningful approval dialog:

{
  "jsonrpc": "2.0",
  "method": "codex/toolUseRequest",
  "params": {
    "actionId": "act_abc123",
    "tool": "shell",
    "command": ["rm", "-rf", "dist/"],
    "workdir": "/home/user/project",
    "reason": "Cleaning build artifacts before rebuild",
    "riskLevel": "medium"
  }
}

The host approves or denies by referencing act_abc123. If no response arrives within a timeout, the action is denied automatically. This is the right abstraction: the protocol defines the surface of human oversight, and the host decides how to present it.
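The deny-on-timeout behavior falls out naturally if the host models each pending action as a future that the UI resolves. A hypothetical host-side sketch (class and method names are mine, not part of the protocol):

```python
import asyncio

class ApprovalQueue:
    """Hold each pending actionId as a future; the UI resolves it,
    and an unanswered request is denied once the timeout elapses."""

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout
        self.pending: dict[str, asyncio.Future] = {}

    def on_tool_use_request(self, action_id: str) -> None:
        # Called when a codex/toolUseRequest arrives from the agent.
        self.pending[action_id] = asyncio.get_running_loop().create_future()

    def resolve(self, action_id: str, approved: bool) -> None:
        # Called when the user clicks approve or deny in the host UI.
        fut = self.pending.get(action_id)
        if fut is not None and not fut.done():
            fut.set_result(approved)

    async def decision(self, action_id: str) -> bool:
        # Await the user's answer; no answer in time means deny.
        fut = self.pending.pop(action_id)
        try:
            return await asyncio.wait_for(fut, self.timeout)
        except asyncio.TimeoutError:
            return False
```

Whether the `resolve` call comes from a terminal keypress, a Slack button, or a web queue is invisible to the agent; only the approve/deny message crosses the protocol boundary.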

The sandbox layer reinforces this. Shell commands run inside macOS Seatbelt (sandbox-exec) or Linux namespaces with seccomp filtering, with network access disabled and filesystem writes constrained to the working directory. So even in auto approval mode, the blast radius of a wrong decision is bounded at the OS level, not just by the agent’s intentions.

Diffs get similar treatment. A codex/diffProposal delivers a unified diff as a proposal, not an immediate write. The host must explicitly approve it, at which point the patch is applied atomically across all files in the proposal or not at all. The agent then emits codex/patchApplied with the list of modified paths and line counts. Treating multi-file edits as a single transaction is the right model for code changes, where partial application usually leaves things in a broken state.
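The all-or-nothing semantics can be sketched as a two-phase apply: validate and stage every edit in memory, then write only if the whole proposal staged cleanly. The search-and-replace "patch" below is a deliberately simplified stand-in for real unified-diff handling:

```python
from pathlib import Path

def apply_proposal(root: Path, edits: dict[str, tuple[str, str]]) -> list[str]:
    """edits maps a relative path to (old_text, new_text). Either every
    edit applies and every file is written, or nothing is touched."""
    staged: dict[Path, str] = {}
    # Phase one: validate every edit against the current file contents.
    for rel, (old, new) in edits.items():
        path = root / rel
        current = path.read_text()
        if old not in current:
            raise ValueError(f"edit does not apply cleanly: {rel}")
        staged[path] = current.replace(old, new, 1)
    # Phase two: all edits validated, commit the whole proposal.
    for path, text in staged.items():
        path.write_text(text)
    return sorted(str(p) for p in staged)
```

If any hunk fails validation, the exception fires before the first write, so a rejected proposal leaves the working tree exactly as it was.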

Subprocess Embedding Over SDK Embedding

The architectural choice to expose the agent as a subprocess with a socket interface, rather than as a library or SDK, has real consequences.

LangChain and LlamaIndex require Python. If you want to embed those agents in a Go service or a Rust application, you’re forking a subprocess anyway, but without a stable protocol, so you end up scraping stdout. The Codex App Server inverts this: the Rust agent is the subprocess, the protocol is stable and documented, and the embedding host can be written in anything that can open a socket and parse JSON.
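In practice the embedding host needs only a few lines of plumbing. A hypothetical Python sketch (the binary name, flag, and socket path are assumptions; the message shape follows the codex/submit example above):

```python
import json
import socket

def frame(msg: dict) -> bytes:
    """One complete JSON-RPC object per line, as the App Server expects."""
    return json.dumps(msg).encode() + b"\n"

def submit(sock: socket.socket, req_id: int, prompt: str, workdir: str) -> None:
    """Send a codex/submit request over an established connection."""
    sock.sendall(frame({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "codex/submit",
        "params": {"input": prompt, "workdir": workdir},
    }))

# Typical wiring (not run here; paths and flags are illustrative):
# proc = subprocess.Popen(["codex", "app-server", "--socket", "/tmp/codex.sock"])
# sock = socket.socket(socket.AF_UNIX)
# sock.connect("/tmp/codex.sock")
# submit(sock, 1, "Refactor the login function", "/home/user/myproject")
```

Nothing here depends on the host's language or runtime, which is the point: the same handful of lines ports to Go, Rust, or a shell script with a socket tool.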

This also means the agent’s runtime is independent of the host’s runtime. The Rust core starts in roughly 20 milliseconds versus 150 milliseconds for the TypeScript CLI, and memory usage is substantially lower. An Electron app embedding the Codex agent doesn’t pay the Node.js startup cost for the agent itself.

The codex-mcp crate takes this further by wrapping the App Server as an MCP (Model Context Protocol) server. MCP, also JSON-RPC 2.0 over stdio, is the protocol Anthropic designed for tool servers, and it’s gaining traction in Claude Desktop, Cursor, and other hosts. By adapting the App Server to speak MCP, OpenAI lets Codex slot into that ecosystem without the host needing to know it’s talking to a Codex-specific protocol.

What This Pattern Gets Right

There is a real design tension in agentic systems between giving the agent enough autonomy to be useful and giving the human enough control to be safe. Most frameworks resolve this by making it the developer’s problem to implement. The Codex App Server resolves it by making it the protocol’s problem.

The JSON-RPC bidirectional pattern, borrowed directly from LSP’s decade of production use, provides a stable interface for any host to participate in that negotiation: approve this shell command, deny that file deletion, confirm this diff. The host can implement a terminal UI, a Slack bot, a web approval queue, or anything else, without touching the agent core.

Building on a protocol that already has 10 years of tooling, debugging infrastructure, and developer intuition behind it is a reasonable call. The Codex team did not need to invent a new streaming format or a new approval handshake. They needed to apply an existing pattern to a new domain, and it fits well.
