What OpenAI's Codex-on-Windows Sandbox Tells Us About the State of Process Isolation
Source: openai
OpenAI published a short engineering note on how they brought Codex to Windows, focused almost entirely on one problem: how do you let an autonomous coding agent run arbitrary commands on a developer’s machine without it shredding the filesystem or exfiltrating data over the network? The answer involves a stack of Windows primitives that most application developers never touch directly, and the choices OpenAI made are a useful lens on why sandboxing on Windows has historically lagged macOS and Linux.
This post is less about Codex and more about the sandbox underneath it. I want to walk through what the Windows isolation primitives actually do, why OpenAI ended up combining several of them, and how the resulting design compares to the Seatbelt profile that powers Codex on macOS and the Landlock/seccomp combo it uses on Linux.
The shape of the problem
A coding agent needs a strange mix of capabilities. It must read source files, write files inside the workspace, execute build tools, spawn compilers, run tests, and sometimes hit the network for package installs. It must not delete the user’s home directory, read SSH keys, or call out to an attacker-controlled host because a prompt injection convinced it to. The threat model is not a malicious binary trying to escape; it is a well-meaning binary executing instructions that may have been tampered with somewhere upstream.
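To make that mix of permissions concrete, here is a toy model in C of the decision the sandbox has to make for each file access. It is a sketch under stated assumptions, not anything from OpenAI's post: the names check_access, path_is_under, and has_dotdot are hypothetical, paths are compared case-sensitively, and a real sandbox leaves this decision to kernel-enforced ACLs or rulesets rather than to string matching in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ACCESS_READ, ACCESS_WRITE } access_kind;
typedef enum { DECISION_DENY, DECISION_ALLOW } decision;

static bool is_sep(char c) { return c == '/' || c == '\\'; }

/* True if path equals root or lies beneath it, matching whole components,
 * so "C:\work\repo-old" is not treated as being under "C:\work\repo". */
static bool path_is_under(const char *root, const char *path) {
    size_t n = strlen(root);
    while (n > 0 && is_sep(root[n - 1])) n--;      /* ignore trailing separators */
    for (size_t i = 0; i < n; i++) {
        if (path[i] == '\0') return false;         /* path is shorter than root */
        if (is_sep(root[i]) ? !is_sep(path[i]) : root[i] != path[i]) return false;
    }
    return path[n] == '\0' || is_sep(path[n]);
}

/* True if any component is "..", which could step outside an allowed root. */
static bool has_dotdot(const char *path) {
    const char *p = path;
    while (*p) {
        while (is_sep(*p)) p++;
        const char *start = p;
        while (*p && !is_sep(*p)) p++;
        if (p - start == 2 && start[0] == '.' && start[1] == '.') return true;
    }
    return false;
}

/* Hypothetical policy: secrets are never readable, the workspace is
 * read-write, and everything else is read-only. */
decision check_access(const char *workspace, const char *secrets_dir,
                      const char *path, access_kind kind) {
    assert(workspace && secrets_dir && path);
    if (has_dotdot(path)) return DECISION_DENY;
    if (path_is_under(secrets_dir, path)) return DECISION_DENY;
    if (path_is_under(workspace, path)) return DECISION_ALLOW;
    return kind == ACCESS_READ ? DECISION_ALLOW : DECISION_DENY;
}
```

The ordering matters: the deny rule for secrets is checked before the workspace rule, so a key directory nested inside the workspace would still be off limits.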
On macOS, OpenAI uses sandbox-exec with a Scheme-like policy language (TinyScheme, technically) inherited from the old Seatbelt system. You write a profile that allows file reads under specific prefixes and denies network access, and the kernel enforces it. On Linux, the Codex CLI uses Landlock plus seccomp, with Landlock handling filesystem scoping and seccomp filtering syscalls. Neither is perfectly tidy (sandbox-exec is officially deprecated and its profile language is largely undocumented, and Linux needs two mechanisms rather than one), but both approaches share a property: a small, coherent set of kernel mechanisms covers the whole job of restricting a process tree.
Windows has no such single mechanism. It has AppContainer, inherited from the UWP/Metro era. It has Job Objects, which date back to Windows 2000 and let you cap CPU, memory, and process spawning for a group of processes. It has restricted tokens for stripping privileges from a process. It has the Windows Filtering Platform for kernel-level network filtering. It has Mandatory Integrity Control labels that gate write access by integrity level. Each of these solves part of the puzzle. None of them solves all of it.
Why AppContainer alone is not enough
AppContainer is the closest Windows analogue to a Linux user namespace plus Landlock. A process inside an AppContainer runs with a heavily restricted token, gets its own named-object namespace and per-profile storage locations, and can only open resources whose ACLs explicitly admit it, whether by its own AppContainer SID, a capability SID it holds, or the ALL APPLICATION PACKAGES group. Network access, for instance, is gated by capabilities like internetClient and privateNetworkClientServer. Without them, an AppContainer process cannot connect to the outside world at all, which is exactly the property you want for a sandboxed shell.
The problem is that AppContainer was designed for packaged Store apps, and many traditional Win32 build tools misbehave inside one. MSBuild, link.exe, and various installer subprocesses assume they can write to temp directories they discover via environment variables, or that they can open files and registry keys whose ACLs grant nothing to an AppContainer. Chromium’s team has been writing about the rough edges of AppContainer sandboxing for over a decade, and their conclusion has consistently been that you need a layered approach: a restricted token plus a job object plus, where possible, AppContainer on top.
OpenAI’s post describes essentially that layered approach. The workspace directory is granted to the sandbox explicitly (in AppContainer terms, through an access-control entry naming the container’s SID) so the agent can read and write inside it. Everything else on the filesystem is read-only or denied. Network egress is blocked by default and selectively allowed for known package registries. A Job Object wraps the whole tree so that a child process started via CreateProcess cannot escape: unless the job permits breakaway, it lands in the same job, inherits the same restrictions, and counts against the same limits.
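The "blocked by default, allowed for known registries" rule can be sketched as a small matcher in C. This is an illustration under assumptions, not OpenAI's implementation: host_allowed is a hypothetical helper, the registry names in the usage note are examples I picked, and the post does not say how names are mapped onto whatever layer actually enforces the policy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive equality for two NUL-terminated ASCII host names. */
static bool equal_nocase(const char *a, const char *b) {
    for (; *a && *b; a++, b++) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
    }
    return *a == *b;   /* both strings must end at the same position */
}

/* Default-deny check. An entry that starts with '.' matches any subdomain
 * of that suffix; any other entry must match the host exactly. */
bool host_allowed(const char *host, const char *const *allowlist, size_t count) {
    assert(host && (allowlist || count == 0));
    size_t hlen = strlen(host);
    for (size_t i = 0; i < count; i++) {
        const char *entry = allowlist[i];
        size_t elen = strlen(entry);
        if (entry[0] == '.') {
            /* Suffix rule: the host must be longer than the suffix and end with it. */
            if (hlen > elen && equal_nocase(host + (hlen - elen), entry)) return true;
        } else if (equal_nocase(host, entry)) {
            return true;
        }
    }
    return false;      /* nothing matched: egress stays blocked */
}
```

With an allowlist of "registry.npmjs.org" and ".pythonhosted.org", the first entry admits only that exact host while the second admits files.pythonhosted.org but not the bare domain. Anchoring the suffix on the leading dot is what keeps a lookalike such as evilpythonhosted.org from slipping through.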
The network side is where it gets interesting
Filesystem isolation on Windows is a solved problem if you are willing to accept some friction. Per-process network isolation is genuinely hard. Linux has network namespaces; you can put a process in a fresh namespace with no usable interfaces or routes, and it has nothing to reach the outside world through. macOS has the (deny network*) family of rules in Seatbelt. Windows historically had no clean equivalent. Windows Firewall rules apply per-application-path or per-user, not per-process-tree.
The Windows Filtering Platform changes this. Enforcement happens in the kernel, but filters are managed from user mode through the Base Filtering Engine; a kernel callout driver is only needed for custom packet inspection. WFP has no condition that matches a Job Object, but filters at its ALE (Application Layer Enforcement) layers can match on the connecting process’s user token or AppContainer package SID. The OpenAI post is light on detail here, but the natural read is that they are using WFP filters keyed on the AppContainer’s package SID to drop outbound traffic except to an allowlist. Windows Sandbox and Microsoft Defender Application Guard pursue the same goal by different means: they reach for a full Hyper-V container, which is too heavyweight for an interactive coding agent.
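A block-by-default policy with allowlist exceptions usually comes down to filter weights: a low-weight filter blocks everything and higher-weight filters permit specific destinations. The C sketch below is a toy model of that arbitration, not WFP itself. The types and the classify function are hypothetical, it assumes every filter sits in one sublayer and is already scoped to the sandbox's package SID, and the permit-when-nothing-matches default is an assumption of the model. Real WFP arbitration also involves multiple sublayers, callouts, and rights to override a decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VERDICT_PERMIT, VERDICT_BLOCK } verdict;

typedef struct {
    uint8_t  weight;     /* higher weight wins over lower weight */
    verdict  action;     /* what to do when this filter matches */
    int      match_all;  /* nonzero: matches every destination */
    uint32_t remote_ip;  /* IPv4 in host byte order, used when match_all == 0 */
} egress_filter;

/* Return the action of the highest-weight filter that matches remote_ip.
 * If no filter matches, the connection is permitted (model assumption). */
verdict classify(const egress_filter *filters, size_t count, uint32_t remote_ip) {
    assert(filters || count == 0);
    const egress_filter *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const egress_filter *f = &filters[i];
        if (!f->match_all && f->remote_ip != remote_ip) continue;  /* no match */
        if (!best || f->weight > best->weight) best = f;
    }
    return best ? best->action : VERDICT_PERMIT;
}
```

A two-entry table, a weight-1 catch-all block plus a weight-10 permit for one registry address, yields exactly the "deny everything except the allowlist" behavior described above. Swapping the weights would silently turn the sandbox into block-everything, which is why the relative weights are the part worth testing.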
The interesting contrast is with how Docker Desktop on Windows handles the same problem. Docker punts and runs containers inside a lightweight VM. That gives you strong isolation for free, but you pay in startup time, memory, and the awkwardness of sharing files across the VM boundary. For a tool that needs to feel like a local shell, the VM tax is too high, which is presumably why OpenAI built directly on the kernel primitives instead.
A rough architecture sketch
In pseudocode, the launch path for a sandboxed command on Windows looks something like this:
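// Job Object: resource caps and UI restrictions for the whole process tree.
HANDLE hJob = CreateJobObject(NULL, NULL);
SetInformationJobObject(hJob, JobObjectExtendedLimitInformation, &limits, sizeof(limits));
SetInformationJobObject(hJob, JobObjectBasicUIRestrictions, &ui, sizeof(ui));
// AppContainer profile: yields the SID that ACLs and network filters key on.
// (Creation fails if the profile already exists, so reruns must handle that case.)
CreateAppContainerProfile(L"CodexSandbox.v1", ..., capabilities, count, &sid);
SECURITY_CAPABILITIES caps = { sid, capabilities, count, 0 };
// Attach the AppContainer identity to the new process through the attribute list.
STARTUPINFOEXW si = {};
si.StartupInfo.cb = sizeof(si);
InitializeProcThreadAttributeList(...);
UpdateProcThreadAttribute(..., PROC_THREAD_ATTRIBUTE_SECURITY_CAPABILITIES, &caps, ...);
// Start suspended so no sandboxed code runs before the job and filters are in place.
// CREATE_UNICODE_ENVIRONMENT is required when env is a UTF-16 environment block.
PROCESS_INFORMATION pi = {};
CreateProcessW(NULL, cmdline, NULL, NULL, FALSE,
               CREATE_SUSPENDED | EXTENDED_STARTUPINFO_PRESENT | CREATE_UNICODE_ENVIRONMENT,
               env, workdir, &si.StartupInfo, &pi);
AssignProcessToJobObject(hJob, pi.hProcess);
// Install WFP filters keyed on the AppContainer SID before resuming.
ResumeThread(pi.hThread);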
The AppContainer profile provides the filesystem and registry isolation, the Job Object provides the resource caps and prevents process escape, and the WFP filters enforce the network policy. None of these primitives is new; the work is in making them compose correctly and survive contact with the wide variety of build tools developers actually run.
What this says about the platform
A few things stand out when you compare Codex’s three sandbox implementations side by side. The macOS profile is roughly a hundred lines of Scheme. The Linux implementation is a Landlock ruleset plus a seccomp BPF program, both well under a thousand lines. The Windows implementation, judging from the surface area described in the post, is materially more complex because there is no single API that covers the same ground.
This is not a Windows-is-bad observation; it is a consequence of Windows having accumulated isolation features over twenty-five years without ever doing a clean-sheet rewrite. AppContainer was for Store apps. Job Objects were for batch workloads. WFP was for security products. Restricted tokens were for service hardening. Each piece works, and together they cover the same ground as Landlock plus seccomp plus network namespaces, but the seams show.
The broader trend worth watching is whether projects like Codex push Microsoft toward a more unified developer-facing sandboxing API. There have been hints in this direction: the Win32 App Isolation project at Microsoft is trying to make AppContainer practical for legacy desktop apps, with tooling to discover the capabilities a binary actually needs. If that effort matures, future agent sandboxes on Windows might look a lot more like their Unix counterparts. For now, the price of running an autonomous agent safely on Windows is a stack of carefully composed kernel objects, and OpenAI’s post is a useful confirmation that the stack works in production.