
Codex on Windows and the Quiet Return of the AppContainer

Source: openai

OpenAI published a short engineering note about how it brought Codex, its coding agent, to Windows. The headline is unsurprising: they built a sandbox that gates filesystem access and clamps the network. The interesting part, for anyone who has tried to run untrusted code on Windows before, is which primitives they ended up using, and why those primitives have spent the last decade being slightly unloved.

I build Discord bots and the occasional native tool, and every time I have needed to run somebody else’s code on Windows with any safety, I have walked the same path: read the AppContainer documentation, discover that half the samples assume a UWP manifest, give up, and reach for a VM. OpenAI’s post is a useful data point because it confirms the path is walkable for a server-style process. It is also a reminder that the sandboxing story on Windows is genuinely different from Unix, and the differences shape what an agent like Codex can do.

What Codex needs from a sandbox

The Codex CLI runs an agent loop that reads a task, plans, edits files in a working directory, runs shell commands, and iterates. On macOS it uses sandbox-exec with a Scheme policy derived from Apple’s old Seatbelt framework. On Linux it uses Landlock plus seccomp filters, which is the standard modern stack for unprivileged sandboxing. Both let the parent process narrow a child’s authority to a specific directory subtree and a small set of syscalls, with no root required and no container runtime in the loop.

Windows has no direct analogue. You can drop privileges with a restricted token, you can run inside a Job Object to cap memory and CPU, you can isolate in an AppContainer, and at the heavy end you can spawn a Server Silo, which is what Windows Server Containers use under the hood. None of these is a syscall filter in the Linux sense. They are object-manager scoped namespaces and discretionary access checks layered on top.

OpenAI’s writeup describes a design built around AppContainer isolation plus controlled file access and network limits. That maps cleanly onto primitives that have been sitting in Windows since Windows 8.

AppContainer, the part everyone forgets

An AppContainer is a process isolation boundary defined by a per-app package SID and a set of capability SIDs. When the process opens a kernel object, the access check has to pass for the AppContainer’s SIDs as well as the user’s SID. If a file’s ACL does not grant ALL APPLICATION PACKAGES, the specific package SID, or a capability the process holds, the open fails with ERROR_ACCESS_DENIED, regardless of what the interactive user could do. That is how Edge tabs and UWP apps have been isolated for years.
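To make the access check concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not the kernel's algorithm: it ignores deny ACEs, integrity labels, and per-right masks, and treats a DACL as a plain set of SIDs that are granted the requested access. The `Token` class and `can_open` function are hypothetical names for illustration; the only real constant is the well-known SID for ALL APPLICATION PACKAGES.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

# Well-known SID for the ALL APPLICATION PACKAGES group.
ALL_APPLICATION_PACKAGES = "S-1-15-2-1"


@dataclass(frozen=True)
class Token:
    """Hypothetical, heavily simplified stand-in for a process token."""
    user_sids: FrozenSet[str]
    package_sid: Optional[str] = None          # None: not in an AppContainer
    capability_sids: FrozenSet[str] = frozenset()


def can_open(token: Token, granted_sids: Set[str]) -> bool:
    """Return True if the simplified access check passes.

    granted_sids is the set of SIDs the object's ACL grants the
    requested access to (allow ACEs only, by assumption).
    """
    user_ok = bool(token.user_sids & granted_sids)
    if token.package_sid is None:
        # Ordinary process: only the user/group check applies.
        return user_ok
    # AppContainer process: the container identity must be granted too.
    container_sids = {token.package_sid, ALL_APPLICATION_PACKAGES}
    container_sids |= token.capability_sids
    return user_ok and bool(container_sids & granted_sids)
```

In this model a file that only lists the interactive user is readable by a normal process and unreadable by the same user's AppContainer process, which is the behavior the paragraph above describes.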

The interesting trick for a coding agent is that an AppContainer is by default too restrictive. The agent needs to read the working directory, write to it, and run child processes inside the same boundary. You grant the working tree by either ACLing the package SID onto the directory or by using a named object capability. Network gets handled by capability SIDs: internetClient, privateNetworkClientServer, and friends. Omit them and the Windows Filtering Platform drops sockets at the kernel layer before the agent’s code can even see them. This is genuinely stronger than a userspace proxy because there is no in-process bypass.
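As a rough illustration of the "ACL the package SID onto the directory" step, the sketch below builds an `icacls` command line and maps a few capability names to their well-known SIDs. The helper names are mine and hypothetical, the package SID in the usage note is a placeholder, and nothing here is taken from OpenAI's implementation. The code only constructs the argument list, so it can be inspected on any platform before being run on Windows.

```python
from typing import Iterable, List

# Well-known capability SIDs for the network capabilities named above.
CAPABILITY_SIDS = {
    "internetClient": "S-1-15-3-1",
    "internetClientServer": "S-1-15-3-2",
    "privateNetworkClientServer": "S-1-15-3-3",
}

# Simple icacls rights: read, read+execute, write, modify, full.
_SIMPLE_RIGHTS = {"R", "RX", "W", "M", "F"}


def capability_sids(names: Iterable[str]) -> List[str]:
    """Translate capability names to SIDs; an empty list means no network."""
    return [CAPABILITY_SIDS[name] for name in names]


def grant_workdir_command(workdir: str, package_sid: str,
                          rights: str = "M") -> List[str]:
    """Build an icacls argv that grants an AppContainer SID access to a tree.

    (OI)(CI) makes the entry inherit to files and subdirectories.
    The leading '*' tells icacls the trustee is a raw SID, not a name.
    """
    if not package_sid.startswith("S-1-15-2-"):
        raise ValueError("expected an AppContainer package SID (S-1-15-2-...)")
    if rights not in _SIMPLE_RIGHTS:
        raise ValueError(f"unsupported rights: {rights!r}")
    return ["icacls", workdir, "/grant", f"*{package_sid}:(OI)(CI){rights}"]
```

On a Windows machine the result could be passed to `subprocess.run(cmd, check=True)`; for example `grant_workdir_command(r"C:\work\repo", "S-1-15-2-1-2-3")` uses a made-up SID purely to show the shape of the command.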

Where AppContainer struggles is with binaries that were never written to live inside one. A lot of Win32 software pokes at HKCU registry keys, opens temp files in places that are not redirected, or calls into COM servers that refuse to marshal across the boundary. Windows 10 added the Less Privileged AppContainer (LPAC) mode, but that tightens the boundary further by dropping even the access granted to ALL APPLICATION PACKAGES, so it does nothing for compatibility, and legacy Win32 behavior is still the part that bites.

Why not just Windows Sandbox?

Windows Sandbox is the obvious answer for any Windows developer who has played with it: a disposable Hyper-V VM that boots in seconds and discards state on exit. It is great for clicking on a suspicious installer. It is a poor fit for an agent. It requires Hyper-V, which is unavailable on Home SKUs and can conflict with VirtualBox or any other hypervisor the user might have. It boots a full Windows session, which is heavy. And shuttling files into and out of it for every tool call is awkward.

Server Silos, which are a sibling concept, are closer to what Linux containers feel like. They give you a separate object-manager root, a separate registry hive view, and a separate session, all without booting a separate kernel. That is the foundation Windows process-isolated containers use when you do not opt into Hyper-V isolation. Creating a silo from an unprivileged app is not trivial, which is why most projects either ship a service or accept AppContainer’s compromises.

OpenAI did not name silos in the post, but the shape of “controlled file access and network limits” sounds like the AppContainer route with WFP filters. That matches what Chromium does for its renderer sandbox; the Chromium sandbox design doc is still the best public reference for how to combine restricted tokens, job objects, integrity levels, and AppContainer into a coherent policy. If you have ever wondered why Chrome on Windows feels different to Chrome on Linux at the process level, that document is why.

The network piece is the hard one

File access is comparatively easy: ACL the working tree, deny everything else, accept that some tools will fail to write to %TEMP% and need a redirected temp dir. Network is where coding agents get messy. A useful agent wants to install packages, fetch from GitHub, hit a language server’s update endpoint, maybe call out to an LLM provider for sub-tasks. A safe agent wants none of that without explicit consent.
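The redirected temp dir is easy to sketch. The helper below is a hypothetical example, not Codex's code: it copies an environment, creates a scratch directory inside the working tree (the `.sandbox-tmp` name is my own choice), and points the usual temp variables at it so that tools which honor them land somewhere the sandbox already allows.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def sandbox_env(workdir: str,
                base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with temp dirs moved into workdir."""
    env = dict(os.environ if base_env is None else base_env)
    tmp = Path(workdir) / ".sandbox-tmp"   # assumed name, inside the granted tree
    tmp.mkdir(parents=True, exist_ok=True)
    # TEMP and TMP are what most Windows tools consult; TMPDIR covers
    # Unix-flavoured tools running under the same agent.
    for name in ("TEMP", "TMP", "TMPDIR"):
        env[name] = str(tmp)
    return env
```

The resulting mapping can be handed to whatever launches the child, for instance the `env=` argument of `subprocess.Popen`. Tools that hard-code a path instead of reading these variables will still fail, which is the caveat the paragraph above accepts.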

The Linux Codex sandbox handles this with a seccomp filter that blocks socket(AF_INET) and friends, optionally paired with a network namespace. On Windows the equivalent is to omit network capability SIDs and let WFP do the rejection. The advantage is that this is enforced by the kernel’s filtering engine, not by hoping the child process honors an HTTP_PROXY variable. The disadvantage is that allowlisting specific hosts is harder than on Linux, where you can hand the child a unix-socket proxy and a connect()-only seccomp policy. WFP filters are flow-based and can match remote addresses, but writing a per-host policy from an unprivileged process generally means routing through a userland proxy anyway.
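If you do end up with a userland proxy, the per-host decision itself is small. The function below is a minimal sketch of an allowlist check such a proxy might apply to the requested hostname; the name `host_allowed` and the `*.` wildcard convention are assumptions of mine, not anything described in OpenAI's post.

```python
from typing import Iterable


def host_allowed(host: str, allowlist: Iterable[str]) -> bool:
    """Return True if host matches an exact entry or a '*.suffix' entry."""
    # Normalise case and drop the trailing dot of a fully qualified name.
    host = host.strip().rstrip(".").lower()
    if not host:
        return False
    for entry in allowlist:
        entry = entry.strip().lower()
        if entry.startswith("*."):
            suffix = entry[1:]  # ".example.org"
            # Require a non-empty label in front of the suffix, so the
            # wildcard covers subdomains but not the bare domain.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == entry:
            return True
    return False
```

Matching on the dot-prefixed suffix is deliberate: a naive `endswith("github.com")` would also accept `evilgithub.com`. A real proxy would additionally have to pin the resolved address it connects to, which this sketch does not attempt.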

For Codex this likely means the same pattern as the other platforms: default-deny, with an explicit --full-auto or workspace-write mode that opens a known set of endpoints. The Codex configuration reference already documents sandbox_mode with read-only, workspace-write, and danger-full-access levels on macOS and Linux. Bringing the same vocabulary to Windows is the right call even if the implementation underneath looks completely different.
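To show what sharing that vocabulary across platforms could look like, here is a small model of the three levels as a write-permission decision on Windows-style paths. The semantics are my reading of the mode names, not Codex's code, and the real enforcement lives in the OS sandbox rather than in a check like this. `ntpath` is used explicitly so the example behaves the same on any host, and symlinks or junctions are not resolved.

```python
import ntpath
from enum import Enum


class SandboxMode(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"


def _norm(path: str) -> str:
    # Collapse '..' segments, unify separators, and fold case,
    # since Windows paths are case-insensitive by default.
    return ntpath.normcase(ntpath.normpath(path))


def may_write(mode: SandboxMode, workspace: str, target: str) -> bool:
    """Decide whether a write to target is allowed under the given mode."""
    if mode is SandboxMode.DANGER_FULL_ACCESS:
        return True
    if mode is SandboxMode.READ_ONLY:
        return False
    ws = _norm(workspace).rstrip("\\")
    tgt = _norm(target)
    # Inside the workspace means the root itself or a path below it;
    # the trailing separator stops 'repo2' from matching 'repo'.
    return tgt == ws or tgt.startswith(ws + "\\")
```

For example, `may_write(SandboxMode("workspace-write"), r"C:\work\repo", r"C:\work\repo\..\secrets.txt")` is rejected because the path is normalized before the prefix comparison.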

What this unlocks

The practical effect is that Codex CLI on Windows can now run with something better than “trust the user’s session.” That matters because the agent runs untrusted-ish code by definition: it executes shell commands derived from a model’s output against a developer’s machine. A bad tool call, a prompt injection in a README, a malicious dependency in a freshly cloned repo, any of these turn into a real incident if the sandbox is just the user account.

The broader effect, which I find more interesting, is that this puts another high-profile project squarely on the AppContainer path. Chromium, Edge, Defender’s Application Guard, and now Codex are all leaning on the same set of primitives. The documentation has historically been written for UWP developers shipping to the Store, which left server-style and CLI uses underexplored. More public examples of “how we set up the AppContainer for a non-Store process” would do the ecosystem a lot of good.

If you are building a Windows tool that runs other people’s code, the takeaway is that the right answer in 2026 still rhymes with what Chromium settled on a decade ago. Restricted token, integrity level low or untrusted, job object for resource caps, AppContainer SID for object isolation, and WFP for the network. It is more moving parts than sandbox-exec and it is less expressive than Landlock, but it works, and OpenAI’s writeup is a useful confirmation that it scales to an agent workload.
