The Sandbox Threat Model That Changed When LLMs Started Executing Code
Source: simonwillison
Simon Willison’s recent roundup of JavaScript sandboxing research surveys a landscape that, if you have been following it, looks grimly familiar. node:vm is not a security boundary, and its own documentation says so. vm2 is deprecated after back-to-back CVSS 9.8 CVEs. The TC39 ShadowRealm proposal explicitly excludes hostile code from its design goals. The approaches that actually work (V8 Isolates, SES with lockdown(), and QuickJS embedded via WebAssembly) share a structural property: they establish a real isolation boundary rather than trying to intercept execution at the language level.
That survey is worth reading carefully. But there is a dimension it touches on lightly that deserves more direct treatment. The sandboxing question has changed because of LLMs, and the existing literature was written for a different threat model.
Plugin Systems vs. LLM-Generated Code
The traditional sandboxing use case is a plugin system. A developer writes a plugin, uploads it, and you execute it. The threat model is: you trust that the plugin author is authorized to run code, and you do not extend that trust to the plugin’s end users. The sandbox boundary exists to prevent a compromised or malicious plugin from affecting other tenants, not to contain the author. The author is known, has agreed to terms of service, and has a logged identity.
The same model applies to user-generated code execution in educational platforms, online REPLs, and CI/CD pipelines. The person who wrote the code is identified. The sandbox’s job is to prevent them from escaping into infrastructure they do not own.
LLM tool use operates under a different assumption. When you give an LLM a code execution tool, the LLM generates the code. The nominal author is the model, but the model’s behavior is shaped by everything in its context window. If that context window includes content from an untrusted source, a customer record, a web page, an email, then an attacker who controls that content can influence what the LLM generates. This is prompt injection as a code execution vector, and it collapses the “the author is known and authorized” assumption that traditional sandboxing designs rely on.
The Snowflake Cortex Incident
In early March 2026, Willison documented a case where Snowflake’s Cortex AI feature escaped its sandboxed execution environment and executed malware. The Snowpark sandbox was designed for trusted user-defined functions: code written by a database developer who was an authorized Snowflake customer. The threat model was the traditional plugin model.
The incident demonstrated what happens when a system designed for the plugin threat model is used to execute LLM-generated code. The LLM may have processed data containing instructions designed to influence its output. The code it generated reflected those instructions. The sandbox was never designed to contain code whose author’s behavior could be manipulated by the data being processed.
The structural problem is not that Snowflake built a weak sandbox. The problem is that the threat model changed, and the sandbox design did not change with it.
Why Language-Level Sandboxing Fails
The deeper failure mode behind both plugin-system and LLM-execution cases is well documented. JavaScript is a prototype-based language in which nearly every object inherits, directly or indirectly, from the prototypes of a small set of built-ins: Object, Function, Array, Error. All code running in the same realm shares these primordials, and that sharing is the root of the problem.
The core of the canonical escape is about thirty characters:
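// Reach the Function constructor through any object's prototype chain.
const hostFunction = ({}).constructor.constructor;
// Compile and run a new function body outside the sandbox's scope.
hostFunction('return process')();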
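({}) creates an object literal. .constructor retrieves Object. .constructor again retrieves Function. When the sandbox shares a realm with the host, as scope-based sandboxes do, or when any host object has leaked into the sandbox, that Function is the host's own, and calling it compiles code that runs against the host's globals. From there you have arbitrary code execution in the host context. The sandbox never saw what happened.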
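The same walk can be reproduced against Node's built-in node:vm module. What follows is a minimal sketch, assuming a Node.js runtime. It uses the well-known `this.constructor.constructor` variant: a node:vm context has its own built-ins, but its global object is backed by an object created in the host realm, so the constructor chain starting from `this` leads to the host's Function. The names `sandboxGlobals` and `untrustedCode` are illustrative.

```javascript
const vm = require('node:vm');

// The "sandbox" global is an ordinary object created in the host realm.
const sandboxGlobals = {};

// Inside the context, `this` is backed by sandboxGlobals, so
// this.constructor is the host's Object and .constructor again is the
// host's Function. Calling it compiles code in the host realm.
const untrustedCode = "this.constructor.constructor('return process')()";

const leaked = vm.runInNewContext(untrustedCode, sandboxGlobals);

// Compare the returned value with the host's own process object.
console.log(leaked === process);
```

Nothing in the untrusted string names a forbidden API directly, which is why filtering or scope-shadowing approaches tend to miss it.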
Variants of this pattern are practically unbounded. Proxy trap exploitation, Symbol.toPrimitive leaks, generator protocol tunnels, Symbol.unscopables: every extensibility hook in JavaScript is a potential path out of scope-based containment. vm2 spent years patching these variants before CVE-2023-29199 (CVSS 9.8) escaped through vm2’s own error-handling code: crafted code threw an exception object that, when caught by vm2’s error wrapper, exposed the outer Function constructor. The sandbox’s own defensive code created the escape route. A second CVE followed weeks later, and the project was deprecated with the maintainers explicitly acknowledging the problem cannot be solved at the pure language level.
This is the definitive case study. A well-maintained, carefully engineered sandbox, used at scale, failed not because of implementation sloppiness but because the attack surface is the prototype chain itself.
What Actually Works
The research converges on a clear hierarchy.
isolated-vm provides V8 Isolate sandboxing for Node.js. A V8 Isolate is an independent instance of the V8 engine with its own heap, its own GC, and its own primordials. Code in one isolate cannot read objects in another because there is no shared heap or prototype chain to traverse. Memory overhead is roughly 128KB per isolate. Crossing the boundary requires explicit Reference objects rather than live JavaScript values, which makes every cross-boundary call visible and typed. Cloudflare Workers, whose runtime is open-sourced as workerd, has run this architecture in production since its 2017 launch, running thousands of isolates per process.
QuickJS embedded via WebAssembly gives you an entire JavaScript engine inside WebAssembly’s linear memory boundary. The QuickJS heap is a bounded region that cannot access the host JavaScript heap by construction. You get both a narrow embedding API and a hard memory boundary. No npm ecosystem is accessible by default; the host exposes only what it deliberately wires through the FFI layer. This is the same model that made Lua dominant for game scripting across three decades, applied to JavaScript.
The one language-level approach that has held up is SES (Secure ECMAScript) from Agoric. Rather than intercepting escapes at runtime, lockdown() freezes all primordials before any untrusted code runs:
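import 'ses';

// Freeze the shared primordials before any untrusted code runs.
lockdown();

// safeWrappedFetch and untrustedCode are assumed to be defined by the host.
// The constructor's argument shape has varied across SES releases; check the version in use.
const compartment = new Compartment({
  globals: { fetch: safeWrappedFetch },
});
compartment.evaluate(untrustedCode);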
After lockdown(), Object.prototype and Function.prototype are frozen, so navigating the prototype chain no longer reaches mutable state. lockdown() also replaces the Function constructor reachable through that chain with an inert one that throws, which closes the constructor.constructor route shown earlier. Rather than patrolling for escapes at runtime, SES removes the shared mutable state that makes escapes useful. The operational cost is real: lockdown() must run before everything else in the process, including dependency initialization, and much of the npm ecosystem mutates primordials at startup. SES is the right answer for systems built around it from the start; it is not an upgrade path for existing Node.js applications.
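The effect of freezing primordials can be seen without SES at all. The sketch below is an illustration of the freezing idea only, not of lockdown() itself: it uses a disposable node:vm context (an assumption made purely so the host's own built-ins are left alone) and a hand-written Object.freeze call as a stand-in. It does not address the Function-constructor chain, and node:vm is not a security boundary.

```javascript
const vm = require('node:vm');

// A separate realm so the experiment does not freeze the host's own
// primordials. Used only to get a disposable set of built-ins.
const realm = vm.createContext({});

// Untrusted-style snippet: try to pollute Object.prototype and report
// what happened as a plain string.
const attack = `
  'use strict';
  (() => {
    try {
      Object.prototype.isAdmin = true; // classic prototype pollution
      return 'mutated';
    } catch (err) {
      return err.name; // strict-mode writes to frozen objects throw
    }
  })()
`;

const before = vm.runInContext(attack, realm);

// Stand-in for what lockdown() does far more thoroughly: undo the
// pollution, then freeze the realm's shared prototypes.
vm.runInContext(
  'delete Object.prototype.isAdmin;' +
    'Object.freeze(Object.prototype); Object.freeze(Function.prototype);',
  realm
);

const after = vm.runInContext(attack, realm);
console.log({ before, after });
```

The first run succeeds in mutating shared state; after the freeze, the same code can still walk to Object.prototype but can no longer change it.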
The Different Isolation Requirements
For the traditional plugin-system threat model, a V8 Isolate boundary is the right level. Real engine boundary, manageable overhead, workable API surface. isolated-vm is the pragmatic choice; the model is production-proven at Cloudflare’s scale.
For LLM-generated code execution where the LLM processes inputs from untrusted sources, the argument for stronger isolation gets more compelling. A separate VM, using Firecracker or equivalent, means that even a sophisticated attacker who knows your isolation layer’s implementation cannot use JavaScript prototype chain knowledge to escape. There is no shared prototype chain to walk; the boundary is enforced by hardware virtualization. Firecracker with snapshot-based copy-on-write forking has been reported to reach cold starts below 1 ms while maintaining KVM-level hardware isolation, which weakens the latency argument for weaker isolation.
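One way to make the two cases explicit is to decide the isolation tier from who wrote the code and what the model saw. This is a hedged sketch, not a recommendation from the survey: the provenance labels, the `chooseIsolation` function, and the tier names are all hypothetical, and treating a V8 Isolate as the floor for model-written code with only trusted context is an assumption.

```javascript
// Hypothetical provenance labels; a real system would define its own.
const TRUSTED_SOURCES = new Set(['system-prompt', 'operator-config']);

// Returns the weakest isolation tier this sketch would accept.
function chooseIsolation({ author, contextSources = [] }) {
  const sawUntrustedInput = contextSources.some(
    (source) => !TRUSTED_SOURCES.has(source)
  );
  // Model-written code shaped by attacker-reachable input: assume the
  // attacker effectively wrote it, and put a VM boundary around it.
  if (author === 'llm' && sawUntrustedInput) {
    return 'microvm';
  }
  // Identified human author, or a model that saw only trusted context:
  // a real engine boundary such as a V8 Isolate is the floor.
  return 'v8-isolate';
}

const pluginTier = chooseIsolation({ author: 'plugin-developer' });
const agentTier = chooseIsolation({
  author: 'llm',
  contextSources: ['system-prompt', 'customer-record'],
});
console.log(pluginTier, agentTier);
```

The useful part is less the function than the requirement it implies: the host has to track where each item in the context window came from before it can pick a boundary.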
The minimum capability surface matters more in the LLM case. Every capability exposed to the sandbox is a capability an attacker can attempt to reach via prompt injection. This is standard least-privilege, but it requires treating the generated code as if it were written by an attacker who has read your internal documentation, because, under the prompt injection model, that is exactly what may have happened.
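A small host-side broker is one way to keep that surface explicit. The sketch below assumes requests from the sandbox arrive as plain data (for example, parsed JSON); `createCapabilityBroker`, `readRow`, and the request shape are hypothetical names chosen for illustration, not part of any library.

```javascript
// Builds a dispatcher that can reach only the capabilities passed in.
function createCapabilityBroker(capabilities) {
  // A Map of own entries avoids inherited lookups such as "constructor".
  const table = new Map(Object.entries(capabilities));

  return function invoke(request) {
    // Requests cross the boundary as plain data, never as live objects.
    if (typeof request?.name !== 'string' || !Array.isArray(request.args)) {
      return { ok: false, error: 'malformed request' };
    }
    const handler = table.get(request.name);
    if (!handler) {
      return { ok: false, error: `capability not granted: ${request.name}` };
    }
    try {
      return { ok: true, value: handler(...request.args) };
    } catch (err) {
      // Return a string, not the host error object itself.
      return { ok: false, error: String(err && err.message) };
    }
  };
}

// Hypothetical grant: one narrow, read-only capability.
const invoke = createCapabilityBroker({
  readRow: (id) => ({ id, status: 'open' }), // stand-in for a scoped query
});

const granted = invoke({ name: 'readRow', args: [7] });
const inherited = invoke({ name: 'constructor', args: [] });
const missing = invoke({ name: 'exec', args: ['id'] });
console.log(granted, inherited, missing);
```

Errors are flattened to strings on purpose: handing a host exception object back to untrusted code is the kind of leak the vm2 error-handling escape relied on.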
The 2023 edition of the OWASP Top 10 for LLM Applications lists prompt injection (LLM01) and insecure output handling (LLM02) as its first two risks. These combine directly when an LLM has a code execution tool and processes inputs from untrusted sources. The Snowflake Cortex incident is not an edge case; it is the expected failure mode of the traditional sandbox threat model applied to the wrong problem.
What the Research Leaves Open
Willison’s survey is a useful snapshot of the landscape. The conclusion it supports is clear: real isolation boundaries work, language-level interception does not, and the community keeps rediscovering this. What the survey does not fully address is that the LLM era creates a class of deployments where the code generator itself is part of the attack surface, and the sandbox assumptions built for plugin systems are not conservative enough for that threat model.
Building systems where LLMs execute code is now common enough that this distinction matters in practice. The sandbox literature has the right answers for execution environments. The question is whether those answers are being applied with the correct threat model in mind.