
JavaScript Sandboxing: Four Strategies, Four Different Threat Models

Source: simonwillison

JavaScript sandboxing has been a solved problem approximately four times in the last decade, and each solution has eventually been found wanting. The vm2 library powered plugin systems at scale for years before a series of CVEs forced its deprecation in 2023. The built-in Node.js vm module ships a context that its own documentation explicitly warns is not a security boundary. ShadowRealm reached Stage 3 at TC39 with a security model that explicitly excludes hostile code from its scope.

Simon Willison’s recent survey of JavaScript sandboxing research captures something important about the state of the field: this is not a problem the community is converging on solving. It is a problem the ecosystem keeps rediscovering, with each new generation of developers, each new LLM tool execution use case, each new plugin system design. The reason it keeps getting re-solved is that JavaScript was not designed with isolation in mind. Understanding why requires looking at the language’s object model.

The Prototype Problem

In JavaScript, objects within a realm are connected through the prototype chain, and Object.prototype sits at the root of that chain for nearly all values. If a sandboxed script can get a reference to any object or constructor from the outer realm, it can walk up to that realm’s Function constructor and from there reach essentially anything:

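// Classic escape pattern (run inside a node:vm context)
// `this` is backed by a sandbox object that was created in the outer realm
const escaped = this.constructor.constructor; // the outer realm's Function
escaped('return process')(); // Access outer Node.js process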

The vm2 library patched dozens of variants of this pattern before ultimately being deprecated. The patches were never exhaustive because the prototype chain offers too many entry points. Language features added in ES2015 and later have kept opening new potential vectors: Proxies, WeakMaps, async iterators, Symbol internals. You cannot reliably enumerate all the paths through the object graph, and JavaScript’s flexibility keeps that enumeration problem permanently open.
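A minimal sketch of why the origin of the starting object matters, using Node's built-in node:vm module (which is not a security boundary). The empty sandbox object is created in the host realm, so the constructor property it inherits leads to the host's Function. An array literal created inside the context only reaches the context's own Function. The probe string and variable names are illustrative.

```javascript
const vm = require('node:vm');

// Ask whichever realm's Function we reach whether `process` is visible there.
const probe = `('return typeof process')()`;

// `this` is backed by the host-created sandbox object, so `this.constructor`
// resolves through the host's Object.prototype to the host's Object.
const viaHostObject = vm.runInNewContext(`this.constructor.constructor${probe}`, {});

// `[]` is created inside the context, so this reaches the context's own Function.
const viaInnerObject = vm.runInNewContext(`[].constructor.constructor${probe}`, {});

console.log(viaHostObject);  // 'object'
console.log(viaInnerObject); // 'undefined'
```

Any host object that reaches the guest, whether as a global, a callback argument, or a thrown error, is a potential starting point for the same walk.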

This is the core tension: the dynamic property system that makes JavaScript expressive is the same property system that makes isolation so difficult. Every sandboxing approach is working against the grain of the language to some degree.

Language-Level Hardening: SES

The most principled language-level approach is the Secure ECMAScript (SES) project from Agoric. The host calls lockdown() at startup, which freezes the shared JavaScript intrinsics (Object, Function, Array, Promise, and their prototypes) and tames dangerous capabilities like eval and the Function constructor so that they can no longer be used to reach the host’s global scope. This blocks the entire class of prototype-chain escapes shown above:

import 'ses';
lockdown(); // Freeze all shared intrinsics

const c = new Compartment({
  Math: harden(Math),
  console: harden({ log: (...args) => console.log(...args) }),
});

c.evaluate(`Math.floor(1.5)`); // fine
c.evaluate(`(0).constructor.constructor('return process')()`); // TypeError

This is not isolation in the VM sense: all code still runs in the same realm, sharing the same heap. SES is better characterized as hardening. Its security guarantee is that compartments cannot escalate privileges beyond what the host explicitly grants through the capability object. The practical limitation is compatibility. Many npm packages depend on mutating global state or constructors in ways that break under a locked-down environment. The Agoric team has done extensive compatibility work, but adopting SES in an existing project involves real migration cost.
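To make the idea of hardening concrete, here is a rough sketch that uses only Object.freeze. The deepFreeze helper is hypothetical and far simpler than SES's harden(): it does not walk prototypes or handle intrinsics. The api object stands in for whatever capability object a host might hand to guest code.

```javascript
// Hypothetical helper: a simplified stand-in for SES's harden().
// It freezes an object and everything reachable through its own properties.
function deepFreeze(value, seen = new WeakSet()) {
  const isObject =
    (typeof value === 'object' && value !== null) || typeof value === 'function';
  if (!isObject || seen.has(value)) return value;
  seen.add(value);
  Object.freeze(value);
  for (const key of Reflect.ownKeys(value)) {
    const desc = Object.getOwnPropertyDescriptor(value, key);
    // Data properties carry `value`; accessors carry `get` / `set`.
    deepFreeze(desc.value, seen);
    deepFreeze(desc.get, seen);
    deepFreeze(desc.set, seen);
  }
  return value;
}

// Illustrative capability object handed to guest code.
const api = deepFreeze({
  limits: { maxItems: 10 },
  log: (msg) => `logged: ${msg}`,
});

// Reflect.set reports failure instead of throwing, regardless of strict mode.
const overwroteLimit = Reflect.set(api.limits, 'maxItems', 1e9);
const replacedLog = Reflect.set(api, 'log', () => 'hijacked');

console.log(overwroteLimit, replacedLog, api.limits.maxItems); // false false 10
```

Freezing the granted objects keeps a guest from tampering with what it was given. It says nothing about what the guest can reach through shared intrinsics, which is the part lockdown() handles.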

Realm Isolation: ShadowRealm

The ShadowRealm proposal at TC39, which reached Stage 3 and has since been moved back to Stage 2.7, takes a different approach. A ShadowRealm gets its own global object and a fresh set of intrinsics, which closes the prototype-chain escape immediately. Objects cannot cross the realm boundary directly; only primitives and callables can be passed through, and callables arrive wrapped rather than as the original function object:

const realm = new ShadowRealm();

const double = realm.evaluate(`(x) => x * 2`);
console.log(double(21)); // 42

// Returning a non-callable object across the boundary throws
realm.evaluate(`[]`); // TypeError

The security model here is explicit in the spec: ShadowRealm is suitable for code you mostly trust, not for hostile code. Callable references passed across the boundary still create vectors for realm confusion attacks, and timing side-channels remain possible. For plugin systems where plugin authors are vetted but still need containment of their ambient authority, ShadowRealm is a reasonable fit. For running code from untrusted external sources, it is not sufficient.
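ShadowRealm may not be enabled in your runtime, so the following sketch uses node:vm purely to illustrate what a fresh set of intrinsics means. It is not a ShadowRealm and offers none of its boundary rules; the variable names are illustrative.

```javascript
const vm = require('node:vm');

const context = vm.createContext({});

// The new context has its own Array constructor, distinct from the host's.
const innerArrayCtor = vm.runInContext('Array', context);
const innerArray = vm.runInContext('[1, 2, 3]', context);

console.log(innerArrayCtor === Array);    // false
console.log(innerArray instanceof Array); // false
console.log(Array.isArray(innerArray));   // true
```

Note the difference in behavior: node:vm hands the inner array straight to the host, prototype chain and all, whereas a ShadowRealm would refuse to return it. That refusal is what keeps the two object graphs from touching.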

V8 Isolate-Based Isolation: isolated-vm

For genuine isolation within a Node.js process, the strongest option is isolated-vm, which exposes V8’s Isolate API. A V8 Isolate is an independent instance of the engine with its own heap and its own garbage collector, sharing no JavaScript objects with any other isolate. When you run code inside an isolate, there is no prototype chain connecting it to the outer realm because the heaps are entirely separate:

const ivm = require('isolated-vm');

async function main() {
  // memoryLimit is in MB; it is enforced approximately, not as a strict cap
  const isolate = new ivm.Isolate({ memoryLimit: 64 });
  const context = await isolate.createContext();

  // Every value crossing the boundary must be copied or wrapped in a Reference
  const jail = context.global;
  await jail.set('log', new ivm.Reference((...args) => console.log(...args)));

  const script = await isolate.compileScript(`log.applySync(null, ['hello'])`);
  await script.run(context);
}

main().catch(console.error);

The boundary enforcement here operates at the C++ level inside the V8 engine, not at the JavaScript level. This makes escape significantly harder. The cost is real: inter-isolate communication requires serialization, which runs around 10-20 microseconds per call for simple values, and the per-isolate setup overhead is measurable for short-lived tasks. For plugin systems with nontrivial compute needs, this overhead is acceptable. For high-frequency small calls, it adds up.
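Per-call figures like these depend heavily on hardware and payload shape, so it is worth measuring your own. The sketch below is a rough micro-benchmark that uses the built-in structuredClone as a stand-in for copying a value across a boundary. It does not exercise isolated-vm's actual transfer path, and the helper name and payloads are hypothetical.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: mean cost, in microseconds, of one structured-clone copy.
function meanCopyMicros(value, iterations = 10000) {
  // Warm up so the measurement is less dominated by first-call overhead.
  for (let i = 0; i < 1000; i++) structuredClone(value);

  const start = performance.now();
  for (let i = 0; i < iterations; i++) structuredClone(value);
  const elapsedMs = performance.now() - start;

  return (elapsedMs * 1000) / iterations;
}

const smallPayload = { id: 1, name: 'hello' };
const largePayload = { rows: Array.from({ length: 1000 }, (_, i) => ({ i, label: `row ${i}` })) };

console.log('small payload (us per copy):', meanCopyMicros(smallPayload).toFixed(2));
console.log('large payload (us per copy):', meanCopyMicros(largePayload, 1000).toFixed(2));
```

Comparing a small and a large payload gives a feel for when batching calls is worth the added complexity.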

Process and OS-Level Isolation

For truly hostile code, none of the JavaScript-level approaches are sufficient. The only reliable boundary is one enforced outside the JavaScript engine: separate child processes, containers, or WebAssembly sandboxes. The Extism project has made Wasm-based plugin execution practical: plugins compile to Wasm, run inside a Wasm runtime such as Wasmtime, and receive only the host functions the embedder explicitly exports. Nothing from the JavaScript environment leaks into the sandbox unless you put it there. The significant trade-off is that plugins must be compiled to Wasm ahead of time, which rules out running arbitrary JavaScript directly; a script would have to ship alongside a JavaScript engine that is itself compiled to Wasm.

For the case where arbitrary JavaScript execution is required and the threat model is fully adversarial, the production answer is a container-per-execution model, which is expensive but the only option with a defensible security boundary.
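As a sketch of the process layer only, the following runs a snippet in a separate Node.js process with an empty environment, a heap cap, and a timeout. The runInChildProcess helper is hypothetical. A bare child process still has the host user's filesystem and network access, so in an adversarial setting it would sit inside a container or similar OS-level sandbox, not replace one.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run a snippet in a separate Node.js process.
function runInChildProcess(source, { timeoutMs = 5000, maxHeapMb = 64 } = {}) {
  const result = spawnSync(
    process.execPath,
    [`--max-old-space-size=${maxHeapMb}`, '-e', source],
    {
      env: {},                // do not inherit the host's environment variables
      timeout: timeoutMs,     // kill the child if it runs too long
      maxBuffer: 1024 * 1024, // cap captured output at 1 MiB
      encoding: 'utf8',
    }
  );
  return {
    stdout: result.stdout,
    exitCode: result.status,
    timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
  };
}

const ok = runInChildProcess('console.log(6 * 7)');
console.log(ok.stdout.trim()); // 42

const stuck = runInChildProcess('while (true) {}', { timeoutMs: 500 });
console.log(stuck.timedOut); // true
```

The empty env assumes a Unix-like host; some platforms need a few variables passed through for the child to start at all.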

The TC39 Research Track

The Compartments proposal, in early stages at TC39, is the most ambitious attempt to address the remaining gap at the language level. Where ShadowRealm handles code evaluation, Compartments targets module loading: you would be able to intercept import statements, redirect them to alternative implementations, or block specific packages entirely. This addresses the supply chain attack vector that all current language-level approaches leave open.

LavaMoat by the MetaMask team approximates this today at build time, through static analysis of the dependency graph. It determines what capabilities each package requires (does it need process.env? File system access? Network?) and generates a policy file that restricts runtime access to those declared capabilities. This does not provide isolation, but it meaningfully shrinks the blast radius of a compromised transitive dependency.
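The policy idea can be sketched in a few lines. Everything here is hypothetical: the package names, the policy shape, and the makeScopedRequire helper are illustrative and much simpler than LavaMoat's real policy format and enforcement.

```javascript
// Hypothetical, simplified policy: package name -> set of allowed builtin modules.
// A Map of Sets avoids accidental hits on inherited keys such as 'constructor'.
const policy = new Map([
  ['config-loader', new Set(['node:fs', 'node:path'])],
  ['string-utils', new Set()],
]);

// Build the `require` function that a given package would be handed.
function makeScopedRequire(packageName) {
  const allowed = policy.get(packageName) ?? new Set();
  return function scopedRequire(id) {
    if (!allowed.has(id)) {
      throw new Error(`${packageName} is not allowed to load ${id}`);
    }
    return require(id);
  };
}

const loaderRequire = makeScopedRequire('config-loader');
const utilsRequire = makeScopedRequire('string-utils');

console.log(typeof loaderRequire('node:path').join); // 'function'

let denied = null;
try {
  utilsRequire('node:child_process');
} catch (err) {
  denied = err.message;
}
console.log(denied); // string-utils is not allowed to load node:child_process
```

A check like this only matters if the package cannot obtain the real require some other way, which is why such policies are paired with a hardened runtime.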

What the LLM Wave Changed

The renewed research interest in JavaScript sandboxing through 2025 and into 2026 is partly driven by the AI agent ecosystem. When a language model can execute code as a tool, you need somewhere to run that code. For browser-based agents, the browser’s existing isolation model handles most of this. For server-side agents, the question becomes which layer of the above stack you need, which depends entirely on whether the code being executed comes from the model itself (low trust, but known origin) or from data the model processed (high risk, potentially adversarial).

The pattern that has emerged in production is layered: SES for the capability boundary, Worker Threads for the memory boundary, and a container for the OS backstop. None of these layers is sufficient alone, and all of them together constrain the attack surface to something tractable.
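Here is a minimal sketch of the memory-boundary layer alone, using Node's built-in worker_threads with a heap limit and a deadline. The runInWorker helper is hypothetical. Code inside the worker still has access to Node's APIs, so this is not a sandbox on its own; it only bounds memory and run time.

```javascript
const { Worker } = require('node:worker_threads');

// Code that runs inside the worker: evaluate the submitted function body
// against the input and post the result back (copied by structured clone).
const bootstrap = `
  const { parentPort, workerData } = require('node:worker_threads');
  const fn = new Function('input', workerData.source);
  parentPort.postMessage(fn(workerData.input));
`;

// Hypothetical helper: run `source` in a worker with a heap cap and a deadline.
function runInWorker(source, input, { timeoutMs = 5000, maxHeapMb = 64 } = {}) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(bootstrap, {
      eval: true,
      workerData: { source, input },
      resourceLimits: { maxOldGenerationSizeMb: maxHeapMb },
    });
    const timer = setTimeout(() => {
      worker.terminate(); // interrupts even a busy loop
      reject(new Error('timed out'));
    }, timeoutMs);
    const settle = (fn, value) => {
      clearTimeout(timer);
      worker.terminate();
      fn(value);
    };
    worker.once('message', (value) => settle(resolve, value));
    worker.once('error', (err) => settle(reject, err));
  });
}

runInWorker('return input * 2', 21).then((value) => console.log(value)); // 42
```

In the layered arrangement, the guest source would be evaluated through the capability layer inside the worker instead of through a bare Function call, and the whole process would run inside the container.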

What the research landscape shows is that there is no unified JavaScript sandbox you can reach for. The threat model determines the appropriate strategy, and the strategies differ enough that choosing the wrong one provides false confidence rather than real protection. The useful work right now is in making those trade-offs visible to developers building plugin systems and agent infrastructure, so that the choice is made deliberately rather than by default.
