
No Interpreter to Fall Back To: How V8 Built WebAssembly Deoptimization

Source: v8

When V8 shipped speculative call_indirect inlining in Chrome M137, documented in detail on the V8 blog, the headline was clear: WebAssembly can now inline indirect calls speculatively, unlocking optimization opportunities that were previously blocked by the opacity of dynamic dispatch. Dart microbenchmarks showed over 50% speedup on average. Realistic applications gained between 1 and 8 percent. The feature is easy to understand at a high level.

What the headline obscures is that speculative inlining was not the hard part. The hard part was building deoptimization support for WebAssembly, and the V8 team is explicit that deopts are the “building block” making speculative inlining viable at all. You cannot have useful speculation without a reliable way to recover when the speculation is wrong. For JavaScript, V8 has had this recovery infrastructure for years. For WebAssembly, building it meant solving a problem the JavaScript pipeline never faces: WebAssembly has no interpreter to fall back to.

Why Speculation Requires a Recovery Path

Speculative inlining works by observing runtime behavior and betting on it. Code generated by Liftoff, V8’s baseline compiler, collects feedback at each call_indirect site, recording which concrete function actually gets called. When a function tiers up to Turboshaft (the optimizing compiler), a monomorphic call site gets a guarded inline: a comparison against the expected callee and, if it matches, the callee’s body compiled directly into the caller. The call_indirect’s mandatory type check, required by the Wasm specification for sandboxing, can be folded into this guard, so the fast path carries no extra overhead.
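The guard-and-inline idea can be sketched in a few lines of Python. This is a toy model and not V8 code: the names DeoptSignal, collect_feedback, monomorphic_target, TABLE, and optimized_caller are all hypothetical, and an exception stands in for a real deoptimization.

```python
from collections import Counter


class DeoptSignal(Exception):
    """Stand-in for a deoptimization: raised when a speculation guard fails."""


def double(x):
    return x * 2


def negate(x):
    return -x


# Hypothetical function table, indexed the way call_indirect indexes a Wasm table.
TABLE = [double, negate]


def collect_feedback(observed_callees):
    """Baseline-tier stand-in: count which callee each call actually reached."""
    return Counter(observed_callees)


def monomorphic_target(feedback):
    """Return the only callee ever seen, or None if the site is not monomorphic."""
    return next(iter(feedback)) if len(feedback) == 1 else None


def optimized_caller(table_index, arg, expected=double):
    """Optimized-tier stand-in that speculates the callee is `expected`."""
    # Guard: compare the actual callee against the speculated one.
    if TABLE[table_index] is not expected:
        raise DeoptSignal(f"unexpected callee at table index {table_index}")
    # Fast path: the body of `double`, inlined into the caller.
    return arg * 2
```

Calling optimized_caller(0, 21) takes the inlined fast path, while optimized_caller(1, 21) trips the guard. In the real engine, that second case is where execution would have to be handed to another tier.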

If the guard fails, the call reached a different callee than the one the profiling data predicted. The Turboshaft-compiled frame has to abandon the speculative fast path and hand execution to something that can handle the unexpected case correctly.

For JavaScript, that something is Ignition, V8’s bytecode interpreter. When a Turboshaft-compiled JavaScript function deoptimizes, V8 reconstructs the Ignition interpreter state at the corresponding bytecode position: the accumulator, the register file, the stack frame. Ignition resumes execution as if the function had never been optimized. The interpreter is always available; deoptimizing into it is well-understood work, even if the implementation has years of engineering behind it.

WebAssembly has no interpreter tier. V8 runs Wasm code in either Liftoff (fast single-pass baseline compilation) or Turboshaft (optimizing compilation). When a Turboshaft-compiled Wasm function needs to deoptimize, it must hand off to the Liftoff-compiled version of the same function, continuing at the corresponding Wasm program point. Two compilers with different register allocations, different stack frame layouts, and different internal conventions need to agree on a handoff.

The Frame Reconstruction Problem

At the moment a Turboshaft-compiled Wasm function fires a deopt guard, the CPU registers and stack contain values organized according to Turboshaft’s allocation decisions. Some Wasm locals may live in registers; others may be on the stack; some may have been rematerialized from earlier computations rather than kept in storage at all. Turboshaft performs aggressive transforms: values hoisted out of loops, computations sunk into branches, dead stores eliminated. The relationship between a specific machine instruction and the Wasm abstract machine state it corresponds to is not one-to-one.

Liftoff, by contrast, is a single-pass compiler that stays close to the Wasm instruction stream. Its stack frame layout is predictable; locals map to well-defined stack slots. Resuming execution in Liftoff at the point corresponding to a deopt in Turboshaft means assembling a Liftoff frame from whatever values exist in the Turboshaft frame at that moment.

V8 solves this with deopt metadata: a table emitted alongside each Turboshaft-compiled Wasm function. Each entry in the table corresponds to a possible deopt point and encodes where every live Wasm local and value-stack entry resides in the Turboshaft register file or stack at that instruction. When a deopt fires, the runtime reads this entry, extracts the live values from the Turboshaft frame, assembles them into the format Liftoff expects, and jumps into the Liftoff-compiled function at the correct continuation point.
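To make the table-driven reconstruction concrete, here is a minimal Python sketch. It does not reflect V8's actual metadata format. Location, DeoptEntry, DEOPT_TABLE, the register names r1 and r2, and the offset 17 are all assumptions made for illustration. The "constant" location kind models a value that was rematerialized rather than kept in storage.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Location:
    kind: str   # "register", "stack", or "constant"
    where: Any  # register name, stack slot index, or the constant value itself


@dataclass(frozen=True)
class DeoptEntry:
    wasm_offset: int    # program point at which the baseline code resumes
    local_locs: tuple   # one Location per live Wasm local
    stack_locs: tuple   # one Location per value-stack entry


# Hypothetical table: deopt point id -> where each value lives in the optimized frame.
DEOPT_TABLE = {
    0: DeoptEntry(
        wasm_offset=17,
        local_locs=(
            Location("register", "r1"),
            Location("stack", 2),
            Location("constant", 0),
        ),
        stack_locs=(Location("register", "r2"),),
    ),
}


def read_value(loc, registers, stack_slots):
    """Fetch one value from the optimized frame according to its Location."""
    if loc.kind == "register":
        return registers[loc.where]
    if loc.kind == "stack":
        return stack_slots[loc.where]
    if loc.kind == "constant":
        return loc.where
    raise ValueError(f"unknown location kind: {loc.kind}")


def build_baseline_frame(deopt_id, registers, stack_slots):
    """Assemble the state the baseline tier expects at the resume point."""
    entry = DEOPT_TABLE[deopt_id]
    return {
        "resume_at": entry.wasm_offset,
        "locals": [read_value(l, registers, stack_slots) for l in entry.local_locs],
        "value_stack": [read_value(l, registers, stack_slots) for l in entry.stack_locs],
    }
```

The point of the sketch is the indirection: the optimized code never has to keep values where the baseline tier wants them, as long as each deopt point records where they ended up.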

Generating this metadata accurately requires knowing, at every potential deopt site, the precise correspondence between Turboshaft’s internal representation and the Wasm abstract machine state. This is one reason the migration from TurboFan’s Sea of Nodes IR to Turboshaft’s CFG-based IR mattered for Wasm: in a Sea of Nodes representation, floating nodes have no fixed position until a late scheduling pass places them, so the correspondence between a node’s final placement and its source program point is harder to reconstruct. Turboshaft’s explicit basic blocks and sequenced operations within those blocks make the deopt metadata table more tractable to generate and verify.

What the Wasm Spec Gives You For Free

WebAssembly’s abstract machine has a property that works in V8’s favor here. Every Wasm function declares a fixed set of typed locals. The value stack is typed and structured. Control flow uses explicit blocks and branches; there are no arbitrary jumps. At any point in a valid Wasm program, the spec precisely defines what locals exist, what types they carry, and what the current value stack looks like.

This gives the frame reconstruction mechanism a well-specified target. For JavaScript, the interpreter state includes prototype chains, property caches, and dynamic scope, all of which feed into what a deopt must restore. For WebAssembly, the list reduces to: the values of live locals and the current value stack, both statically typed and fully declared at the module level. The set of live locals at any deopt point can be computed at compile time from the module’s type information and control flow structure.
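The static knowability of the value stack is easy to demonstrate. The sketch below is a simplified illustration, not a real Wasm validator: it handles only four instructions in straight-line code, and the function name stack_types_at_each_point is hypothetical. It computes the value-stack types after every instruction without executing anything.

```python
def stack_types_at_each_point(local_types, instructions):
    """Return the value-stack type tuple after each instruction.

    `local_types` lists the declared type of each local; `instructions` is a
    list of tuples such as ("local.get", 0) or ("i32.add",).
    """
    stack = []
    snapshots = []
    for op, *args in instructions:
        if op == "local.get":
            stack.append(local_types[args[0]])
        elif op == "i32.const":
            stack.append("i32")
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()
            if (lhs, rhs) != ("i32", "i32"):
                raise TypeError("i32.add expects two i32 operands")
            stack.append("i32")
        elif op == "local.set":
            if stack.pop() != local_types[args[0]]:
                raise TypeError("local.set operand does not match the local's type")
        else:
            raise ValueError(f"unsupported instruction: {op}")
        # The stack shape at this point is known without running the program.
        snapshots.append(tuple(stack))
    return snapshots


program = [("local.get", 0), ("i32.const", 1), ("i32.add",), ("local.set", 1)]
shapes = stack_types_at_each_point(["i32", "i32"], program)
```

Because every program point has a statically known stack shape like this, a deopt entry only needs to say where each slot's value lives. It never needs to discover how many slots there are or what they contain.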

The same constraint that makes WebAssembly predictable and portable across runtimes, its rigidly specified abstract machine, also makes frame reconstruction in a deopt more straightforward than it would be for a dynamically typed language.

What This Infrastructure Enables Next

The V8 team’s framing of deopts as a “building block” rather than a one-off feature is worth examining concretely. Speculative call_indirect inlining is the first application. The typed function references proposal, on which WasmGC builds, introduced call_ref, which calls through a typed function reference rather than a table index. The dispatch mechanism is different, but the optimization opportunity is the same: a hot call_ref site with a dominant callee can be speculatively inlined with a guard and a deopt path. With the infrastructure already in place, extending it to call_ref is incremental.

Beyond call speculation, type narrowing becomes viable. In WasmGC code, heap references typed as abstract supertypes often hold concrete subtypes at runtime. Without deopts, the compiler generates code covering all possible concrete types at every access point. With deopts, the compiler can speculate that a value always holds the concrete subtype the profiling tier observed, generate code optimized for that subtype, and bail out on the rare case a different type appears. This is one of the larger remaining performance opportunities for compiled GC languages targeting Wasm, and it depends on exactly the frame reconstruction machinery that shipped with Chrome M137.
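The contrast between the two code shapes can be sketched as follows. This is an illustrative Python model only: Shape, Circle, Square, width_generic, and width_speculative are hypothetical names, and an exception again stands in for the deopt.

```python
class DeoptSignal(Exception):
    """Stand-in for a deoptimization triggered by a failed type guard."""


class Shape:
    pass


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius


class Square(Shape):
    def __init__(self, side):
        self.side = side


def width_generic(shape):
    """Without deopts: the code must cover every concrete subtype."""
    if isinstance(shape, Circle):
        return 2 * shape.radius
    if isinstance(shape, Square):
        return shape.side
    raise TypeError("unknown shape")


def width_speculative(shape):
    """With deopts: specialize for the one subtype that profiling observed."""
    # Guard: a single exact-type check replaces the full dispatch.
    if type(shape) is not Circle:
        raise DeoptSignal("speculated Circle, saw " + type(shape).__name__)
    return 2 * shape.radius
```

The speculative version has a single check and a single straight-line body, which is the kind of code a later optimization pass can work with. The cost is that a Square must be handled somewhere else entirely.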

Reading the Benchmark Numbers

The 50% improvement on Dart microbenchmarks and 1-8% on realistic applications both describe real phenomena. The gap between them reflects where the optimization applies.

Microbenchmarks designed to demonstrate speculative inlining benefits are typically tight loops calling through a single monomorphic call_indirect site. After inlining, the entire loop body is visible to Turboshaft, enabling constant folding across the call boundary, dead store elimination, and in some cases SIMD vectorization of loops that previously stalled on the opaque dispatch. The full optimization cascade applies, and the benchmark measures essentially that cascade.

In real applications, dispatch sites vary in their polymorphism. A call site that reaches two or three different callees with comparable frequency cannot be speculatively committed to one; the guard-and-inline approach produces branching code rather than a clean inlined body, and the downstream optimization cascade does not fire. Hot paths that spend most of their time on arithmetic or linear memory access were never bottlenecked by dispatch overhead. The 1-8% range reflects the fraction of each application’s hot execution that falls into the favorable monomorphic pattern.
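The gap between the two figures is what Amdahl’s law predicts when only part of the runtime benefits. The sketch below is a back-of-the-envelope model and not a measurement: the function name overall_speedup and the sample inputs are assumptions chosen for illustration, not numbers reported by the V8 team.

```python
def overall_speedup(fraction_affected, local_speedup):
    """Amdahl's law: whole-program speedup when only part of the runtime speeds up.

    fraction_affected: share of execution time spent in code the optimization
                       improves (0.0 to 1.0).
    local_speedup:     speedup factor on that share (1.5 means 50% faster).
    """
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)


# Illustrative inputs only: a microbenchmark that is all monomorphic dispatch,
# versus an application where a tenth of the hot path fits that pattern.
micro = overall_speedup(1.0, 1.5)
app = overall_speedup(0.1, 1.5)
```

Under these made-up inputs, the microbenchmark sees the full 1.5x while the application gains a little over 3 percent, which lands inside the 1-8% band purely because of how small the affected fraction is.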

For Flutter on the web, where hot paths in the framework’s layout and rendering code often follow predictable virtual dispatch patterns, gains at the high end of that range affect frame timing in ways users can measure. For server-side Wasm workloads running business logic with diverse call targets, the gain will sit at the low end.

The Sequence JIT Compilers Follow

What V8 built for Wasm in Chrome M137 follows a pattern that JIT compiler development has repeated several times. HotSpot’s C2 compiler developed its uncommon trap and deoptimization mechanism in the early 2000s before Class Hierarchy Analysis and speculative devirtualization became productive features. TurboFan added deopt infrastructure to V8’s JavaScript pipeline before the most aggressive type-based speculations were layered on top. In each case, the bail-out infrastructure came first; the optimizations that depend on it followed.

For WebAssembly, the sequence arrived later, and the problem was distinct because Wasm has no interpreter to absorb deoptimized execution. The V8 blog post on speculative WebAssembly optimizations gives a thorough overview of both features as they shipped in mid-2025. The deoptimization infrastructure, the less prominent of the two, is the piece with the longer consequence: it is the mechanism that makes future speculation in WebAssembly possible, and without it, speculative inlining would have no safe exit.
