
How V8 Taught WebAssembly to Guess and Recover: Speculative Inlining and Deopts in Wasm

Source: v8

Looking back at what shipped with Chrome M137 in mid-2025, V8’s implementation of speculative inlining and deoptimization for WebAssembly deserves more attention than it got at the time. This is not a small tweak. It represents a fundamental shift in how V8 reasons about Wasm execution, borrowing a technique that JVM engineers spent years refining and adapting it to a runtime with very different constraints.

The call_indirect Problem

WebAssembly’s call_indirect instruction is the mechanism the spec provides for dynamic dispatch. Instead of jumping directly to a known function address, call_indirect takes an index into a function table at runtime, performs a mandatory type check, and then dispatches. In WebAssembly Text Format, it looks like this:

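(module
  ;; Signature that every callee reached from this site must have: f64 -> f64.
  (type $fn-type (func (param f64) (result f64)))
  ;; Function table with 16 slots; call_indirect indexes into it.
  (table 16 funcref)
  (func $dispatch (param $idx i32) (param $x f64) (result f64)
    local.get $x      ;; argument for the callee
    local.get $idx    ;; table index, pushed last so it sits on top of the stack
    call_indirect (type $fn-type)  ;; check the entry against $fn-type, then call it
  )
)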

This is how C++ virtual function calls end up in Wasm. A vtable lookup in native code becomes a table index load and a call_indirect. Rust’s dyn Trait objects follow the same path. So does any higher-level language feature that requires late binding. The type check is not optional; the Wasm spec mandates it for sandboxing reasons, and V8 cannot elide it without a replacement guarantee.
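
To make the cost concrete, here is a minimal Python model of what a call_indirect has to do before any callee code runs. It is a sketch, not V8’s implementation: the names Trap, FuncRef, and call_indirect are invented for illustration, and signatures are modeled as plain tuples of type names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (param types, result types)


class Trap(Exception):
    """Stands in for a WebAssembly trap (hypothetical name)."""


@dataclass(frozen=True)
class FuncRef:
    """A table entry: a signature plus the code to run (illustrative only)."""
    params: Tuple[str, ...]
    results: Tuple[str, ...]
    body: Callable[..., float]


def call_indirect(table: List[Optional[FuncRef]], expected: Signature,
                  index: int, *args):
    # 1. The index must be inside the table.
    if not 0 <= index < len(table):
        raise Trap("table index out of bounds")
    entry = table[index]
    # 2. The slot must actually hold a function.
    if entry is None:
        raise Trap("uninitialized table element")
    # 3. The callee's signature must match the one named at the call site.
    if (entry.params, entry.results) != expected:
        raise Trap("indirect call type mismatch")
    # Only now does control transfer to the callee.
    return entry.body(*args)


F64_TO_F64: Signature = (("f64",), ("f64",))

table: List[Optional[FuncRef]] = [None] * 16
table[0] = FuncRef(("f64",), ("f64",), lambda x: x * 2.0)
table[1] = FuncRef(("i32",), ("i32",), lambda n: n + 1)  # wrong signature for this site

print(call_indirect(table, F64_TO_F64, 0, 21.0))  # prints 42.0
```

Every check in that function sits on the path of every dynamic call, and none of them tells the compiler anything about what the callee will do.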

The overhead compounds. The indirect dispatch prevents the compiler from seeing across the call boundary. If you call area() on a Shape* through call_indirect, the optimizer cannot inline the Circle::area() body even if, at runtime, every single call goes through Circle. Without that visibility, constant propagation stops at the call edge, dead code elimination cannot eliminate the branches in the callee, and the CPU’s branch predictor has to work harder on an indirect jump.

For computationally intensive Wasm workloads, such as game engines compiled from C++, audio processing libraries, or anything using object-oriented patterns heavily, this overhead is not theoretical. It accumulates across millions of dispatch sites.

What Speculative Inlining Does

V8’s solution is to collect feedback during the baseline compilation tier (Liftoff) about which function each call_indirect site actually calls at runtime. If a particular site calls the same function essentially every time it runs, it is treated as monomorphic. V8’s optimizing compiler (Turboshaft) can then generate code that looks roughly like this in pseudocode:

; Speculative inline of call_indirect
loaded_target = table[idx]
if loaded_target != expected_function_ref:
    deoptimize()  ; bail out to Liftoff
; ... inlined body of expected_function ...

The guard check is cheap: a single comparison and a conditional branch that the predictor will almost always predict correctly. In exchange, the entire body of the callee is now visible to Turboshaft. Constant folding, load elimination, and subsequent inlining of inner calls all become possible across what was previously an opaque dispatch boundary.
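
The whole loop of profile, speculate, guard, and bail out can be sketched in a few lines of Python. Everything here is a stand-in: CallSite, baseline_call, sole_target, and speculate_on_square are hypothetical names, the "inlining" is done by hand, and the fallback simply re-runs the call on the baseline path, whereas a real deopt resumes in the middle of the function.

```python
from collections import Counter


def square(x):
    return x * x


def cube(x):
    return x * x * x


TABLE = [square, cube]  # plays the role of the Wasm function table


class CallSite:
    """Per-call-site state (hypothetical; not V8's data structure)."""

    def __init__(self, table):
        self.table = table
        self.feedback = Counter()  # observed target -> number of calls
        self.deopts = 0

    def baseline_call(self, idx, x):
        """Liftoff stand-in: generic dispatch that records the target it saw."""
        target = self.table[idx]
        self.feedback[target] += 1
        return target(x)

    def sole_target(self):
        """Return the only target seen so far, or None if there were several."""
        if len(self.feedback) == 1:
            return next(iter(self.feedback))
        return None


def speculate_on_square(site):
    """Optimized-code stand-in, specialised by hand for the target `square`."""
    expected = square

    def optimized(idx, x):
        if site.table[idx] is not expected:    # the guard
            site.deopts += 1
            return site.baseline_call(idx, x)  # bail out to the baseline path
        return x * x                           # body of `square`, inlined

    return optimized


site = CallSite(TABLE)
for _ in range(100):            # warm-up: this site only ever sees index 0
    site.baseline_call(0, 3.0)

fast = speculate_on_square(site) if site.sole_target() is square else None
```

Calling fast(0, 4.0) takes the inlined path. Calling fast(1, 2.0) fails the guard, bumps site.deopts, and still returns the right answer through the baseline path, after which the feedback no longer shows a single target.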

This is the same fundamental idea that Java’s HotSpot JIT has applied to virtual method calls since the early 2000s. HotSpot calls the bail-out path an “uncommon trap,” and its C2 compiler has been performing speculative devirtualization for a long time. The JVM can even handle bimorphic sites, where two different concrete types appear, by generating a cascade of type checks with inlined bodies for each. V8’s Wasm implementation starts with the monomorphic case, which covers the majority of real dispatch sites in practice.

Deoptimization Is the Hard Part

Speculative inlining only works if you have a reliable way to fall back when the speculation is wrong. Without deopts, you are stuck with conservative optimizations, the kind that remain correct no matter what happens at runtime. The V8 JS engine has had deoptimization infrastructure since its early days, so the concept is not new to the team. But applying it to WebAssembly introduced genuinely new problems.

In JavaScript, a deopt reconstructs a JS stack frame and hands execution to the interpreter. The interpreter understands JS values, the scope chain, and the semantics of every operation. In WebAssembly, there is no interpreter tier. V8’s Wasm execution tiers are Liftoff (a fast single-pass baseline JIT) and Turboshaft (the optimizing JIT). A deopt from Turboshaft-compiled code means resuming execution in Liftoff-compiled code.

That sounds straightforward until you consider that Liftoff and Turboshaft produce completely different machine code with different register allocations, different stack frame layouts, and different calling conventions for internal operations. A deopt cannot simply jump from a Turboshaft frame into a Liftoff frame mid-execution; it has to reconstruct the state that Liftoff would have at the equivalent program point.

This requires V8 to emit “deopt data” alongside Turboshaft-compiled Wasm functions: a description of how every live Wasm local and stack value maps to a location in the Turboshaft register file or stack at each potential deopt point. When a guard check fails, the runtime reads this data, extracts all the live values from the Turboshaft frame, and assembles them into a Liftoff frame, then continues in the Liftoff-compiled version of the same function.
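
A toy version of that bookkeeping, under heavy assumptions: the classes below (InRegister, InStackSlot, DeoptPoint, OptimizedFrame, BaselineFrame) and the reconstruct function are invented for illustration and do not mirror V8’s actual data layout. The point is only the shape of the idea, a per-deopt-point table saying where each live value currently lives, and a routine that walks it.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[int, float]


@dataclass(frozen=True)
class InRegister:
    name: str      # e.g. "r1" (hypothetical register name)


@dataclass(frozen=True)
class InStackSlot:
    offset: int    # offset within the optimized frame


Location = Union[InRegister, InStackSlot]


@dataclass(frozen=True)
class DeoptPoint:
    """Deopt data for one guard: where each live value can be found."""
    baseline_pc: int                  # where the baseline code should resume
    local_locations: List[Location]   # one entry per Wasm local, in order
    stack_locations: List[Location]   # one entry per operand-stack value


@dataclass
class OptimizedFrame:
    registers: Dict[str, Value]
    stack: Dict[int, Value]


@dataclass
class BaselineFrame:
    pc: int
    wasm_locals: List[Value]
    operand_stack: List[Value]


def read(frame: OptimizedFrame, loc: Location) -> Value:
    """Fetch one value from wherever the optimizing compiler left it."""
    if isinstance(loc, InRegister):
        return frame.registers[loc.name]
    return frame.stack[loc.offset]


def reconstruct(frame: OptimizedFrame, point: DeoptPoint) -> BaselineFrame:
    """Build the frame the baseline code expects at the same program point."""
    return BaselineFrame(
        pc=point.baseline_pc,
        wasm_locals=[read(frame, loc) for loc in point.local_locations],
        operand_stack=[read(frame, loc) for loc in point.stack_locations],
    )


# Two locals ($idx, $x) and one value on the operand stack at the guard.
point = DeoptPoint(
    baseline_pc=12,
    local_locations=[InRegister("r1"), InStackSlot(8)],
    stack_locations=[InRegister("r0")],
)
frame = OptimizedFrame(registers={"r0": 2.5, "r1": 3}, stack={8: 7.0})
rebuilt = reconstruct(frame, point)
```

Here rebuilt carries locals [3, 7.0], operand stack [2.5], and a resume position of 12. A single wrong entry in the location lists would hand the baseline code a wrong value, which is why the mapping has to be exact at every deopt point.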

The correctness requirements here are strict. WebAssembly’s memory model and validation rules mean there is no slack for frame reconstruction errors. A misread stack slot in JS deopt recovery might produce a wrong value; the same in Wasm can violate the memory safety guarantees the sandbox is supposed to provide. The V8 team had to design and verify this frame reconstruction logic carefully, which is part of why this capability took time to arrive despite the underlying JIT infrastructure existing for years.

Why These Two Features Need Each Other

Speculative inlining without deopts is possible but severely limited. Without a reliable bail-out path, the compiler can only inline when it can prove at compile time that the speculation cannot fail, which defeats the purpose of speculative optimization. Alternatively, it could generate duplicate code paths: one with the inline, one without, connected by a type check. That approach inflates code size and still prevents the optimizer from treating the inlined path as the definitive path.

With deopts, Turboshaft can commit to the inlined path. The guard check is there, but the generated code on the fast path is structurally as if the indirect call never existed. The optimizer sees through it. Subsequent passes can fold constants from the callee back into the caller, merge memory accesses, and so on. The benefit compounds.
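
A hand-worked illustration of that compounding, again in Python and again only a sketch: scale, negate_scale, caller_generic, and caller_speculated are made-up names, and the "optimized" version is what a compiler could plausibly derive after inlining, written out manually rather than produced by any tool.

```python
def scale(x, factor):
    # Callee with a branch the caller cannot see through an indirect call.
    if factor == 1.0:
        return x
    return x * factor


def negate_scale(x, factor):
    return -x * factor


TABLE = [scale, negate_scale]


def caller_generic(idx, x):
    # Opaque dispatch: the constant 1.0 is passed along, but nothing can be
    # folded because the callee's body is unknown at this point.
    return TABLE[idx](x, 1.0) + 0.5


def caller_speculated(idx, x):
    # Guard: fall back to the generic version if the speculation is wrong
    # (a stand-in for a deopt).
    if TABLE[idx] is not scale:
        return caller_generic(idx, x)
    # `scale` inlined with factor == 1.0 known: the branch folds away and the
    # multiplication is dead, leaving only the caller's own addition.
    return x + 0.5
```

Both callers return the same results for every index, but the speculated one does no call, no branch on factor, and no multiply on its fast path. None of those simplifications were available while the call was opaque.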

This is exactly the same compounding effect that makes Java’s HotSpot optimizations so effective for long-running server workloads. The JVM warms up by collecting type feedback, then the C2 compiler inlines aggressively based on that feedback, enabling further optimizations that are invisible before inlining occurs. A single inlining decision can unlock a chain of secondary transformations that produce code dramatically faster than a conservative compilation would.

What This Means for Real Wasm Workloads

The workloads that benefit most are those that were already bottlenecked on dynamic dispatch. C++ code compiled to Wasm through Emscripten that uses polymorphic class hierarchies is an obvious case. Large Rust applications using dyn Trait extensively will see improvement. Game engines exported to WebAssembly, which often have deep virtual dispatch chains in their entity component systems and rendering paths, stand to gain substantially.

There is also an indirect effect on the broader Wasm ecosystem. One persistent criticism of WebAssembly as a compilation target for high-level languages has been that it strips away the type information that native compilers and runtimes use for optimization, then forces dynamic dispatch through call_indirect without giving the JIT any way to recover it. Speculative, profile-guided inlining is a meaningful answer to that criticism: the JIT observes what actually happens and adapts accordingly.

The V8 blog post on these optimizations describes this as allowing “better machine code by making assumptions,” which understates it a little. The mechanism is the same class of technique that turned the JVM from a slow curiosity in the late 1990s into a performance-competitive platform. For Wasm, this is not the end of that journey, but it is a meaningful step onto the same path.
