WebAssembly was supposed to make the optimizer’s job easier. You get explicit types, a validated instruction stream, no prototype chains, no hidden class transitions. Compared to JavaScript, Wasm looks almost naively optimizable. And yet, one instruction has been quietly resisting meaningful optimization since Wasm shipped: call_indirect.
In June 2025, the V8 team published details about two features that shipped with Chrome M137: speculative inlining for call_indirect, and deoptimization support for WebAssembly. The pairing was not incidental. Speculative inlining without deopt support is not really speculation; it is a permanent commitment. To understand why these features matter and what made them difficult, it helps to start with what call_indirect actually does.
The Opacity of Indirect Calls
In WebAssembly’s execution model, indirect calls go through a function table. A module declares one or more tables of funcref values, populates them at load time (often by the Emscripten or wasm-bindgen runtime), and then calls into them using an index. The instruction:
local.get $callee_idx
call_indirect $funcs (type $sig)
performs a bounds check on $callee_idx, verifies the table entry matches the expected type signature $sig, and dispatches. This is how C++ virtual dispatch shows up in Wasm. It is also how Rust trait objects, C function pointers, and std::function calls appear after Emscripten or wasm-pack compilation.
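The checks described above can be modeled in a few lines. The following is a toy sketch in Python, not V8 or engine code: the names FuncRef, Trap, and call_indirect are hypothetical, a signature is represented as a plain tuple of type names, and the null-entry check is included because Wasm also traps on an uninitialized table slot.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (parameter types, result types).
Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]


class Trap(Exception):
    """Models a Wasm trap, which aborts execution."""


@dataclass(frozen=True)
class FuncRef:
    sig: Signature
    body: Callable[..., int]


def call_indirect(table: List[Optional[FuncRef]], expected_sig: Signature,
                  index: int, *args: int) -> int:
    # 1. Bounds check on the runtime index.
    if not 0 <= index < len(table):
        raise Trap("table index out of bounds")
    entry = table[index]
    # 2. An uninitialized (null) slot also traps.
    if entry is None:
        raise Trap("uninitialized table element")
    # 3. The entry must match the signature named at the call site.
    if entry.sig != expected_sig:
        raise Trap("indirect call signature mismatch")
    # 4. Only now is the callee known, so only now can it be dispatched.
    return entry.body(*args)


I32_TO_I32: Signature = (("i32",), ("i32",))
I32_I32_TO_I32: Signature = (("i32", "i32"), ("i32",))

funcs: List[Optional[FuncRef]] = [
    FuncRef(I32_TO_I32, lambda x: x + 1),
    FuncRef(I32_TO_I32, lambda x: x * 2),
    FuncRef(I32_I32_TO_I32, lambda a, b: a + b),
    None,
]

result = call_indirect(funcs, I32_TO_I32, 1, 21)  # dispatches to the doubling entry
```

Every step before the dispatch depends on a value that exists only at runtime, which is why a compiler that sees nothing but the call site cannot know which body to optimize against.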
From the optimizer’s perspective, the callee is unknown. The index is a runtime value that could be anything. That opacity has compounding consequences: you cannot inline the callee body, cannot propagate constants across the boundary, cannot eliminate type checks inside the callee that the caller’s context would trivially resolve, and cannot vectorize loops that contain such calls. A hot path through a polymorphic C++ class hierarchy compiles to a chain of call_indirect instructions that Turboshaft, V8’s optimizing Wasm compiler, has historically had to treat as black boxes.
What Speculation Requires
The standard answer to opaque indirect calls in JIT compilers is profiling plus speculation. You observe which callee actually gets dispatched at a given call site, bet that it will keep getting dispatched, inline it, and guard the inlined path with a check. If the check fails, you recover.
The Java HotSpot JVM has done this for virtual calls since the early 2000s. When one target dominates a virtual call site, HotSpot speculatively inlines it and guards with a class check. If a second target appears frequently enough, it produces a bimorphic inline. The fallback is the vtable dispatch. This is not exotic: it is the foundational technique behind HotSpot’s reputation for making object-oriented Java code run fast.
V8 does the same for JavaScript. Inline caches track which object shapes appear at a call site. When one shape dominates, V8 emits specialized code for it and falls back through a deoptimization if a different shape appears. Every JavaScript developer who has heard “avoid megamorphic call sites” is being warned about the cost of defeating these inline caches.
Extending this to Wasm required two pieces of infrastructure. First, a way to collect call site feedback. Second, and harder, a working deoptimization mechanism for Wasm-compiled code.
Why Deopt Was the Hard Part
Deoptimization means stopping the speculatively compiled code mid-frame, reconstructing the execution state that unoptimized code expects, and handing control to that unoptimized code at the same program point. For JavaScript, V8’s deopt mechanism drops back to Ignition, the bytecode interpreter. The interpreter’s frame format is stable and well-understood, and V8 has maintained the bookkeeping needed to reconstruct an interpreter frame from an optimized frame for years.
For WebAssembly, the baseline is Liftoff, V8’s single-pass streaming compiler. Liftoff compiles Wasm quickly without optimization, trading code quality for startup latency. Its frame layout, register assignments, and stack slot conventions differ from what Turboshaft produces. When a speculation fails mid-execution in a Turboshaft-compiled Wasm frame, V8 needs to:
- Identify all live Wasm locals and operand stack values at the deopt point.
- Determine where each of those values is in the Turboshaft frame (register, stack slot, or a constant folded away).
- Reconstruct them into the Liftoff frame’s expected layout.
- Transfer control to the correct Liftoff program counter.
This requires Turboshaft to maintain what are called “frame state” descriptors at every potential deopt point during compilation. Every speculatively inlined call site needs one. These descriptors map each logical Wasm value to its physical location in the optimized frame at that moment. Building this tracking through an optimizing compiler without degrading compilation throughput or code quality elsewhere is the engineering challenge that kept Wasm deopts out of V8 for years.
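To make the four steps and the descriptor idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not V8’s data structures: FrameState, OptimizedFrame, BaselineFrame, the location classes, and the register name are all hypothetical, and values are plain integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Where a logical Wasm value lives in the optimized frame (hypothetical model).
@dataclass(frozen=True)
class InRegister:
    name: str


@dataclass(frozen=True)
class InStackSlot:
    offset: int


@dataclass(frozen=True)
class Constant:
    value: int  # folded away: the value exists only in the descriptor


Location = Union[InRegister, InStackSlot, Constant]


@dataclass(frozen=True)
class FrameState:
    baseline_pc: int               # where baseline code resumes
    locals_: List[Location]        # one entry per live Wasm local
    operand_stack: List[Location]  # one entry per live operand stack value


@dataclass
class OptimizedFrame:
    registers: Dict[str, int]
    stack: Dict[int, int]


@dataclass
class BaselineFrame:
    pc: int
    locals_: List[int]
    operand_stack: List[int]


def read(loc: Location, frame: OptimizedFrame) -> int:
    """Fetch one logical value from its physical location."""
    if isinstance(loc, InRegister):
        return frame.registers[loc.name]
    if isinstance(loc, InStackSlot):
        return frame.stack[loc.offset]
    return loc.value


def deoptimize(state: FrameState, frame: OptimizedFrame) -> BaselineFrame:
    """Rebuild the baseline layout from the optimized frame and its descriptor."""
    return BaselineFrame(
        pc=state.baseline_pc,
        locals_=[read(loc, frame) for loc in state.locals_],
        operand_stack=[read(loc, frame) for loc in state.operand_stack],
    )


state = FrameState(
    baseline_pc=0x40,
    locals_=[InRegister("r1"), Constant(7)],
    operand_stack=[InStackSlot(16)],
)
frame = OptimizedFrame(registers={"r1": 3}, stack={16: 99})
baseline = deoptimize(state, frame)
```

The Constant case is the one worth noticing: an optimizer is free to delete a value from the machine state entirely, so the descriptor has to carry enough information to bring it back.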
The shift to Turboshaft (which replaced TurboFan as the primary Wasm backend) helped here. Turboshaft was designed with a cleaner intermediate representation that makes frame state tracking more tractable. The deopt infrastructure for Wasm could be built on the same foundations that support JS deopts in Turboshaft without needing an entirely separate implementation.
Profiling at the Liftoff Layer
With deopt infrastructure in place, the feedback collection side becomes feasible. Liftoff instruments call_indirect sites to record which callee index is actually dispatched at runtime. This fits naturally into V8’s tiered execution model: Liftoff runs first, the user interacts with the application, and the profiling data accumulates. When Turboshaft compiles a hot function, it can read the call_indirect feedback for that function and find which callees dominate which call sites.
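A rough model of that feedback, written in Python as a sketch only: the CallSiteFeedback class, the call site ids, and the 95% coverage threshold are assumptions made for illustration, not values or structures taken from V8.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class CallSiteFeedback:
    """Per-call-site histogram of dispatched callee indices (hypothetical)."""

    def __init__(self) -> None:
        self._counts: DefaultDict[int, Counter] = defaultdict(Counter)

    def record(self, site: int, callee_index: int) -> None:
        # What instrumented baseline code would do on every dispatch.
        self._counts[site][callee_index] += 1

    def dominant_targets(self, site: int, max_targets: int = 2,
                         min_share: float = 0.95) -> List[int]:
        """Return the smallest set of at most max_targets callees covering
        min_share of observed dispatches, or [] if no such set exists."""
        counts = self._counts.get(site)
        if not counts:
            return []
        total = sum(counts.values())
        chosen: List[int] = []
        covered = 0
        for callee_index, n in counts.most_common(max_targets):
            chosen.append(callee_index)
            covered += n
            if covered / total >= min_share:
                return chosen
        return []  # too polymorphic to be worth speculating on


feedback = CallSiteFeedback()
for _ in range(98):
    feedback.record(site=0, callee_index=3)
for _ in range(2):
    feedback.record(site=0, callee_index=5)

targets = feedback.dominant_targets(site=0)  # one callee dominates this site
```

The optimizing tier only needs the answer to one question per call site: is there a small set of callees worth betting on, and if so, which ones.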
For a call site where a single callee index accounts for nearly all observed dispatches, Turboshaft generates something like:
if (callee_index == expected_index && table[expected_index] == expected_func) {
  // inlined body of expected_func
} else {
  deoptimize()  // fall back to Liftoff
}
The inlined body is now part of the optimized function’s IR. Turboshaft can optimize across it. Constants from the calling context flow into the callee’s body. Redundant checks collapse. Loops that were interrupted by an opaque call site can now be analyzed as a unit.
For bimorphic sites (two frequent callees), V8 can chain two guards before the deopt fallback, though the benefit diminishes as the number of likely targets grows. Highly polymorphic call_indirect sites remain essentially impossible to speculate on.
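The guard chain can be sketched as follows. This is a simplified Python model with hypothetical names (specialize, Deoptimize, run), and it departs from a real engine in one important way: a real deopt resumes in the middle of the baseline frame, whereas here the guard fails before any work has been done, so simply redoing the call through the generic path is equivalent.

```python
from typing import Callable, List


class Deoptimize(Exception):
    """Signals that every speculation guard at a call site failed."""


def specialize(table: List[Callable[..., int]],
               expected_indices: List[int]) -> Callable[..., int]:
    # Snapshot the (index, function) pairs observed at optimization time.
    expected = [(i, table[i]) for i in expected_indices]

    def optimized_call(callee_index: int, *args: int) -> int:
        for index, func in expected:
            # Guard: same index, and the table slot still holds the same function.
            if callee_index == index and table[index] is func:
                return func(*args)  # stands in for the inlined body
        raise Deoptimize(callee_index)

    return optimized_call


def generic_call(table: List[Callable[..., int]],
                 callee_index: int, *args: int) -> int:
    # The unspecialized path: look the callee up at runtime.
    return table[callee_index](*args)


def run(table: List[Callable[..., int]], optimized_call: Callable[..., int],
        callee_index: int, *args: int) -> int:
    try:
        return optimized_call(callee_index, *args)
    except Deoptimize:
        return generic_call(table, callee_index, *args)


table: List[Callable[..., int]] = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]
fast = specialize(table, [0, 1])  # bimorphic: two expected callees

hit = run(table, fast, 1, 10)   # second guard matches, fast path
miss = run(table, fast, 2, 10)  # no guard matches, falls back
```

Each additional expected target adds another comparison in front of every call, which is one way to see why the payoff shrinks as a site becomes more polymorphic.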
The Irony in the Pipeline
There is a structural irony in doing this work at the V8 layer. When a C++ codebase is compiled to WebAssembly via Emscripten, LLVM’s optimizer runs first. LLVM has its own devirtualization pass that attempts to resolve virtual calls to direct calls using type hierarchy analysis and profile-guided optimization. For call sites that LLVM successfully devirtualizes, the resulting Wasm contains direct call instructions, not call_indirect. Those sites never needed V8’s speculation infrastructure.
But any call site that LLVM could not devirtualize, typically because the type hierarchy is genuinely polymorphic at that point, survives into the Wasm binary as call_indirect. By then, all of LLVM’s type hierarchy information is gone. The Wasm binary is a typed but opaque instruction stream with no knowledge of C++ class relationships. V8 has to rebuild the profile from scratch through runtime observation, re-discovering at runtime what LLVM already partially analyzed at compile time.
This is not a failure of the Wasm design. The semantics of the binary format deliberately discard source-language-specific metadata to remain language-neutral. But it does mean that speculative optimization at the Wasm runtime layer is partly redundant work, and that well-tuned PGO builds with LLVM may devirtualize enough at compile time that fewer sites remain for V8’s speculation to act on. For codebases with genuinely polymorphic dispatch patterns, however, V8’s runtime profiling observes the actual dynamic type distribution, which LLVM’s static analysis approximates.
Real-World Relevance
The workloads most affected are those that compile large C++ codebases to Wasm: game engines using Emscripten, complex desktop applications targeting the web via tools like Qt for WebAssembly, and scientific computing libraries that lean heavily on polymorphic abstractions. These tend to have hot paths that pass through virtual dispatch, and those paths have historically been the performance gap between Wasm and native code.
Rust Wasm users are somewhat less affected, because generic Rust code is monomorphized into static dispatch that LLVM can inline directly. Codebases that use Box<dyn Trait> or Arc<dyn Trait> extensively will still produce call_indirect patterns that benefit from V8’s new feedback mechanism.
Where This Fits in V8’s Trajectory
V8’s WebAssembly compiler has been evolving continuously. The transition from TurboFan to Turboshaft as the Wasm optimizing backend, the addition of loop unrolling and other classical optimizations, and now speculative inlining with deopt support represent a sustained effort to narrow the performance gap between Wasm and ahead-of-time native compilation. Each piece builds on the last: Turboshaft’s cleaner IR enabled better deopt tracking, deopt tracking enabled speculation, speculation enables inlining, and inlining enables the downstream optimizations that make the real difference in tight loops.
The original V8 post includes benchmark numbers demonstrating the improvement across several workloads. The headline technique is straightforward by the standards of JIT compiler research, but the implementation required years of infrastructure investment to make correct and production-safe. That gap between “the idea” and “the shipping feature” is the real story in most compiler engineering, and this is a clear example of it.