· 5 min read ·

WebAssembly's Stack Machine Is a Compiler IR in Disguise

Source: eli-bendersky

Eli Bendersky picked up a recent discussion about whether WebAssembly really counts as a stack machine. The original claim, “Wasm is not quite a stack machine”, points at the locals and the absence of dup and swap as evidence that Wasm is a hybrid. Eli’s response is that the term “stack machine” has no formal definition, and Wasm fits the loose Wikipedia one well enough.

Both sides are right, and the argument is more interesting when you stop asking what Wasm is and start asking why it ended up looking the way it does. The shape of the Wasm instruction set is a direct consequence of two constraints that have nothing to do with stack-machine purity: it has to be decoded and validated in a single linear pass, and it has to be cheap to turn into SSA for a real backend. Those constraints push the design toward something that looks a lot more like LLVM bitcode than like Forth.

What a pure stack machine actually looks like

Forth is the reference point everyone uses. In Forth, the only addressable state is the data stack and a separate return stack. Variables exist, but they live in memory, and you reach them through @ (fetch) and ! (store). The instruction set includes stack-shuffling primitives because, without them, you cannot reorder operands. dup, drop, swap, over, rot, tuck, pick, roll; the ANS Forth standard lists about a dozen, and real programs use them constantly.

The JVM is the closest large-scale descendant. It has dup, dup_x1, dup_x2, dup2, swap, pop, pop2, and a full set of local-variable slots accessed by iload, istore, and friends. The JVM specification commits to a particular operand-stack depth per method, computed by the verifier, and javac emits dup aggressively to keep intermediate values around without storing them.

Wasm has none of that. Look at the instruction list in the spec and you will find local.get, local.set, local.tee, and drop, but no dup and no swap. The only way to use a value twice is to put it in a local and read it twice. The only way to reorder two values on the stack is to stash one in a local. This is not an oversight. It is the design.

Why the locals are there

The Wasm design rationale document is explicit about this. The stack model exists because it produces compact binaries and a trivial decoder. The locals exist because every production compiler that targets Wasm wants to emit SSA-like code, and SSA wants named values, not anonymous stack slots.

Consider what happens when LLVM lowers a C function through the Wasm backend. The frontend produces SSA. Register allocation in the traditional sense does not run; instead, the backend assigns SSA values either to Wasm locals or to short-lived stack pushes. The result is an instruction stream where most values move through locals, and the stack is used only for the operands of the next instruction. You can see this in any wasm-objdump of a Rust or C++ binary: long runs of local.get and local.set separated by short arithmetic bursts.

If Wasm had dup and swap instead of locals, every compiler would still need a register-allocation pass to schedule stack operations, and the resulting binaries would be larger and harder to validate. The JVM has both, and the cost shows up in HotSpot’s bytecode interpreter, which has to track an explicit operand-stack pointer and emit code for every dup variant.

Structured control flow is the bigger break

The locals are the obvious deviation from stack-machine orthodoxy, but the more consequential one is control flow. Wasm has no goto. It has block, loop, if, br, br_if, and br_table, and branches can only target the labels created by the enclosing structured constructs. The validation algorithm relies on this to type-check the stack in linear time without building a control-flow graph.

This is the part that makes Wasm decisively not a traditional stack machine. Forth and the JVM both let you jump anywhere within a method. JVM verification famously had to deal with the resulting mess; the original verifier used iterative dataflow, and the type system for jsr/ret was complicated enough that Sun eventually deprecated subroutines and switched to stack maps in classfile version 50. Wasm sidestepped all of that by forbidding unstructured jumps. The validator walks the instruction stream once, maintaining a small stack of block types, and rejects anything that does not balance.

Ben Titzer’s paper “A fast in-place interpreter for WebAssembly” describes how this single-pass property lets engines like Wizard skip the usual bytecode rewriting step and execute the original binary directly. That is a feature you cannot offer for JVM bytecode because the JVM lets you express things the Wasm grammar does not.

The IR comparison

If you squint, Wasm’s design choices line up with LLVM IR more than with any stack machine. Both have explicit typed values, both forbid unstructured control flow in their canonical form (LLVM via the dominance requirement, Wasm via the grammar), and both expect a single forward pass to validate. The difference is that LLVM IR names its values explicitly with %0, %1, and so on, while Wasm threads them through an implicit operand stack between adjacent instructions.

You can see this most clearly in the wasm-opt Binaryen tool, which internally lifts Wasm into its own expression-tree IR, optimizes there, and then re-serializes. The lift is mechanical because the stack discipline is so constrained that every instruction’s operands are unambiguously the values pushed by the immediately preceding instructions or pulled from locals. There is no aliasing between stack slots, no live-range overlap to resolve, no phi nodes to insert at merge points because the structured control flow already tells you where the merges happen and what types flow through them.

A real stack machine does not have this property. In Forth, the same stack slot can hold values of different types at different program points, and the only way to know what is on the stack at a given line is to simulate the program from the start. That is exactly the problem Eli highlights in his Forth-in-Go post: the +! definition is unreadable without the stack-state comments because the language gives you no way to name the values.

What this means in practice

For anyone writing a Wasm engine, the practical consequence is that you almost never treat Wasm as a stack machine internally. V8’s Liftoff baseline compiler walks the instruction stream and emits machine code directly, using a small register cache to avoid materializing the operand stack in memory. Wasmtime goes through Cranelift, which builds SSA on the fly from the Wasm stream. WAMR’s classic interpreter does keep a real operand stack, but its fast interpreter rewrites the bytecode into a register-based form first, exactly because dispatching through a stack is slower than dispatching through registers on modern hardware.

The stack-machine framing survives in the binary format and the spec text, and it does buy you the compactness and single-pass decodability that motivated it. Beyond that, the design has more in common with a constrained IR than with the lineage from Forth through the JVM. Calling it a stack machine is fine as shorthand. Treating it as one when you implement it will leave performance on the floor.

The debate about whether dup and swap belong in the instruction set is, in that light, a question about whether Wasm wants to be more like Forth or more like LLVM. The committee answered that years ago, and the answer was LLVM.

Was this interesting?