Checkpoint and Restore for Wasm: The Case for Interpreter-First Design

Source: lobsters

The interesting thing about snapshotting a running program is that it sounds simple until you think about what “state” actually means. For a WebAssembly module, state lives in several places: linear memory, global variables, table elements, and the execution stack itself. Most tools that claim to snapshot Wasm only capture the first three. Capturing a module mid-execution, with the call stack and value stack intact, is a different problem entirely, and one that interpreter-based runtimes like gabagool are better positioned to solve.

What Wasm Execution State Actually Looks Like

WebAssembly is a stack machine. Instructions push values onto an operand stack and pop them off it. Function calls create new stack frames, each with its own local variables. The WebAssembly core specification defines a frame as holding a vector of local values (parameters plus declared locals) and a reference to the module instance; an implementation additionally tracks which function each frame is executing and an instruction pointer into that function’s bytecode.
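To make the stack-machine model concrete, here is a minimal sketch of an operand stack evaluating (2 + 3) * 4. The `Instr` enum and `eval` function are hypothetical names chosen for illustration, covering only a tiny i32 subset; they are not taken from any real runtime.

```rust
/// A tiny, hypothetical subset of Wasm-like instructions (i32 only).
#[derive(Debug, Clone, Copy)]
enum Instr {
    Const(i32),
    Add,
    Mul,
}

/// Runs the instructions against an operand stack and returns the final stack.
fn eval(code: &[Instr]) -> Vec<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in code {
        match *instr {
            Instr::Const(v) => stack.push(v),
            Instr::Add => {
                // Binary operators pop their right operand first.
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a.wrapping_add(b));
            }
            Instr::Mul => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    stack
}
```

Stopping after the first three instructions (`Const(2)`, `Const(3)`, `Add`) leaves a partial result on the stack. That intermediate value is exactly the kind of state a mid-execution snapshot has to capture.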

To snapshot a running Wasm execution completely, you need to capture all of it:

  • Linear memory: the byte array Wasm uses as its heap, up to 4 GiB with the 32-bit addressing of the current specification
  • Global variables: module-level mutable state shared across functions
  • Tables: arrays of function references, and more broadly, reference types
  • Value stack: the operand stack, which may hold partial results mid-computation
  • Call stack: each active frame, with its locals, its instruction pointer, and its reference to the current function

The first three are relatively easy to snapshot. Tools like Wizer, built by Nick Fitzgerald at the Bytecode Alliance, handle exactly this: run the module’s initialization functions, then freeze the resulting memory and globals into a new .wasm binary with data segments baked in. That approach eliminates cold start overhead for modules with expensive startup sequences, but it is a snapshot of state at rest. The call stack is empty at init time; there is nothing to capture there.
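The at-rest approach can be sketched as turning a memory image into data segments. This is only an illustration of the idea under simple assumptions (one segment per non-zero run, no merging of nearby runs); `DataSegment` and `memory_to_segments` are hypothetical names, and this is not Wizer's actual code.

```rust
/// A hypothetical data segment: bytes to place at `offset` at instantiation.
#[derive(Debug, PartialEq)]
struct DataSegment {
    offset: usize,
    bytes: Vec<u8>,
}

/// Splits a memory image into one segment per run of non-zero bytes.
/// Zero bytes are skipped because fresh linear memory is zero-initialized.
fn memory_to_segments(memory: &[u8]) -> Vec<DataSegment> {
    let mut segments = Vec::new();
    let mut i = 0;
    while i < memory.len() {
        if memory[i] == 0 {
            i += 1;
            continue;
        }
        // Found the start of a non-zero run; extend it as far as it goes.
        let start = i;
        while i < memory.len() && memory[i] != 0 {
            i += 1;
        }
        segments.push(DataSegment {
            offset: start,
            bytes: memory[start..i].to_vec(),
        });
    }
    segments
}
```

Nothing in this output describes a call stack or an instruction pointer, which is why the technique only applies when no function is executing.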

The last two items are where the difficulty concentrates, and the difficulty is not uniform across runtime architectures.

Why JIT Runtimes Struggle With This

The dominant production Wasm runtimes, wasmtime (built on the Cranelift compiler) and wasmer, compile Wasm bytecode to native machine code before execution. The performance benefits are significant, but the tradeoff is that execution state becomes entangled with the native stack.

When a JIT-compiled Wasm function calls another function, the compiler emits a native CALL instruction. The call stack is now a native call stack, with frames laid out according to the host platform’s calling convention. Local variables live in registers or native stack slots. The program counter is a native instruction pointer into JIT-emitted machine code.

To snapshot this state, you would need to take one of three paths:

  1. Walk the native stack and reconstruct Wasm-level state from it, requiring something like DWARF debug information mapping native program counters back to Wasm instruction offsets, along with register and stack-slot maps for every local in every active frame.
  2. Compile Wasm with explicit checkpointing support that saves relevant state to a side structure before each potential checkpoint, adding overhead on every function call even when no snapshot is taken.
  3. Switch to interpreted execution when a checkpoint is requested, a hybrid approach that complicates the runtime considerably and introduces a mode-switch performance cliff.
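The second path can be illustrated with a shadow stack: a side structure that instrumented code updates on every call. The sketch below is a hand-written stand-in for what a compiler would emit, with hypothetical names (`ShadowFrame`, `ShadowStack`, `instrumented_sum`); it exists only to show where the per-call cost comes from.

```rust
/// Hypothetical record that an instrumented prologue saves for each active call.
#[derive(Debug, Clone, PartialEq)]
struct ShadowFrame {
    func_idx: u32,
    locals: Vec<i64>,
}

/// Side structure mirroring the native call stack at the Wasm level.
#[derive(Debug, Default)]
struct ShadowStack {
    frames: Vec<ShadowFrame>,
    /// Counts bookkeeping operations, paid even if no snapshot is ever taken.
    pushes: u64,
}

impl ShadowStack {
    fn enter(&mut self, func_idx: u32, locals: &[i64]) {
        self.frames.push(ShadowFrame { func_idx, locals: locals.to_vec() });
        self.pushes += 1;
    }

    fn leave(&mut self) {
        self.frames.pop();
    }
}

/// Stand-in for a compiled function with checkpoint instrumentation:
/// computes n + (n - 1) + ... + 0 recursively. `deepest` keeps a copy of the
/// shadow stack at its maximum depth, i.e. what a snapshot taken there would see.
fn instrumented_sum(n: i64, shadow: &mut ShadowStack, deepest: &mut Vec<ShadowFrame>) -> i64 {
    shadow.enter(0, &[n]);
    if shadow.frames.len() > deepest.len() {
        *deepest = shadow.frames.clone();
    }
    let result = if n == 0 { 0 } else { n + instrumented_sum(n - 1, shadow, deepest) };
    shadow.leave();
    result
}
```

Every call pays for one `enter` and one `leave`, whether or not anyone ever asks for a checkpoint.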

wasmtime has useful features for pausing execution, including epoch interruption and fuel consumption, but these are interruption mechanisms rather than state capture. wasmtime’s serialization support covers compiled modules (the native code output, so you avoid recompilation), not the execution state of a running instance.

CRIU (Checkpoint/Restore in Userspace) checkpoints and restores whole Linux processes, and you could in principle use it to freeze a Wasm runtime process. That gives you an OS process image, not a portable Wasm execution snapshot: a CRIU image from an x86 Linux host cannot be restored on an ARM machine.

The Interpreter Advantage

A pure interpreter handles all of this differently. Instead of compiling to native code, it maintains an explicit representation of the Wasm stack machine in its own heap-allocated data structures. In Rust, the core of such an interpreter looks roughly like:

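use serde::{Deserialize, Serialize};

// Assumed value type (undefined in the original): Wasm number types plus funcref.
#[derive(Clone, Copy, Serialize, Deserialize)]
enum Value {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
    FuncRef(Option<u32>),
}

// One activation on the call stack.
#[derive(Serialize, Deserialize)]
struct Frame {
    func_idx: u32,       // which function this frame is executing
    ip: usize,           // instruction pointer into function body
    locals: Vec<Value>,  // parameters followed by declared locals
}

// Complete execution state, held as plain serializable data.
#[derive(Serialize, Deserialize)]
struct Machine {
    memory: Vec<u8>,          // linear memory
    globals: Vec<Value>,
    tables: Vec<Vec<Value>>,
    call_stack: Vec<Frame>,
    value_stack: Vec<Value>,  // operand stack
}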

Every piece of execution state is an ordinary heap-allocated value. Serializing a Machine captures the complete execution state. Deserializing it and resuming the interpreter’s dispatch loop picks up exactly where it left off. There is no native call stack to walk, no JIT code to reconstruct, no platform-specific register state to save and restore.
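A pause, serialize, and resume cycle can be sketched with a deliberately small machine. The sketch below makes several assumptions to stay self-contained: it models a single function with no call stack or memory, invents its own instruction set (including a backward `BrIf`), and uses an ad hoc little-endian encoding instead of serde. None of it reflects gabagool's actual instruction handling or snapshot format.

```rust
/// Hypothetical instruction subset operating on i64 values.
#[derive(Debug, Clone, Copy)]
enum Instr {
    Const(i64),
    LocalGet(usize),
    LocalSet(usize),
    Add,
    /// Pops a value and jumps to the given instruction index if it is non-zero.
    BrIf(usize),
}

/// Execution state of a single-function machine; the code itself is not part of it.
#[derive(Debug, Clone, PartialEq)]
struct Machine {
    ip: usize,
    locals: Vec<i64>,
    value_stack: Vec<i64>,
}

impl Machine {
    /// Executes at most `budget` instructions; returns true once execution has finished.
    fn run(&mut self, code: &[Instr], budget: usize) -> bool {
        for _ in 0..budget {
            let instr = match code.get(self.ip) {
                Some(instr) => *instr,
                None => return true,
            };
            self.ip += 1;
            match instr {
                Instr::Const(v) => self.value_stack.push(v),
                Instr::LocalGet(i) => self.value_stack.push(self.locals[i]),
                Instr::LocalSet(i) => self.locals[i] = self.value_stack.pop().expect("underflow"),
                Instr::Add => {
                    let b = self.value_stack.pop().expect("underflow");
                    let a = self.value_stack.pop().expect("underflow");
                    self.value_stack.push(a.wrapping_add(b));
                }
                Instr::BrIf(target) => {
                    if self.value_stack.pop().expect("underflow") != 0 {
                        self.ip = target;
                    }
                }
            }
        }
        self.ip >= code.len()
    }

    /// Ad hoc encoding: ip, local count, locals, then the value stack, as LE u64 words.
    fn to_bytes(&self) -> Vec<u8> {
        let mut words = vec![self.ip as u64, self.locals.len() as u64];
        words.extend(self.locals.iter().map(|&v| v as u64));
        words.extend(self.value_stack.iter().map(|&v| v as u64));
        words.iter().flat_map(|w| w.to_le_bytes()).collect()
    }

    /// Inverse of `to_bytes`; panics on malformed input to keep the sketch short.
    fn from_bytes(bytes: &[u8]) -> Machine {
        let words: Vec<u64> = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                u64::from_le_bytes(buf)
            })
            .collect();
        let n_locals = words[1] as usize;
        Machine {
            ip: words[0] as usize,
            locals: words[2..2 + n_locals].iter().map(|&w| w as i64).collect(),
            value_stack: words[2 + n_locals..].iter().map(|&w| w as i64).collect(),
        }
    }
}

/// Loop adding local 0 (a countdown counter) into local 1 until the counter reaches zero.
fn sum_program() -> Vec<Instr> {
    vec![
        Instr::LocalGet(1), Instr::LocalGet(0), Instr::Add, Instr::LocalSet(1),
        Instr::LocalGet(0), Instr::Const(-1), Instr::Add, Instr::LocalSet(0),
        Instr::LocalGet(0), Instr::BrIf(0),
    ]
}
```

Starting from `Machine { ip: 0, locals: vec![3, 0], value_stack: vec![] }` and calling `run` with a budget of 12 stops inside the second loop iteration with two operands on the value stack. Passing `to_bytes()` through `from_bytes` yields an equal machine, and running that copy to completion leaves the sum in local 1. The module code is passed separately, so the snapshot contains only mutable state.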

This is the architecture gabagool implements: a Wasm interpreter where the snapshot includes the full execution state, not just the module’s memory. You can pause a running computation, write its state to disk or transmit it over a network, and resume it on any host running the interpreter.

Prior Art in the Space

Several projects have explored parts of this problem.

Wizer covers the initialization-snapshot case. It is production-ready and widely used for Wasm cold-start optimization, but it only captures state before any user-level execution begins.

WAMR (WebAssembly Micro Runtime), also from the Bytecode Alliance, targets embedded systems with a focus on small footprint and fast startup. Its ahead-of-time compilation support is oriented around reducing initialization overhead on constrained devices, not migrating live execution state.

The JVM offers an instructive parallel. The CRaC project (Coordinated Restore at Checkpoint) in OpenJDK lets JVM-hosted applications checkpoint themselves by halting execution at a coordination point and capturing the state of the running JVM. The “coordinated” part matters: the application must cooperate to flush I/O state and release resources before the checkpoint. CRaC confronts the same fundamental problem that makes JIT runtimes difficult to snapshot, and it needs dedicated machinery to do so: its Linux implementation builds on CRIU, so the resulting image is tied to the process and platform. Interpreter-based Wasm runtimes get a capturable, portable execution state without that machinery, because the state already lives in the interpreter’s own data structures rather than on the native CPU stack.

What Full Snapshotting Enables

The use cases become more interesting when the snapshot covers live execution state rather than just initialized memory.

Serverless and edge cold starts: The standard model initializes the module from scratch on every invocation. A snapshot taken after initialization but before the first request can be restored directly. Cloudflare Workers achieves sub-millisecond startup times in part through V8 isolate snapshots, which capture the JavaScript engine’s internal state after a module has been parsed and initialized. Portable Wasm-level execution snapshots extend this pattern across runtimes and processor architectures without tying the optimization to a specific engine.

Deterministic replay debugging: Snapshot the execution state before a suspicious function call, run to completion and observe the failure, then restore the snapshot and step through with a debugger. Reproducing subtle bugs often requires exact state reconstruction; a snapshot gives you that without needing to reconstruct the input sequence that led to the state.

Live migration: A long-running computation can be checkpointed on one node and resumed on another. This is useful for load balancing or draining nodes for maintenance. Because a Wasm-level snapshot is architecture-neutral, it works across machine types, unlike OS-level process images.

Fork semantics: Snapshot once, restore multiple times from the same checkpoint. This gives you branch-from-a-point semantics for computation, where multiple independent continuations proceed from identical starting state. The pattern is useful for simulation, fuzzing, and certain classes of speculative execution.
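Fork semantics reduce to restoring one checkpoint several times. In the sketch below, `Machine` is a hypothetical two-field stand-in for a full interpreter state, and cloning it plays the role of deserializing a snapshot; `step` and `fork_and_run` are illustrative names.

```rust
/// Hypothetical stand-in for a full interpreter state.
#[derive(Debug, Clone, PartialEq)]
struct Machine {
    acc: u64,
    steps: u64,
}

impl Machine {
    /// Advances the computation by one deterministic step, folding in an input.
    fn step(&mut self, input: u64) {
        self.acc = self.acc.wrapping_mul(31).wrapping_add(input);
        self.steps += 1;
    }
}

/// Restores the same checkpoint once per branch and runs each copy independently.
fn fork_and_run(checkpoint: &Machine, branches: &[Vec<u64>]) -> Vec<Machine> {
    branches
        .iter()
        .map(|inputs| {
            // "Restore": every branch starts from identical state.
            let mut machine = checkpoint.clone();
            for &input in inputs {
                machine.step(input);
            }
            machine
        })
        .collect()
}
```

Because each branch works on its own copy, the checkpoint itself is never mutated and can seed further branches later, which is the property fuzzers and simulators rely on.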

The WASI Complication

All of the above applies cleanly to pure Wasm computation with no host imports. WASI (the WebAssembly System Interface) complicates things. File descriptors, socket connections, environment variables, and clock state are managed by the host, not by the Wasm module. A snapshot of the interpreter’s internal machine state does not capture what file descriptor 3 points to on the host operating system.

Solving this completely requires one of three approaches: restricting snapshot-restore to points where no I/O is in-flight, capturing WASI state as part of the snapshot format (which requires cooperation from the host layer), or scoping the feature to computation-only workloads that do not rely on WASI I/O. This is a genuine constraint, not a theoretical edge case, and it sets a ceiling on where interpreter snapshotting works without additional machinery.
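The first two approaches can be combined into a guard that runs before a snapshot is taken. The sketch below is one possible policy under stated assumptions: the host tracks each guest resource in a hypothetical `HostResource` enum, files are treated as restorable by path and offset, and sockets are treated as unrestorable. Real WASI hosts expose no such interface today.

```rust
/// Host-side resources a guest may hold through WASI (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum HostResource {
    /// A file the host could reopen by path and seek back to `offset`.
    File { path: String, offset: u64 },
    /// A live connection; the peer's state cannot be captured locally.
    Socket,
}

#[derive(Debug, PartialEq)]
enum SnapshotCheck {
    /// Safe to snapshot; carries the host state to store next to the machine state.
    Allowed(Vec<HostResource>),
    /// Not safe right now, with a short reason.
    Refused(&'static str),
}

/// Decides whether a snapshot may be taken at this point in execution.
fn check_snapshot(open: &[HostResource], io_in_flight: bool) -> SnapshotCheck {
    if io_in_flight {
        return SnapshotCheck::Refused("host call in flight");
    }
    if open.iter().any(|r| *r == HostResource::Socket) {
        return SnapshotCheck::Refused("open socket cannot be restored");
    }
    SnapshotCheck::Allowed(open.to_vec())
}
```

A computation-only workload is the degenerate case: with no open resources and no host calls pending, the guard always allows the snapshot.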

WASI 0.2, which is built on the Component Model, has made progress on standardizing host interfaces, but execution-state snapshot formats are not part of the current roadmap. Any solution for WASI-bearing snapshots will need coordination between the runtime, the host interface layer, and probably the application itself, much as JVM CRaC requires application-level cooperation.

The Performance Trade-off

Interpreters are slower than JIT compilers. For compute-intensive workloads, the performance gap is typically one to two orders of magnitude, and it does not disappear with clever implementation. Gabagool trades throughput for snapshotability as a first-class feature.

For use cases where snapshotability matters more than peak throughput, that is a defensible trade-off. Serverless initialization, debugging infrastructure, and computation migration are not on the hot path for most applications. The question worth asking is not which runtime is fastest in the abstract, but which runtime has the properties a given use case requires.

The broader ecosystem question is whether Wasm will eventually develop standardized snapshot formats that work across runtime types. That would require either standardizing the serialization of Wasm execution state as a spec-level concept, or building native-stack introspection infrastructure into JIT compilers in a portable way. Neither path is straightforward. In the meantime, interpreter-based runtimes like gabagool occupy a genuinely distinct position in the design space, offering capabilities that JIT-first architectures have not yet matched.
