WebAssembly runtimes tend to compete on execution speed. Wasmtime, Wasmer, V8’s Liftoff and TurboFan pipelines, Cranelift backends: the discourse is dominated by benchmark numbers, JIT compilation latency, and steady-state throughput. Gabagool takes a different position entirely. It is a Wasm interpreter written in Rust that prioritizes something most runtimes give up at the design level: the ability to pause execution at any instruction boundary, serialize the complete program state to bytes, and restore it later, possibly on a different machine.
That property sounds simple, but most runtimes sacrifice it the moment they adopt JIT compilation.
Why JIT Makes Snapshots Hard
A JIT-compiled Wasm runtime translates Wasm bytecode into native machine code before or during execution. Wasmtime uses the Cranelift code generator. Wasmer supports multiple backends including LLVM and Singlepass. V8 compiles Wasm through Liftoff for fast baseline compilation and TurboFan for optimized steady-state code.
Once Wasm is compiled to native code, the execution state lives in CPU registers and on the native call stack. A function activation frame in compiled Wasm looks like an ordinary native frame: return addresses are raw virtual addresses, local variables live at stack-relative offsets that depend on the register allocation decisions the compiler made, and intermediate computed values might be in CPU registers, in the native stack frame, or in SIMD registers depending on what the optimizer decided.
Checkpointing this state is the same problem as checkpointing any native process. CRIU solves it for Linux by operating at the OS level, freezing the entire process, dumping memory maps, and encoding the native call stack, but the result is tied to the specific OS, CPU architecture, and in practice the specific kernel version. It is not portable, not lightweight, and requires elevated privileges.
Some JIT runtimes add explicit support for yield points by maintaining a parallel shadow stack. This approach is complex to implement correctly across all optimization tiers and usually supports only specific, annotated yield points, not arbitrary instruction-boundary snapshots.
The Existing Workaround: Asyncify
Before interpreters like gabagool entered this space, the standard technique for snapshotable Wasm was Emscripten’s Asyncify. Asyncify is a binary transformation applied at compile time. It instruments Wasm code so that when an unwind is triggered, the module saves its call stack state, including all local variables and position within each function, into the module’s own linear memory. When a rewind is triggered, the module reconstructs its call stack from that memory and resumes.
Asyncify is deployed at scale for cooperative multitasking, for awaiting async host functions from synchronous Wasm code, and for serializable snapshots. It works.
The limitations are architectural. Asyncify requires compile-time instrumentation, which means you need access to the source or at minimum the ability to run the transform pass on the binary. Binary size increases by 40 to 100 percent for instrumented sections. There is runtime overhead on the hot path, typically in the 10 to 30 percent range for compute-heavy code. The state is stored inside the module’s own linear memory, coupling snapshot format to the module’s internal layout. Most importantly, Asyncify unwinds at specific recognized yield points. Arbitrary instruction-boundary snapshotting across any opcode is not the design goal.
Wasm’s State Model Is Already a Data Structure
What makes gabagool’s approach tractable is a property of the Wasm specification that JIT runtimes are actively trying to optimize away. The Wasm execution model defines an explicit, well-specified state:
- A value stack holding all in-flight operand values
- A call stack of activation frames, each holding local variable bindings and a return position
- One or more linear memory instances, each a contiguous byte array
- A global variable store
- A table of function references
- An execution position expressible as a (function index, instruction offset) pair
This is not an implementation detail; it is the specification. Every conforming Wasm runtime must produce the same results as if this state model were being followed exactly.
For an interpreter, this state is not an abstraction over some underlying native representation. It is the state, held in ordinary data structures. In a Rust interpreter, it might look roughly like this:
```rust
struct ExecutionState {
    value_stack: Vec<Value>,
    call_stack: Vec<Frame>,
    memory: Vec<u8>,
    globals: Vec<Value>,
    tables: Vec<Vec<Option<FuncRef>>>,
    pc: ProgramCounter,
}

struct Frame {
    func_idx: u32,
    locals: Vec<Value>,
    return_pc: ProgramCounter,
}

struct ProgramCounter {
    func_idx: u32,
    instr_offset: usize,
}
```
Serializing this to bytes is a straightforward serde problem. Restoring it is equally straightforward. There is no native stack to reconstruct, no JIT-generated code addresses to fix up, no CPU register state to save. The entire execution state is a tree of owned Rust values.
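To make the "tree of owned values" point concrete, here is a hand-rolled round trip for a toy subset of that state, using only the standard library. In practice a serde derive would generate this machinery; all names here (`MiniState`, the `i64` stand-in for the full `Value` enum) are illustrative, not gabagool's actual types.

```rust
// Hand-rolled (de)serialization of a toy subset of interpreter state.
// Everything is plain owned data: no pointers to fix up, no registers.

#[derive(Debug, Clone, PartialEq)]
struct ProgramCounter {
    func_idx: u32,
    instr_offset: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct MiniState {
    value_stack: Vec<i64>, // stand-in for a full Value enum
    pc: ProgramCounter,
}

fn read_u64(b: &[u8], at: &mut usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[*at..*at + 8]);
    *at += 8;
    u64::from_le_bytes(buf)
}

fn serialize(s: &MiniState) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(s.value_stack.len() as u64).to_le_bytes());
    for v in &s.value_stack {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out.extend_from_slice(&(s.pc.func_idx as u64).to_le_bytes());
    out.extend_from_slice(&(s.pc.instr_offset as u64).to_le_bytes());
    out
}

fn deserialize(b: &[u8]) -> MiniState {
    let mut at = 0;
    let len = read_u64(b, &mut at) as usize;
    let value_stack = (0..len).map(|_| read_u64(b, &mut at) as i64).collect();
    let func_idx = read_u64(b, &mut at) as u32;
    let instr_offset = read_u64(b, &mut at) as usize;
    MiniState { value_stack, pc: ProgramCounter { func_idx, instr_offset } }
}

fn main() {
    let state = MiniState {
        value_stack: vec![1, -7, 42],
        pc: ProgramCounter { func_idx: 3, instr_offset: 17 },
    };
    let restored = deserialize(&serialize(&state));
    assert_eq!(state, restored); // full fidelity, no native state involved
    println!("round-trip ok");
}
```

The restore path is the serialize path run backward; there is no step where anything outside these structs has to be reconstructed.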
Gabagool exploits exactly this: because it never compiles Wasm to native code, the execution state remains fully inspectable and serializable at any point between instruction dispatches.
What the Snapshot Contains
A gabagool snapshot captures everything needed to resume execution.
Value stack. The operand stack at the moment of snapshot. For a typical computation this is small, but deeply nested expression evaluation can produce a larger stack.
Call stack. Each frame records the function it belongs to, the values of all local variables (function parameters are locals in Wasm), and a return position pointing back to the call site.
Linear memory. The full contents of the Wasm module’s memory. This is usually the largest component of the snapshot. A module using 64 MB of linear memory produces a 64 MB memory dump. Compression helps considerably; Wasm linear memory often has structured patterns with significant zero runs.
Globals. The current values of all mutable global variables.
Tables. The current contents of function reference tables, which can be modified at runtime by table.set instructions.
Execution position. The function index and instruction offset indicating exactly where execution was paused.
Host-imported state is not captured. If a Wasm module holds an open WASI file descriptor, that descriptor exists on the host side and is not part of the Wasm snapshot. This is the main semantic gap in any interpreter-level checkpoint: the Wasm state is portable, but the surrounding host state is not. Restricting snapshot boundaries to points between Wasm-internal computation and host interaction sidesteps most of this in practice.
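The compression point about linear memory can be illustrated with a toy zero-run encoder. A real snapshot would use a general-purpose compressor such as zstd or lz4; nothing below is gabagool's actual format, just a demonstration of why zero-heavy memory shrinks so well.

```rust
// Toy zero-run compression: a run of zero bytes becomes the pair
// (0, run_length); nonzero bytes pass through unchanged.
fn compress_zeros(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == 0 {
            let start = i;
            // Cap runs at 255 so the length fits in one byte.
            while i < data.len() && data[i] == 0 && i - start < 255 {
                i += 1;
            }
            out.push(0);
            out.push((i - start) as u8);
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}

fn decompress_zeros(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == 0 {
            out.extend(std::iter::repeat(0u8).take(data[i + 1] as usize));
            i += 2;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // A page that is mostly zeros, like typical unused linear memory.
    let mut memory = vec![0u8; 4096];
    memory[100] = 7;
    memory[2000] = 9;
    let packed = compress_zeros(&memory);
    assert_eq!(decompress_zeros(&packed), memory);
    assert!(packed.len() < memory.len() / 10); // order-of-magnitude shrink
    println!("{} bytes -> {} bytes", memory.len(), packed.len());
}
```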
The Properties That Make This Work
Wasm’s design is unusually snapshot-friendly compared to native code, and it is worth being precise about why.
No hidden register state. Wasm has no CPU registers. All in-flight values live on the explicit operand stack. There is nothing outside the interpreter’s data structures that needs capturing.
Structured control flow. Wasm uses blocks, loops, and if/else rather than arbitrary jumps. Execution position is always representable as a small portable tuple. There is no equivalent of a native return address being a raw pointer into JIT-generated code.
Deterministic execution. Given the same state and the same inputs from host imports, a Wasm execution proceeds identically. Integer arithmetic behavior is specified exactly; trap conditions are defined precisely. This makes snapshots not just serializable but reproducible.
Sandbox boundaries. A Wasm module cannot hold OS file descriptors, socket handles, or kernel objects unless explicitly passed through WASI or host imports. The set of state that needs capturing is bounded and well-defined.
These properties are why Wasm checkpointing is substantially simpler than equivalent work for native processes, even though both involve “saving a running program’s state.”
Use Cases That Become Practical
Serverless cold start elimination. Cold starts in serverless platforms include runtime startup, module parsing and compilation, and application-level initialization. Faasm, a high-performance serverless Wasm runtime, addresses this by snapshotting after module initialization and then forking from the snapshot on subsequent invocations. AWS Lambda SnapStart does the equivalent at the microVM level using Firecracker snapshot/restore. Doing it at the Wasm interpreter level is cheaper, more portable, and works on any host that runs the interpreter.
Time-travel debugging. Wasm’s determinism means that recording snapshots at regular intervals and replaying forward from any of them reaches an arbitrary past instruction exactly. Record-and-replay is what makes tools like rr architecturally complex for native code, where it requires low-level hardware performance-counter support and OS cooperation; for a Wasm interpreter it reduces to restoring a snapshot and re-executing with the same recorded host inputs.
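The snapshot-plus-replay scheme can be sketched in a few lines. Here `State` and `step` are illustrative stand-ins for interpreter state and single-instruction dispatch; the only property the sketch relies on is the one the text claims, namely that `step` is deterministic.

```rust
// Time travel via periodic snapshots plus deterministic re-execution.
#[derive(Clone, Debug, PartialEq)]
struct State {
    counter: u64,
}

// Deterministic: the next state depends only on the current state.
fn step(s: &mut State) {
    s.counter = s.counter.wrapping_mul(6364136223846793005).wrapping_add(1);
}

// Run forward, keeping a snapshot every `every` steps.
fn run_with_snapshots(mut s: State, steps: usize, every: usize) -> Vec<(usize, State)> {
    let mut snaps = vec![(0, s.clone())];
    for i in 1..=steps {
        step(&mut s);
        if i % every == 0 {
            snaps.push((i, s.clone()));
        }
    }
    snaps
}

// Reach any past step: restore the latest snapshot at or before it,
// then re-execute forward. Determinism makes the result exact.
fn travel_to(snaps: &[(usize, State)], target: usize) -> State {
    let (at, snap) = snaps.iter().rev().find(|(i, _)| *i <= target).unwrap();
    let mut s = snap.clone();
    for _ in *at..target {
        step(&mut s);
    }
    s
}

fn main() {
    let start = State { counter: 1 };
    let snaps = run_with_snapshots(start.clone(), 100, 10);

    // Direct execution to step 37, for comparison.
    let mut direct = start;
    for _ in 0..37 {
        step(&mut direct);
    }
    assert_eq!(travel_to(&snaps, 37), direct);
    println!("replayed state at step 37 matches direct execution");
}
```

In the real setting, host inputs consumed between snapshots would also be recorded and fed back during replay, since they are the one external source of nondeterminism.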
Computation forking. Snapshot the execution state before a decision point, then explore multiple branches independently. This is useful for backtracking search, for fuzzing (snapshot a good corpus state and fuzz from it repeatedly without re-running initialization), and for speculative execution in query engines.
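Because the state is a tree of owned values, forking costs one `clone`. The sketch below uses illustrative stand-ins (`BranchState` for interpreter state, `explore` for resuming a fork down one branch); it is not gabagool's API.

```rust
// Forking a computation = cloning the owned state tree.
#[derive(Clone, Debug, PartialEq)]
struct BranchState {
    value_stack: Vec<i64>,
    pc: usize,
}

// Stand-in for "resume this fork down one branch of a decision".
fn explore(mut state: BranchState, choice: i64) -> BranchState {
    state.value_stack.push(choice);
    state.pc += 1;
    state
}

fn main() {
    // State captured just before a decision point.
    let base = BranchState { value_stack: vec![10], pc: 0 };

    // Each branch runs on an independent copy; no shared mutation.
    let left = explore(base.clone(), 1);
    let right = explore(base.clone(), 2);

    assert_eq!(left.value_stack, vec![10, 1]);
    assert_eq!(right.value_stack, vec![10, 2]);
    assert_eq!(base.pc, 0); // the captured state is untouched
    println!("both branches explored from one snapshot");
}
```

For the fuzzing case, `base` would be the post-initialization state and each fuzz iteration would clone it rather than re-running setup.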
Live migration. A long-running Wasm computation can be paused, serialized, transmitted to a different host, and resumed. The snapshot is not tied to any CPU architecture or OS; it requires only a compatible Wasm interpreter on the destination. This is the compute equivalent of VM live migration, but at the granularity of a single module rather than an entire virtual machine.
The Speed Trade-Off
Interpreters are slower than JIT runtimes. For CPU-bound workloads, the difference is often an order of magnitude or more. Wasmtime on compute-heavy benchmarks runs roughly 2x slower than equivalent native code; a well-implemented interpreter is typically 10 to 50x slower than native on the same workloads.
Whether this matters depends on the use case. For the serverless cold-start scenario, the bottleneck is often initialization work, not steady-state throughput, and snapshot/restore skips initialization entirely. For time-travel debugging and fuzzing, workloads are often exploration-bound rather than CPU-bound. For Wasm modules doing light processing, interpreter overhead is irrelevant.
The more interesting question is whether snapshotability can be added to a JIT runtime without paying the interpreter’s throughput penalty. Some runtimes are exploring tiered approaches: interpret until a snapshot boundary, then JIT-compile and continue. Others are investigating deoptimization mechanisms similar to what V8 uses for JavaScript, where optimized native code can be abandoned and execution mapped back to an equivalent interpreter state under specific conditions. These are active research areas, but they are substantially more complex to implement correctly than a pure interpreter approach.
The Broader Point
Gabagool takes the position that execution state should be a first-class value: something you can inspect, serialize, transmit, and restore. The Wasm specification already defines what that state looks like. An interpreter keeps it in that form rather than compiling it away into native frames and JIT-generated addresses.
The serverless and edge computing industries have been rediscovering this property at the infrastructure level, using microVM snapshots to solve cold start problems. Doing the equivalent at the Wasm interpreter level is cheaper, more portable, and more granular. The implementation barrier is lower than it might appear; the hard part was the Wasm specification getting the state model right from the beginning, which it did. Gabagool is a clean demonstration that the interpreter trade-off, slower execution in exchange for full state transparency, is worth making in a meaningful set of contexts.