
From Trampolines to return_call: What Scheme Demands from WebAssembly

Source: eli-bendersky

Eli Bendersky recently published a detailed walkthrough of compiling a subset of Scheme to WebAssembly. It is worth reading carefully, not because Scheme is an obscure niche, but because the problems Scheme forces you to confront are the problems every serious language compiler for Wasm eventually hits. Toy compiler examples target Wasm with integer arithmetic, simple function calls, and a hand-rolled allocator. None of that prepares you for a language with real runtime semantics.

Scheme is syntactically minimal but semantically demanding. It requires proper tail calls (not simulated), first-class closures over lexical environments, garbage collection, dynamic typing, and call-with-current-continuation (call/cc). Each of these pushes on a different part of the Wasm spec, and until 2023, several of those parts were not ready.

The Tail Call Problem

Scheme specifies that calls in tail position must not grow the stack. This is not an optimization hint; it is part of the language specification. Iterative processes written as tail-recursive functions, state machines in continuation-passing style, and trampolined interpreters all depend on this guarantee. Without it, idiomatic Scheme is not merely slow on the target platform, it is semantically broken.

For most of Wasm’s history, tail calls were absent. The standard workaround was the trampoline: instead of calling a function in tail position, you return a thunk representing the deferred call, and a top-level driver loop repeatedly invokes the returned thunks until a non-procedure value comes back.

;; Tail-recursive factorial in direct style
(define (fact-iter n acc)
  (if (= n 0)
      acc
      (fact-iter (- n 1) (* n acc))))

;; Trampoline-compatible version: tail call becomes a returned thunk
(define (fact-iter/tramp n acc)
  (if (= n 0)
      acc
      (lambda () (fact-iter/tramp (- n 1) (* n acc)))))

(define (trampoline f)
  (let loop ([v f])
    (if (procedure? v) (loop (v)) v)))

This works, but it changes the shape of the compiled output significantly. Every tail call becomes an allocation. Every function that participates in a tail-call loop needs to be restructured. The driver loop adds overhead. And the transformation is not local: it infects every function that might be called in tail position.

The Wasm tail call proposal introduced return_call and return_call_indirect instructions, which perform a function call in tail position without allocating a new stack frame. The proposal reached Phase 4 in April 2023 and shipped in Chrome 112 that same month, with Firefox and Safari following later in 2023 and 2024. With return_call available, the compiled output for a tail-recursive Scheme function looks like this:

(func $fact-iter (param $n i32) (param $acc i32) (result i32)
  (if (i32.eqz (local.get $n))
    (then (return (local.get $acc)))
    (else
      (return_call $fact-iter
        (i32.sub (local.get $n) (i32.const 1))
        (i32.mul (local.get $n) (local.get $acc))))))

No thunks. No trampoline. No heap allocation per iteration. The semantics drop directly to a Wasm instruction.

Closures and the GC Proposal

Scheme closures capture their lexical environment. A closure is a pair: a function reference and the environment in which it was created. Representing this in Wasm’s linear memory is possible, but it requires a custom allocator, a manual layout for environment structs, a tagging scheme for dynamic values, and either reference counting or a garbage collector to reclaim memory.
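To make the cost concrete, here is a minimal Python sketch of what a compiler must do by hand when the host provides no GC support: closure conversion, where each closure becomes an explicit (code, environment) pair and the environment record is built and threaded manually. All names here are illustrative, not from any real compiler.

```python
# Sketch: manual closure conversion. Without a host GC, the compiler must
# represent a closure as an explicit pair of a lifted function and an
# environment record, and manage those records itself.

def make_adder(n):
    # Source-level closure: (lambda (x) (+ x n))
    env = {"n": n}            # explicit environment record for the captured n
    def code(env, x):         # lifted function: the environment is now a parameter
        return x + env["n"]
    return (code, env)        # closure = code pointer + environment

def apply_closure(clo, *args):
    code, env = clo
    return code(env, *args)

add5 = make_adder(5)
print(apply_closure(add5, 3))  # -> 8
```

With the Wasm GC proposal, the pair and the environment record become host-managed struct and array allocations, and the bookkeeping above disappears from the compiler.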

The Wasm GC proposal introduces managed heap types: struct for fixed-layout records and array for homogeneous sequences, with reference types the host GC understands. These types live outside linear memory. The runtime allocates and collects them.

For Scheme, this means closures can be represented as GC structs holding a function reference and an array of captured values:

(type $env-array
  (array (mut anyref)))

(type $closure
  (struct
    (field $func funcref)
    (field $env (ref $env-array))))

The compiler does not need a custom allocator, does not need to track object lifetimes, and does not need to implement a mark-and-sweep pass. The host GC handles it. The GC proposal reached Phase 4 in late 2023 and shipped in Chrome 119 and Firefox 120.

Without the GC proposal, a Scheme compiler targeting Wasm must implement a garbage collector inside Wasm linear memory. Guile’s Hoot compiler did this in early prototyping, and it is doable, but it duplicates substantial work the host runtime already provides. It also cannot easily interact with the host heap for JavaScript object interop, which matters in browser contexts.

Continuations and CPS Transformation

The hardest problem in compiling Scheme is call/cc. call-with-current-continuation captures the current continuation as a first-class value. You can store it, invoke it later, invoke it multiple times. It is the general mechanism underlying early exit, coroutines, generators, and nondeterministic search.

Wasm’s control flow is structured: blocks, loops, and if-else with labeled branches. There is no mechanism for capturing or restoring arbitrary continuations mid-function. The structured control flow is a deliberate design choice for validation and security, but it rules out direct translation of call/cc.

The standard compiler technique is a CPS transformation. In continuation-passing style, every function takes an extra argument representing the continuation, the computation that consumes the function’s result. The function never implicitly returns; it calls its continuation with the result. All control flow becomes explicit, and call/cc reduces to handing the current continuation object to the called function:

;; Direct style
(define (add x y) (+ x y))

;; CPS style
(define (add-cps x y k)
  (k (+ x y)))

;; call/cc in CPS: reify the current continuation k as a procedure
;; that ignores its own continuation and resumes k instead
(define (call/cc-cps f k)
  (f (lambda (v k2) (k v)) k))

In a CPS-transformed program, every tail call is already a tail call, so return_call applies uniformly. Continuations are closures, so the GC proposal handles their lifetimes. The transformation is invasive: every function gains a continuation argument, every call site passes one, and the IR gets more complex. But the result is a program that compiles cleanly to Wasm without needing any special continuation support from the host.
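To see how CPS and the trampoline compose on a host without tail calls, here is a hedged Python sketch (Python, like pre-2023 Wasm, does not eliminate tail calls): every tail call is deferred as a zero-argument thunk, and the driver loop runs the whole computation in constant stack depth. With return_call available, the thunks and the loop vanish and each call lowers to a single instruction.

```python
# Sketch: CPS + trampoline. Every call in tail position is deferred as a
# zero-argument thunk; the driver loop invokes thunks until a plain value
# appears, so the host stack never grows with the recursion depth.

def fact_cps(n, acc, k):
    if n == 0:
        return lambda: k(acc)                    # tail call to the continuation
    return lambda: fact_cps(n - 1, n * acc, k)   # tail self-call, as a thunk

def trampoline(thunk):
    result = thunk
    while callable(result):
        result = result()
    return result

# The final continuation just returns its value, ending the loop.
print(trampoline(lambda: fact_cps(10, 1, lambda v: v)))  # -> 3628800
```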

Full, restartable call/cc is the expensive case, because calling a saved continuation multiple times requires either copying or restoring stack state. Many production Scheme compilers restrict continuations to escape-only or one-shot semantics, which avoids the copying cost while covering most practical use cases: exceptions, early exit, and simple coroutines.
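The escape-only restriction can be modeled with ordinary exception unwinding, since an upward, one-shot continuation is exactly a non-local exit. The Python sketch below is an illustration of that restricted semantics, not any particular compiler's implementation; the names call_ec and first_even are invented for the example.

```python
# Sketch: escape-only (one-shot, upward) call/cc modeled with exceptions.
# The captured continuation can only be invoked to unwind outward, which
# is the restriction many compilers adopt to avoid copying stack state.

class _Escape(Exception):
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def call_ec(f):
    tag = object()              # unique tag identifies this capture point
    def k(value):               # invoking k unwinds back to call_ec's frame
        raise _Escape(tag, value)
    try:
        return f(k)
    except _Escape as e:
        if e.tag is tag:
            return e.value
        raise                   # belongs to an enclosing call_ec

# Early exit from a search: return the first even number, else None.
def first_even(xs):
    def body(k):
        for x in xs:
            if x % 2 == 0:
                k(x)            # escapes immediately with x
        return None
    return call_ec(body)

print(first_even([1, 3, 4, 7]))  # -> 4
```

Invoking k a second time, after call_ec has returned, would fail here; that is precisely the capability full, restartable call/cc adds and one-shot semantics give up.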

The Historical Context

It is worth noting how compilers dealt with these problems before the Wasm proposals existed. CHICKEN Scheme targets C using a technique called Cheney on the MTA, described by Henry Baker in 1994. The core idea: CPS-transform the whole program, allocate everything on the C stack, and when the stack overflows, copy live objects into the heap and continue from where you left off. This gave CHICKEN proper tail calls and full call/cc on a platform (C) that natively supports neither. The technique required heroics in the compiler and runtime.

Wasm’s 2023 proposals mean those heroics are no longer necessary. Tail calls are a first-class instruction. Garbage collection is a host service. A Scheme compiler targeting Wasm can focus on the semantic transformation (primarily CPS) and rely on the runtime for the memory and control flow infrastructure.

What This Reveals About Wasm’s Evolution

WebAssembly started as a compilation target for C and C++, languages that map cleanly to flat memory and structured control flow. The proposals that have shipped in the last two years (GC, tail calls, exception handling, reference types) have been driven primarily by the needs of higher-level languages: Java, Kotlin, Dart, OCaml, and now Scheme.

The result is a Wasm that is usable as a general compilation target, not just for systems languages but for languages with demanding runtime semantics. The Hoot project from the Spritely Institute demonstrates this concretely: it targets R7RS Scheme, handles tail calls with return_call, represents closures as GC structs, and implements call/cc via CPS transformation. It passes a substantial portion of the R7RS test suite and runs nontrivial programs in browser and server environments.

Scheme is a useful stress test for Wasm because it is small enough to implement end-to-end in a reasonable codebase and demanding enough to expose every gap in the platform. If a language with Scheme’s semantics can compile cleanly to Wasm, the argument that Wasm is only for systems languages becomes much harder to sustain.

Eli’s post works through this in the kind of detail that builds real understanding of both the language being compiled and the target platform. The exercise is worth undertaking even if your goal is never to ship a Scheme runtime; the problems it surfaces are the same ones you will hit compiling any language with closures, dynamic types, and non-trivial control flow.
