Cranelift's Aegraph: How Scoping Equality Saturation Made It Production-Ready
Source: lobsters
Compiler optimization research has a long history of producing techniques that work beautifully in papers and fall apart under production constraints. Equality saturation, a method for applying rewrite rules non-destructively to an expression graph, spent about fifteen years in that gap. Chris Fallin’s writeup on Cranelift’s acyclic e-graph mid-end optimizer describes how a single design constraint, restricting the e-graph to the DAG of pure SSA values, made equality saturation practical inside a JIT compiler.
The Phase-Ordering Problem Equality Saturation Solves
Traditional compiler optimization pipelines are ordered sequences of passes. GVN runs before LICM; algebraic simplification runs before constant folding; strength reduction has to go somewhere in the sequence. The problem is that these passes interact. Constant folding might expose opportunities for GVN; GVN might expose opportunities for LICM. By the time LICM runs, GVN has already finished and will not fire again. You end up tuning pass order heuristically, and the heuristic is wrong for some programs.
Equality saturation, formalized for compilers in Tate et al.’s POPL 2009 paper, sidesteps this entirely. Instead of destructively rewriting the IR, you build an e-graph: a data structure that represents many equivalent expressions simultaneously by grouping equivalent forms into e-classes. You apply rewrite rules non-destructively, adding new representations to e-classes without removing old ones. When no new merges are possible (saturation), you extract the optimal form using a cost model. Because no information is ever discarded, pass order stops mattering.
The egg library (POPL 2021) made this practical for research by introducing a rebuilding algorithm that defers congruence closure to batch updates, keeping amortized complexity manageable. Egg found uses in tensor expression optimization, floating-point accuracy improvement via Herbie, and CSS layout. Using it inside a production compiler JIT is a harder problem.
Why E-graphs Don’t Fit Compilers Directly
A compiler’s IR is a control-flow graph with loops, side effects, memory operations, and ordering constraints. A standard e-graph represents a DAG of pure expressions. Extending e-graphs to handle loops either requires special encoding or a fundamentally different IR.
The most complete alternative is the RVSDG (Regionalized Value State Dependence Graph), proposed by Reissmann et al. RVSDG encodes control flow directly into the graph using hierarchical region nodes: conditionals become gamma nodes, loops become theta nodes. Because loops have first-class representation, you can write e-graph rewrites that cross loop boundaries. The problem is that migrating Cranelift to an RVSDG-based IR would be a near-complete rewrite of the compiler. Cranelift is the backend for Wasmtime, where compile-time latency and engineering stability matter as much as code quality. A full IR migration was not viable.
The other obstacle is extraction cost. Optimal extraction from a general e-graph is NP-hard; it reduces to problems related to weighted set cover. For a JIT compiler that needs to compile functions in milliseconds, NP-hardness is a rejection, not a constraint to be managed.
The Acyclic Constraint
The aegraph is Cranelift’s answer to both problems. The key observation is that in SSA form, pure value definitions form a directed acyclic graph. Each value is defined exactly once. An add instruction’s result depends on its operands, which depend on their definitions, tracing back to function arguments. There are no cycles in that dependency chain.
The aegraph restricts its scope to exactly this DAG. Pure instructions (arithmetic, bitwise operations, comparisons, selects) become e-nodes. Memory operations, stores, calls, and control-flow instructions remain in the original CFG structure, untouched by the e-graph. Three properties follow from this restriction.
First, saturation terminates quickly. A DAG is finite and bounded; the number of possible e-nodes is bounded by function size and the rewrite rules. In practice, saturation over typical Wasm functions converges in one to three iterations because most rules are local, matching one level of nesting.
Second, extraction is polynomial. DAG extraction is a single bottom-up walk, O(n) in the number of e-nodes. The NP-hardness of general e-graph extraction disappears because the graph has no cycles.
Third, the e-graph stays small enough that its cost remains a small fraction of total compilation time, which is the property that makes this viable in a JIT setting rather than just on paper.
GVN and LICM Without Separate Passes
One practical payoff is that global value numbering and loop-invariant code motion both fall out of the aegraph construction, with no separate analysis passes required.
GVN is automatic via structural hash-consing. When you add an e-node (iadd, class_A, class_B) to the aegraph, you first look it up in a hash map keyed on that structure. If it already exists, you get back its e-class, and the two instructions are merged into the same equivalence class. Congruence closure handles transitivity: if a and b are in the same e-class, then f(a) and f(b) end up in the same e-class. GVN across the entire function is a consequence of how the e-graph is built, not a separate pass layered on top.
LICM is handled during extraction. Each e-class carries the loop depth at which its value is defined. During extraction, the aegraph places each value at the shallowest loop depth at which all its inputs are available, while respecting dominance. A computation whose operands are all defined outside a loop gets placed outside the loop automatically. This is what Fallin calls scoped elaboration: the extraction pass re-places values into the CFG at the earliest correct point, achieving code hoisting without a separate loop analysis pass.
The ISLE Connection
A particularly coherent design choice is that rewrite rules are written in ISLE, the same domain-specific language Cranelift uses for instruction selection. ISLE encodes pattern-matching rules as term rewrites; the ISLE compiler generates efficient Rust code from them. Using the same language for optimization rules and instruction selection rules means one tool, one test infrastructure, and one mental model for everyone working on the compiler.
An algebraic simplification in ISLE looks like this:
(rule (simplify (iadd ty x (iconst ty 0))) x)
(rule (simplify (imul ty x (iconst ty 1))) x)
(rule (simplify (imul ty x (iconst ty 2))) (ishl ty x (iconst ty 1)))
Adding a new algebraic identity is a two-line entry in a rule file. In LLVM’s InstCombine, the equivalent change involves writing C++ inside a function already containing hundreds of pattern matches, reasoning carefully about which other transformations have already fired, and hoping the pass order cooperates. The ISLE encoding also makes rules inspectable and testable in isolation, which matters when you want to verify correctness without running the full optimizer.
How This Compares to LLVM’s Approach
LLVM’s mid-end is a large pipeline of independent passes coordinated by a Pass Manager: GVN, NewGVN, LICM, InstCombine, EarlyCSE, and LoopIdiom are all separate analyses and transformations. LLVM’s NewGVN pass (introduced around 2017) uses congruence classes that approximate some e-graph behavior, but it is not a true e-graph and does not perform equality saturation. Phase-ordering remains a real issue, compensated by running multiple pipeline sweeps and careful manual tuning of pass sequence.
Cranelift’s aegraph does not aim to match LLVM’s optimization breadth. Cranelift targets JIT scenarios where compile-time speed matters alongside code quality. Within that scope, the aegraph unifies GVN, LICM, and algebraic simplification into one mechanism with a clear theoretical foundation. Adding new optimizations is a one-liner in ISLE; rule interactions are handled by the saturation mechanism rather than by pass ordering; the compile-time overhead is predictable.
What the Aegraph Does Not Cover
The aegraph’s scope ends at the boundary of pure values. Any optimization that crosses side-effecting instructions requires CFG-level analysis outside the e-graph. Dead store elimination, memory forwarding, alias analysis, and induction variable recognition all remain separate concerns. Cranelift handles some of these elsewhere; others it optimizes less aggressively than LLVM.
Loop-carried values through phi nodes are out of scope entirely. Strength reduction of induction variables, replacing i * stride with an increment inside the loop body, requires reasoning about loop-carried dependences that an RVSDG-based system could express through theta-node rewrites but the aegraph cannot. These are deliberate trade-offs shaped by the JIT context, where Wasm workloads tend toward compute kernels without deeply nested loop structures.
The Takeaway
What the aegraph demonstrates is the value of specificity in design constraints. Equality saturation with unbounded scope and full-program IR coverage is a research problem; equality saturation restricted to pure SSA values with acyclic structure is a practical compiler pass. The acyclic constraint looks like a limitation, but it is the constraint that eliminates the NP-hard extraction, controls saturation time, and keeps the pass fast enough for a JIT. Identifying which part of a research idea you need and cutting the rest is the mechanism by which techniques like this move from academic literature into production toolchains.