Why Cranelift Had to Build Its Own E-Graph

When Max Willsey and collaborators published “egg: Fast and Extensible Equality Saturation” at POPL 2021, the compiler community got genuinely excited. The egg library made equality saturation practical: by deferring the maintenance of e-graph invariants to a batched “rebuild” phase rather than enforcing them eagerly after every union, it achieved orders-of-magnitude speedups over previous e-graph implementations. Research projects from Herbie (floating-point expression improvement) to Tensat (tensor graph optimization) showed real results. The natural question was whether equality saturation could finally make it into a production compiler’s optimization pipeline.

Cranelift, the Rust-based code generator used by Wasmtime and the Bytecode Alliance, took that question seriously. The answer was yes, but not with a standard e-graph. Chris Fallin’s aegraph post explains the resulting data structure, which Cranelift now uses as its mid-end optimizer. The core design choice is imposing acyclicity as a structural invariant. That single constraint resolves almost every problem that makes standard e-graphs impractical for a compiler, and understanding why clarifies what compiler IR design is really about.

The Three Problems with Standard E-Graphs in a Compiler

An e-graph represents a set of expressions and equalities between them using two concepts: e-nodes (operators applied to e-class children) and e-classes (sets of equivalent e-nodes). Equality saturation works by repeatedly applying rewrite rules, adding new e-nodes to the same e-class as the matched expression, until no new equalities can be derived. Then you extract the best expression using a cost model.

This is elegant, but it runs into three structural problems when you try to use it inside a compiler.

First, cycles. A standard e-graph allows cyclic e-class membership: e-class A can contain a node that references e-class B, and B can contain a node that references A. In an SMT solver or theorem prover, this is fine; it represents cyclic equalities. In a compiler, any extracted program must be a DAG, since values must be computed before they are used. Cycles in the e-graph either make extraction loop infinitely or require ad hoc cycle-breaking that undermines the theoretical guarantees.

Second, dominance and scoping. In SSA form, a value defined in basic block B can only be used in blocks that B dominates. Standard e-graphs have no notion of a dominator tree. If a rewrite rule moves a computation to a new location, the e-graph gives you no help determining whether that location is valid, whether all operands are available there, or whether the placement introduces a use before a definition.

Third, extraction. Optimal extraction from a saturated e-graph is equivalent to weighted tree automaton intersection, which is NP-hard in general. The greedy heuristics that egg provides are fast but can miss global optima. In a JIT context, you need extraction that is both fast and consistently good, not occasionally optimal.

What Acyclicity Buys You

The aegraph imposes one rule: when two e-nodes are merged (declared equivalent), neither may be an ancestor of the other in the existing graph. In practice, this is enforced by maintaining a union-find over e-nodes where each e-node’s canonical representative was defined before (i.e., has a lower ID than) any node that transitionally references it. Cycles are impossible by construction.

This sounds like a restriction, and it is, but it resolves all three problems at once.

With no cycles, extraction becomes a DAG traversal. You can process nodes in reverse topological order, selecting the lowest-cost e-node in each equivalence class. The algorithm is polynomial, the result is well-defined, and there is no need for cycle detection or NP-hard tree-selection formulations.

The acyclicity constraint also means the e-graph respects a global ordering of definitions. When you elaborate the optimized graph back into SSA form, you already have the information needed to place each node at the correct point in the dominator tree. If a node’s operands are all available at block B, the node can be placed at B. If they span multiple blocks, it goes at their least common dominator. This is the scoped elaboration algorithm: traverse the dominator tree, and for each value needed in a block, recursively elaborate its best e-node at the earliest valid placement point, memoizing to avoid duplication.

Scoped elaboration subsumes several classical optimization passes. Loop-invariant code motion falls out naturally, because any node whose operands are all available above a loop header gets placed there by the dominator-tree traversal. Global value numbering is handled by the equivalence classes themselves: two computations that end up in the same e-class are elaborated once and shared. Dead code elimination is implicit, since nodes never referenced during elaboration are simply not placed.

How This Compares to Sea-of-Nodes and RVSDG

Cranelift is not the first project to try to represent compiler IR as a graph rather than a sequence of instructions in basic blocks. The sea-of-nodes IR from Cliff Click and Michael Paleczny, used in HotSpot’s C2 compiler and V8’s Turbofan, represents both data and control dependencies as graph edges. This gives natural code motion and GVN, but sea-of-nodes has no notion of equivalence classes; it does not support equality saturation, and it has a well-documented reputation for subtle scheduling bugs.

RVSDG (Regionalized Value State Dependence Graph), used in some MLIR-adjacent work, takes a different angle. It represents control flow structurally through region nesting rather than as explicit edges, which makes the program’s data-dependence structure explicit and hierarchical. The tradeoff is that converting a general CFG to RVSDG requires a full restructuring pass, and not all programs have reducible control flow suitable for the representation.

The aegraph approach sidesteps both issues. It works directly on Cranelift’s existing SSA form without requiring a conversion to a different IR. The dominator tree, already present in SSA, provides the structural information that scoped elaboration needs. You get the benefits of graph-based reasoning during optimization and a clean return to sequential SSA for register allocation and instruction selection, without the overhead or correctness challenges of maintaining a sea-of-nodes representation throughout the pipeline.

ISLE Integration and the Practical Shape of the Pipeline

Cranelift expresses its optimization rules using ISLE, a term-rewriting DSL that compiles to Rust at build time. The same DSL drives both mid-end optimization and instruction selection lowering. Aegraph rewrites are expressed as ISLE rules that pattern-match on e-nodes and produce new e-nodes, which are then unioned with the matched expression’s equivalence class.

This means the mid-end and backend share a rule language, and adding a new algebraic simplification requires writing one ISLE clause rather than a custom Rust pass. The compile-time cost of the aegraph phase is bounded by running rewrites for a fixed number of iterations rather than waiting for full saturation, which keeps compilation time predictable, a non-negotiable requirement for a JIT serving WebAssembly workloads where cold-start latency matters.

What This Represents More Broadly

The aegraph design is a good example of how production constraints reshape theoretical tools. The egg library proved that equality saturation could be fast in a batch setting. Cranelift’s aegraph shows what changes are necessary to use the same idea inside a latency-sensitive compiler: the acyclicity invariant, dominator-tree-aware elaboration, bounded iteration, and integration with an existing rewrite-rule framework.

The pattern generalizes. Formal methods tools, SMT solvers, and theorem provers repeatedly generate ideas that need structural modification before they fit compiler pipelines. Aegraphs join a long list, including ideas borrowed from proof assistants (dependent types), SAT solvers (CDCL-style search in instruction selection), and term-rewriting systems. The gap between theory and production is rarely about the core idea being wrong; it is usually about one or two structural assumptions that do not survive contact with SSA form, control flow, and bounded compile time.

For anyone building compilers or working on program analysis tools, the aegraph paper is worth reading carefully. Not just for the specific technique, but for how it reasons about which invariants to relax and which to strengthen.