· 8 min read ·

Cranelift's Aegraph: How Acyclicity Solves Phase Ordering Without Equality Saturation's Cost

Source: lobsters

The phase ordering problem has been an open irritant in compiler design for decades. You have a set of optimization passes, each of which creates opportunities for others, and no static ordering is universally optimal. Run constant folding before GVN and you miss constants that GVN would expose. Run GVN first and you miss the constants that folding would have produced. The traditional response is to repeat passes until convergence, which is expensive, or to manually tune the ordering per target, which is fragile. Cranelift, the code generator behind Wasmtime and Firefox’s WebAssembly engine, has been shipping a different answer since 2022: the acyclic e-graph, or aegraph.

Chris Fallin’s recent writeup on the aegraph design is worth reading for the implementation specifics. This post zooms out to place the aegraph in context: where e-graphs came from, why the standard equality saturation model is too expensive for a production compiler, and what the acyclicity constraint actually buys you.

E-Graphs and Equality Saturation

E-graphs originated in theorem provers in the early 1980s with Nelson and Oppen’s congruence closure algorithm. The structure encodes not just a single expression, but a whole set of equivalent expressions simultaneously. Every sub-expression belongs to an e-class, which is a set of e-nodes (operator plus references to child e-classes). Union-find tracks which e-classes have been merged as rewrites fire.

The compiler application, called equality saturation, was formalized by Tate et al. in their 2009 POPL paper. The algorithm is straightforward in principle: start with the program’s expressions as an e-graph, apply rewrite rules by adding the right-hand side to the same e-class as the matched left-hand side rather than replacing it, rebuild congruence closure, and repeat until no new nodes or merges appear. Then extract the lowest-cost representative from each e-class.

The egg library, published at POPL 2021 by Willsey et al., made this practical in Rust by introducing deferred rebuilding: instead of maintaining congruence after every single rule application, you batch all rewrites and rebuild once per phase. This change alone produced order-of-magnitude speedups on real workloads. Egg has since been used in Herbie for floating-point expression optimization, TENSAT for tensor graph superoptimization, and Quartz for quantum circuit optimization.

But equality saturation, even with deferred rebuilding, carries costs that are hard to absorb in a production compiler. Extraction is the first problem: finding the globally optimal expression from a saturated e-graph is NP-hard in general, and practical approaches use greedy heuristics or ILP relaxations that are expensive for large programs. Saturation itself is the second problem: the e-graph can grow exponentially as rules keep adding new e-nodes, and hitting a budget cutoff means you have no guarantee of having applied all relevant rules. For an ahead-of-time compiler targeting WebAssembly, where compilation speed matters and inputs can be arbitrarily large, these costs are not acceptable.

The Structural Observation That Changes Everything

A compiler’s value graph is naturally a DAG. A value’s definition cannot depend on itself; loops in the program are represented at the control flow level through block parameters (Cranelift uses a conventional CFG with explicit phi-like block arguments), not by cycles in the value computation graph. This is not a design choice Cranelift makes, it is a semantic property of SSA-form programs.

The aegraph exploits this observation directly. By construction, it prohibits cycles: when you create a new e-node, all of its children must be existing e-classes, which were themselves created from already-existing children, and so on. The graph is acyclic by induction. This constraint is not a restriction on what programs you can represent; it is simply a reflection of reality.

Acyclicity transforms extraction from an NP-hard problem into a single bottom-up traversal. You walk the DAG from leaves to roots, at each e-class selecting the cheapest e-node from among the equivalents recorded for that e-class. No ILP, no greedy approximation: one linear pass.

It also eliminates the need for saturation. In a traditional e-graph, after you apply a rule that merges two e-classes, you must propagate congruence: if a = b then f(a) = f(b), so their parent e-nodes must also be merged, which may trigger further merges. This recursive propagation is what makes full equality saturation iterative and expensive. In an acyclic e-graph, once an e-class is created its children are immutable. New rewrites triggered by later simplifications only ever apply to newly created nodes, not to previously processed ones. There is no backward propagation to chase.

How the Aegraph Works in Practice

The aegraph in Cranelift centers on two core mechanisms: hash-consing and on-the-fly ISLE rewriting.

Hash-consing is the structural deduplication technique. A global map from (opcode, [child_eclass_ids]) to eclass_id is maintained. When you want to create a new node, you first look it up in this map. If it already exists, you get back the existing e-class, for free. This makes global value numbering automatic and essentially free: two computations that produce the same result, even if they appear in different parts of the function, will share an e-class without any separate GVN pass.

ISLE (Instruction Selection and Lowering Engine) is Cranelift’s pattern-matching DSL. It compiles to a Rust decision tree, essentially a giant match arm structure, where each rule fires at most once per node and dispatch is O(1) with no dynamic overhead. The mid-end rewrite rules live in cranelift/codegen/src/opts/ and cover algebraic identities, constant folding, strength reduction, integer range analysis, and load-store forwarding.

The integration point is this: ISLE rules fire at node-creation time. When the aegraph’s make_node function is called for a new e-node, the ISLE simplifier runs first. It inspects the opcode and child e-classes, applies any matching rules, and may return a simpler e-class instead of creating a new node at all. If constant folding applies, the constant e-class is returned. If an algebraic identity fires (x + 0 becomes x), the child e-class is returned directly. If multiple rules overlap, the decision tree handles priority.

Because the children of any node have already been fully processed by the time the node is created (the acyclicity guarantee again), every rewrite sees its children in their already-simplified form. There is no situation where a rewrite would have fired if only the child had been simplified first; by construction, the child was simplified first. This is the aegraph’s resolution of phase ordering: it is not that all rules run simultaneously, it is that the evaluation order of the DAG guarantees that simplifications are always available to later nodes.

Historical Lineage: Sea of Nodes

Cranelift is not the first system to observe that treating the value graph as primary simplifies optimization. Cliff Click’s Sea of Nodes IR, introduced for the HotSpot JVM’s C2 compiler in 1995, eliminates explicit basic blocks and represents control flow as data dependencies between nodes. In Sea of Nodes, GVN is structurally free: if two computations are identical, they are the same node, because the IR has no notion of “same computation at different program points.” Simplification rewrites the node in place.

The aegraph and Sea of Nodes share the hash-consing insight and the implicit GVN property. The difference is that Sea of Nodes uses destructive rewriting: when you simplify x + 0 to x, the addition node is deleted. The aegraph retains both in the same e-class. In Cranelift’s single-pass design this distinction rarely matters in practice (the simplified form is what gets materialized), but the e-class structure is what enables cost-based selection among multiple equivalent forms, something Sea of Nodes does not support.

The aegraph can be understood as Sea of Nodes generalized with equivalence classes and a clean separation between the value graph (acyclic, DAG, hash-consed) and the control flow graph (conventional CFG with block parameters). Cranelift’s approach avoids the scheduling complexity that Sea of Nodes compilers typically face when lowering to machine code, because the CFG structure is preserved throughout.

What This Replaces in Cranelift’s Pipeline

Before the aegraph mid-end, Cranelift ran a sequence of approximately six separate optimization passes: GVN, dead code elimination, constant folding, algebraic simplifications, and load-store forwarding among them. Each pass had to be ordered relative to the others, and cross-pass opportunities were routinely missed.

The aegraph collapses these into a single forward traversal of the function’s value DAG. Compilation throughput improves because the traversal is linear in the number of values, with ISLE dispatch being a compile-time-generated decision tree. Code quality improves because interactions between simplifications that previously required careful pass ordering happen naturally, since each node sees its children already simplified.

This is not magic; there are simplifications that the aegraph cannot find that a full equality saturation pass could find, because the aegraph does only one forward pass rather than iterating to a fixed point. But for the class of simplifications that Cranelift’s ISLE rules encode, the forward pass captures essentially all of them. The rules are written with the DAG traversal order in mind, and in practice the simplifications that require iteration are the domain of the backend register allocator and instruction selector, not the mid-end.

Broader Implications

The aegraph is an example of a general pattern in systems work: identify a structural property of your domain that makes a hard problem easy, and build your data structures around that property rather than solving the hard problem in its full generality.

Full equality saturation solves a more general problem than Cranelift needs. Compiler value graphs are DAGs, always; the acyclicity constraint costs nothing semantically and buys linear extraction and single-pass rewriting. The egg library is a fine tool for program synthesis, rule inference, and superoptimization, where the full generality of e-classes with cycles and saturation to a fixed point is genuinely needed. A production compiler backend has different requirements.

The result in Cranelift is an optimizer that is both faster to compile with and easier to extend than a traditional pass pipeline. Adding a new optimization means writing an ISLE rule, not figuring out where to slot a new pass into the ordering. The correctness argument for the optimizer is local to each rule rather than depending on the global pass interaction. For a compiler that needs to be correct across all of WebAssembly’s semantics and fast enough to be used as a JIT, those properties matter.

The aegraph is currently shipping in Cranelift as the primary mid-end optimizer. If you are using Wasmtime in production today, this is the code that is simplifying your compiled modules.

Was this interesting?