Why Cranelift Threw Out Cycles and Made E-Graphs Fast Enough for Production
Source: lobsters
Compiler optimizations have a sequencing problem. Running dead code elimination before constant folding can miss opportunities; running it after changes what constant folding can see. The order in which you apply optimization passes determines what you find, and there is no universally correct order. This is the phase ordering problem, and it has been an accepted limitation of production compilers for decades.
E-graphs offer a different model. Instead of applying transformations sequentially and committing to each decision, an e-graph accumulates all possible rewrites simultaneously. You build a compact representation of every equivalent form of a program you have discovered, then extract the best one at the end. The ordering problem largely disappears because you are no longer making irreversible decisions mid-pass.
The catch is that general-purpose e-graphs come with their own complexity, and using them in a production compiler requires navigating that complexity carefully. Cranelift’s solution, described by Chris Fallin in his recent post on the aegraph design, is to enforce a structural constraint that most compiler literature treats as a limitation: acyclicity.
What an E-Graph Actually Stores
An e-graph groups terms into equivalence classes called e-classes. Each e-class contains a set of e-nodes, where an e-node is an operator applied to e-class IDs rather than to concrete terms. When you discover that two expressions compute the same value, you merge their e-classes. Adding a rewrite rule means adding new e-nodes to existing e-classes.
The egg library, developed by Max Willsey and colleagues at the University of Washington and published at POPL 2021, is the standard Rust implementation of this idea. egg uses a union-find structure to track e-class membership and supports full equality saturation: you apply all rules until no new e-nodes can be added, then run extraction. It has seen use in tensor graph optimization, program synthesis, and SQL query optimization.
The extraction step is where things get expensive. In a general e-graph, e-nodes in one e-class can reference e-classes that in turn reference the original e-class, creating cycles. Extracting a finite, acyclic program from a cyclic e-graph is NP-hard in the general case. You need to solve a tree grammar problem: find an assignment of one e-node per e-class such that the result is a valid finite term. Practical approaches use heuristics, and even those require careful implementation to avoid exponential blowup.
SSA Is Already Acyclic
Cranelift compiles WebAssembly to native code and serves as the backend for Wasmtime and the SpiderMonkey Wasm tier in Firefox. Its IR, CLIF, is in SSA (Static Single Assignment) form. In SSA, every value is defined exactly once, and every use of a value refers back to its definition. Loops in the control flow graph exist, but the value graph, where edges run from definitions to uses, is a DAG.
This is the key insight behind the acyclic e-graph (aegraph). If your source IR is already a DAG, and if you only add e-nodes that reference values already in the graph, the e-graph itself remains a DAG. You never need to represent a cycle, because the programs you are optimizing do not have cyclic value dependencies. Loop-carried values go through explicit block parameters in SSA form; they are not cycles in the value graph.
Enforcing acyclicity is therefore not giving something up. It is accepting the natural structure of the IR you are working with.
What Acyclicity Buys You
The extraction problem becomes trivial. Because the e-graph is a DAG, you can process e-classes in topological order. When you reach an e-class, all the e-classes its e-nodes depend on have already been processed. You pick the cheapest e-node in the current e-class, knowing that its children have already been resolved. A single linear pass is sufficient; no search, no heuristics for cycle-breaking.
Analysis propagation is also simpler. In a full equality saturation framework, analyses like constant ranges or known-bits need to reach a fixpoint because e-classes can be mutually dependent. With acyclicity, you propagate analyses in topological order and you are done after one pass. The same property that makes extraction cheap makes analysis cheap.
The tradeoff is that you cannot represent certain kinds of rewrites. A rewrite that introduces a cycle, such as replacing a term with one that references itself, is not expressible. For a mid-end optimizer working with SSA values, this is rarely a limitation. The rewrites you want, algebraic identities, strength reduction, GVN, load folding, are all expressible without cycles.
The Cranelift Implementation
In Cranelift’s aegraph, each CLIF value maps to an e-class ID. The data structure stores a list of e-nodes per e-class. When instruction selection rules or optimization rules discover an equivalent form of a value, they intern a new e-node and add it to the appropriate e-class. The union-find from egg is simplified considerably: because rewrites do not need to merge two existing e-classes (which would be the source of cycles), you do not need the full union-find machinery.
Rewrite rules are specified in ISLE, Cranelift’s domain-specific language for instruction selection and optimization rules. ISLE rules match on e-node patterns and produce new e-nodes to add to e-classes. The same rule language that drives instruction selection for the back-end also drives mid-end rewrites, which means the optimizer inherits the same testing infrastructure and rule expressivity.
After rewriting, Cranelift performs an elaboration pass. Elaboration traverses the aegraph in topological order, selects the best e-node for each e-class (minimizing an estimated cost), and emits a linear sequence of CLIF instructions. This is equivalent to extraction in the e-graph literature, but because of acyclicity it is a straightforward linear scan rather than a search problem.
Global value numbering (GVN) falls out naturally. When two different computations produce e-nodes that are structurally identical, their e-class IDs can be unified. The elaboration pass then emits only one copy. There is no separate GVN pass; it is a consequence of how e-node interning works.
Comparison with Traditional Pass Pipelines
LLVM’s middle-end runs a sequence of passes: GVN, SCCP (sparse conditional constant propagation), loop transformations, LICM (loop-invariant code motion), and so on. Each pass assumes the previous ones have run and left the IR in a particular state. Bugs can arise when passes interact in unexpected ways, and adding a new optimization requires reasoning about where it fits in the sequence.
Cranelift’s aegraph approach does not eliminate passes entirely. Loop optimizations, for example, still require dedicated handling because the aegraph works on values, not control flow structure. But algebraic simplifications, GVN, constant folding, and strength reduction all happen in a single rewriting phase without ordering constraints between them. A rewrite that exposes a constant fold opportunity is immediately visible to the constant folder because they operate on the same e-graph simultaneously.
This also means optimization rules are more composable. You write a rule that says “x + 0 == x” and a rule that says “a - a == 0” and the combination “(a - a) + 0 == a” works automatically, without any rule explicitly matching that full pattern. In a pass pipeline, you would need either a single combined rule or a guarantee that the two passes run in the right sequence.
Where This Sits in the Broader Landscape
The idea of constraining e-graphs to exploit IR structure is not unique to Cranelift. Work on RVSDG (Regionalized Value State Dependence Graph) uses a similarly hierarchical, acyclic structure for program representation and has been explored as a basis for optimization in systems like Juno. The common thread is that acyclicity is not a restriction imposed on programs; it is a property of how most programs compute, made explicit in the representation.
The egg paper and the broader equality saturation literature tend to focus on the general case, including cyclic e-graphs and the challenge of extraction. Cranelift’s contribution is demonstrating that for production compiler use, the specialized case is both sufficient and substantially simpler to implement correctly.
Fallin’s post is worth reading in detail for the data structure specifics and the discussion of how the elaboration pass handles values that are used in multiple e-classes. The high-level design has been public since around 2023, but the newer writeup goes into more depth on the corners that matter for correctness.
The Takeaway for Compiler Writers
The lesson from Cranelift’s aegraph is less about e-graphs specifically and more about the value of taking IR invariants seriously. SSA guarantees that value graphs are acyclic. Most compiler IRs provide similar guarantees. When you build data structures and algorithms that assume those guarantees, you often get simpler code than if you build for the general case and constrain it at runtime.
Equality saturation has been in the research literature since at least the late 1990s. Cranelift’s deployment in a production WebAssembly runtime is one of the clearest demonstrations that the technique is ready for that context, provided you are willing to specialize it to the structure you actually have.