· 6 min read ·

Acyclicity as a Feature: Inside Cranelift's E-Graph Optimizer

Source: lobsters

Compiler mid-ends occupy an awkward position. The front-end parses and type-checks; the back-end handles register allocation and instruction selection. The middle is where algebraic simplification, common subexpression elimination, constant folding, and dead code elimination live. For decades these were separate passes, each sweeping through the IR with limited visibility into each other’s results. Cranelift has been consolidating this model with something more principled, and the mechanism driving it is the acyclic e-graph, documented in detail in Chris Fallin’s recent writeup.

What an E-Graph Actually Is

An e-graph is a data structure for compactly representing a set of terms along with a declared equivalence relation over them. Each node in the graph (an “e-node”) is an operator applied to child equivalence classes. Each equivalence class (an “e-class”) groups all the e-nodes the system currently considers equivalent, using a union-find structure to track merges efficiently.

The appeal is in how rewrite rules interact with this structure. Given a rule like (mul a 2) => (shl a 1), you do not have to choose which form to keep. You insert both as e-nodes in the same e-class and let them coexist. After applying all rules to fixpoint (equality saturation), an “extraction” phase selects the cheapest e-node from each e-class, recursively.

This model was made broadly accessible by the egg library in Rust, and the accompanying POPL 2021 paper by Willsey, Nandi, Wang, Flatt, Tatlock, and Panchekha demonstrated equality saturation across a range of applications from superoptimization to tensor graph rewriting. The technique generated genuine interest in compiler research because it replaces a sequence of interdependent passes with a single coherent framework.

The Extraction Problem With Cycles

Standard e-graphs permit cycles. An e-node’s children are e-classes, and those e-classes can contain e-nodes that eventually reference back through the graph. This accommodates recursive terms and loop-carried values, which are real program structures.

But cycles create serious problems at extraction time. If e-class A’s cheapest node depends on e-class B, and B’s cheapest node depends on A, the extraction algorithm faces a system of mutually recursive cost equations. In the general case this becomes NP-hard. Practical implementations handle it through heuristics, iteration limits, and cost function constraints, but there is no clean solution.

For a JIT compiler like Cranelift, which runs inside Wasmtime and must compile WebAssembly modules before serving requests, compilation latency is a first-class concern. An optimizer that can exhibit superlinear behavior on pathological inputs is not a good foundation for production infrastructure.

The Acyclicity Constraint

Cranelift’s response is to prohibit cycles by construction. In an aegraph, every e-node’s children must reference e-classes that were created earlier in a strict topological ordering. Forward references only; the graph is always a DAG.

This sounds like a severe restriction, but it aligns naturally with SSA form. In SSA, each value is defined exactly once, and every use of a value must be dominated by its definition. Within a basic block or an acyclic region of code, values flow strictly forward. The aegraph’s constraint mirrors this: when you add a new e-node, all of its children are already in the graph and fully processed.

Loop-carried values and phi nodes sit outside the aegraph. The optimizer works within acyclic regions and handles control flow separately. This is not a fundamental limitation for the optimizations that matter most in a mid-end: constant folding, strength reduction, commutativity normalization, identity elimination, and common subexpression elimination all operate on local patterns within straight-line or acyclic code.

The payoff is extraction. With a DAG, extraction is a single bottom-up pass: process each e-class in topological order, select the cheapest e-node, recurse into its children. No cycles to resolve, no fixed-point equations. The complexity is linear in the number of e-nodes.

Eager Rewriting Instead of Global Saturation

Standard equality saturation builds the full e-graph first, then iteratively applies rewrite rules until reaching a fixpoint. This gives rules global visibility: a rewrite anywhere in the graph can unlock rewrites elsewhere, and multiple passes over the graph are expected.

The aegraph replaces this with eager rewriting. When a new e-node is inserted, the optimizer immediately checks applicable rules and inserts any rewrites into the same e-class. Because the graph is acyclic and children are already fully processed, there is no possibility of later discovering a new equivalence in a child that would retroactively trigger a rule on the current node.

This approach can in principle miss optimizations that depend on the interaction of rewrites across multiple nodes after global saturation. In practice, the simplification rules that matter for a production compiler are predominantly local: they inspect a node and its immediate children, not distant parts of the graph. The trade-off is explicit. You give up theoretical completeness in exchange for a guarantee that processing each node takes bounded time.

ISLE Integration

Cranelift uses ISLE (Instruction Selection Language Expressions) as a DSL for specifying pattern-matching rules. ISLE drives instruction selection in the back-end and optimization rules in the mid-end alike. The aegraph optimizer compiles ISLE rules at build time into Rust code that fires when matching e-nodes are added.

This integration has concrete ergonomic consequences. Adding a new algebraic simplification means writing a new ISLE rule. The pattern matching and case dispatch are generated automatically. A rule for eliminating addition of zero looks something like:

(rule (simplify (iadd ty x (iconst ty 0)))
      x)

When any iadd node is inserted into the aegraph, the generated code checks for this pattern. If it matches, the new e-node is immediately unioned into the e-class for x. The iadd with zero is never materialized in the output; it exists only transiently during graph construction.

Because ISLE rules compose and the matching infrastructure is shared between the mid-end optimizer and the instruction selector, there is a single language for expressing both algebraic identities and target-specific lowering rules. This uniformity reduces the surface area for inconsistencies between the two phases.

Comparison With Alternative Approaches

Global Value Numbering (GVN) is the classical SSA-based approach to finding equivalent values. It assigns a value number to each computed expression and identifies when two expressions compute the same thing. GVN is fast and produces good results, but each value gets one number: once assigned, discovering additional equivalences requires rerunning the pass.

E-graphs generalize GVN by allowing a single e-class to accumulate multiple equivalent forms, deferring the selection of which to use until extraction. The aegraph preserves this generalization over GVN while discarding the extraction complexity that comes with cycles.

RVSDG (Regionalized Value State Dependence Graph) is a structured IR with properties that overlap with the aegraph approach. RVSDG represents programs as hierarchically nested regions where data flow is acyclic within each region, which enables similar bottom-up processing. The practical obstacle to adopting RVSDG is that it requires a complete redesign of the IR. Aegraphs can be layered onto an existing SSA-based IR like Cranelift’s CLIF format, which matters for a compiler with real production users and a real codebase to evolve incrementally.

Program Expression Graphs (PEGs), explored in earlier compiler research, extend e-graph ideas to cover entire programs including loops by introducing explicit fixed-point operators. This requires significant IR changes, and the extraction problem in the presence of those operators remains hard. The aegraph avoids the problem by handling loops at a different level of the compiler.

The aegraph sits between GVN and full equality saturation: more expressive than GVN, more tractable than unrestricted e-graphs. That positioning is the design goal the constraints are built to serve.

What This Means in Practice

Cranelift backs Wasmtime and is used by production systems including Fastly’s compute platform and experimental JIT backends in several language runtimes. Compilation throughput is measured and matters. The aegraph mid-end consolidates what were previously separate passes (GVN, constant folding, algebraic simplification) into a single graph construction and extraction phase. Information discovered during one rewrite is immediately available to subsequent rules within the same pass, without a second sweep over the IR.

The deeper point in Fallin’s writeup is about the relationship between a formalism and the domain it is applied to. Standard e-graphs are designed for generality. Cranelift’s IR is structured: SSA, acyclic within regions, values defined before use. An optimizer that respects that structure gets linear extraction for free. A general-purpose optimizer that ignores it must build machinery to handle cases that never actually occur in the IR.

Restricting a powerful abstraction to the domain where its hard problems are absent is often preferable to solving those hard problems. The aegraph is a concrete application of that reasoning to production compiler infrastructure, and the result is an optimizer that is both more principled and faster than the sequence of ad hoc passes it replaces.

Was this interesting?