7. Related Work
The extraction problem in equality saturation has grown in importance as e-graphs have been applied to compilers (Joshi, Nelson & Randall, 2002; VanHattum, Nigam, Lee, Bornholt & Sampson, 2021), SMT solvers (Flatt, Coward, Willsey, Tatlock & Panchekha, 2022), hardware verification (Coward, Constantinides & Drane, 2023), and manufacturing (Nandi et al., 2020). The standard extraction algorithm implemented in the egg framework (Willsey, Nandi, Wang, Flatt, Tatlock & Panchekha, 2021) performs a single bottom-up traversal that greedily selects the candidate minimizing a decomposable, single-node cost function such as abstract syntax tree size. This approach is efficient and provably optimal when costs decompose additively over individual nodes, but it cannot express objectives that depend on multi-node structural context. Earlier, Joshi, Nelson and Randall (2002) used e-graphs in the Denali superoptimizer to generate optimal code via goal-directed search, demonstrating the potential of e-graphs for program optimization; however, Denali does not formalize an extraction-time objective. To handle more general, non-decomposable cost functions, Tate, Stepp, Tatlock and Lerner (2009) formulated extraction as an integer linear program (ILP), enabling global optimality at the expense of NP-hard worst-case complexity that limits scalability to large e-graphs. Our work is complementary: rather than targeting arbitrary cost functions with a general-purpose solver, we identify a structured class of objectives, bounded-depth weighted pattern coverage, and exploit this structure to obtain an exact, polynomial-time algorithm.
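To make the greedy baseline concrete, the following is a minimal sketch of bottom-up extraction under an additive per-node cost. It is not egg's implementation: the dictionary encoding of the e-graph, the names `extract_greedy` and `build_term`, and the example costs are all hypothetical, and the traversal is written as a fixed-point loop so that the result does not depend on the order in which e-classes are visited.

```python
import math

def extract_greedy(egraph, node_cost):
    """Pick the cheapest e-node per e-class under an additive cost.

    egraph: dict mapping an e-class id to a list of e-nodes, each e-node
            being (op, tuple_of_child_class_ids).  Hypothetical encoding.
    node_cost: function mapping an op to a positive number.
    """
    best = {c: (math.inf, None) for c in egraph}
    changed = True
    while changed:  # iterate until no class cost improves
        changed = False
        for c, nodes in egraph.items():
            for op, kids in nodes:
                cost = node_cost(op) + sum(best[k][0] for k in kids)
                if cost < best[c][0]:
                    best[c] = (cost, (op, kids))
                    changed = True
    return best

def build_term(best, c):
    """Read the selected term back out of the per-class choices."""
    op, kids = best[c][1]
    return (op, *[build_term(best, k) for k in kids]) if kids else op

# Example: the root class contains both a * 2 and a << 1.
egraph = {
    "root": [("mul", ("x", "two")), ("shl", ("x", "one"))],
    "x": [("a", ())],
    "two": [("2", ())],
    "one": [("1", ())],
}
cost = lambda op: 4 if op == "mul" else 1  # assumed costs, illustration only
best = extract_greedy(egraph, cost)
term = build_term(best, "root")
```

Each e-class receives exactly one choice, made from the costs of its children alone. This is why the scheme is exact for additive costs but cannot account for the parent context in which a class is used.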
Beyond computational cost, the ILP formulation exhibits a structural limitation for the weighted pattern cover objective. The ILP encoding of Tate, Stepp, Tatlock and Lerner (2009) assigns one binary variable per (e-class, e-node) pair, enforcing a global selection that cannot express position-dependent decisions when a shared e-class appears under parents with differing contextual requirements. The pattern cover objective is therefore structurally incompatible with the ILP formulation: the gap lies in the distinction between global selection and position-dependent selection, which DAG-to-tree unfolding (Section 4.1) resolves.
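For orientation, a generic ILP of this kind can be sketched as follows. This is a common textbook-style formulation under assumed notation, not necessarily the exact encoding of Tate et al.: $N$ denotes the set of e-nodes, $c_n$ the cost of e-node $n$, $C_{\mathrm{root}}$ the root e-class, and $\mathrm{children}(n)$ the child e-classes of $n$. Constraints that rule out cyclic selections are omitted.

```latex
\begin{align*}
\min_{x}\quad & \sum_{n \in N} c_n\, x_n \\
\text{s.t.}\quad & \sum_{n \in C_{\mathrm{root}}} x_n = 1, \\
& x_n \le \sum_{m \in C} x_m
  \quad \forall n \in N,\ \forall C \in \mathrm{children}(n), \\
& x_n \in \{0, 1\} \quad \forall n \in N.
\end{align*}
```

Because each $x_n$ is indexed by the e-node alone, a shared e-class carries the same selection at every position where it occurs. That is the global-selection property discussed above.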
A closely related line of work originates from compiler instruction selection, where the tree covering problem seeks to partition a fixed expression tree into non-overlapping tiles matching instruction patterns at minimum total cost. Aho, Ganapathi and Tjiang (1989) formalized code generation as tree matching with dynamic programming, and Pelegri-Llopart and Graham (1988) established the BURS (Bottom-Up Rewrite System) framework with offline automaton compilation (Proebsting, 1995); both achieve optimal tree covering via bottom-up dynamic programming when the input is a fixed tree. However, extending tree covering from trees to DAGs—as required when expression DAGs contain shared subexpressions—renders the problem NP-complete (Blindell, 2016). Practical compilers such as GCC and LLVM therefore resort to unfolding DAGs into trees before applying instruction selection (Koes & Goldstein, 2008). Our algorithm adopts the same unfolding principle but applies it to the AND-OR DAG structure of e-graphs, where each e-class (OR node) introduces candidate selection on top of the standard tile assignment problem. This additional degree of freedom necessitates a three-state DP—the minimal extension of classical BURS that explicitly tracks the tile-leaf role arising from depth-2 mutual exclusivity. When the e-graph degenerates to a single-candidate-per-class tree, the algorithm reduces to standard BURS. To our knowledge, no prior work has migrated tree covering techniques to the e-graph extraction setting or formalized the weighted pattern cover objective on AND-OR DAGs.
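The classical fixed-tree case referred to above can be illustrated with a short dynamic program. This is a sketch of plain minimum-cost tree tiling, not the three-state DP of this paper and not a BURS automaton. The tuple encoding of trees, the `HOLE` wildcard, the names `match` and `min_cover`, and the tile costs are hypothetical choices made for illustration.

```python
from functools import lru_cache

HOLE = "_"  # hypothetical wildcard marking a tile leaf

def match(pattern, tree, holes):
    """Match a tile pattern at the root of tree.

    Trees and patterns are strings (leaves) or tuples (op, child, ...).
    Subtrees bound to HOLE are appended to holes.
    """
    if pattern == HOLE:
        holes.append(tree)
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    return (len(pattern) == len(tree) and pattern[0] == tree[0]
            and all(match(p, t, holes) for p, t in zip(pattern[1:], tree[1:])))

def min_cover(tree, tiles):
    """Minimum total cost of covering tree with non-overlapping tiles.

    tiles: list of (pattern, cost).  A tile must not be a bare HOLE,
    otherwise the recursion below would not terminate.
    """
    @lru_cache(maxsize=None)
    def best(t):
        result = float("inf")
        for pattern, cost in tiles:
            holes = []
            if match(pattern, t, holes):
                # tile cost plus the optimal cover of each subtree under a hole
                result = min(result, cost + sum(best(h) for h in holes))
        return result
    return best(tree)

# Example: (a * b) + c with and without a fused multiply-add tile.
tree = ("add", ("mul", "a", "b"), "c")
base_tiles = [
    (("add", HOLE, HOLE), 1),
    (("mul", HOLE, HOLE), 1),
    ("a", 0), ("b", 0), ("c", 0),  # operands assumed free
]
fma_tiles = base_tiles + [(("add", ("mul", HOLE, HOLE), HOLE), 1)]
cost_without_fma = min_cover(tree, base_tiles)
cost_with_fma = min_cover(tree, fma_tiles)
```

Here the input tree is fixed, so the only decision at each node is which tile to root there. On an e-graph, every e-class additionally offers a choice among candidate e-nodes, which is the extra degree of freedom described above.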