Specializing Python with E-Graphs

eigenspace

E-Graphs are a quite interesting tool, and I find them quite exciting, but whenever I hear about people excitedly advocating for them, I can't help but worry that they're not scalable. Asymptotically, these graphs must get enormous if you have a large expression and many rules.

Does anyone know of any research into their large-scale performance compared to other approaches to transforming expressions like SSA compilers or symbolic term rewriters?

rberg

I was minorly (i.e. testing and compilation infra) involved with eggcc, which I think qualifies:

https://github.com/egraphs-good/eggcc

There will be a paper coming out soon which benchmarks compilation time along with the compiled outputs against some popular compilers. The project lead is Oliver Flatt, I'm sure if you asked him he could let you know more.

There's also Chris Fallin's aegraphs (acyclic egraphs) which I believe is turned on by default for Cranelift. I'm unsure if everyone would agree that Cranelift is an industrial compiler but seeing as Fastly makes use of it I think it would qualify. aegraphs seem to solve a lot of performance issues while also getting a decent amount of benefit from equality saturation.

At the end of the day, and mentioned by the other commenter, cyclic egraphs really are still currently fun research tools. There's a lot of really smart people working on constraining the blow-up to make use of them in "real" compilers.

At a minimum, I expect they may become interesting offline or "super optimizers" (such as Souper) that run on performance critical vector/fp code. Because you can get rid of some of the phase ordering issues of traditional optimizers, egraphs can find some novel and really fast optimizations

almostgotcaught

> they're not scalable

they're not and "optimal extraction" (i.e., finding the rewrite you actually want) is still NP-hard:

https://github.com/egraphs-good/extraction-gym/blob/main/src...

no one uses them in real systems because of that. they're a research toy.

SkiFire13

> no one uses them in real systems because of that.

Cranelift is a compiler backend that uses egraphs for its optimization phase. It's already used in production in Web Assembly engines like Wasmtime.

chc4

You don't have to actually saturate the egraph, or compute a globally optimal extraction. There are schemes that drive the rewrite exploration by "expected value" of the rewrite, for example, to avoid bloating the egraph with identities that are probably useless - I'd be surprised if that is much heavier than normal graph rewriting optimizers.

HN

Specializing Python with E-Graphs

Specializing Python with E-Graphs