Flattening ASTs (and Other Compiler Data Structures)
8 comments
·January 10, 2025kazinator
> Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array.
This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.
Don't make me bring out the L word for the billionth time.
> A flat array of Exprs can make it fun and easy to implement hash consing
OK, it's not a case of L-ignorance, just willful neglect.
samps
FWIW I did acknowledge this in the article:
> A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it
> Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.
emptysea
Rust-analyzer uses a similar technique for parsing https://github.com/rust-lang/rust-analyzer/blob/master/crate... which then gets fed into https://github.com/rust-analyzer/rowan (lossless syntax tree)
ndesaulniers
Cool! Carbon is doing exactly this. I had asked leads if there was a paper on this approach, but they didn't have anything for me. I'll send them this post!
benatkin
Zig uses a MultiArrayList which sounds similar https://mitchellh.com/zig/parser
cardanome
Amazing article, very good advice to keep your data structures flat.
Adding to that, it also makes editing the AST vastly more efficient.
I have discovered that principle on my own when I worked on an editor that directly operated on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell lot easier. You can just directly index any part of AST in linear time.
ww520
This is a fantastic idea. AST works well in an array based allocation block since it has no need for freeing individual nodes. It’s an add-only allocation.
"Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array." Zig compiler pipeline (AST, Zir, Air, Sema) does exactly this on all layers. Not only contiguous, but instead of array-of-structs it is struct-of-arrays, so walking the tree is even more cache friendly. For AST see: https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...