Wavelet Trees: An Introduction (2011)
10 comments
·May 15, 2025jdonaldson
bawolff
What does this have to do with wavelet trees?
(Sorry for being a dick if im wrong) - was this an AI generated comment that got confused by the domain specific meaning of the word "rank" in this context?
jdonaldson
That is a pretty badly overloaded word for it, and I didn't even pay much attention to the notion of "rank" anyways. I'm mainly interested in how the text is represented with bit vectors. It's very reminiscent of how vector math plays out in other ML domains, but I would bet that many people working with text have never heard of it.
29ebJCyy
Can you explain how this is useful for those problems though? I'm struggling to come up with a way to use rank queries on embeddings in order to get back useful information.
bawolff
> It's very reminiscent of how vector math plays out in other ML domains
How so?
quantadev
You weren't wrong. Wavelet Trees have no relationship whatsoever to vector embeddings.
JohnKemeny
Discussed here 12 years ago. https://news.ycombinator.com/item?id=5526991 (7 comments)
quantadev
Interestingly, this algo has absolutely nothing whatsoever to do with "Wavelets" or even waves. The name "wavelet" stuck to it mostly only because it uses a recursive decomposition approach which happens to be something that Wavelet does in actual wave processing. It got collectively labeled "Wavelet" when what was really meant was just "Recursive".
jltsiren
"Wavelet tree" is not just a collective label but the name explicitly given by the authors of the paper where the data structure was first described in. At least Vitter had worked in image/video compression, where wavelet transforms and similar techniques are common. I believe the original idea was adapting those techniques for representing strings, and the wavelet tree data structure was the final outcome.
quantadev
You're seriously nit picking what "collective label" means? It means that name was accepted by the community.
I think what's changed since this was posted in 2011 is the emergence of embeddings and the need to take advantage of its higher dimensional space. While embeddings expose more underlying structure that can be used for tensor math, ranking systems often are still good ol' trees. This project to me points at a new major "hinge" of information architecture.