An Experimental Study of Bitmap Compression vs. Inverted List Compression
5 comments
·February 28, 2025sitkack
> We observe that they essentially solve the same problem, i.e., how to store a collection of sorted integers with as few as possible bits and support query processing as fast as possible. Due to historical reasons, bitmap compression and inverted list compression were developed as two separated lines of research in the database area and information retrieval area.
https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD17-Bitma...
Presentation slides https://pdfs.semanticscholar.org/32bc/322fce0cf6c99b0dd31f1a...
Mentioned in https://github.com/junchangwang/reading-list/blob/main/Datab...
see also
https://roaringbitmap.org/publications/
If you like this, you might like the work of https://en.wikipedia.org/wiki/Gonzalo_Navarro
The lead author has a great body of research https://www.cs.purdue.edu/homes/csjgwang/pubs/
westurner
ScholarlyArticle: "An Experimental Study of Bitmap Compression vs. Inverted List Compression" (2017) https://dl.acm.org/doi/10.1145/3035918.3064007
Inverted index > Compression: https://en.wikipedia.org/wiki/Inverted_index#Compression :
> For historical reasons, inverted list compression and bitmap compression were developed as separate lines of research, and only later were recognized as solving essentially the same problem. [7]
westurner
> and only later were recognized as solving essentially the same problem. [7]
"Hard problems that reduce to document ranking" https://news.ycombinator.com/item?id=43174910#43175540
Ctrl-F "zoo" https://westurner.github.io/hnlog/#comment-36839925 #:~:text=zoo :
> Complexity Zoo, Quantum Algorithm Zoo, Neural Network Zoo
sitkack
Programming Language Zoo https://plzoo.andrej.com/
> Thus, a natural question is: Which one is better between bitmap compression and inverted list compression?
Definitely useful to find out if these two areas have any developments the other is lacking. These days I would want to know which algorithms will run well in parallel and scale with more threads on the CPU or GPU. Looks like there are ~4 algorithms here using 4-wide and 8-wide CPU SIMD, but that doesn’t scale very far and doesn’t tell us which algorithms are amenable to parallel compression scaling. Some of the best compression algorithms out there are between difficult and impossible to parallelize, and so while the compression rates might be good, they probably won’t be the most used or most popular in practice.