Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries

kwillets

This is a fairly simple idea of indexing characters for each column/offset and compressing the bitmaps. Simple is good, as the overhead of more sophisticated ideas (eg suffix sorting) is often prohibitive.

One suggestion is to index the end-of-string as a character as well; then you don't need negative offsets. But that turns the suffix search into a wildcard type of thing where you have to try all offsets, which is what the '%pat%' searches do already, so maybe it's OK.

fabian2k

Looks very interesting. I really like trigram indexes for certain use cases, but those are essentially running an ILIKE %something% on various text content in the DB. So that would fit the described limitations of this index type very well.

Usually you're quickly steered towards fulltext search (tsvector) in Postgres if you want to do something like that. But depending on what kind of search you actually need, trigram indexes can be a better option. If you don't search so much for natural language, but more for specific keywords the stemming in fulltext search can get in the way.

One information that would be nice here is a comparison of the index size on disk for both index types.

out_of_protocol

Any data on index size for big tables? Comparison (with ms/megabytes) vs trigram regarding size/speed?

UPD

> Biscuit is 15.0× faster than B-tree (median) and 5.6× faster than Trigram (median)

> Trade-off: 3.2× larger index than Trigram, but 5.6× faster queries (median)

eatonphil

Noticed Daniel Lemire talking about it and how they use Roaring Bitmaps.

https://x.com/lemire/status/2000944944832504025

HN

Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries

Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries