
Supercharge vector search with ColBERT rerank in PostgreSQL

simonw

> However, generating sentence embeddings through pooling token embeddings can potentially sacrifice fine-grained details present at the token level. ColBERT overcomes this by representing text as token-level multi-vectors rather than a single, aggregated vector. This approach, leveraging contextual late interaction at the token level, allows ColBERT to retain more nuanced information and improve search accuracy compared to methods relying solely on sentence embeddings.

I don't know what it is about ColBERT that affords such opaque descriptions, but this is sadly common. I find the above explanation incredibly difficult to parse.

I have my own explanation of ColBERT here, but I'm not particularly happy with it either: https://til.simonwillison.net/llms/colbert-ragatouille

If anyone wants to try explaining ColBERT without using jargon like "token-level multi-vectors" or "contextual late interaction" I'd love to see a clear description of it!
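One way to put the "late interaction" idea in plain code: instead of comparing one query vector to one document vector, you keep one embedding per token and, for each query token, take its best match among the document's tokens, then sum those best matches. A minimal numpy sketch, assuming a ColBERT-style model has already produced the per-token embeddings (all names here are illustrative):

    import numpy as np

    def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
        """Late-interaction (MaxSim) scoring.

        query_vecs: (num_query_tokens, dim) -- one embedding per query token
        doc_vecs:   (num_doc_tokens, dim)   -- one embedding per document token
        Assumes both are L2-normalised, so dot product == cosine similarity.
        """
        # Similarity of every query token against every document token.
        sim = query_vecs @ doc_vecs.T                # (num_query_tokens, num_doc_tokens)
        # For each query token, keep only its best-matching document token...
        best_per_query_token = sim.max(axis=1)
        # ...and sum those maxima to get the document's score for this query.
        return float(best_per_query_token.sum())

    def rerank(query_vecs, candidates):
        # candidates: list of (doc_id, doc_token_vecs) pairs, e.g. the top-k
        # results from an ordinary single-vector search.
        return sorted(candidates,
                      key=lambda c: maxsim_score(query_vecs, c[1]),
                      reverse=True)

The "late" in late interaction is just that the query tokens and document tokens only meet at this scoring step, rather than being pooled into single vectors up front.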

jszymborski

FYI, you have a broken hotlinked image in that post.

haki

See psycopg's sql.Identifier for safely binding table names:

https://www.psycopg.org/psycopg3/docs/api/sql.html#psycopg.s...
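A short sketch of what that looks like in psycopg3 (connection string, table name, and query here are placeholders): sql.Identifier quotes the table name safely, while literal values still go through normal %s parameter binding.

    import psycopg
    from psycopg import sql

    conn = psycopg.connect("dbname=mydb")   # illustrative connection string
    table = "documents"                      # illustrative table name

    query = sql.SQL("SELECT id, body FROM {} WHERE id = %s").format(
        sql.Identifier(table)
    )
    with conn.cursor() as cur:
        cur.execute(query, (42,))
        row = cur.fetchone()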