Launch HN: Issen (YC F24) – Personal AI language tutor
Google DeepMind Releases AlphaGenome
deepmind.google
A Review of Aerospike Nozzles: Current Trends in Aerospace Applications
mdpi.com
Show HN: I built an AI dataset generator
github.com
Introducing Gemma 3n
developers.googleblog.com
I built an ADHD app with interactive coping tools, noise mixer and self-test
adhdhelp.app
FLUX.1 Kontext [Dev] – Open Weights for Image Editing
bfl.ai
A new pyramid-like shape always lands the same side up
quantamagazine.org
Access BMC UART on Supermicro X11SSH
github.com
Puerto Rico's Solar Microgrids Beat Blackout
spectrum.ieee.org
Muvera: Making multi-vector retrieval as fast as single-vector search
research.google
Some bits on malloc(0) in C being allowed to return NULL
utcc.utoronto.ca
Learnings from building AI agents
cubic.dev
The Business of Betting on Catastrophe
thereader.mitpress.mit.edu
Snow - Classic Macintosh emulator
snowemu.com
OpenAI charges by the minute, so speed up your audio
george.mand.is
What makes comprehensible input comprehensible?
cij-analysis.streamlit.app
Modeling the World in 280 Characters
tympanus.net
Writing a basic Linux device driver when you know nothing about Linux drivers
crescentro.se
Better Auth, by a self-taught Ethiopian dev, raises $5M from Peak XV, YC
techcrunch.com
Structured Output with LangChain and Llamafile
blog.brakmic.com
Real-world performance comparison of ebtree/cebtree/rbtree
wtarreau.blogspot.com
We recently added Muvera to Weaviate (https://weaviate.io/blog/muvera) and also have a podcast episode on it: https://www.youtube.com/watch?v=nSW5g1H4zoU.
With multi-vector / ColBERT-style approaches, storing an embedding per token can massively increase costs: you might go from a single 768-dimensional vector per document to something like 130 token vectors of 128 dimensions each, i.e. 128 x 130 = 16,640 dimensions. Even when a multi-vector model gives better results, this can make it infeasible for many use cases.
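For a rough sense of scale, assuming float32 (4 bytes per dimension) and taking the figures above as 130 tokens x 128 dims per document:

    # Back-of-the-envelope embedding storage per document at float32.
    single_vector_bytes = 768 * 4          # 3,072 bytes (~3 KB)
    multi_vector_bytes  = 130 * 128 * 4    # 66,560 bytes (~65 KB), ~22x more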
Muvera converts the multiple vectors into a single fixed-dimension (usually net smaller) vector that can be used by any ANN index. Because you now have a single vector, you can reuse all your existing ANN algorithms and stack other quantization techniques on top for memory savings. In my opinion it is a much better approach than PLAID because it doesn't require specific index structures or clustering assumptions, and it can achieve lower latency.
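To make the single-vector idea concrete, here is a minimal, simplified sketch of a fixed-dimensional encoding (SimHash-style bucketing of token vectors plus per-bucket aggregation). This is not Weaviate's or Google's implementation; the function and parameter names are made up for illustration.

    import numpy as np

    def fde(token_vecs, planes, agg="sum"):
        """Collapse a (num_tokens, d) multi-vector into one fixed-size vector.

        planes: (k, d) random hyperplanes; each token is hashed to one of
        2^k buckets by the sign pattern of its projections (SimHash).
        Token vectors in the same bucket are aggregated, and the per-bucket
        vectors are concatenated into a single (2^k * d) vector.
        """
        k, d = planes.shape
        buckets = np.zeros((2 ** k, d))
        counts = np.zeros(2 ** k)
        # SimHash: bucket id = bit pattern of sign(token . plane)
        bits = (token_vecs @ planes.T > 0).astype(int)   # (num_tokens, k)
        ids = bits @ (1 << np.arange(k))                 # (num_tokens,)
        for vec, b in zip(token_vecs, ids):
            buckets[b] += vec
            counts[b] += 1
        if agg == "mean":  # e.g. average on the document side
            nonzero = counts > 0
            buckets[nonzero] /= counts[nonzero, None]
        return buckets.ravel()                           # (2^k * d,)

    rng = np.random.default_rng(0)
    d, k = 128, 3                           # 128-dim tokens, 2^3 = 8 buckets
    planes = rng.standard_normal((k, d))

    doc   = rng.standard_normal((130, d))   # ~130 token vectors per doc
    query = rng.standard_normal((32, d))

    doc_fde   = fde(doc, planes, agg="mean")
    query_fde = fde(query, planes, agg="sum")

    # One dot product now approximates the multi-vector similarity, so any
    # off-the-shelf ANN index over doc_fde vectors can be used.
    score = query_fde @ doc_fde

The point is that the final comparison is a single inner product over fixed-size vectors, which is why existing ANN indexes and quantization schemes apply unchanged.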