
The Theoretical Limitations of Embedding-Based Retrieval

gmueckl

Why not link the paper on arxiv? https://arxiv.org/abs/2508.21038

forks

So, back to BM25?

dmezzetti

This paper has been misrepresented many times. At the end it says:

Multi-vector models

Multi-vector models are more expressive through the use of multiple vectors per sequence combined with the MaxSim operator [Khattab and Zaharia, 2020]. These models show promise on the LIMIT dataset, with scores greatly above the single-vector models despite using a smaller backbone (ModernBERT, Warner et al. [2024]). However, these models are not generally used for instruction-following or reasoning-based tasks, leaving it an open question as to how well multi-vector techniques will transfer to these more advanced tasks.
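For readers unfamiliar with late interaction, here is a minimal sketch of the MaxSim scoring the quote refers to: each query token embedding takes the maximum similarity over all document token embeddings, and those maxima are summed. The array shapes and the toy data are illustrative assumptions, not the paper's setup.

    import numpy as np

    def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
        """ColBERT-style late interaction score.

        query_vecs: (num_query_tokens, dim), doc_vecs: (num_doc_tokens, dim),
        both assumed L2-normalized so dot products are cosine similarities.
        """
        sim = query_vecs @ doc_vecs.T          # pairwise token similarities
        return float(sim.max(axis=1).sum())    # MaxSim per query token, summed

    # Toy example with random unit-normalized token embeddings
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8));  q /= np.linalg.norm(q, axis=1, keepdims=True)
    d = rng.normal(size=(12, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
    print(maxsim_score(q, d))

Because each document keeps one vector per token rather than a single pooled vector, the model can match different query tokens against different parts of the document, which is where the extra expressiveness comes from.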

Sparse models

Sparse models (both lexical and neural versions) can be thought of as single vector models but with very high dimensionality. This dimensionality helps BM25 avoid the problems of the neural embedding models as seen in Figure 3. Since the dimensionality of their vectors is high, they can scale to many more combinations than their dense vector counterparts. However, it is less clear how to apply sparse models to instruction-following and reasoning-based tasks where there is no lexical or even paraphrase-like overlap. We leave this direction to future work.
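To make the "single vector with very high dimensionality" framing concrete, here is a rough sketch of BM25 viewed as a sparse vector whose dimensions are vocabulary terms; scoring a query is then just a dot product over shared terms. The IDF values and documents are made up for illustration, and this omits details a real BM25 implementation would handle.

    import math
    from collections import Counter

    def bm25_sparse_vector(terms, idf, avgdl, k1=1.5, b=0.75):
        """Represent a document as a sparse {term: weight} dict using BM25 term weights."""
        tf = Counter(terms)
        dl = len(terms)
        return {t: idf.get(t, 0.0) * (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))
                for t, f in tf.items()}

    def score(query_terms, doc_vec):
        # Dot product between the document's sparse vector and a bag-of-words query.
        return sum(doc_vec.get(t, 0.0) for t in query_terms)

    # Hypothetical IDF values; in practice these come from collection statistics.
    idf = {"embedding": 1.2, "retrieval": 0.8, "bm25": 2.0}
    doc = bm25_sparse_vector(["bm25", "retrieval", "retrieval"], idf, avgdl=3.0)
    print(score(["bm25", "embedding"], doc))

The effective dimensionality here is the vocabulary size, which is why, as the quoted passage notes, sparse models can represent many more top-k combinations than a low-dimensional dense embedding.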

In other words, it says that both multi-vector (i.e. late interaction) and sparse models hold promise.