You Don't Need Re-Ranking: Understanding the Superlinked Vector Layer

20 comments

·May 23, 2025

janalsncm

I don’t think the author understands the purpose of reranking.

During vector retrieval, we retrieve documents in sublinear time from a vector index. This allows us to reduce the number of documents from potentially billions to a much smaller number. The purpose of re-ranking is to allow high powered models to evaluate docs much more closely.

It is true that we can attempt to distill that reranking signal into a vector index. Most search engines already do this. But there is no replacement for using the high powered behavior based models in reranking.
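For readers who want the two stages spelled out, here is a minimal sketch of retrieve-then-rerank. Everything in it is an assumption made for illustration: the toy 2-d vectors, the `heavy` score table standing in for an expensive model, and the brute-force scan standing in for a real ANN index (which is what would actually make the first stage sublinear).

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    # Stage 1: cheap vector scoring. This loop is O(N) and only a stand-in;
    # a production system would query an ANN index here instead.
    ranked = sorted(index, key=lambda doc_id: dot(query, index[doc_id]), reverse=True)
    return ranked[:k]


def rerank(candidates: List[str], heavy_score: Callable[[str], float], n: int) -> List[str]:
    # Stage 2: an expensive scorer looks only at the k surviving candidates.
    return sorted(candidates, key=heavy_score, reverse=True)[:n]


# Hypothetical toy data: document id -> embedding.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
# Hypothetical scores from a "high powered" model, one per document.
heavy = {"a": 0.2, "b": 0.9, "c": 0.99, "d": 0.5}

candidates = retrieve([1.0, 0.0], index, k=3)
top = rerank(candidates, heavy.__getitem__, n=2)
print(candidates, top)
```

Note that document "c" has the best heavy score but never reaches the reranker, because the first stage did not surface it. The reranker can only reorder what retrieval hands over.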

_QrE

I agree.

> "The real challenge in traditional vector search isn't just poor re-ranking; it's weak initial retrieval. If the first layer of results misses the right signals, no amount of re-sorting will fix it. That's where Superlinked changes the game."

Currently a lot of RAG pipelines use the BM25 algorithm for retrieval, which is very good. You then use an agent to rerank stuff only after you've got your top 5-25 results, which is not that slow or expensive, if you've done a good job with your chunking. Using metadata is also not really a 'new' approach (well, in LLM time at least) - it's more about what metadata you use and how you use them.

nostrebored

If this were true, and initial candidate retrieval were a solved problem, companies where search is revenue-aligned wouldn't have teams of very well paid people looking for marginal improvements here.

Treating BM25 as a silver bullet is just as strange as treating vector search as the "true way" to solve retrieval.

_QrE

I don't mean to imply that it's a solved problem; all I'm saying is that in a lot of cases, the "weak initial retrieval" assertion stated by the article is not true. And if you can get a long way using what has now become the industry standard, there's not really a case to be made that BM25 is bad/unsuited, unless the improvement you gain from something more complex is more than just marginal.

laszlo_cravens

I agree as well. Especially in the context of recommendation systems, the decoupling of retrieval from a heavy ranker has a lot of benefits. It allows for 1) faster experimentation, and 2) the use of different retrieval sources. In reality, the retrieval might consist of a healthy mix of different algorithms (collaborative filtering, personalized PageRank, word2vec/two-tower embeddings, popular items near the user, etc.) and fallback heuristics.
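As a rough illustration of mixing retrieval sources ahead of a ranker, the sketch below interleaves candidate lists round-robin, removes duplicates, and pads with a fallback list. The source names, item ids, and the round-robin policy are all assumptions chosen for the example; real systems often use per-source quotas or learned blending instead.

```python
from itertools import zip_longest
from typing import Dict, Iterable, List


def merge_candidates(
    sources: Dict[str, List[str]],
    limit: int,
    fallback: Iterable[str] = (),
) -> List[str]:
    seen = set()
    merged: List[str] = []
    # Take the 1st item of every source, then the 2nd of every source, etc.
    for tier in zip_longest(*sources.values()):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    # Fallback heuristic: only contributes items nothing else retrieved.
    for item in fallback:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:limit]


# Hypothetical retrieval sources, each returning its own ranked list.
sources = {
    "collaborative_filtering": ["i1", "i2", "i3"],
    "two_tower": ["i2", "i4"],
    "popular_nearby": ["i5"],
}
candidates = merge_candidates(sources, limit=6, fallback=["i9", "i1"])
print(candidates)
```

The heavy ranker then scores this merged pool, which is why a new source can be added or swapped without retraining the ranker first.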

supo

It allows faster experimentation because you can't do things like partial embedding updates and reasonable schema migrations on your vector search index - if you could, you'd experiment in retrieval... and with better retrieval you don't have to move 100s or 1000s of candidates from a database and pay a ton for a ranker inference on every query (not even mentioning the latency impact of that).

nostrebored

That "much smaller number" is the tricky part. Most rerankers degrade substantially in quality over a few hundred candidates. No amount of powerful rerankers will make "high powered behavior based models" more effective. Those behavioral signals and intents have to be encoded in the query and the latent space.

janalsncm

> Most rerankers degrade substantially in quality over a few hundred candidates.

The reason we don’t use the most powerful models on thousands/millions of candidates is because of latency, not quality. It’s the same reason we use ANN search rather than cosine sim for every doc in the index.

nostrebored

This isn’t true. You can look at basically every cross-encoder used today and observe precision degrading as k increases.

Of course latency matters for retrieval pipelines, and this is another reason to care. But first-pass retrieval has to surface the right candidates for it to matter at all, and it has to do so within the constraints of the first-pass reranker's precision degradation with respect to k.

supo

By that same logic, why would you not strive to push all the signals you have available into the ANN search? Sure, some will have reduced resolution vs. using a heavy reranker, but surely the optimal solution is to use the same signals in both stages and just add resolution in the second stage? The more they are aligned, the fewer candidates you need -> better latency & lower cost.

supo

If you could wave a magic wand and push all the ranking signals into retrieval and that index would be fast to update and not that expensive to operate - you would do that and you would delete all your reranking systems, wouldn't you?

rooftopzen

>"When it comes to vector search, it's not just about matching words. Understanding the meaning behind them is equally important."

This statement ^ is clearly incorrect in its premise: semantic meaning is already vectorized, and the problems with that are old news and have little to do with indexing.

I went through the article though, and realized the company is probably on its last legs - an effort that was interesting 2 years ago for about a week, but funded by non-developers without any gauge of reality.

supo

If this was a solved problem, every e-com website would already run some variant of https://arxiv.org/pdf/2209.07663 for all their shopping surfaces and some version of "deep research" on every search query. They don't, so the problem isn't solved. And a big part of the problem is still the integration of behavioral, structured and unstructured signals into the underlying retrieval of products and content - which is what the article is about.

Also, the company is doing fine, not sure where that last sentence came from ^_^

AmazingTurtle

At everfind.ai, we've found a middle ground that leverages both structured and unstructured data effectively in retrieval systems. We utilize a linear OpenSearch index for chunked information but complement this by capturing structured metadata during ingestion—either via integrations or through schema extraction using LLMs. This structured metadata allows us to take full advantage of OpenSearch's field-type capabilities.

At retrieval time, our approach involves a broad "prefetching" step: we quickly identify the most relevant schemas, perform targeted vector searches within these schemas, and then rerank the top results using the LLM before agentic reasoning and execution. The LLM is provided with carefully pre-selected tools and fields, empowering it to dive deeper into prefetched results or explore alternate queries dynamically. This method significantly boosts RAG pipeline performance, ensuring both speed and relevance.

Additionally, by limiting visibility of the "agentic execution context" to just the current operation span and collapsing it in subsequent interactions, we keep context sizes manageable, further enhancing responsiveness and scalability.

supo

This article focuses on ways to make "pre-fetching" more accurate, reducing or eliminating the need for reranking to improve latency/cost but also sometimes quality - for example if you use a text cross-encoder to rerank your structured objects, you'll find that those rerankers don't actually understand much of the numbers, locations and other data like that.

petesergeant

> The key idea here is that with Superlinked, your search system can understand what you want and adjust accordingly.

I read as much of this article as I could be bothered to and still didn’t really understand how it removes the need for reranking. It starts talking about mixing vector and non-vector search, so ok fine. Is there any signal here or is it all marketing fluff?

dev_l1x_be

I might not know enough about this subject, but I think the main idea is to make the initial search retrieval much smarter and more comprehensive, so the results are already good enough, lessening or removing the need for a second, often costly, re-ranking step.

They achieve this in a few different ways:

- Unified Multimodal Vectors (Mixing Data Types from the Start)

Instead of just creating a vector from the text description, Superlinked creates a single, richer vector for each item (e.g., a pair of headphones) right when it's indexed. This "multimodal vector" already encodes not just the text's meaning, but also its numerical attributes (like price, rating, battery life) and categorical attributes (like "electronics," "on-ear").

- Dynamic Query-Time Weighting (Telling the Search What Matters Now)

When you make a query, you can tell Superlinked how important each of those "baked-in" aspects of the multimodal vector is for that specific search. For example: "Find affordable wireless headphones under $200 with high ratings" – you can weight the "price" aspect heavily (to favor lower prices), the "rating" aspect heavily, and the "text similarity" to "wireless headphones" also significantly, all within the initial query to the unified vector.

- Hard Filtering Before Vector Search (Cutting Out Irrelevant Items Early)

You can apply hard filters (like price <= 200 or category == "electronics") before the vector similarity search even happens, so the search only runs on the remaining items.

If these are implemented well, Superlinked could improve the quality of initial retrieval to a point where a separate re-ranking stage becomes less necessary.
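To make the three ideas above concrete, here is a toy sketch in plain Python. It is not Superlinked's API or its actual encoders: the quarter-circle number encoding, the value ranges, the 2-d "text embeddings", and every function name are assumptions made up for illustration. It only shows the general shape: one concatenated vector per item, per-aspect weights applied on the query side, and a hard filter before scoring.

```python
import math
from typing import Dict, List

PRICE_RANGE = (0.0, 500.0)   # assumed bounds for the toy catalog
RATING_RANGE = (1.0, 5.0)


def unit(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def encode_number(value: float, lo: float, hi: float) -> List[float]:
    # Map [lo, hi] onto a quarter circle: closer values -> larger dot product.
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    angle = t * math.pi / 2
    return [math.cos(angle), math.sin(angle)]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def item_vector(item: dict) -> List[float]:
    # One vector per item: text part + price part + rating part.
    return (unit(item["text_vec"])
            + encode_number(item["price"], *PRICE_RANGE)
            + encode_number(item["rating"], *RATING_RANGE))


def query_vector(text_vec, target_price, target_rating, weights: Dict[str, float]):
    parts = [
        (weights["text"], unit(text_vec)),
        (weights["price"], encode_number(target_price, *PRICE_RANGE)),
        (weights["rating"], encode_number(target_rating, *RATING_RANGE)),
    ]
    # Scaling each query segment weights that aspect in a single dot product.
    return [w * x for w, part in parts for x in part]


def search(items: List[dict], qvec: List[float], max_price: float, category: str) -> List[str]:
    # Hard filter first, then similarity only over what is left.
    allowed = [it for it in items if it["price"] <= max_price and it["category"] == category]
    ranked = sorted(allowed, key=lambda it: dot(qvec, item_vector(it)), reverse=True)
    return [it["id"] for it in ranked]


# Hypothetical catalog; text_vec stands in for a real text embedding.
items = [
    {"id": "budget", "text_vec": [1.0, 0.0], "price": 50, "rating": 4.5, "category": "electronics"},
    {"id": "premium", "text_vec": [1.0, 0.0], "price": 450, "rating": 4.8, "category": "electronics"},
    {"id": "midrange", "text_vec": [0.9, 0.1], "price": 180, "rating": 3.0, "category": "electronics"},
    {"id": "cheap_bad", "text_vec": [1.0, 0.0], "price": 20, "rating": 2.0, "category": "electronics"},
    {"id": "sofa", "text_vec": [0.0, 1.0], "price": 150, "rating": 4.9, "category": "furniture"},
]

balanced = search(items, query_vector([1.0, 0.0], 0.0, 5.0,
                  {"text": 1.0, "price": 1.0, "rating": 1.0}), 200, "electronics")
price_heavy = search(items, query_vector([1.0, 0.0], 0.0, 5.0,
                     {"text": 1.0, "price": 5.0, "rating": 0.0}), 200, "electronics")
print(balanced, price_heavy)
```

With equal weights the well-rated budget item comes first; with the weight shifted to price, the cheapest item wins. The index itself did not change between the two queries, only the query vector did.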

Does this answer your question?

ccleve

Is there a paper or some other explanation of what they're doing under the hood?

supo

Here is a quick overview, doesn't really explain the deep details though: https://www.youtube.com/watch?v=ikYsr6nvbdE

Basically think mixture of experts, but each expert is an encoder with its own input tokenization/feature engineering part, and then there is some machinery to parameterize these encoders based on natural language queries and stitch their outputs together to form unified vectors for objects in the index and the queries coming in.

Given that the whole framework is Apache 2.0, you can also check https://github.com/superlinked/superlinked and the docs: https://docs.superlinked.com/ https://docs.superlinked.com/concepts/overview

We have some proprietary tech besides that - mainly embedding models for things like location and then executors for the Superlinked DAG that help you run it on GPU-accelerated Spark and something analogous to that for streaming workloads - those proprietary things are how we make money, we call it Superlinked Cloud.
