Show HN: Ranked Search for Semi-Structured Data

7 comments

February 27, 2025

We’ve been working on a search problem that requires querying both text and numbers simultaneously. For example, in a dataset of clothing items with descriptions and prices, a search for “slim pants for $20” should prioritize skinny jeans for $25 over slim pants for $50 because they are semantically similar and the price is closer. I’ve found that standard embedding models struggle with numerical ordering, while text-to-SQL methods rely on exact matches and often filter out too many results.

To solve this, we built a system designed specifically for structured datasets like CSVs or tables. Here’s a demo link where you can upload a small CSV to try it out (no login required): https://demo.tryvoker.com.

Unlike most RAG approaches, we process each column independently, handling text with embeddings and numbers with custom scoring. When a user submits a query, we parse it into relevant fields—for instance, extracting “slim pants” as the description and “20” as the price. We then compute cosine similarity between the description embeddings and “slim pants” while also calculating the percent error between the user’s price input and the numerical field. These individual similarity scores are then combined across all columns to generate a final ranking.
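
A minimal sketch of the per-column scoring described above, not the actual implementation. The post does not say how the scores are combined, so the equal weights, the `1 - percent error` conversion clamped to [0, 1], the two-dimensional toy embeddings, and all function and field names here are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def numeric_similarity(target, value):
    """Turn percent error relative to the queried number into a [0, 1] score.

    Assumption: score = 1 - percent_error, clamped at 0.
    """
    if target == 0:
        return 1.0 if value == 0 else 0.0
    percent_error = abs(value - target) / abs(target)
    return max(0.0, 1.0 - percent_error)

def rank(items, query_vec, query_price, w_text=0.5, w_price=0.5):
    """Score each row per column, then combine with a weighted sum (assumed)."""
    scored = []
    for item in items:
        text_score = cosine_similarity(item["embedding"], query_vec)
        price_score = numeric_similarity(query_price, item["price"])
        scored.append((w_text * text_score + w_price * price_score, item["name"]))
    return sorted(scored, reverse=True)

# Hypothetical toy data: real embeddings would come from an embedding model.
ITEMS = [
    {"name": "slim pants", "price": 50, "embedding": [1.0, 0.0]},
    {"name": "skinny jeans", "price": 25, "embedding": [0.9, 0.1]},
    {"name": "wool sweater", "price": 60, "embedding": [0.1, 0.9]},
]

# Query "slim pants for $20", already parsed into a text vector and a price.
RESULTS = rank(ITEMS, query_vec=[1.0, 0.0], query_price=20)
```

With this toy data the $25 skinny jeans rank above the $50 slim pants: the jeans give up a little text similarity but sit much closer to the requested price, while the $50 item's price score clamps to zero.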

Right now, our system works best with well-structured data, so some preprocessing is often needed. We’re working on improving this by detecting and restructuring messy data automatically, such as pivoting columns or extracting attributes from large text fields. We’re also adding feedback mechanisms, like a thumbs up/down system, to refine future search results based on user input. I’d love to hear about your experiences with similar search challenges and would appreciate any feedback!

sumoboy

Not the results I expected with the query "80+"; "80+ Gold" worked better.

alrudolph

Thanks for sharing. What would you have expected "80+" to return?

tomnipotent

Not OP, but at a minimum I'd expect range searches to work.

This particular search pattern could either be an open-ended range search ("all products 80-infinity") or a keyword search ("80+" as in "PSU 80+ Gold").

I'm also noticing that many results have mixed-bag prices, while I was expecting something closer to a ranking by price delta. For example, "psu" returns 5 different $155 products, but "psu 300" returns a $2000 RTX 4090 along with four $155 PSUs. I don't know if it's a coincidence that the RTX has a stock of "304" or if you're applying the ranking method to fields outside of price.

alrudolph

Yeah, right now I'm not supporting ranges, but I can see the value and will work on adding them.

The demo is set up to match on the description, price, and stock columns. When you send a message, I try to parse it into searches for each column. I'm realizing with these kinds of searches that I probably have to do more work on that parsing, because it's not super clear whether that number should be part of the description search or matched against price, etc.

catlover76

Aren't these the guys with the 6-day work week? If so, I think I'll pass on the product lol

alwa

This apparently has been a point of commentary, and the (YC-backed) firm is apparently open about it in their job listings (to their credit):

https://news.ycombinator.com/item?id=43181601

I feel like we could have done without the snark, but that the question of our collective norms with respect to labor conditions is legitimate to discuss…

For my part, I’ve willingly worked 7-day weeks for significant periods of time when it felt important to me to do so. I was fairly compensated (albeit way less than the $120K Voker is offering for the full-stack engineer role in question), the work felt meaningful to me, the conditions of the job were clear upfront, I didn’t have competing demands on my time, and it couldn’t have gotten done any other way.

At the same time I know salaried colleagues who get dragged in 5 days a week for no reason, even when there’s nothing to do. I imagine there’s a balance to be struck in here somewhere…