Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG

2 comments · October 19, 2025

Hey HN! I’ve recently open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which can lead to top-k results that look almost identical. Pyversity efficiently re-ranks results to balance relevance and diversity, surfacing items that are still relevant but less redundant. This helps improve retrieval, recommendation, and RAG pipelines with little added latency or complexity.

Main features:

- Unified API: one function (diversify) supporting several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)

- Lightweight: the only dependency is NumPy, keeping the package small and easy to install

- Fast: efficient implementations for all supported strategies; diversify results in milliseconds

Re-ranking with cross-encoders is very popular right now, but also very expensive. In my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. Diversification helps retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.
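
To give a rough idea of what "each new item adds new information" means in practice, here is a minimal sketch of greedy MMR (Maximal Marginal Relevance) in plain NumPy. It is only an illustration of the general technique, not Pyversity's actual implementation or API: the function name `mmr_select`, the `lambda_` parameter, and the toy data are all made up for this example.

```python
import numpy as np


def mmr_select(embeddings, scores, k, lambda_=0.5):
    """Greedy MMR sketch (hypothetical helper, not part of any library).

    embeddings: (n, d) array of item vectors
    scores:     (n,) array of relevance scores
    k:          number of items to pick
    lambda_:    1.0 = pure relevance, 0.0 = pure diversity
    Returns the indices of the picked items, in pick order.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    k = min(k, n)
    if k <= 0:
        return []

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # Start with the most relevant item.
    first = int(np.argmax(scores))
    selected = [first]
    available = np.ones(n, dtype=bool)
    available[first] = False
    # max_sim[i] = highest similarity between item i and anything picked so far.
    max_sim = sim[first].copy()

    while len(selected) < k:
        # Reward relevance, penalize similarity to already-picked items.
        mmr = lambda_ * scores - (1.0 - lambda_) * max_sim
        mmr[~available] = -np.inf
        best = int(np.argmax(mmr))
        selected.append(best)
        available[best] = False
        max_sim = np.maximum(max_sim, sim[best])
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 covers a different topic.
embeddings = np.array([
    [1.0, 0.0],
    [0.99, 0.1],
    [0.0, 1.0],
])
scores = np.array([0.9, 0.85, 0.6])

print(mmr_select(embeddings, scores, k=2, lambda_=1.0))  # [0, 1]
print(mmr_select(embeddings, scores, k=2, lambda_=0.5))  # [0, 2]
```

With `lambda_=1.0` the selection is just the top-k by relevance, so the near-duplicate gets in. With `lambda_=0.5` the second pick is the less relevant but different item.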

Code and docs: github.com/pringled/pyversity

Let me know if you have any feedback, or suggestions for other diversification strategies to support!

leobg

Might also be useful for dataset curation, or even just prompt engineering. For example, when training a classifier, you could use it to pick a diverse set of examples for training or evaluation.

Tananon

True, I think that's also a great use case! These algorithms likely won't scale to very large datasets (e.g. millions of samples), but for smaller datasets, like fine-tuning sets, I think this would work very well. I've worked on something similar in the past that works for larger datasets (semantic deduplication: https://github.com/MinishLab/semhash).