Show HN: A GPU-accelerated binary vector index
8 comments
February 17, 2025

martinloretz
Great work. Can you elaborate on how the radix selection works and how to get that working with floats and inner product distance? I just quickly checked the code; I'm not familiar with radix selection, but I'm really interested in making extremely fast GPU indices.
andes314
The radix selection algorithm is a modified radix sort: since we don't care about sorting the whole list, only finding the top K results, it can skip a good chunk of the work a full sort would do. It was an awesome way to get experience in CUDA programming.
Radix sort works with any ordered set--it just lends itself well to GPUs since it is designed to run in parallel. I used the modified version to get the best Hamming distance results quickly, and implemented a few other distance measures as well (e.g., cosine distance).
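A rough sketch of both ideas (not the repository's actual kernels), assuming each binary vector is packed into 64-bit words: a CUDA kernel that computes Hamming distances with XOR + popcount, and a host-side version of radix selection that finds the k-th smallest distance by walking its bits from most significant to least significant. A GPU implementation would do the same counting and partitioning in parallel.

    // Hedged sketch, not code from the repo. Vectors are packed into
    // `words` 64-bit integers each.
    #include <cstdint>
    #include <vector>
    #include <cuda_runtime.h>

    // One thread per database vector: Hamming distance = popcount(XOR).
    __global__ void hammingKernel(const uint64_t* db, const uint64_t* query,
                                  uint32_t* dist, int n, int words) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        uint32_t d = 0;
        for (int w = 0; w < words; ++w)
            d += __popcll(db[i * words + w] ^ query[w]);
        dist[i] = d;
    }

    // Radix selection shown on the host for clarity: walk distance bits from
    // MSB to LSB, keeping the zero-bit bucket whenever it still holds at least
    // k candidates; otherwise the k-th smallest lies in the one-bit bucket.
    uint32_t radixSelectKth(std::vector<uint32_t> cand, int k) {
        uint32_t kthValue = 0;
        for (int bit = 31; bit >= 0; --bit) {
            std::vector<uint32_t> zeros, ones;
            for (uint32_t d : cand)
                (((d >> bit) & 1) ? ones : zeros).push_back(d);
            if ((int)zeros.size() >= k) {
                cand.swap(zeros);              // k-th smallest has a 0 in this bit
            } else {
                k -= (int)zeros.size();        // it has a 1 in this bit
                kthValue |= (1u << bit);
                cand.swap(ones);
            }
        }
        return kthValue;  // distance of the k-th nearest neighbor
    }

Once the k-th smallest distance is known, a single pass over the distance array gathers every vector at or below that threshold, so the full list never has to be sorted.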
rytill
When and how would one use binary vectors for encoding in ML? Do you have to make your model work natively with binary vectors or is there a translation step between float and binary vectors to make it compatible?
PhilippGille
You can quantize an embedding: https://zilliz.com/learn/what-are-binary-vector-embedding#Ho...
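As a concrete (hypothetical, not taken from the linked article) version of that translation step: the model keeps producing float embeddings, and a post-processing pass thresholds each dimension at zero and packs 64 dimensions into each 64-bit word.

    // Hedged sketch: sign-quantize float embeddings into packed binary vectors.
    #include <cstdint>
    #include <cuda_runtime.h>

    // One thread per 64-bit output word; `words` = ceil(dim / 64).
    __global__ void binarizeKernel(const float* emb, uint64_t* out,
                                   int nVectors, int dim, int words) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nVectors * words) return;
        int vec  = i / words;
        int word = i % words;
        uint64_t bits = 0;
        for (int b = 0; b < 64; ++b) {
            int d = word * 64 + b;
            if (d < dim && emb[vec * dim + d] > 0.0f)  // keep only the sign
                bits |= (1ull << b);
        }
        out[i] = bits;
    }

Hamming distance between the sign bits then serves as a cheap proxy for angular (cosine) distance between the original floats, so no change to the model itself is needed.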
kookamamie
Does it beat hnswlib? Also, it would be nice to see code examples (C++) without the API.
andes314
This is a great question! Yes, for a very specific subset of problems--those where you need total recall. HNSW-based algorithms typically compare the query against only a subset of the whole collection in order to run much faster than a linear scan, and they sometimes miss the true best results--a trade-off that is often worth making. I aimed to keep total recall, and for that particular use case my method does in fact run faster.
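To make the trade-off concrete (illustration only, not a claim about specific numbers): recall@k measures how many of the exact top-k results an approximate index returns. An exhaustive GPU scan scores 1.0 by construction, while an HNSW index trades some of it for speed.

    // Illustration: recall@k = |approximate top-k ∩ exact top-k| / k (k > 0).
    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    double recallAtK(std::vector<uint32_t> exact, std::vector<uint32_t> approx) {
        std::sort(exact.begin(), exact.end());
        std::sort(approx.begin(), approx.end());
        std::vector<uint32_t> both;
        std::set_intersection(exact.begin(), exact.end(),
                              approx.begin(), approx.end(),
                              std::back_inserter(both));
        return static_cast<double>(both.size()) / exact.size();
    }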
throwaway_20357
Are you aware of any benchmarks comparing the HNSW and binary quantization trade-offs in accuracy for different models/datasets?
andes314
This is a vector index I built that supports insertion and k-nearest neighbors (k-NN) querying, optimized for GPUs. It operates entirely in CUDA and can process queries on half a billion vectors in under 200 milliseconds. It is structured as a standalone library with an HTTP API for remote access and is intended for high-performance search tasks—think similarity search, AI model retrieval, or reinforcement learning replay buffers. The code is at https://github.com/rodlaf/BinaryGPUIndex.