Show HN: A bi-directional, persisted KV store that is faster than Redis
79 comments
· March 17, 2025
dangoodmanUT
What disks give 600ns persistence _with fsync/fdatasync_? Never heard of anything under 50us p50.
mehrant
the 600ns figure represents our optimized write path and not a full fsync operation. we achieve it - among other things - through:
1- as mentioned, we are not using any traditional filesystem and we're bypassing several VFS layers.
2- free space management is a combination of two RB trees, giving O(log n) for slicing an extent and O(log n + k) for a merge, k being the number of adjacent free extents (sketched below).
3- the majority of the write path uses a lock-free design, and where needed we use per-CPU write buffers.
the transactional guarantees we provide come via:
1- atomic individual operations with retries
2- various conflict resolution strategies (timestamp, etc.)
3- durability through controlled persistence cycles with configurable commit intervals
depending on the plan, we provide a persistence guarantee of between 30 seconds and 5 minutes.
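To make the free-space scheme in point 2 above concrete, here is a minimal Python sketch of a two-index free-space manager: one index ordered by offset (to find and merge adjacent free extents) and one ordered by size (to slice the best-fitting extent). Sorted lists with `bisect` stand in for the two RB trees; this is an illustration of the idea, not HPKV's actual code.

```python
import bisect

class FreeSpaceManager:
    """Toy free-space manager with two indexes: one ordered by offset
    (for merging adjacent extents on free) and one ordered by size
    (for best-fit slicing). Sorted lists + bisect stand in for RB trees."""

    def __init__(self, capacity):
        self.by_offset = [(0, capacity)]   # (offset, length), sorted by offset
        self.by_size = [(capacity, 0)]     # (length, offset), sorted by size

    def alloc(self, length):
        # best fit: smallest free extent that is large enough
        i = bisect.bisect_left(self.by_size, (length, -1))
        if i == len(self.by_size):
            raise MemoryError("no free extent large enough")
        ext_len, ext_off = self.by_size.pop(i)
        self.by_offset.remove((ext_off, ext_len))
        if ext_len > length:               # slice the remainder back in
            self._insert(ext_off + length, ext_len - length)
        return ext_off

    def free(self, offset, length):
        # merge with adjacent free extents, if any (the "k neighbours" step)
        i = bisect.bisect_left(self.by_offset, (offset, 0))
        if i < len(self.by_offset):
            nxt_off, nxt_len = self.by_offset[i]
            if offset + length == nxt_off:            # merge with right neighbour
                self._remove(nxt_off, nxt_len)
                length += nxt_len
        if i > 0:
            prv_off, prv_len = self.by_offset[i - 1]
            if prv_off + prv_len == offset:           # merge with left neighbour
                self._remove(prv_off, prv_len)
                offset, length = prv_off, prv_len + length
        self._insert(offset, length)

    def _insert(self, offset, length):
        bisect.insort(self.by_offset, (offset, length))
        bisect.insort(self.by_size, (length, offset))

    def _remove(self, offset, length):
        self.by_offset.remove((offset, length))
        self.by_size.remove((length, offset))

fsm = FreeSpaceManager(1024)
a = fsm.alloc(100)
b = fsm.alloc(200)
fsm.free(a, 100)
fsm.free(b, 200)              # coalesces everything back into one extent
print(fsm.by_offset)          # [(0, 1024)]
```

With balanced trees in place of the sorted lists, the allocation lookup and the neighbour lookups become O(log n), which is where the complexities claimed above would come from.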
dangoodmanUT
I didn't necessarily mean exactly fsync. I guess I'll ask: Is it actually flushed to persistent disk in 600ns such that if the node crashes, the data can always be read again? Or does that not fully flush?
mehrant
yes, in that case data can potentially be lost. 30 seconds in a worst-case scenario without HA.
dangoodmanUT
And hold on, 600ns can't possibly be right...
A memory copy plus updating whatever internal memory structures you have is definitely going to be over 1us. Even a non-fsync NVMe write is still >=1us, so this is grossly misleading.
mehrant
our p50 is indeed 600ns for a write, the way I explained it. I understand that at this point this can be read as a "trust me bro" kind of statement, but I can offer you something: we can have a quick call and I can provide you access to a temp server with HPKV installed on it, with access to our test suite, and you'll have a chance to run your own tests.
this can be a good learning opportunity for both of us (potentially more for us) :)
if you're interested, please send us an email to support@hpkv.io and we can arrange that
pclmulqdq
Wait, "depending on the plan"?
You're already monetizing your non-persistent non-database?
alex_smart
I don’t get it. How could you be fsyncing the WAL in 600ns? What are the transactional guarantees that you are offering?
mehrant
that's a great question. the 600ns figure represents our optimized write path and not a full fsync operation. we achieve it - among other things - through:
1- as mentioned, we are not using any traditional filesystem and we're bypassing several VFS layers.
2- free space management is a combination of two RB trees, giving O(log n) for slicing an extent and O(log n + k) for a merge, k being the number of adjacent free extents.
3- the majority of the write path uses a lock-free design, and where needed we use per-CPU write buffers.
the transactional guarantees we provide come via:
1- atomic individual operations with retries
2- various conflict resolution strategies (timestamp, etc.)
3- durability through controlled persistence cycles with configurable commit intervals
depending on the plan, we provide a persistence guarantee of between 30 seconds and 5 minutes (a rough sketch of what a commit interval implies follows below).
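To illustrate what a configurable commit interval means for durability, here is a rough Python sketch (illustrative only; HPKV's real write path is in-kernel, lock-free, and uses per-CPU buffers): writes are acknowledged from memory immediately, and a background cycle persists the dirty set every `commit_interval` seconds, so a crash can lose at most roughly that window of acknowledged writes.

```python
import os
import threading
import time

class BufferedStore:
    """Illustrative only: acknowledge writes from memory immediately and
    persist the dirty keys on a configurable commit interval. A crash can
    lose at most ~commit_interval seconds of acknowledged writes."""

    def __init__(self, path, commit_interval=30.0):
        self.mem = {}       # authoritative in-memory view (reads hit this)
        self.dirty = {}     # keys changed since the last commit cycle
        self.lock = threading.Lock()
        self.path = path
        threading.Thread(target=self._commit_loop,
                         args=(commit_interval,), daemon=True).start()

    def put(self, key, value):
        # the "fast path": no disk I/O on the caller's critical path
        with self.lock:
            self.mem[key] = value
            self.dirty[key] = value

    def get(self, key):
        with self.lock:
            return self.mem.get(key)

    def _commit_loop(self, interval):
        while True:
            time.sleep(interval)
            with self.lock:
                batch, self.dirty = self.dirty, {}
            if not batch:
                continue
            # stand-in for the real on-disk format: append and fsync once per cycle
            with open(self.path, "a", encoding="utf-8") as f:
                for k, v in batch.items():
                    f.write(f"{k}\t{v}\n")
                f.flush()
                os.fsync(f.fileno())

store = BufferedStore("hpkv-demo.log", commit_interval=5.0)
store.put("user:1", "alice")   # returns long before anything reaches the disk
```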
buenzlikoder
What storage backend are you using?
A write operation on an SSD takes tens of µs - without any VFS layers.
mehrant
sorry for not being clear again. by saying this number does not represent a full fsync operation, I meant it doesn't include the SSD write time. this is the time to update the KV's internal memory structures + add to the write buffers.
we think this is fair because we provide a transactional guarantee and immediate consistency regardless of the state of the append-only write buffer entry. at that speed, for a given key, the value might change and a new write buffer entry might be added for that key before the earlier entry has had a chance to reach disk (as you mentioned, the actual write to disk is slower), but conflict resolution still ensures that only the last valid entry is written and the rest are skipped (see the sketch below). until the data reaches disk, HPKV is effectively acting like an in-memory KV store.
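Here is a small Python sketch of the "last valid entry wins" behaviour described above (again an illustration, not HPKV's code): several buffered entries can pile up for the same key before the slower disk write runs, and the flush keeps only the newest entry per key by timestamp and skips the rest.

```python
import time

write_buffer = []  # append-only list of (timestamp_ns, key, value) entries

def buffered_put(key, value):
    """The fast in-memory path: just append an entry, no disk I/O."""
    write_buffer.append((time.monotonic_ns(), key, value))

def flush(persist):
    """Conflict resolution at flush time: keep only the newest entry per key
    (by timestamp) and skip the stale ones before doing the slow writes."""
    newest = {}
    for ts, key, value in write_buffer:
        if key not in newest or ts >= newest[key][0]:
            newest[key] = (ts, value)
    for key, (ts, value) in newest.items():
        persist(key, value)   # the slow part happens once per surviving key
    write_buffer.clear()

# the key "profile:7" is overwritten before the flush catches up;
# only the last value ever reaches the persist callback
buffered_put("profile:7", "v1")
buffered_put("profile:7", "v2")
flush(lambda k, v: print("persist", k, "=", v))   # persist profile:7 = v2
```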
addaon
You’re getting a lot of crap (rightly) for your lack of clarity and fuzzy language use on this point…
But that also points out the demand for the seemingly-unachievable promises you’re making. I wonder if it’s worth stirring up some out-of-production DIMM-connected Optane and using that as a basis for a truly fast-persisted append-only log. If that gives you the ability to achieve something that’s really in demand, you can go from there to a production basis, even if it’s just a stack of MRAM on a PCI-e card or something until the tech (re-) arises.
UltraSane
You can just use NVDIMMs, which are generally 8, 16, or 32 GB DIMM modules that have enough flash and backup power to copy all data to flash storage if power is lost on the host.
https://www.micron.com/content/dam/micron/global/public/prod...
mrbluecoat
Something must be in the water.. this is the third similar tool in three days on HN
linotype
Yeah, Redis fucked around and found out.
https://redis.io/blog/redis-adopts-dual-source-available-lic...
Snawoot
> 2-6x faster than Redis (benchmark link below) yet disk persisted!
That's a false contradistinction: Redis is also disk persisted.
The benchmark you did mentions the Redis benchmarking guide, and this guide has the following paragraph:
> Redis is, mostly, a single-threaded server from the POV of commands execution (actually modern versions of Redis use threads for different things). It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.
Did you just benchmark against only a single Redis instance and claim a performance win? Even so, how do the benchmarks compare against the source-available competitor DragonflyDB?
Finally, the documentation doesn't mention how exactly persistence works and what durability guarantees we should expect.
mehrant
thanks for taking the time to write feedback :)
> That's a false contradistinction: Redis is also disk persisted.
The performance gain mentioned was vs. Redis in memory. So we weren't claiming that Redis can't be persisted (which of course it can); we were saying that Redis without persistence (which performs faster than with persistence) was still this much slower than HPKV with persistence. But you're correct that we probably should have been clearer in explaining this :)
>Did you just benchmarked against only single Redis instance and claimed performance win?
Single node of Redis vs. single node of HPKV, so it's an apples-to-apples comparison.
>Even if so, how do benchmarks compare against source-available competitor DragonflyDB?
A benchmark against DragonflyDB is coming soon :)
sorry about the lack of that information in the documentation, we'll update it. for now, the durability guarantee on Pro is 30 seconds; on Business with HA it is 5 minutes.
Xelynega
They asked about instances and you responded with nodes.
From the Redis comment it sounds like the way to scale a Redis node is to increase its size and run multiple instances in parallel.
Saying it's "apples to apples" would be like setting a competitor's thread limit to 1, then calling it a fair benchmark.
ForTheKidz
> That's a false contradistinction: Redis is also disk persisted.
This feels wildly disingenuous.
bjornsing
Interesting. I did some work on a related but different product idea (https://www.haystackdb.dev/) a few years back. Gave up though as it seemed hard to get traction / find customers. What’s your thinking on that? How are you going to reach your initial customers?
Would love to have a chat about possible collaboration or if I could help out in some way. Nice to see foundational tech coming out of the EU!
mehrant
thank you :) it would be interesting to have a chat for sure. would you mind dropping an email to the address I mentioned in the OP, and I'll reach out to you.
edoceo
How will it be faster than my Redis or KeyVal, which is very close by, if your servers are far away? Network time matters here, right?
mehrant
of course. speeds down to 15us can be achieved over the network with our custom protocol in the same region. for sub-microsecond latency, you need HPKV running on the same machine as yours :)
avinassh
If it is based on some research papers, could you link them please?
mehrant
One thing we'd like to know your opinion on is our key monitoring via WebSocket (pub-sub) feature. You can read more about it in our documentation under WebSocket.
Is it something you think is useful and might have a use case for, or do you see no value in it? In other words, is it something that might make you consider using HPKV?
kshmir
Why pay what you're asking instead of using Dragonfly or something like that and just using a beefier node?
ehsanaslani
Well that's a technical choice depending on the context, but I can list some of the advantages of HPKV:
-Persistent by default without any performance penalties
-The pub/sub feature, which is unique to HPKV and allows for a bi-directional WebSocket connection from clients to the database
-Lower cost as we need less expensive infrastructure to provide the same service
-Simple API to use
quibono
Is this 2-6x faster because of multi-threading/multiple cores? Or is this actually 2-6x faster on a single-core machine?
mehrant
the test was done on a single node and a single thread. with multi-threaded and batch operations, HPKV was still faster on the same machine.
alexpadula
Why no open source :<
cess11
Does it have ACID guarantees?
mehrant
We provide some elements of ACID guarantees, but not full ACID compliance as traditionally defined in database systems:
Atomicity: Yes, for individual operations. Each key-value operation is atomic (it either completes fully or not at all); a sketch of the retry pattern follows below.
Consistency: Partial. We ensure data validity through our conflict resolution strategies, but we don't support multi-key constraints or referential integrity.
Isolation: Limited. Operations on individual keys are isolated, but we don't provide transaction isolation levels across multiple keys.
Durability: Yes. Our persistence model allows for tunable durability guarantees with corresponding performance trade-offs.
So while we provide strong guarantees for individual operations, HPKV is not a full ACID-compliant database system. We've optimized for high-performance key-value operations with practical durability assurances rather than complete ACID semantics.
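As a concrete reading of "atomic individual operations with retries", here is a minimal Python sketch built on a tiny versioned in-memory store. The store class and its method names are made up for illustration; they are not HPKV's client API. The pattern: read a versioned value, attempt a conditional write, and back off and retry if another writer got there first.

```python
import random
import threading
import time

class VersionedStore:
    """Tiny in-memory stand-in with a versioned get and compare-and-set,
    used only to make the retry loop below runnable (hypothetical API)."""
    def __init__(self):
        self._data = {}               # key -> (value, version)
        self._lock = threading.Lock()

    def get_versioned(self, key):
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key, value, expected_version):
        with self._lock:
            _, current = self._data.get(key, (0, 0))
            if current != expected_version:
                return False          # someone else won; caller retries
            self._data[key] = (value, current + 1)
            return True

def atomic_increment(store, key, delta=1, retries=10):
    """Single-key atomic update with retries: read a versioned value,
    attempt a conditional write, back off and retry on conflict."""
    for attempt in range(retries):
        value, version = store.get_versioned(key)
        if store.compare_and_set(key, value + delta, expected_version=version):
            return value + delta
        time.sleep(random.uniform(0, 0.001 * 2 ** attempt))  # jittered backoff
    raise RuntimeError(f"could not update {key!r} after {retries} attempts")

store = VersionedStore()
print(atomic_increment(store, "counter"))   # 1
print(atomic_increment(store, "counter"))   # 2
```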
gcbirzan
> Consistency: Partial. We ensure data validity through our conflict resolution strategies, but we don't support multi-key constraints or referential integrity.
That's not what consistency means in ACID.
> Durability: Yes. Our persistence model allows for tunable durability guarantees with corresponding performance trade-offs.
> ~600ns p50 for writes with disk persistence
I'm pretty sure there's no durability there. That statement is pretty disingenuous in itself, but it'd be nice to see a number for durability (which, granted, is not something you advertise the product for).
My main concern is that all these speed benefits are going to be eclipsed by the 0.5ms of network latency.
cess11
OK, thanks. Those tradeoffs aren't suitable for my purposes.
we've been working on a KV store for the past year or so which is 2-6x faster than Redis (benchmark link below) yet disk persisted! so you get the speed of in-memory KV stores but with disk persistence. To achieve this we've created a custom filesystem that is optimized for our specific use case, and we do smart batching for writes and predictive fetching for reads.
In addition to basic operations, it also provides atomic inc/dec, atomic JSON patch, range scans and a unique key monitoring mechanism (pub-sub) over WebSockets, which essentially allows you to receive notifications on registered key changes directly from the KV store. so for example in a realtime web application, you can receive notifications directly in your front-end, with no back-end implementation (no WebSocket server management, no relay, etc.) and still be secure and not expose your API keys on the front-end. We have REST, WebSocket and RIOC APIs and we can't wait to hear your feedback.
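For readers unfamiliar with the key-monitoring idea, here is a tiny in-process Python sketch of the semantics only: subscribe to a key and get called back when it changes. The actual feature delivers the same kind of event over a WebSocket connection to your front-end rather than via a local callback, and the class and event shape below are invented for illustration.

```python
from collections import defaultdict

class MonitoredKV:
    """In-process sketch of key monitoring: subscribers register for a key
    and are notified on every change to it. The real feature pushes the same
    kind of event over a WebSocket instead of a local callback."""
    def __init__(self):
        self._data = {}
        self._subscribers = defaultdict(list)   # key -> list of callbacks

    def subscribe(self, key, callback):
        self._subscribers[key].append(callback)

    def put(self, key, value):
        self._data[key] = value
        for callback in self._subscribers[key]:
            callback({"key": key, "value": value})   # the "notification"

kv = MonitoredKV()
kv.subscribe("game:42:score", lambda event: print("update:", event))
kv.put("game:42:score", "3-1")   # update: {'key': 'game:42:score', 'value': '3-1'}
```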
We're only providing the free tier for now, but let us know and we can increase the limits for you if you have a specific case. please either send us an email at support@hpkv.io or use http://hpkv.io/contact if you prefer that.
sign up: http://hpkv.io/signup
documentation: http://hpkv.io/docs
realtime pub-sub: http://hpkv.io/blog/2025/03/real-time-pub-sub
benchmark vs Redis: http://hpkv.io/blog/2025/02/redis-vs-hpkv-benchmark
looking forward to hearing your feedback :)