What Does a Database for SSDs Look Like?
9 comments
·December 20, 2025
zokier
The author could have started by surveying the current state of the art instead of falsely assuming that DB devs have been resting on their laurels for the past decades. If you want to see a (relational) DB for SSDs, just check out stuff like myrocks on zenfs+; it's pretty impressive stuff.
raggi
It may not matter for clouds with massive margins but there are substantial opportunities for optimizing wear.
londons_explore
Median database workloads are probably doing writes of just a few bytes per transaction, e.g. 'set last_login_time = now() where userid=12345'.
Because the interface between the SSD and the host OS is block based, you are forced to write a full 4k page. That means you still really benefit from a write-ahead log to batch all those changes together, at least up to page size, if not larger.
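A rough sketch of that batching argument, not taken from any real database. It assumes a 4096-byte block, a 4-byte length prefix per record, and 16-byte updates. The names `write_amplification`, `LogBuffer` and `pages_for` are made up for illustration, and the "device" is just a counter.

```python
PAGE_SIZE = 4096  # assumed block size, taken from the "4k page" figure above


def write_amplification(payload_bytes: int, page_size: int = PAGE_SIZE) -> float:
    """Bytes written to the device per payload byte if each update is flushed alone."""
    pages = -(-payload_bytes // page_size)  # ceiling division
    return pages * page_size / payload_bytes


class LogBuffer:
    """Hypothetical log buffer that packs small records into full pages."""

    def __init__(self, page_size: int = PAGE_SIZE) -> None:
        self.page_size = page_size
        self.pending = bytearray()
        self.pages_written = 0  # stands in for real block writes

    def append(self, record: bytes) -> None:
        # Frame each record with a 4-byte length prefix (an assumed format).
        framed = len(record).to_bytes(4, "little") + record
        if len(framed) > self.page_size:
            raise ValueError("record does not fit in one page")
        # Flush first if this record would overflow the current page.
        if len(self.pending) + len(framed) > self.page_size:
            self.flush()
        self.pending += framed

    def flush(self) -> None:
        if not self.pending:
            return
        # Pad to a whole page, since the device only accepts full blocks.
        page = bytes(self.pending).ljust(self.page_size, b"\0")
        assert len(page) == self.page_size
        self.pages_written += 1
        self.pending.clear()


def pages_for(n_records: int, record_size: int) -> int:
    """Pages written when n_records small updates go through the buffer."""
    buf = LogBuffer()
    for _ in range(n_records):
        buf.append(b"x" * record_size)
    buf.flush()
    return buf.pages_written
```

Under these assumptions, a 16-byte update flushed on its own costs a full 4096-byte page, so `write_amplification(16)` is 256. Pushing 1000 such updates through the buffer writes 5 pages instead of 1000. A real log would also need an fsync policy and checksums, which this leaves out.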
esperent
Don't some SSDs have a 512-byte page size?
digikata
I would guess by now none have that internally. As a rule of thumb, every major flash density increase (SLC, TLC, QLC) also tended to double the internal page size. There were also internal transfer performance reasons for large sizes. Low-level 16k-64k flash "pages" are common, sometimes with even larger stripes of pages due to the internal firmware sw/hw design.
danielfalbo
Reminds me of: Databases on SSDs, Initial Ideas on Tuning (2010) [1]
[1] https://www.dr-josiah.com/2010/08/databases-on-ssds-initial-...
> Design decisions like write-ahead logs, large page sizes, and buffering table writes in bulk were built around disks where I/O was SLOW, and where sequential I/O was order(s)-of-magnitude faster than random.
Overall speed is irrelevant; what matters is the relative speed difference between sequential and random access.
And since there's still a massive difference between sequential and random access with SSDs, I doubt the overall approach of using buffers needs to be reconsidered.