
SSD-IQ: Uncovering the Hidden Side of SSD Performance [pdf]

__turbobrew__

Something I learned the hard way is that SSD performance can nosedive if DISCARD/TRIM commands are not sent to the device. Up to 50% lower throughput on our Samsung DC drives.

Through metrics I noticed that some SSDs in a cluster were much slower than others despite the hardware being uniform. After a bit of investigation it turned out that the slow devices had been in service longer, and that we were not sending DISCARDs to the SSDs due to a default in dm-crypt: https://wiki.archlinux.org/title/Dm-crypt/Specialties#Discar...

The performance penalty for our Samsung DC drives was around 50% if TRIM was never run. We now run blkdiscard when provisioning new drives and enable discards on the crypt devices, and things have been much better since.
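A minimal sketch, assuming Linux and the usual sysfs layout under /sys/block/<dev>/queue/, for checking whether a block device (including a dm-crypt mapping) actually advertises discard support; a mapping opened without allow-discards should report a discard_max_bytes of 0 even when the underlying SSD supports TRIM:

    #!/usr/bin/env python3
    # Report DISCARD support for every block device via Linux sysfs.
    # A dm-crypt mapping opened without allow-discards should show
    # discard_max_bytes == 0 even if the underlying SSD supports TRIM.
    from pathlib import Path

    def discard_info(dev: str) -> dict:
        # dev is a kernel block device name, e.g. "sda", "nvme0n1", "dm-0"
        queue = Path("/sys/block") / dev / "queue"
        return {
            "granularity": int((queue / "discard_granularity").read_text()),
            "max_bytes": int((queue / "discard_max_bytes").read_text()),
        }

    if __name__ == "__main__":
        for dev in sorted(p.name for p in Path("/sys/block").iterdir()):
            info = discard_info(dev)
            supported = "yes" if info["max_bytes"] > 0 else "NO"
            print(f"{dev:12} discard={supported} "
                  f"granularity={info['granularity']} max_bytes={info['max_bytes']}")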

Reflecting a bit more, this makes me more bullish on system integrators like Oxide, as I have seen so many cases of software misconfigured in a way that left the hardware's potential unused. There is a class of company, somewhere between a one-person shop and Facebook/Google, that runs its own racks but doesn't have the in-house expertise to triage and fix these performance issues. If, for example, you are getting 50% less performance out of your DB nodes, what is the cost of that inefficiency?

p_ing

While not the same issue, I took four 500GB Samsung 850 EVO drives and created a Storage Space out of them for Hyper-V VMs. Under any sort of load the volume would reach ~1 second latency. This was on a SAS controller in JBOD mode.

Switched to some Intel 480GB DC drives and performance was in the low milliseconds, as I would have expected from any drive.

Not sure if I was hitting the DRAM limit of the Samsungs or what. I spent a bit of time troubleshooting, but this was a home lab and used Intel DC drives were cheap on eBay. Granted, the Samsung EVOs weren't targeted at that type of work.

__turbobrew__

The 850 EVO is basically the lowest-tier consumer device; from what I have read, those devices can only handle short bursts of IOs and do not perform well under sustained load.

loeg

We've observed FDP (Flexible Data Placement) make a surprisingly big difference in drive-internal write amplification. If you can meaningfully tag data with different lifetimes/streams in your workloads, and you can expect hardware that supports it, it's very helpful. We saw something like a WAF reduction from ~1.60 to ~1.04 (on a synthetic but vaguely plausible workload).
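For context, WAF is just media (NAND) writes divided by host writes; a rough sketch of the arithmetic, with made-up counter values standing in for whatever counters your vendor's SMART/telemetry pages actually expose:

    # Write amplification factor (WAF) = bytes written to NAND / bytes written by host.
    # Counter values below are illustrative, not from any real drive.
    host_bytes_written = 120e12   # 120 TB written by the host
    nand_bytes_written = 192e12   # 192 TB actually written to flash

    waf = nand_bytes_written / host_bytes_written
    print(f"WAF = {waf:.2f}")     # 1.60 -- every host byte costs 1.6 bytes of flash wear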

jeffbee

With rocks, I assume?

loeg

Most of our writes (>99%) aren't rocks.

jeffbee

Hrmm. Still guessing about your workloads, but isn't it possible that workload A could cause a disproportionate amount of amplification, while still being much smaller in aggregate than workload B?

kvemkon

> Vendors downplay the idiosyncrasies of specific SSD models by marketing their devices using four “headline” throughput metrics: sequential read, sequential write, random read, and random write.

For SOHO, yes, where no serious database usage is expected. But server/datacenter SSDs are categorized as read-intensive, write-intensive, and mixed-use.

lmz

Those categories are usually derived from another advertised number: Drive Writes Per Day.
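DWPD is just the endurance rating (TBW) normalized by capacity and warranty length; a quick back-of-the-envelope sketch with illustrative numbers:

    # DWPD = TBW / (capacity * warranty_days): full-drive writes per day that the
    # endurance rating allows over the warranty period. Numbers are illustrative.
    capacity_tb = 3.84        # drive capacity in TB
    warranty_years = 5
    tbw = 7008                # rated terabytes written

    dwpd = tbw / (capacity_tb * warranty_years * 365)
    print(f"{dwpd:.2f} DWPD")  # ~1.0 -> typically marketed as a "read-intensive" class drive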

As an example, in this Micron product brief the latency for the read-intensive and mixed-use products is the same: https://assets.micron.com/adobe/assets/urn:aaid:aem:e71d9e5e...

Of course the footnote says that latency is a median at QD=1 random 4K IO.

From the paper, the PM9A3, which is a 1 DWPD drive, has better P99.9 write latency under load than the 7450 Pro (3 DWPD, mixed use).

wtallis

You're conflating two different things here: the performance metrics that marketing provides, and the product segments that marketing groups products into.

p_ing

Gamers also fall into the read/write number trap. When tested, that type of workload performs just about the same from PCIe 3.0 through 5.0 because the access pattern is mostly small, often random, 4KiB I/O. And in some cases there was only a minor delta between a PCIe 5.0 NVMe drive and a SATA SSD.

https://www.youtube.com/watch?v=gl8wXT8F3W4

jmpman

Feels like a paper that should have been published about 15 years ago.

tanelpoder

In the database-nerd world, we had something like this about 10 years ago, written by @flashdba. Still a good read:

https://flashdba.com/category/storage-for-dbas/understanding...

jeffbee

Seems like the color codes in Table 3 are reversed? Higher write amplification factors are green and lower ones are red.

djoldman

That table is really confusing as the colors have wildly different meanings depending on the row.