Archival Storage
March 17, 2025 · entrepy123
creer
Well, professionally, tape is it - the technology exists and it lasts more than 5 years. Unfortunately, the market for tape has evolved such that it's not very friendly to the non-pros. Not impossible, but not friendly. That probably has to do with the lack of perceived market among non-corporate users - or perhaps the impression that clown storage is where it's at for non-corporate.
To be fair, JBOD/RAID and hard drives do work pretty well, past the 5-year horizon to be sure.
Product mgmt and corp finance have also fallen in love with subscriptions - and clown storage is such an awesome match for that! Who needs to sell long-term terabyte solutions when you can rent them out? Easy to argue against that logic of course, but not easy to fight.
tombert
I have an LTO-6 tape drive. It works fine, but it is a pain in the ass to set up on Linux. It only connects via SAS, you need to load a lot of arcane kernel modules, the logs are non-standardized and often misleading, and the entire interface is command line based.
I don't mind living in the command line and I don't even mind fighting to get everything up, but I don't see most people putting up with it. It's also a huge pain to get working with a laptop, since I don't think most laptops have a SAS connector, so you have to use an eGPU enclosure with a Host Bus Adapter, which brings its own share of headaches.
creer
Hehe. If you're trying to live with just a laptop and an LTO drive, I can certainly see it! I'd expect most people who get into LTO drives have a massive set of hard drives, some convoluted desktop-and-up case(s) to run them, a couple of laptops, and random peripherals such as cameras, scanners and whatnot. USB all over the place. So for them, both the drive and the command-line interface sit deeper in an already-deep technology map.
But that goes back to the minimal market for LTO among amateurs. They will scratch their itch and write software for it but it's not exactly critical mass.
pdimitar
> or perhaps the impression that clown storage is where it's at for non-corporate.
Clown storage.
Thanks for the giggle. :D Needed it, had a pretty rough last couple of days.
null
bob1029
I think it's even more mind-blowing that we can hold back the tide of entropy for as long as we can and with so little energy expended.
Solid state electronics and magnetic media are beyond magical. The odds of keeping terabytes of data on rails are astronomically bad.
Rygian
Emphatic Yes.
I'd like to expand. What I find mindblowing about it is that, as a regular consumer:
* When you need more space you can't just plug in another disk or USB stick: you also have to choose which device each piece of data lives on, and you have to tell all your software to use it. And that may involve shuffling data around.
* As a corollary, you need to remember in which device you put which stuff.
* As an extra corollary, any data loss is catastrophic by default.
* File copy operations still fail, and when they fail, they do so without ACID-strong commit/fallback semantics.
* Backups don't happen by default, and are not transparent to the end user.
* Data corruption can be silent.
Bonus, but related:
* You can't share arbitrary files with people without going through a 3rd party.
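The copy-semantics complaint above is real: a plain copy interrupted midway leaves a torn file. A minimal sketch of the usual workaround, write-to-temp-then-atomic-rename (the function name and details are illustrative, not any particular tool's implementation):

```python
import os
import tempfile

def atomic_copy(src: str, dst: str) -> None:
    """Copy src to dst so that dst is never left half-written:
    readers see either the old contents or the new, never a torn file."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Write into a temp file on the same filesystem as dst,
    # so the final rename cannot cross a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            while chunk := inp.read(1 << 20):
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # force the data to disk before renaming
        os.replace(tmp, dst)  # atomic replacement on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

This only gives commit-or-nothing semantics for a single file on one filesystem; it does nothing for multi-file consistency, which is part of the original complaint.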
nine_k
This is because most of the non-technical retail customers don't value reliability too much. They want more capacity and good speed at the lowest price point. Hence the proliferation of QLC / MLC NVMe drives with a small SLC write cache. It's not very stable, and it can only keep the write speed high for small files, but hey, you can get a terabyte for $50, and it loads the latest game real fast!
Similarly, many users don't have much valuable, unique data on their computers. The most important stuff very often lives completely in the cloud, where it's properly backed up, etc.
Also, the most ubiquitous computing device now is a smartphone. It has all the automatic backup stuff built in; you can put an SD card into it and it will transparently extend the free space, without hassle. Even on PCs, MS and Apple nudge users very prominently to use OneDrive and iCloud for backing up their desktops/laptops. But past a certain size it costs money, so many people opt out. Again, most people value the lowest price and just hope for the best; because what could ever happen?
Silent data corruption can still be an issue, but, frankly, malware is a much bigger threat for a typical non-technical user.
Technical people have no trouble setting up all that: mount your disks under LVM, run ZFS on top of it, set up multiple backups, set up their own "magic wormhole" to share files with strangers easily. But they know why they are doing that.
Educating users about IT hygiene is key for improving it, much like educating people about the dangers of not washing hands, or of eating unhealthy stuff, helped improve their health.
remus
In some ways it is surprising, but the examples you gave are only currently straightforward because of massive investment over many years by thousands of people. If you wanted to build ChatGPT from scratch I'm sure it would be pretty hard, so it doesn't seem so unreasonable that you might pay someone if you care about keeping your data around for extended periods of time.
klysm
Making stuff last for a long time is very difficult. It's cheaper to make things last for a short time and allow improvement
vodou
How good/bad would it be to have a poor man's tape archival, using standard cassette tapes (C90, C120, etc)?
For example, using something like ggwave [1]. I guess that would last way more than 5 years (although the data density is rather poor).
EvanAnderson
> ... (although the data density is rather poor).
"Rather poor" is putting it mildly. This sent me down a sort of rabbit hole. From a Stack Exchange discussion[0] it was a short trip to an exceedingly technical discussion about using QAM encoding[1] to really beef up the storage capability.
With the wacky QAM encoding it looks like maybe 20 MB per C90 cassette (and 90 minutes to "read" it back).
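As a sanity check on that figure, here is the back-of-the-envelope arithmetic. The ~30 kbit/s effective rate is an assumption on my part, roughly what an aggressive QAM scheme might squeeze out of cassette audio bandwidth, not a measured number:

```python
# Back-of-the-envelope check of the ~20 MB per C90 figure.
# ASSUMPTION: an effective data rate of ~30 kbit/s over the audio channel.
RATE_BPS = 30_000              # effective bits per second (assumed)
SECONDS = 90 * 60              # total playing time of a C90
capacity_bytes = RATE_BPS * SECONDS // 8
print(f"{capacity_bytes / 1e6:.2f} MB per C90")  # → 20.25 MB per C90
```

The read time follows directly: you have to play the whole tape in real time, hence the 90 minutes.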
[0] https://retrocomputing.stackexchange.com/questions/9260/how-...
dmd
A C120 can store, very generously, about 1 megabyte.
An LTO-8 tape stores 12,000,000 megabytes (12 TB).
creer
LTO tape tech has gotten into pretty nutty territory - in order to achieve its density and speed. It wasn't "easy". So, so far away from C90 technology.
pdimitar
I share your disappointment. I explained it for myself with this: nobody cares if we the netizens have our data backed up. The corps want it for themselves and they face zero accountability if they lose it or share it illegally with others.
So it's up to us really. I have a fairly OK setup, one copy on a local machine and several encrypted compressed copies in the cloud. It's not bulletproof but has saved my neck twice now, so can't complain. It's also manual...
We the techies in general are dragging our feet on this though. We should have commoditized this stuff a decade ago, because it's blindingly obvious the corps don't want to do it (not for free and with the quality we can do it, anyway). Should have done app installers for all 3 major OSes, zero-interaction-required unattended auto-updates -- make it so grandma would never know it's there and working. The only thing it asks for is access to your cloud storage accounts, and it decides automatically what goes where and how (kind of like disk RAID setups, I suppose).
keyringlight
I think in cases like this "personal computer" is a blessing and a curse. It seems like most of the big parties in computing, the ones with the big levers to move things, are mostly in it for themselves: they pick and choose whether they shoulder responsibility for a certain feature, or which side of 'personal' it comes down on with regard to hooking it into their services and privacy. What would our attitude be to losing physical information or property in other parts of our lives, versus digital info/property: media, references, financial details, property deeds, old records, etc.?
While I do appreciate the generosity in how many projects make themselves available (free or otherwise), it does seem like they can have a narrow focus where they solve the challenge they had to solve, but aren't interested in going past that point. There are logical reasons why that happens, but there's unfulfilled potential to make personal computing a better environment there.
PaulHoule
Cost calculations are often different at the enterprise scale from the individual scale. Hypothetically
https://en.wikipedia.org/wiki/Linear_Tape-Open
is an affordable storage medium if you need to store petabytes but for what the drive costs
https://www.bhphotovideo.com/c/product/1724762-REG/quantum_t...
you could buy 400 TB worth of hard drives. Overall I'd have more confidence in the produced-in-volume hard drives compared to LTO tapes, which have sometimes disappeared from the market because vendors were having patent wars. Personally I've also had really bad experiences with tapes: going back to my TRS-80 Color Computer, which was terribly unreliable; getting a Suntape with nothing but zeros on it when the computer center at NMT ended my account; the "successful" recovery of a lost configuration from a tape robot in 18 hours (I had reconstructed it manually long before then), ...
wmf
This is mentioned in the article.
There's an old presentation from Google where they mentioned that they were the only ones who read back their tapes to make sure they work.
alnwlsn
I've thought about the "hundreds of years" problem on and off for a while (for some yet to be determined future time capsule project), and I figure that about all we know for sure that will work is:
- engraved/stamped into a material (stone tablets, Edison cylinders, shellac 78s, vinyl, Voyager golden record (maybe))
- paper, inked (books) or punched (cards, tape)
- photography; microfiche/microfilm (GitHub Arctic Code Vault), lithography?
I actually looked into what it might take to "print" an archival-grade microfilm somewhat recently - there might be a couple of options to send out and have one made, but 99.99% of the results are for going the other way, scanning microfilm to make digital copies. This is all at the hobbyist grade of cheapness, mind you, but it seems weird that a pencil drawing I did in 2nd grade has a better chance of lasting a few hundred years than any of my digital stuff.
jewel
If you're using cloud storage for backups, don't forget to turn on Object Lock. This isn't as good as offline storage, but it's a lot better than R/W media.
At work we've been using restic to back up to B2. Restic does a deduplicating backup, every time, so there's no difference between a "full" and an "incremental" backup.
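For illustration, the core idea behind that "no difference between full and incremental" property can be sketched in a few lines. This is a simplified fixed-size-chunk model, not restic's actual repository format (restic uses content-defined chunking, among other things):

```python
import hashlib

def dedup_backup(data: bytes, store: dict, chunk_size: int = 4096) -> list:
    """Split data into chunks and store each chunk under its SHA-256 hash.
    A chunk already present in the store is not written again, so a
    'full' backup of unchanged data writes nothing new."""
    manifest = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # only new chunks cost space
            store[digest] = chunk
        manifest.append(digest)
    return manifest  # enough to reconstruct the file from the store

def restore(manifest: list, store: dict) -> bytes:
    """Reassemble a file from its chunk manifest."""
    return b"".join(store[d] for d in manifest)
```

Backing up the same data twice adds zero new chunks, which is exactly why every restic backup behaves like a full backup at incremental cost.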
simonw
That little note half way through this that said "The Svalbard archipelago is where I spent the summer of 1969 doing a geological survey" made me want to know more about the author - and WOW they have had a fascinating career: https://blog.dshr.org/p/blog-page.html
See also https://en.wikipedia.org/wiki/David_S._H._Rosenthal
nntwozz
I basically use the 3-2-1 backup strategy.
The 3-2-1 data protection strategy recommends having three copies of your data, stored on two different types of media, with one copy kept off-site.
I keep critical data mirrored on SSDs because I don't trust spinning rust, then I have multiple Blu-ray copies of the most static data (pics/video). Everything is spread across multiple locations at family members.
The reason for Blu-ray is to protect against geomagnetic storms like the Carrington Event in 1859.
[Addendum]
On 23 July 2012, a "Carrington-class" solar superstorm (solar flare, CME, solar electromagnetic pulse) was observed, but its trajectory narrowly missed Earth.
kemotep
3-2-1 has been updated to 3-2-1-1-0 by Veeam’s marketing at least.
At least 3 copies, in 2 different mediums, at least 1 off-site, at least 1 immutable, and 0 detected errors in the data written to the backup and during testing (you are testing your backups regularly?).
nntwozz
All the data is spread across more than 3 sites, both SSDs and Blu-ray (which is immutable). I don't test the SSDs because I trust Rclone, the Blu-ray is only tested after writing.
There is surely a risk of bit rot on the SSDs, but it's out of sight and out of mind for my use case.
rtkwe
I wish tape archival was easier to get into. But because it's niche and mainly enterprise, drives usually start in the multiple-thousands-of-dollars range unless you go way down in capacity, to less than a modern SSD.
codemac
No, it's because of IBM's monopoly. It has little to do with it being enterprise.
hn_throwaway_99
This article touches on a lot of different topics, which makes it hard for me to draw a single coherent takeaway, but the things I'd point out:
1. The article ends with a quote from the Backblaze CTO, "And thus that the moral of the story was 'design for failure and buy the cheapest components you can'". That absolutely makes sense for large enterprises (especially enterprises whose entire business is around providing data storage) that have employees and systems that constantly monitor the health of their storage.
2. I think that absolutely does not make sense for individuals or small companies, who want to write their data somewhere and ensure it will still be there years later when they might want it, without constant monitoring. Personally, I have a lot of video I want to archive (multiple terabytes). The easiest approach whose risk I'm comfortable with is (a) for backup, storing on relatively cheap external 20TB Western Digital hard drives, and (b) for archival storage, writing to M-DISC Blu-rays, which claim lifetimes of 1000 years.
nadir_ishiguro
I personally don't believe in archival storage, at least for personal use.
Data has to be living if it is to be kept alive, so keeping the data within reach, moving it to new media over time and keeping redundant copies seems like the best way to me.
Once things are put away, I fear the chances of recovering that data steadily reduce over time.
lurk2
> Once things are put away, I fear the chances of recovering that data steadily reduce over time.
I’ve run into this a lot. You store a backup of some device without really thinking of it, then over time the backup gets migrated to another drive but the device it ran on is lost and can’t be replaced. I remember reading a post years ago where someone commented that you don’t need a better storage solution, you need fewer files in simpler formats. I never took his advice, but I think he might have been right.
wmf
My takeaway is that for personal/SMB use you have to use the cloud.
lizknope
I've got files going back to 1991. They started on floppy and moved to various formats like hard drives, QIC-80 tape, PD optical media, CD-R, DVD-R, and now back to hard drives.
I don't depend on any media format working forever, tape included. New LTO tape drives are so expensive, and used drives only support smaller-capacity tapes, so I stick with hard drives.
3-2-1 backup strategy, 3 copies, and 1 offsite.
Verify all the files by checksum twice a year.
You can over complicate it if you want but when you script things it just means a couple of commands once a week.
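The twice-a-year verification can be sketched roughly like this. This is a hedged illustration using a checksum manifest file, not the actual script (which, per the follow-up below, uses snapraid and cshatag instead); names and paths are made up:

```python
import hashlib
import os

def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_file(full)
    return manifest

def verify(root: str, manifest: dict) -> list:
    """Return the paths whose current checksum no longer matches."""
    return [p for p, digest in manifest.items()
            if sha256_file(os.path.join(root, p)) != digest]
```

Build the manifest once, save it (e.g. with json.dump), then rerun verify() on a schedule; the returned list is the dump of failing paths to inspect.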
gtdawg
What is your process for automating this checksum twice a year? Does it give you a text file dump with the absolute paths of all files that fail checksum for inspection? How often does this failure happen for you?
lizknope
I run snapraid once a night and it has a scrub feature to read every file and compare against the stored checksum.
https://www.snapraid.it/manual
All my drives are Linux ext4 and I just run this program on every file in a for loop. It calculates a checksum and stores it along with a timestamp as extended attribute metadata. Run it again and it compares the values and reports if something changed.
https://github.com/rfjakob/cshatag
These days I would suggest people start with zfs or btrfs that has checksums and scrubbing built in.
Over 400TB of data I get a single failed checksum about every 2 years. So I get a file name and that it failed but since I have 3 copies of every file I check the other 2 copies and overwrite the bad copy. This is after verifying that the hard drive SMART data shows no errors.
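That overwrite-the-bad-copy step amounts to majority voting across replicas. A minimal sketch of the idea (illustrative only, not the actual workflow, which also cross-checks SMART data first):

```python
import hashlib
from collections import Counter

def repair_from_replicas(copies: list) -> bytes:
    """Given the contents of all replicas of one file, return the majority
    version. Mirrors the manual process described above: if one copy's
    checksum disagrees, the matching copies win and the bad one is
    overwritten with the winner."""
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    winner, count = Counter(digests).most_common(1)[0]
    if count < 2:
        raise ValueError("no two copies agree; cannot repair automatically")
    return copies[digests.index(winner)]
```

With three copies this tolerates one silently corrupted replica; with all three disagreeing, there is nothing automatic to fall back on, which is why the SMART check and manual inspection still matter.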
mburns
Why not use M-DISC and effectively solve the "has my CD/DVD degraded beyond the point of being readable" question entirely?
rtkwe
You may not even be able to get real MDiscs any more [0] and I'm always extremely dubious of 1000 year lifespans since they're effectively impossible to test.
[0] https://www.reddit.com/r/DataHoarder/comments/yu4j1u/psa_ver...
wmf
M-Disc has such low capacity that you'd probably want a robot to burn it which is not cheap.
xhrpost
The quote of LTO tape being much less prone to read failures (10^-20) vaguely reminded me of an old article stating that something like 50% of tape backups fail. I'm not on that side of the industry, so I can't really say whether there's some missing nuance.
https://www.quora.com/What-percentage-of-restores-from-a-tap...
codemac
The read failures are also attributed to other parts of the system, which for the end user still end up in failed reads. The author links to a sales PDF from Quantum.
e.g. the robot dies, the drive dies, the cartridge dies, the library bends, the humidity was too much.. multiplied by each library, robot, drive and cartridge your data is spread across.
jbverschoor
Or, a fun little anecdote: the cleaner had access to the server room and turned off the AC; most disk drives failed, and the tapes melted inside the robots.
That was a fun Monday
8jef
My recipe for large files: 3 copies. Right now, the 1st copy is on external 8 to 16TB NTFS desktop hard drives, and the 2nd copy on 14 to 16TB internal ext4 drives. These drives I power up only for copy purposes, once a month or so. At present my drives are 5 to 7 years old and still good.
Main working copies I keep on 4 to 8TB NTFS SSDs (mix of sata and nvme), plugged into a PC I'm using regularly, but intermittently.
0cf8612b2e1e
Don’t forget the offsite storage. I try to ship an old copy every year or so to an acquaintance so I have a catastrophic recovery option.
jrib
are you concerned about something like a fire destroying all the copies?
It's kinda mind-blowing that we have (so-called) AI, quantum computing, 6K screens, M.2 NVMe, billions of networked devices, etc., but regular data *can only be expected to last about 5 years* due to the propensity for moving-disk failure, SSD impermanence, bitrot, etc., and that is only overcome with great attention and significant cost (continually maintaining a JBOD or RAID or NAS, or painstakingly burning to M-DISC Blu-ray, etc.), or by handing it over to someone else to manage (cloud), or both. I mean, maybe you get lucky with a simple 3-2-1, but maybe you don't, and for larger archives of data that is not necessarily a walk in the park either.
Absolutely mindblowing.