When you deleted /lib on Linux while still connected via SSH (2022)

115 comments

·March 22, 2025

gleenn

One time I was flipping back and forth between directories, compiling some code, then checking it, then rm -rf'ing it. I accidentally hit up an cd ..'d one too many times. Suddenly the rm command hung and I was confused because it should be nearly instant as it was a few files. I stared in horror as I was accidentally deleting everything in mybhome directory. Luckily back then I had a .pr0n directory with a significant amount of content. A few things were lost but that .pr0n folder was luckily early enough in the list and big enough to slow down the deletion of my photos and documents. That's why I always recommend having a big "buffer" of video content for such situations, ya know, for data integrity ;)

winwang

That's wild and inspirational. Like the scene in Rush Hour when a stack of bills saves Tucker from a bullet.

Though I'm moreso tempted to just create a `.1111aaaa-antidumb` directory and store my caches and backups there.

This has also un-inspired me from creating a fast `rm`-esque utility.

woleium

You could of course alias rm -rf to rm -rf -i

marc_abonce

I always use trash-cli and alias rm to 'echo NO! #'.

Only if the file is too big to fit into the garbage bin, I can unalias rm, rm the thing and then reset the alias immediately after.

bardan

There is also safe-rm https://adamheins.com/blog/a-safer-rm

immibis

I thought -f cancels out -i, and when you alias "rm=rm -i", and then run "rm -rf" you get "rm -i -rf" and -f is last so it still wins.

LinuxBender

That's why I always recommend having a big "buffer" of video content for such situations, ya know, for data integrity

Good idea, for safety of course. Just for completeness sake I would add one should not rely on that especially if they are using XFS and especially if using a hardware raid controller. The recursive rm may complete faster than the inodes are removed due to the nature how XFS among a few other filesystems operate in the background. I doubt many are using hardware raid controllers on their workstations but if one is on a server there is a chance there may be one and those will also perform some transactions in the background and one may get their prompt back sooner than the inodes have actually been removed. This is all edge case off course. People should have a massive .pr0n folder regardless.

OKRainbowKid

Great idea, I shall see to it right away.

contingencies

Userpace pronfs provides high latency unlinking?

satiric

Turns out that pronfs uses fsck in the background to provide the delay

hinkley

Unless you were using tcsh you had to rewrite your shell configuration though.

JoshTriplett

rm has a --preserve-root option that's enabled by default, which prevents you from removing / . I wish it had a --preserve-home option too.

mekster

Seriously, start using trash-cli. Even Windows from 30 years ago had a recycle bin.

I can’t grasp how “power users” like Linux users are stuck working in primitive environments.

accelbred

Nah, recycle bin is the wrong model. Automatic filesystem snapshoting is the way to go. Snapshots are cheap and let you recover full state from your history.

Just use btrfs, and set up btrbk to snapshot your home directory every 5 minutes, and have its gc keep every snapshot from past few hours, an hourly snapshot for past few days, and weekly for past few months.

miki123211

I wish filesystems had a first-in, first-out "garbage heap" model.

The problem with snapshots is that they take up space. The trash has the same issue, with the added drawback that you have to think about emptying it, and if you empty it before realizing you made a mistake, it won't help you.

With a "garbage heap" model, all deleted files would automatically end up on the "heap" when deleted. The heap would be exactly as large as the amount of free space you have left, and shrink when necessary by deleting the oldest files, perhaps with a minimum size (expressed in days) that would require manual action to shrink beyond.

WhyNotHugo

Snapshots of home directories grow to ridiculous amount due to trash that piles inside of it. Off the top of my head:

- ~/.local/share/docker is 29G of data that continuously mutates. - Rust/Cargo produce "target" directories in $PWD whenever they run, which is gigabytes of crap littered all over the place.

Keeping snapshots of home would take up a huge amount of disk space, especially if you ever want to keep more than one.

That said, keeping TWO snapshots for 24hs for "un-deleting" files might be an interesting idea.

wruza

Recycle bin is just a different model. I have full disk weekly and homedir daily backups and could use live snapshots too, but it’s just easier to dig into the bin sorted by deletion date.

Ferret7446

If you delete things from a file manager in Linux, they all generally go into the Trash too.

And if you rm/del/Remove-Item on Windows, it will also delete without sending to the recycle bin.

3836293648

You're way too trusting of trash-cli here. I switched to it a while back and the only time I actually needed to recover something half the files in the folder weren't in the rubbsih bin, they were just gone

LoganDark

Fuck recycle bins. Make proper backups.

When I delete something I want it deleted. If I make a mistake I grab it from a backup or just re-obtain it.

jamesy0ung

This is the best comment I've read on Hacker News in a long time

eitland

Something similar happened a place I worked somewhere between 10 and 20 years ago.

Service technician did not see the . in the command

  rm -rf ./bin

so he proceeded to run

  rm -rf /bin

When that didn't work he did what everyone who knows a little bit Linux does and added sudo in front.

I was on a terminal from the other side of the globe when the server suddenly started acting weird.

We were able to use scp or rsync (one of then was in sbin or something) to get back the bits from an identical server, which saved me from three days of tedious work :-)

In hindsight of course we should have written the docs in a way that would prevent this exact situation but in the beginning it was just the output of the history command after I had done it, dumped into a document with some explanations.

ryao

It is not the same thing, but recently, my pfsense router started acting weird where dns stopped working. I was bust so I rebooted it and it failed to boot. I ended up bringing a monitor and keyboard to see what went wrong and it turned out that suricata’s logs had used 1.4TB on my 250GB SSD (ZFS zstd compression is awesome), causing it to run out of space a few years after I enabled the feature out of curiously. I wiped the logs, rebooted and things worked.

My lesson from them is when a machine servers acting strangely, do not reboot and instead troubleshoot right away. Had I done that, I could have found the problem and had only minimal partial downtime (I would have had to restart dns afterward). Since I was too busy to do things the right way, my internet was out for a half hour. Your story reminded me of this, since you would have had a bigger headache if you had rebooted.

hinkley

This is the origin story for anyone who has ever set up alerts for 90% disk utilization on a machine.

These are especially bad with services that generally grow their logs very slowly but when something like a net split happens or a server is down they generate as much log data in an hour as they typically do all week. So you get close to full and then an incident happens, and now you have two incidents because the screaming machine goes down with a full disk right after you lose your internal DNS server or what have you.

3eb7988a1663

One trick I learned is to create a big random file (must be random to ensure no funny compression filesystem tricks out smart you) called something like BIG_DUMMY_DATA_SAFE_TO_DELETE. When you find yourself in a catastrophic space situation, you can delete the file and have a less panicky recovery process as the immediate problem is gone.

PhilipRoman

The funniest consequence of full disk was that I could not log in to a server, because the login process required writing some tiny temporary file in the user's home. Did not find a way to recover it, other than plugging a keyboard in it.

hinkley

My second boss was a Sun Microsystems enjoyer and he always pronounced “superuser” as “stupid user”.

After a raised eyebrow he went on to explain, “because when the machine is broken it’s always because some stupid user did something.

It took me a couple more stupiduser incidents of my own before I instituted a rule of counting to five before hitting enter on any `rm -rf` command.

Ferret7446

> I instituted a rule of counting to five before hitting enter on any `rm -rf` command.

That's just (should be) standard practice. Another risky command is `sudo dd`/`sudo cat` for writing disk images, always chant the disk device against an fdisk -l listing like a magical spell, lest you nuke your main drive.

cafeinux

When using `dd` I always verbally say "`if` is input file, because I want to take data from <whatever I wrote there> and `of` is output file, because I want that data to overwrite <whatever I wrote here>". Never have I ever had a `dd` incident, but that's such a fear I have that I developed this habit.

hinkley

But it’s surprising how often you see the wheels turn when you explain it to someone else. It’s not obvious to everyone.

jwrallie

I always prepend a # when writing possibly dangerous commands, mostly rm -rf and dd. Started doing it after I wanted to dd an .iso of a distro to my usb drive but accidentally my main drive.

It can help if you press enter instinctively after finishing writing the command, and also against fat fingering an incomplete but valid command.

mook

I like to prefix it with `echo` so I can see what it's trying to run, especially when wildcards are involved. Or things like `find`.

eitland

I've done it too, but probably not as consistent as you.

mekster

First, you don’t let people read and type commands by hand. Second, hire a better guy than someone who blindly does sudo because the command didn’t work.

eitland

First, there is a difference between rolling out 5 such systems a year and modern server farms, second, replacing a brilliant electronics guy because he once messed up when he helped us out on site isn't something I'd want to have on my cv. Third this is well over a decade ago.

kees99

You were lucky scp uses a binary that typically lives outside /bin - in /usr/lib/openssh/ or some such.

Many years ago, I've got to recover a remote server where /usr was nuked (and /bin, /sbin, and /lib were all symlinks into now-empty /usr). Ended up writing a one-liner perl to convert /bin/busybox-static from my local machine into a series of:

  echo -ne "\x7f\x45..." >>~/busybox-static

...and copy-pasting that, chunk-by-chunk, into a single surviving ssh/bash connection, and then used that busybox binary to pull in from a backup.

hinkley

I have learned through hard experience that is you ever user sudo to edit the sudoers file, create two shell windows logged in as root before doing so.

Use visudo to edit the file of course, because not doing so can blow everything up by rendering the sudoers file unparseable and then everyone is gonna have a bad time.

But also the temptation when altering sudo is to immediately log out as super user and try using sudo to do the new command. If you’ve fucked up the file you might not be able to sudo anymore. So now use your second shell window to undo whatever you just did in the first window.

ryao

I have setup ssh remote forwards to a jump host in the past to allow remote access through a firewall. daemontools executes scripts exec’ing ssh. ExitOnForwardFailure and ServerAliveInterval are set client side with ClientAliveInterval and ClientAliveCountMax set server side to enable rapid recovery if something goes wrong.

Whenever one of the daemon tools scripts doing remote forwards needs to be modified, a second reverse forward script is added and the reverse forward from that is used for ssh, before changing the original script. The second script is removed only after confirming the first still works after the edit. This procedure prevents fatfingering from locking out remote access, since if something goes wrong, you just need to redo the previous step(s) until you get things working.

If anyone wants to replicate that, I suggest setting -nNT as arguments to ssh and restricting what the user login can do via sshd_config.

null

[deleted]

trelane

> In hindsight of course we should have written the docs in a way that would prevent this exact situation

I would say the bigger failure is relying on a human typing things into a terminal rather than automating the tasks or changing the system so the task is no longer needed.

hinkley

As if there haven’t been outages caused by incorrect directory interpolation in scripts.

trelane

Sure, bug happen. They (usually) happen reliably, and tests can help prevent/detect them.

It is impossible to test for human errors in advance, though.

rav

I often run rm -rf as part of operations (for reasons), and my habit is to first run "sudo du -csh" on the paths to be deleted, check that the total size makes sense, and then up-arrow and replace "du -csh" with "rm -rf".

mekster

Use trash-cli and additionally git commit the target if you’re nervous before deletion.

micw

It's always a good idea to allow sudo to untrained people on critical systems...

malkia

What would be safer alternative?

    pwd # Then check something?
    pushd bin #
    rm -rf .

Probably still with pitfalls

cortesoft

Write a script that does the steps required?

If the problem is defined enough to create an exact series of commands for an operator to execute, it is defined enough to create a script to do it for you

pphysch

Use the full path of the bin dir in your rm rf

hinkley

rm -rf ~/bin could have some nasty consequences if you fat finger a return key anywhere in the middle.

Run enough commands enough times and you will find Murphy is waiting for you.

If you’re just removing a bin directory one time, odds are low but not zero. If you’re writing a run book for people to use, odds are 100% that you will have to help someone rebuild at least once.

01HNNWZ0MV43FF

`rm -rf bin`

bigstrat2003

And not just for bin. There's probably an edge case where you would need to give the ./ prefix to rm, but I've never come across it. The vast, vast majority of the time just entering the name of the thing is easier and less error-prone.

OJFord

I think the misguided belief that `./` means 'execute script' or something (program that isn't 'installed'?) is single-handedly to blame for so much script spaghetti.

inejge

The heroic version of the story is now almost 40 years old[1]. (One HN mention with the link to a HTMLized version is here[2].) In both cases, the upshot is that as long as you have a running shell with root privileges, at least one existing executable file on the filesystem, and the means to overwrite that file with arbitrary binary content, you can write a small program which can recreate a skeleton system structure and dig yourself out of the hole.

The reason why this keeps happening is that in regular UNIX root is omnipotent and the filesystem is ultimately unprotected. Immutable systems and restricted execution environments may make this a thing of the past.

[1] https://www.wolczko.com/rm.txt [2] https://news.ycombinator.com/item?id=7892471

ryao

The technical term is unlinked. The files in use by running processes are still anonymous files in the filesystem. They will not be garbage collected until the last program that mmap’ed them is gone. If you could find the inode number from /proc (possibly from /proc/$PID/maps), you should be able to use a filesystem specific tool to retrieve them, such as debugfs or zdb.

abound

I think the tricky piece is that you'd need to find that inode number without using any dynamically linked libraries, including ls and cat in the author's case. And debugfs is likely dynamically linked too (I just checked and it is on my machine)

ryao

Presumably, the read shell builtin can be used to read files in /proc. As per the original article, you can get a static version over the network using only bash builtins and overwrite a file with execute permission to be able to execute it.

lloeki

A long long time ago the team I was part of managed old unix systems.

A coworker telnet'd (or rsh, can't recall) into such a machine to do some maintenance and after a while fat fingered:

    umount /

Would you believe it, back then being root meant this was absolutely unprotected and the minicomputer OS (some ancient AIX) dutifully complied.

The chaos that ensued is but a blur.

teaearlgraycold

Did you just restart?

WesolyKubeczek

Wasn’t restarting those AIX dinosaurs a nontrivial thing?

teaearlgraycold

I could believe it. I’m just hoping OP can provide more info.

dingaling

Using what command, though?

teaearlgraycold

Perhaps there's a restart button?

vessenes

I “rm -rf /“ ed my Linux box in 1996 and realized what I had done about 30 seconds in. It was through like a-d in /bin.

‘Recovery’ wasn’t a strong concept in Linux os installers then, and I wasn’t sure my home directory would survive a reinstall, (not on a separate partition) and I didn’t think I’d be able to reboot successfully in any event.

My brother ran the same distribution but at a school across the country; I was able to recover by carefully pulling what I needed down with ftp to his dorm computers IP. I’m not sure scp even existed then on Linux; I guess maybe we could have used netcat in a pinch. Well, this was already a pinch. In a real pickle.

This was one of the first moments where high bandwidth connections to endpoints really saved me / impacted me, and I haven’t forgotten. For many years after university this kind of direct high bandwidth connection got much harder to achieve, first because we were back in a low bandwidth residential world, then because ipv4 was mostly denied to consumers, then because of nat.

Today this would again be achievable, but with vastly more complexity. For a home computer rescue you’d want Tailscale in both sides. And it’s extremely unlikely you’d be using the same distribution as your sib, much less have the same libc linking. And god help you if you had to restore your systemd directory by hand.

nasretdinov

I think I now know why on macOS "Applications" (with capital A) comes first in the list of root-level directories. I've been showing to my colleague that on modern systems "sudo rm -rf /" does nothing, however that turned out to be GNU-specific thing, and macOS of course uses BSD tools. So I've started noticing that something is off very quickly and stopped the command. On HFS+ (probably on APFS too) the file list is always sorted, so it only deleted some of my Adobe apps and stuff like Addressbook, so I didn't even notice anything at first :)

nullorempty

At the start of my career I removed the `x` attribute from all files :)

ryao

Of all of the stories I have read here, this is the first to make me laugh. Congratulations. :)

nullorempty

... we had an amazing sysadmin. He had a shell open on that box when I came to tell the news. He started to type quickly and thoughtfully, trying utilities I haven't even heard of. Then, he echoed a small C program that was supposed to set `x` on the chmod. Quickly he typed `cc x.c` to compile... and then it dawned on him.

ryao

Perhaps scp -p or sshfs would have worked. This is just a guess. I am not certain.

ivanjermakov

chmod -R is too convenient lol

zavec

I actually started a short blog series about a similar problem where a friend had blown away /bin and a bunch of other stuff, but/lib was still there. Unfortunately it didn't end up getting anywhere because even though I was able to drop executables on the machine with echo and make them executable with a .so from lib I wasn't able to get back to root permissions as sudo and everything had been blown away and I didn't think I'd have great luck trying to find a zero-day in the kernel. It was still a lot of fun though.

Dwedit

If you deleted /lib, you'd probably be better off reinstalling packages while booting off of USB or something. You're gonna have downtime because programs won't work correctly.

LorenDB

I also had to wonder why not just liveboot from USB or attach the affected boot medium to another system, then use the recovery system's fully working tools to just relink the /lib folders?

ideasphere

“This is not a very important machine, and I could have just reimaged the MicroSD card and be done with it, but I was curious if I could recover from the error.”

nurple

My workstation seems fine:

  $ ls -R /{lib,usr,bin,sbin}
  ls: cannot access '/sbin': No such file or directory
  /bin:
  sh

  /lib:
  ld-linux.so.2

  /usr:
  bin

  /usr/bin:
  env

Oh right...

  $ ls -l /usr/bin/env
  lrwxrwxrwx 1 root root 65 Mar 21 23:39 /usr/bin/env -> 
  /nix/store/9m68vvhnsq5cpkskphgw84ikl9m6wjwp-coreutils-9.5/bin/env

  $ ldd /usr/bin/env 
        linux-vdso.so.1 (0x00007ffff7fc4000)
        libacl.so.1 => /nix/store/dyizbk50iglbibrbwbgw2mhgskwb6ham-acl-2.3.2/lib/libacl.so.1 (0x00007ffff7fb3000)
        libattr.so.1 => /nix/store/vlgwyb076hkz7yv96sjnj9msb1jn1ggz-attr-2.5.2/lib/libattr.so.1 (0x00007ffff7fab000)
        libgmp.so.10 => /nix/store/dsxb6qvi21bzy21c98kb71wfbdj4lmz7-gmp-with-cxx-6.3.0/lib/libgmp.so.10 (0x00007ffff7f06000)
        libc.so.6 => /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6 (0x00007ffff7d0e000)
        /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/ld-linux-x86-64.so.2 => 
        /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc6000)

MadnessASAP

Don't have to worry about trashing FHS if your OS doesn't use FHS :-P

ofalkaed

Many years ago I wrote a backup script where I did "rm -rf /etc/" instead "cp -r /etc/ /mnt/whatever," not sure how I managed that. Took ages to figure out what was causing /etc/ to disappear since /etc/ going missing often went unnoticed for awhile and I was running Arch back in the days when running pacman -syu always caused exciting things to happen. I even did a complete backup and reinstall trying to figure that one out, and I was extra cautious about making a backup of that backup script which I had spent so much time on and was fairly proud of since it was my first non-trivial bash script.

I also once did "rm -rf /", was deleting a dir which started with a "[" and accidentally hit "enter" instead of "\." That one taught me the dangers of absolute paths.

Edit: That last one is not quite right, would not have been an absolute path issue, that dir must have ended up in root somehow, can't quite remember the details, been too long.

Vilian

Isn't absolute path better practice than relative paths?

ofalkaed

In scripts, yes, but they are not without their dangers and can be especially troublesome when working in the terminal since almost no one types out and checks the full paths constantly, they just let tab completion take care of it and assume it worked.

In scripts things like ../../../../file are a pain to read and assume everything in the script before all those previous dirs worked as it should have and everything is where it should be. Cd to an incorrect absolute path produces an error code so we can be sure we are in the proper dir, cd ../../../ will never produce an error and always succeeds even if you are at root.

3np

> almost no one types out and checks the full paths constantly, they just let tab completion take care of it and assume it worked.

Are you a bootcamp coach having watched countless of people at the terminal? What makes you assume this is a general habit?