
In POSIX, you can theoretically use inode zero

the_mitsuhiko

The OpenBSD UFS documentation says this:

> The root inode is the root of the file system. Inode 0 can't be used for normal purposes and historically bad blocks were linked to inode 1 (inode 1 is no longer used for this purpose; however, numerous dump tapes make this assumption, so we are stuck with it). Thus the root inode is 2.

This is also echoed on the Wikipedia page for it.

The Linux kernel also has this comment explaining why it does not hand out that inode for shmem, for instance:

> Userspace may rely on the inode number being non-zero. For example, glibc simply ignores files with zero i_ino in unlink() and other places.

On macOS it's pretty clear that inode 0 is reserved:

> Users of getdirentries() should skip entries with d_fileno = 0, as such entries represent files which have been deleted but not yet removed from the directory entry

jcalvinowens

It used to happen on Linux with tmpfs, but the kernel doesn't allow it anymore: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds...

It turns out that glibc's readdir() assumes inode zero never happens, so those files are "invisible" to anything using libc. But you can call getdents() directly and see them.

I actually ran into this on a production machine once a few years ago, a service couldn't restart because a directory appeared to be "stuck" because it had one of these invisible zero inode files in it. It was very amusing, I figured it out by spotting the invisible filename in the strace output from getdents().

nulld3v

Also, there seems to be an effort brewing in the kernel to push userspace away from depending on inode numbers, due to the difficulty of guaranteeing uniqueness and stability across reboots. https://youtu.be/TNWK1zbTMOU

AndrewDavis

They definitely aren't unique even without reboots. Postfix uses the inode number as a queue id. At $dayjob we've seen reuse surprisingly quickly, even within a few hours. Which is a little annoying when we're log spelunking and we get two sets of results because of the repeating id!

(there is now a long queue id option which adds a time component)

koverstreet

The combination of st_ino and the inode generation is guaranteed to be unique (except across subvolumes, because snapshots screw everything up). Filesystems maintain a generation number that's incremented when an inode number is reused, for NFS.

Unfortunately, it doesn't even seem to be exposed in statx (!). There's change_cookie, but that's different.

If anyone wants to submit a patch for this, I'll be happy to review it.

amiga386

...but it's unique while the file exists, right?

The combination of st_dev and st_ino from stat() should be unique on a machine, while the device remains mounted and the file continues to exist.

If the file is deleted, a different file might get the inode, and if a device is unmounted, another device might get the device id.

the_mitsuhiko

> The combination of st_dev and st_ino from stat() should be unique on a machine

It should, but that no longer seems to be the case. I believe there was an attempt to get a sysctl flag in that forces the kernel to return the same inode for all files, to see what breaks.

londons_explore

> ...but it's unique while the file exists, right?

I don't think all filesystems guarantee this. Especially network filesystems.

AndrewDavis

Yes! It's reusable, but not duplicated.

quotemstr

The problem isn't relying on inode numbers; it's inode numbers being too short. Make them GUIDs and the problems of uniqueness disappear. As for stability: that's just a matter of filesystem durability in general.

the_mitsuhiko

> The problem isn't relying on inode numbers; it's inode numbers being too short.

It's a bit of both. Inode numbers conflate two things: they're used by the filesystem to identify a record, but they're _also_ exposed in APIs that are really cross-filesystem (which comes to a head with network filesystems or overlayfs).

A more realistic path is to make inodes purely an FS-internal thing, let the filesystem do its thing, and then create a set of APIs that doesn't rely on inodes as much. Linux, for instance, is trying to move towards file handles as that API layer.

bastawhiz

You could make it bigger, but then your inode table gets pretty big. If an inode number is 32 bits today, then UUIDs would take up four times the space. I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.

And really, it's not even super necessary. 64-bit inode numbers already exist in modern filesystems. You don't need UUIDs to have unique IDs forever: you'll never run out of 64-bit integers. But the problem was never really that you'd run out; the problem is in the way they're handled.

quotemstr

> You could make it bigger, but then your inode table gets pretty big.

You could do it like Java's Object.identityHashCode() and allocate durable IDs only on demand.

> If an inode number is 32 bits today, then UUIDs would take up four times the space.

We probably waste more space on filesystems that lack tail-packing.

> I'd also guess that the cost of hashing the UUIDs is significant enough that you'd see a user-visible performance hit.

We're hashing filenames for H-tree indexing anyway, aren't we?

> you'll never run out of 64-bit integers

Yeah, but with 128-bit ones you'll additionally never collide.

Animats

It's been a long time since what user space sees as an "inode" has had anything to do with the representation within the file system.


duckerude

See also: https://internals.rust-lang.org/t/can-the-standard-library-s...

A file descriptor can't be -1, but it's not 100% clear whether POSIX bans other negative numbers. So Rust's stdlib only bans -1 (for a space optimization) while still allowing, e.g., -2.

etbebl

Isn't writing an article like this just daring someone to make the system in question more cursed, more likely to produce errors, harder to reason about, etc.? In the same spirit as, "well technically this common assumption about C behavior is undefined, so let's add some nasal demons to save 2 us (and make me look clever)"?

Am I missing something or is this just evil? I guess I'm taking it too seriously.

account42

Not really. You should understand the limitations of the system you're working with, including expected limitations that don't actually exist, or that only exist in some parts.

For example, knowing that inode 0 is technically possible tells you not to rely on it as a special magic number. It's when parts of the system disagree about details like this that you get issues, not when you learn about them.