The order of files in your ext4 filesystem does not matter
61 comments
April 6, 2025 · userbinator
kevincox
Yeah, my jaw was dropping as I realized how far they went with this, from checking their mount options to reading ext4 source code. Directory order is almost always an implementation detail (I'm pretty sure it is on ext4) and even if it isn't you still shouldn't rely on it (for when someone decides to migrate your production machines to BTRFS because they want snapshots and now your app has some weird breakage). The problem is that the app depends on directory order, and you need to fix that, not figure out how you can predict the directory order.
Maybe there should be a mount option to randomize directory order that people can use in their staging environments.
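Until something like that exists at the filesystem level, the same idea can be approximated inside the app's own test setup. A minimal sketch in Python, purely for illustration; `listdir_shuffled` is a made-up helper, not an existing API:

```python
import os
import random


def listdir_shuffled(path, seed=None):
    """Return the directory's entry names in a deliberately shuffled order.

    Hypothetical staging/test helper: if the code under test only works for
    one particular listing order, a shuffled listing makes that visible early.
    Pass a seed to reproduce a failing order.
    """
    names = os.listdir(path)
    random.Random(seed).shuffle(names)
    return names
```

Swapping this in for the plain listing call in a staging build would surface order dependencies without touching the mount options.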
MrDrMcCoy
> Maybe there should be a mount option to randomize directory order that people can use in their staging environments.
The behavior I've witnessed suggests that the order is based on inode numbering, which is initially sequential from creation time, and drifts semi-randomly as inodes are unlinked and reused. I don't know this for a fact, but it makes enough sense. Directory ordering should be assumed to be random in all cases, as you suggest.
ivanjermakov
Also, command line strings are limited to 128 KiB on Linux: https://unix.stackexchange.com/questions/120642/what-defines...
wpollock
Counting on the order of files to support multiple versions of jars was never a good idea. Java has had multi-release jar files for your use case since Java 9. See <https://docs.oracle.com/en/java/javase/24/docs/api/java.base...>.
Since duplicates on the classpath don't cause problems, a quick & dirty fix is to manually list versioned jars first, in order, then the jars/* argument.
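A rough sketch of that quick fix, written in Python just for illustration: put the jars you care about first, then everything else in sorted order so the result no longer depends on readdir. `build_classpath` is a made-up helper, and it spells out the remaining jars instead of passing `jars/*`, which is a small variation on the above.

```python
import os


def build_classpath(jar_dir, preferred):
    """Return a classpath string: preferred jars first, then the rest by name.

    `preferred` is a list of jar file names (hypothetical, chosen by you) that
    must win when several jars provide the same classes.
    """
    jars = sorted(name for name in os.listdir(jar_dir) if name.endswith(".jar"))
    missing = [name for name in preferred if name not in jars]
    if missing:
        raise FileNotFoundError(f"preferred jars not found in {jar_dir}: {missing}")
    rest = [name for name in jars if name not in preferred]
    ordered = [*preferred, *rest]
    # os.pathsep is ":" on Linux, which is what `java -cp` expects there.
    return os.pathsep.join(os.path.join(jar_dir, name) for name in ordered)
```

The returned string can be handed to `-cp` directly, and it is the same on every node regardless of how the directory happens to be laid out.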
yjftsjthsd-h
> there was a client library that needed a Bouncy Castle “provider” with a version “jdk15”+ as the client initialization used specific properties from a class, and those properties were only available in “jdk15”+.
> up until the node image update, we “fortunately” had node images with directory hash seeds ordering “jdk15” or “jdk18” before “jdk14”.
So the actual bug is that something needing jdk15+ should either retry or be deterministically fed a valid file, right? And this whole article is figuring out why the filesystem coincidentally masked it by accidentally always happening to hand it a file with what it needed?
amiga386
> something needing jdk15+
Actually, no, that "15" refers to Java 1.5, aka Java 5, released 2004. Bouncy Castle has some funky variants, specially for Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8. All you actually need is the Bouncy Castle for Java 8 onwards, which is pretty much all versions of Java in use today.
The bug is that multiple providers of Bouncy Castle don't cleanly work when in the classpath together. The authors of Bouncy Castle aren't changing that, because they're like "use our software correctly, please". It's not Java's fault, you can only make classes that don't work on old versions of the JDK, you can't make new Java somehow notice you've included a jar written specifically for an old version of the JVM.
Java did introduce the ability to create multi-release jar files, where you can have JDK-version-specific classes/resources in one jar file... but only from Java 9 onwards. All this mixing and matching by filename that Bouncy Castle uses is for Java 1.1 - Java 1.8 only.
You can also mix and match and cause failure by using one of the Bouncy Castle JCE provider variants with the wrong corresponding "pkix", "util", "mail" jars (extra jars for all the things you might want to do with cryptography that _aren't_ part of the standardised Java Cryptography Extensions API that the main "provider" jar implements). And you can also mess up by mixing FIPS-approved BC with FIPS-not-approved BC.
You only need one set of jars:
* If you don't need FIPS approval: bcprov-jdk18on, bcutil-jdk18on, bcpkix-jdk18on, ...
* If you do: bc-fips, bcutil-fips, bcpkix-fips, ...
o11c
It does matter for performance.
If you read files in the same order they are on disk (often, the order in which they were written, which readdir on modern filesystems should choose to produce), I/O is much faster.
eptcyka
Order of files listed in a directory need not match the order of the bytes saved in the physical media.
scrapheap
It's worth noticing that the performance difference between sequential and non-sequential reads will differ significantly between types of devices. It's much more noticeable on a spinning hard disk drive than it is on a solid-state drive.
bitwize
On spinning rust, sure. That does not hold for SSDs (which most consumer-grade computers have now).
jbverschoor
You’d still miss out on some potential prefetch cache hits
LoganDark
Literally condemn any computer that still comes new from the factory with spinning rust. I was using SSDs back in 2012.
jonhohle
Build tools supporting duplicate class detection have existed for… well a long time. Ignore them at your own peril.
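For anyone curious what such a check boils down to, here is a minimal sketch in Python. It is not how any particular build plugin does it; `find_duplicate_classes` is a made-up name, and it only compares entry names inside the jars, not their contents.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(jar_dir):
    """Map each .class entry found in more than one jar to the jars holding it."""
    owners = defaultdict(set)
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            for entry in archive.namelist():
                # Skip META-INF/, which is where multi-release jars keep their
                # per-version classes; those are not ordinary classpath clashes.
                if entry.endswith(".class") and not entry.startswith("META-INF/"):
                    owners[entry].add(jar.name)
    return {entry: sorted(jars) for entry, jars in owners.items() if len(jars) > 1}
```

A non-empty result for a directory like the one in the article would have flagged the clashing Bouncy Castle variants before any node image changed.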
rzzzt
The orange site discusses the article in the first footnote here: https://news.ycombinator.com/item?id=43573507
yjftsjthsd-h
Why "the orange site"?
PMunch
It was referenced in the article as "the orange site"; however, the reason for it initially being named as such is probably HN's system of trying to avoid popularity being artificially driven high. The details of this are, as far as I know, pretty scarce, but the idea is that if you try to get to the top of Hacker News they somehow detect that and penalize you. So people have taken to calling it "the orange site" in order to avoid this detection when talking about HN.
froh
how about it being a simple gentle nod to the plain design of HN.
aargh_aargh
It's a different calling convention. Call by value rather than call by name.
tom_
See the first sentence of the article!
yjftsjthsd-h
?
> the title is a cheeky reference to something at the front page of the orange site today
Yes, that's what I'm asking. Why do people refer to HN as "the orange site"?
dathinab
Always fun when code relies on the order of iterating over a dir, which in general is not defined to have any order. Even iterating the same dir twice in a row might not yield the same order, depending on "stuff" (e.g. the exact file system used).
So if order matters, always sort.
(Luckily in most situations where dir iter order matters, the performance impact from sorting is acceptable or even outright irrelevant.)
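In Python, for example, that is a one-liner. A minimal sketch, with `list_sorted` as a made-up helper name:

```python
import os


def list_sorted(path):
    """Return entry names in a fixed order, independent of readdir order.

    Plain sorted() on str compares code points, so the result does not
    depend on the locale or on the filesystem.
    """
    return sorted(os.listdir(path))
```

The same directory then yields the same sequence on ext4, BTRFS or anything else, which is the whole point.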
amelius
By the way, max hardlink count for ext4 seems configured ridiculously low for modern standards, at least on Ubuntu.
amelius
"ls" can take ages on a large folder. Is there a way to make it more immediate, i.e. streaming output without sorting?
kristianp
It's something like ls -u from memory.
amelius
Looks like it's -U (capital U). But I just tried it and it still took several seconds for the first filename to appear. It was not the spinning up of the disk because I first did ls in the parent folder which was immediate. The second time I did ls on the large folder, though, it was fast (even without -U).
aaronmdjones
> It was not the spinning up of the disk because I first did ls in the parent folder which was immediate
That doesn't tell you anything; the parent's dentries could have been cached days ago and still present, meaning it didn't actually access the disk or cause it to be spun up (if it wasn't) at all.
When doing any kind of repeatable measurement or experimentation on disks you will want to drop the page cache every time first:
# echo 3 >/proc/sys/vm/drop_caches
kristianp
Yes you're right, it's -U on Linux (1). On Mac it's -f (2). Linux also has -f, which is equivalent to -a -U.
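If you are doing this from a program rather than the shell, the rough equivalent is to iterate the directory lazily instead of collecting and sorting. A minimal Python sketch, with `stream_names` as a made-up helper; it won't help with the cold-cache delay described above, only with not waiting for the full list:

```python
import os


def stream_names(path):
    """Yield entry names as the OS returns them, without collecting or sorting."""
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name
```

Each name becomes available as soon as it is read, in whatever order the filesystem hands it out.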
cheshire_cat
Could they have avoided that issue by specifying the classpath without the star?
So -cp /jars/ instead of -cp /jars/*?
Kwpolska
So what was the production fix? Surely you're not hex-editing the image until the end of time?
amiga386
The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")
Another fix can be to use a fat jar (containing your software and all its dependencies), but this doesn't work for Bouncy Castle, because Cryptography Is Special(TM), and Java won't load cryptography providers unless their jars are signed, and including the cryptography provider jar in the fat jar means it loses its signature.
Kwpolska
> The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")
I doubt anyone is doing that manually, that’s probably done by mvn/gradle/sbt/whatever the cool Java kids use these days. Do the build tools not know about this problem and just make a mess?
amiga386
It's Bouncy Castle's particular situation. The Java build tools are totally fine with resolving thousands of version dependencies so everyone is happy. You can depend on A which in turn depends on B version 1.2 and also depend on C which depends on D which depends on E version 1.1 and you only end up with one version of B included, version 1.2. Java execution environments also support all kinds of classloader isolation so you have multiple versions of the same jar and classes, all in the same JVM, only visible to the components that wanted to see them, so there's no clash.
But Bouncy Castle - and almost nothing else - adds another dimension across its artifact names. This is not standard! You now have to watch your dependency trees like a hawk to see that some other artifact doesn't bring in <artifactId>bcprov-jdk14</artifactId> to fuck with your <artifactId>bcprov-jdk18on</artifactId>, and if they do, you need to slap an <exclusion> on that dependency's dependency.
The reason Bouncy Castle does this is because it chooses to support some very old versions of Java, that predate JDK 9 introducing multi-release jars (https://docs.oracle.com/en/java/javase/21/docs/specs/jar/jar...) which removed the need for different named jars for different JDKs (...but only from JDK 9 onwards)
So, in general, the Java tools have this solved, unless you're Bouncy Castle.
tryauuum
great article
I'm feeling like an old man now but who the hell calls a tool "buildah"? Especially with its ugly dog logo. You can almost assume the dog wants to say "builder" but the extra flaps of skin make the sound distorted
usr1106
At least it is search engine friendly. Recently had to search for code snippets for the 30 year old "expect" tool. Was rather difficult and I thought, well the Web is younger than that tool, they could not imagine a search engine. Hint: "expect script" seemed to work decently well.
eMPee584
(.. or a search on https://pkgs.org to surface metadata)
davideg
Looks like it's a silly and self-aware play on the word "builder" (New England regional dialect):
> Since I’m relatively new to the world of containers and images, I was excited to learn about the Buildah tool. Especially since I’m a native New Englander and it’s a clever play on how we say Builder in these parts. [0]
[0] https://buildah.io/blogs/2017/06/22/introducing-buildah.html
ghaff
That is correct. The person who largely had overall responsibility for Red Hat’s open source container tooling is a Boston area native.
Waterluvian
Much like the choice to stop using language features like capitalization, it’s part of the current cultural trend.
Kinda like Buildly or Buildr. It’s cool until it’s your turn to be old. Then you look back and wince.
llmthrow103
I've been using no capitalization on short messages in chat for more than 20 years (and still do), but an entire article written in the style makes it harder to read. It's funny that the author believes in syntax highlighting for code readability but not capitalization for English readability.
thaumasiotes
> but an entire article written in the style makes it harder to read
That's purely a familiarity effect; it's a self-solving problem.
jraph
> it’s part of the current cultural trend
Is it, or is it just a niche, like people who write 5-digit years, putting a 0 in front?
It's still very rare to encounter any of those.
jerven
Is it a current trend? my Mom does this and she picked it up in the 70's on typewriters.
bongodongobob
Those people are so short sighted. I put two 0's in front because I really care about humanity. This, I believe, will help fix climate change. Excuse me while I sniff my own farts.
PMunch
I mean "buildah" is at least searchable (imagine trying to look for a build tool called "builder"). The lack of capitalization doesn't have any positive side-effects, apart from saving your shift key some use..
throwaway127482
> who the hell calls a tool "buildah"?
Bostonians? :P
Brian_K_White
Since it's a name I'm fine with it. That is actually some people's pronunciation, even if no one's spelling, but I have no problem taking them seriously since they are not simply putting annoying affectation into writing, it's a name. Names have to be distinct, and they don't have to be cute but it's also not exactly damning either.
All that said, probably wouldn't have been my choice either.
It's weird. I personally wouldn't want quite such a silly name for that particular kind of tool, but that is a funny thing for me to say because I was never one of the people who wanted to remove the swear words from the kernel because "professional impression". Don't ask me to explain it.
dathinab
> hell calls a tool "buildah"?
people who seem to have done a pretty good job
I mean, the branding logo for this kind of tool really doesn't matter, and if so, why should you hire a graphic designer to do that for you if you already have something which is passable?
You can read it as build-ah; "ah" is in some languages the word for the sound people make when they have an insight/light bulb moment. It might also just be a coincidence, idk.
But most importantly it's a nicely searchable word, it's memorable too, it's pronounceable and it's somewhat related to what it does (a "build" tool).
So in all the metrics which matter it's a good name.
benatkin
I like the dog logo. Thanks for calling attention to it, I now have something to ghiblify.
> the actual argument value the JVM receives is "/jars/*", and in turn decides to be helpful, and expand the wildcard anyway
Whenever I see such things, I immediately think "whatever the resulting order is, it had better not matter"; and if it does, which is definitely true for Java classpaths, I consider it a bug that needs to be fixed ASAP, before it causes what happened in the article.