Migrating Away from Bcachefs
104 comments · January 20, 2025 · agartner
koverstreet
Oh, the author's completely misrepresenting what happened here.
We had a major snafu with Debian, where a maintainer volunteered to package bcachefs-tools (this was not something I ever asked for!), and I explained that Debian policy on Rust dependencies would cause problems, and asked him not to do that.
But he did debundle, and then down the road he broke the build (by debundling bindgen and ignoring the minimum version we'd specified), and then _sat on it for months_, without even reporting the issue.
So Debian users weren't getting updates, and that meant they didn't get a critical bugfix that fixed passing of mount options.
Then a bunch of Debian users weren't able to mount in degraded mode when a drive died. And guess who was fielding bug reports?
After that is when I insisted that if bcachefs-tools is packaged for debian, dependencies don't get debundled.
If you're going to create a mess and drop it in my lap, and it results in users not able to access their filesystem, don't go around bitching about being asked to not repeat that.
agartner
Yeah just typical Debian stuff. jwz has been ranting about this for years. It's not worth spending any time on it.
Some suggestions:
- Only "supporting" the latest mainline kernel and latest tools. I prefer to point to CI system configurations to show exactly what it "supported"
- Make this clear via your website and a pinned issue on Github.
- Force users to report the versions they use via an issue template: https://docs.github.com/en/communities/using-templates-to-en.... Immediately close any issues not meeting your version/system requirements without further discussion or thought.
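For instance, a minimal GitHub issue form along these lines (file name and field labels are illustrative, not any project's actual template) makes version reporting mandatory:

    # .github/ISSUE_TEMPLATE/bug_report.yml (illustrative)
    name: Bug report
    description: Bugs are accepted against the latest mainline kernel and latest tools only
    body:
      - type: input
        id: kernel
        attributes:
          label: Kernel version (uname -r)
        validations:
          required: true
      - type: input
        id: tools
        attributes:
          label: bcachefs-tools version
        validations:
          required: true

Anything filed without those fields, or with versions outside the supported window, can then be closed on sight.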
kstrauser
That last one’s great advice. I don’t remember if you can use checkboxes there and I’m too lazy to look at the moment, but I could imagine the first question being:
[ ] I am using Debian packages
and auto-closing if set.
andrewshadura
Do you ever admit you're wrong?
koverstreet
I think I did once back in 2002.
bartvk
Genuinely curious: it seems like you are making a remark on his character, right? But why did you do so? Just fed up? Or did he actually state something wrong in the parent comment?
ThatPlayer
I've been running bcachefs on my spare dedicated SteamOS gaming machine for fun. Especially for the SSD promotion logic. It's a spare computer with an old 128GB SSD and 3TB HDD that I've got as a single filesystem. I love not having to manage games between the SSD/HDD. Too bad it's a mITX build with no space for more old drives I could stick in.
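For reference, the tiering described here is configured at format time, roughly like this (device names are placeholders; the target options are the ones documented by bcachefs):

    bcachefs format \
      --label=ssd.ssd1 /dev/sdb \
      --label=hdd.hdd1 /dev/sdc \
      --foreground_target=ssd \
      --promote_target=ssd \
      --background_target=hdd

Writes land on the SSD, hot reads get promoted to it, and data is flushed to the HDD in the background, all within a single filesystem.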
Darkskiez
I need to write up my experience. But I'm trying it out. Linux needs something like this. I've had issues, posted traces and had them fixed in seconds. Pretty damn amazing. I'd love to see a bigger team involved though.
gkmcd
My experience also. Kent is obviously very committed to the project.
7e
A change to a filesystem should never be made in seconds.
hinkley
Confidence intervals don’t have precise timelines associated with them. Sometimes you know exactly what the problem is when you hear the symptoms.
We always balance new work against cleanup. I always have a laundry list of beefs with my own work. You often have a sneaking suspicion that a piece of code is wrong in a manner you can't quite put your finger on. Or you do, but a new customer trumps a hypothetical error. Or ten people are blocked waiting for you to merge some other feature. And then someone gives you a test case and you know exactly what is wrong with it. Sometimes that's fixing a null check, an off-by-one error, or sorting results. And sometimes the repro case basically writes itself.
ajb
Bcachefs is experimental, and last I heard, the authors hope to be able to declare it not experimental in ~6 months. To me, that's 'try on a cache server/build server' territory, not on anything where you even think about backing it up.
Kent Overstreet does have a problem working with people; I can well believe that interactions around bugs were painful. He likely should try to hire someone else to deal with users and distributions and use his limited patience to deal with the kernel maintainers. But it sounds like the OP was a bit naive about what it means to run an experimental FS.
AceJohnny2
> last I heard, the authors hope to be able to declare it not experimental in ~6 months
How long have they been saying that? I feel like that's been the case for years.
ajb
It's only been in the upstream kernel 6 months or so?
I know the FS has actually been around longer, but I hadn't heard claims about its stability before that. It's only now that it's getting a lot more testing.
aragilar
Searching for bcachefs on LWN (https://lwn.net/Search/DoTextSearch) makes for a somewhat interesting read. For example, https://lwn.net/Articles/717379/ (from 2017) suggests the disk format would have one last change (cf. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin..., which is for the upcoming 6.14 release and notes the format will have at least one more change after the one included), and there are numerous articles on its upstreaming (e.g. https://lwn.net/Articles/755276/ from 2018). The stability claims do seem a bit, well, optimistic?
pixelesque
It was merged in 6.7, 12 months ago...
gkmcd
FWIW, I've done a few bcachefs bug reports and thought Kent's responses were great.
ajb
I should not have implied that every bug report would be a bad interaction, and to be fair, where I've seen it degenerate on the mailing list, it's often that the other person wasn't exactly civil to start with (which unfortunately includes Linus Torvalds). It's just that Kent seems to feel the need to fight back.
jpgvm
A file system is an emotional investment of a decade+, so I can understand Kent's desire to defend it the way he does.
That said the kernel is the bigger picture, if he wants it to be successful and remain in mainline he needs to adapt to how Linus expects maintainers to operate. That probably means delaying getting fixes to mainline users until he can show they have been adequately tested. Until that trust is built bcachefs is always a few bad cycles away from being dropped from mainline.
arp242
Another thing is that I'm half expecting bcachefs to be pulled from mainline. That's my prediction for 2025, I guess? Not that I want that to happen, but Kent's behaviour has been ... difficult, and Linus already hinted this is a possibility.
I appreciate that Kent means well, but as an outsider reading some of his communications ... yeah, I'd struggle working with that too.
pronoiac
Is XFS well-regarded by others?
I was benchmarking filesystems by generating a billion empty files on them, and while ext2, ext4, and btrfs could finish in a day or two, xfs hit a wall in the first 4 million files, and was on track to take weeks. After hitting ctrl-c, it took 20+ minutes to get a responsive shell, and unmounting took a few minutes.
This surprised me, because it's been around for decades, and I expected scalability, performance, and robustness. A one-user accidental DoS was a surprising rake to trip over.
starspangled
XFS is well regarded. It is kind of a second-generation UNIX filesystem that was designed from the ground up with journaling and extents and btrees and xattrs.
EXT4 is well regarded too; it has similar goodies now, but it evolved from a first-generation style without journaling, with block (not extent) tracking, primitive kinds of trees implemented as N-indirect arrays, etc. Whether that really matters is debatable; these days I think it's well acknowledged that you have to evolve and adapt to new features, and clean redesigns aren't necessarily better (XFS has gone through a bunch of new changes too, e.g., metadata checksums).
XFS came from SGI IRIX and there it was really focused on datapath throughput and scalability (think, HPC systems or graphics clusters working through enormous source and result data files). And they were much less focused on metadata performance.
XFS certainly was much slower and more heavyweight than EXT doing things like creating files, but there has been a lot of work on that over many years and it's a lot better than it was. That said, a billion files is a fairly extreme corner and it's not necessarily what's required for most users.
This paper I found is similar, although probably has many differences from exactly what you were doing:
https://arxiv.org/html/2408.01805v1
XFS creation performance was around par with EXT4 up to 100M files, but took 2x as long to create 1B. Maybe yours did much worse because you aren't splitting files into subdirectories but creating them all in one? That could suggest some XFS structure has a higher order of complexity and is starting to blow out a bit, directory index for example.
XFS often comes out pretty high up there on other benchmarks:
https://www.phoronix.com/review/linux-611-filesystems/3
So as always, the better performing one is going to depend on what your exact workload is, especially if you are doing something unusual.
ChocolateGod
There's not really any reason to use ext4 over xfs anymore, unless you need fscrypt or shrinking.
Reflink support is super useful.
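For example, on a reflink-enabled XFS (the default on recent mkfs.xfs):

    cp --reflink=always disk.img disk-clone.img

The copy is near-instant and shares extents with the original until either file is modified.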
pronoiac
> Maybe yours did much worse because you aren't splitting files into subdirectories but creating them all in one?
No, and also, I'd expect that to be awful. 1000 folders, each with 1000 folders, each with 1000 files.
Those Arxiv and Phoronix links are great!
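For the curious, a minimal sketch of that layout (the mount point is a placeholder; the original script isn't shown, so the details here are assumptions):

    # 1000 dirs x 1000 dirs x 1000 empty files = 1e9 files
    import os

    ROOT = "/mnt/bench"  # hypothetical mount point of the fs under test

    for i in range(1000):
        for j in range(1000):
            d = os.path.join(ROOT, f"{i:03d}", f"{j:03d}")
            os.makedirs(d, exist_ok=True)
            for k in range(1000):
                # an open/close pair is enough to create an empty file
                open(os.path.join(d, f"{k:03d}"), "w").close()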
ThePowerOfFuet
>https://www.phoronix.com/review/linux-611-filesystems/3
It's a shame that ZFS was not included.
ChocolateGod
Probably because it's benchmarking Linux filesystems.
ak217
I don't know what caused your experience, but I've had the opposite experience using XFS with many small files. It's my filesystem of choice for simple (single device) use cases.
The main reason is that XFS automatically allocates inode space. With ext4, I would quickly run out of the default inode quota while the volume was nowhere near full, and then have to manually tune the inode-to-block ratio to accommodate more files. XFS took care of that automatically. Performance was otherwise identical, and I've never seen any data loss bugs or even crashes from either one.
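The symptom and the manual fix look roughly like this (device and mount point are placeholders):

    df -i /data                    # IUse% hits 100%...
    df -h /data                    # ...while Use% says the volume is far from full
    mkfs.ext4 -i 8192 /dev/sdb1    # recreate with one inode per 8 KiB instead of the 16 KiB default

mkfs.ext4's -i takes bytes-per-inode, so a smaller number means more inodes.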
GuestFAUniverse
How many decades ago was that? Sounds more like a partition converted from ext3. Every ext4 partition I've seen in the last 15 years has had a ridiculous number of inodes. I do support for several hundred Linux systems.
ak217
Zero decades ago? I run EC2 instances that process hundreds of millions of small files. I always use the latest Ubuntu LTS.
I'm also tired of trying to share my experience and having to choose between ignoring snide ignorant low-brow dismissals, and leaving them unanswered so they can misinform people. Ext4 does not have a dynamic inode reservation adjustment feature. XFS does. So with ext4, it's possible to run out of inodes while there are blocks available. With XFS, it's not.
starspangled
From this paper https://arxiv.org/html/2408.01805v1 (2024)
> EXT4 is the general-purpose filesystem for a majority of Linux distributions and gets installed as the default. However, it is unable to accommodate one billion files by default due to a small number of inodes available, typically set at 240 million files or folders, created during installation.
Which is interesting. I knew EXT2/3/4 had inode bitmaps, but I haven't been paying them much attention for the past decade. Slightly surprised they haven't added an option for dynamic allocation, OTOH inodes are small compared with storage and most people don't need billions of files.
throwdbaaway
With gentoo, if you allocate let's say 20G to / on ext4, then you can quite easily run into this issue.
/usr/src/linux will use about 30% of the space and 10% of the inodes.
/var/db/repos/gentoo will use about 4% of the space and 10% of the inodes.
Next you clone the firefox-source hg repo, which will use about 15% of the space and 80% of the inodes.
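Back-of-the-envelope, using the 16 KiB bytes-per-inode default mentioned in the next comment:

    20 GiB / 16 KiB per inode = (20 * 2^30) / 2^14 ≈ 1.3 million inodes

so a checkout with a million-odd small files exhausts inodes while most of the space is still free.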
Dylan16807
Based on the mke2fs git history, the default has been a 256 byte inode per 16KB of drive size since 2008, and a 128 byte inode per 8KB of drive size before that.
nicman23
i had ext4 telling me it was out of space at 52% in df lol
i just converted it inline to btrfs
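That kind of in-place conversion is done with btrfs-progs' converter, along these lines (device name is a placeholder; the filesystem must be unmounted, and a backup first is strongly advised):

    btrfs-convert /dev/sdd1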
kstrauser
What's all that about the bcachefs author's complaints with Rust and Debian? I'm far out of the loop on this stuff.
ajb
His side is basically that it doesn't make sense to package the tools for an experimental FS in an OS that's going to get very far behind, as Debian stable will, since the tools have to iterate rapidly to fix problems with releases of the FS. Debian stable had an old set of some Rust libraries, and the packager relaxed the dependency constraints in order to package it, which doesn't make a lot of sense for something you're hoping will fix your filesystem.
Basically it shouldn't be packaged for LTS-class OS releases until it's not experimental.
dralley
Kent's issue IIRC was that they loosened the dependencies, full stop. Debian presumably isn't replicating his entire testing regime when they change all the dependencies being compiled in. The potential exists for a bug to be introduced that is genuinely consequential to the user, more so than for, say, a broken calendar app. Combine that with a rapidly changing FS and the fact that any issues would likely be blamed on his FS, and I can see why he might feel that way.
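To make "loosened" concrete: a hypothetical Cargo.toml entry like this pins a minimum version, and a downstream patch relaxing it builds against code upstream never tested:

    [dependencies]
    # Cargo reads "0.69" as ">=0.69, <0.70"; the version here is illustrative.
    # A distro patch lowering this to match what the archive ships
    # reintroduces whatever bugs the minimum was raised to avoid.
    bindgen = "0.69"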
nialv7
If a distro chose to build some program with a different set of dependencies than what is specified by its author, then arguably it is not the same piece of software anymore. If Debian wants to do that - which I think they 100% have the rights to do, this is free software after all - they should make it clear that they will also take over the maintenance responsibility for it as well.
bhelkey
> [bcachefs author's] side is basically that it doesn't make sense to package the tools for an experimental FS in an OS that's going to get very far behind
If the bcachefs authors believe the tool is too unstable for Debian stable, and Debian developers believe bcachefs is too unstable for Debian stable [1], it sounds as if they agree.
[1] https://jonathancarter.org/2024/08/29/orphaning-bcachefs-too...
kstrauser
Ah, I see. So "the Rust team" here means the team of Debian maintainers who package Rust for it, not the Rust developers? That makes sense.
ajb
Yeah I assume that's what "the Rust team" meant
knowitnone
This makes perfect sense to me. Why would Debian stable include something obviously not stable? They have an experimental branch for a reason.
6SixTy
To my knowledge, Debian changed deps in bcachefs-tools to synchronize with Debian's Rust repo, breaking it. It's partly that it's a green fs, and partly clashing expectations between Rust and Debian over how to treat dependencies.
elitepleb
Debian would compile dependencies with versions lower than specified in the project, reintroducing bugs that users would blame upstream for, https://www.phoronix.com/news/Debian-Orphans-Bcachefs-Tools
linsomniac
Why is Kent not providing his own deb packages that users can install to override the Debian provided ones to get updates?
koverstreet
Perhaps down the road, after experimental is lifted. For now I've generally been telling distros to slow down.
For anything this big staging the release is important, we don't need or want to be in every distro right away (the Fedora people were also _extremely_ gung ho in the past and I told them the same thing).
Until every outstanding critical bug report is fixed (and there are still a few), I want power users testing it who know how to deal with things breaking, I don't want unsuspecting users getting bit.
tmtvl
Because he isn't a Debian packager. That isn't his job.
kstrauser
Got it. I could see that being an issue.
XorNot
It seems honestly odd to me not to just vendor what you need as standalone packages if your dependencies are that specific and you're a filesystem, i.e. you use the bcachefs-errno package, not errno.
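As a sketch of that idea (crate name and version are hypothetical), the project would publish its own forks and depend on them by name:

    [dependencies]
    # a project-owned fork under its own name; there is no upstream
    # "errno" crate for a distro to debundle this back to
    bcachefs-errno = "0.1"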
tredre3
Debian tends to put their principles above pragmatism (for better or worse), so would they even agree with such vendoring when it's entirely meant as a way to bypass their vision/requirements/process for how dependencies should be handled?
honestSysAdmin
I don't expect ZFS to be dethroned this decade, nor the next one.
nicman23
I used to run zfs with bcache blocks lol
honestSysAdmin
Some of the zpool configurations I have set up seem "odd" to others, but they do have a purpose and are well thought out.
What motivated you to combine ZFS and bcache? What did that configuration look like? What was the thinking behind it? I would prefer to inquire further rather than make and present wrong assumptions as to why the configuration is wrong.
nicman23
User uses experimental fs. It breaks.
Surprised Pikachu face
aragilar
Yes and no—said user ran this as an experiment (relevant quote: "But with a filesystem this young, you inevitably have to accept some rough edges in return for some fun."), but eventually stopped it partially due to the people involved (relevant quotes: "But here's the main issue for me: Reporting bugs to bcachefs is not a pleasant experience."; "And at some point, the fun just stopped for me."), not just because there were bugs.
nicman23
fun is not something i want to describe my fs as
stavros
If that's what you got from the article, I recommend you read it again.
snvzz
I stick to OpenZFS, it's rock solid.
Filesystems are the one thing I don't want to experiment with. Too much potential trouble.
MaKey
Can it still be called "rock solid" with a data corruption bug in its younger history? https://news.ycombinator.com/item?id=38770168
koverstreet
Every filesystem has had data corruption bugs. Unfortunately, that's just how it is with the complexity of modern filesystems and the tools we have.
(We're still coding in bloody _C_).
What you really don't want is to be losing entire filesystems. A single file here and there getting corrupted is one thing, and it's usually highly workload dependent so it'll probably only hit one application, but losing an entire filesystem is much more impactful.
ext2/3/4 has the best track record here (and note, it's also had data corruption bugs), with the relatively simple on-disk format and e2fsck being quite robust. XFS is probably next up; I've seen reports of XFS filesystems being lost, but only one or two - a tiny, tiny fraction of the btrfs reports. Can't speak as much to ZFS.
hulitu
> Can it still be called "rock solid" with a data corruption bug in its younger history? https://news.ycombinator.com/item?id=38770168
Ok. It is sand solid. There were a lot of voices on HN trying to downplay the issue, but a filesystem cannot be called stable if it corrupts data in 2020.
snvzz
Absolutely. We can trust it because it has already gone through all this. It is battle-proven.
Unlike some new filesystem created just yesterday.
helf
Yes.
I've been patiently waiting to convert my ZFS array to bcachefs. I'm very excited about better SSD promotion logic. But I'm not willing to spend any time on an experimental filesystem on my main systems.
> But you can expect to get flamed about running Debian, or perhaps more accurately, not being willing to spearhead Kent's crusade against Debian's Rust packaging policies.
It is quite unfortunate that Kent couldn't have just said "Debian isn't supported, we will revisit this when bcachefs is more stable" and stopped talking after that. Debian and experimental software just don't work well together.