
Bzip2 crate switches from C to 100% Rust

dralley

How realistic is it for the Trifecta Tech implementation to start displacing the "official" implementation used by linux distros, which hasn't seen an upstream release since 2019?

Fedora recently replaced the original Adler zlib implementation with zlib-ng, so that sort of thing isn't impossible. You just need to provide a C ABI compatible with the original one.

tiffanyh

I think that is the goal of uutils.

https://uutils.github.io/

wmf

Ubuntu is using Rust sudo so it's definitely possible.

masfuerte

They do provide a compatible C ABI. Someone "just" needs to do the work to make it happen.

rlpb

> You just need to provide a C ABI compatible with the original one.

How does this interact with dynamic linking? Doesn't the current Rust toolchain mandate static linking?

arcticbull

Rust lets you generate dynamic C-linkage libraries.

Use `crate-type = ["cdylib"]`
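A minimal sketch of what that looks like (the function name and Cargo.toml snippet are illustrative, not the actual bzip2 crate exports):

```rust
// Cargo.toml would declare:
//   [lib]
//   crate-type = ["cdylib"]   # build a .so/.dylib/.dll with C linkage

// An exported C-ABI function: `#[no_mangle]` keeps the symbol name stable
// and `extern "C"` uses the C calling convention, so a C caller can
// declare it as `int32_t example_add(int32_t, int32_t);` and link the cdylib.
#[no_mangle]
pub extern "C" fn example_add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Callable from Rust too; C consumers would link against the cdylib.
    assert_eq!(example_add(2, 3), 5);
    println!("ok");
}
```

This is how a Rust rewrite can present the same symbols the original C library exported, so existing dynamically linked consumers keep working.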

nicoburns

Dynamic linking works fine if you target the C ABI.

timeon

You can use dynamic linking in Rust with the C ABI, which means going through the `unsafe` keyword, also known as 'trust me bro'. Linking statically against the Rust source means the interface is checked by the compiler, so there is no need for `unsafe`.
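A sketch of the 'trust me bro' side, calling a C-ABI function (libc's `strlen`) through a foreign declaration: the signature we write is taken on faith by the compiler, which is exactly why the call site must be `unsafe`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Foreign declaration: we assert strlen's signature; rustc cannot check
// it against the C side, so calling it is `unsafe`.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let s = CString::new("hello").unwrap();
    // Caller's obligations: the pointer is valid and NUL-terminated.
    let len = unsafe { strlen(s.as_ptr()) };
    assert_eq!(len, 5);
    println!("ok");
}
```

A Rust-to-Rust static link would instead expose a normal typed function, and the compiler would verify every call; the `unsafe` boundary only appears where the C ABI erases that type information.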

rwaksmunski

I use this crate to process hundreds of TB of Common Crawl data; I appreciate the speedups.

viraptor

What's the reason for using bz2 here? Wouldn't it be faster to do a one off conversion to zstd? It beats bzip2 in every metric at higher compression levels as far as I know.

rwaksmunski

Common Crawl delivers the data as bz2. Indeed I store intermediate data in zstd with ZFS.

declan_roberts

That assumes you're processing the data more than once.

null

[deleted]

malux85

Yeah came here to say a 14% speed up in compression is pretty good!

firesteelrain

Anyone know if this will by default resolve the 11 outstanding CVEs?

Ironically there is one CVE reported in the bzip2 crate

[1] https://app.opencve.io/cve/?product=bzip2&vendor=bzip2_proje...

tialaramex

There's certainly a contrast between the "Oops a huge file causes a runtime failure" reported for that crate and a bunch of "Oops we have bounds misses" in C. I wonder how hard anybody worked on trying to exploit the bounds misses to get code execution. It may or may not be impossible to achieve that escalation.

Philpax

> The bzip2 crate before 0.4.4

They're releasing 0.6.0 today :>

solarized

Did they use any LLM to transpile the C to Rust?

nightfly

Task that requires precision and potentially hard to audit? Exactly where I'd use an LLM /s

anonnon

> Improved performance

After the uutils debacle, does anyone still trust these "rewrote in Rust" promotional benchmarks without independent verification?

vlovich123

> After the uutils debacle

Which debacle?

anonnon

See https://desuarchive.org/g/thread/104831348/#q104831479 https://desuarchive.org/g/thread/104831348/#104831809

Also uutils is a corporate-sponsored, corporate-friendly MIT licensed rewrite that's hostile to user (and developer) freedom.

EDIT: for those unaware of the context, that thread was not long after a uutils dev gave a talk at FOSDEM where he presented benchmarks purporting to show uutils sort's superior performance, which /g/ exposed as being only due to its inadequate locale support.

0cf8612b2e1e

So what was I supposed to get from that 4chan-wannabe site? That the project is not currently as fast as GNU? Where is the lying?

vlovich123

So what I’m getting is

1. The uutils project didn't also make sort faster for all locales, even though the majority of people will be using UTF-8, C, or POSIX, where it is indeed faster

2. There's a lot of debate about different test cases, which is a never-ending quibble with sorting routines (go look at some of the cutting-edge sort algorithm development).

This complaint is hyperfocusing on one of the many utilities they claim are faster, and quibbling over what are, to me, important but ultimately minor critiques. I really don't see the debacle.

As for the license, that's more your opinion. The Rust project generally dual-licenses its code as MIT and Apache-2.0, and most Rust open source projects follow this tradition. I don't see the conspiracy that you do. And just so I'm clear, the corporation you're criticizing here as the amorphous evil entity funding this is Ubuntu, right?

jeffbee

You should of course verify these results in your scenario. However, I somewhat doubt that the person exists who cares greatly about performance and is still willing to consider bzip2. There isn't a point anywhere in the design space where bzip2 beats zstd. You can get smaller outputs from zstd in 1/20th the time for many common inputs, or you can spend the same amount of time and get a significantly smaller output, and zstd decompression is again 20-50x faster, depending on the input. So the speed of your bzip2 implementation hardly seems worth arguing over.

dale_huevo

A lot of this "rewrite X in Rust" stuff feels like burning your own house down so you can rebuild and paint it a different color.

Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

cornstalks

> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

That's the kind of attitude that leads to 50% of modern CPU resources being allocated toward UI eye candy.

0cf8612b2e1e

Every cycle saved is longer battery life. Someone paid the one time cost of porting it, and now we can enjoy better performance forever.

dale_huevo

They kicked off the article saying that no one uses bzip2 anymore. A million cycles saved for something no one uses (according to them) is still 0% battery life saved.

If modern CPUs are so power efficient and have so many spare cycles to allocate to e.g. eye candy no one asked for, then no one is counting and the comparison is irrelevant.

yuriks

It sounds like the main motivation for the conversion was to simplify builds and reduce the chance of security issues. Old parts of protocols that no one pays much attention to anymore do seem to be a common place where those pop up. The performance gain looks more like a nice side effect of the rewrite; I imagine they were at most targeting performance parity.

null

[deleted]

jimktrains2

Isn't bzip used quite a bit, especially for tar files?

Rucadi

I personally find the part about "enabling cross-compilation" a lot more relevant; in my opinion that's an important win.

The same goes for the exported symbols and being able to compile to wasm easily.

tcfhgj

> Counting CPU cycles as if it's an accomplishment seems irrelevant in a world where 50% of modern CPU resources are allocated toward UI eye candy.

That's the attitude that leads to Electron apps replacing native ones, and I hate it. I'm not buying better CPUs and more RAM just to have them wasted like this.

Terr_

It seems to me like binary file format parsing (and construction) is probably a good place for using languages that aren't as prone to buffer-overflows and the like. Especially if it's for a common format and the code might be used in all sorts of security-contexts.

viraptor

Those cycles translate directly to $ saved in a few places. Mostly in places far away from having any UI at all.

anonnon

> Counting CPU cycles

And that's assuming they aren't lying about the counting: https://desuarchive.org/g/thread/104831348/#q104831479

DaSHacka

Rust devs continuing to use misleading benchmarks? I, for one, am absolutely shocked. Flabbergasted, even.

jxjnskkzxxhx

> lot of this "rewrite X in Rust" stuff feels like

Indeed. You know the react-angular-vue nevermind is churn? It appears that the trend of people pushing stuff because it benefit their careers is coming to the low level world.

I for one still find it mystifying that Linus Torvalds let these people into the kernel. Linus, who famously banned C++ from the kernel not because of C++ in itself, but to ban C++ programmer culture.