Wild – A fast linker for Linux
116 comments
January 24, 2025 · pzmarzly
estebank
Note that Mold has no interest in becoming incremental, so there is a big reason there for another linker to exist. I find it kind of embarrassing that MS' linker has been incremental by default for decades, yet there's no production ready incremental linker on Linux yet.
pjmlp
Additionally the way precompiled headers are handled in Visual C++ and C++ Builder have always been much better than traditional UNIX compilers, and now we have modules as well.
paulddraper
It has to be a candidate for the longest biggest gap in build tooling ever.
bogwog
[flagged]
dang
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
estebank
Yes, I missed a word. And I believe pretty much everybody else realized what I meant to say.
Feel free to point me in the direction of a production grade incremental compiler that can run on Linux, GNU or otherwise.
Thorrez
I'm pretty sure that's a typo, and "incremental" was meant to be included in that sentence.
bdhcuidbebe
Why so hostile? Have a break, go look at the clouds, they are beautiful today!
easythrees
Wait a minute, it’s possible to relicense something from GPL to MIT?
prmoustache
Yes, if you are the only developer and never received or accepted external contributions, or if you managed to get permission from every single person who contributed, or replaced their code with your own.
computably
> or if you managed to get permission from every single person who contributed
This makes it sound more difficult than it actually is (logistically); it's not uncommon for major projects to require contributors to sign a CLA before accepting PRs.
DrillShopper
Yes. Generally you need permissions from contributors (either asking them directly or requiring a contribution agreement that assigns copyright for contributions to either the author or the org hosting the project), but you can relicense from any license to any other license.
That doesn't extinguish the prior versions under the prior license, but it does allow a project to change its license.
secondcoming
Maybe I'm holding it wrong, but mold isn't faster at all if you're using LTO, which you probably should be.
compiler-guy
Mold will be faster than LLD even using LTO, but all of its benefits will be absolutely swamped by the LTO process, which is, more or less, recompiling the entire program from high-level LLVM-IR. That's extremely expensive and dwarfs any linking advantages.
So the benefit will be barely noticeable. As another comment points out, LTO should only be used when you need a binary optimized to within an inch of its life, such as a release copy, or a copy for performance testing.
paulddraper
Username checks out.
And factual.
0x457
I think we're talking about non-release builds here. In those, you don't want to use LTO, you just want to get that binary as fast as possible.
Arelius
Yeah, if your development process requires LTO you may be holding it wrong....
Specifically, if LTO is so important that you need to be using it during development, you likely have a very exceptional case, or you have some big architectural issues that are causing much larger performance regressions than they should be.
benatkin
Being able to choose a middle ground between development/debug builds and production builds is becoming increasingly important. This is especially true when developing in the browser, when often something appears to be slow in development mode but is fine in production mode.
WebAssembly and lightweight MicroVMs are enabling FaaS with real time code generation but the build toolchain makes it less appealing, when you don't want it to take half a minute to build or to be slow.
jcalvinowens
If you're debugging, and your bug only reproduces with LTO enabled, you don't have much of a choice...
benatkin
Agreed. Both fast and small are desirable for sandboxed (least authority) isomorphic (client and server) microservices with WebAssembly & related tech.
1vuio0pswjnm7
"These benchmark were run on David Lattimore's laptop (2020 model System76 Lemur pro), which has 4 cores (8 threads) and 42 GB of RAM."
https://news.ycombinator.com/item?id=33330499
NB. This is not to suggest wild is bloated. The issue if any is the software being developed with it and the computers of those who might use such software.
satvikpendem
I looked at this before, is it ready for production? I thought not based on the readme, so I'm still using mold.
For those on macOS, Apple released a new linker about a year or two ago (which is why the mold author stopped working on their macOS version), and if you're using it with Rust, put this in your config.toml:
[target.aarch64-apple-darwin]
rustflags = [
"-C",
"link-arg=-fuse-ld=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld",
"-C",
"link-arg=-ld_new",
]
dralley
No, the author is pretty clear that it shouldn't be used for production yet
satvikpendem
Great, I'll keep a look out but will hold off on using it for now.
newman314
Can you confirm that's still the right location for Sequoia?
I have the command line tools installed and I only have /usr/bin/ld and /usr/bin/ld-classic
satvikpendem
Then it would be /usr/bin/ld; I believe my solution dates from before they moved the linker.
brink
I don't even use mold for production. It's for development.
kryptiskt
What would be refreshing would be a C/C++ compiler that did away with the intermediate step of linking and built the whole program as a unit. LTO doesn't even have to be a thing if the compiler can see the entire program in the first place. It would still have to save some build products so that incremental builds are possible, but not as object files, the compiler would need metadata to know of the origin and dependencies of all the generated code so it would be able to replace the right things.
External libs are most often linked dynamically these days, so they don't need to be built from source, so eliminating the linker doesn't pose a problem for non-open source dependencies. And if that's not enough letting the compiler also consume object files could provide for legacy use cases or edge cases where you must statically link to a binary.
dapperdrake
SQLite3 just concatenates everything together into one compilation unit. So more people have been using this approach than probably know about it.
almostgotcaught
People trot this out like it's some kind of brilliant insight all the time and I always laugh.
First of all, UNITY_BUILD has been supported in CMake for a long time - try it out and please report back how many ODR violations your code base has.
Secondly, if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming. Let alone on a codebase with millions of lines. Maybe inlining functions at most but you should have those in a header already.
nn3
> Secondly, if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
That's wrong. gcc generates summaries of function properties and propagates those up and down the call tree, which for LTO is then built in a distributed way. It does much more than mere inlining, including advanced analyses such as points-to analysis.
https://gcc.gnu.org/onlinedocs/gccint/IPA.html https://gcc.gnu.org/onlinedocs/gccint/IPA-passes.html
It scales to millions of lines of code because it's partitioned.
jcalvinowens
> if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
You can build the Linux kernel with LTO: simply diff the LTO vs non-LTO outputs and it will be obvious you're wrong.
dapperdrake
SQLite3 may be a counter-example:
fuzztester
Related, and a good one, though old:
The book Linkers and Loaders by John Levine.
Last book in the list here:
https://www.johnlevine.com/books.phtml
I had read it some years ago, and found it quite interesting.
It's a standard one in the field.
He has also written some other popular computer books (see link above - pun not intended, but noticed).
ajb
2008: Gold, a new linker, intended to be faster than GNU ld
2015(?): LLD, a drop-in replacement linker, at least 2x as fast as Gold
2021: mold, a new linker, several times faster than lld
2025: wild, a new linker...
wolfd
I’m not sure if you’re intending to leave a negative or positive remark, or just a brief history, but the fact that people are still managing to squeeze better performance into linkers is very encouraging to me.
ajb
Certainly no intention to be negative. Not having run the numbers, I don't know if the older ones got slower over time due to more features, or the new ones are squeezing out new performance gains. I guess it's also partly that the bigger codebases scaled up so much over this period, so that there are gains to be had that weren't interesting before.
wolfd
Good question, I always wonder the same thing. https://www.phoronix.com/news/Mold-Linker-2024-Performance seems to show that the newer linkers still outperform their predecessors, even after maturing. But of course this doesn’t show the full picture.
cbmuser
Gold is slated for removal from binutils for version 2.44.0, so it's officially dead.
dundarious
For windows, there is also [The RAD Linker](https://github.com/EpicGamesExt/raddebugger?tab=readme-ov-fi...) though quite early days.
ComputerGuru
There’s been a lot of interest in faster linkers spurred by the adoption and popularity of rust.
Even modest statically linked rust binaries can take a couple of minutes in the link stage of compilation in release mode (using mold). It’s not a rust-specific issue but an amalgam of (usually) strictly static linking, advanced link-time optimizations enabled by llvm like LTO and bolt, and a general dissatisfaction with compile times in the rust community. Rust’s (clinically) strong relationship with (read: dependency on) LLVM makes it the most popular language in which LLVM link-time magic has been so heavily and universally adopted; you could face these issues with C++, but there they would be chalked up to your toolchain rather than to the language.
I’ve been eyeing wild for some time as I’m excited by the promise of an optimizing incremental linker, but to be frank, see zero incentive to even fiddle with it until it can actually, you know, link incrementally.
pjmlp
C++ can be rather faster to compile than Rust, because some compilers do have incremental compilation, and incremental linking.
Additionally, the acceptance of binary libraries across the C and C++ ecosystem means that, more often than not, you only need to care about compiling your own application, and not the world, every time you clone a repo or switch development branch.
sitkack
I solved this by using Wasm. Your outer application shell calls into Wasm business logic, only the inner logic needs to get recompiled, the outer app shell doesn't even need to restart.
SkiFire13
How is this different than dynamically linking the business logic library?
ComputerGuru
I don’t think I can use wasm with simd or syscalls, which is the bulk of my work.
sitkack
I haven't used SIMD in Rust (or Wasm). Syscalls can be passed into the Wasm env.
https://doc.rust-lang.org/core/arch/wasm32/index.html#simd
https://nickb.dev/blog/authoring-a-simd-enhanced-wasm-librar...
Could definitely be more effort than it is worth just to speed up compilation.
bjourne
What a coincidence. :) Just an hour ago I compared the performance of wild, mold, and (plain-old) ld on a C project I'm working on. 23 kloc and 172 files. Takes about 23.4 s of user time to compile with gcc+ld, 22.5 s with gcc+mold, and 21.8 s with gcc+wild. Which leads me to believe that link time shouldn't be that much of a problem for well-structured projects.
davidlattimore
It sounds like you're building from scratch. In that case, the majority of the time will be spent compiling code, not linking. The case for fast linkers is strongest when doing iterative development. i.e. when making small changes to your code then rebuilding and running the result. With a small change, there's generally very little work for the compiler to do, but linking is still done from scratch, so tends to dominate.
wolf550e
The linker time is important when building something like Chrome, not small projects.
searealist
Fast linkers are mostly useful in incremental compilation scenarios to cut down on the edit cycle.
ndesaulniers
How about ld.lld?
ndesaulniers
Can it link the Linux kernel yet? Was a useful milestone for LLD.
KerrAvon
I'm curious: what's the theory behind why this would be faster than mold in the non-incremental case? "Because Rust" is a fine explanation for a bunch of things, but doesn't explain expected performance benefits.
"Because there's low hanging concurrent fruit that Rust can help us get?" would be interesting but that's not explicitly stated or even implied.
shmerl
That looks promising. Written in Rust to begin with, and with the goal of being fast and supporting incremental linking.
To use it with Rust, this can probably also work using gcc as linker driver.
In project's .cargo/config.toml:
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=wild"]
Side note, but why does Rust need to plug into gcc or clang for that? Some missing functionality?
davidlattimore
Unfortunately gcc doesn't accept arbitrary linkers via the `-fuse-ld=` flag. The only linkers it accepts are bfd, gold, lld and mold. It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.
As for why Rust uses gcc or clang to invoke the linker rather than invoking the linker directly - it's because the C compiler knows what linker flags are needed on the current platform in order to link against libc and the C runtime. Things like `Scrt1.o`, `crti.o`, `crtbeginS.o`, `crtendS.o` and `crtn.o`.
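The `-B` workaround described above could be set up roughly like this. It is a sketch under assumptions: the location of the wild binary (`WILD_BIN`) and the directory holding the symlink (`LD_DIR`) are hypothetical defaults, so adjust both for your system, and pick a permanent directory rather than a temporary one for real use.

```shell
#!/bin/sh
set -eu

# Assumption: wild is on PATH; otherwise fall back to a hypothetical install path.
WILD_BIN="${WILD_BIN:-$(command -v wild || echo /usr/local/bin/wild)}"
# Hypothetical directory for the symlink; choose a permanent location in practice.
LD_DIR="${LD_DIR:-${TMPDIR:-/tmp}/wild-as-ld}"

mkdir -p "$LD_DIR"
# gcc searches -B directories for a program named "ld", so the symlink must use that name.
ln -sf "$WILD_BIN" "$LD_DIR/ld"

# Try a link through gcc only if both gcc and wild are actually present.
if command -v gcc >/dev/null 2>&1 && [ -x "$WILD_BIN" ]; then
    src="$LD_DIR/hello.c"
    printf 'int main(void) { return 0; }\n' > "$src"
    gcc -B"$LD_DIR" "$src" -o "$LD_DIR/hello"
    echo "linked $LD_DIR/hello via $LD_DIR/ld"
else
    echo "gcc or wild not found; created $LD_DIR/ld only"
fi
```

For a Rust project the same `-B` argument would then go into `rustflags` as a `link-arg`, in place of `-fuse-ld=wild`.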
shmerl
Ah, good to know, thanks!
Maybe it's worth filing a feature request for gcc to have parity with clang for arbitrary linkers?
null
Ever since mold relicensed from AGPL to MIT (as part of mold 2.0 release), the worldwide need for making another fast linker has been greatly reduced, so I wasn't expecting a project like this to appear. And definitely wasn't expecting it to already be 2x faster than mold in some cases. Will keep an eye on this project to see how it evolves, best of luck to the author.