Link Time Optimizations: New Way to Do Compiler Optimizations

WalterBright

Link time optimizations were done in the 1980s if I recall correctly.

I never tried to implement them, finding it easier and more effective for the compiler to simply compile all the source files at the same time.

The D compiler is designed to be able to build one object file per source file at a time, or one object file which combines all of the source files. Most people choose the one object file.

senkora

In C++, there is a trick to get this behavior called "unity builds", where you include all of your source files into a single file and then invoke the compiler on that file.

Of course, being C++, this subtly changes behavior and must be done carefully. I like this article that explains the ins and outs of using unity builds: https://austinmorlan.com/posts/unity_jumbo_build/
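A minimal sketch of one such behavior change, assuming two hypothetical files `a.cpp` and `b.cpp` that each define a file-local `helper`. Compiled separately, the two `static` helpers never see each other; pulled into one unity file with `#include`, they would land in the same translation unit and the second one would be a redefinition error. The file names and the namespace workaround below are illustrative assumptions, not taken from the linked article.

```cpp
// Single-file stand-in for a unity build. In a real project unity.cpp would
// contain only:  #include "a.cpp"  and  #include "b.cpp"  (hypothetical names).
#include <cassert>

// --- contents of hypothetical a.cpp ---
// If this helper sat at global scope in both files, the unity build would fail
// with a redefinition error. Giving each file its own namespace avoids the clash.
namespace a_detail {
static int helper(int x) { return x + 1; }
}  // namespace a_detail
int from_a(int x) { return a_detail::helper(x); }

// --- contents of hypothetical b.cpp ---
namespace b_detail {
static int helper(int x) { return x * 2; }
}  // namespace b_detail
int from_b(int x) { return b_detail::helper(x); }

// Small self-check a reader could call from main().
void check_unity_sketch() {
    assert(from_a(3) == 4);  // 3 + 1
    assert(from_b(3) == 6);  // 3 * 2
}
```

The same kind of collision can happen with macros, `using` directives, and anonymous-namespace names, since all of them now share one translation unit.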

WalterBright

> this subtly changes behavior

The D module design ensures that module imports are independent of each other and are independent of the importer.

srean

I think MLton does it this way.

http://mlton.org/WholeProgramOptimization

Dynamically linked and dynamically loaded libraries are useful though (paid for with their own problems, of course).

tester756

Yeah, generating many object files seems like a weird thing. Maybe it was a good thing decades ago, but now?

Because then you need to link them, thus you need some kind of linker.

Just generate one output file and skip the linker.

yencabulator

Not maybe. Sufficient RAM for compilation was a serious issue back in the day.

WalterBright

I've considered many times doing just that.

tester756

And what was the result/conclusion of such considerations?

mcdeltat

Different .c/.cpp files being a barrier to optimisation always struck me as an oddly low bar for the 21st century. Yes, I know the history of compilation units, but these days that's not how we use the system. We don't split code into source files for memory reasons, we do it for organisation. On a small/medium codebase and a decent computer you could probably fit dozens of source files into memory to compile and optimise together. The memory constraint problem has largely disappeared.

So why do we still use the old way? LTO seems effectively like a hack to compensate for the fact that the compilation model doesn't fit our modern needs. Obviously this will never change in C/C++ due to momentum and backwards compatibility. But a man can dream.

Deukhoofd

What do you mean, new? LTO has been in GCC since 2011. It's old enough to have a social media account in most jurisdictions.

srean

Yes, and if I remember correctly, there used to be Linux distros that had all the distro binaries LTO'ed.

jeffbee

Pretty sure MSVC ".NET" was doing link-time whole-program optimization in 2001.

andyayers

HPUX compilers were doing this back in 1993.

abainbridge

Or academics in 1986: https://dl.acm.org/doi/abs/10.1145/13310.13338

The idea of optimizations running at different stages in the build, with different visibility of the whole program, was discussed in 1979, but the world was so different back then that the discussion seems foreign. https://dl.acm.org/doi/pdf/10.1145/872732.806974

jeffbee

Oh yeah, well ... actually I got nothin'. You win.

I will just throw in some nostalgia for how good that compiler was. My college roommate brought an HP pizza box that his dad secured from HP, and the way the C compiler quoted chapter and verse from ISO C in its error messages was impressive.

sakex

Maybe add the date to the title, because it's hardly new at this point.

vsl

...or in 2020 (the year of the article).

phkahler

I tried LTO with Solvespace 4 years ago and got about 15 percent better performance:

https://github.com/solvespace/solvespace/issues/972

Build time was terrible, taking a few minutes vs 30-40 seconds for a full build. Have they done anything to use multi-core for LTO? It only used one core for that.

Also tested OpenMP which was obviously a bigger win. More recently I ran the same test after upgrading from an AMD 2400G to a 5700G which has double the cores and about 1.5x the IPC. The result was a solid 3x improvement so we scale well with cores going from 4 to 8.

wahern

Both clang and GCC support multi-core LTO, as does Rust. However, you have to partition the code, so the more cores you use, the less benefit you get from LTO. Rust partitions by crate by default, but it can increase parallelism by partitioning each crate. I think "fat LTO" is the term typically used for whole-program, or at least in the case of Rust, whole-crate LTO, whereas "thin LTO" is what you get when you LTO partitions and then link those together normally. For clang and GCC, you can either have them automatically partition the code for thin LTO, or do it explicitly via your Makefile rules[1].

[1] Interestingly, GCC actually invokes Make internally to implement thin LTO, which lets it play nice with GNU Make's job control and obey the -j switch.

Remnant44

Link time optimization is definitely not new, but it is incredibly powerful - I have personally had situations where being unable to inline functions from a static library without LTO cut performance in half.

It's easy to dismiss a basic article like this, but it's basically a discovery that every junior engineer will make, and it's useful to talk about those too!
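To make the static-library case concrete, here is a rough sketch. The split into `vec.cpp` (built into a static library) and `main.cpp` is a hypothetical layout, and `Vec3`, `dot`, and `sum_of_dots` are made-up names. Without LTO, the compiler building `main.cpp` sees only a declaration of `dot`, so the call in the loop stays a real call. With `-flto` passed to GCC or Clang at both compile and link time, the optimizer can see the body and may inline it.

```cpp
// Shown as one file so it compiles on its own; comments mark the assumed split.
#include <cassert>
#include <cstddef>

struct Vec3 { double x, y, z; };

// --- would live in vec.cpp, archived into a static library ---
double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// --- would live in main.cpp, which only sees:  double dot(const Vec3&, const Vec3&); ---
double sum_of_dots(const Vec3* a, const Vec3* b, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += dot(a[i], b[i]);  // hot call: can be inlined only if the body is visible
    }
    return total;
}

// Small self-check a reader could call from main().
void check_dot_sketch() {
    const Vec3 a[2] = {{1, 2, 3}, {1, 0, 0}};
    const Vec3 b[2] = {{1, 1, 1}, {2, 0, 0}};
    assert(sum_of_dots(a, b, 2) == 8.0);  // (1 + 2 + 3) + 2
}
```

How much this matters depends on how small and how hot the callee is; the halved performance above is one reported case, not a general figure.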

srean

The inline keyword should really have been intended for call sites rather than definitions.

Perhaps language designers thought that if a function needs to be inlined everywhere, it would lead to verbose code. In any case, it's a weak hint that compilers generally treat with much disdain.
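A small sketch of the definition-site forms available today, for comparison. In standard C++, `inline` mainly lets a definition appear in several translation units and leaves the actual inlining decision to the optimizer. The `__attribute__` spellings are GCC/Clang extensions, and the `SKETCH_` macro names are invented for this example.

```cpp
#include <cassert>

// Standard C++: a hint at most; the compiler is free to ignore it.
inline int square(int x) { return x * x; }

// GCC/Clang extensions (not standard C++). These are stronger requests, but
// they still attach to the function, not to an individual call site.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_ALWAYS_INLINE inline
#define SKETCH_NOINLINE
#endif

SKETCH_ALWAYS_INLINE int cube(int x) { return x * x * x; }

// Kept out of line on GCC/Clang; its callees may still be inlined into it.
SKETCH_NOINLINE int square_plus_cube(int x) { return square(x) + cube(x); }

// Small self-check a reader could call from main().
void check_inline_sketch() {
    assert(square_plus_cube(2) == 12);  // 4 + 8
}
```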

lilyball

ffmpeg has a lot of assembly code in it, so it's a very odd choice of program to use for this kind of test as LTO is presumably not going to do anything to the assembly.
