Tilde, My LLVM Alternative

164 comments

January 21, 2025

ksec

>I'm calling it Tilde (or TB for tilde backend) and the reasons are pretty simple, i believe it's far too slow at compiling and far too big to be fixed from the inside. It's been 20 years and cruft has built up, time for a "redo".

That put a smile on my face because I remember that was how LLVM was born out of frustration with GCC.

I don't know how modern GCC and LLVM compare. I remember LLVM was fast but the resulting binaries were not as optimised; once those optimisations were added it became slower. Meanwhile, LLVM was a wake-up call to modernise GCC and make it faster. In the end, competition made both a lot better.

I believe some industries (gaming) used to swear by Visual Studio / the MS compiler / the Intel compiler, or by languages that depend on or prefer the Borland (whatever they are called now) compiler. It's been very long since I last looked, so I am wondering if those two are still used, or have we all mostly merged into LLVM / GCC?

pjmlp

It is pretty much Visual Studio on Windows and Xbox; Nintendo and Sony have clang forks.

Embarcadero owns Borland; unfortunately, stuff like C++ Builder doesn't seem to get many people outside big corps wanting to use it, which is a shame given its RAD capabilities and GUI design tooling for C++.

It also has a standard ABI between Delphi and C++ Builder, which allows for development workflows similar to what .NET offered later with C#/VB alongside Managed C++ extensions (later replaced by C++/CLI).

s1gsegv

Borland as of a few years ago also ships a clang fork for C++ Builder, interestingly enough. It unfortunately does not solve all of the problems you encounter using C++ Builder in the modern age.

I’ve personally watched the enshittification of too many proprietary tools to ever build something I care about on top of one today, especially something which becomes so fundamental to the design of your application like a GUI toolkit.

I know it sounds crazy, like you’d never bother forking your GUI framework anyway even when it’s open source. But I worked on an application built in C++ Builder, at a company with enough engineering talent and willpower that we would’ve forked the compiler internally to fix its problems and limitations had we been granted access to the source. Instead, I got to watch the product get held back by it year after year, hoping that Borland would care.

pjmlp

Yes, they do, but they are still not done adding all the necessary C++ Builder sugar that makes their tooling great, although they're almost there as of last year's release.

willvarfar

Back 20 or more years ago I used to do a lot of rec math competition programming and found that the metrowerks c++ compiler made massively faster programs than gcc, vsstudio, intel and everything else I tried then.

This seemed to be simply down to variable alignment; the programs took more memory but ran much faster, particularly multi-core (which was still high end then).

And this was on x86 where metrowerks weren't really competing, and was probably accidental. But the programs it compiled were fast.

I'd be surprised if anyone even knew that metrowerks had a c++ compiler on x86 on windows. At the time metrowerks were on the tail end of their domination of mac compilers from before mac ran on x86.

bruce343434

Memory access patterns are everything. Memory latency is almost always the bottleneck anyway. I have the feeling that this is more and more becoming common knowledge, and techniques like "struct of arrays" are becoming more widespread and talked about.
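
A minimal sketch of the idea in C (hypothetical Particle layout, not from any project discussed here): with an array of structs, touching one field drags every neighbouring field through the cache, while a struct of arrays keeps the touched field densely packed.

```
#include <stddef.h>

#define N 100000

/* Array of structs: updating x also pulls y, z and mass into cache. */
struct ParticleAoS { float x, y, z, mass; };
static struct ParticleAoS aos[N];

/* Struct of arrays: all the x values are contiguous, so a 64-byte cache
   line holds 16 useful floats instead of 4 interleaved with unused fields. */
struct ParticlesSoA { float x[N], y[N], z[N], mass[N]; };
static struct ParticlesSoA soa;

void advance_x_aos(float dt) {
    for (size_t i = 0; i < N; i++)
        aos[i].x += dt;
}

void advance_x_soa(float dt) {
    for (size_t i = 0; i < N; i++)
        soa.x[i] += dt;
}
```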

willvarfar

Is this perhaps why Build Engine used arrays, rather than arrays of structs? Organising things column-wise rather than row-wise, like an OLAP engine? https://fabiensanglard.net/duke3d/code_legacy.php

dan_hawkins

I was using Metrowerks C++ compiler suite to develop code for Dragonball (68000) embedded system 22 years ago!

bbatha

Intel's compiler still has a decent niche compiling math-heavy code for Intel processors, so it gets a decent amount of use in HPC applications. It has some of the best vectorization passes, but they only work on actual Intel CPUs. So it's starting to get less traction as AMD takes the performance crown and as vectorized math moves to the GPU.

CUViper

Intel's compilers are now based on LLVM too.

rurban

For large C projects, tinycc is invaluable for extremely fast, non-optimized builds: like 10x faster than gcc or clang builds. In my case I don't wait 10 minutes when tcc is done with it in 3 seconds.

whimsicalism

msvc is very much in third but it is one of the three that i think of when i think of c++ compilers

melodyogonna

Chris Lattner seems to have also created an alternative for LLVM - https://mlir.llvm.org/

Because of how the architecture works, LLVM is one of the backends, but it doesn't have to be. It's a very interesting project: you could do a lot more IR processing before descending to LLVM (if you use it at all), and that way you could give LLVM a lot less to do.

Chris has said LLVM is fast at what it is designed to do - lower IR to machine code. However, because of how convoluted it can get, and the difficulty involved in getting information from some language-specific MIR to LLVM, languages are forced to generate tons upon tons of IR so as to capture every possible detail. Then LLVM is asked to clean up and optimize this IR.

One thing to look out for is the problem of either losing language-specific information when moving from MIR to Low-level IR (be it Tilde or LLVM) or generating too much information, most of it useless.

zellyn

I wonder if this thread can attract any MLIR people to answer my question:

From Chris Lattner's descriptions of LLVM vs MLIR in various podcasts, it seems like LLVM is often used as a backend for MLIR, but only because so much work has been put into optimizing in LLVM. It also seems like MLIR is strictly a superset of LLVM in terms of capabilities.

Here's my question: It seems inevitable that people will eventually port all LLVM's smarts directly into MLIR, and remove the need to shift between the two. Is that right?

marssaxman

They solve different problems. MLIR is not a backend, but a toolkit for defining intermediate representations ("dialects") and the passes which optimize them or lower from one to another. You can use MLIR to organize everything your compiler does between its front-end and its back-end; MLIR doesn't care where the IR comes from or what you ultimately do with it.

MLIR does include a couple dozen built-in dialects for common tasks - there's an "scf" dialect which defines loops and conditionals, for example, and a "func" dialect with function call and return ops - and it so happens that one of these built-in dialects offers a 1:1 representation of the operations and types found in LLVM IR.

If you choose to structure your compiler so that all of your other dialects are ultimately lowered into this MLIR-LLVM dialect, you can then pass your MLIR-LLVM output through a converter function to get actual LLVM-IR, which you can then provide to LLVM in exchange for machine code; but that is the extent of the interaction between the two projects.
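
As a rough, hedged sketch of what that looks like (a toy function, not taken from any project discussed here), here is MLIR mixing the built-in func, scf and arith dialects (arith supplies the basic integer ops); a pipeline structured as described above would progressively lower such IR into the LLVM dialect before converting it to LLVM IR proper:

```
// Sum the integers 0..n-1 using the func, scf and arith dialects.
func.func @sum_to(%n: index) -> index {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  // scf.for carries the running total as a loop-carried value.
  %sum = scf.for %i = %c0 to %n step %c1
      iter_args(%acc = %c0) -> (index) {
    %next = arith.addi %acc, %i : index
    scf.yield %next : index
  }
  return %sum : index
}
```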

jcranmer

MLIR is less a specific IR and more a generic framework for expressing your own custom IR (which is itself composed of many pluggable IRs)--except these IRs are renamed "dialects." One of the MLIR dialects is the LLVM dialect, which can be lowered to LLVM IR.

In the future, it's plausible that dialects for hardware ISAs would be added to MLIR, and thus it would be plausible to completely bypass the LLVM IR layer for optimizations. But the final codegen layer of LLVM IR is not a simple thing (I mean, LLVM itself has three different versions of it), and the fact that GlobalISel hasn't taken over SelectionDAG after even a decade of effort should be a sign of how difficult it is to actually replicate that layer in another system to the point of replacing existing work.

CalChris

Earlier compilers were a pipeline of specialized IRs. I used to think that MLIR was an acknowledgment that this specialization was necessary. Ok, it is necessary. But MLIR's real contribution is, as you say, a generic framework for expressing your own custom IR.

superlopuh

I know of a few projects looking in that direction, each optimising for different things, and none getting near the capability of LLVM, which is going to take some time. I spoke with some of the core MLIR developers about this, and they're generally open to the notion, but it's going to take a lot of volunteer effort to get there, and it's not clear who the sherpa will be, especially given the major sponsors of the LLVM project aren't in a particular hurry. If you're interested in this feel free to look our paper up in a week or two, we've had a bit of trouble uploading it to arxiv but should be ready soon.

https://2025.cgo.org/details/cgo-2025-papers/39/A-Multi-Leve...

Here's a quick pres from the last dev meeting on how this can be leveraged to compile NNs to a RISC-V-based accelerator core: https://www.youtube.com/watch?v=RSTjn_wA16A&t=1s

fooker

All the important bits of MLIR are closed source and there’s no indication that’ll change anytime soon.

The big players have their own frontend, dialects, and mostly use LLVM backends. There’s very little common usable infrastructure that is upstreamed. Some of the upstreamed bits are missing large pieces.

ozinenko

I'd be interested to learn about such closed-source important bits and invite them to MLIR workshop / open developer meeting. Having worked on the project essentially since its inception, I am quite positive that the bits the original MLIR team considered important are completely open source.

Certainly, there are closed-source downstream dialects; that was one of the actual design goals of the project, but they are rarely as useful as one might think. I'd expect every big company with hardware to have an ISA/intrinsic-level dialect, at least as a prototype, that they won't open source for the same reason they won't open source the ISA.

What I find sad is that end-to-end flows from, e.g., PyTorch to binaries usually live outside of the LLVM project, and often in each company's downstream. There is some slow motion to fix that.

marssaxman

What important bits are those? I can't imagine what you have in mind here; my current job and my previous job have both revolved around MLIR-based compilers, and it has never seemed to me that there is anything missing. I wonder if you might be expecting MLIR to do a job it's not really meant for.

almostgotcaught

> There’s very little common usable infrastructure that is upstreamed

Hmm I wonder what all that stuff is then that's under mlir/lib?

Like what are you even talking about? First of all there are literally three upstream frontends (flang, ClangIR, and torch-mlir). Most people use PyTorch as a frontend (some people use Jax or Triton). Secondly, downstream users having their own dialects... is basically the whole point of MLIR. Core dialects like linalg, tensor, memref, arith are absolutely generically useful. In addition many (not all) MLIR-based production quality compilers are fully open source (IREE, Triton) even if wholly developed at a for-profit.

ozinenko

MLIR maintainer here, or as close to one as we get, given that we don't have a clear ownership structure. This has been discussed repeatedly in the community, and it is likely that many things will eventually get ported/reimplemented, but there is no strong push towards that. Lower-level parts of the stack, such as register allocation / machine IR / instruction selection, where LLVM has seen a lot of investment, are unlikely to move soon. At least not in a generic way.

There was a keynote at the LLVM developer meeting a couple of years ago, from somebody not involved in MLIR, presenting the differences, the likely evolution, and the lay of the land.

mshockwave

> Here's my question: It seems inevitable that people will eventually port all LLVM's smarts directly into MLIR, and remove the need to shift between the two. Is that right?

Theoretically, yes, though it would take years if not decades for sure. And setting aside (middle-end) optimizations, I think people often forget another big part of LLVM that is (much) more difficult to port: code generation. Again, it's not impossible to port the entire codegen pipeline; it just takes lots of time, and you need to try really hard to justify the advantage of moving over to MLIR, which at least needs to show that codegen with MLIR brings X% of performance improvement.

_flux

I wonder if one solution would be to have tighter integration between the layers, so the backend could ask for some IR to be generated, basically starting from the program entry points. This way the frontend wouldn't need to generate all the possible code up front.

Mind you, I haven't written a compiler since that uni course, and I last touched LLVM IR a long time ago.

melodyogonna

That is how MLIR works. Basically you have multiple levels of IR, you can optimize each level until you get to the last level.

It also has the advantage of being able to parallelize passes

mtlynch

I saw Yasser present this at Handmade Seattle in 2023.[0] He explained that when he started working on Tilde, he didn't have any special knowledge or interest in compilers. But he was reading discussions in the Handmade forums, and one of the most popular requests was for an alternative to LLVM, so he thought, "Sure, I'll do that."

[0] https://handmadecities.com/media/seattle-2023/tb/

Rochus

Cool. The author has set himself a huge task if he wants to build something like LLVM. An alternative would be to participate in a project with similar goals that is already quite far along, such as QBE or Eigen (https://github.com/EigenCompilerSuite/); both so far lack optimizers. I consider Eigen very attractive because it supports many more targets and includes assemblers and linkers for all of them. I see the advantage in having a C implementation; Eigen is unfortunately developed in C++17, but I managed to backport the parts I'm using to a moderate C++11 subset (https://github.com/rochus-keller/Eigen). There are different front-ends available, two C compilers among them. And, as mentioned, an optimizer would be great.

EDIT: just found this podcast where the author gives more information about the project goals and history (at least the beginning of the podcast is interesting): https://www.youtube.com/watch?v=f2khyLEc-Hw

wffurr

What’s unfortunate about C++17? It has some nice features that build on C++11’s safety and ergonomic improvements.

Rochus

You need a large, modern C++ compiler and standard library, which are not available for most older systems, and you're inviting an excess of dependencies because not all compilers support all parts of the newer C++ standards (in the same way), and require a lot more resources and newer versions of APIs and libraries, which further limits their usability on older systems. Furthermore, C89 and C++98 are much easier to bootstrap than a colossus like LLVM and Clang. The few "nice features" are perhaps enticing, but the costs they incur are disproportionate.

o11c

For reference, GCC 4.7 (released March 2012) was the last build of GCC that is written in C, and supports almost all of the C++11 language (4.8.1, written in C++, finished the last bits) and a fair amount of the library.

If you have to work on a system that hasn't been updated since 2014 or so (since it's fair enough to avoid the .0 releases), getting support for later C++ standards is significantly more complicated.

lerno

The QBE author has said that good compilation speed was not a design goal. It also outputs asm which then has to be run through GCC or Clang, which nullifies any benefit of being a standalone backend.

Rochus

> that good compilation speed was not a design goal

Not sure how this relates to my statement. I was talking about an optimizer, not about compilation speed. And I'm not using QBE but Eigen, for good reasons.

fguerraz

Looking at the commit history inspires some real confidence!

https://github.com/RealNeGate/Cuik/commits/master/

wild_pointer

chicken (+558998, -997)

jamil7

Cursed. I had a coworker once who would commit diffs like that, but always with the message "Cleanup". The git history was littered with "Cleanup" commits that actually hid all kinds of stuff in them. If you pulled them up on it (or anything else) they went into defensive meltdown mode, so everyone on the team just accepted it and moved on.

tasty_freeze

Back in 1990 or so I worked at a networking company (Vitalink) that was using whatever source control was popular back then. I forget which one, but the important thing was that rather than allowing multiple check outs followed by resolve, that system would lock a file when it was opened for edit and nobody else could make edits until the file was checked in.

One young developer checked out a couple files to "clean them up" with some refactoring. But because he changed some function interfaces, he needed to check out the files which called those functions. And since he was editing those files he decided to refactor them too. Pretty quickly he had more than half the files locked and everyone was beating on him for doing that. But because he had so many partial edits in progress and things weren't yet compiling and running, he prevented a dozen other people from doing much work for a few days.

pveierland

Eh, when you're hacking away as a solo developer on something big and new I don't think this matters at all. In my current project I did about 200 commits marked "wip" before having enough structure and stability to bother with proper commit messages. Whatever lets you be productive until more structure is helpful.

pkal

Perhaps, but I still think it is lazy. A very nice counterexample of someone with high commit standards can be seen in this repository: https://github.com/rmyorston/pdpmake/commits/master/

jandrewrogers

The code base may go through several almost total rewrites before it stabilizes, especially for non-trivial systems that are performance sensitive. Changes to the code may be intrinsically non-modular depending on the type of software. This prior history can be enormous yet have no value, essentially pure noise.

The efficient alternative, which I’ve seen used a few times in these cases, is to retcon a high-quality fake history into the source tree after the design has stabilized. This has proven to be far more useful to other engineers than the true history in cases like this.

Incremental commits are nice, but not all types of software development lend themselves to that, especially early in the development process. I've seen multiple cases where trying to force tidy incremental commit histories early in the process produced significantly worse outcomes than necessary.

Aachen

That doesn't have a commit history going back to what the parent said about the first 200 commits though. It starts off with basically 3 commits all called some variant of "initial public release", after which good commit messages start, so that probably skipped many intermediate WIP states

I agree that one can do good commit messages also early on though. Initial commit can be "project skeleton", then "argument parsing and start of docs", then maybe "implemented basic functionality: it now lists files when run", next "implemented -S flag to sort by size", etc. It's not as specific as "Forbid -H option in POSIX mode" and the commits are going to often be large and touch different components, but I'd say that's expected for young projects with few (concurrent) contributors

jasonjmcghee

Another example is ghostty

artemonster

I was going to write exactly that. Ambitions are great and I don't want to be dissuasive, but monumental tasks require monumental effort and monumental effort requires monumental care. That implies good discipline and certain "beauty" standards that also apply to commit messages. Bad sign :)

KolmogorovComp

Not really. In the initial phase of a project there is usually so much churn that enforcing proper commit messages is not worth it, until the dust settles down.

apocalypses

I massively disagree. It would have taken the author approximately 1 minute to write the following high quality hack-n-slash commit message:

```
Big rewrites

* Rewrote X
* Deleted Y
* Refactored Z
```

Done

kccqzy

I am deeply suspicious of anyone who doesn't bother or who is unable to explain this churn. For the right kind of people, this is an excellent opportunity to reflect: why is there churn? Why did the dust not settle down? Why was the initial approach wrong and reworked into a new approach?

I can understand this if you are coding for a corporate. But if it's your own project, you should care about it enough to write good commit messages.

muizelaar

I thought the sea-of-nodes choice was interesting.

V8 has been moving away from sea-of-nodes. Here's a video where Ben Titzer is talking about V8's reasons for moving away from sea-of-nodes: https://www.youtube.com/watch?v=Vu372dnk2Ak&t=184s. Yasser, the author of Tilde, is also in the video.

o11c

TL;DW version: sea of nodes requires a scheduling pass, which was taking 20% of their compilation time. But it sounds like there's a lot of legacy baggage, so ...

negate32

My GCM doesn't take 20% of my compile times last I checked but even so, V8 is gonna be in a special camp because they're comparing against a compiler that doesn't do much code motion. LLVM does a lot of code motion so that "20%" is still being paid by LLVM during their hoisting and local scheduling passes.

gergo_barany

> a decent linear scan allocator which will eventually be replaced with graph coloring for optimized builds.

Before setting out to implement 1980s-style graph coloring, I would suggest considering SSA-based register allocation instead: https://compilers.cs.uni-saarland.de/projects/ssara/ , I find the slides at https://compilers.cs.uni-saarland.de/projects/ssara/hack_ssa... especially useful.

Graph coloring is a nice model for the register assignment problem. But that's a relatively easy part of overall register allocation. If your coloring fails, you need to decide what to spill and how. Graph coloring does not help you with this, you will end up having to iterate coloring and spilling until convergence, and you may spill too much as a result.

But if your program is in SSA, the special properties of SSA can be used to properly separate these subphases, do a single spilling pass first (still not easy!) and then do a coloring that is guaranteed to succeed.

I haven't looked at LLVM in a while, but 10-15 years ago it used to transform out of SSA form just before register allocation. If I had to guess, I would guess it still does so. Not destroying SSA too early could actually be a significant differentiator to LLVM's "cruft".

gergo_barany

Also, for a different notion of "cruft", informally it seems to me like new SSA-based compilers tend to choose an SSA representation with basic block arguments instead of the traditional phi instructions. There are probably reasons for this! I'm not aware of a Sea of Nodes compiler with (some notion corresponding to) block arguments, but it might be fun to explore this when designing a new compiler from the ground up. Might be too late for TB, though.

mungaihaha

> I believe it's (LLVM) far too slow at compiling and far too big to be fixed from the inside

What are you doing to make sure Tilde does not end up like this?

negate32

One of the big things which makes LLVM very slow is the abundance of passes; I believe I last counted 75 for an unoptimized function. My solution for this is writing more combined passes: due to the SoN design I'm combining a lot of things which are traditionally separate passes. For instance, my equivalent to "SimplifyCFG", "GVNPass", "InstCombine", "EarlyCSEPass" and "JumpThreadingPass" is one combined peephole solver which runs faster than all of these passes separately. This is for two main reasons:

* Less cache churn: I'm doing more work per cache line loaded in (rather than rescanning the same function over and over again).

* Combining mutually beneficial optimizations can lead to fewer phase-ordering problems and a better solve (this is why SCCP is better than running DCE and constant prop separately; see the sketch below).
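
A minimal illustration of that SCCP point, on a hypothetical C snippet (not taken from Tilde or LLVM): propagating constants only along branches proven reachable lets the combined analysis conclude more than constant propagation and dead-code elimination run as separate passes.

```
int f(int n) {
    int x = 1;
    for (int i = 0; i < n; i++) {
        if (x != 1) {
            x = 2;   /* only reachable if x were ever something other than 1 */
        }
    }
    /* A pessimistic constant propagator sees the loop-carried x as "1 or 2"
       and gives up, and DCE alone cannot delete the branch because x is not
       proven constant. SCCP optimistically assumes x == 1, so it never marks
       the "x = 2" edge executable; the assumption survives the fixed point
       and the whole branch folds away, leaving f to return the constant 1. */
    return x;
}
```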

In a few years when TB is mature, I'd wager I'll have maybe 10-20 real passes for the "-O2 competitive" optimizer pipeline because in practice there's no need to have so many passes.

gergo_barany

If this is one of the main things you want to demonstrate, wouldn't it be better to focus on this one goal first, instead of the whole pipeline from a C preprocessor to directly linked executables?

Essentially, if you say that LLVM's mid-end in particular is slow, I would expect you to present a drop-in replacement for LLVM's mid-end opt tool. You could leave C-to-LLVM-bitcode to Clang. You could leave LLVM-bitcode-to-machine-code to llc. Just like opt, take unoptimized LLVM bitcode as input and produce optimized LLVM bitcode as output. You would get a much fairer apples to apples comparison of both code quality and mid-end compiler speed (your website already mentions that you aren't measuring apples-to-apples times), and you would duplicate much less work.

Alternatively, look into existing Sea of Nodes compilers and see if you can build your demonstrator into them. LibFIRM is such a C compiler: https://libfirm.github.io/ There may be others.

It just seems like you are mixing two things: On the one hand, you are making some very concrete technical statements that integrated optimizations are good and the Sea of Nodes is a great way to get there. A credible demonstrator for this would be very welcome and of great interest to the wider compiler community. On the other hand, you are doing a rite-of-passage project of writing a self-hosting C compiler. I don't mean this unkindly, but that part is less interesting for anyone besides yourself.

EDIT: I also wanted to mention that the approach I suggest is exactly how LLVM became well-known and popular. It was not because of Clang; Clang did not even exist for the first eight years or so of LLVM's existence. Instead, LLVM focused on what it wanted to demonstrate: a different approach to mid-end optimizations compared to the state of the art at the time. Parsing C code was not part of that, so LLVM left that to an external component (which happened to be GCC).

DannyBee

I'm very confused what magic you believe will achieve what has not so far been achieved.

I'm also confused why you believe LLVM didn't start out the exact same way?

I say this as one of the main people responsible for working on combined pass replacements in both GCC and LLVM for things that were reasonable to be combined.

I actually love destroying lots of passes in favor of better combined ones. In that sense, i'm the biggest fan in the world of these kinds of efforts.

But the reason these passes exist is not cruft, or 20 years of laziness; it's because it's very hard to replace them with combined algorithms that are both faster and achieve the same results.

What exactly do you plan on replacing GVN + Simplify CFG + Jump Threading + correlated value prop with?

It took years of cherrypicking research and significant algorithm development to develop algorithms for this that had reasonable timebounds, were faster, and could do better than all of them combined. The algorithm is quite complex, and it's hard to prove it terminates in all cases, actually. The number of people who understand it is pretty small because of the complexity. That's before you get to applied engineering of putting it in a production compiler.

These days, as the person originally responsible for it, i'd say it's not better enough for the complexity, even though it is definitely faster and more complete and would let you replace these passes.

Meanwhile, you seem to think you will mature everything and get there in a few years.

I could believe you will achieve some percent of GCC or LLVM's performance, but that's not the reason these passes exist. They exist because that is what it reasonably takes to achieve LLVM (and GCC's) performance across a wide variety of code, for some acceptable level of algorithm complexity and maintainability.

So if you told me you were only shooting for 80% across some particular subset of code, I could believe 10-20 passes. If you told me you were going to build a different product that targets a different audience, or in a different way, I could maybe believe it.

But for what you say here, I think you vastly underestimate the difficulty and vastly underappreciate the effort that goes into these things. This is hundreds of very smart people working on things for decades. It's one thing to have a healthy disrespect for the impossible. It's another to think you will, in a few years, outdo hundreds of smart, capable engineers on pure technical achievement.

That strikes me as somewhere between hubris and insanity.

People also pretty much stopped building and publishing general purpose compiler optimization algorithms a decade ago, moving towards much more specialized algorithms and ML focused things and whatnot.

This is because in large part, there isn't a lot left worth doing.

So unless you've got magic bullets nobody else has, either you won't achieve the same performance level, or it will be slow, or it will take you a significant amount of algorithm development and engineering well beyond "a few years".

I say this not to dissuade you, but to temper your apparent expectations and view.

Honestly - I wish you the best of luck, and hope you succeed at it.

rendaw

Critical responses from people in the industry are what I come here to read. I have no doubt of your credentials and I'm not at all qualified to judge the technical details here, but your reply comes off as an emotional kneejerk.

I read it a few times and as best I can get this is what you're saying:

- You came up with a similar combined replacement pass for LLVM based on years of personal and external research.

- It's faster and has more functionality.

- It's very complex and you're not comfortable that it's possible to achieve various reliability guarantees, so it is unreleased

- Therefore (?) you think the Tilde author also couldn't possibly succeed

AFAICT you also believe that the Tilde author hasn't completed their replacement pass. From their post my take was that it was already done. The part that will mature is additional passes, or maybe optimizations/bugfixes, but not the MVP development.

Your main arguments seem to be probability and appeal to authority (external research, assigned responsibility, industry association). Pretty much all projects and startups fail, but it's because people attempt them that some succeed.

Is the author betting their career on this? Why do their expectations need to be tempered?

I'd be interested in hearing concrete criticisms of their algorithms (have you looked at the code?) or oversights in the design. Maybe the author would too! If you let the author know, maybe you could think of a solution to reduce the complexity or improve the guarantees together.

snowfarthing

I just had a random thought: perhaps it would be a good idea to have a project that doesn't do optimizations, but just focuses on fast compiling.

Then again, I now can't help but wonder if LLVM (or even GCC) would be fast, if you just turned off all the optimizations ...

(Of course, at this point, I can't help but think "you don't need to worry about the speed of compilation" in things like Common Lisp or Smalltalk, because everything is compiled incrementally and immediately, so you don't have to wait for the entire project to compile before you could test something ...)

ajross

> Then again, I now can't help but wonder if LLVM (or even GCC) would be fast, if you just turned off all the optimizations ...

It's not the optimizations really, it's the language front ends. Rust and C++ are extremely analysis-heavy. Try generating a comparable binary in C (or a kernel build, which is likely to be much larger!) and see how fast these compilers can be.

christophilus

Go’s internal compiler / linker is kind of like that. So is qbe[0] iirc.

https://c9x.me/compile/

fermigier

"You either die a hero or live long enough to see yourself become the villain".

Ygg2

You either reinvent the wheel but square or live long enough to make it a circle.

laweijfmvo

  > It's been 20 years and cruft has built up, time for a "redo".
 
Ah... is this one of those "I rewrote it and it's better" things, where, when people inevitably discover issues that the "cruft" was handling, the author will blame the user?

rc_mob

What a strangely pessimistic and negative comment.

snowfarthing

I think this is more a problem with the nature of technology in general.

If we want simple and fast, we can do that, but sometimes it doesn't cover the corner cases that the slow and complicated stuff does -- and as you fix those things, the "simple and fast" becomes "complicated and slow".

But, as others have observed about GCC vs LLVM (with LLVM having had a similar life cycle), the added competition forced GCC to step up their game, and both projects have benefited from that competition -- even if, as time goes on, they get more and more similar to what each can do.

I think all our efforts suffer from the effects of the Second Law of Thermodynamics: "You can't win. You can't break even. And it's the only game in town."

cfiggers

Tsoding explored this project on a recent stream: https://youtu.be/aKk_r9ZwXQw?si=dvZAZkOX3xd7yjTw

nurettin

I got tsoding fatigue after youtube started suggesting him on an hourly basis. He's on ignore.

muke101

If you're going to rewrite LLVM, you should avoid just trying to 'do it again but less bloated', because that'll end up where LLVM is now once you've added enough features and optimisation to be competitive.

Rewriting LLVM gives you the opportunity to rethink some of its main problems. Of those I think two big ones include Tablegen and peephole optimisations.

The backend code for LLVM is awful, and tablegen only partially addresses the problem. Most LLVM code for defining instruction opcodes amounts to multiple huge switch statements that stuff every opcode into them; it's disgusting. This code is begging for a more elegant solution; I think a functional approach would solve a lot of the problems.

The peephole optimisation in the InstCombine pass is a huge collection of handwritten rules that has been accumulated over time. You probably don't want to try and redo this yourself, but it will also be a big barrier to achieving competitive optimisation. You could try to solve the problem by using a superoptimisation approach from the beginning. Look into the Souper paper, which automatically generates peepholes for LLVM: (https://github.com/google/souper, https://arxiv.org/pdf/1711.04422.pdf).

Lastly as I hate C++ I have to throw in an obligatory suggestion to rewrite using Rust :p

jcranmer

> The backend code for LLVM is awful, and tablegen only partially addresses the problem. Most LLVM code for defining instruction opcodes amounts to multiple huge switch statements that stuff every opcode into them, its disgusting. This code is begging for a more elegant solution, I think a functional approach would solve a lot of the problems.

So one of the main problems you run into is that your elegant solution only works about 60-80% of the time. The rest of the time, you end up falling back onto near-unmaintainable, horribly inelegant kludges that end up having to exist because gee, real architectures are full of inelegant kludges in the first place.

Recently, I've been working on a decompiler, and I started out with going for a nice, elegant solution that tries as hard as possible to avoid the nasty pile of switch statements. And this is easy mode--I'm not supporting any ugly ISA extensions, I'm only targeting ancient, simple hardware! And still I ran into the limitations of the elegant solution, and had to introduce ugly kludges to make it work.

The saving grace is that I plan to rip out all of this manual work with a fully automatically-generated solution. Except that's only feasible in a decompiler, since the design of that solution starts by completely ignoring compatibility with assembly (ISAs turn out to be simpler if you think of them as "what do these bytes do" rather than "what does this instruction do")... and I'm worried that it's going to end up with inelegant kludges because the problem space more or less mandates it.

> You could try and solve the problem by using a superoprimisation approach from the beginning. Look into the Souper paper which automatically generates peepholes for LLVM:

One of the problems that Souper ran into is that LLVM IR is too abstract for superoptimization to be viable. Rather than the promise of an automatic peephole optimizer, it's instead morphed more into "here's some suggestions for possible peepholes". You need a really accurate cost model for superoptimization to work well, and since LLVM IR gets shoved through instruction selection and instruction scheduling, the link between LLVM instructions and actual instructions is just too tenuous to build the kind of cost model a superoptimizer needs (even if LLVM does have a very good cost model for the actual machine instructions!).

fuhsnn

>So one of the main problems you run into is that your elegant solution only works about 60-80% of the time. The rest of the time, you end up falling back onto near-unmaintainable, horribly inelegant kludges that end up having to exist

This is generally true, though small compiler backends have the luxury of simply refusing to support such use cases. Take QBE and Cranelift for example: the former lacks x87 support [1], the latter doesn't support varargs [2], which means neither of them supports the full x86-64 ABI for C99.

[1]https://github.com/michaelforney/cproc?tab=readme-ov-file#wh...

[2]https://github.com/bytecodealliance/wasmtime/issues/1030

muth02446

I think you are generally correct but the two examples you gave "triggered" me ;-)

What damage would there be if gcc or LLVM decided to not support x87 anymore? It is not much different from dropping an ISA like IA64. You can still use the older compilers if you need to.

Similarly, what is varargs used for? Pretty much only for C and its unfortunate printf, scanf stdlib calls. If a backend decides not to support C, all this headache goes away. The problem is, of course, that the first thing every new backend designer does is to write a C frontend.

o11c

"Everything should be as simple as it can be but not simpler!" —Roger Sessions, loosely after Albert Einstein

einpoklum

I'm not familiar with a lot of the acronyms and catch-phrases already in the first part of the article... let me try to make a bit of sense of this:

  IR = Intermediate Representation
  SSA = Static Single Assignment
  CFG = Control-Flow Graph (not Context-Free Grammar)
And "sea of nodes" is this: https://en.wikipedia.org/wiki/Sea_of_nodes ... IIANM, that means that instead of assuming a global sequence of all program (SSA) instructions, which respects the dependecies - you only have a graph with the partial order defined by the dependencies, i.e. individual instructions are nodes that "float" in the sea.