Skip to content(if available)orjump to list(if available)

Tilde, My LLVM Alternative

Tilde, My LLVM Alternative

40 comments

·January 21, 2025

ksec

>I'm calling it Tilde (or TB for tilde backend) and the reasons are pretty simple, i believe it's far too slow at compiling and far too big to be fixed from the inside. It's been 20 years and cruft has built up, time for a "redo".

That put a smile on my face because I remember that was how LLVM was born out of frustration with GCC.

I dont know how the modern GCC and LLVM compares, I remember LLVM was fast but resulting binary were not as optimised, once those optimisation added it became slower. While LLVM was a wake up call to modernise GCC and make it faster. In the end competition made both a lot better.

I believe some industry ( Gaming ) used to swear by VS Studio / MS Compiler / Intel Compiler or languages that depends / prefer the Borland ( What ever they are called now ) compiler. Its been very long since I last looked I am wondering if those two are still used or have we all merged mostly into LLVM / GCC?

pjmlp

It is pretty much Visual Studion on Windows and XBox, Nintendo and Sony have clang forks.

Embarcadero owns Borland, unfortunely stuff like C++ Builder doesn't seem to get much people outside big corps wanting to use it, which is a shame given its RAD capabilities and GUI design tooling for C++.

Also has a standard ABI between Delphi and C++ Builder, which allows to similar development workflows that .NET offered later with C#/VB alongside Managed C++ extensions (later replaced by C++/CLI).

willvarfar

Back 20 or more years ago I used to do a lot of rec math competition programming and found that the metrowerks c++ compiler made massively faster programs than gcc, vsstudio, intel and everything else I tried then.

This seemed to be simply down to variable alignment; the programs took more memory but ran much faster, particularly multi-core (which was still high end then).

And this was on x86 where metrowerks weren't really competing, and was probably accidental. But the programs it compiled were fast.

I'd be surprised if anyone even knew that metrowerks had a c++ compiler on x86 on windows. At the time metrowerks were on the tail end of their domination of mac compilers from before mac ran on x86.

dan_hawkins

I was using Metrowerks C++ compiler suite to develop code for Dragonball (68000) embedded system 22 years ago!

bbatha

Intel still has a decent use for compiling math heavy code for intel processors — so it gets a decent amount of use in HPC applications. It has some of the best vectorization passes but they only work with actual intel cpus. So it’s starting to get less traction as AMD takes the performance crown and as vectorized math moves to the gpu.

CUViper

Intel's compilers are now based on LLVM too.

muizelaar

I thought the sea-of-nodes choice was interesting.

V8 has been moving away from sea-of-nodes. Here's a video where Ben Titzer is talking about V8's reasons for moving away from sea-of-nodes: https://www.youtube.com/watch?v=Vu372dnk2Ak&t=184s. Yasser, the author of Tilde, is is also in the video.

melodyogonna

Chris Lattner seems to have also created an alternative for LLVM - https://mlir.llvm.org/

Because of how the architecture works, LLVM is one of the backends, but it doesn't have to be. Very interesting project, you could do a lot more IR processing before descending to LLVM (if you use that), that way you could give LLVM a lot less to do.

Chris has said LLVM is fast at what it is designed to do - lower IR to machine code. However, because of how convoluted it can get, and the difficulty involved in getting information from some language-specific MIR to LLVM, languages are forced to generate tons upon tons of IR so as to capture every possible detail. Then LLVM is asked to clean up and optimize this IR.

One thing to look out for is the problem of either losing language-specific information when moving from MIR to Low-level IR (be it Tilde or LLVM) or generating too much information, most of it useless.

_flux

I wonder if one solution would be have tighter integration between the layers, so the backend could ask for some IR to be generated? Basically starting from the program entrypoints. This way the frontend wouldn't need to generate all the possible code up-front.

Mind you, I've never written a compiler after that Uni course and touched LLVM IR a long time ago

mtlynch

I saw Yasser present this at Handmade Seattle in 2023.[0] He explained that when he started working on Tilde, he didn't have any special knowledge or interest in compilers. But he was reading discussions in the Handmade forums, and one of the most popular requests was for an alternative to LLVM, so he thought, "Sure, I'll do that."

[0] https://handmadecities.com/media/seattle-2023/tb/

fguerraz

Looking at the commit history inspires some real confidence!

https://github.com/RealNeGate/Cuik/commits/master/

pveierland

Eh, when you're hacking away as a solo developer on something big and new I don't think this matters at all. In my current project I did about 200x commits marked "wip" before having enough structure and stability to bother with proper commit messages. Whatever lets you be productive until more structure is helpful.

wild_pointer

chicken (+558998, -997)

jamil7

Cursed. I had a coworker once would commit diffs like that but always with the message "Cleanup". The git history was littered with "Cleanup" commits that actually hid all kinds of stuff in them. If you pulled them up on it (or anything else) they went into defensive meltdown mode, so everyone on the team just accepted it and moved on.

artemonster

went to write exactly that. Ambitions are great and I dont want to be dissuasive, but monumental tasks require monumental effort and monumental effort requires monumental care. That implies good discipline and certain "beauty" standards that also apply to commit messages. Bad sign :)

KolmogorovComp

Not really. In the initial phase of a project there is usually so much churn than enforcing proper commit messages is not worth it, until the dust settle down.

kccqzy

I am deeply suspicious of anyone who doesn't bother or who is unable to explain this churn. For the right kind of people, this is an excellent opportunity to reflect: why is there churn? Why did the dust not settle down? Why was the initial approach wrong and reworked into a new approach?

Rochus

Cool. The author has set himself a huge task if he wants to build something like LLVM. An alternative would be to participate in a project with similar goals that is already quite progressed, such as QBE or Eigen (https://github.com/EigenCompilerSuite/) ; both so far lack of optimizers. I consider Eigen very attractive because it supports much more targets and includes assemblers and linkers for all targets. I see the advantage in having a C implementation; Eigen is unfortunately developed in C++17, but I managed to backport the parts I'm using to a moderate C++11 subset (https://github.com/rochus-keller/Eigen). There are different front-ends available, two C compilers among them. And - as mentioned - an optimizer would be great.

EDIT: just found this podcast where the author gives more informations about the project goals and history (at least the beginning of the podcast is interesting): https://www.youtube.com/watch?v=f2khyLEc-Hw

wffurr

What’s unfortunate about C++17? It has some nice features that build on C++11’s safety and ergonomic improvements.

Rochus

You need a large, modern C++ compiler and standard library, which are not available for most older systems, and you're inviting an excess of dependencies because not all compilers support all parts of the newer C++ standards (in the same way), and require a lot more resources and newer versions of APIs and libraries, which further limits their usability on older systems. Furthermore, C89 and C++98 are much easier to bootstrap than a colossus like LLVM and Clang. The few "nice features" are perhaps enticing, but the costs they incur are disproportionate.

cfiggers

Tsoding explored this project on a recent stream: https://youtu.be/aKk_r9ZwXQw?si=dvZAZkOX3xd7yjTw

IshKebab

I dunno if "twice as fast as Clang" is very impressive. How fast is it compared to Clang 1.0?

Also starting a new project like this in C is an interesting choice.

orliesaurus

The maintainer said that LLVM has 10M lines of code making it too hard to improve, so he's building its own. That sounds weird to me: but good luck I guess?

coolThingsFirst

Is it just me or I find it difficult to believe that 19 year olds can implement the LLVM alternative?

muke101

If you're going to rewrite LLVM, you should avoid just trying to 'do it again but less bloated', because that'll end up where LLVM is now once you've added enough features and optimisation to be competitive.

Rewriting LLVM gives you the opportunity to rethink some of its main problems. Of those I think two big ones include Tablegen and peephole optimisations.

The backend code for LLVM is awful, and tablegen only partially addresses the problem. Most LLVM code for defining instruction opcodes amounts to multiple huge switch statements that stuff every opcode into them, its disgusting. This code is begging for a more elegant solution, I think a functional approach would solve a lot of the problems.

The peephole optimisation in the InstCombime pass is a huge collection of handwritten rules that's been accumulated over time. You probably don't want to try and redo this yourself but it will also be a big barrier to achieving competitive optimisation. You could try and solve the problem by using a superoprimisation approach from the beginning. Look into the Souper paper which automatically generates peepholes for LLVM: (https://github.com/google/souper, https://arxiv.org/pdf/1711.04422.pdf).

Lastly as I hate C++ I have to throw in an obligatory suggestion to rewrite using Rust :p

jcranmer

> The backend code for LLVM is awful, and tablegen only partially addresses the problem. Most LLVM code for defining instruction opcodes amounts to multiple huge switch statements that stuff every opcode into them, its disgusting. This code is begging for a more elegant solution, I think a functional approach would solve a lot of the problems.

So one of the main problems you run into is that your elegant solution only works about 60-80% of the time. The rest of the time, you end up falling back onto near-unmaintainable, horribly inelegant kludges that end up having to exist because gee, real architectures are full of inelegant kludges in the first place.

Recently, I've been working on a decompiler, and I started out with going for a nice, elegant solution that tries as hard as possible to avoid the nasty pile of switch statements. And this is easy mode--I'm not supporting any ugly ISA extensions, I'm only targeting ancient, simple hardware! And still I ran into the limitations of the elegant solution, and had to introduce ugly kludges to make it work.

The saving grace is that I plan to rip out all of this manual work with a fully automatically-generated solution. Except that's only feasible in a decompiler, since the design of that solution starts by completely ignoring compatibility with assembly (ISAs turn out to be simpler if you think of them as "what do these bytes do" rather than "what does this instruction do")... and I'm worried that it's going to end up with inelegant kludges because the problem space more or less mandates it.

> You could try and solve the problem by using a superoprimisation approach from the beginning. Look into the Souper paper which automatically generates peepholes for LLVM:

One of the problems that Souper ran into is that LLVM IR is too abstract for superoptimization to be viable. Rather than the promise of an automatic peephole optimizer, it's instead morphed more into "here's some suggestions for possible peepholes". You need a really accurate cost model for superoptimization to work well, and since LLVM IR gets shoved through instruction selection and instruction scheduling, the link between LLVM instructions and actual instructions is just too tenuous to build the kind of cost model a superoptimizer needs (even if LLVM does have a very good cost model for the actual machine instructions!).

s3graham

This looks pretty cool. I've been looking at all the "small" backends recently. It's so much nicer to work with one of them than trying to wrangle LLVM.

QBE, MIR, & IR (php's) are all worth a look too.

Personally I've settled on IR for now because it seemed to match my needs the most closely. It's actively developed, has aarch64 in addition to x64 (looks like TB has just started that?), does x64 Windows ABI, and seems to generate decent code quickly.