A year of Rust in ClickHouse

101 comments

April 5, 2025

pjmlp

I love that it goes both ways, about plus and minus of both languages, including rewriting back into C++ when it made sense, and the side joke about rewriting existing CLI tools in Rust.

While C++ isn't perfect, has the warts of a 50-year-old language, and probably will never match Rust's safety, we would already be in a much better place if at least everyone used the tools at their disposal from the last 30 years.

While I would advise using Rust for some security-critical scenarios, there are many others where it is still getting there, and there are requirements to take into account other than affine types.

simonask

Coming from C++, my favorite take on Rust is that it is fundamentally about productivity.

Avoiding UB is a serious drain on productivity in C++, and every new language or library feature comes with additional pitfalls, increasing the mental load.

This is to say: The benefit of Rust is not actually about "security critical scenarios", but much more generally about delivering the same quality of code in a fraction of the time.

pjmlp

I agree with the general sentiment. Still, even though it doesn't sort out all safety issues, actually programming in C++ instead of C with a C++ compiler, and making use of the plethora of analysers, would already prevent many issues.

Better to be 80% safer than not safer at all.

j-krieger

It's not only about avoiding UB in one area, it's about hidden footguns which you may not know about. I have been using C++ for a decade now and there are still causes of UB I learn about every week or so.

DeathArrow

C++ can be safe enough if you proceed with care.

What I dislike about C++ is that it grew to become a monster of a language, containing all programming paradigms and ideas, good or bad, known to mankind.

It's so monstrously huge no human can hold its entire complexity in his head.

C++ allows you to do things in 10000 different ways, and developers will do just that. Often in the same code base.

That being said, I would use a sane subset of C++ every day over Rust. It's not that I hate Rust or that I don't think it is good, technically sound and capable. It just doesn't fit the way I think and like to work.

I like to keep a simple model in mind. For me, the memory is just a huge array from which we copy data to CPU cache, move some to CPU registers, execute instructions and fetch data from the registers and put it again in some part of that huge array, to be used later. Rust adds a lot of complexity over this simple mental model of mine.

ultimaweapon

> C++ can be safe enough if you proceed with care.

The problem with this is that if you have a team working on a C++ product, you will need some people who can catch memory bugs to review all code before merging. Even with this approach it is still possible to miss some memory bugs, since the reviewer needs to fully understand each object's lifetime, which is time-consuming during code review.

I'm working at a company that runs on a server application written in C/C++. The code base is very large and we always have memory bugs that require ASAN in production to fix. We started migrating each part to Rust one year ago and we have never had a single crash from Rust code. The reason we chose Rust is that it is a server application that is computation-intensive and latency-sensitive and handles a large number of active connections.

Try to keep using Rust until you are comfortable with it and you will like it. It fits your simple mental model. I can say this because I was a C++ user my whole life and switched to Rust recently.

alexey-salmin

> The problem with this is that if you have a team working on a C++ product, you will need some people who can catch memory bugs to review all code before merging. Even with this approach it is still possible to miss some memory bugs, since the reviewer needs to fully understand each object's lifetime, which is time-consuming during code review.

Nah, if you're trying to match every "new" with a "delete" during the code review, you've already lost the battle. You can probably succeed when the code is added, but then the edits start to flow and sooner or later it's gone. Reviews are mostly good to catch design problems, not bugs.

The only reliable approach I know is to have a strict rule of never mixing memory management with business logic. Nothing else works well enough but this one however works remarkably well.

Business logic should rely on containers, starting with simple unique_ptrs and vectors and going deeper and deeper into the custom land when appropriate. If you can't find a suitable standard container, you build a custom one. The principal difference of "writing a custom container when you need it" compared to "integrate custom memory-management into the business logic when you need it" is that containers are:

* well understood

* well tested

* relatively small code-wise

* almost never change once implemented

None of the above applies to the business logic, it's the complete opposite.

Think of it kind of like programming in Java: someone has to write the memory management and it's a hell of a job. However, once this is done, programming the ever-changing business logic is easy and safe.

You can live the same life in C++ AND also have the ability to put on the "Doomguy of the memory management" shoes whenever you feel like it. Just don't forget to take off the shoes of the "business logic guy" when you do it; you can't wear both at the same time.

j-krieger

> The code base is very large and we always have memory bugs that require ASAN in production to fix

This is a big part of why Rust works. We also never have errors that we can't reproduce in development.

joatmon-snoo

> if you proceed with care

Yes, but that is _incredibly_ time consuming. You have to set up asan, msan, tsan, and valgrind. If you want linting you need to do shenanigans to wire up clang-tidy.

I also like simple mental models. I like not having to figure out the cmake modifications to pull in a new library. I like having a search engine when I need a new library for x. I like when libraries return Result<Ok, Err> instead of ping-ponging between C libraries which indicate errors using retval flags and C++ libraries that throw std::runtime_error(). I like not dealing with void* pointer casting.
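To make the Result point concrete, here is a minimal sketch. `ConfigError` and `parse_port` are hypothetical names made up for illustration, not from any real library. The failure modes are part of the return type, so the caller can't forget to check a flag or miss an exception.

```rust
// Hypothetical error type for a hypothetical config lookup.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    NotANumber(String),
}

// Both failure modes are visible in the signature; `?` propagates the first
// one, `map_err` translates the parse failure into our own error type.
fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let text = raw.ok_or(ConfigError::Missing)?;
    text.trim()
        .parse::<u16>()
        .map_err(|_| ConfigError::NotANumber(text.to_string()))
}
```

A caller has to `match` on the result (or propagate it with `?`) before it can touch the port number at all.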

angra_mainyu

I find Zig to be a saner Rust alternative, with the caveat that it is still a bit immature (but getting there).

Give it a few years and it will be a very strong contender.

The true C successor.

davemp

Yeah, I’m leaning towards zig but I’m a bit on the fence still.

For Rust, I kind of got tired of writing unsafe rust for embedded, but that’s addressable afaik. The real dealbreaker was that after 10k+ lines of code I still will pop open the source of a library that solves a simple problem and the code looks indecipherable. I also don’t really agree with the dependency explosion that cargo encourages.

Zig is very nice in that it has the most ergonomic struct usage I’ve encountered. The stdlib could really use some improvement though. Comptime is very cool, but I also worry if the community will get undisciplined with it.

DeathArrow

I like Zig, too. Not sure if it's a Rust alternative as in you still have to manage the memory yourself.

But it is much simpler, easier to read, easier to understand, easier to follow and easier to reason about. It's less verbose and more productive.

It feels like what C would look like had it been invented today.

eptcyka

What complexity specifically does Rust add to that model?

arnsholt

I can think of three things off the top of my head:

- Rust doesn’t let you pretend that memory is a flat array of bytes

- Single ownership of data can be annoying in some cases

- The borrow checker pointing out that you’re trying to do something stupid with pointers (again) can be annoying

Of course, I’m of the opinion that the hassles are worth it, especially the borrow checker. Almost every time I have to fight the borrow checker, it’s because I haven’t thought properly about the pointers involved and tried to do something stupid.

mplanchard

I love this. We are a rust shop and we use clickhouse a fair bit. We’ve been quite impressed with its speed and flexibility. I’m glad to see this kind of direct, real-world feedback around both the benefits and difficulties of mixing rust and C++, which we have also had to do a bit of (albeit in the opposite direction: a smattering of C++ in a sea of rust).

I’m not sure if the poster here is the post author, but it would be great if the author would consider filling out this survey that was recently released asking for feedback on the future of rust’s vision: https://blog.rust-lang.org/2025/04/04/vision-doc-survey.html

I’d love to see rust become the de facto standard for cross-language portable code by virtue of its ease of use, but as this and our experience highlights, there’s some way to go yet!

mplanchard

Oh one more note, regarding hermetic builds: I have tried to package clickhouse in nix for our dev environment and CI, but its build is pretty complicated, so I resorted to just pulling the precompiled binaries.

Nix, via the standard rust integration or via something like crane, is actually quite nice for building rust/C++ combo projects, so it’d be awesome if the team might consider this as a means of achieving reproducibility. I’d imagine they’d have an easier time of it than I did, given they are more familiar with their own build process.

ChocolateGod

> If you do an experiment and say "C++" anywhere on the Internet, in a minute someone will chime in and educate you about the existence of Rust.

Many people see this as a problem. The response to TypeScript choosing Go over Rust was pretty gross imho, no one should be abused for choosing a language.

pjmlp

The actual issue wasn't Go over Rust, rather having key people responsible for C# design, on a Microsoft project, going for a Google language.

At the same time, the .NET team routinely talks about .NET's image problem outside traditional Microsoft shops, and decisions like this naturally aren't helping one bit.

dep_b

Yeah, I loved the language and the IDE, but only boring enterprise stuff gets built with it. Also, C# got so complex, since it had to absorb every idea from F# rather than making F# a viable programming language in its own right and improving interop...

At a given point after being a C# programmer for years I still encountered patterns that were completely unreadable to me.

ncruces

That's what happens when you have a high-level VM that wants to support high-level concepts from multiple high-level languages, and all your languages need to be able to talk to each other.

Same thing will happen to Wasm as it decides to add more and more high level stuff “to avoid shipping multiple GCs” and “to get different languages to talk to each other.” As soon as you want to abstract over more than “a portable CPU and memory” you get into that mess.

It never worked out any better in the past than it did for the JVM and CLR, but let's keep trying.

neonsunset

Do you have a specific snippet in mind which demonstrates the issue? It is likely more of a team or a community issue when it comes to writing unreadable code than a language one since it tends to happen in every sufficiently powerful language.

C# did not “have to absorb every idea from F#”. This is not how programming language development works. You can read LDM notes at https://github.com/dotnet/csharplang/discussions?discussions... and specs are documented in the repo.

> rather than making F# a viable programming language

F# is a viable language aside from using a few specific libraries that don’t play nicely with it, or writing ref struct heavy code. I’m not sure what makes you think it is not. In comparison, it is probably more viable for shipping products than Scala, Clojure, OCaml and Haskell.

nobleach

I think it speaks to an incredibly pragmatic viewpoint though. When you take a look at your flagship language/ecosystem and you say, "hey, this is great for building entire systems for doctor/patient data... or perhaps even banking software"... but recognize that "it's probably not the best thing for building a compiler/tooling". Google themselves show the same pragmatism when NOT using Go for Android. They prefer Kotlin these days.

pjmlp

Completely different reasoning: choosing Go instead of Kotlin would mean rewriting 100% of Android userspace from scratch, minus the C++ libraries for Treble drivers, graphics and the ART toolchain.

And Google did do exactly that with Fuchsia, which doesn't seem to be going to power anything beyond Nest screens.

ChocolateGod

.NET isn't famous for producing portable binaries and I don't think it's any better now, other than some experimental modes that break most code.

pjmlp

Do you want more portable than bytecode with a dynamic compiler? Apparently the greatest thing on Earth as per WebAssembly folks.

As for AOT compilation, there have been multiple approaches since the early days, and the latest, Native AOT is good enough for everything required to write a TypeScript compiler, including better WebAssembly support than the Go compiler, thanks to Blazor infrastructure.

neonsunset

> no one should be abused for choosing a language

Can you link to the abuse?

hitekker

https://news.ycombinator.com/item?id=43413702 is one example.

A Rustacean implied Go was not memory safe and that Microsoft couldn't understand the power of Rust. Steve Klabnik & others told them off. But other Rustaceans, like Patrick Walton, argued that Go has memory safety issues in theory.

neonsunset

https://dictionary.cambridge.org/dictionary/english/abuse

Rustacean, Gopher... this is an embarrassing way of looking at it.

And, speaking of, Go is not a memory safe language when you reach for its concurrency primitives as it very easily lets you violate memory safety (as opposed to Rust, .NET and JVM, where instead you get logic bugs but not memory safety ones).

rc00

https://github.com/microsoft/typescript-go/discussions/411

Some of the worst comments have been scrubbed but they might be in one of the internet archival sites.

ChocolateGod

I recall a thread on Twitter where someone called the TypeScript developers "brain dead *tards" for using Go over Rust.

Ygg2

You'll get loud, obnoxious idiots in any big enough crowd. Also [source needed].

johnny22

That's really dumb, but it's hard to call one instance abuse. Now, I do believe it turns into abuse if a bunch of people do it. Especially if they are bringing their insults specifically into your space (repo forge, mailing lists, flooding your replies on social media, etc).

angra_mainyu

The Rust community is unfortunately plagued by this subset of devs who are zealous (and downright toxic) in their shilling for their favorite language.

Before anyone gets triggered and starts typing up a reply: "SUBSET" is the word I used.

neonsunset

Ah, so no abuse took place then? Interesting how it works. One could look at it as well-deserved frustration. No one would've batted an eye were Rust to be chosen, but opting into Go over C# or F# is an unquestionably poor long-term decision.

IshKebab

This guy seems to be both very positive about Rust and unfairly cynical about it at the same time...

Rust is a really fantastic language but having worked on a mixed C++/Rust codebase I can see why they had so many issues. Rust just wasn't really designed with C++ interop in mind so it's kind of painful to use them together. Impressive that they made it work.

uecker

"unfairly"? A lot of issues are obvious deficiencies of Rust, including immaturity of the ecosystem, integration issues, complexity, monomorphization bloat, supply chain issues. Now, all languages have issues, and Rust is certainly a nice language overall. The main issue with Rust is that it has been oversold as a panacea for safety using exaggerated arguments. So a bit of cynicism seems entirely fair.

j-krieger

> monomorphization bloat

Especially because the fix is so easy, it could just be fixed by the compiler on the fly.

If you have a function

  fn a(arg: impl Into<C>) {
    // expensive/extensive operations here
  }

and call it two times

  a("test");
  a(10u16);

The compiler will generate two functions

  fn a_str(arg: &str) {
    // expensive/extensive operations
  }
  fn a_u16(arg: u16) {
    // expensive/extensive operations
  }

This can be fixed by just proxying the duplicated call like this

  fn expensive_ops(arg: C) {
  // ...
  }

  fn a_str(arg: &str) {
    let _arg: C = arg.into();
    expensive_ops(_arg);
  }
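For reference, this is roughly what the proxying trick looks like when written by hand today. It is a sketch only: `C` and its two `From` impls are hypothetical stand-ins, and the "expensive" body is a placeholder. The generic outer function only converts and forwards, so the per-type copies stay tiny, while the inner non-generic function is compiled once.

```rust
// Hypothetical target type standing in for `C` in the snippets above.
#[derive(Debug, PartialEq)]
struct C(String);

impl From<&str> for C {
    fn from(s: &str) -> Self {
        C(s.to_owned())
    }
}

impl From<u16> for C {
    fn from(n: u16) -> Self {
        C(n.to_string())
    }
}

// Generic shim: monomorphized once per argument type, but each copy
// contains little more than the `into()` call.
fn a(arg: impl Into<C>) -> usize {
    // Non-generic inner function: a single copy shared by all callers.
    fn expensive_ops(arg: C) -> usize {
        // Placeholder for the expensive/extensive operations.
        arg.0.len()
    }
    expensive_ops(arg.into())
}
```

Calling `a("test")` and `a(10u16)` still produces two instantiations of `a`, but the heavy code lives only in `expensive_ops`.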

AlotOfReading

> The main issue with Rust is that it has been oversold as a panacea for safety using exaggerated arguments.

I know you know this, but Rust does provide essentially complete memory and lifetime safety if you stay within the bounds of safe. Standard C/C++ tooling has no way to even reliably detect memory safety violations, let alone fix them. It's trivial to write buffer overflows that escape ASAN, and missing a single violation invalidates the semantic meaning of the entire program (particularly in C++), which means virtually all nontrivial programs in C/C++ have UB somewhere (a point we disagree on).

Safe rust doesn't guarantee all the other possible definitions of safety, but neither does any other mainstream language. I don't think it serves any useful purpose to complain that the rust folks have oversold their safety arguments by "only" eliminating the biggest cause of safety issues. Stroustrup harps on this a lot and it comes across as very disingenuous given the state of C++.

uecker

Your comment is a perfect example of those exaggerated claims.

IshKebab

Yes. I was thinking of this:

> If you do an experiment and say "C++" anywhere on the Internet, in a minute someone will chime in and educate you about the existence of Rust.

> I know examples when engineers rewrite code from Rust in Rust if they like to rewrite everything in Rust.

> our engineers become too nauseous from Rust poisoning

> So now they [Rust devs] can write something other than new versions of old terminal applications.

> someone shows PRQL, everyone else thinks "What a wonderful idea, and, also, Rust" and gives this project a star on GitHub. This is, by the way, how most of Rust projects get their stars on GitHub. It doesn't look like someone wants to use this language, but what we want is to ride the hype.

> we started to understand that it would be hard to get rid of Rust, and we could tolerate it.

It's a very shitty attitude and not even accurate. You see this attitude from old C/C++ devs quite a lot, it's just very weird that he has that attitude and then also seems to be simultaneously quite keen to use Rust. Very weird!

Anyway those are just the non-technical things. On the technical side:

> Fully offline builds

They solved it by vendoring but this is the obvious solution and also applies to C++.

> Segfault in Rust

They tried to do a null-terminated read of a string that wasn't null-terminated. Nothing to do with Rust. That would be an error in C++ too. In fact this is a strong argument for Rust.

> Panic

C/C++ code aborts. Or more commonly it crashes in a very difficult to debug way. I'll take panics any day.

> Sanitizers require nightly

Ok fair enough but this seems relatively minor.

> Rust's OpenSSL links with the system library by default and you have to set an environment variable to statically link it.

They set the environment variable. Frankly this is a million times easier than doing the same thing in C++.

I'll stop there, but overall it seems like a lot of "this is a problem we had with Rust" where it should really be more like "this is something we had to do when using C++ with Rust".

Weird vibe anyway.
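On the segfault item above: I don't know what the actual call site looked like, so this is only a sketch of the general point. If the bytes go through the safe `CStr` constructor in the standard library, a missing terminator becomes an error value rather than a read past the end of the buffer. `checked_c_str` is a made-up helper name.

```rust
use std::ffi::CStr;

// Returns the text only if `bytes` is exactly one NUL-terminated string
// (terminator at the end, no interior NUL) and valid UTF-8.
fn checked_c_str(bytes: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}
```

The unchecked route (`CStr::from_ptr` on a raw pointer) is `unsafe` precisely because it trusts the caller about the terminator.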

uecker

If I try to read between the lines, I think the vibe comes from feeling / being pressured to use Rust without really seeing the point and it causing frustration in this context.

I think your vibe is more weird. If people have issues with Rust, it is a "shitty attitude". While, of course, C/C++ just objectively suck, right?

alserio

Check the post date. It was published on April first

pornel

They link to actual issues in their bug tracker, so if it was a joke, it was an impressive long con.

alserio

The joke is in the tone, not in the issues

epage

> As a downside, Rust libraries typically have a large fan-out of dependencies, much like Node.js. This requires taking care to avoid the blow-up of dependencies, and to deal with annoyances of dependabot.

In the linked situation, they were using the library of a binary. This gets into the tension between "make it easy for `cargo install`" (and have a `cli` feature be default) and "make it easy for `cargo add`" (and make `cli` opt-in).

This is not a great experience and we should improve it. There was an RFC to auto-enable features when a build-target is built (allowing `cli` to be opt-in but `cargo install` to auto-opt-in), rather than skip it, but the dev experience needed work. The maintainer can split the package, which helps with semver for the two sides, but needs to break one side to do so, and if it's the bin, people need to discover the suffix (`-bin`, `-cli`, etc).

Current workarounds:

- `cargo add skim` will show the `cli` feature is enabled and you can re-run with `--no-default-features`

- if `cli` wasn't a default, `cargo install skim` would suggest adding `--features cli`

alkonaut

Re: panics: If you have a single long lived process that must do multiple short-lived things (web requests, say) and a panic in one of them MUST NOT take down the whole process, is that extremely difficult to pull off in Rust? I thought you could set up panic boundaries much like you would use catch-all exception handlers around e.g. each web request or similar, in other languages?

tuetuopay

You can install a global panic handler to avoid bringing the whole process down. Instead of aborting, take the stack trace, print it, perhaps raise to sentry, and kill the specific "work unit" that caused it. This "work unit" can be a thread or a task, depending on how the application is architected.

This is precisely what Tokio does: by default, a panic in async code will only bring down the task that panicked instead of the whole application. In the context of a server, where you'll spawn a task for each request, you have no way to bring down the whole application (*), only your current scope.

(*): there could be other issues, like mutex poisoning, which is why nobody uses the stdlib's mutexes. But the general point still stands.
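A small std-only sketch of the "work unit" idea, using an OS thread as a stand-in for a Tokio task (Tokio itself is not shown here). `run_isolated` is a hypothetical helper. A panic inside the job unwinds only that thread, and `join` hands the panic payload back to the caller as an `Err`.

```rust
use std::thread;

// Runs one "work unit" on its own thread and reports how it ended.
// Assumes the default panic=unwind; with panic=abort the process dies instead.
fn run_isolated<F>(job: F) -> Result<u32, String>
where
    F: FnOnce() -> u32 + Send + 'static,
{
    thread::spawn(job).join().map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

The default panic hook still prints the message to stderr; a custom one can be installed with `std::panic::set_hook` to forward it to logging or error reporting.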

EE84M3i

> there could be other issues, like mutex poisoning, which is why nobody uses the stdlib's mutexes.

What does everyone use instead?

tuetuopay

In the context of Tokio, Tokio's native mutexes / locking primitives. For sync code, parking_lot is the de facto replacement for the stdlib's ones.

I don't remember where I read it, but it has been admitted that having synchronization primitives with poisoning in the stdlib was a mistake, and that "simpler" ones without it would have been preferable.

For context: a mutex is poisoned if a panic occurs while the mutex is held. The guarded data is then assumed to be broken or in an unknown state, thus poisoned.
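Here is what poisoning looks like with the stdlib mutex, as a minimal sketch (`poison_demo` is just an illustrative name). A worker thread panics while holding the guard. Afterwards `lock()` returns an `Err`, and the caller has to decide explicitly whether the data is still usable.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Panics while holding the lock, then reports what later lockers observe.
fn poison_demo() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));

    let worker = Arc::clone(&shared);
    // The guard is dropped during unwinding, which marks the mutex as poisoned.
    let _ = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 1;
        panic!("worker failed mid-update");
    })
    .join();

    let was_poisoned = shared.is_poisoned();
    // `lock` now returns Err; `into_inner` on the error still yields the guard
    // if the caller decides to carry on with the data as it is.
    let value = match shared.lock() {
        Ok(guard) => *guard,
        Err(poisoned) => *poisoned.into_inner(),
    };
    (was_poisoned, value)
}
```

Code that calls `lock().unwrap()` everywhere turns one panic into a panic in every later locker, which is the cascading failure being alluded to.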

j-krieger

parking_lot

sph

Since Rust is not a managed/high-level language, panics are unrecoverable crashes, so they need to be dealt with at a higher level, i.e. the OS, with appropriate supervisor systems like systemd, or by having a master Rust process that spawns subprocesses and reacts, with regular POSIX APIs, when one of them terminates abnormally.

On a platform like Elixir, for example, you can deal with process crashes because everything runs on top of a VM, which is to all intents and purposes your OS, and provides process supervision APIs.

pornel

Rust can be optionally compiled in a panic=abort mode, but by default panics are recoverable. From implementation perspective Rust panics are almost identical to C++ exceptions.

For servers that must not suddenly die, it's wise to use panic=unwind and catch_unwind at task/request boundaries (https://doc.rust-lang.org/stable/std/panic/fn.catch_unwind.h...)
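A minimal sketch of that request-boundary pattern, assuming the default panic=unwind. `handle` and `serve` are hypothetical names, and the "server" is just a loop over strings. The panic from one request is turned into an error response and the other requests still complete.

```rust
use std::panic;

// Hypothetical request handler that has a bug for one specific input.
fn handle(request: &str) -> String {
    if request == "bad" {
        panic!("handler bug");
    }
    format!("ok: {request}")
}

// Request boundary: a panic inside `handle` becomes an error response
// instead of unwinding into the serving loop.
fn serve(requests: &[&str]) -> Vec<String> {
    requests
        .iter()
        .map(|req| {
            panic::catch_unwind(|| handle(req))
                .unwrap_or_else(|_| "500 internal error".to_string())
        })
        .collect()
}
```

Real handlers usually capture shared state, in which case the closure may need `std::panic::AssertUnwindSafe`, and it is on you to make sure that state is still consistent after a panic.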

In very early pre-1.0 prototypes Rust was meant to have isolated tasks that are killed on panic. As Rust became more low-level, it turned into terminating a whole OS thread on panic, and since Rust 1.9.0, it's basically just a try/catch with usage guidelines.

alkonaut

But few would write a process-per-request web server today, for example. And if a single-process web server handles 100 requests, you would then accept that one bad request tore down the handling of the 99 others. Even if you have a watchdog that restarts the service after the one request choked, you wouldn't save the 99 requests that were in-flight on the same process. Can't you catch_unwind for each request handler, so that if one chokes you just ignore that request? If you worry about that messing anything up, then you can tear down and restart your process after that, so the 99 other requests get a chance to complete?

tuetuopay

This is factually incorrect. The behavior you describe with Elixir (sic) is precisely what most Rust async runtimes do. (sic because it's Erlang that's to thank)

IMHO that is the sensible thing to do for pretty much any green thread or highly concurrent application. e.g. Golang does the same: panicking will only bring down the goroutine and not the whole process.

chenhoey1211

Love this approach — Rust in the right places. I’ve been wondering if using Wasm modules (e.g. from MoonBit) for isolated components might offer a similar balance: memory safety without full rewrite.

rokob

Given the date on the post, I can’t tell if this is real.

hexo

This website is impossible to scroll. Sadly, when I see that I don't think I need to read about any technical stuff they did anymore.

loicalleyne

https://github.com/UoCCS/project-GROS

pod_krad

In early versions (before 1.0) Rust already had GC. But then it was removed.

dathinab

One thing I often see pop up in larger projects, which in the article is likely the cause of the way too large symbols, is overuse of generics/type state/etc.

Or you could formulate this as needless obsession with not using `dyn`.

And sure generics are more powerful, dyn has limitations, etc. etc.

It's one of those "Misconceptions Programmers believe about Monomorphisation vs. Virtual Calls" things, as in:

TL;DR: dyn isn't as bad as some people make it out to be; whether for perf or convenience, it can be the better choice. Any absolute recommendation to always use this or that is wrong.

- wrong: monomorphisation is always faster; reason: monomorphisation pollutes the instruction cache way worse, so in some situations switching some parts (not all parts) to virtual calls and similar approaches can lead to major performance improvements. Good examples here are the various experiments on how to implement something like serde but faster and with less binary size.

- wrong: monomorphisation was picked in rust because it's better for rust; right: it was picked because it is reasonably good and was viable to implement with the available resources. (for low level languages it's still better than only using vtables, but technically transparent hybrid solutions are even more desirable)

- wrong: virtual calls are always slow in microbenchmarks; right: while they are more work to do, modern cpus have gotten very very good at optimizing them, and under the right conditions they might be literally as fast as normal function calls (but most times they are slightly slower, until mono. trashes the icache too much)

- wrong: monomorphisation is always better for the optimizer; right: monomorphisation gives the optimizer more choices, but not always relevant or useful ones, and they always add more work it has to do, so slower compile times, and if you are unlucky it will miss more useful optimizations due to noise

- wrong: in rust generics are always more convenient to use; right: adding a generic (e.g. to accommodate a return position impl trait) in the wrong place can lead to you having to write generic parameters all through the code base. But `dyn` has many more limitations/constraints, so for both convenience and performance it's a trade-off which more often favors monomorphisation, but not as much as many seem to believe.

- wrong: always using dyn works; right: dyn doesn't work for all code, and even if it did, using it everywhere can put too much burden on the branch predictor and co., making vcalls potentially as slow as some people think they are (it's kinda similar to how too much monomorphisation is bad for the icache and its predictors, if we gloss over a ton of technical details)

So all in all understand what your tools entail, instead of just blindly using them.

And yes that's not easy.

It's one of the main differences between a junior and a senior skill level.

As a junior you follow rules and guidelines (or imitate others) on when to use which tool. As a senior you deeply understand why the rules, guidelines and actions of other people are the way they are, and in turn know when to diverge from them.
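For anyone who hasn't seen the two styles side by side, here is a toy sketch. The `Shape` trait and both functions are made up for illustration, and it says nothing about which one is faster in a given program. The generic version gets one compiled copy per concrete type; the `dyn` version is compiled once and dispatches through a vtable.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

// Generic: one copy of this function is compiled per concrete `S`,
// and every element of the slice must have that same type.
fn total_area_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Trait object: compiled once; each `area` call goes through a vtable,
// and the slice may mix concrete types.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

Note the generic parameter also shows up in every signature that passes `S` along, which is the "generic parameters all through the code base" cost mentioned in the list.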

vacuity

My understanding of why Rust does monomorphization by default is that it wanted maximum performance, since it was meant to replace C++. Excellent post!
