Build a Database in Four Months with Rust and 647 Open-Source Dependencies
80 comments
· January 15, 2025
ergonaught
sealeck
Rust dependencies tend to be pretty high quality in my experience. They are maintained by experts and often improve on the state of the art.
But if you compare to C/C++ at least with Rust you _can_ but aren't required to use dependencies. In C/C++ if you want to, it's a _massive_ pain.
rectang
I care less about the quality of the dependencies than about the burden of protecting against supply chain attacks when there are a lot of dependencies.
kibwen
Indeed, and that's a good reason to avoid third-party dependencies. But that's irrelevant to the choice of programming language. A language with a bad dependency manager might force you to build everything yourself, but you can always do that even in a language with a good dependency manager: you simply choose to build everything yourself if you care.
Perplexingly, the original commenter seems to understand that this doesn't matter, and then handwaves away the correct conclusion.
larusso
Nothing stops you from vendoring them into your repo and hand-updating each one. But how would you do this in C++? Write everything from scratch? I mean, Rust doesn't stop you there.
[edit] typos
whodev
Thank you.
As someone who works in cybersecurity and works closely with our developers, a lot of them tend to inherently trust third-party code with no auditing of the supply chain. I am always pushing the point that, yes, we don't need to reinvent the wheel and libraries/packages are important, but our organization and developers need to be aware of what we are bringing into our network and our codebase.
orf
it's completely stupid to measure "number of dependencies" in absolute numbers.
Lots of packages have a `-macros` or `-derive` transitive dependency, meaning a single dependency can end up counting as 3 additional dependencies.
Rust makes it simple to split packages into workspaces - for example, regex[1] consists of `regex-automata` and `regex-syntax` packages.
This composition and separation of concerns is a sign of good design, and not an npm-esque hellhole.
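As a rough illustration of that counting argument, here is a minimal sketch that collapses companion crates into one "family" before counting. The suffix list and the crate list are assumptions chosen for the example; Cargo itself has no such notion of a family.

```python
from collections import defaultdict

# Suffixes that often mark a companion crate of a parent project.
# This list is an assumption for illustration, not a Cargo convention.
COMPANION_SUFFIXES = ("-macros", "-derive", "-syntax", "-automata", "-impl")


def family_of(crate_name: str) -> str:
    """Strip one known companion suffix, e.g. 'regex-syntax' -> 'regex'."""
    # Treat '_' and '-' alike so 'serde_derive' groups with 'serde'.
    normalized = crate_name.replace("_", "-")
    for suffix in COMPANION_SUFFIXES:
        if normalized.endswith(suffix) and len(normalized) > len(suffix):
            return normalized[: -len(suffix)]
    return normalized


def group_by_family(crate_names):
    """Map each family name to the sorted list of crates that belong to it."""
    families = defaultdict(list)
    for name in crate_names:
        families[family_of(name)].append(name)
    return {family: sorted(members) for family, members in families.items()}


# Hypothetical slice of a resolved dependency list.
crates = [
    "regex", "regex-automata", "regex-syntax",
    "serde", "serde_derive",
    "thiserror", "thiserror-impl",
]
families = group_by_family(crates)
print(f"{len(crates)} crates in {len(families)} families")
```

A suffix heuristic like this will misgroup some crates, so it only gives a ballpark for how far the raw count overstates the number of independent projects.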
adamc
The vulnerability to supply chain attacks gives me pause. It's not unique to rust and it bothers me with npm or Python as well.
rectang
I suppose you could say that the audit burden scales linearly with the number of module publishers, with a small additional amount on every release point to confirm that the publisher is still who they purport to be and hasn't been compromised.
This is assuming that the audit consists of validating dependency authorship, and not the more labor-intensive approach of reviewing dependency code.
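The linear model described above can be written down directly. This is only a sketch: the function name and every number in it are made up for illustration, and the unit (minutes) is chosen to keep the arithmetic in integers.

```python
def audit_minutes(publishers, releases, minutes_per_publisher, minutes_per_release):
    """Authorship-audit cost: one check per publisher plus a small check per release."""
    return publishers * minutes_per_publisher + releases * minutes_per_release


# Hypothetical project: 40 distinct publishers, 300 releases pulled in over a year.
total = audit_minutes(
    publishers=40,
    releases=300,
    minutes_per_publisher=120,  # assumed cost of vetting a publisher once
    minutes_per_release=5,      # assumed cost of re-confirming identity per release
)
print(total, "minutes, about", total // 60, "hours")
```

Under this model the number of crates does not appear at all; only the number of distinct publishers and the release cadence do.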
arccy
that's kind of on rust for pushing crates front and center rather than groupings of crates that are developed / reviewed / released together as a single cohesive unit (typically a git repo).
e.g. go dependencies are counted on modules (roughly git repos), rather than packages (directories, compilation units). java is counted in packages rather than classes.
brabel
Just tried to look at what some macro was generating using cargo-expand. It requires a LOT of dependencies. Took like 5 minutes to compile it all (run `cargo install cargo-expand` if you want to try). I almost aborted because the description of the crate says "Wrapper around rustc -Zunpretty=expanded." so I had expected the simplest possible crate to do that.
PittleyDunkin
> Took like 5 minutes to compile it all
TBF this has nothing to do with dependency complexity and everything to do with semantic complexity. You could easily do this without using any dependencies at all.
unless you're downloading dependencies during the build or something like that, of course.
kibwen
How many transitive dependencies is the right number for a database?
jandrewrogers
Honestly, current best practice puts that number right around zero, which you see for ambitious implementations.
A non-obvious issue is that database engines have peculiar requirements for how libraries are designed and implemented which almost no conventional library satisfies. To make matters worse, two different database implementations may have different requirements in this regard, so you can't even share libraries between databases. There are no black boxes in good database engines.
kibwen
> current best practice puts that number right around zero
In the case where the answer is "zero", then that means that one does not actually need a package manager at all, in which case the features of the package manager are not relevant to the choice of language. This would imply that the parent commenter has no need to reject Rust.
almostdeadguy
Compression libraries, OpenSSL, ICU, etc. are all common dependencies for databases.
Looking at the dependencies list (https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...) I see plenty of reasonable things like:
* Base64/checksum/compression encoding libraries
* Encryption/hash libraries
* Platform-specific bindings (likely conditional dependencies)
* Bit hacking/casting/zero-copy libraries like bytemuck, zerocopy, zero-vec, etc.
* "Small"/stack allocated data structure libraries (smallvec, tinystr, etc.)
* Unicode libraries
There are certainly things that would add bloat too, but I think it's silly to pretend like everything here is something a database engine would need custom implementations of.
PittleyDunkin
What are you comparing this to? Do you have positive examples? This seems to be a general dependency management issue unrelated to Rust. The reason C++ has this is that C++ also lacks any concept of dependencies, so people kind of just make do with modifying what packages are already integrated into the build process. This certainly doesn't imply you should trust boost (or the standard library, or whatever people use this decade, or xz, or whatever).
klysm
This take is utter nonsense to me - just don't use them...
Deukhoofd
I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"
bdcravens
Isn't that pretty much the modern stack? Open source language, framework, and libraries, and proprietary end product?
bbkane
That's technically correct, but they listed several ways they contribute back to the OSS ecosystem: PRs, issues, creating new libraries...
This comment makes it seem like all this company does is take, which feels unfair to me
ipaddr
"We keep ScopeDB private and proprietary, while we actively get involved and contribute back to the open-source dependencies, open source common libraries when it’s suitable"
They say they do when suitable (never or rarely).
But that's fine, as the licenses allow it. It feels like another company blogging about how great open source is to get PR while closed-sourcing their product.
The older I get, the more I understand why GPL variations are superior to BSD if you want to grow the software. BSD licenses are good for throwaway code or standards you want others to adopt.
PittleyDunkin
>This comment makes it seem like all this company does is take, which feels unfair to me
Profit isn't far removed from theft, so maybe this shouldn't feel so unfair.
easterncalculus
From the title I was really expecting this page to be a tutorial like build-your-own[1].
01HNNWZ0MV43FF
That's why all my useless little crates are AGPL :D
PittleyDunkin
> I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"
I mean that's been the prevalent attitude for the entire history of open source. It's easy to laugh until someone replaces you.
moi2388
“With a team of three experienced developers, we have implemented ScopeDB from scratch”
“with 100 direct dependencies and 647 dependencies in total”
Next up: watch me build numpy from scratch with only 150 dependencies, one of which is numpy.
remram
You're not wrong, they depend on an external SQL database, which they access with sqlx.
tison
In the linked article below, we talked about "If RDS has already been used, why is another database needed?" and "Why RDS?"
Briefly, you need to manage metadata for the database. You can write your own Raft-based solution or leverage existing software like etcd or ZooKeeper that may not be "a relational database". Now you need to deploy them with EBS and reimplement data replication plus multi-AZ fault tolerance, and performance is likely still worse than RDS, because first-class RDS can typically use internal storage APIs and advanced hardware. Such a scenario is not software driven.
https://flex-ninja.medium.com/from-shared-nothing-to-shared-...
binaryturtle
Isn't that something that should be posted April 1? I'm really not sure if the author is proud about the fact that his project has so many dependencies. Is that something modern coders aim for these days? I usually try to achieve the exact opposite in my projects.
griomnib
April 20th as you’d have to be high as hell to think this was a good idea.
ramon156
It's just really tongue-in-cheek about everything, which makes this article more fun to read imo
thadt
"An absolutely outrageous number of dependencies! What a bunch of wankers."
I comment, in a Chromium[1] tab, running on my Ubuntu[2] box.
[1] https://github.com/chromium/chromium/blob/main/.gitmodules
[2] https://releases.ubuntu.com/24.04/ubuntu-24.04.1-desktop-amd...
flufluflufluffy
I read the title thinking it was a joke, and after reading the article, I still can’t tell if it is or not.
kpcyrd
The title of the submission is somewhat bait, unfortunately the Cargo.lock doesn't seem to be public. Since my current Rust side-project also has some kind of database (along with, well, a p2p system) and also totals 454 dependencies, I've decided to do a breakdown of my dependency graph (also because I was curious myself):
- 85 are related to gix (a Rust reimplementation of git, 53 of those are gix itself, that project is unfortunately infamous for splitting things into crates that probably should've been modules)
- 91 are related to pgp and all the complexity it involves (aes with various cipher modes, des, dsa, ecdsa, ed25519, p256, p384, p521, rsa, sha3, sha2, sha1, md5, blowfish, camellia, cast5, ripemd, pkcs8, pkcs1, pem, sec1, ...)
- 71 are related to http/irc/tokio (this includes a memory-safe tls implementation, an http stack like percent-encoding, mime, chunked encoding, ...)
- 26 are related to the winapi (which I don't use myself, but are still part of the resolved dependency graph)
- 8 are related to web assembly (unused when compiling for Linux)
- 2 are related to android (also unused when compiling for Linux)
In some ways this is a reminder of how much complexity we're building on top of for the sake of compatibility.
Also keep in mind "reviewing 100 lines of code in 1 library" and "reviewing 100 lines of code split into 2 libraries" is still pretty much the same amount of code (if any of us actually reviewed all their dependencies). You might even have a better time reviewing the sha2 crate vs the entirety of libcrypto.so, if that's all you needed.
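A breakdown like the one above can be approximated from a Cargo.lock with a few lines of scripting. The sketch below is not the method used for those numbers; the category table, the sample lockfile and its version numbers are all hypothetical, and the line-based parsing assumes the layout Cargo generates rather than handling arbitrary TOML.

```python
import re
from collections import Counter

# Hypothetical lockfile excerpt; names are real crates, versions are placeholders.
SAMPLE_LOCK = '''
version = 3

[[package]]
name = "gix"
version = "1.0.0"
dependencies = [
 "gix-path",
]

[[package]]
name = "gix-path"
version = "1.0.0"

[[package]]
name = "sha2"
version = "1.0.0"

[[package]]
name = "winapi"
version = "1.0.0"
'''

# Assumed mapping from category to crate-name prefixes.
CATEGORIES = {
    "git": ("gix",),
    "crypto": ("sha", "aes", "rsa"),
    "windows": ("winapi", "windows"),
}


def package_names(lock_text):
    """Return every package name; entries in dependency arrays are not matched."""
    return re.findall(r'^name = "([^"]+)"', lock_text, re.MULTILINE)


def categorize(name):
    """Return the first category whose prefixes match, else 'other'."""
    for category, prefixes in CATEGORIES.items():
        if name.startswith(prefixes):
            return category
    return "other"


def breakdown(lock_text):
    """Count packages per category."""
    return Counter(categorize(name) for name in package_names(lock_text))


for category, count in breakdown(SAMPLE_LOCK).most_common():
    print(f"{count:3d} {category}")
```

Prefix matching is crude (a real breakdown needs judgment about which crates belong together), and a proper TOML parser would be more robust than the regular expression used here.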
My project has been around for (almost) two years, I scanned every commit for vulnerable dependencies using this command:
for commit in $(git log --all --pretty='%H'); do git show "$commit":Cargo.lock > Cargo.lock && cargo audit -n --json | jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)'; done | sort | uniq
I got a total of 25 advisories (basically what you would be exposed to if you ran all binaries from every single commit simultaneously today). Here's the list:
RUSTSEC-2020-0071 - time
RUSTSEC-2023-0018 - remove_dir_all
RUSTSEC-2023-0034 - h2
RUSTSEC-2023-0038 - sequoia-openpgp
RUSTSEC-2023-0039 - buffered-reader
RUSTSEC-2023-0052 - webpki
RUSTSEC-2023-0053 - rustls-webpki
RUSTSEC-2023-0071 - rsa
RUSTSEC-2024-0003 - h2
RUSTSEC-2024-0006 - shlex
RUSTSEC-2024-0019 - mio
RUSTSEC-2024-0332 - h2
RUSTSEC-2024-0336 - rustls
RUSTSEC-2024-0345 - sequoia-openpgp
RUSTSEC-2024-0348 - gix-index
RUSTSEC-2024-0349 - gix-worktree
RUSTSEC-2024-0350 - gix-fs
RUSTSEC-2024-0351 - gix-ref
RUSTSEC-2024-0352 - gix-index
RUSTSEC-2024-0353 - gix-worktree
RUSTSEC-2024-0355 - gix-path
RUSTSEC-2024-0367 - gix-path
RUSTSEC-2024-0371 - gix-path
RUSTSEC-2024-0373 - quinn-proto
RUSTSEC-2024-0421 - idna
I guess I'm doing fine. Keep in mind, the binary is fully self-contained, there is no "look, my program has zero dependencies, but I need to ship an entire implementation of the gnu operating system along with it".
etaioinshrdlu
My main question is why observability data needs (or benefits from) a tailor-made database instead of a general purpose one. In 2025, anyone working on observability who told me they have to build their own database, I would be very suspicious!
tison
Datadog always builds their own event store: https://www.datadoghq.com/blog/engineering/introducing-husky...
It may not be named a "database", but it actually takes the place of one.
Observability vendors will try to store logs in Elasticsearch and later find it overly expensive, with weak support for archiving cold data. A data warehouse solution requires a complex ETL pipeline and can be awkward when handling log data (semi-structured data).
That said, if you're building an observability solution for a single company, I'd totally agree to start with single-node PG with backups, and only consider other solutions when the data and query workload grow.
jcgrillo
In 2025 I'd consider starting with clickhouse instead, if you're going the DIY route
Jolter
Not even limited to general purpose ones, there are existing tailor made databases for observability. Maybe somewhere on that page, they explain why this one is better.
carlos-menezes
100 direct dependencies is insane.
synergy20
So, npm hell or pip hell again?
To be fair, Python package dependencies are fine to me. There might still be a lot of pip packages, but not the few hundred that npm and cargo normally pull in.
golang also has a reasonable amount of dependencies. npm and cargo dependencies are just scary due to the huge number.
eximius
NPM and pip hell come about for several reasons, one of the biggest being that package versions are global.
In Rust, project A can use dependencies B and C, which can both depend on different versions of D. Cargo/crates generally also solve some of the other metadata problems Python has.
This means the developer experience is _significantly_ improved, at a potential cost of larger binaries. In practice, projects seem to have sufficiently liberal bounds that duplication isn't an issue.
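To check how much duplication a project actually carries, `cargo tree --duplicates` lists packages that are built in more than one version. The same idea can be sketched against a lockfile directly. The lockfile below is hypothetical (crates `a` through `d` stand in for the project and dependencies described above), and the parsing assumes Cargo's generated layout.

```python
import re
from collections import defaultdict

# Hypothetical lockfile: a depends on b and c, which need different versions of d.
SAMPLE_LOCK = '''
version = 3

[[package]]
name = "a"
version = "0.1.0"
dependencies = [
 "b",
 "c",
]

[[package]]
name = "b"
version = "1.0.0"
dependencies = [
 "d 1.0.0",
]

[[package]]
name = "c"
version = "1.0.0"
dependencies = [
 "d 2.0.0",
]

[[package]]
name = "d"
version = "1.0.0"

[[package]]
name = "d"
version = "2.0.0"
'''


def versions_by_name(lock_text):
    """Collect the set of locked versions for each package name."""
    versions = defaultdict(set)
    # Each [[package]] table becomes one block; the text before the first is skipped.
    for block in lock_text.split("[[package]]")[1:]:
        name = re.search(r'^name = "([^"]+)"', block, re.MULTILINE)
        version = re.search(r'^version = "([^"]+)"', block, re.MULTILINE)
        if name and version:
            versions[name.group(1)].add(version.group(1))
    return versions


def duplicates(lock_text):
    """Return only the packages locked at more than one version."""
    return {
        name: sorted(found)  # plain string sort, not semver-aware
        for name, found in versions_by_name(lock_text).items()
        if len(found) > 1
    }


print(duplicates(SAMPLE_LOCK))
```

Each duplicated entry is compiled and linked separately, which is where the binary-size cost mentioned above comes from.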
EVa5I7bHFq9mnYK
57 of which written by DPRK Koding Forces, waiting for the right moment to push a glorious update, striking at the heart of The Biggest Enemy.
eknkc
Is the dependency count supposed to be impressive?
jjtheblunt
i think the implication is that it's precarious...how does one know all are bug free, for example?
speed_spread
Past a number of dependencies, actually getting anything to build deterministically, run reliably and then not get 0wnd to bits becomes an actual challenge, which many enthusiastic developers have a masochistic kink for.
The thrill of complexity is real.
While acknowledging one does not "have to" have so many dependencies, the prevalence of this npm-esque type of practice is one of the two things that destroyed all of my interest in Rust.