Build a Database in Four Months with Rust and 647 Open-Source Dependencies
148 comments
· January 15, 2025
moi2388
remram
You're not wrong, they depend on an external SQL database, which they access with sqlx.
tison
In the linked article below, we talked about "If RDS has already been used, why is another database needed?" and "Why RDS?"
Briefly, you need to manage metadata for the database. You can write your own Raft-based solution or leverage existing software like etcd or ZooKeeper that may not be "a relational database". Now you need to deploy them with EBS and reimplement data replication + multi-AZ fault tolerance, and performance is likely still worse than RDS, because first-class RDS can typically use internal storage APIs and advanced hardware. Such a scenario is not software driven.
https://flex-ninja.medium.com/from-shared-nothing-to-shared-...
stuhood
When it comes to understanding the risks involved with having this many dependencies, one thing that folks might not understand is that Rust's support for dependency resolution and lock files is fantastic.
Tools like `cargo audit` can tell you statically, based on the lockfile, which dependencies have security vulnerabilities reported against them (but you have to run it!). And GitHub's https://github.com/dependabot/ will do that same thing automatically, just based on the existence of the lockfile in your repo (and will also open PRs to bump deps for you).
And as mentioned elsewhere: Cargo's dependency resolver supports providing multiple versions of a dep in different dependency subgraphs, which all but eliminates the "dependency hell" that folks expect from ecosystems like Python or the JVM. Two copies of a dep at different versions? Totally fine.
Threadbare
Doesn't node npm also do similar?
stuhood
Yes. AFAIK, it evolved over time across 3+ package managers (`npm`, `yarn`, `pnpm`, etc), but the current state of that ecosystem is similar (including the behavior of dependabot).
robertlagrant
Python's Poetry has poetry audit as well, and there are third-party tools such as Safety (Python), Nancy (Golang), etc. Lots of languages have something like this.
stuhood
They support lockfiles and tools like `audit`, yes. But they do not support having multiple versions of a dependency.
Tools based on loading libraries from a *PATH (Go, Python, JVM) usually do so by grabbing the first one that they encounter that contains the appropriate symbols. That is incompatible with having multiple versions of a package.
On the other hand, Rust and node.js support this -- each in their own way. In Rust, artifact names are transparently suffixed with a hash to prevent collisions. And in node.js, almost all symbol lookups are accomplished with relative filesystem paths.
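A toy model of the difference, not a reproduction of how any real loader or Cargo actually works: a search-path lookup returns whichever library it meets first, while giving every (name, version) pair its own hash-suffixed artifact name lets both versions coexist. The library `foo`, its symbols, and both helper functions are hypothetical.

```python
import hashlib

# Two versions of a hypothetical library "foo" visible on one search path.
SEARCH_PATH = [
    {"name": "foo", "version": "1.0", "symbols": {"parse"}},
    {"name": "foo", "version": "2.0", "symbols": {"parse", "parse_v2"}},
]


def first_match(search_path, symbol):
    """PATH-style lookup: return the first library that provides `symbol`."""
    for lib in search_path:
        if symbol in lib["symbols"]:
            return lib["name"], lib["version"]
    return None


def artifact_name(name, version):
    """Derive a distinct artifact name per (name, version) via a short hash."""
    digest = hashlib.sha256(f"{name} {version}".encode()).hexdigest()[:8]
    return f"lib{name}-{digest}"


if __name__ == "__main__":
    # A caller that wanted foo 2.0's `parse` still gets 1.0: first hit wins.
    print(first_match(SEARCH_PATH, "parse"))
    # With suffixed names the two versions no longer collide.
    print(artifact_name("foo", "1.0"), artifact_name("foo", "2.0"))
```

The hash input and suffix length here are arbitrary choices for the sketch; the only point is that distinct versions map to distinct names.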
hulitu
> Tools like `cargo audit` can tell you statically based on the lockfile which dependencies have security vulnerabilities reported against them
known security vulnerabilities. If someone compromises your cargo repository (see npm for examples) all your safety is gone.
binaryturtle
Isn't that something that should be posted April 1? I'm really not sure if the author is proud about the fact that his project has so many dependencies. Is that something modern coders aim for these days? I usually try to achieve the exact opposite in my projects.
griomnib
April 20th as you’d have to be high as hell to think this was a good idea.
hulitu
> Is that something modern coders aim for these days?
Yes. No dependencies is so 80's. Just run an ldd on your commonly used programs.
bdcravens
Even developers with "few" dependencies often lean on projects (languages, frameworks, etc) where there are hundreds of dependencies.
joquarky
I also prefer to minimize dependencies, and it feels like this is why I can't find work.
johnisgood
So do I. I am writing a Perl script right now, and I could either use a non-core dependency, or implement my own. I went with my own. It is only a few lines of code. It works without the need to cpan i the module.
ramon156
It's just really tongue-in-cheek about everything, which makes this article more fun to read imo
thadt
"An absolutely outrageous number of dependencies! What a bunch of wankers."
I comment, in a Chromium[1] tab, running on my Ubuntu[2] box.
[1] https://github.com/chromium/chromium/blob/main/.gitmodules
[2] https://releases.ubuntu.com/24.04/ubuntu-24.04.1-desktop-amd...
mlok
Related : "Build a Database in 3000 Lines with 0 Dependencies" https://news.ycombinator.com/item?id=42725163
ergonaught
While acknowledging one does not "have to" have so many dependencies, the prevalence of this npm-esque type of practice is one of the two things that destroyed all of my interest in Rust.
sealeck
Rust dependencies tend to be pretty high quality in my experience. They are maintained by experts and offer improvements over the state of the art.
But if you compare to C/C++ at least with Rust you _can_ but aren't required to use dependencies. In C/C++ if you want to, it's a _massive_ pain.
rectang
I care less about the quality of the dependencies than about the burden of protecting against supply chain attacks when there are a lot of dependencies.
humanfromearth9
In the past, I worked on a project for Luxembourg's CTIE (their IT administration). In most cases, they explicitly requested that we reimplement features we needed instead of including more third-party libraries. They only allowed essential libraries for the project, like Struts for the Web framework, or implementations of standard libraries like JPA, JTA etc. that came with WebSphere. Everything else, we had to reimplement. For them, it was just much easier to manage, given the number of systems they have to manage. And the allowed libraries were only allowed in versions that they had previously reviewed for security issues. In the end, reimplementing features/functions that we could have pulled in from other libraries never caused any problems: this practice requires some additional work, but it never significantly affected our ability to deliver the project as expected.
kibwen
Indeed, and that's a good reason to avoid third-party dependencies. But that's irrelevant to the choice of programming language; a language with a bad dependency manager might force you to build everything yourself, but you can always do that even in a language with a good dependency manager: you just choose to build everything yourself if you care.
Perplexingly, the original commenter seems to understand that this doesn't matter, and then handwaves away the correct conclusion.
ironhaven
Well, if you look at the most recent open source supply chain attack on openssh, that used social engineering to add a backdoor to a project that openssh did not have a dependency on anywhere in its SBOM. And with the xz example, the backdoor had to be rushed out when it was deployed because the dynamic dependency was being removed before the backdoor was completely in place. Doing an open source supply chain attack is not easy, fast or reliable for long.
It is not as simple as you say. Sometimes it is better to know that all of your dependencies are statically linked at build time and specified when you are releasing your code. And the saner your build system is, the harder it is to add shellcode to your dependency's tarball and build scripts without turning people's heads with random unsafe code.
whodev
Thank you.
As someone who works in cybersecurity and works closely with our developers, a lot of them tend to inherently trust third-party code with no auditing of the supply chain. I am always fighting that: while yes, we don't need to reinvent the wheel and libraries/packages are important, our organization and developers need to be aware of what we are bringing into our network and our codebase.
larusso
Nothing stops you from vendoring them into your repo and hand-updating each. But how would you do this in c++? Write everything from scratch? I mean rust doesn’t stop you there
[edit] typos
0x457
Well, no one is forcing you to use these dependencies? Rust crates tend to be very minimal because of how easy it is to use them.
The amount of code you have to review stays the same.
cchance
So ... then don't use them? No one forces anyone to use any dependencies in rust lol it's just faster to use shit that's already made
goodpoint
One malicious dependency is enough. When you have 600 dependencies "tend to be pretty high quality" does not cut it.
orf
it's completely stupid to measure "number of dependencies" in absolute numbers.
Lots of packages have a `-macros` or `-derive` transitive dependency, meaning a single dependency can end up counting as 3 additional dependencies.
Rust makes it simple to split packages into workspaces - for example, regex[1] consists of `regex-automata` and `regex-syntax` packages.
This composition and separation of concerns is a sign of good design, and not an npm-esque hellhole.
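One way to see how much the raw count overstates things is to group packages by the repository they are developed in. The sketch below does that over a small illustrative sample; in practice the pairs would have to come from each crate's manifest, where the `repository` field is optional and may be missing. The function name `group_by_repository` is made up for this example.

```python
from collections import defaultdict

# Illustrative sample of (package, repository) pairs, not a real
# project's full dependency list.
PACKAGES = [
    ("regex", "https://github.com/rust-lang/regex"),
    ("regex-automata", "https://github.com/rust-lang/regex"),
    ("regex-syntax", "https://github.com/rust-lang/regex"),
    ("serde", "https://github.com/serde-rs/serde"),
    ("serde_derive", "https://github.com/serde-rs/serde"),
]


def group_by_repository(packages):
    """Group package names under the repository they are released from."""
    groups = defaultdict(list)
    for name, repo in packages:
        groups[repo].append(name)
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_repository(PACKAGES)
    print(f"{len(PACKAGES)} packages from {len(groups)} repositories")
    for repo, names in groups.items():
        print(repo, names)
```

Here five packages collapse into two repositories. Whether "repository" is the right unit for an audit is a judgment call; it is only one possible grouping.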
rectang
I suppose you could say that the audit burden scales linearly with the number of module publishers, with a small additional amount on every release point to confirm that the publisher is still who they purport to be and hasn't been compromised.
This is assuming that the audit consists of validating dependency authorship, and not the more labor-intensive approach of reviewing dependency code.
0x457
Hard no. Burden scales with number of lines. Lines being split into smaller chunks (crates) only speeds up the process in the long run.
rcxdude
Indeed. It's actually been quite handy on a few occasions to be able to just pull in a smaller crate as opposed to the whole project. (in contrast to, say, boost in C++, which is a big mess of a dependency even though it's one that goes to at least a little bit of effort to let you split it up, but through an ad-hoc process as opposed to a standard package management system).
(I would genuinely be interested in an experiment which pushes this as far as possible: what if each function was a 'package'? There's already a decent understanding of how dependencies within a library work in the compiler, what if that extended to the package manager? You would know exactly what code you actually needed, and would only pull in exactly what was necessary)
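As a rough sketch of what "each function is a package" would mean for resolution: treat the call graph as the dependency graph and pull in only what is reachable from your entry points. The call graph and all function names below are hypothetical, and a real compiler or package manager would have far more to handle (traits, generics, dynamic dispatch).

```python
from collections import deque

# Hypothetical call graph: function -> functions it calls directly.
CALLS = {
    "app::main": ["lib::parse"],
    "lib::parse": ["lib::tokenize"],
    "lib::tokenize": [],
    "lib::format": ["lib::tokenize"],
    "lib::render": ["lib::format"],
}


def reachable(calls, roots):
    """Breadth-first search for every function reachable from `roots`."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in calls.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


if __name__ == "__main__":
    needed = reachable(CALLS, ["app::main"])
    unused = set(CALLS) - needed
    print("needed:", sorted(needed))
    print("never pulled in:", sorted(unused))
```

In this sample, `lib::format` and `lib::render` would never be downloaded, because nothing reachable from `app::main` calls them.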
arccy
that's kind of on rust for pushing crates front and center rather than groupings of crates that are developed / reviewed / released together as a single cohesive unit (typically a git repo).
e.g. go dependencies are counted on modules (roughly git repos), rather than packages (directories, compilation units). java is counted in packages rather than classes.
adamc
The vulnerability to supply chain attacks gives me pause. It's not unique to rust and it bothers me with npm or Python as well.
PittleyDunkin
What are you comparing this to? Do you have positive examples? This seems to be a general dependency management issue unrelated to rust—the reason C++ has this is that C++ also lacks any concept of dependencies, so people kind of just make do with modifying what packages are already integrated into the build process. This certainly doesn't imply you should trust boost (or the standard library, or whatever people use this decade, or xz, or whatever).
kibwen
How many transitive dependencies is the right number for a database?
jandrewrogers
Honestly, current best practice puts that number right around zero, which you see for ambitious implementations.
A non-obvious issue is that database engines have peculiar requirements for how libraries are designed and implemented which almost no conventional library satisfies. To make matters worse, two different database implementations may have different requirements in this regard, so you can't even share libraries between databases. There are no black boxes in good database engines.
almostdeadguy
Compression libraries, OpenSSL, ICU, etc. are all common dependencies for databases.
Looking at the dependencies list (https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...) I see plenty of reasonable things like:
* Base64/checksum/compression encoding libraries
* Encryption/hash libraries
* Platform-specific bindings (likely conditional dependencies)
* Bit hacking/casting/zero-copy libraries like bytemuck, zerocopy, zero-vec, etc.
* "Small"/stack allocated data structure libraries (smallvec, tinystr, etc.)
* Unicode libraries
There are certainly things that would add bloat too, but I think it's silly to pretend like everything here is something a database engine would need custom implementations of.
kibwen
> current best practice puts that number right around zero
In the case where the answer is "zero", then that means that one does not actually need a package manager at all, in which case the features of the package manager are not relevant to the choice of language. This would imply that the parent commenter has no need to reject Rust.
brabel
Just tried to look at what some macro was generating using cargo-expand. It requires a LOT of dependencies. Took like 5 minutes to compile it all (run `cargo install cargo-expand` if you want to try). I almost aborted because the description of the crate says "Wrapper around rustc -Zunpretty=expanded." so I had expected the simplest possible crate to do that.
PittleyDunkin
> Took like 5 minutes to compile it all
TBF this has nothing to do with dependency complexity and everything to do with semantic complexity. You could easily do this without using any dependencies at all.
unless you're downloading dependencies during the build or something like that, of course.
klysm
This take is utter nonsense to me - just don't use them...
FullGarden_S
imagine if one dependency is GPL lol
With over 600 dependencies, the probability goes up and up.
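The "goes up and up" intuition can be put into numbers under a strong simplifying assumption: every dependency independently has the same small chance `p` of having the property in question (a copyleft license, a compromised maintainer, and so on). The value of `p` used below is invented purely for illustration; real dependencies are neither independent nor equally risky.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent dependencies is affected.

    Computed as the complement of "none of them is": 1 - (1 - p)^n.
    """
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-dependency chance, not a measured figure
    for n in (1, 100, 647):
        print(f"n={n:>3}: {p_at_least_one(p, n):.3f}")
```

With this made-up `p`, the chance is about 0.10 for 100 dependencies and about 0.48 for 647, so the growth is real but depends entirely on a per-dependency rate nobody in the thread actually knows.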
Deukhoofd
I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"
bbkane
That's technically correct, but they listed several ways they contribute back to the OSS ecosystem: PRs, issues, creating new libraries...
This comment makes it seem like all this company does is take, which feels unfair to me
ipaddr
"We keep ScopeDB private and proprietary, while we actively get involved and contribute back to the open-source dependencies, open source common libraries when it’s suitable"
They say they do when suitable (never or rarely).
But that's fine as the licenses allow it. It feels like another company blogging about how great open source is to get PR while closed-sourcing their product.
The older I get, the more I understand why GPL variations are superior to BSD if you want to grow the software. BSD licenses are good for throwaway code or standards you want others to adopt.
PittleyDunkin
>This comment makes it seem like all this company does is take, which feels unfair to me
Profit isn't far removed from theft, so maybe this shouldn't feel so unfair.
bbkane
> Profit isn't far removed from theft
I definitely think there are unethical ways to profit - capitalism needs to be regulated for the good of the consumer/ecosystem/society.
However, I don't believe that a blanket comparison of any type of profit to theft can be useful or correct.
> so maybe this shouldn't feel so unfair
Do you think this company is unethical for writing closed source software and trying to sell it?
easterncalculus
From the title I was really expecting this page to be a tutorial like build-your-own[1].
01HNNWZ0MV43FF
That's why all my useless little crates are AGPL :D
bdcravens
Isn't that pretty much the modern stack? Open source language, framework, and libraries, and proprietary end product?
goodpoint
And then tech companies fire engineers while making record profits.
PittleyDunkin
> I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"
I mean that's been the prevalent attitude for the entire history of open source. It's easy to laugh until someone replaces you.
dabinat
I was hoping this would be a discussion of Rust build times and how they optimized them with that number of dependencies.
But I think it’s easy for people to criticize dependencies from afar without understanding what they’re used for. I’m sure the dependencies in my projects would look strange to others - for example, I use three HTTP libraries: one for 95% of cases and the others for very specific use-cases where I need control at a low level. But without that context it might seem excessive.
flufluflufluffy
I read the title thinking it was a joke, and after reading the article, I still can’t tell if it is or not.
etaioinshrdlu
My main question is why observability data needs (or benefits from) a tailor-made database instead of a general purpose one. In 2025, if anyone working on observability told me they had to build their own database, I would be very suspicious!
tison
Datadog always builds their own event store: https://www.datadoghq.com/blog/engineering/introducing-husky...
It may not be named a "database", but it actually takes the place of one.
Observability vendors will try to store logs with ElasticSearch and later find it overly expensive, with weak support for archiving cold data. A data warehouse solution requires a complex ETL pipeline and can be awkward when handling log data (semi-structured data).
That said, if you're building an observability solution for a single company, I'd totally agree to start with single-node PG with backup, and only consider other solutions when the data and query workload grow.
jcgrillo
In 2025 I'd consider starting with clickhouse instead, if you're going the DIY route
Jolter
Not even limited to general purpose ones, there are existing tailor made databases for observability. Maybe somewhere on that page, they explain why this one is better.
EVa5I7bHFq9mnYK
57 of which written by DPRK Koding Forces, waiting for the right moment to push a glorious update, striking at the heart of The Biggest Enemy.
robertclaus
I'm having a real crisis trying to decide whether this system should be called a database or not. It's a system for managing data, so obviously it is.. but by that loose interpretation any CRUD webserver would count too.
“With a team of three experienced developers, we have implemented ScopeDB from scratch”
“with 100 direct dependencies and 647 dependencies in total”
Next up: watch me build numpy from scratch with only 150 dependencies, one of which is numpy.