SQLite-on-the-Server Is Misunderstood: Better at Hyper-Scale Than Micro-Scale
74 comments
March 3, 2025 · lbutler
0cf8612b2e1e
Maybe I am misunderstanding which part you want in the cloud, but that sounds like litestream. It lets you transparently back up a live SQLite database to a remote destination.
gwking
I depend on litestream for production backups, and as the months wear on without any releases, I am getting more nervous. To be clear, I don't feel entitled to anything with an open source project like this, but bug reports and fixes seem to be accumulating. I have flirted with the idea of building from main.
I've also flirted with the idea of forking litestream and stripping it down dramatically. The reason is that I don't like the idea of the production server being in charge of rotation and deletion. It seems like the thing getting backed up shouldn't have the privilege of deleting backups, in case it gets compromised. I might even go so far as to propose that the "even liter stream" process merely writes to a different local volume and some other process does the uploading, but I haven't gotten beyond the daydream stage.
adamtaylor_13
Yeah, I was about to suggest litestream. Isn't it local-first-with-cloud-backups?
Daril
Have you tried CR-SQLite? https://vlcn.io/docs/cr-sqlite/intro
It implements CRDTs as a SQLite extension.
rogerbinns
SQLite has a session extension that can record changes on a local database into a changeset and you can replay those changes on another SQLite instance. Note that it replays what the changes were, not the queries that resulted in the changes. When applying changes you provide a conflict handler. (You can also invert changesets making a handy undo/redo feature.)
You can save conflicts to another changeset. There is also a rebaser to help deal with multi-way syncing.
https://www.sqlite.org/sessionintro.html - overview
https://www.sqlite.org/session/sqlite3changeset_apply.html - conflict information
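A minimal sketch of the record-and-replay flow, using Node's experimental node:sqlite bindings (this assumes a recent Node where createSession()/applyChangeset() are exposed; the table and file names are made up for illustration):

```typescript
import { DatabaseSync } from "node:sqlite";

// Source database: open a session, make some changes, capture them as a changeset.
const src = new DatabaseSync("local.db");
src.exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");

const session = src.createSession();           // starts recording changes on "main"
src.exec("INSERT INTO notes (body) VALUES ('written on the laptop')");
const changeset = session.changeset();         // binary blob of row-level changes
session.close();

// Destination database: replay the recorded row changes, not the original SQL.
const dst = new DatabaseSync("replica.db");
dst.exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");
const applied = dst.applyChangeset(changeset); // a conflict handler can be passed via options
console.log("applied cleanly:", applied);
```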
jimbokun
Isn't the simplest way to "sync" to just replace the remote database file with the local database file? One of the nice things about each database being encapsulated as a single file.
timewizard
Enabling WAL mode means you don't have a single file anymore.
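You can still get a single-file snapshot to copy, though. A sketch of two options (node:sqlite is just the driver used here; any SQLite driver can run the same SQL):

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync("app.db");

// Option 1: fold the -wal file back into the main database file, so copying
// app.db alone captures everything written so far (still racy if writers are active).
db.exec("PRAGMA wal_checkpoint(TRUNCATE)");

// Option 2 (SQLite 3.27+): write a consistent, self-contained snapshot to a
// brand-new file and ship that instead of the live database.
db.exec("VACUUM INTO 'app-snapshot.db'");
```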
larkost
I too think that CRDT databases are probably something you should explore. You generally have the whole database locally, and changes get synced pretty easily (but you have to live within the rules of your CRDT).
The one I thought of (mostly because I worked there before they went under/bought by MongoDB) is RealmDB: https://en.wikipedia.org/wiki/Realm_(database)
I have long since lost touch with the state of it, but at the time the syncing to their server was fast and had worked with a long list of environments/languages.
The one thing I will caution: their model was that you almost had to have a database-per-customer. You could have a second one that contained common information, but they had no concept of only syncing part of a database based on some logic. So many customer implementations had the clients syncing multiple databases, and then a back-end client that would aggregate the data from all of those databases into one for backend processes. Extra complexity that I always thought was a real killer.
roncesvalles
Offline-first databases are a hard problem because there isn't just one copy of the database on the user's side, there are N copies - every browser tab or device on which the user can open the local database and make an edit. It's basically an AP multi-master database (= the same row can be edited at different nodes at the same time), and you likely cannot achieve good results without a database that natively supports multi-master operations.
9dev
That’s not necessarily true; if you use the Origin Private File System along with a Web Worker that acts as a local database server and works off a single SQLite database, you at least have a single DB file per device. From there on, your problem becomes state reconciliation on the server, which CRDTs should help solve.
Not an easy problem for sure, but the web platform is surprisingly capable these days.
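For a concrete sketch of that worker, assuming the official @sqlite.org/sqlite-wasm build and its oo1.OpfsDb API (OPFS persistence generally requires a cross-origin-isolated page; everything else here is illustrative):

```typescript
// db-worker.ts: the single owner of the on-device database file.
import sqlite3InitModule from "@sqlite.org/sqlite-wasm";

const ready = (async () => {
  const sqlite3 = await sqlite3InitModule();
  const db = new sqlite3.oo1.OpfsDb("/app.sqlite3"); // stored in the Origin Private File System
  db.exec("CREATE TABLE IF NOT EXISTS todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL)");
  return db;
})();

// Tabs post { sql, bind } messages here instead of opening the file themselves, so there
// is effectively one writer per device (use a SharedWorker to cover multiple open tabs).
onmessage = async (e: MessageEvent<{ sql: string; bind?: unknown[] }>) => {
  const db = await ready;
  postMessage(db.selectObjects(e.data.sql, e.data.bind ?? []));
};
```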
ammo1662
A simple, manual backup would be fine, I think. You can just put an "upload" or "backup to cloud" button that lets the user push a full, timestamped copy to S3.
Synchronization may introduce a lot more problems, especially when you want to automatically sync the database to some other place. You will need to deal with sync errors, inconsistency, version conflicts, rollbacks...
If your users can accept that, a simple full-copy backup is the best solution.
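A sketch of what that button could call on the backend, assuming the AWS SDK v3 and a snapshot file produced beforehand (e.g. via VACUUM INTO); the bucket and key names are made up:

```typescript
import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Push one full, timestamped copy of the snapshot. With bucket versioning enabled,
// you could instead reuse a single key and rely on S3's version history for restores.
export async function backupToS3(snapshotPath: string): Promise<string> {
  const key = `backups/app-${new Date().toISOString()}.db`;
  await s3.send(new PutObjectCommand({
    Bucket: "my-app-backups",
    Key: key,
    Body: await readFile(snapshotPath),
  }));
  return key;
}
```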
galaxyLogic
How about: Have 1 + N separate SQLite database-files.
Each user would have their own database-file which contains only information about that user. Then 1 shared database-file which contains info needed for all users.
Users would update their own data, which is a small database file which can be easily uploaded. They would not need to update the shared data.
Not knowing your app I don't know what the shared data would contain, presumably something. Perhaps the shared data-file would be updated on the server based on what individual user-data the users upload.
anovick
In this multi-tenant arrangement, you run into synchronization problems.
Developers should expect users to connect to the service using multiple devices (clients).
AFAIK bare SQLite doesn't offer synchronization mechanisms between multiple SQLite instances.
I believe Turso offers some solution of this kind, but not sure if that's open source or not.
galaxyLogic
> expect users to connect to the service using multiple devices (clients).
But probably using only one device at a time by a single user?
My thought, and it is just a thought, here is that instead of trying to provide a GENERAL solution for all kinds of data-update patterns, it is often possible to think in terms of what my current application specifically needs. It is easier to come up with such a solution with SQLite per app because SQLite is so "lite".
I can't speak for the "general solution" except to say that many times you don't need an all-encompassing general solution, just a solution for your current app.
bob1029
> Benefits of SQLite-Per-Partition
The obvious caveat here is any situation where you need global tables. I've never worked on a product where 100% of the schema could be partitioned like this. I've done a ton of SQLite-per-<entity type>, but there's always been a singleton SQLite database above this tracking the metadata.
Looking up a user session via token is not an analytics task. Neither is checking if an email address is already registered. These are very pedestrian queries that demand global visibility, ideally with hard transactional guarantees.
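To make that concrete, here is a sketch of the kind of singleton "global" database that tends to sit above the per-entity files (the schema is invented for illustration; node:sqlite is used only to have something runnable):

```typescript
import { DatabaseSync } from "node:sqlite";

// The one database that needs cross-partition visibility...
const globalDb = new DatabaseSync("global.db");
globalDb.exec(`
  CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE          -- "is this email registered?" stays transactional
  );
  CREATE TABLE IF NOT EXISTS sessions (
    token      TEXT PRIMARY KEY,        -- session lookup by token is a single point read
    user_id    INTEGER NOT NULL REFERENCES users(id),
    expires_at INTEGER NOT NULL
  );
`);

// ...while everything else lives in per-entity databases keyed off the ids above.
const dbForUser = (userId: number) => new DatabaseSync(`users/${userId}.db`);
```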
hinkley
Global tables with anything but a tiny rate of write operations are going to bring Amdahl's Law in to ruin several years of your life while you fight against it.
I had a service that was adding 30 ms to TTFB because most of the other decisions we made during page render were predicated on this response. I would later find about a quarter of this time was in bookkeeping code that got out of hand, but that's a story for another day.
The biggest problem was that this data requires a capital-P Process to change, so it took about 20 minutes to change it and fifteen seconds to roll it back. There had been plans to have the service do more but in the end it looked more like a remote request for a feature toggle. We were already broadcasting feature toggle changes through Consul, which is great for this kind of data. So I did more of the same and got to decom a cluster.
Moral of the story is, it matters what kind of global data you require to build your system, and that's the sort of fact you should keep in mind while designing it, because you can pick a design that will scale or one that absolutely won't, simply because you've given users features that make the entire system 10x more expensive per request.
kgeist
>I've never worked on a product where 100% of the schema could be partitioned like this
SaaS for B2B works well with this: you partition by organization. Each organization has its own DB. Most organizations are typically small (no more than 100 users), and it greatly simplifies things. However, it's still problematic with large organizations (200k-300k users is where it starts to fall apart).
skrebbel
But then any sort of management information, eg "find out how many active customers we have" or "which customer has the most activity" becomes a query over all the DBs. Similarly, admin tools for customer support etc easily become very hard to build. I'd love to know if anybody has a solution for this because otherwise DB-per-tenant sounds very nice indeed.
kgeist
>But then any sort of management information, eg "find out how many active customers we have" or "which customer has the most activity" becomes a query over all the DBs
Create a separate DWH storage where all necessary data from all databases is pulled and processed/aggregated in the background (via change data capture) and saved in a format that is more suitable for analytical queries. The downside is that it requires additional preparation and some upfront design.
>Similarly, admin tools for customer support etc easily become very hard to build.
Our CRM understands the concept of organizations/tenants, so when a customer files a ticket, we know their organization, and from there, for a tech support person, accessing "coolcompany.mysaas.com" is no different from accessing "mysaas.com". In my opinion, it makes tech support easier, because you have far less data/fewer logs to analyze/sift through (only a specific organization's DB) when investigating a problem.
NathanFlurry
Yep.
Nile (https://www.thenile.dev/) is trying to address this use case with fully isolated PG databases. Though, I don't know how they handle scaling/sharding.
apitman
Agree with your overall point, but for user sessions specifically, once a user has entered an identifier, can't you use their specific DB for sessions? You also have things like JWTs/PASETO that can carry the DB mapping in the token.
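As a sketch of the token-carries-the-mapping idea, using the jose library and an HS256 JWT (the claim names and the jose dependency are my choice, not anything from the article; PASETO would look similar):

```typescript
import { SignJWT, jwtVerify } from "jose";

const secret = new TextEncoder().encode(process.env.SESSION_SECRET!);

// At login: one global lookup to find the user, then bake their DB/shard id into the token.
export async function issueSession(userId: string, dbId: string): Promise<string> {
  return new SignJWT({ sub: userId, db: dbId })
    .setProtectedHeader({ alg: "HS256" })
    .setExpirationTime("7d")
    .sign(secret);
}

// On every request: the token itself says which per-tenant database to open,
// so no global session table sits on the hot path.
export async function resolveSession(token: string) {
  const { payload } = await jwtVerify(token, secret);
  return { userId: payload.sub as string, dbId: payload.db as string };
}
```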
NathanFlurry
> The obvious caveat here is any situation where you need global tables
A lot of people still end up storing data that's not frequently updated in a traditional OLTP database like Postgres.
However:
I think it always helps to think about these problems as "how would you do it in Cassandra/DynamoDB?"
In the case of Cassandra/DynamoDB, the relevant data (e.g. user ID, channel ID, etc) is always in the partitioning key.
For Durable Objects, you can do the same thing by building a key that's something like:
```
// for simple keys:
env.USER_DO.idFromName(userId);

// or for composite keys (assumes user A and B are sorted):
env.DIRECT_MESSAGE_CHANNEL_DO.idFromName(`${userAId}:${userBId}`);
```
I've spoken with a lot of companies using _only_ this architecture for Durable Objects and it's working well.
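For readers who haven't used Durable Objects, here is a rough sketch of what one of these partitions can look like with the SQLite-backed storage API (the class and method names are invented for illustration, and the class has to be configured as SQLite-backed in your wrangler migrations):

```typescript
import { DurableObject } from "cloudflare:workers";

// One instance of this class == one chat channel == one embedded SQLite database.
export class ChatChannel extends DurableObject {
  constructor(ctx: DurableObjectState, env: unknown) {
    super(ctx, env);
    this.ctx.storage.sql.exec(`
      CREATE TABLE IF NOT EXISTS messages (
        id      INTEGER PRIMARY KEY,
        sender  TEXT NOT NULL,
        body    TEXT NOT NULL,
        sent_at INTEGER NOT NULL
      )`);
  }

  // Called via the stub obtained from env.CHANNEL_DO.idFromName(channelId).
  send(sender: string, body: string): void {
    this.ctx.storage.sql.exec(
      "INSERT INTO messages (sender, body, sent_at) VALUES (?, ?, ?)",
      sender, body, Date.now(),
    );
  }

  recent(limit = 50) {
    return this.ctx.storage.sql
      .exec("SELECT sender, body, sent_at FROM messages ORDER BY id DESC LIMIT ?", limit)
      .toArray();
  }
}
```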
manmal
Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently? Don’t they need to coordinate themselves outside of transactions?
bob1029
> Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently?
If you have hundreds of services hammering the datastore concurrently, then the ability to offer transactional guarantees seems absolutely mandatory to me.
> Don’t they need to coordinate themselves outside of transactions?
I think we're back at the beginning of our journey here.
manmal
Do you mean via distributed transactions? I'm clueless in that area, and I don't understand how hundreds of self-sufficient services would partake in transactions that span different sets of those services without entangling them badly.
NathanFlurry
> Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently?
If a single request frequently touches multiple partitions, your use cases may not work well.
It's the same deal as Cassandra & DynamoDB: use cases like chat threads or social feeds fit really well because there's a clear ownership hierarchy. e.g. message belongs to a single thread partition, or a social post belongs to a feed partition.
stepanhruda
You could have another sqlite with this global information related to users / sessions / passwords etc
renegat0x0
Might be a little bit off topic. I created a web page with data. I didn't want to host a VPS and be charged for traffic, and I also didn't want to play with Cloudflare or self-hosting.
My solution? The app is a webpage which reads SQLite. If a user wants to use the app, the database is downloaded, unpacked, and used on the user's device.
Links:
- https://github.com/rumca-js/Internet-Places-Database - search.html provides a preview for my database file (but code supports also reading zip file).
- https://rumca-js.github.io/search?file=top&page=1&search=neo... - uses JSON files stored in a zip file. Will be replaced soon with zip + sqlite file
- https://rumca-js.github.io/search?file=music&view_display_ty... - example showing my fav music. As above uses JSON files, in a zip file
billywhizz
this is nice. i like the idea which has been tried in a few places of running sqlite in the browser directly/locally. the only thing that is really missing to make this work at a bigger scale for read-heavy databases is a very cheap or free static hosting service which does range requests, allows you control of CORS and doesn't have the file size limitations of gist or github pages. maybe this exists already? S3 would do i guess?
you can do kinda magic things like this and build websites that connect to multiple different databases around the web and... well, i'll leave the rest up to your imagination.
go here: https://just.billywhizz.io/sqlite/squeel/
hit CTRL/CMD + Q on your keyboard.
paste in this sql
```
attach database 'https://raw.githubusercontent.com/just-js/just.billywhizz.io...' as chinook;

select * from albums;
```
and hit CTRL/CMD + g to run the queries.
IshKebab
I mean if you only have a few thousand records you barely need a database at all.
akavi
Hmm, this article is a little confusing. I'm not familiar with Vitess or Citus, but I am familiar with "manually" sharded Postgres/MySQL, and I'm not sure I understand whether there are any "interaction effects" between the decision to shard or not and the choice between MySQL/Postgres and SQLite.
Like, the article's three sections are:
1. The challenges of sharding
2. The benefits of these new sharded Sqlite solutions over conventional Sqlite
3. A list conflating the benefits of SQL databases generally with the benefits of Sqlite
None of which answer the question of "Why should I use sharded Sqlite instead of, say, sharded Postgres, for hyperscale?".
NathanFlurry
Author here.
Agreed — I think adding some comparisons to other database partitioning strategies would be helpful.
My 2 cents, specifically about manually partitioning Postgres/MySQL (rather than using something like Citus or Vitess):
SQLite-on-the-server works similarly to Cassandra/DynamoDB in how it partitions data. The number of partitions is decoupled from the number of databases you're running, since data is automatically rebalanced for you. If you're curious, Dagster has a good post on data rebalancing: https://dagster.io/glossary/data-rebalancing.
With manual partitioning, compared to automatic partitioning, you end up writing a lot of extra complex logic for:
- Determining which database each piece of data lives on (as opposed to using partitioning keys, which do that automatically; see the sketch after this list)
- Manually rebalancing data, which is often difficult and error-prone
- Adding partitions manually as the system grows
- (Anecdotally) Higher operational costs, since matching node count to workload is tricky
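To illustrate the first point, here is the kind of routing code manual partitioning forces the application to own (a toy hash-by-key sketch, not taken from any real system; resharding means changing SHARD_COUNT and migrating rows yourself):

```typescript
import { createHash } from "node:crypto";

// With manual partitioning, the application owns this mapping forever.
const SHARD_COUNT = 16;
const SHARD_DSNS = Array.from({ length: SHARD_COUNT }, (_, i) => `postgres://db-shard-${i}/app`);

export function shardFor(partitionKey: string): string {
  // Stable hash of the partitioning key (e.g. a user id or channel id).
  const digest = createHash("sha256").update(partitionKey).digest();
  const shard = digest.readUInt32BE(0) % SHARD_COUNT;
  return SHARD_DSNS[shard];
}

// Changing SHARD_COUNT remaps almost every key, which is exactly the
// rebalancing problem systems like Cassandra/DynamoDB handle for you.
```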
Manual partitioning can work fine for companies like Notion, where teams are already invested in Postgres and its tooling. But overall, I think it introduces more long-term problems than using a more naturally partitioned system.
To be clear: OLTP databases are great — you don’t always need to reach for Cassandra, DynamoDB, or SQLite-on-the-server depending on your workload. But I do think SQLite-on-the-server offers a really compelling blend of the developer experience of Postgres with the scalability of Cassandra.
koakuma-chan
If you think this is a good fit for your case, you should embed SQLite in your application and shard your application. An embedded SQLite is faster and uses less memory than a PostgreSQL running as a separate process and possibly on a different machine.
apitman
This tidbit caught my eye:
> Apple runs the world's largest known number of Cassandra/ScyllaDB instances with roughly 300,000 nodes as of 2022
The Big Tech approach to web services is like everyone piling on a cruise ship to commute to work. Certainly brings some conveniences, but the complexity is insane and overall it feels absurd.
The relatively new DB-per-tenant type approaches described in this article are more like renting/leasing a car, or maybe riding the bus to work. Definitely a step in the right direction.
The future will be a mix of these solutions, and selfhosting/indiehosting. Indiehosting is like owning a car. Once the software is as reliable as cars are, many more people will do it.
koakuma-chan
TDLib (Telegram Database library)[0] uses SQLite.
In Telegram Bot API, each TDLib instance handles more than 24,000 active bots simultaneously.[1]
byoung2
Looks interesting. How would you approach a single writer with multiple read replicas in this setup? My holy grail would be allowing users to get data from a read replica, but also to write to the read replica, which then forwards those writes to the primary write DB.
NathanFlurry
I think the other comments have the application-level approaches covered.
However, I suspect the infrastructure will provide this natively as it matures:
- Cloudflare will probably eventually add read replicas for Durable Objects. They're already rolling it out for D1 (their other SQLite database offering). [1]
- Turso has their own story for read replicas. [2]
[1] https://blog.cloudflare.com/building-d1-a-global-database/#s... [2] https://docs.turso.tech/features/embedded-replicas/introduct...
hinkley
One of the classic solutions to this was to put the data behind a cluster and route by verb. All the GET requests go to the replicas and everything else goes to the writeable instance.
datadrivenangel
Suffers from dirty reads: Write to the primary, get confirmation of write-success, reload page, user sees 'stale' data from the read-replica and thinks you lost their data.
yurishimo
A temporary header that forwards the next X requests to the main write database is a possible solution. Requires a bit more oomph to handle the traffic and probably isn’t good for a write-heavy app, but for a generally read-heavy CRUD app, it’s a decent compromise.
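A sketch of both ideas combined, verb-based routing plus a short "read your own writes" window after a write (the upstream names and the cookie are made up):

```typescript
// Decide which upstream a request should hit.
const PRIMARY = "http://db-primary.internal";
const REPLICA = "http://db-replica.internal";
const STICKY_MS = 5_000; // after a write, keep this client's reads on the primary briefly

export function pickUpstream(method: string, cookieHeader = ""): string {
  const isRead = method === "GET" || method === "HEAD";
  const stickyUntil = Number(/sticky_primary=(\d+)/.exec(cookieHeader)?.[1] ?? 0);
  if (!isRead) return PRIMARY;                  // writes always go to the primary
  if (Date.now() < stickyUntil) return PRIMARY; // read-your-own-writes window
  return REPLICA;
}

// After handling a write, set a short-lived cookie so the next few reads
// from the same client also hit the primary and see their own data:
export function stickyCookie(): string {
  return `sticky_primary=${Date.now() + STICKY_MS}; Max-Age=5; Path=/`;
}
```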
slt2021
There is no need for a read replica, because every micro SQLite partition has a very low workload (single-digit queries per second).
mickeyben
> No built-in cross-database querying, making complex analytics difficult without a dedicated data lake
I've looked at Turso before and this is exactly what came to mind. I do see some use cases where it could work, but for most real-world applications, this is an absolute red flag that shouldn't be overlooked.
NathanFlurry
Author here, happy to answer questions!
grounder
In your chat channel example, you have a table for messages, and a table for participants. How do you join the participants in this chat channel database with whichever database the participants are actually defined in, so the application would be able to show participant details (name, avatar, etc.)?
robertlagrant
I love the idea of Durable Objects, and have thoughts about using them in the health space (although having a single provider of them is probably a bad thing), but cross-cutting questions such as this one seem quite difficult to address.
From the chat logs example in your article: how do you cope with a requirement such as "I as a user want to see all my latest thread activity in one place, across all my chat rooms?"
jmull
That's an easy case: write to two places (one for the chat and one for the user's activity).
In this case I think you can let them become inconsistent in the face of, e.g., write errors.
NathanFlurry
This.
> In this case I think you can let them become inconsistent in the face of, e.g., write errors.
Devs using CF Durable Objects frequently use CF Queues or CF Workflows to ensure that everything is eventually consistent without significant overhead.
It's a similar pattern to what large cos already do at scale with keeping data up to date across multiple partitions with Cassandra/DynamoDB.
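A rough sketch of that queue-backed dual write on Workers (the binding names and routes are invented; double-check the Queues producer/consumer shapes against Cloudflare's docs):

```typescript
// A Worker that treats the channel DO as the source of truth and updates the
// per-user activity DO asynchronously through a queue (retried until it lands).
interface ActivityUpdate { userId: string; channelId: string; preview: string; }

interface Env {
  CHANNEL_DO: DurableObjectNamespace;
  USER_ACTIVITY_DO: DurableObjectNamespace;
  ACTIVITY_QUEUE: Queue<ActivityUpdate>;
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const { channelId, userId, body } = await req.json() as {
      channelId: string; userId: string; body: string;
    };

    // 1. Synchronous write to the partition that owns the message.
    const channel = env.CHANNEL_DO.get(env.CHANNEL_DO.idFromName(channelId));
    await channel.fetch("https://do/send", { method: "POST", body: JSON.stringify({ userId, body }) });

    // 2. Secondary write goes through the queue, so the activity feed is eventually consistent.
    await env.ACTIVITY_QUEUE.send({ userId, channelId, preview: body.slice(0, 80) });
    return new Response("ok");
  },

  async queue(batch: MessageBatch<ActivityUpdate>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const { userId, channelId, preview } = msg.body;
      const user = env.USER_ACTIVITY_DO.get(env.USER_ACTIVITY_DO.idFromName(userId));
      await user.fetch("https://do/activity", {
        method: "POST",
        body: JSON.stringify({ channelId, preview }),
      });
      msg.ack(); // only acknowledge once the secondary write succeeded
    }
  },
};
```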
korkybuchek
> From the chat logs example in your article: how do you cope with a requirement such as "I as a user want to see all my latest thread activity in one place, across all my chat rooms?"
create a copy, for example
abdullin
Interesting read, thank you!
Do you use any special tools to manage all these separate databases, track performance and debug problems?
NathanFlurry
The folks over at StarbaseDB (https://starbasedb.com/) are working on building tools for sharded SQLite.
From the companies I've talked to, most developers using this architecture are building quick scripts to do this in-house. Both Turso and Durable Objects SQLite already have a surprising amount of usage that people don't talk about much publicly yet, so I suspect some of this tooling will start to be published in the next year.
neilpa
Randomly noticed your post is dated in the future - December 16, 2025
EGreg
Lately I've seen posts about DuckDB, which looks really cool, but SQLite seems to be compilable to WASM so it can be used in some kind of container. How do the two compare?
necubi
DuckDB is an OLAP (analytical) query engine, sqlite is an OLTP (transactional) database. Modern OLAP engines store and represent data in columnar formats, which makes them very fast at queries that touch many rows (particularly if only a few columns are needed). Queries like "sum all sales for the past month by store."
But they're slow at (or incapable of) inserts, updates, and deletes, because the columnar formats are typically immutable. They're also relatively slow at operations that need to look at all of the data for a particular row.
OLTP databases are much better for use cases where you're frequently inserting, updating, and accessing individual rows, as for the database backing a web application.
A common pattern is to use an OLTP database (like postgres) to back your application, then replicate the data to an OLAP store like Clickhouse or a data lake to run analytical queries that would overwhelm postgres.
NathanFlurry
DuckDB crushes SQLite on heavy analytical workloads, by 915x according to ClickBench. (Link below since it's looong.)
DuckDB also has a WASM target: https://duckdb.org/docs/stable/clients/wasm/overview.html
I don't know enough about DuckDB to understand the tradeoffs it made compared to SQLite to achieve this performance.
https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...
IshKebab
As I understand it, DuckDB stores columns separately (column major), whereas SQLite stores rows separately (row major). DuckDB is like a structure of arrays and SQLite is like an array of structs.
So which is faster depends on your access pattern. There are dumb stupid terrible names for "access all of one row" (OLTP) and "access all of one column" (OLAP) type access patterns.
crazygringo
"By 915x" doesn't seem remotely plausible.
Maybe there's some edge case they've found where SQLite is badly optimized and DuckDB is totally optimized, but that's absolutely not the general case.
Databases are primarily limited by disk/IO speed. Yes there are plenty of optimizations but they result in 10% improvements, not 915x.
ianburrell
DuckDB is an in-memory columnar OLAP database. It is going to be much faster at analytics queries than disk-based OLTP database. It is optimized for fast queries but can't write or handle large data.
EGreg
Why still use SQLite then?
But how does WASM DuckDB store files in IndexedDB? Any info on that?
gwking
I believe the locking models are different, making DuckDB less suitable for concurrent read/write, but you will have to look up the specifics. As always, for a server environment SQLite should be set to WAL mode and comparisons should be made against that rather than the much older, less concurrent default.
As I recall duckdb’s concurrency model did not sound viable for a web server but I may be behind the times or outright wrong.
jacktheturtle
SQLite is a very good technology.
lbutler
I’m building a local-first web app, and SQLite works well for my case since a single project can be contained in one database file, just like users are used to with existing desktop applications.
What I’d really like is an easy way to sync the SQLite database state to a cloud service. Most existing options expect you to query against a remotely hosted database and charge per read/write.
Since the database will have around 100,000 rows and you're typically working with all the data at once, streaming parts of it doesn’t make sense for my use case.
The closest I’ve found is Turso, which has offline writes in private beta, and SQLite Cloud, which lists local-first and offline sync as "coming soon."
The simplest approach might be letting users push to S3 storage with versioning. Ideally, it would also support point-in-time restores, tracking incremental updates alongside full snapshots.
Even better, I’d manage minimal server-side infrastructure and just pull the SQLite database from a service that handles syncing and management.