
SQLite-on-the-server is misunderstood: Better at hyper-scale than micro-scale

rstuart4133

Just a single data point but... I am looking at replacing a custom database with an SQL database. The application will only ever run on one server, so Sqlite3 was a contender. The database is very read-mostly, which makes it a strong contender.

The custom database is extraordinarily fast at some things, so it's a complex decision. I needed benchmarks.

I chose something that mirrors how data is accessed in the custom database. The existing database does have several tables, but it's hard to use so most accesses are single table, indexed by rowid, with maybe 100,000,000 rows. So I benchmarked a single table, indexed on primary key only, 100,000,000 small rows. Since it was easy because I could use the same SQL I benchmarked both Sqlite3 and Postgresql.

I expected Sqlite3 to beat Postgresql on single row access / updates, and Postgresql to get the upper hand on multi row access / updates. I was surprised to see Sqlite3 was about twice as fast as Postgresql on everything, including inserting the 100,000,000 records. It was always single writer. Had I tested multiple writers I expect I would have seen Postgresql really flex its muscles.

In case you're wondering, the custom database is 100 to 1,000 times faster than Sqlite3 at accessing a single record. That performance was achieved with some big tradeoffs, which make it so hard to code for that it creates reliability problems.

wewewedxfgdf

Databases are configured for different priorities out of the box.

For example, when you insert a row, that results in a write to the WAL. That's a data safety measure SQLite is unlikely to be taking.

You can configure Postgres to perform better with different tradeoffs - here are some things to investigate (a sketch of applying a few of them follows the list):

  - synchronous_commit=off 
  - wal_level=minimal 
  - wal_writer_delay=200ms 
  - checkpoint_completion_target=0.9 
  - autovacuum=off 
  - bgwriter_delay=10000ms 
  - shared_buffers=512MB 
  - work_mem=64MB 
  - UNLOGGED tables
  - run from RAM Disk (/dev/shm)
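A minimal sketch of applying a few of these (values are the ones from the list above, not recommendations; some settings need a reload or restart, and UNLOGGED tables are truncated on crash recovery):

```
-- Durability/throughput tradeoffs: only for data you can afford to lose.
ALTER SYSTEM SET synchronous_commit = off;            -- reload to apply
ALTER SYSTEM SET wal_level = minimal;                 -- restart; also needs max_wal_senders = 0
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET shared_buffers = '512MB';            -- restart
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();

-- Per-table: skip the WAL entirely (table is truncated after a crash).
CREATE UNLOGGED TABLE events_scratch (id bigint PRIMARY KEY, payload text);
```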

toxik

SQLite has had a WAL (write-ahead log) since 2010. Admittedly, it is not the default. [1]

[1] https://www.sqlite.org/wal.html

Sammi

Not the default in the original C library ... except it is the default in language wrappers like better-sqlite3 for Node.js.

It's unlikely that you're consuming the upstream SQLite C library directly with its intentionally antiquated defaults. You're much more likely to be using a wrapper that has tweaked the defaults to more modern standards.

MrLeap

In postgres, Is there a way to create a table that stores in ramdisk, but allows you to trigger writing to disk yourself?

I have a usecase where our application has events where it would be a good time to trigger disk IO. The changes that occur between those app level checkpoints can totally be ephemeral.

sa46

Maybe something like (untested; sketched in SQL after the footnote below):

- Create an unlogged table named events_active.

- At the app_checkpoint time, start a transaction:

  1. Rename events_active to events_pending_write.

  2. Create a new unlogged table, events_active, to handle writes for the next app_checkpoint.

- In a new transaction:

  1. Set events_pending_write to LOGGED. This writes the entire table into the WAL. [1]

  2. Rename events_pending_write to events_part_20250304.

  3. Attach events_part_20250304 to a partitioned events table.

For stronger consistency, combine the two transactions, but you'll block incoming writes until the transaction completes.

Unlogged tables don't guarantee Postgres uses RAM for the table. If you need a stronger guarantee, mount a directory on a ramdisk and create a table space to hold the events_active table. Then, to promote to a logged table, use a create-table-as statement to copy the table into a new partition.

[1]: https://dba.stackexchange.com/a/195829
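A minimal SQL sketch of that recipe (untested; the table names and the range-partition bounds are assumptions):

```
-- Transaction 1: swap the active table out from under writers.
BEGIN;
ALTER TABLE events_active RENAME TO events_pending_write;
CREATE UNLOGGED TABLE events_active (LIKE events_pending_write INCLUDING ALL);
COMMIT;

-- Transaction 2: make the pending data durable and attach it as a partition.
BEGIN;
ALTER TABLE events_pending_write SET LOGGED;    -- writes the whole table into the WAL
ALTER TABLE events_pending_write RENAME TO events_part_20250304;
ALTER TABLE events ATTACH PARTITION events_part_20250304
    FOR VALUES FROM ('2025-03-04') TO ('2025-03-05');
COMMIT;
```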

null

[deleted]

remram

Is that SQLite with default settings? There are a few tunables for SQLite like mmap_size and page_size which help in most situations: https://sqlite.org/mmap.html (as well as journal_mode=wal)
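For reference, a rough sketch of those pragmas (values are illustrative, not recommendations):

```
PRAGMA journal_mode = WAL;     -- write-ahead log instead of a rollback journal
PRAGMA mmap_size = 268435456;  -- memory-map up to 256 MB of the database file
PRAGMA page_size = 8192;       -- only takes effect on a new database or after VACUUM
PRAGMA synchronous = NORMAL;   -- fewer fsyncs than FULL; commonly paired with WAL
```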

rstuart4133

Postgresql was a default install. The only sqlite3 customisation was: "PRAGMA main.journal_mode = WAL". In particular I suspect people are asking whether I used exclusive mode or unsafe writes for Sqlite3. The answer is no.

Apart from that, everything was done in "reasonable sized transactions", with the same size being used for both Postgresql and Sqlite3. Necessarily so, because I ran the same SQL statements over both. For single row fetches or updates the transaction was a single row. When I was testing how fast I could update a batch of 10 rows, all 10 were done in a single transaction. (Aside: the transaction overhead far outweighed the cost of one statement. Updating 100 rows only took twice the time of updating 1.)

When creating the database of 100,000,000 rows, they were inserted in transactions of 10,000 rows. I was emulating how our system inserts records into this 100,000,000 row table. It uses an almost ascending primary key, which makes life easier for the underlying b-tree. I've found emulating that sort of thing is important for realistic benchmarks. That's because on this particular table the primary key is (current-unix-time, 32-bit-random-int). current-unix-time was monotonically increasing, but in any given second some 100 randomly ordered 32-bit-random-int's were used. From memory Sqlite3 inserted 100k records per second, Postgresql some number less than that. But don't pay much attention to the absolute numbers, as it was on spinning rust while the real system was doing other things (again because the real system had the same constraints). On my Macbook with NVMe it was several times faster in absolute terms, but Sqlite3 and Postgresql maintained their relative positions.
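The table shape described above is roughly the following (column names and types are my assumptions, not taken from the benchmark):

```
CREATE TABLE samples (
    ts      INTEGER NOT NULL,  -- current unix time; almost ascending
    rnd     INTEGER NOT NULL,  -- random 32-bit int; unordered within any one second
    payload BLOB,
    PRIMARY KEY (ts, rnd)
);

-- Inserts were batched, one transaction per 10,000 rows:
BEGIN;
INSERT INTO samples (ts, rnd, payload) VALUES (1741046400, 123456789, x'00');
-- ... 9,999 more inserts ...
COMMIT;
```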

Quiark

How much of it was because you didn't go through TCP/IP with sqlite?

RestartKernel

Are you able to elaborate on your custom database? Not that I'm in the position to offer any advice, but you got me curious about it.

zhwu

This recent blog post actually looks into the case with multiple writers and the distribution of the time for a writer to take the lock: https://blog.skypilot.co/abusing-sqlite-to-handle-concurrenc...

simlevesque

Did you try with duckdb ?

rstuart4133

Not yet. It's on my list. Mostly out of curiosity, because it isn't an OLAP application. While it is read mostly there are single row updates, just not enough to make multiple writers a frequent event.

Salgat

Is something like redis with aof enabled an option? Assuming the rows aren't holding too much data.

bob1029

> Benefits of SQLite-Per-Partition

The obvious caveat here is any situation where you need global tables. I've never worked on a product where 100% of the schema could be partitioned like this. I've done a ton of SQLite-per-<entity type>, but there's always been a singleton SQLite database above this tracking the metadata.

Looking up a user session via token is not an analytics task. Neither is checking if an email address is already registered. These are very pedestrian queries that demand global visibility, ideally with hard transactional guarantees.
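Illustrative examples of the kind of lookups meant here (hypothetical schema):

```
-- Which user owns this session token? Needs to see every session.
SELECT user_id FROM sessions WHERE token = ?;

-- Is this email already registered? Needs to see every user, ideally under a unique constraint.
SELECT 1 FROM users WHERE email = ?;
```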

kgeist

>I've never worked on a product where 100% of the schema could be partitioned like this

SaaS for B2B works well with this: you partition by organization. Each organization has its own DB. Most organizations are typically small (no more than 100 users), and it greatly simplifies things. However, it's still problematic with large organizations (200k-300k users is where it starts to fall apart).

skrebbel

But then any sort of management information, eg "find out how many active customers we have" or "which customer has the most activity" becomes a query over all the DBs. Similarly, admin tools for customer support etc easily become very hard to build. I'd love to know if anybody has a solution for this because otherwise DB-per-tenant sounds very nice indeed.

kgeist

>But then any sort of management information, eg "find out how many active customers we have" or "which customer has the most activity" becomes a query over all the DBs

Create a separate DWH storage where all necessary data from all databases is pulled and processed/aggregated in the background (via change data capture) and saved in a format that is more suitable for analytical queries. The downside is that it requires additional preparation and some upfront design.

>Similarly, admin tools for customer support etc easily become very hard to build.

Our CRM understands the concept of organizations/tenants, so when a customer files a ticket, we know their organization, and from there, for a tech support person, accessing "coolcompany.mysaas.com" is no different from accessing "mysaas.com". In my opinion, it makes tech support easier, because you have far less data/fewer logs to analyze/sift through (only a specific organization's DB) when investigating a problem.

superq

OLAP questions are usually out-of-band and preferably by a tool designed for it (like Clickhouse). Scanning all DB's is something that can be done in the background for most of these use cases.

osigurdson

I don't think we currently have any solutions that allow you to get the answer to any question with optimal performance. You could instead push the needed metrics to a stream during your normal processing and aggregate it. This of course duplicates the data, as the same information is already available in the tenant databases, but it is a very reasonable trade-off.

NathanFlurry

Yep.

Nile (https://www.thenile.dev/) is trying to address this use case with fully isolated PG databases. Though, I don't know how they handle scaling/sharding.

hinkley

Global tables with anything but a tiny rate of write operations are going to bring in Amdahl's Law to ruin several years of your life while you fight against it.

I had a service that was adding 30 ms to TTFB because most of the other decisions we made during page render were predicated on this response. I would later find about a quarter of this time was in bookkeeping code that got out of hand, but that's a story for another day.

The biggest problem was that this data requires a capital-P Process to change, so it took about 20 minutes to change it and fifteen seconds to roll it back. There had been plans to have the service do more but in the end it looked more like a remote request for a feature toggle. We were already broadcasting feature toggle changes through Consul, which is great for this kind of data. So I did more of the same and got to decom a cluster.

Moral of the story: it matters what kind of global data you require to build your system, and that's the sort of fact you should keep in mind while designing it, because you can pick a design that will scale or one that absolutely won't, simply by giving users features that make the entire system 10x more expensive per request.

jitl

Put your global tables in Postgres/CockroachDB/Vitess/etc. You don't need to force everything to be a SQLite table. Use the right tool for the job. Sometimes SQLite is the right tool, and sometimes it isn't.

superq

Agreed, or even just start with sqlite for your globals and then scale to those later.

apitman

Agree with your overall point, but for user sessions specifically, once a user has entered an identifier, can't you use their specific DB for sessions? You also have things like JWTs/PASETO that can carry the DB mapping in the token.

NathanFlurry

> The obvious caveat here is any situation where you need global tables

A lot of people still end up storing data that's not frequently updated in a traditional OLTP database like Postgres.

However:

I think it always helps to think about these problems as "how would you do it in Cassandra/DynamoDB?"

In the case of Cassandra/DynamoDB, the relevant data (e.g. user ID, channel ID, etc) is always in the partitioning key.

For Durable Objects, you can do the same thing by building a key that's something like:

```
// for simple keys:
env.USER_DO.idFromName(userId);

// or for composite keys (assumes user A and B are sorted):
env.DIRECT_MESSAGE_CHANNEL_DO.idFromName(`${userAId}:${userBId}`);
```

I've spoken with a lot of companies using _only_ this architecture for Durable Objects and it's working well.

manmal

Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently? Don’t they need to coordinate themselves outside of transactions?

bob1029

> Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently?

If you have hundreds of services hammering the datastore concurrently, then the ability to offer transactional guarantees seems absolutely mandatory to me.

> Don’t they need to coordinate themselves outside of transactions?

I think we're back at the beginning of our journey here.

manmal

Do you mean via distributed transactions? I'm clueless in that area, and don't understand how hundreds of self-sufficient services would partake in transactions that span different sets of those services without entangling them badly.

NathanFlurry

> Is it even realistic to depend on transactional guarantees, with hundreds of services hammering the DB(s) more or less concurrently?

If a single request frequently touches multiple partitions, your use cases may not work well.

It's the same deal as Cassandra & DynamoDB: use cases like chat threads or social feeds fit really well because there's a clear ownership hierarchy. e.g. message belongs to a single thread partition, or a social post belongs to a feed partition.

slt2021

Would Redis or an in-memory DB be better suited for this task? I don't think a regular RDBMS will meet this requirement at scale.

bob1029

A regular RDBMS is insufficient for managing user accounts and sessions?

What kind of scenario do you have in mind?

stepanhruda

You could have another sqlite with this global information related to users / sessions / passwords etc

lbutler

I’m building a local-first web app, and SQLite works well for my case since a single project can be contained in one database file, just like users are used to with existing desktop applications.

What I’d really like is an easy way to sync the SQLite database state to a cloud service. Most existing options expect you to query against a remotely hosted database and charge per read/write.

Since the database will have around 100,000 rows and you're typically working with all the data at once, streaming parts of it doesn’t make sense for my use case.

The closest I’ve found is Turso, which has offline writes in private beta, and SQLite Cloud, which lists local-first and offline sync as "coming soon."

The simplest approach might be letting users push to S3 storage with versioning. Ideally, it would also support point-in-time restores, tracking incremental updates alongside full snapshots.

Even better, I’d manage minimal server-side infrastructure and just pull the SQLite database from a service that handles syncing and management.

rogerbinns

SQLite has a session extension that can record changes on a local database into a changeset and you can replay those changes on another SQLite instance. Note that it replays what the changes were, not the queries that resulted in the changes. When applying changes you provide a conflict handler. (You can also invert changesets making a handy undo/redo feature.)

You can save conflicts to another changeset. There is also a rebaser to help deal with multiple way syncing.

https://www.sqlite.org/sessionintro.html - overview

https://www.sqlite.org/session/sqlite3changeset_apply.html - conflict information

https://www.sqlite.org/session/rebaser.html - rebaser

chii

there's also a CRDT version of this, which allows two databases to be sync'ed to each other in real time (aka, updates to one will eventually make it to the other, and both databases would eventually contain the same data).

It's https://vlcn.io/docs/cr-sqlite/intro , and i find it amazing that this is doable in sqlite. It is perfect for small scale collaboration imho, but it also works to sync across local client and remote server (for a single db per user scenario).

hitekker

Interesting link, it'd be great if their solution meets expectations.

Right now, the proof-of-concept they've provided seems simplistic. Their progress seems to have shifted from cr-sqlite into "Zero" instead. I'm guessing it has something to do with CRDTs being quite app-specific and hard to generalize.

I would want to see this library used in production first before hyping it

stronglikedan

> in real time (aka, updates to one will eventually make it to the other

The term you're looking for is "eventual consistency".

roncesvalles

This is just clobbering one of the divergent copies with per-field granularity.

0cf8612b2e1e

Maybe I am misunderstanding which part you want in the cloud, but that sounds like litestream. It lets you transparently back up a live SQLite database to a remote destination.

https://litestream.io/

gwking

I depend on litestream for production backups and as the months wear on without any releases I am getting more nervous. To be clear, I don’t feel entitled to anything with an open source project like this, but bug reports and fixes seem to be accumulating. I have flirted with the idea of building from main.

I’ve also flirted with the idea of forking litestream and stripping it down dramatically. The reason why is that I don’t like the idea of the production server being in charge of rotation and deletion. It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised. I might even go so far as to propose that the “even liter stream” process merely writes to a different local volume and then some other process does the uploading but I haven’t gotten beyond the daydream stage.

normie3000

Having run litestream in prod for 2+ years, I share all of these concerns.

> It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised.

For backups, I added a nightly cron job which exports my SQLite db to a write-only S3 bucket.

chubot

What kind of bugs have you experienced or are you worried about? Backup software shouldn’t need to be frequently updated

edoceo

And combine it with the session features that @rogerbinns mentioned. Feels resilient.

adamtaylor_13

Yeah, I was about to suggest litestream. Isn't it local-first-with-cloud-backups?

conradev

SQLite has the ability to do exactly this, minus the versioning: https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki

Implementing snapshot backups on top of that would be as simple as a VACUUM and S3 PUT
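For the snapshot half, something like this (the filename is illustrative):

```
-- Write a consistent copy of the live database to a new file, then PUT that file to S3.
VACUUM INTO '/tmp/snapshot-20250304.db';
```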

For point-in-time restores, mvsqlite is a cool solution: https://github.com/losfair/mvsqlite

Daril

Have you tried CR-SQLite? https://vlcn.io/docs/cr-sqlite/intro

It implements CRDTs as a SQLite extension.

roncesvalles

Offline-first databases are a hard problem because there isn't just one copy of the database on the user's side, there are N copies - every browser tab or device on which the user can open the local database and make an edit. It's basically an AP multi-master database (= the same row can be edited at different nodes at the same time), and you likely cannot achieve good results without a database that natively supports multi-master operations.

9dev

That’s not necessarily true; if you use the Origin Private Filesystem along with a Web Worker that acts as a local database server and works off a single SQLite database, you at least have a single DB file per device. From there on, your problem becomes state reconciliation on the server, which CRDTs should help solve.

Not an easy problem for sure, but the web platform is surprisingly capable these days.

roncesvalles

CRDTs are so-so and likely cause issues with maintaining relational DBs' transactional consistency. There's a reason none of the NewSQL databases (to my knowledge) are multi-leader.

null

[deleted]

larkost

I too think that CRDT databases are probably something you should explore. You generally have the whole database locally, and changes get synced pretty easily (but you have to live within the rules of your CRDT).

The one I thought of (mostly because I worked there before they went under/bought by MongoDB) is RealmDB: https://en.wikipedia.org/wiki/Realm_(database)

I have long since lost touch with the state of it, but at the time the syncing to their server was fast and had worked with a long list of environments/languages.

The one thing I will caution: their model was that you almost had to have a database-per-customer. You could have a second one that contained common information, but they had no concept of only syncing part of a database based on some logic. So many customer implementations had the clients syncing multiple databases, and then a back-end client that would aggregate the data from all of those databases into one for backend processes. Extra complexity that I always thought was a real killer.

jimbokun

Isn't the simplest way to "sync" to just replace the remote database file with the local database file? One of the nice things about each database being encapsulated as a single file.

timewizard

Enabling WAL mode means you don't have a single file anymore.

catgirlinspace

You could do a checkpoint first, though, I believe. And if the database is only being updated on your local client, I don’t think WAL mode would have much benefit, since it’s probably not getting many concurrent writes.
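A checkpoint that folds the WAL back into the main database file and truncates it would look like:

```
PRAGMA wal_checkpoint(TRUNCATE);
```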

pstuart

More work than grabbing a single file but still easy enough to use: https://www.sqlite.org/backup.html

Cthulhu_

> What I’d really like is an easy way to sync the SQLite database state to a cloud service.

Don't do this, but an application I used to work on (to replace it) copied the sqlite file to a folder then used rsync to sync it with a backup node. Apparently it worked and was good enough for that use case (inefficient php backend application with at most a dozen concurrent users).

100,000 rows is only a few megabytes at most, right? Should be fine.

superq

> Don't do this

What's wrong with that? Of course it will work fine; SQLite, with or without WAL, has a ton of protections against corruption from writes-in-progress, which is what makes hot backups work.

runako

Notable that Ruby on Rails recently shipped its 8.0 release, which had as one of its pillars extending SQLite support to replace cache and job queue components, and to make it production-ready for common types of Web apps.

https://rubyonrails.org/2024/11/7/rails-8-no-paas-required

akavi

Hmm, this article is a little confusing. I'm not familiar with Vitess or Citus, but am familiar with "manually" sharded Postgres/MySQL, and I'm not sure I understand whether there are any "interaction effects" between the decision to shard or not and the choice between MySQL/Postgres and Sqlite.

Like, the article's three sections are:

1. The challenges of sharding

2. The benefits of these new sharded Sqlite solutions over conventional Sqlite

3. A list conflating the benefits of SQL databases generally with the benefits of Sqlite

None of which answer the question of "Why should I use sharded Sqlite instead of, say, sharded Postgres, for hyperscale?".

NathanFlurry

Author here.

Agreed — I think adding some comparisons to other database partitioning strategies would be helpful.

My 2 cents, specifically about manually partitioning Postgres/MySQL (rather than using something like Citus or Vitess):

SQLite-on-the-server works similarly to Cassandra/DynamoDB in how it partitions data. The number of partitions is decoupled from the number of databases you're running, since data is automatically rebalanced for you. If you're curious, Dagster has a good post on data rebalancing: https://dagster.io/glossary/data-rebalancing.

With manual partitioning, compared to automatic partitioning, you end up writing a lot of extra complex logic for:

- Determining which database each piece of data lives on (as opposed to using partitioning keys which do that automatically)

- Manually rebalancing data, which is often difficult and error-prone

- Adding partitions manually as the system grows

- (Anecdotally) Higher operational costs, since matching node count to workload is tricky

Manual partitioning can work fine for companies like Notion, where teams are already invested in Postgres and its tooling. But overall, I think it introduces more long-term problems than using a more naturally partitioned system.

To be clear: OLTP databases are great — you don’t always need to reach for Cassandra, DynamoDB, or SQLite-on-the-server depending on your workload. But I do think SQLite-on-the-server offers a really compelling blend of the developer experience of Postgres with the scalability of Cassandra.

itake

> - Determining which database each piece of data lives on (as opposed to using partitioning keys which do that automatically)

Most sharding databases use consistent hashing.

> - Manually rebalancing data, which is often difficult and error-prone

Not really. When you set up the database, you choose a highly divisible number of shards and use consistent hashing to spread the data across them. Each server hosts N shards, where N changes as your data (and server count) grows (sketched below).

> - Adding partitions manually as the system grows

Not really. Just choose a reasonably high number of shards and divide them across your servers.

> - (Anecdotally) Higher operational costs, since matching node count to workload is tricky

This could be true, but also, there is overhead to managing a ton of SQLite databases too. I think there are tradeoffs here.
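A sketch of the fixed-logical-shards approach described above (Postgres flavour; hashtext() is an internal helper used purely for illustration, any stable hash of the key works):

```
-- 1024 logical shards; each key maps to one deterministically.
SELECT abs(mod(hashtext('user-42'), 1024)) AS shard_id;

-- A small lookup table maps logical shards to physical servers;
-- rebalancing means moving whole shards and updating rows here.
CREATE TABLE shard_map (
    shard_id int  PRIMARY KEY,  -- 0 .. 1023
    server   text NOT NULL
);
```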

koakuma-chan

If you think this is a good fit for your case, you should embed SQLite in your application and shard your application. An embedded SQLite is faster and uses less memory than a PostgreSQL running as a separate process and possibly on a different machine.

renegat0x0

Might be a little bit off topic. I created a web page with data. I didn't want to host a VPS and be charged for traffic. I also do not want to play with Cloudflare and self-hosting.

My solution? The app is a web page which reads SQLite. If the user wants to use the app, the database is downloaded, unpacked, and used on the user's device.

Links:

- https://github.com/rumca-js/Internet-Places-Database - search.html provides a preview of my database file (but the code also supports reading a zip file).

- https://rumca-js.github.io/search?file=top&page=1&search=neo... - uses JSON files stored in a zip file. Will be replaced soon with zip + sqlite file

- https://rumca-js.github.io/search?file=music&view_display_ty... - example showing my fav music. As above uses JSON files, in a zip file

billywhizz

this is nice. i like the idea which has been tried in a few places of running sqlite in the browser directly/locally. the only thing that is really missing to make this work at a bigger scale for read-heavy databases is a very cheap or free static hosting service which does range requests, allows you control of CORS and doesn't have the file size limitations of gist or github pages. maybe this exists already? S3 would do i guess?

you can do kinda magic things like this and build websites that connect to multiple different databases around the web and... well, i'll leave the rest up to your imagination.

go here: https://just.billywhizz.io/sqlite/squeel/

hit CTRL/CMD + Q on your keyboard.

paste in this sql

```
attach database 'https://raw.githubusercontent.com/just-js/just.billywhizz.io...' as chinook;

select * from albums;
```

and hit CTRL/CMD + g to run the queries.

IshKebab

I mean if you only have a few thousand records you barely need a database at all.

x-complexity

> I mean if you only have a few thousand records you barely need a database at all.

Keyword being "barely".

There are organization benefits if you can structure your data into a DB, instead of having each page redundantly hold the same header & metadata info.

renegat0x0

Previously I have been using JSON. However there are multiple structures with relations between them so... this seems to be a database.

Extracting data from it also becomes really really easy with selects. Otherwise I would have to implement, or reuse some algorithms to filter JSON data, etc.

koakuma-chan

TDLib (Telegram Database library)[0] uses SQLite.

In Telegram Bot API, each TDLib instance handles more than 24,000 active bots simultaneously.[1]

[0]: https://github.com/tdlib/td

[1]: https://core.telegram.org/tdlib

apitman

This tidbit caught my eye:

> Apple runs the world's largest known number of Cassandra/ScyllaDB instances with roughly 300,000 nodes as of 2022

The Big Tech approach to web services is like everyone piling on a cruise ship to commute to work. Certainly brings some conveniences, but the complexity is insane and overall it feels absurd.

The relatively new DB-per-tenant type approaches described in this article are more like renting/leasing a car, or maybe riding the bus to work. Definitely a step in the right direction.

The future will be a mix of these solutions, and selfhosting/indiehosting. Indiehosting is like owning a car. Once the software is as reliable as cars are, many more people will do it.

NathanFlurry

Author here, love this take.

I've chatted with a few medium-sized companies looking at Durable Objects for this reason. DB-per-tenant removes much of the need for another dedicated team to provision & maintain infrastructure for the services. It's almost like what microservices were trying to be but fell woefully short of achieving.

It's disappointing (but understandable) that "serverless" received a bad rap. It's never going to fully replace traditional infrastructure, but it does solve a lot of problems.

PaulHoule

I rebuilt my home server a few weeks ago and the thing I was most anxious about getting working was the Ubiquiti controller which controls maybe 5 access points and managed hubs in my house.

The damn thing depends on mongodb and, worse, an old version of mongodb. I found a script that installed it all but it seems a system like that Ubiquiti controller really ought to use sqlite for a "just works" experience. For a while I thought maybe mongo was necessary if you had a really big installation with 10,000+ access points but if this article is right, sqlite ought to be good for the biggest installations out there.

briHass

Been there, have the scars from googling mongo error log messages to prove it.

I've been pleased at work to see some on-prem server apps switching to SQLite recently. Most recent that comes to mind is EFT Server (Windows SFTP/FTPS server), which switched from SQL Server Express to SQLite. SSE always makes me groan: huge install, pain to manage, etc. SQLite is just so easy to copy files around and make backups. I'm sure it's probably faster also.

qingcharles

I've been switching all the projects I had running on SSE to SQLite. What a difference in ease of maintenance!

z3ratul163071

Using mongodb for the controller was a very poor decision. I went through similar issues migrating my controller :(

simonw

"Better At Hyper-Scale Than Micro-Scale"

That's a slightly misleading headline, because it can be interpreted as implying SQLite is BAD at micro-scale. That's not an argument the rest of the piece makes - it's more about how SQLite is great at large scale and people who focus on the small-scale aspects might miss that.

NathanFlurry

Fair point, noted.

vetrom

In my experience, in a multiuser environment SQLite falls over pretty hard, due to a lack of anything like MVCC.

I'm aware that there a bunch of sqlite-compatible implementations and extensions that add MVCC. How close are those to actually being SQLite? Are they just a totally different table storage and transaction engine sitting behind SQLite's SQL parser/compiler and VM?

shipp02

If anyone has heard Joe Armstrong's talk about how communication is limited by latency and data can only travel so fast: I think having smaller partitions locally is an optimal point.

If you want global consistency then you'll have to either spend some time at runtime to achieve it or have complicated protocols, fast networking, and synchronized clocks.

Does this look like actor model (from Erlang) if you squint a bit?

NathanFlurry

Author here! I agree it's very similar to the actor model, but I kept the article's scope small, so I didn’t cover that.

In fact – Durable Objects talks a bit about its parallels with the actor model here: https://developers.cloudflare.com/durable-objects/what-are-d...

You might also appreciate this talk on building a loosely related architecture using Erlang, though it doesn't implement an actor-per-database pattern – https://www.youtube.com/watch?v=huGVdGLBJEo

chasemp

I was thinking something very similar. Once you've accepted any need at all for global state the next move is to reorient to minimizing it with horizontally scalable point local state and a small targeting dataset and tiered caching system.