
SQLite concurrency and why you should care about it

asa400

In SQLite, transactions by default start in “deferred” mode. This means they do not take a write lock until they attempt to perform a write.

You get SQLITE_BUSY when transaction #1 starts in read mode, transaction #2 starts in write mode, and then transaction #1 attempts to upgrade from read to write mode while transaction #2 still holds the write lock.

The fix is to set a busy_timeout and to begin any transaction that does a write (any write, even if it is not the first operation in the transaction) in “immediate” mode rather than “deferred” mode.
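
In Python's stdlib sqlite3, for example, that looks roughly like this (a sketch only; the counters table is hypothetical, and other drivers spell these knobs differently):

    import sqlite3

    # isolation_level=None -> autocommit; we manage transactions explicitly
    conn = sqlite3.connect("app.db", isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30s instead of failing with SQLITE_BUSY

    # Take the write lock up front, so the transaction never has to
    # upgrade from read to write mid-flight:
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE counters SET n = n + 1 WHERE id = ?", (1,))
    conn.execute("COMMIT")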

https://zeroclarkthirty.com/2024-10-19-sqlite-database-is-lo...

simonw

Yeah I read the OP and my first instinct was that this is SQLITE_BUSY. I've been collecting posts about that here: https://simonwillison.net/tags/sqlite-busy/

mickeyp

Indeed. Everyone who uses sqlite will get burnt by this one day and spend a lot of time chasing down errant write-upgraded transactions that cling on for a little bit longer than intended.

summarity

I've always tried to avoid situations that could lead to SQLITE_BUSY. SQLITE_BUSY is an architecture smell. For standard SQLite in WAL, I usually structure an app with a read "connection" pool, and a single-entry write connection pool. Making the application aware of who _actually_ holds the write lock gives you the ability to proactively design access patterns, not try to react in the moment, and to get observability into lock contention, etc.
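
A minimal sketch of that shape in Python (the names and the with_write helper are mine, not a library API):

    import sqlite3
    import threading

    # One write connection for the whole app. The lock makes "who holds
    # the write lock" an application-level question you can observe.
    write_conn = sqlite3.connect("app.db", isolation_level=None, check_same_thread=False)
    write_conn.execute("PRAGMA journal_mode=WAL")
    write_lock = threading.Lock()

    def with_write(fn):
        with write_lock:  # the single-entry write "pool"
            write_conn.execute("BEGIN IMMEDIATE")
            try:
                fn(write_conn)
                write_conn.execute("COMMIT")
            except Exception:
                write_conn.execute("ROLLBACK")
                raise

Reads go through a separate pool of ordinary connections, so lock contention shows up in your own metrics instead of as SQLITE_BUSY.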

simonw

Even with that pattern (which I use too) you still need to ensure those write operations always start a transaction at the beginning in order to avoid SQLITE_BUSY.

mickeyp

I mean, you're not wrong, and that is one way to solve it, but the whole point of a sensibly-designed WAL -- never mind database engine -- is that you do not need to commit to some sort of actor model to get your db to serialise writes.

sethev

These are performance optimizations. SQLite does serialize writes. Avoiding concurrent writes to begin with just avoids some overhead on locking.

tlaverdure

Yes, these are both important points. I didn't see any mention of SQLITE_BUSY in the blog post, and I wonder whether a busy_timeout was ever configured. That's something people miss quite often.

kijin

Wouldn't that "fix" make the problem worse on the whole, by making transactions hold onto write locks longer than necessary? (Not trying to disagree, just curious about potential downsides.)

BobbyTables2

That's the best explanation I've seen of this issue.

However, it screams of a broken implementation.

Imagine if Linux PAM logins randomly failed if someone else was concurrently changing their password or vice versa.

In no other application would random failures due to concurrency be tolerated.

SQLite is broken by design; the world shouldn’t give them a free pass.

asa400

SQLite is a truly remarkable piece of software that is a victim both of its own success and its unwavering commitment to backward compatibility. It has its quirks. There are definitely things we can learn from it.

mickeyp

SQLite is a cracking database -- I love it -- that is let down by its awful defaults in service of 'backwards compatibility.'

You need a brace of PRAGMAs to get it to behave reasonably sanely if you do anything serious with it.

tejinderss

Do you know any good default PRAGMAs that one should enable?

mickeyp

These are my PRAGMAs and not your PRAGMAs. Be very careful about blindly copying something that may or may not match your needs.

    PRAGMA foreign_keys=ON         -- enforce foreign keys (OFF by default)
    PRAGMA recursive_triggers=ON   -- let triggers fire other triggers
    PRAGMA journal_mode=WAL        -- write-ahead logging: readers don't block the writer
    PRAGMA busy_timeout=30000      -- wait up to 30s for a lock instead of failing with SQLITE_BUSY
    PRAGMA synchronous=NORMAL      -- safe in WAL mode, much faster than FULL
    PRAGMA cache_size=10000        -- page cache of ~10000 pages
    PRAGMA temp_store=MEMORY       -- keep temp tables and indices in RAM
    PRAGMA wal_autocheckpoint=1000 -- checkpoint the WAL every ~1000 pages
    PRAGMA optimize                -- run on tx start
Note that I do not use auto_vacuum: DELETEs are uncommon in my workflows, I am fine with the trade-off, and if I do need it I can always PRAGMA it.

defer_foreign_keys is useful if you understand the pros and cons of enabling it.
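
For instance (a sketch, assuming a conn in autocommit mode as above): within a single transaction it lets you insert rows in whatever order is convenient, with FK checks deferred to COMMIT, and the pragma resets itself when the transaction ends.

    conn.execute("BEGIN IMMEDIATE")
    conn.execute("PRAGMA defer_foreign_keys=ON")  # applies to this transaction only
    # ... insert child rows before their parents, fix up references, etc. ...
    conn.execute("COMMIT")  # foreign key constraints are checked here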

mikeocool

Using strict tables is also a good thing to do, if you value your sanity.
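
For example (sketch; the table is made up):

    conn.execute("""
        CREATE TABLE words (
            id   INTEGER PRIMARY KEY,
            word TEXT NOT NULL
        ) STRICT  -- rejects inserting 'abc' into id instead of quietly storing it
    """)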

porridgeraisin

You should PRAGMA optimize before tx end, not at tx start.

Except for long-lived connections, where you do it periodically.

https://www.sqlite.org/lang_analyze.html#periodically_run_pr...

adzm

Really, no mmap?

leetrout

Explanation of sqlite performance PRAGMAs

https://kerkour.com/sqlite-for-servers

mkoubaa

Seems like it's asking to be forked

kbolino

SQLite is fairly fork-resistant due to much of its test suite being proprietary: https://www.sqlite.org/testing.html

justin66

It has been forked at least once:

https://docs.turso.tech/libsql

Leherenn

A bit off topic, but there seem to be quite a few SQLite experts here.

We're having trouble with memory usage when using SQLite in-memory DBs with "a lot" of inserts and deletes. Think inserting up to 100k rows in 5 minutes, deleting them all after 5 minutes, and doing this for days on end. We see memory usage slowly creeping up over hours/days when doing that.

Any settings that would help with that? It's particularly bad on macOS, we've had instances where we reached 1GB of memory usage according to Activity Monitor after a week or so.

asa400

Are you running vacuums at all? Is auto_vacuum enabled?

https://sqlite.org/lang_vacuum.html

porridgeraisin

In-memory DBs don't have anything to vacuum.

However... what you (and OP) are looking for might be pragma shrink_memory [1].

[1] https://sqlite.org/pragma.html#pragma_shrink_memory
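
For the insert-and-delete-for-days pattern above, that might look like this (sketch; the table is a stand-in):

    # After the periodic bulk delete, hand cached memory back:
    conn.execute("DELETE FROM samples")
    conn.execute("PRAGMA shrink_memory")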

kachapopopow

Sounds like normal behavior: buffers being adjusted to better fit the use case. Not sure if that applies to SQLite, or whether SQLite even implements dynamic buffers.

stefanos82

When hctree [1] becomes stable in SQLite, it will be the only database I'll be using lol!

I presume the `hc` part of the project's code name stands for High Concurrency.

[1] https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html

ricardobeat

Articles like this leave me with an uneasy feeling that the “solutions” are just blind workarounds. More debugging/research should be able to expose exactly what the problem is; now that would be something worth sharing.

mangecoeur

Sqlite is a great bit of technology, but sometimes I read articles like this and think: maybe they should have used Postgres. If you don't specifically need the “one file portability” aspect of sqlite, or it's not embedded (in which case you shouldn't have concurrency issues), Postgres is easy to get running and solves these problems.

eduction

100%. I specifically clicked for the “why you should care” and was disappointed I could not find it.

I certainly don’t mind if someone is pushing the limits of what SQLite is designed for but personally I’d just rather invest the (rather small) overhead of setting up a db server if I need a lot of concurrency.

dv35z

Curious if anyone has strategies on how to perform parallel writes to an SQLite database using Python's `multiprocessing` Pool.

I am using it to loop through a database of 11,000 words, hit an HTTP API (ChatGPT) for each, and generate example sentences for the word. I would love to asynchronously launch these API calls and have them come back and update the database row when ready, but I'm not sure how to handle the database getting hit by all these writes from (as I understand it) multiple instances of the same Python program/function.

zie

Technically SQLite can only have one writer at any given moment, but it can appear to work across multiple writers if you let it serialize the calls for you.

By default SQLite will not do what you want out of the box. You have to turn on some feature flags (PRAGMAs) to get it to behave for you. You need WAL mode, etc. Read:

* https://kerkour.com/sqlite-for-servers

* https://zeroclarkthirty.com/2024-10-19-sqlite-database-is-lo...

My larger question is why multiprocessing? This looks like an IO-heavy workload, not a CPU-bound one, so Python asyncio or Python threads would probably serve you better.

Multiprocessing is for when your resource hog is CPU (roughly one Python process per core), not IO.
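
A sketch of the threaded version (fetch_sentence and the words table are stand-ins for your API call and schema):

    import sqlite3
    from concurrent.futures import ThreadPoolExecutor

    conn = sqlite3.connect("words.db", isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=30000")

    def fetch_sentence(word):
        # stand-in for the slow, IO-bound ChatGPT call
        return f"Example sentence for {word}."

    words = [w for (w,) in conn.execute("SELECT word FROM words WHERE sentence IS NULL")]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Threads parallelise the network IO; all writes funnel through
        # the one connection on the main thread, so the single-writer
        # rule never bites.
        for word, sentence in zip(words, pool.map(fetch_sentence, words)):
            conn.execute("UPDATE words SET sentence = ? WHERE word = ?", (sentence, word))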

sethev

Have you tried it?

What you're describing sounds like it would work fine to me. The blog post is misleading imho: it implies that SQLite doesn't handle concurrency at all. In reality, you can perform a bunch of writes in parallel and SQLite will handle running them one after the other internally. This works across applications and processes; you just need to use SQLite to interact with the database. The blog post is also misleading when it implies that the application has to manage access to the database file in some way.

Yes, it's correct that only one of those writes will execute at a time, but it's not like you have to account for that in your code, especially in a batch-style process like you're describing. In your Python code, you'll just update a row and it will look like that happens concurrently with other updates.

I'll bet that your call to ChatGPT will take far longer than updating the row, even accounting for time when the write is waiting for its turn in SQLite.

Use WAL-mode for the best performance (and to reduce SQLITE_BUSY errors).

mickeyp

Edit: disregard. I read it as he'd done it and had contention problems.

You can't. You have a single writer - it's one of the many reasons sqlite is terrible for serious work.

You'll need a multiprocessing Queue and a writer that picks sentences off one by one and commits them.
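
Roughly this shape (sketch; stdlib only, the words table is a stand-in):

    import sqlite3
    from multiprocessing import Process, Queue

    def writer(q):
        # The only process that ever touches the database.
        conn = sqlite3.connect("words.db", isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        while True:
            item = q.get()
            if item is None:  # sentinel: workers are done
                break
            word, sentence = item
            conn.execute("BEGIN IMMEDIATE")
            conn.execute("UPDATE words SET sentence = ? WHERE word = ?", (sentence, word))
            conn.execute("COMMIT")
        conn.close()

    if __name__ == "__main__":
        q = Queue()
        w = Process(target=writer, args=(q,))
        w.start()
        # ... worker processes put (word, sentence) tuples on q ...
        q.put(None)
        w.join()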

hruk

This is just untrue - the naive implementation (make the API call, write a single row to the db) will work fine, as transactions are quite fast on modern hardware.

What do you consider "serious" work? We've served a SaaS product from SQLite (roughly 300-500 queries per second at peak) for several years without much pain. Plus, it's not like PG and MySQL are pain-free, either - they all have their quirks.

mickeyp

Edit: disregard. I read it as he'd done it and had contention problems.

I mean, it's not fine if he's got lock contention from BUSY signals, as he implies, now is it? Much of his trouble will stem from transactions blocking each other; maybe they are long-lived, maybe they are not. And those 300-500 queries: are they writes or reads? Because reads are not a problem.

ignoramous

  So, I decided on three locking strategies:

  No-Lock
  Optimistic locking
  Pessimistic locking

  As a default, the no-lock behavior does exactly what the name implies. Nothing. This is the default because my research shows that for 99% all of this is not an issue and every interaction at this level will slow down the whole application.

Aren't the mutexes in the more modern implementations (like Cosmo [0]) & runtimes (like Go [1]) already optimized so applications can use mutexes fearlessly?

[0] https://justine.lol/mutex/

[1] https://victoriametrics.com/blog/go-sync-mutex/

porridgeraisin

> So an application that wants to use SQLite as its database needs to be the only one accessing it.

No. It uses OS-level locks (fcntl()). You can access it from however many processes you like. The only rule is: single writer (at a time).

> When another part of the application wants to read data, it reads from the actual database, then scans the WAL for modifications and applies them on the fly.

Also wrong. The WAL does not contain modifications; it contains full pages. A reader checks the WAL, and if it finds the page there it won't even read the main database file. It's a bit like a cache in this sense, which is why shared-cache mode was discouraged in favour of WAL (in addition to WAL's other benefits). Multiple versions of a page can exist in the WAL (from different transactions), but each reader sees a consistent snapshot: the newest version of each page up to its snapshot point.

> For some reason on some systems that run Jellyfin when a transaction takes place the SQLite engine reports the database is locked and instead of waiting for the transaction to be resolved the engine refuses to wait and just crashes

You can set a timeout for this - busy_timeout.

> Reproducible

There's nothing unreliable here. It will fail every single time. If it doesn't, then the write finished too fast for the read to notice and return SQLITE_BUSY. Not sure what they are seeing.

> The solution

So they've reimplemented SQLite's serialisation, as well as SQLite's busy_timeout, in C#?

> "engine", "crash"

SQLite is not an engine. It's literally functions you link into your app. It also doesn't crash; it returns SQLITE_BUSY. Maybe EF throws an exception on top of that.

I have to say, this article betrays a lack of fundamental DB knowledge and a familiarity only with ORMs. Understand the DB and then use the ORM on top of it. Or at least, don't flame the DB (context: the blame-y tone of the article) if you haven't bothered to understand it. Speaking of ORMs ...

> EF Core

You're telling me that Burj Khalifa of abstractions doesn't have room to tune SQLite to what web devs expect?