Litestream: Revamped
113 comments
May 20, 2025
simonw
This post is like they read my mind and implemented everything I wanted from a new Litestream. So exciting.
thewisenerd
ben, thanks for litestream!
we're using it in production for a write-heavy internal use case (~12GB compressed) for more than a year now, and it's costing us a couple hundred pennies per month (azure).
excited to try the new changes when they land.
tmpz22
Mind sharing some of your operational choices for hosting/deployment? Which Azure services are you using and what configurations? What kind of throughput are you getting? Any tips regarding migrations? Are you using a dedicated server or VPS?
I'll be doing a similar deployment later this year and enjoy reading about the topic.
thewisenerd
for this particular deployment;
we're only using the blob storage on azure. the deployments are on an on-prem kubernetes cluster with replicas=1 and strategy: recreate.
throughput: not very heavy tbf.. one webhook request every 10-ish seconds; each request leads to about 10-100+ entries added in a table.
migrations: since it's an internal console, we just took a couple hours downtime and did it.
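(Not from the thread: a rough sketch of the deployment shape described above, assuming a Litestream sidecar and a PVC. All names and images are placeholders, and the Litestream config mount is omitted.)

```yaml
# Rough sketch only -- names, images, and the sidecar layout are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-console            # hypothetical
spec:
  replicas: 1                       # single writer for the SQLite file
  strategy:
    type: Recreate                  # old pod stops before the new one starts
  selector:
    matchLabels:
      app: internal-console
  template:
    metadata:
      labels:
        app: internal-console
    spec:
      containers:
        - name: app                 # writes to /data/app.db
          image: example/app:latest
          volumeMounts:
            - { name: data, mountPath: /data }
        - name: litestream          # replicates /data/app.db to blob storage
          image: litestream/litestream:latest
          args: ["replicate"]
          volumeMounts:
            - { name: data, mountPath: /data }
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data         # hypothetical PVC
```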
bradgessler
I wish Fly would polish the developer experience on top of SQLite. They're close, but it's missing:
1. A built-in UI and CLI that manages SQLite from a volume. Getting the initial database on a Fly Machine requires more work than it should.
2. `fly console` doesn't work with SQLite because it spins up a separate machine, which isn't connected to the same volume where the SQLite data resides. Instead you have to know to run `fly ssh console --pty`, which effectively SSHes into the machine with the database.
The problem in general with SQLite web apps is they tend to be small apps, so you need a lot of them to make a decent amount of money hosting them.
adenta
Brad, what’s your take on Rails 8 w/ SQLite? Are you gravitating towards it these days over Postgres?
bradgessler
Yep! I just migrated a Fly PG cluster database to SQLite because I over-provisioned DB resources and got tired of dealing with the occasional node crashing.
TBH I wish they had their managed PG cluster running because it would have made it easier to downsize, but I’m happy with SQLite.
I used SQLite for another project that I knew was going to max out at 100 concurrent users and it worked great. The best moment was when a user reported a production error I couldn’t recreate locally, so I downloaded the database and recreated it with the latest production data on my laptop. You couldn’t do that with a high-compliance app, but that’s not most apps.
I’m hesitant to outright say “SQLite and Rails is great” because you have to know your app will run on one node. If you know that then it’s fantastic.
jasonthorsness
What a coincidence, I was just researching Litestream today! I use SQLite on my VPS and was thinking about adding this.
Am I understanding correctly that I'll be able to restore a database to any point in time that falls while the Litestream process is running? Because auto-checkpointing could consume the WAL while it isn't running?
So for an extreme example if the process crashed for an hour between 2:00 and 3:00, I could restore to 1:55 or 3:05 but the information required to restore between 2:00 and 3:00 is lost?
benbjohnson
Litestream saves WAL segments at a given time granularity. By default, it ships off WAL changes every second, so you should be able to restore to any given second in your history (within your retention period).
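(For illustration, not from the comment: a point-in-time restore with the current CLI looks roughly like this, assuming an S3 replica URL; the flags are per the v0.3.x docs and may change in the new release.)

```sh
# Restore the database as it existed at a specific second (UTC),
# writing the result to a new file instead of the live database.
litestream restore \
  -o /tmp/restored.db \
  -timestamp 2025-05-20T13:55:00Z \
  s3://my-bucket/my-app/db
```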
dolmen
Do you have DST handling issues?
I'm asking because switching from winter time to summer time in Europe happened on March 30th with local time jumping from 2:00 to 3:00.
psanford
This looks great! A few years ago I wrote a SQLite VFS that uses DynamoDB as a backing store[0], called DonutDB. With the recent addition of CAS to S3, I was thinking about making a new version of DonutDB backed by S3. I'm really glad Litestream supports this so I don't have to!
I can't wait to try this out.
hamandcheese
> With the recent addition of CAS to S3
Do you have a reference for this? I assume by CAS you mean content addressable storage? I googled but can't find any AWS docs on this.
xyzzy_plugh
Compare And Swap
gcr
The TL;DR is that Amazon S3 now supports "conditional writes", which are guaranteed to fail if the object was modified by another writer since you last read it. This is implemented by sending the ETag of the object's expected version alongside the write request.
Litestream now depends on this functionality to handle multiple writers. Think of it as optimistic locking.
https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...
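(A minimal sketch of that optimistic-locking pattern, assuming an aws-sdk-go-v2 version recent enough to expose the conditional-write IfMatch parameter on PutObject; bucket and key names are made up, and error handling is trimmed.)

```go
package main

import (
	"bytes"
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, _ := config.LoadDefaultConfig(ctx)
	client := s3.NewFromConfig(cfg)

	// Read the current object and remember its ETag (the expected version).
	get, _ := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String("my-bucket"), // hypothetical bucket/key
		Key:    aws.String("lease"),
	})

	// Conditional write: S3 rejects the PUT (412 Precondition Failed) if
	// another writer replaced the object since we read it.
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket:  aws.String("my-bucket"),
		Key:     aws.String("lease"),
		Body:    bytes.NewReader([]byte("new contents")),
		IfMatch: get.ETag, // compare-and-swap on the ETag
	})
	if err != nil {
		// Lost the race: re-read, re-apply the change, and retry.
	}
}
```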
ignoramous
> We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state.
Prescient. Agents would of course work well if they can go back in time to checkpoints and branch from there, exploring solutions in parallel as needed.
Anyone who has experience building workflows (Amazon SWF, Temporal, and the like) knows how difficult it is to maintain determinism in the face of retries & re-drives in multi-tier setups (especially those involving databases).
Replit recently announced their Agent's integration with Neon's time travel feature [0] for exactly the purpose outlined in TFA. Unlike Fly.io, though, Replit is built on GCP and other 3p providers like Neon, and it is unclear whether GCP & Databricks won't go all Oracle on them.
hobo_mark
If you wanted to use litestream to replicate many databases (ideally, one or more per user), which is one of the use cases described here (and elsewhere), how do you tell litestream to add new databases dynamically? The configuration file is static and I haven't found an API to tell it to track a new db at runtime.
mrkurt
I would expect this problem to get solved. It's tricky to detect new sqlites, but not impossible.
In the meantime, it's pretty straightforward to use as a library.
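(For what it's worth, a rough sketch of the library route based on the published v0.3.x library example; type and field names may change in the revamped release, and the bucket/path values are placeholders.)

```go
package main

import (
	"log"

	"github.com/benbjohnson/litestream"
	lss3 "github.com/benbjohnson/litestream/s3"
)

// replicateDB starts replication for a single database file. Calling this
// whenever a new per-user database is created avoids the static config file.
func replicateDB(path, bucket, prefix string) (*litestream.DB, error) {
	db := litestream.NewDB(path)

	// Build an S3 replica client and attach it to the database.
	client := lss3.NewReplicaClient()
	client.Bucket = bucket
	client.Path = prefix

	replica := litestream.NewReplica(db, "s3")
	replica.Client = client
	db.Replicas = append(db.Replicas, replica)

	// Open starts monitoring the database and shipping WAL segments.
	if err := db.Open(); err != nil {
		return nil, err
	}
	return db, nil
}

func main() {
	if _, err := replicateDB("/data/user-123.db", "my-bucket", "user-123"); err != nil {
		log.Fatal(err)
	}
	select {} // keep replicating
}
```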
danielblignaut
Have these changes landed yet? I don't see any updates on the documentation website around some of these features. Most notably, I'm interested in read replication and the ability for a read replica to take over as leader (even if it requires a config update and a Litestream restart) in cases where we aren't using Consul (which is why LiteFS is not a workable solution in our environment).
benbjohnson
The changes are still in progress. The blog post was just about future work that we're working on. However, we don't have plans to do failover with Litestream at the moment.
danielblignaut
Awesome! BTW, it's really interesting work. The main reason I'm avoiding LiteFS is the need for Consul. Does LiteFS support Corrosion at all, or any other way to bypass the need for Consul? I'm digging a bit deeper into it now just in case.
srameshc
I have been following Ben for a long time but I never knew LiteFS was based on his work. I somehow eventually settled on rqlite for self-managed distributed SQLite.
digianarchist
Similar approach to LiteFS, but rqlite embeds Raft in the project instead of depending on Consul.
tptacek
I don't think they're similar at all. LiteFS uses Consul to elect a leader for a single-write-leader multiple-replica configuration, the same way you'd do with Postgres. rqlite (as I understood it last time I looked) runs Raft directly; it gets quorums for every write.
One isn't better than the other. But LiteFS isn't a "distributed SQLite" in the sense you'd think of with rqlite. It's a system for getting read-only replicas, the same way you've been able to do with log shipping on n-tier databases for decades.
apitman
rqlite also requires you to use special client libraries, whereas litefs is transparent to the program.
rads
What will be required from users of the existing Litestream version to upgrade to the new one? Is it a matter of bumping the version when it comes out or is there more to it?
wg0
> Now that we’ve switched to LTX, this isn’t a problem any more. It should thus be possible to replicate /data/*.db, even if there’s hundreds or thousands of databases in that directory.
That was the showstopper. Now a multi-tenant setup with a per-tenant database, where (in theory) each user can roll back to a certain point in time, or at least download their entire database and take it away to do whatever they want with it, is going to be possible.
Looks like the code is here: https://github.com/benbjohnson/litestream/tree/v0.5
Really nice to see this, I wrote this comment almost 2 years ago when I was a little miffed about trying to use litestream and litefs: https://news.ycombinator.com/item?id=37614193
I think this solves most of the issues? You can now freely run litestream on your DB and not worry about issues with multiple writers? I wonder how the handoff is handled.
The read replica FUSE layer sounds like a real nice thing to have.
edit: Ah, it works like this: https://github.com/benbjohnson/litestream/pull/617
> When another Litestream process starts up and sees an existing lease, it will continually retry the lease acquisition every second until it succeeds. This low retry interval allows for rolling restarts to come online quickly.
Sounds workable!
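(Purely as an illustration of the handoff loop described in that PR, with hypothetical helper names, not Litestream's actual code.)

```go
package handoff

import (
	"context"
	"time"
)

// tryAcquireLease stands in for the real lease acquisition (a conditional
// write against the lease object in object storage); illustrative only.
func tryAcquireLease(ctx context.Context) (bool, error) { return false, nil }

// waitForLease retries once per second until the previous owner's lease is
// released or expires, which is what lets rolling restarts come up quickly.
func waitForLease(ctx context.Context) error {
	for {
		ok, err := tryAcquireLease(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil // we are the sole writer now; start replicating
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```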