Every System is a Log: Avoiding coordination in distributed applications
January 24, 2025 · shikhar
logsr
> log as a service
very exciting. this is the future. i am working on a very similar concept. every database is a log at its core, yet the log, which is the highest performance part of the system, is buried behind many layers of much lower performing cruft. edge persistence with log-per-user application patterns opens up so many possibilities.
sewen
Some clarification on what "one log" means here:
- It means using one log across different concerns like state a, communication with b, lock c. Often that is in the scope of a single entity (payment, user, session, etc.) and thus the scope for the one log is still small. You would have a lot of independent logs still, for separate payments.
- It does _not_ mean that one should share the same log (and partition) for all the entities in your app, like necessarily funneling all users, payments, etc. through the same log. That actually goes beyond the proposal here - it has some benefits of its own, but would have a hard time scaling.
magicalhippo
Interesting read, not my area but I think I got the gist of it.
In your Restate example of the "processPayment" function, how do you handle errors of the "accountService" call? Like, what if it times out or returns a server error?
Do you store the error result and the caller of "processPayment" has to re-trigger the payment, in order to generate a new log?
stsffap
By default, failing ctx.run() calls (like the accountService call) will be retried indefinitely until they succeed, unless you have configured a retry policy for them. If a retry policy is configured and you have exhausted the number of retry attempts, Restate will mark this call as terminally failed, record it as such in its log, and return the failure to the caller.
sewen
You can catch these errors and handle them in a common try/catch manner, and because the results of `ctx.run` are recorded in the log, this is deterministic and reliable.
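A rough sketch of that pattern, using a minimal stand-in for the durable-execution context rather than the real Restate SDK (the `Ctx` type, the stub, and the handler body are illustrative, not the SDK's actual API surface):

```typescript
// Stand-in context: a real durable-execution runtime would persist the
// result (or terminal failure) of each run() to the log before
// returning it, so replays observe the recorded value deterministically.
type Ctx = { run: <T>(fn: () => Promise<T>) => Promise<T> };

const ctx: Ctx = {
  run: <T>(fn: () => Promise<T>) => fn(),
};

async function processPayment(ctx: Ctx): Promise<string> {
  try {
    // If retries are exhausted, the terminal failure surfaces here.
    return await ctx.run(async () => {
      throw new Error("account service unavailable");
    });
  } catch (err) {
    // Because the failure itself is journaled, this catch branch
    // replays the same way on every re-execution.
    return `payment failed: ${(err as Error).message}`;
  }
}
```

The point is that the try/catch reads like ordinary local error handling, while the log makes it survive process restarts.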
daxfohl
Haven't formed thoughts on the content yet, but happy to see a company launching something non-AI for a change.
gjtorikian
My startup, Yetto (http://www.yetto.app) is building a better way for support professionals to do their job. (Shameless plug but we always gotta hustle.)
We, too, are weighed down by how much space AI-focused companies are taking.
hansonkd
TBH, looking at helpdesk software in 2025, I would expect new ones to be built AI-first. It would be hard for me to consider one without at least some sort of LLM helping with triage or classification of tickets, etc.
gk1
See https://www.fixify.com/, although they are too modest about mentioning the AI they're using under the hood.
Thaxll
This is exactly like this example from Temporal: https://github.com/temporal-sa/temporal-order-fulfill-demo
TuringTest
Excuse me for sounding rough, but - isn't this reinventing comp-sci, one step at a time?
I learned about distributed, incrementally-monotonic logs back in the late 90s, along with many other ways to do guaranteed transactional database actions. And I'm quite certain these must have been invented in the 50s or 60s, as these are the problems that early business computer users had: banking software. These are the techniques that were buried in legacy COBOL routines, and needed to be slowly replaced by robust Java core services.
I'm sure the Restate designers will have learned terribly useful insights in how to translate these basic principles into a working system with the complexities of today's hardware/software ecosystem.
Yet it makes me wonder: are young programmers only being taught the "move fast, break things" mentality, with no SW engineers left who can build these guarantees into their systems from the beginning, standing on the shoulders of the ancients who invented our discipline so that their lore is actually used in practice? Or am I just missing something new in the article that describes some novel twist?
sewen
This is certainly building on principles and ideas from a long history of computer science research.
And yes, there are moments where you go "oh, we implicitly gave up xyz (i.e., causal order across steps) when we started adopting architecture pqr (microservices). But here is a thought on how to bring that back without breaking the benefits of pqr".
If you want, you can think of this as one of these cases. I would argue that there is tremendous practical value in that (I found that to be the case throughout my career).
And technology advances in zig zag lines. You add capability x but lose y on the way and later someone finds a way to have x and y together. That's progress.
mrkeen
I think your points are pretty spot on - most things have already been invented, and there's too much of a move-fast-and-break-things mentality.
Here's a follow-up thought: to what extent did the grey-beards let us juniors down by steering us down a different path? A few instances:
DB creators knew about replicated logs, but we got given DBs, not replicated log products.
The Java creators knew about immutability: "I would use an immutable whenever I can." [James Gosling, 1] but it was years later when someone else provided us with pcollections/javaslang/vavr. And they're still far from widespread, and nowhere near the standard library.
Brendan Eich supposedly wanted to put Scheme into browsers, but his superiors had him make JS instead.
What other stuff have we been missing out on?
[1] https://www.artima.com/articles/james-gosling-on-java-may-20...
trollbridge
I’ve been doing a similar thing, although I called it “append only transaction ledgers”. Same idea as a log. A few principles:
- The order of log entries does not matter.
- Users of the log are peers. No client / server distinction.
- When appending a log entry, you can send a copy of the append to all your peers.
- You can ask your peers to refresh the latest log entries.
- When creating a new entry, it is a very good idea to have a nonce field. (I use nano IDs for this purpose along with a timestamp, which is probabilistically unique.)
- If you want to do database style queries of the data, load all the log entries into an in memory database and query away.
- You can append a log entry containing a summary of all log entries you have so far. For example: you’ve been given 10 new customer entries. You can create a log entry of “We have 10 customers as of this date.”
- When creating new entries, prepare the entry or list of entries in memory, allow the user to edit/revise them as a draft, then when they click “Save”, they are in the permanent record.
- To fix a mistake in an entry, create a new entry that “negates” that entry.
A lot of parallelism / concurrency problems just go away with this design.
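A few of those principles can be sketched in code, assuming a simple in-memory peer (the entry shape, nonce-based dedup, and `Ledger` class are my illustration, not trollbridge's actual system):

```typescript
// Each entry carries a nonce, so appends are idempotent and peers can
// exchange entries in any order without creating duplicates.
type Entry = { nonce: string; ts: number; kind: string; data: unknown };

class Ledger {
  private byNonce = new Map<string, Entry>();

  // Idempotent append: the nonce dedupes entries that arrive more than
  // once (e.g. broadcast by several peers).
  append(e: Entry): void {
    if (!this.byNonce.has(e.nonce)) this.byNonce.set(e.nonce, e);
  }

  // Syncing with a peer is just replaying their entries into ours;
  // since appends commute, merge order doesn't matter.
  mergeFrom(peer: Ledger): void {
    for (const e of peer.byNonce.values()) this.append(e);
  }

  count(kind: string): number {
    let n = 0;
    for (const e of this.byNonce.values()) if (e.kind === kind) n++;
    return n;
  }
}

const a = new Ledger();
const b = new Ledger();
a.append({ nonce: "n1", ts: 1, kind: "customer", data: { name: "Ada" } });
b.append({ nonce: "n2", ts: 2, kind: "customer", data: { name: "Bob" } });
b.append({ nonce: "n1", ts: 1, kind: "customer", data: { name: "Ada" } }); // duplicate, ignored
a.mergeFrom(b);
b.mergeFrom(a);
// Both peers converge on the same two customers regardless of order.
```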
Kinrany
> The order of log entries does not matter.
This is surprising, Kafka-like logs are all strictly ordered.
cduzz
* within a partition
XorNot
How do you know summary entries are valid if order doesn't matter?
I.e. "we have 10 customers as of this date" can become immediately invalid if a new entry is appended afterwards with a date before that summary entry (i.e. because it was on a peer which hadn't yet sent it)
clayg
IME you have to be willing to recalculate the summaries up to some kind of consistency window.
Yes, you may be changing history, and you may have a business reason not to address that revision immediately (you've already billed them?) - but the system can still learn it made a mistake and fix it (e.g. activity from the evening of Jan 30 that comes in late lands on the Feb bill).
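One way to read this: a summary is a cache over the raw entries, recomputed whenever late entries arrive inside the consistency window. A minimal sketch (the event shape and `summarize` function are illustrative):

```typescript
type Ev = { ts: number; amount: number };

// Recompute the summary for a period from the raw entries each time,
// so a late-arriving entry simply changes the next recomputation.
function summarize(entries: Ev[], periodEnd: number): number {
  return entries
    .filter((e) => e.ts <= periodEnd)
    .reduce((sum, e) => sum + e.amount, 0);
}

const log: Ev[] = [{ ts: 10, amount: 5 }, { ts: 20, amount: 7 }];
const firstBill = summarize(log, 30); // 12
// A late entry with ts=25 arrives after billing...
log.push({ ts: 25, amount: 3 });
// ...so the correction shows up in the next recomputation.
const corrected = summarize(log, 30); // 15
```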
log4shell
Calling a WAL a ledger, why? Ledger sounds fancier but why would it be a ledger in this case?
hcarvalhoalves
I believe "ledger" implies commutative property (order does not matter).
log4shell
I am not aware of any such implicit connection of ledger and commutative property, also couldn't find anything as my google-fu is letting me down. Anything I can refer to? Generally curious to know use of term ledger outside of accounting and blockchains.
I have seen it used to mean WAL before, so I am taking this with a dose of skepticism.
glitchc
How do you manage log size for high-transaction systems?
grahamj
The lack of ordering is surprising. Without that you can’t stream without a buffer.
zellyn
sewen (et al)
This is lovely and I'm itching to try it. One question:
We have a use case where a location gets cut off completely from the internet at large. In that case, it makes sense for the local hardware (typically Android and/or iOS tablets or equivalent) to take over as a log owner: even though you're cut off, if you're willing to swallow the risk (and hence cost) of offline payments, you should be able to create orders, fulfill them, pay for them, close them out, send tickets to the kitchen to cook the food or to the warehouse to fetch the tractor, etc.
Does restate include something that covers that use-case? In the noodling/daydreaming a colleague and I have done, we ended up with something very close to restate (I imagined just using Kafka), except that additionally many operations would have a CRDT nature: eg. you should _always_ be allowed to add a payment to an order, because presumably a real-life payment happened.
I've also noodled with the idea of logs whose canonical ownership can be transferred. That covers cases where you start offline and then reconnect, but doesn't work so well for transactions that start out connected (and thus owned in the datacenter) and need to continue offline.
One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc., so if you go offline, you can still proceed. It might even be possible to recognize disconnection and then take one full vote to further subdivide that pool of consensus members so if one dies it doesn't halt progress. This feels like it would be getting very tricksy…
jaseemabid
A notable example of a large-scale app built with a very similar architecture is ATproto/Bluesky[1].
"ATProto for Distributed Systems Engineers" describes how updates from the users end up in their own small databases (called PDS) and then a replicated log. What we traditionally think of as an API server (called a view server in ATProto) is simply one among the many materializations of this log.
I personally find this model of thinking about dataflow in large-scale apps pretty neat and easy to understand. The parallels are unsurprising since both the Restate blog and ATProto docs link to the same blog post by Martin Kleppmann.
This architecture seems to be working really well for Bluesky, as they have clearly sailed through multiple 10x growth events very recently.
[1]: https://atproto.com/articles/atproto-for-distsys-engineers
sewen
That blog post is a great read as well. Truly, the log abstraction [1] and "Turning the DB inside out" [2] have been hugely influential.
In a way, this article suggests extending that:
(1) from a log that represents data (upserts, CDC, etc.) to a log of coordination commands (update this, acquire that lock, journal that step)
(2) have a way to link the events related to a broader operation (handler execution) together
(3) make the log aware of handler execution (better yet, put it in charge), so you can automatically fence outdated executions
[1] https://engineering.linkedin.com/distributed-systems/log-wha...
zellyn
Martin Kleppmann was also directly involved with Bluesky as a consultant.
grahamj
Table/log duality goes back further than Kleppmann though. An earlier article that really influenced me was
https://engineering.linkedin.com/distributed-systems/log-wha...
sewen
A short summary:
Complex distributed coordination and orchestration is at the root of what makes many apps brittle and prone to inconsistencies.
But we can mitigate much of that complexity with a neat trick, building on the fact that every system (database, queue, state machine) is effectively a log under the hood. By implementing interaction with those systems as (conditional) events on a shared log, we can build amazingly robust apps.
If you have come across “Turning the Database Inside Out” (https://martin.kleppmann.com/2015/11/05/database-inside-out-...), you can think of this a bit like “Turning the Microservice Inside Out”
The post also looks at how this can be used in practice, given that our DBs and queues aren't built like this, and how to strike a sweet-spot balance between this model with its great consistency, and maintaining healthy decoupling and separation of concerns.
jamamp
I wonder how this compares, conceptually, to Temporal? While Temporal doesn't talk about a single centralized log, I feel the output is the same: your event handlers become durable and can be retried without re-executing certain actions with outside systems. Both Restate and Temporal feel, as a developer coding these event handlers, like a framework where they handle a lot of the "has this action been performed yet?" and such for you.
Though to be fair I've only read Temporal docs, and this Restate blog post, without much experience in either. Temporal may not have as much on the distributed locking (or concept of) side of things that Restate does, in this post.
sewen
Temporal is related, but I would say it is a subset of this.
If you only consider appending results of steps of a handler, then you have something like Temporal.
This uses the log also for RPC between services, and for state that outlives an individual handler execution (state that outlives a workflow, in Temporal's terms).
davexunit
My takeaway from this article is that the proposed solution for distributed app coordination is a shared, centralized log. What did I miss?
azmy
IMHO the article is not mainly about the implementation of the log, but rather about leveraging the idea of the log to build reliable and fault-tolerant applications. The implementation of the log itself can be either centralised or decentralised.
sewen
That gist is correct - I would add that the log needs a few specific properties and must conceptually serve as the shared log for state, communication, and execution scheduling.
The next step is then: how do you make this usable in practice...
hiAndrewQuinn
I am a huge fan of append-only logs as a fundamental architectural principle. The Log [1] should be required reading for any CS undergraduate.
[1]: https://engineering.linkedin.com/distributed-systems/log-wha...
kasey_junk
I love them so much that I’ve noodled with building a programming language optimized for using them.
Things like types that encode what events are legal in a log, first-class support for data versions, fast file reads and writes, etc.
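You can get some distance toward that first idea in plain TypeScript: a discriminated union makes illegal log events unrepresentable, and the compiler enforces exhaustive handling (the event names and `v` version field below are illustrative):

```typescript
// The union is the schema of legal log events; the `v` field sketches
// per-event data versioning.
type OrderEvent =
  | { kind: "created"; orderId: string; v: 1 }
  | { kind: "paid"; orderId: string; amount: number; v: 1 }
  | { kind: "shipped"; orderId: string; trackingId: string; v: 1 };

function apply(state: Set<string>, e: OrderEvent): Set<string> {
  switch (e.kind) {
    case "created":
      return new Set(state).add(e.orderId);
    case "paid":
    case "shipped":
      return state;
    default: {
      // Exhaustiveness check: adding a new event kind without handling
      // it here becomes a compile-time error.
      const _never: never = e;
      return _never;
    }
  }
}

const orders = apply(new Set<string>(), { kind: "created", orderId: "o1", v: 1 });
```

A dedicated language could go further (e.g. checking that event sequences form a legal state machine), which plain types don't capture.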
daxfohl
How do you do GDPR takedowns?
jaseemabid
An immutable, append-only persistent log doesn't imply storing everything _forever_.
If you want to remove something you could add a tombstone record (like Cassandra) and eventually remove the original entry during routine maintenance operations like repacking into a more efficient format, archival into cold storage, TTL handling etc.
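A minimal sketch of that tombstone-then-compact pattern (the record shape and `compact` function are illustrative, not Cassandra's actual mechanism):

```typescript
// Deletion is itself an appended record; the original entry is only
// physically dropped later, during compaction.
type Rec =
  | { key: string; value: string; tombstone?: false }
  | { key: string; tombstone: true };

const log: Rec[] = [];
log.push({ key: "user:42", value: "alice@example.com" });
log.push({ key: "user:42", tombstone: true }); // GDPR delete request

// Compaction: keep only the latest record per key, then drop the
// tombstones whose shadowed data has now been removed.
function compact(entries: Rec[]): Rec[] {
  const latest = new Map<string, Rec>();
  for (const r of entries) latest.set(r.key, r);
  return [...latest.values()].filter((r) => !r.tombstone);
}

const compacted = compact(log); // user:42 is gone entirely
```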
UltraSane
The best method is to encrypt data with a per-user key, so that if you delete the encryption key, that customer's data cannot be read. You can rotate to a new customer key at a fixed time interval to make deletion more granular.
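A sketch of that "crypto-shredding" idea using Node's built-in crypto module (the in-memory key map stands in for a real KMS, and the function names are my own):

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Per-user keys; in practice these would live in a KMS, not a Map.
const userKeys = new Map<string, Buffer>();

function encryptForUser(userId: string, plaintext: string): Buffer {
  if (!userKeys.has(userId)) userKeys.set(userId, randomBytes(32));
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", userKeys.get(userId)!, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Layout: 12-byte IV | 16-byte auth tag | ciphertext
  return Buffer.concat([iv, cipher.getAuthTag(), ct]);
}

function decryptForUser(userId: string, blob: Buffer): string {
  const key = userKeys.get(userId);
  if (!key) throw new Error("key shredded: data unrecoverable");
  const decipher = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([
    decipher.update(blob.subarray(28)),
    decipher.final(),
  ]).toString("utf8");
}

const blob = encryptForUser("u1", "pii: alice@example.com");
// "Takedown": shred the key. The log entry stays immutable, but is
// now ciphertext that can never be decrypted again.
userKeys.delete("u1");
```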
This post makes a great case for how universal logs are in data systems. It was strange to me that there was no log-as-a-service with the qualities that make it suitable for building higher-level systems like durable execution: conditional appends (as called out by the post!), support for very large numbers of logs, high throughput with strict ordering, and generally a simple serverless experience like object storage. This led to https://s2.dev/ which is now available in preview.
It was interesting to learn how Restate links events for a key, with key-level logical logs multiplexed over partitioned physical logs. I imagine this is implemented with a leader per physical log, so you can consistently maintain an index. A log service supporting conditional appends allows such a leader to act like the log is local to it, despite offering replicated durability.
Leadership can be an important optimization for most systems, but shared logs also allow for multi-writer systems pretty easily. We blogged about this pattern https://s2.dev/blog/kv-store
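The conditional-append primitive mentioned above can be sketched as compare-and-append: the write succeeds only if the caller's expected sequence number still matches the log's tail, which is what lets concurrent writers (or a leader and a stale ex-leader) fence each other out. The class and method names below are illustrative, not S2's or Restate's API:

```typescript
class SharedLog<T> {
  private entries: T[] = [];

  // Returns the new tail sequence on success, or null if another
  // writer appended first and the expected sequence no longer matches.
  conditionalAppend(entry: T, expectedSeq: number): number | null {
    if (this.entries.length !== expectedSeq) return null;
    this.entries.push(entry);
    return this.entries.length;
  }

  tail(): number {
    return this.entries.length;
  }
}

const log = new SharedLog<string>();
const seq = log.tail(); // both writers observe tail = 0
const w1 = log.conditionalAppend("writer-1", seq); // wins: returns 1
const w2 = log.conditionalAppend("writer-2", seq); // fenced: returns null
```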