
Building a Durable Execution Engine with SQLite

fiddlerwoaroof

Every several years people reinvent serializable continuations

gunnarmorling

Yupp, making that same point in the post :)

> You could think of [Durable Execution] as a persistent implementation of the memoization pattern, or a persistent form of continuations.

andersmurphy

Haha so true. Shame image based programming never really caught on.

Janet lets you serialize coroutines, which is fun. It makes this sort of stuff trivial.

smitty1e

Is this reinvention somehow "transactional" in nature?

websiteapi

There's a lot of hype around durable execution these days. Why do that instead of regular use of queues? Is it the dev ergonomics that's cool here?

You can (and people already do) model the steps of an arbitrarily large workflow and have their results processed in a modular fashion. Whatever process begins the workflow checks the state of the necessary preconditions before taking any action, and so goes to the currently needed step, retries the ones that failed, and so forth.

kodablah

> is it the dev ergonomics that's cool here?

Yup. Being able to write imperative code that automatically resumes where it left off is very valuable. It's best to represent durable Turing completeness using modern approaches to authoring such logic: programming languages. Being able to loop, try/catch, apply advanced conditional logic, etc. in a crash-proof algorithm that can run for weeks/months/years and is introspectable has a lot of value over just using queues.
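To make the "resumes where it left off" point concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not the API of the engine in the post or of any real library: `DurableContext`, `step`, the `step_log` table, and the `send` function are all hypothetical names invented for illustration, and the in-memory database stands in for a file that would survive a real process restart.

```python
import json
import sqlite3


class DurableContext:
    """Hypothetical helper: records each step's result so a rerun can skip it."""

    def __init__(self, conn, workflow_id):
        self.conn = conn
        self.workflow_id = workflow_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS step_log ("
            " workflow_id TEXT, step TEXT, result TEXT,"
            " PRIMARY KEY (workflow_id, step))"
        )

    def step(self, name, fn):
        row = self.conn.execute(
            "SELECT result FROM step_log WHERE workflow_id = ? AND step = ?",
            (self.workflow_id, name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # replay: reuse the recorded result
        result = fn()  # first execution: actually run the step
        with self.conn:  # commit the result before moving on
            self.conn.execute(
                "INSERT INTO step_log VALUES (?, ?, ?)",
                (self.workflow_id, name, json.dumps(result)),
            )
        return result


calls = []  # which step bodies really ran, kept only for the demonstration


def send(i, fail_at=None):
    if i == fail_at:
        raise RuntimeError("simulated crash")
    calls.append(i)
    return f"sent-{i}"


def workflow(ctx, fail_at=None):
    # Ordinary imperative code: a plain loop, no explicit state machine.
    results = []
    for i in range(3):
        results.append(ctx.step(f"send-{i}", lambda: send(i, fail_at)))
    return results


conn = sqlite3.connect(":memory:")  # assumption: a file path in real use
try:
    workflow(DurableContext(conn, "wf-1"), fail_at=2)  # dies in the third step
except RuntimeError:
    pass
resumed = workflow(DurableContext(conn, "wf-1"))  # rerun skips steps 0 and 1
```

On the second run the first two iterations are answered from the log and only the third step body executes. The sketch assumes step results are JSON-serializable and that step names are deterministic across runs; a step that crashes after its side effect but before the commit would run again, so step bodies still need to be idempotent.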

ryeats

As you say, it can be done, but it's an anti-pattern to use a message queue as a database, which is essentially what you are doing for these kinds of long-running tasks. The reason is that there is a lot of state you're likely going to want to report as status while a task runs, and to persist and checkpoint. Yes, you can carefully string together a series of database calls chained with message transactions so you don't lose something when an issue happens, but then you also need bespoke logic to restart or retry each step, and it can turn into a bit of a mess.

snicker7

Message queues (e.g. SQS) are inappropriate for tracking long-running tasks/workflows. This is due to the operational requirements such as:

- Checking the status of a task (queued, pending, failed, cancelled, completed)
- Cancelling a queued task (or pending task if the execution environment supports it)
- Re-prioritizing queued tasks
- Searching for tasks based off an attribute (e.g. tag)

You really do need a database for this.
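As a rough illustration of why these operations fall out naturally from a table, here is a small sketch using Python's standard `sqlite3` module. The `tasks` schema, the column names, and the helper functions are assumptions made up for this example, not taken from any system mentioned in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " tag TEXT NOT NULL,"
    " priority INTEGER NOT NULL DEFAULT 0,"
    " status TEXT NOT NULL DEFAULT 'queued')"
)
conn.executemany(
    "INSERT INTO tasks (id, tag, priority) VALUES (?, ?, ?)",
    [(1, "billing", 0), (2, "email", 0), (3, "billing", 0)],
)


def status(task_id):
    # Checking the status of a task is a primary-key lookup.
    row = conn.execute(
        "SELECT status FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    return row[0]


def cancel(task_id):
    # Only a task that is still queued can be cancelled here.
    cur = conn.execute(
        "UPDATE tasks SET status = 'cancelled'"
        " WHERE id = ? AND status = 'queued'",
        (task_id,),
    )
    return cur.rowcount == 1


def reprioritize(task_id, priority):
    conn.execute(
        "UPDATE tasks SET priority = ? WHERE id = ? AND status = 'queued'",
        (priority, task_id),
    )


def find_by_tag(tag):
    rows = conn.execute(
        "SELECT id FROM tasks WHERE tag = ? ORDER BY id", (tag,)
    )
    return [r[0] for r in rows]


def next_task():
    # A worker picks the highest-priority queued task, oldest first.
    row = conn.execute(
        "SELECT id FROM tasks WHERE status = 'queued'"
        " ORDER BY priority DESC, id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


cancel(1)
reprioritize(3, 10)
```

Each requirement in the list maps to a single statement against the table, whereas a plain queue typically offers only enqueue and dequeue, so the same operations would need extra bookkeeping on the side.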

tptacek

We build what is effectively a durable execution "engine" for our orchestrator (ours is backed by boltdb and not SQLite, which I objected to, correctly). The steps in our workflows build running virtual machines and include things like allocating addresses, loading BPF programs, preparing root filesystems, and registering services.

Short answer: we need to be able to redeploy and bounce the orchestrator without worrying about what stage each running VM on our platform is in.

JP, the dev that built this out for us, talks a bit about the design rationale (search for "Cadence") here:

https://fly.io/blog/the-exit-interview-jp/

The library itself is open:

https://github.com/superfly/fsm