
Show HN: Obelisk – a WASM-based deterministic workflow engine

28 comments

·April 9, 2025

A lightweight engine for durable execution / deterministic workflows I built with Rust, wasmtime and the WASM Component Model. Its main use is running reliable, long-running workflows that can automatically resume after failures. Looking for feedback on the approach and potential use cases!

SvenL

One issue I have had many times with workflow engines is updates. I have a workflow and it already has running instances. Two scenarios:

Can I update the workflow while it has running instances without interfering with the running instances?

Can I update a running instance with a new version of the workflow to patch some flaw? If no, can I replay an updated version of a workflow with the log of an old workflow version?

tomasol

Great questions. If you are fixing a bug in a workflow, which has running executions, there are two scenarios:

Either the fix does not break determinism, meaning the execution has not hit the fix yet. In this case the execution can be replayed and continue on the patched WASM component.

Otherwise, the execution replay causes a "Non determinism detected" error. In this case you need to handle the situation manually. Since the execution log is in a sqlite file, you can select all executions affected by the bug and perform a custom cleanup. You can also create a "forked" execution just by copying the execution log + child responses into a new execution; however, there is no API for it yet.
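The two scenarios above can be illustrated with a minimal sketch. It assumes a simplified event log; the `Event`, `ReplayError` and `replay` names are hypothetical and are not taken from Obelisk's code. Replay compares the events a patched workflow emits against the stored log: a matching prefix lets the execution continue, while a mismatch is reported as non-determinism.

```rust
// Hypothetical event type; Obelisk's real log schema is not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ChildSubmitted { function: String },
    ChildResponse { value: i64 },
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    NonDeterminismDetected { position: usize },
}

/// Compares the events emitted by a (possibly patched) workflow with the stored log.
/// Returns the number of emitted events that extend the log, or the position of the
/// first mismatch. An emitted list shorter than the log counts as zero new events.
fn replay(log: &[Event], emitted: &[Event]) -> Result<usize, ReplayError> {
    for (position, (stored, new)) in log.iter().zip(emitted).enumerate() {
        if stored != new {
            return Err(ReplayError::NonDeterminismDetected { position });
        }
    }
    Ok(emitted.len().saturating_sub(log.len()))
}

fn main() {
    let log = vec![
        Event::ChildSubmitted { function: "fetch".to_string() },
        Event::ChildResponse { value: 1 },
    ];

    // The patch only touches code after the logged events: replay succeeds.
    let mut compatible = log.clone();
    compatible.push(Event::ChildSubmitted { function: "store".to_string() });
    println!("{:?}", replay(&log, &compatible));

    // The patch changes an already-logged step: replay fails.
    let incompatible = vec![Event::ChildSubmitted { function: "fetch_v2".to_string() }];
    println!("{:?}", replay(&log, &incompatible));
}
```

A real engine replays incrementally and feeds stored child responses back into the workflow; the sketch only shows the comparison step.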

> Can I update the workflow while it has running instances without interfering the running instances?

If you mean keeping the in-progress executions on the old version of the code, you can do that by introducing a new version in the WIT file and/or changing the function name.

halamadrid

We are using a workflow engine called Unmeshed, which has what you are asking about. Workflow definitions can be updated without interfering with running instances, and if you choose to, you can patch updates onto running workflows. You can also rerun workflows with the same input from an older execution.

emgeee

This is a pretty cool idea but I'm trying to think of the advantage of WASM vs other execution engines.

It seems to me one of the main use-cases for WASM is to execute lambdas, which are often short-lived (like 500ms timeout limits). Maybe this could have a place in embedded systems?

tomasol

The biggest motivator for me is that the WASM sandbox provides true deterministic execution. Contrary to engines like Temporal, using hashmaps is 100% deterministic here. Attempting to spawn a thread is a compile error. It also performs well - the bottleneck is in the write throughput of sqlite. Last but not least - all the interfaces between workflows and activities are type safe, described in a WIT schema.
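As a rough illustration of the hashmap point: Rust's default `HashMap` seeds its hasher randomly per instance, so iteration order can vary between runs. The sketch below shows only the general principle that a fixed hasher seed makes iteration order reproducible. That the sandbox achieves this by controlling the randomness the guest sees is an assumption here, and the `FixedSeedMap` alias and `iteration_order` helper are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A map whose hasher is always built with the same fixed keys (illustrative alias).
type FixedSeedMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Inserts the keys in the given order and returns the map's iteration order.
fn iteration_order(keys: &[&str]) -> Vec<String> {
    let mut map: FixedSeedMap<String, usize> = FixedSeedMap::default();
    for (i, key) in keys.iter().enumerate() {
        map.insert(key.to_string(), i);
    }
    map.keys().cloned().collect()
}

fn main() {
    let first = iteration_order(&["a", "b", "c", "d"]);
    let second = iteration_order(&["a", "b", "c", "d"]);
    // Same seed and same insertion sequence: the same iteration order both times.
    assert_eq!(first, second);
    println!("{:?}", first);
}
```

The order itself is arbitrary and may change between Rust releases; the point is only that it is stable for a given binary once the seed is fixed.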

AlotOfReading

WASM isn't quite deterministic. An easy example is NaN propagation, which can be nondeterministic in certain circumstances. Obelisk itself seems to allow nondeterminism via the sleep() function. Just create a race condition among a join set. I imagine that might even get easier once the TODO to implement sleep jitter is completed.

It's certainly close enough that calling it deterministic isn't misleading (though I'd stop short of "true determinism"), but there are still sharp edges here with things like hashmaps (e.g. by recompiling: https://dev.to/gnunicorn/hunting-down-a-non-determinism-bug-...).

tomasol

Thanks for bringing that up. Regarding the NaN canonicalization, there is a flag for it in wasmtime [1], I should probably make sure it is turned on.

Although I don't expect it to be an issue practically speaking, Obelisk checks that the replay is deterministic and fails the workflow when an unexpected event is triggered. It should also be possible to add an automatic replay of each finished execution to verify the determinism, e.g. while testing.

[1] https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#...

Edit: Enabling the flags here: https://github.com/obeli-sk/obelisk/pull/67
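To make the NaN point concrete, here is a minimal standalone sketch. It is not wasmtime's implementation: the `canonicalize` helper and the `CANONICAL_NAN_BITS` constant are illustrative names. Two values can both be NaN while carrying different bit patterns, so code that inspects the raw bits can observe a difference; canonicalization maps every NaN to a single pattern.

```rust
/// Quiet NaN with only the top mantissa bit set, used here as the canonical form.
const CANONICAL_NAN_BITS: u64 = 0x7ff8_0000_0000_0000;

/// Maps any NaN to the canonical bit pattern and leaves other values untouched.
fn canonicalize(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

fn main() {
    // Two quiet NaNs that differ in payload and sign.
    let a = f64::from_bits(0x7ff8_0000_0000_0001);
    let b = f64::from_bits(0xfff8_0000_0000_0000);
    assert!(a.is_nan() && b.is_nan());

    // Their raw bits differ, which is what a guest program could observe.
    assert_ne!(a.to_bits(), b.to_bits());

    // After canonicalization both have the same bits.
    assert_eq!(canonicalize(a).to_bits(), canonicalize(b).to_bits());
    println!("{:#x}", canonicalize(a).to_bits());
}
```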

tomasol

> Just create a race condition among a join set.

All responses and completed delays are stored in a table with an auto-incremented id, so the `-await-next` will always resolve to the same value.

As you mention, putting a persistent sleep and a child execution into the same join set is not yet implemented.
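The ordering argument can be sketched as follows, assuming a simplified in-memory stand-in for the response table. `JoinSetTable`, `record`, `await_next` and `rewind` are hypothetical names, not Obelisk's API, and only child responses are modeled. Because each response gets an auto-incremented id when it is stored, reading responses back in id order yields the same sequence on every replay, regardless of how the children raced originally.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Response {
    child: String,
    value: i64,
}

/// Hypothetical stand-in for the response table: rows get an auto-incremented id.
#[derive(Default)]
struct JoinSetTable {
    rows: Vec<(u64, Response)>,
    next_id: u64,
    cursor: usize,
}

impl JoinSetTable {
    /// Stores a response in arrival order and returns its id.
    fn record(&mut self, response: Response) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows.push((id, response));
        id
    }

    /// Returns the next unconsumed response in id order, if any.
    fn await_next(&mut self) -> Option<Response> {
        let (_, response) = self.rows.get(self.cursor)?;
        let response = response.clone();
        self.cursor += 1;
        Some(response)
    }

    /// Simulates a replay: reading starts again from the first stored row.
    fn rewind(&mut self) {
        self.cursor = 0;
    }
}

fn main() {
    let mut table = JoinSetTable::default();
    // Whichever child finished first was written first and keeps the lower id.
    table.record(Response { child: "b".to_string(), value: 2 });
    table.record(Response { child: "a".to_string(), value: 1 });

    let first_run = table.await_next();
    table.rewind();
    let replayed = table.await_next();
    assert_eq!(first_run, replayed);
    println!("{:?}", replayed);
}
```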

genuine_smiles

> An easy example is NaN propagation, which can be nondeterministic in certain circumstances.

Which circumstances?

jcmfernandes

Somewhat similar to Golem - https://github.com/golemcloud/golem - correct?

So, I like this idea, I really do. At the same time, in the short-term, WASM is relatively messy and, in my opinion, immature (as an ecosystem) for prime time. But with that out of the way (it will eventually come), you'll have to tell people that they can't use any code that relies on threads, so they better know if any of the libraries they use does it. How do you foresee navigating this? Runtime errors suck, especially in this context, as fixing them requires either live patching code or migrating execution logs to new code versions.

tomasol

Yeah, looks like Golem went a similar route - using the WASM Component Model and wasmtime.

There is always this chicken and egg problem on a new platform, but I am hoping that LLMs can solve it partially - the activities are just HTTP clients with no complex logic.

Regarding the restrictions required for determinism, they only apply to workflows, not activities. Workflows should be describing just the business logic. All the complexities of retries, failure recovery, replay after server crash etc. are handled by the runtime. The WASM sandbox makes it impossible to introduce non-determinism - it would cause a compile error so no need for runtime checks.

Philpax

I believe https://flawless.dev/ is another implementation with a very similar technology stack. I'd love to know how you compare and what the key differences are!

tomasol

Indeed. I cannot compare the implementations as flawless is not open source. However, at a high level they both share the same philosophy.

I believe the biggest difference is that Obelisk relies on the WASM Component Model:

Obelisk aims to avoid vendor lock-in. It is possible to write activities, workflows and webhooks with no obelisk SDK. Activities and webhooks are WASI 0.2 components that can be run on any compatible runtime like wasmtime e.g. for testing. This should also help with the adoption as any runtime will need a ton of integrations.

jusonchan81

I’m not sure there is much risk in vendor lock in. Look at Temporal. If you use the SDK you are probably locked in for life.

euroderf

Would it be possible to see an example using Go? Admittedly, documentation for Go in the Component Model is pretty uneven.

tomasol

I can do that. Please create an issue if you want to be notified about it.

disintegrator

Really nice project. What’s the reasoning behind the AGPL licensing? My understanding is that it will hurt adoption unless you’re planning to offer paid licensing options? Either way it’s a really nice project and I’m keen to try it out. I’ve found it tricky to get a WASM/WASI setup where I can at least make my HTTP requests (probably my own skill issue).

tomasol

Thanks for the kind words. In an ideal world I would like to offer a cloud version that would be monetized. There are a few examples of how to do HTTP requests; I have a demo repository [1] with GraphQL and regular JSON-over-HTTP activities. I do agree that the ecosystem is not mature yet, but I was able to generate HTTP activities using an LLM in a single shot.

[1] https://github.com/obeli-sk/demo-stargazers

chaosprint

I have a similar idea, but for audio effects and music production:

https://github.com/wasm-audio/wasm-audio-examples

For workflows, what do you think of async in WIT at this moment?

tomasol

The structured concurrency paradigm in workflows is stricter than what I'm reading here [1]. The whole execution model is different from the Task [2], as workflows are transparently unloaded and replayed.

Obelisk has a concept called join sets [3], where child executions are submitted and awaited. In the future I plan on adding cancellation and allowing custom cleanup functions.

[1] https://github.com/WebAssembly/component-model/blob/main/des...

[2] https://github.com/WebAssembly/component-model/blob/main/des...

[3] https://obeli.sk/docs/latest/concepts/workflows/join-sets/