Patterns for Building Realtime Features
25 comments
· February 10, 2025
PaulDavisThe1st
Can we not use "realtime" for what is just really "interactive" features? Or even just "duplex" ?
"realtime" has a long history in the computing and technology realms, and this ain't it.
Animats
Indeed. This is just about user interfaces driven by remote events.
There are patterns for "real time". Things such as:
- What matters is the slowest case, not the average case.
- What has to be real time, and what can run in background?
- Avoiding priority inversions.
- Is there a stall timer that trips if you miss the control loop timing? What happens when the stall timer trips?
- What influence do caches have on time repeatability? What's the worst case? Can you keep the worst case from being the nothing-in-cache case?
sb8244
Realms matter. I don't really feel that holding terms hostage between hardware and software worlds is worth much energy.
People that know enough to care will know within 1 second what the article is about.
"Soft real-time" is probably the correct term here, but that is actually more confusing for 99% of people.
"Interactive" is not descriptive. "Duplex" is certainly not obvious.
PaulDavisThe1st
This has nothing to do with soft realtime, hard realtime or realtime in any of its more traditional senses.
TFA is just about the design and deployment of a two-way ("duplex") communication system that makes distributed applications "feel modern, collaborative, and up-to-date".
These sorts of systems have existed for decades; TFA provides a brief overview of 3 design patterns associated with them.
recroad
It’s amazing how much boilerplate stuff you don’t have to worry about when you use Phoenix LiveView. I think I’m in love with it.
rozap
I've been writing Elixir for years (I think since 0.14?) and have been writing LiveView for years. I'm all in on Elixir in general, but I'm not sure I'd use LiveView for another big project. Maybe it's just rose-colored glasses because the TS/React world certainly has its own issues, but I think TS/React and regular old Phoenix is a sweet spot.
The composability of LiveView components still has a number of footguns (duplicate IDs, etc.) and I think it has higher coupling than a nicely structured React app. You also have to treat LiveViews and components fairly differently. All these design choices are for good reason, but it ends up being annoying in a large app. Also, in deeply nested trees with lots of component reuse, static typing from TS really helps with refactors. As the projects grew large, I think I'm more productive with Phoenix/TS than I am with Phoenix/LiveView.
I think there's a certain class of small to medium sized application (I'm building several) where liveview can be a good fit, but after writing it professionally and in hobby projects for several years, I'm less convinced that it's a great solution everywhere.
causal
Thanks for taking the time to write this. I feel like I'm always seeing the excitement of new users, so I appreciate some criticism from a veteran.
I've been considering LiveView for a real time application I need to make, but haven't been sure whether it's worth the effort of learning Elixir.
rozap
For a real-time application, there is no better option than Elixir and Phoenix if you actually want to get something working and shipped. Things that you will need to do, which would be wildly complex in other stacks, are so simple in Erlang and Elixir. I highly recommend taking the time to learn it, and it's a fairly simple language, so that process should go pretty quickly. The debate about LiveView vs React+Phoenix channels is an exercise left to the reader.
mervz
I have not enjoyed a language and framework like I'm enjoying Elixir and Phoenix! It has become my stack for just about everything.
pawelduda
Exactly! I was halfway through the article and thought about how LiveView is basically the equivalent of the "push ops" pattern described, but beautifully abstracted away, and it comes for free while you (mostly) just write dynamic HTML markup. Magic!
ellieh
Came to the comments to say this. As I've been learning Elixir + LiveView, I've been consistently surprised at the amount you get for "free"
Tolexx
I think Elixir/Phoenix has to be the best stack currently for building applications with real-time features and that's all thanks to BEAM. Perhaps Go is another decent alternative but I definitely prefer BEAM's concurrency model.
jtwaleson
I'm building a simple version with horizontally scalable app servers that each use LISTEN/NOTIFY on the database. The article says this will lead to problems and you'll need PubSub services, but I was hoping LISTEN/NOTIFY would easily scale to hundreds of concurrent users. Please let me know if that won't work ;)
Some context: The use case is a digital whiteboard like Miro, and the heaviest realtime functionality will be tracking the pointers of all the users, updating 5x per second. I'm not expecting thousands/millions of users as I'm planning on running each instance of the software on-prem.
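For illustration, a rough sketch of that setup (assuming node-postgres and the `ws` WebSocket library; the channel name and payload shape are made up), where each app server holds one LISTEN connection and fans notifications out to its local clients:

```typescript
import { Client } from "pg";
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

async function main() {
  // One dedicated connection per app server that only LISTENs.
  const pg = new Client({ connectionString: process.env.DATABASE_URL });
  await pg.connect();
  await pg.query("LISTEN pointer_moves");

  // Fan NOTIFY payloads out to every WebSocket client on this server.
  pg.on("notification", (msg) => {
    for (const ws of wss.clients) {
      if (ws.readyState === WebSocket.OPEN) ws.send(msg.payload ?? "");
    }
  });

  // A client reporting its own pointer position: publish via pg_notify so
  // clients connected to *other* app servers see it too.
  wss.on("connection", (ws) => {
    ws.on("message", (data) => {
      pg.query("SELECT pg_notify('pointer_moves', $1)", [data.toString()])
        .catch(console.error);
    });
  });
}

main().catch(console.error);
```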
oa335
I assume you are using Postgres? If so, I believe you will be limited by max_notify_queue_pages.
What I would try instead is to create an event table in the db, then create a publication on that table and have clients subscribe to that publication via logical replication. You would have to increase the number of replication slots, though.
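A sketch of what that migration could look like (assuming Postgres with wal_level = logical; the table and publication names are illustrative, and the subscribing side via a logical replication client is not shown):

```typescript
import { Client } from "pg";

async function migrate() {
  const pg = new Client({ connectionString: process.env.DATABASE_URL });
  await pg.connect();

  // Append-only event table that replaces ad-hoc NOTIFY payloads.
  await pg.query(`
    CREATE TABLE IF NOT EXISTS realtime_events (
      id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      board_id   uuid        NOT NULL,
      payload    jsonb       NOT NULL,
      created_at timestamptz NOT NULL DEFAULT now()
    )
  `);

  // Publication that each app server's logical replication subscriber
  // attaches to (one replication slot per subscriber).
  await pg.query(
    "CREATE PUBLICATION realtime_events_pub FOR TABLE realtime_events"
  );

  await pg.end();
}

migrate().catch(console.error);
```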
jtwaleson
Yes, Postgres, will check it out, thank you!
pawelduda
I think for an app like this you'll eventually want to go with something that was designed for high throughput (I don't think LISTEN/NOTIFY was). Yes, it's nice because you get it out of the box with Postgres, but you'll find that it's not resilient, you need to be careful with managing DB connections (in your case of hundreds/thousands of users that is a significant thing), and it's lackluster compared to a specialized PubSub.
martinsnow
How do you handle deployments of realtime back ends which need state in memory?
jakewins
In general you do it by doing a failover behind some variation of a reverse proxy.
If you can start new instances quickly and clients can handle short delays you can do it by just stopping the old deployment and starting the new one, booting off of the snapshotted state from the prior deployment.
If you need “instant” you do it by implementing some form of catchup and then fail over.
It is a lot easier to do this if you have a dedicated component that "does" the failover, rather than having the old and new deployments try to solve it bilaterally. It could just be a script run by a human, or something like a k8s operator if you do this a lot.
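A minimal sketch of the "boot off the snapshotted state" variant (the snapshot path and state shape are illustrative):

```typescript
import { existsSync, readFileSync, writeFileSync } from "fs";

type AppState = Record<string, unknown>;

const SNAPSHOT_PATH = process.env.SNAPSHOT_PATH ?? "/var/lib/app/state.json";

// Restore whatever the previous deployment left behind, if anything.
let state: AppState = existsSync(SNAPSHOT_PATH)
  ? JSON.parse(readFileSync(SNAPSHOT_PATH, "utf8"))
  : {};

// On shutdown (the deploy script or k8s sending SIGTERM), flush the
// in-memory state so the replacement instance can boot from it.
process.on("SIGTERM", () => {
  writeFileSync(SNAPSHOT_PATH, JSON.stringify(state));
  process.exit(0);
});
```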
cess11
On BEAM/OTP you can control how state is handled at code updates. Finicky but you can.
In most other contexts you'd externalise state to a data store like Redis or RDBMS, and spawn one, kill one or do blue-green in the nebula behind your load balancer constellation.
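A sketch of the externalise-to-Redis approach (assuming the ioredis client; key names and TTLs are illustrative):

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Presence: which users are active on a board, with a TTL so that a
// replaced or crashed instance doesn't leave ghosts behind.
async function touchPresence(boardId: string, userId: string) {
  await redis.set(`presence:${boardId}:${userId}`, Date.now(), "EX", 30);
}

// The board document lives in Redis rather than in the process, so any
// instance behind the load balancer can serve it after a deploy.
async function saveBoard(boardId: string, doc: unknown) {
  await redis.set(`board:${boardId}`, JSON.stringify(doc));
}

async function loadBoard(boardId: string): Promise<unknown> {
  const raw = await redis.get(`board:${boardId}`);
  return raw ? JSON.parse(raw) : null;
}
```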
calebio
This article ended too soon :( I was having a really good time reading it, very nice work!
blixt
What I found building multiplayer editors at scale is that it's very easy to very quickly overcomplicate this. For example, once you get into pub/sub territory, you have a very complex infrastructure to manage, and if you're a smaller team this can slow down your product development a lot.
What I found to work is:
Keep the data you wish multiplayer to operate on atomic. Don't split it out into multiple parallel data blobs that you sometimes want to keep in sync (e.g. if you are doing a multiplayer drawing app that has commenting support, keep comments inline with the drawings, don't add a separate data store). This does increase the size of the blob you have to send to users, but it dramatically decreases complexity. Especially once you inevitably want versioning support.
Start with a simple protocol for updates. This won't be possible for every type of product, but surprisingly often you can do just fine with a JSON patching protocol where each operation patches properties on a giant object which is the atomic data you operate on. There are exceptions to this such as text, where something like CRDTs will help you, but I'd try to avoid the temptation to make your entire data structure a CRDT even though it's theoretically great because this comes with additional complexity and performance cost in practice.
You will inevitably need to deal with getting all clients to agree on the order in which operations are applied. CRDTs solve this perfectly, but again have a high cost. You might actually have an easier time letting a central server increment a number and making sure all clients re-apply all their updates that didn't get assigned the number they expected from the server. Your mileage may vary here.
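A rough sketch of that combination, i.e. property-setting ops against one big object plus a server-assigned sequence number; all names here are illustrative rather than anything from the article:

```typescript
type Op = { path: string[]; value: unknown };        // "set this property"
type Stamped = Op & { seq: number };

let doc: Record<string, any> = {};                   // the one atomic blob
let seq = 0;                                         // server-owned counter
const log: Stamped[] = [];                           // ordered history

function applyOp(target: Record<string, any>, op: Op) {
  // Walk to the parent of the addressed property and overwrite it.
  let node = target;
  for (const key of op.path.slice(0, -1)) {
    node = node[key] ??= {};
  }
  node[op.path[op.path.length - 1]] = op.value;
}

// Server side: accept an op the client built against `baseSeq`. If the
// client was behind, the op is still applied, but the response carries the
// ops it missed so it can catch up and re-render.
function accept(op: Op, baseSeq: number): { stamped: Stamped; missed: Stamped[] } {
  const missed = log.filter((s) => s.seq > baseSeq);
  const stamped: Stamped = { ...op, seq: ++seq };
  applyOp(doc, stamped);
  log.push(stamped);
  return { stamped, missed };
}
```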
On that note, just going for a central server instead of trying to go fully distributed is probably the most maintainable way for you to work. This makes it easier to add on things like permissions and honestly most products will end up with a central authority. If you're doing something that is actually local-first, then ignore me.
I found it very useful to deal with large JSON blobs next to a "transaction log", i.e. a list of all operations in the order the server received them (again, I'm assuming a central authority here). Save lines to this log immediately so that if the server crashes you can recover most of the data. This also lets you avoid rebuilding the large JSON blob on the server too often (but clients will need to be able to handle JSON blob + pending updates list on connect, though this follows naturally since other clients may be sending updates while they connect).
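A sketch of that snapshot-plus-log recovery (synchronous file I/O for brevity; paths and shapes are illustrative):

```typescript
import { appendFileSync, existsSync, readFileSync, writeFileSync } from "fs";

const LOG_PATH = "ops.log";        // one JSON op per line, in arrival order
const SNAPSHOT_PATH = "doc.json";  // { seq, doc } written every N ops

function appendOp(stamped: { seq: number }) {
  // Written synchronously so a crash immediately after still has this op on disk.
  appendFileSync(LOG_PATH, JSON.stringify(stamped) + "\n");
}

function snapshot(doc: unknown, seq: number) {
  writeFileSync(SNAPSHOT_PATH, JSON.stringify({ seq, doc }));
}

function recover(apply: (doc: any, op: any) => void) {
  const snap = existsSync(SNAPSHOT_PATH)
    ? JSON.parse(readFileSync(SNAPSHOT_PATH, "utf8"))
    : { seq: 0, doc: {} };

  // Replay only the ops the snapshot doesn't already contain.
  const lines = existsSync(LOG_PATH)
    ? readFileSync(LOG_PATH, "utf8").split("\n").filter(Boolean)
    : [];
  for (const line of lines) {
    const op = JSON.parse(line);
    if (op.seq > snap.seq) apply(snap.doc, op);
  }
  return snap;
}
```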
The trickiest part is choosing a simple server-side infrastructure. Honestly, if you're not a big company, a single fat server is going to get you very far for a long time. I've asked a lot of people about this, and I've heard many alternatives that are cloud scale, but they have downsides I personally don't like from a product experience perspective (harder to implement features, latency/throughput issues, possibility of data loss, etc.). Durable Objects from Cloudflare do give you the best of both worlds: you get perfect sharding on a per-object (project / whatever unit your users work on) basis.
Anyway, that's my braindump on the subject. The TLDR is: keep it as simple as you can. There are a lot of ways to overcomplicate this. And of course some may claim I am the one overcomplicating things, but I'd love to hear more alternatives that work well at a startup scale.
athrun
Thanks for sharing your experience, and what you have found to work.
Sometimes I feel we (fellow HN readers) get caught into overly complex rabbit holes, so it's good to balance it out with some down-to-earth, practical perspectives.
jvanderbot
I've worked in robotics for a long time. In robotics nowadays you always end up with a distributed system, where each robot has to have a view of the world, its mission, etc., and also of each other robot, and the command and control dashboards do too, and so on.
Always always always follow parent's advice. Pick one canonical owner for the data, and have everyone query it. Build an estimator at each node that can predict what the robot is doing when you don't have timely data (usually just running a shadow copy of the robot's software), but try to never ever do distributed state.
Even something as simple as a map gets arbitrarily complicated when you're sensing multiple locations. Just push everyone's guesses to a central location and periodically batch update and disseminate updates. You'll be much happier.
mikhmha
Wow, this sounds like how the AI simulation for my multiplayer game works. Each AI agent has a view of the world and can make local steering decisions to avoid other agents and preserve itself. Agents carry out low-level goals that are given to them by squad leaders. A squad leader receives high-level "world" objectives from a commander. High-level objectives are broken down into low-level objectives distributed among squad units based on their attributes and preferences.
Isn't this what Firebase and similar solved a long time ago?
Having a local copy of a database slice, with the sync hidden from the developer entirely.