The hidden complexity of scaling WebSockets
67 comments
· January 24, 2025 · jwr
yesbabyyes
I always recommend looking at Server-Sent Events [0] and EventSource [1]. It's a standardization of old-style long polling that maps very well to the HTTP paradigm and is built into the web standard.
It's so much easier to reason about than websockets, and a naive server side implementation is very simple.
A caveat is to only use them with HTTP/2 and/or client-side logic that keeps only one connection open to the server, because browsers limit the number of simultaneous requests to the same origin.
[0] https://developer.mozilla.org/en-US/docs/Web/API/Server-sent... [1] https://developer.mozilla.org/en-US/docs/Web/API/EventSource
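To illustrate how small a naive server side can be, here is a minimal sketch of the `text/event-stream` wire format that an EventSource client consumes. The function name `format_sse` and its parameters are made up for this example; a real server would write the returned string to a response with `Content-Type: text/event-stream`.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        # The browser sends this back as Last-Event-ID when it reconnects.
        lines.append(f"id: {event_id}")
    # Each line of a multi-line payload needs its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse("a\nb", event="tick", event_id="7")` yields an `event:` line, an `id:` line, two `data:` lines, and the terminating blank line.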
bob1029
The last project I worked on went in the same direction.
Everything works great in local/qa/test, and then once we move to production we inevitably have customers with super weird network security arrangements. Users in branch offices on WiFi hardware installed in 2007. That kind of thing.
When you are building software for other businesses to use, you need to keep it simple or the customer will make your life absolutely miserable.
akshayKMR
What are the typical payload sizes in your WebSocket messages? Could you share the median and p99 values?
I've also discovered similar networking issues in my own application while traveling. For example, in Vietnam right now, I was facing recurring issues like long connection establishment times and loss of responsiveness mid-operation. I thought I was losing my mind - I even configured Caddy to not use HTTP3/QUIC (some networks don't like UDP).
I moved some chunkier messages in my app to HTTP requests, and it has become much more stable (though still iffy at times).
catlifeonmars
This is surprising to me, as I would expect network equipment to just see a TCP connection, given that both HTTP and WebSockets are application-layer protocols and that long-lived TCP connections are quite ubiquitous (databases, streaming services, SSH, etc).
tomrod
Found this same issue trying to scale streamlit. It's just not a good idea.
peteforde
This is all true, but it also serves to remind us that Rails gives developers so much out of the box, even if you're not aware of it.
ActionCable is Rails' WebSockets wrapper library, and it addresses basically every pain point in the post. However, it does so in a way that all Rails developers are using the same battle-tested solution. There's no need for every project to hack together its own proprietary approach.
Thundering herds and heartbeat monitoring are both covered.
If you need a messaging schema, I strongly recommend that you check out CableReady. It's a powerful library for triggering outcomes on the client. It ships with a large set of operations, but adding custom operations is trivial.
https://cableready.stimulusreflex.com/hello-world/
While both ActionCable and CableReady are Rails libraries, other frameworks would score huge wins if they adopted their client libraries.
hinkley
Elixir’s lightweight processes are also a good fit. Though I’ve seen some benchmarks that claim that goroutines can hit even lower overhead per connection.
ramchip
That makes sense, Erlang/Elixir processes are a much higher-level construct than goroutines, and they trade off performance for fault tolerance and observability.
As an example, with a goroutine you have to be careful to handle all errors, because a panic would take down the whole service. In Elixir a websocket handler can crash anywhere without impacting the application. This comes at a cost, because to make this safe Elixir has to isolate the processes so they don't share memory, so each process has its own individual heap, and data gets copied around more often than in Go.
flakes
> As an example, with a goroutine you have to be careful to handle all errors, because a panic would take down the whole service.
Unless you're the default `net/http` library and simply recover from the panic: https://github.com/golang/go/blob/master/src/net/http/server...
atul-jalan
Node has similar libraries like Socket.IO too, but it over-abstracts it a bit in my opinion.
hombre_fatal
I've done my share of building websocket servers from scratch, but when you don't use libraries like ActionCable or socket.io, you have to build your own MessageID reconciliation so that you can have request/response cycles. Which is generally what you want (or eventually want) in a websocket-heavy application.
send(payload).then(reply => ...)
atul-jalan
Yep, for our application, we have an `executionId` that is sent in essentially every single WebSocket message.
Both client and server use it to maintain a record of events.
dilyevsky
At this point why even use a websocket vs a normal request/reply technology like grpc or json-rpc?
mirekrusin
Or just use jsonrpc.
p_l
If you add Content-Negotiation it will have ALL the OSI layers! /s
Honestly, I'm a little surprised and more than a bit depressed how we effectively reinvent the OSI stack so often...
10000truths
The key to managing this complexity is to avoid mixing transport-level state with application-level state. The same approach for scaling HTTP requests also works for scaling WebSocket connections:
* Read, write and track all application-level state in a persistent data store.
* Identify sessions with a session token so that application-level sessions can span multiple WebSocket connections.
It's a lot easier to do this if your application-level protocol consists of a single discrete request and response (a la RPC). But you can also handle unidirectional/bidirectional streaming, as long as the stream states are tracked in your data store and on the client side.
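The two points above can be sketched roughly as follows. This is an illustration under stated assumptions: `SessionStore` is an in-memory stand-in for a persistent data store, and the sequence-number replay scheme is one possible way to track stream state, not something the comment prescribes.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Session:
    next_seq: int = 1
    log: List[Tuple[int, str]] = field(default_factory=list)


class SessionStore:
    """In-memory stand-in for a persistent data store (assumption)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def create(self) -> str:
        # The token identifies the session independently of any connection.
        token = secrets.token_urlsafe(16)
        self._sessions[token] = Session()
        return token

    def append(self, token: str, event: str) -> int:
        """Record an event against the session, not against a connection."""
        session = self._sessions[token]
        seq = session.next_seq
        session.log.append((seq, event))
        session.next_seq += 1
        return seq

    def replay_after(self, token: str, last_seen_seq: int) -> List[Tuple[int, str]]:
        """Events a reconnecting client missed, whichever server it lands on."""
        return [(s, e) for s, e in self._sessions[token].log if s > last_seen_seq]
```

A client that reconnects presents its token and the last sequence number it saw; any server instance with access to the store can then resume the stream.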
hinkley
Functional core, imperative shell makes testing and this fast iteration a lot easier. It’s best if your business logic knows very little about transport mechanisms.
I think part of the problem is that early systems wanted to eagerly process requests while they are still coming in. But in a system getting 100s of requests per second you get better concurrency if you wait for entire payloads before you waste cache lines on attempting to make forward progress on incomplete data. Which means you can divorce the concept of a payload entirely from how you acquired it.
ignoramous
> system getting 100s of requests per second you get better concurrency if you wait for entire payloads before you waste cache lines
At what point should one scale up & switch to chips with embedded DRAMs ("L4 cache")?
bruce343434
When you've profiled the code running in production and identified memory bottlenecks that cannot be solved by algorithmic or data-structure optimizations.
hinkley
I haven’t been tracking price competitiveness on those. What cloud providers offer them?
But you don’t get credit for having three tasks halfway finished instead of one task done and two in flight. Any failover will have to start over with no forward progress having been made.
ETA: while the chip generation used for EC2 m7i instances can have L4 cache, I can’t find a straight answer about whether they do or not.
What I can say is that for most of the services I benchmarked at my last gig, M7i came out to be as expensive per request as the m6’s on our workload (AMD’s was more expensive). So if it has L4 it ain’t helping. Especially at those price points.
magicalhippo
Currently another thread is going[1] which advocates very similar things, in order to reduce complexity when dealing with distributed systems.
Then again, the frontend and backend are a distributed system, so not that weird one comes to similar conclusions.
[1]: https://news.ycombinator.com/item?id=42813049 Every System is a Log: Avoiding coordination in distributed applications
Rldm840
Many years ago, we used to start a streaming session with an http request, then upgrade to websockets after obtaining a response (this was our original "StreamSense" mechanism). In recent years, we changed StreamSense to go websocket first and fall back to http streaming or http long polling in case of issues. At Lightstreamer, we started streaming data 25 years ago over http, then moved to websockets. We've seen so many different behaviors in the wild internet and got so much feedback from the field in these decades that we believe our current version of Lightstreamer includes heuristics and mechanisms for virtually every possible aspect of websockets that could go wrong: from massive disconnections and reconnections, to enterprise proxies with deep inspection, to mobile users continuously switching networks. I recall when a big customer required us to support one million live websocket connections per (mid-sized) server while keeping latency low. It was challenging but forced us to come up with a brand new internal architecture. So many stories to tell covering 25 years of evolution...
jFriedensreich
I am really unsure why devs around the world keep defaulting to websockets for things that server-sent events were made for. In 90% of the use cases I see, websockets are just not the right fit. Everything is simpler and easier with SSE. Some exceptions are high-throughput BIdirectional data streams. But even if, e.g., your synced multiplayer cursors in something like Figma use websockets, don't use them for everything else, e.g. your notification updates.
SerCe
I wrote about the way we handle WebSocket connections at Canva a while ago [1]. Even though some small things have changed here and there since the post was published, the overall approach has held up pretty well handling many millions of concurrent connections.
That said, even with great framework-level support, it's much, much harder to build a streaming functionality compared to plain request/response if you've got some notion of a "session".
[1]: https://www.canva.dev/blog/engineering/enabling-real-time-co...
crabmusket
> it's much, much harder to build a streaming functionality compared to plain request/response if you've got some notion of a "session"
This touches on something that I think is starting to become understood: the concept of a "session backend" to address this kind of use case.
See the complexity of disaggregating a live session backend on AWS versus CloudFlare: https://digest.browsertech.com/archive/browsertech-digest-cl...
I wrote about session backends as distinct from durable execution: https://crabmusket.net/2024/durable-execution-versus-session...
austin-cheney
The only complexity I have found with regards to scaling WebSockets is knowing the minimum delay between flush event completion and actual message completion to destination. It takes longer to process a message, even on IPC routing, than it does to kill a socket. That has upstream consequences with consideration of redirection and message pipes between multiple sockets. If you kill a socket too early after a message is flushed from the socket there is a good chance the destination sees the socket collapse before it has processed the final message off the socket and that processing delay is not something a remote location is easily aware of.
I have found that, for safety, you need to allow an arbitrary delay of 100ms before killing sockets to ensure message completion, which is likely why the protocol imposes a round trip of control frame opcode 8 (Close) before closing the connection the right way.
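For reference, a small sketch of what that opcode 8 control frame looks like on the wire, assuming the unmasked server-to-client direction. The helper names are made up for this example. In the protocol's closing handshake, the side that starts the close sends this frame and waits for the peer to echo a Close frame back before dropping the TCP connection, which is what confirms the peer has read everything sent before it.

```python
import struct


def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked (server-to-client) WebSocket Close frame."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    # 0x88 = FIN bit set + opcode 8 (Close); next byte is the payload length.
    return bytes([0x88, len(payload)]) + payload


def is_close_frame(frame: bytes) -> bool:
    """True if the frame's first byte carries opcode 8."""
    return len(frame) >= 2 and (frame[0] & 0x0F) == 0x8
```

Status code 1000 means normal closure; 1001 ("going away") is the usual choice when a server is shutting down.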
exabrial
I recall another complication with websockets: IIRC it's with proxy load balancers, like binding a connection to a single connection server, even if the backend connection is using HTTP/2. I probably have the details wrong. I'm sure someone will correct my statement.
atul-jalan
I think there is a way to do it, but it likely involves custom headers on the initial connection that the load balancer can read to route to the correct origin server.
I imagine the way it might go is that the client would first send an HTTP request to an endpoint that returns routing instructions, and then use that in the custom headers it sends when initiating the WebSocket connection.
Haven't tried this myself though.
arccy
I think it's more that WebSockets are held open for a long time, so if you're not careful, you can get "hot" backends with a lot of connections that you can't shift to a different instance. It can also be harder to rotate backends since you know you are disrupting a large number of active clients.
dboreham
The trick to doing this efficiently is to arrange for the live session state to be available (through replication or some data bus) at the alternative back end before cut over.
superjan
Assuming you control the client code, you can periodically disconnect and reconnect. This could also simplify deployment.
hpx7
Horizontal scaling is certainly a challenge. With traditional load balancers, you don't control which instance your clients get routed to, so you end up needing to use message brokers or stateful routing to ensure message broadcasts work correctly with multiple websocket server instances.
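A rough sketch of the message-broker approach, with everything in one process so it can run standalone. `Broker` is an in-memory stand-in for a real broker, and `WsInstance`, the channel name `room:1`, and the `delivered` list (standing in for writes to local sockets) are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """In-process stand-in for a real message broker (assumption)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in self._subscribers[channel]:
            callback(message)


class WsInstance:
    """One WebSocket server instance holding its own local connections."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.delivered: List[str] = []  # stands in for writes to local sockets
        broker.subscribe("room:1", self._deliver_locally)

    def _deliver_locally(self, message: str) -> None:
        self.delivered.append(message)

    def broadcast(self, message: str) -> None:
        # Publish instead of writing to local sockets directly, so that
        # clients connected to other instances also receive the message.
        self.broker.publish("room:1", message)
```

The point of the indirection is that no instance needs to know which instance holds which client; each one only forwards what the broker hands it.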
dilyevsky
> WebSocket connections can be unexpectedly blocked, especially on restrictive public networks.
What? How would a public network even know you're running a websocket if you're using TLS? I don't think it's really possible in the general case.
> Since SSE is HTTP-based, it's much less likely to be blocked, providing a reliable alternative in restricted environments.
And websockets are not http-based?
What the article describes as challenges seem like very pedestrian things that any rpc-based backend needs to solve.
The real reason websockets are hard to scale is that they pin state to a particular backend replica, so if a whole bunch of them disconnect at scale, the system might run out of resources trying to re-load all that state.
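One common client-side mitigation for that reconnect stampede is exponential backoff with random jitter, so that clients dropped at the same moment do not all come back at the same moment. A minimal sketch, with the function name and default values chosen for illustration only:

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

A client would sleep for `reconnect_delay(attempt)` before each retry and reset `attempt` to zero once a connection has stayed up for a while.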
atul-jalan
The initial handshake will usually include an `Upgrade: websocket` header, which can be inspected by networks.
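A short sketch of what that handshake involves, for anything that can read the HTTP request (a plaintext `ws://` connection, or a proxy that terminates TLS). The helper names are made up for this example; the GUID and the SHA-1 plus base64 derivation of `Sec-WebSocket-Accept` come from RFC 6455.

```python
import base64
import hashlib
from typing import Dict

# Fixed GUID defined by RFC 6455.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_upgrade_request(headers: Dict[str, str]) -> bool:
    """The check an intermediary could apply to a readable handshake."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    return (lowered.get("upgrade") == "websocket"
            and "upgrade" in lowered.get("connection", ""))
```

With the sample key from RFC 6455, `accept_key("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`.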
pk-protect-ai
I agree here. I have had experience scaling a WebSockets server to 20M connections on a single server (with this one: https://github.com/ITpC/LAppS.git). However, there are several issues with scaling WebSockets on the backend as well: mutex locking, non-parallel XOR of the input stream, UTF-8 validation. I do not know the state of the above repository's code; it seems it has not been updated for at least 5 years. There were bugs in HTTP parsing in the client part for some cases, though vertical scalability was excellent. Sad this thing never reached production state.
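For context on the XOR cost: every client-to-server frame is masked with a 4-byte key, and the server has to undo that for each payload byte. A naive sketch of the operation follows (the function name is an assumption; this is not the code from the repository above, and a production server would process whole machine words or use SIMD instead of a per-byte loop).

```python
def apply_mask(payload: bytes, mask_key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key, cycling the key."""
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
```

Masking and unmasking are the same operation, so applying the function twice with the same key returns the original bytes.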
Sytten
The comment about Render/Railway gracefully transferring connections seems weird? I am pretty sure it just kills the old service after the new one is alive, which will kill the connections. Not some fancy zero-downtime reconnect.
notatoad
for me, the most important lesson i've learned when using websockets is to not use them whenever possible.
i don't hate them, they're great for what they are, but they're for realtime push of small messages only. trying to use them for the rest of your API as well just throws out all the great things about http - like caching and load balancing, and just normal request/response architecture. while you can use websockets for that it's only going to cause you headaches that are already solved by simply using a normal http api for the vast majority of your api.
null
My SaaS has been using WebSockets for the last 9 years. I plan to stop using them and move to very simple HTTP-based polling.
I found that scalability isn't a problem (it rarely is these days). The real problem is crappy network equipment all over the world that will sometimes break websockets in strange and mysterious ways. I guess not all network equipment vendors test with long-lived HTTP websocket connections with plenty of data going over them.
At a certain scale, this results in support requests, and frustratingly, I can't do anything about the problems my customers encounter.
The other problems are smaller, but still annoying, for example it isn't easy to compress content transmitted through websockets.