4-10x faster in-process pub/sub for Go
14 comments
June 29, 2025
minaguib
OP: the README could really benefit from a section describing the underlying methodology and comparing it to other approaches (Go channels, LMAX, etc.)
karel-3d
The actual code and the actual benchmark are very short.
qudat
This is pretty neat, code looks minimal as well. At pico.sh we wrote our own pubsub impl in Go that leveraged channels. We primarily built it to use with https://pipe.pico.sh
https://github.com/picosh/pubsub
With this impl can you stream data or is it just for individual events?
hinkley
It’s always worth discussing what features were thrown out to get the performance boost, whether it’s fair for those features to impose a tax on all users who rarely or never use them, and whether there’s a way to rearrange the code so that the lesser-used features become a low-cost abstraction: one you mostly only pay for if you use those features, and that is cheap, if not free, if you don’t.
There are a lot of spinoff libraries out there that have provoked a reaction from the core team, cutting the cost of the core implementation by 25-50%. And that’s a rising tide that lifts all boats.
kelindar
This might be useful to some if you need a very light pub/sub inside one process.
I was building a small multiplayer game in Go. I started with a channel fan-out but (for no particular reason) wanted to see if I could do better. I put together this tiny event bus to test, and on my i7-13700K it delivers events in 10-40ns, roughly 4-10x faster than the plain channel loop, depending on the configuration.
zx2c4
> about 4x to 10x faster than channels.
I'd be interested to learn why/how and what the underlying structural differences are that make this possible.
MathMonkeyMan
I didn't look, but I don't think of channels as a pub/sub mechanism. You can have a producer close() a channel to notify consumers of a value available somewhere else, or you can loop through a bunch of buffered channels and do nonblocking sends.
A different design, without channels, could improve on those.
atombender
I prefer to think of channels as a memory-sharing mechanism.
In most cases where you want to send data between concurrent goroutines, channels are a better primitive, as they allow the sender and receiver to safely and concurrently process data without needing explicit locks. (Internally, channels are protected with mutexes, but that's a single, battle-tested and likely bug-free implementation shared by all users of channels.)
The fact that channels also block on send/receive and support buffering means that there's a lot more to them, but that's how you should think of them. The fact that channels look like a queue if you squint is a red herring that has caused many a junior developer to abuse them for that purpose, but they are a surprisingly poor fit for it. Even backpressure tends to be something you want to control manually (using intermediate buffers and so on), because channels can be fiendishly hard to debug once you chain more than a couple of them. Something forgets to close a channel, and your whole pipeline can stall. Channels are also slow, requiring mutex locking even in scenarios where the data doesn't need locking and could just be passed directly between functions.
Lots of libraries (such as Rill and go-stream) have sprung up that wrap channels to model data pipelines (especially with generics it's become easier to build generic operators like deduping, fan-out, buffering and so on), but I've found them to be a bad idea. Channels should remain a low-level primitive to build pipelines, but they're not what you should use as your main API surface.
MathMonkeyMan
> Channels should remain a low-level primitive to build pipelines, but they're not what you should use as your main API surface.
I remember hearing (not sure where) that this is a lesson that was learned early on in Go. Channels were the new hotness, so let's use them to do things that were not possible before. But it turned out that Go was better for doing what was already possible before, but more cleanly.
tombert
Interesting, I need to dig into the guts of this because this seems cool.
I'm a bit out of practice with Go but I never thought that channels were "slow", so getting 4-10x the speed is pretty impressive. I wonder if it shares any design with LMAX Disruptor...
bob1029
> I wonder if it shares any design with LMAX Disruptor...
I've recently switched from using Disruptor.NET to Channel<T> in many of my .NET implementations that require inter-thread sync primitives. Disruptor can be faster, but I really like the semantics of the built-in types.
https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
absolute_unit22
> High Performance: Processes millions of events per second, about 4x to 10x faster than channels.
Wow - that’s a pretty impressive accomplishment. I’ve been meaning to move some workers I have to a pub/sub on https://www.typequicker.com.
I might try using this in prod. I don't really need the insane performance benefits as I don't have much traffic lol - but I always like experimenting with new open source libraries - especially while the site isn't very large yet
MunishMummadi
That's a cool site, you will see me more frequently. Btw, do some tech twitter promos.
After a brief skim, it looks like this implementation is highly optimized for throughput and broadcasts, whereas a channel has many other use cases.
Consumers subscribing to the same event type are placed in a group. There is a single lock for the whole group. When publishing, the lock is taken once and the event is replicated to each consumer's queue. Consumers take the lock and swap their entire queue buffer, which lets them consume up to 128 events per lock/unlock.
Since channels each have a lock and only take 1 element at a time, they would require a lot more locking and unlocking.
There is also some frequent polling to maintain group metadata, so this could be less ideal in low volume workloads where you want CPU to go to 0%.