
Anything can be a message queue if you use it wrongly enough (2023)

rented_mule

In the 1990s, I was at a startup that had a need for a message queue. The only thing we found at the time was a product from TIBCO that was priced way-way-way out of our reach. IIRC, it didn't even run on PCs, only mainframes and minis. Microsoft Exchange Server (Microsoft's email server) had just been released at the time, and we decided to use it as a message queue.

Message-submitting clients used SMTP libraries. Message-consuming clients used Exchange APIs. Consumers would only look at unread messages; they would mark a message as read when they started processing it, and move it to a folder other than the Inbox if they succeeded. Many of the queues were multi-producer, but all queues were single-consumer (CPUs were pricey at the time - our servers were all Pentiums and Pentium Pros), which simplified things a lot.
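For readers who want the consumer protocol spelled out, here is a minimal in-memory sketch of it. It does not touch SMTP or any Exchange API; `Mail`, `Mailbox`, `submit`, and `consume_one` are hypothetical names invented for illustration, and the `done` list stands in for "a folder other than the Inbox".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so remove() targets the exact message
class Mail:
    body: str
    read: bool = False


@dataclass
class Mailbox:
    """Hypothetical stand-in for one queue's mailbox (not a real Exchange API)."""
    inbox: list = field(default_factory=list)
    done: list = field(default_factory=list)

    def submit(self, body):
        # Producers only ever append (SMTP delivery in the setup described above).
        self.inbox.append(Mail(body))

    def consume_one(self, handler):
        # Single consumer: pick the oldest unread message, if there is one.
        mail = next((m for m in self.inbox if not m.read), None)
        if mail is None:
            return False
        mail.read = True         # mark as read when processing starts
        handler(mail.body)       # if this raises, the mail stays in the inbox, read
        self.inbox.remove(mail)  # success: move it out of the inbox
        self.done.append(mail)
        return True
```

A message that is read but still in the inbox is one whose processing started and never finished, which is exactly what a human opening the mailbox would see.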

Need a new queue / topic? Add an email address. Need to inspect a queue? Load up an email client. An unexpected benefit was that we could easily put humans in the loop for handling certain queues (using HTML in the messages).

It worked surprisingly well for the 5 years that the company was around. Latency was okay, but not great. Throughput was much better than we would have hoped for - Exchange was almost never the bottleneck.

supportengineer

I can assure you that various companies are still doing this. It still works and for all the same reasons as you list.

stephenlf

Remember when Amazon Video moved from serverless back to a monolith because they were using S3 for storing video streams for near realtime processing? This feels the same. Except Amazon Video is an actual company trying to build real software.

Amazon Video’s original blog post is gone, but here is a third party writeup. https://medium.com/@hellomeenu1/why-amazon-prime-video-rever...

thrance

IIRC they were storing individual frames in S3 buckets and hitting their own internal lambda limits. Funny story tbh.

LeifCarrotson

You remember correctly:

> The main scaling bottleneck in the architecture was the orchestration management that was implemented using AWS Step Functions. Our service performed multiple state transitions for every second of the stream, so we quickly reached account limits. Besides that, AWS Step Functions charges users per state transition.

> The second cost problem we discovered was about the way we were passing video frames (images) around different components. To reduce computationally expensive video conversion jobs, we built a microservice that splits videos into frames and temporarily uploads images to an Amazon Simple Storage Service (Amazon S3) bucket. Defect detectors (where each of them also runs as a separate microservice) then download images and processed it concurrently using AWS Lambda. However, the high number of Tier-1 calls to the S3 bucket was expensive.

They were really deeply drinking the AWS serverless kool-aid if they thought the right way to stream video was multiple microservices accessing individual frames on S3...

pythonaut_16

It’s more honesty than you see from most service providers, both dogfooding the approach and not handwaving the costs.

wat10000

Every time they order Chinese takeout, two thousand cars show up, each carrying one grain of rice.

mikepurvis

Has a lot of “orders from on high to dog food all the things” energy.

breppp

My guess was "no real cost accounting for service usage internally, until one day zero interest ends and a VP changes that"

moi2388

That’s hilarious

lloydatkinson

They deleted their own post?

It couldn’t possibly be because AWS execs were pissed or anything… /s

spectraldrift

People often forget a message queue is just a simple, high-throughput state machine.

It's tempting to roll your own by polling a database table, but that approach breaks down, sometimes even at fairly low traffic levels. Once you move beyond a simple cron job, you're suddenly fighting row locking and race conditions just to prevent significant duplicate processing, effectively reinventing the wheel, poorly (potentially 5 or 10 times in the same service).

A service like SQS solves this with its state management. A message becomes 'invisible' while being processed. If it's not deleted within the configurable visibility timeout, it transitions back to available. That 'fetch next and mark invisible' state transition is the key, and it's precisely what's so difficult to implement correctly and performantly in a database every single time you need it.
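The state transition described above can be sketched in memory as follows. This is a toy model of the idea, not how SQS is implemented or what its API looks like: `VisibilityQueue`, `send`, `receive`, and `delete` are hypothetical names, the class is not thread-safe, and a real service does considerably more.

```python
import time


class VisibilityQueue:
    """Toy model of a queue with a visibility timeout (single-threaded)."""

    def __init__(self, visibility_timeout, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock        # injectable so the timeout can be tested
        self._messages = {}       # id -> [body, visible_at]
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        # New messages are visible immediately.
        self._messages[self._next_id] = [body, float("-inf")]
        return self._next_id

    def receive(self):
        """Fetch the oldest visible message and hide it for the timeout."""
        now = self.clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        # Called by the consumer once processing has succeeded.
        self._messages.pop(msg_id, None)
```

If a consumer crashes before calling `delete`, nothing else has to happen: the message simply becomes visible again once the timeout passes, which is also why this style of queue gives at-least-once rather than exactly-once delivery.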

groone

Message becomes invisible in a regular relational database when using `SELECT FOR UPDATE SKIP LOCKED`
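A rough sketch of what that looks like from application code, assuming PostgreSQL and a DB-API driver that uses `%s` placeholders and opens transactions implicitly (psycopg2 behaves this way). The `jobs` table, its `id` and `payload` columns, and the `process_one` helper are all hypothetical.

```python
# Assumed schema: jobs(id BIGSERIAL PRIMARY KEY, payload TEXT). Hypothetical.
CLAIM_SQL = """
SELECT id, payload
FROM jobs
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""


def process_one(conn, handler):
    """Claim one job, run handler on it, and delete it in one transaction.

    The row lock taken by FOR UPDATE makes the job invisible to other
    workers using the same query; SKIP LOCKED makes them move on to the
    next row instead of blocking.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()   # nothing to do; end the transaction
            return False
        job_id, payload = row
        handler(payload)
        cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))
        conn.commit()         # releases the lock; the job is gone
        return True
    except Exception:
        conn.rollback()       # releases the lock; the job is visible again
        raise
```

The trade-off in this shape is that the transaction stays open for as long as the handler runs, so slow handlers mean long-lived row locks and connections.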

spectraldrift

That's totally feasible, and works for small to medium traffic (SQS scales seamlessly from 1 message per year to millions per second).

In practice, I've never seen this implemented correctly in the wild; most people don't seem to care enough to handle the transactions properly. If you also want features like DLQs or metrics on stuck message age, you'll end up with a lot more complexity just to get parity with a standard queue system.

A common library could help with this though.

kerblang

Overall it's completely feasible to build a message queue with RDBMS _because_ they have locking. You might end up doing extra work compared to some other products that make message queueing easy/fun/so-simple-caveman-etc.

Now if SQS has some super-scalar mega-cluster capability where one instance can deliver 100 billion messages a day across the same group of consumers, ok, I'm impressed, because most MQs can't, because... locking. Thus Kafka (which is not a message queue).

I think the RDBMS MQ should be treated as the "No worse than this" standard - if my fancy new message queueing product is even harder to set up, it isn't worth your trouble. But SQS itself IS pretty easy to use.

redbell

On a totally unrelated topic, I once read a meme online that says: "If you ever felt useless, remember ueue in queue!"

no_thank_you

The truly cursed thing in the article is this bit near the end (unless this is part of the satire):

"Something amusing about this is that it is something that technically steps into the realm of things that my employer does. This creates a unique kind of conflict where I can't easily retain the intellectual property (IP) for this without getting it approved from my employer. It is a bit of the worst of both worlds where I'm doing it on my own time with my own equipment to create something that will be ultimately owned by my employer. This was a bit of a sour grape at first and I almost didn't implement this until the whole Air Canada debacle happened and I was very bored."

mananaysiempre

Yes, I guess this is how we learn that Tailscale will lay claim to things you do on your own time using your own machine.

IIAOPSW

Even HN comment sections?

npteljes

Of course. A message queue is a database plus software that handles it in a specific way to make it a message queue. So, HN could basically be the database backend for that imaginary software that turns it into a message queue.

I don't have fun examples with message queues, but I do remember some with filesystems - a popular target to connect cursed backends to. You can store data in Ping packets [0]. You can store data in the digits of Pi - achieving unbelievable compression [1]. You can store data in the metadata and other unused blocks of images - also known as steganography [2]. People wrote software to use Gmail emails as a file system [3].

That's just off the top of my head, and it really shows that the sky's the limit with software.

[0] https://github.com/yarrick/pingfs

[1] https://github.com/ajeetdsouza/pifs

[2] https://en.wikipedia.org/wiki/Steganographic_file_system

[3] https://lwn.net/Articles/99933/

tux3

ACK

pwagland

Although latency is shockingly bad.

unmotivated-hmn

Even HN comment sections?

therein

At least once delivery.

stego-tech

This is beyond cursed and I love it.

dwedge

I thought the "multiple anime personalities explaining things to each other" style of tech blogging was so 2018

unmotivated-hmn

My first time seeing it. I was somewhat pleasantly confused.

dang

Discussed at the time:

Anything can be a message queue if you use it wrongly enough - https://news.ycombinator.com/item?id=36186176 - June 2023 (239 comments)

packetlost

I once had a coworker use GitLab + a git repo + webhooks to implement a queued event system. Some change (I think it was in Jenkins) would call a webhook which would append to some JSON array in a repo, commit it, which would itself trigger something else downstream. It was horrifying and glorious.

adamcharnock

I had a developer colleague a while back who was toying with an idea that would require emitting and consuming a _lot_ of messages. I think it was somewhere on the order of 10k-100k/second. He was looking at some pretty expensive solutions IIRC.

I asked if the messages were all under 1.5kb, he said yes. I asked if at-most-once delivery was ok, he said yes. So I proposed he just grab a router and fire messages through it as UDP packets, then use BGP/ECMP to balance the packets between receivers. Add some queues on the router, then just let the receivers pick up the packets as fast as they could. You'd need some kind of feedback to manage back pressure, but ¯\_(ツ)_/¯
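The endpoint side of that idea is small enough to sketch. This covers only the fire-and-forget send and the receive loop's building block; the router, BGP/ECMP balancing, and back-pressure feedback are out of scope. `MAX_PAYLOAD`, `open_receiver`, `send_message`, and `receive_message` are hypothetical names, and the 1400-byte budget is an assumed conservative figure for fitting one datagram inside a typical 1500-byte Ethernet MTU.

```python
import socket

# Assumed budget: leaves room for IP and UDP headers within a 1500-byte MTU.
MAX_PAYLOAD = 1400


def open_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_message(sock, addr, payload: bytes):
    """Fire and forget: no ack, no retry, so delivery is at most once."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload would not fit in a single datagram")
    sock.sendto(payload, addr)


def receive_message(sock, timeout=1.0):
    """Return the next datagram's payload, or None if nothing arrives in time."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(2048)
        return data
    except socket.timeout:
        return None
```

Keeping each message inside one datagram is what makes the scheme simple: there is no fragmentation or reassembly to think about, and a dropped packet is exactly one lost message.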

A fairly cheap way to achieve 1M+ messages per second.

I never got the chance to flesh out the idea fully, but the simplicity of it tickled me. Maybe it would have worked, maybe not.

HeyLaughingBoy

Isn't a fundamental property of a queue that it's FIFO?

UDP message delivery order is not guaranteed. Hell, UDP delivery itself is not guaranteed (although IME, messages don't usually get dropped unless they cross subnets).

bdcravens

> UDP delivery itself is not guaranteed

> I asked if at-most-once delivery was ok, he said yes.

Use case satisfied.

adamcharnock

> UDP message delivery order is not guaranteed

My thinking was that ordering would be pretty unaffected when there is only a single hop. But yeah, we would have needed to test that under load.

DoneWithAllThat

Corollary: every message queue can be a database if you use it wrongly enough.

rcleveng

Generalized a bit: everything can be a database if you use it wrongly enough.