Kafka is Fast – I'll use Postgres
166 comments · October 29, 2025
munchbunny
Capricorn2481
> If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async semantics and retry semantics)
That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?
oulipo2
I want to rewrite some of my setup. We're doing IoT, and I was planning on:
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)
This seems "simple enough" (though I don't have any experience with Redpanda), but it is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3.
Questions:
1. My "fear" would be that if I use the same Postgres for the queue and for my business database, the message-ingestion part could sometimes block the "business" part (locks, etc.)? Also, when I want to update the schema of my database without stopping the inflow of messages, I'm not sure that would be easy?
2. Also, since it would write messages to the queue and then delete them, there would be a lot of GC/vacuuming to do, compared to my business database, which is mostly append-only?
3. And if I split the "Postgres queue" from the "Postgres database" as two different processes, I do have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc. Is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" to multiple consumers (e.g. I want to take one of my IoT messages, clean it up, store it in my TimescaleDB, and also archive it to S3, and also run an alert detector on it, etc.)
What would be the recommendation?
singron
Re (2): there is a lot of vacuuming, but the table is small, so it's usually very fast and productive.
You can run into issues with scheduled queues (e.g. run this job in 5 minutes), since those tables will be bigger, you need an index, and you create garbage in the index exactly at the point you are querying (jobs due now). This is a spectacularly bad pattern for Postgres at high volume.
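For reference, the usual pattern looks something like this (a sketch; table and column names are illustrative, not the article's):

    -- Scheduled-job queue sketch. The run_at index is the painful part:
    -- every claimed job leaves dead tuples at the "due now" end of the
    -- index, which is exactly the range the polling query scans.
    CREATE TABLE jobs (
        id      bigserial PRIMARY KEY,
        run_at  timestamptz NOT NULL,
        payload jsonb       NOT NULL
    );
    CREATE INDEX jobs_run_at_idx ON jobs (run_at);

    -- Workers claim due jobs without blocking each other.
    DELETE FROM jobs
    WHERE id IN (
        SELECT id FROM jobs
        WHERE run_at <= now()
        ORDER BY run_at
        FOR UPDATE SKIP LOCKED
        LIMIT 100
    )
    RETURNING id, payload;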
zozbot234
> Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't see any data loss as a result of such changes. It's also common to use views to simplify the management of such updates.
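For example (a sketch; table and column names are illustrative):

    -- DDL is transactional in Postgres: if anything in the block fails,
    -- the whole schema change rolls back atomically and no half-applied
    -- state is ever visible. Caveat: ALTER TABLE takes an ACCESS
    -- EXCLUSIVE lock, so it briefly blocks writes; keep it short.
    BEGIN;
    ALTER TABLE readings ADD COLUMN firmware_version text;
    ALTER TABLE readings RENAME COLUMN temp TO temperature_c;
    COMMIT;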
Sparkyte
You can also use Redis as a queue, if losing the data isn't too big a deal.
agentultra
You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows, and the serialization levels it guarantees, are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
sneilan1
Yes, performance can be a big issue with Postgres. And being limited to vertical scaling can really put a damper on things when you take a major traffic hit. Using it in place of Kafka misunderstands one of the great uses of Kafka, which is to help absorb traffic bursts: all of a sudden your Postgres server is overwhelmed where a Kafka server would be fine.
SoftTalker
This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.
fukka42
My strategy is to use Postgres first: get the idea off the ground and switch when Postgres becomes the bottleneck.
It often doesn't.
jorge-d
Definitely. This is also one of the directions Rails is heading[1]: provide a base setup most people can use out of the box, and if needed you can always plug in more "mature" solutions afterwards.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
j45
100%
Postgres isn't meant to be a guaranteed permanent replacement.
It's a common starting point for a simpler stack, one that retains a great deal of flexibility out of the box and increases velocity.
Starting with Postgres lets the bottlenecks reveal themselves; you can then optimize from there, maybe with a tweak to Postgres or its resources, maybe with a jump to Kafka.
fud101
When someone says just use Postgres, are they using the same instance for their data as well as for the queue?
marcosdumay
When people say "just use Postgres", it's because their immediate need is so low that this doesn't matter.
And the thing is, a server from 10 years ago running Postgres (with a backup) is enough for most applications to handle thousands of simultaneous users, without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale during the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have found some market fit. (And for some markets, that's enough forever.)
victorbjorklund
Yes, I often use PG for queues on the same instance. Most of the time you don't see any negative effects, and for a new project with barely any users it doesn't matter.
j45
It can be a different database on the same server, or a separate server.
When you're doing hundreds or thousands of transactions to begin with, it doesn't really have much impact out of the gate.
Of course someone will pull out an example where it won't work, but such examples can likely be found for anything.
We don't need to fear simplification; it's easy to complicate later, when the actual complexities reveal themselves.
ryandvm
I think my only complaint about Kafka is the widespread misunderstanding that it is a suitable replacement for a work queue. I should not have to explain to an enterprise architect the distinction between a distributed work queue and an event-streaming platform.
uberduper
Has this person actually benchmarked Kafka? The results they get with their 96 vCPU setup could be achieved with Kafka on the 4 vCPU setup. Their results with PG are absurdly slow.
If you don't need what Kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
PeterCorless
Exactly. Just yesterday someone posted how they can do 250k messages/second with Redpanda (a Kafka-compatible implementation) on their laptop.
https://www.youtube.com/watch?v=7CdM1WcuoLc
Getting even less than that throughput on 3x c7i.24xlarge (a total of 288 vCPUs) is bafflingly wasteful.
Just because you can do something with Postgres doesn't mean you should.
> 1. One camp chases buzzwords.
> 2. The other camp chases common sense
In this case, is "Postgres" just being used as a buzzword?
[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]
kragen
This sounded interesting to me, and it looks like the plan is to make Redpanda open-source at some point in the future, but there's no timeline: https://github.com/redpanda-data/redpanda/tree/dev/licenses
PeterCorless
Correct. Redpanda is source-available.
When you have a C++ code base, the number of external folks who want to (and who can effectively, actively) contribute to the code drops considerably. Our "cousins in code," ScyllaDB, announced last year that they were moving to source-available because of the lack of OSS contributors:
> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.
Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...
People still want the free utility of source-available code. Less commonly, they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.
kermatt
To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?
AKA "Medium Data" ?
j45
Is it about what Kafka could do, or what you need right now?
Kafka is a full on steaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
kitd
> Kafka is a full on steaming solution.
Freudian slip? ;)
mxey
Doesn’t Kafka/Redpanda have to fsync for every message?
PeterCorless
Yes, for Redpanda. There's a blog about that:
"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."
However, for all that said, Redpanda is still blazingly fast.
https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...
uberduper
I've never looked at Redpanda, but Kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.
kragen
Definitely not in the case of Kafka. Even with an SSD, that would limit it to around 100 kHz. Batch commit allows Kafka (and Postgres) to amortize fsync overhead over many messages.
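On the Postgres side, the group-commit behavior is tunable; a sketch of the relevant settings (values are illustrative):

    -- commit_delay waits up to N microseconds before flushing the WAL,
    -- letting concurrent commits share a single fsync; commit_siblings
    -- is the minimum number of other open transactions before the
    -- delay applies.
    ALTER SYSTEM SET commit_delay = 1000;  -- microseconds
    ALTER SYSTEM SET commit_siblings = 5;
    SELECT pg_reload_conf();

    -- Or trade durability for throughput per session: COMMIT returns
    -- before the WAL reaches disk, risking loss of the last few
    -- transactions on a crash (but not corruption).
    SET synchronous_commit = off;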
UltraSane
On enterprise-grade storage, writes go to NVRAM buffers before being flushed to persistent storage, so this isn't much of a bottleneck.
jaimebuelta
I may be reading a bit extra into it, but my main take on this is: "in your app, you probably already have PostgreSQL. You don't need to set up an extra piece of infrastructure to cover your extra use case; just reuse the tool you already have."
It's very common to keep adding more and more infra for use cases that, while they could technically be covered better by new stuff, can be served by already-existing infrastructure, at least until you have proof that you need to grow it.
010101010101
> If you don't need what Kafka offers, don't use it.
This is literally the point the author is making.
blenderob
>> If you don't need what Kafka offers, don't use it.
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably bubbles the most dismissive comments to the top, the ones that don't even engage with the actual subject matter of the article!
uberduper
It seems like their point was to criticize people for using new tech instead of hacking together unscalable solutions with their preferred database.
blenderob
That wasn't their point. Instead of posting snarky comments, please review the site guidelines:
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."
PeterCorless
But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding a purpose-built tool and achieving the same result another way actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
[Disclosure: I work for Redpanda]
ilkhan4
I'll push the metaphor a bit: I think the point is that if you have a fleet of vehicles you want to fuel, go ahead and get a fuel truck and bite off that expense. However, if you only have 1 or 2 vehicles, a couple of jerry cans you probably already have plus a pickup truck is sufficient.
kragen
Getting a 288-core machine might be easier than setting up Kafka; I'm guessing that it would be a couple of weeks of work to learn enough to install Kafka the first time. Installing Postgres is trivial.
loire280
In fact, a properly-configured Kafka cluster on minimal hardware will saturate its network link before it hits CPU or disk bottlenecks.
altcognito
This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.
theK
Isn't that true for everything in the cloud? I thought we were long into the era where your disk comes over the network anyway.
UltraSane
A network link can be anything from 1Gbps to 800Gbps.
j45
But it can do so many processes a second that I'll be able to scale to the moon before I ever launch.
adamtulinius
I remember doing 900k writes/s (non-replicated) back on Kafka 0.8, on a random physical server with an old Fusion-io drive (which says something about how long ago this was :D).
It's a fair point that if you already have a pgsql setup and only need a few messages here and there, then PG is fine. But yeah, the 96 vCPU setup is absurd.
ljm
I wonder if OP could have got different results if they implemented a different schema as opposed to mimicking Kafka's setup with the partitions, consumer offsets, etc.
I might well be talking out of my arse, but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.
darth_avocado
The 96 vCPU setup with a 24xlarge instance costs about $20k/month on AWS before discounts. And one thing you don't want in a pub/sub system is a single instance taking all the reads/writes. You can run a sizeable Kafka cluster for that kind of money on AWS.
joaohaas
Had the same thought; it's weird the article didn't include Kafka numbers.
Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistence, and we hit much higher throughputs on equivalent prod machines.
Same for Redis pub/sub, though that's standard non-persistent pub/sub, so maybe that gives it an edge.
vbezhenar
How do you implement a "unique monotonically-increasing offset number"?
The naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits; now the table contains rows "122" and "124", and readers can start to process them. Then transaction "one" commits with its number "123", but readers are already past "124". And transaction "one" might never commit, for various reasons (e.g. the client just had its power cut), so just waiting for "123" forever does not cut it.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
xnorswap
It's a tricky problem, I'd recommend reading DDIA, it covers this extensively:
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
theK
> unique monotonically-increasing offset number
Isn't it a bit of a white-whale notion that a "umion" can solve all of one's subscriber problems? AFAIK, even with Kafka this isn't completely watertight.
singron
The log_counter table tracks this. It's true that a naive solution using sequences does not work for exactly the reason you say.
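A sketch of the counter-table idea (illustrative, not necessarily the article's exact schema):

    -- The UPDATE takes a row lock on the single counter row and holds
    -- it until COMMIT, so writers are serialized and offsets become
    -- visible in strictly increasing order. The cost: that one row is
    -- now a write-throughput bottleneck.
    CREATE TABLE log_counter (id int PRIMARY KEY, value bigint NOT NULL);
    INSERT INTO log_counter VALUES (1, 0);

    BEGIN;
    UPDATE log_counter SET value = value + 1 WHERE id = 1 RETURNING value;
    -- INSERT INTO log (log_offset, ...) VALUES (<returned value>, ...);
    COMMIT;  -- row lock released here; the next writer proceeds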
sigseg1v
What about a `DEFERRABLE INITIALLY DEFERRED` trigger that increments a sequence only on commit?
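Something like this, perhaps (an untested sketch; names are illustrative):

    -- A deferred constraint trigger fires when COMMIT is issued, so the
    -- sequence value is drawn in rough commit order rather than insert
    -- order. Note this narrows the race rather than eliminating it:
    -- there is still a small window between nextval() and the commit
    -- becoming visible to readers.
    CREATE SEQUENCE log_commit_seq;

    CREATE FUNCTION assign_commit_offset() RETURNS trigger AS $$
    BEGIN
      UPDATE log SET commit_offset = nextval('log_commit_seq')
      WHERE id = NEW.id;
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE CONSTRAINT TRIGGER log_offset_on_commit
      AFTER INSERT ON log
      DEFERRABLE INITIALLY DEFERRED
      FOR EACH ROW EXECUTE FUNCTION assign_commit_offset();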
jimbokun
For me the killer feature of Kafka was the ability to set the offset independently for each consumer.
In my company, most of our topics need to be consumed by more than one application/team, so this feature is a must-have. Also, the ability to move the offset backwards or forwards programmatically has been a lifesaver many times.
Does Postgres support this functionality for its queues?
Jupe
Isn't it just a matter of having each consumer use its own offset? I mean, if the queue table is sequentially or time-indexed, the consumer just provides a smaller/earlier key to accomplish the offset. (Maybe I'm missing something here?)
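Something like this, I'd imagine (a sketch; schema names are illustrative):

    -- Per-consumer offsets: each consumer group commits its own
    -- position, and rewinding or fast-forwarding is a one-row UPDATE.
    CREATE TABLE consumer_offsets (
        consumer_group text   NOT NULL,
        partition_id   int    NOT NULL,
        last_offset    bigint NOT NULL DEFAULT 0,
        PRIMARY KEY (consumer_group, partition_id)
    );

    -- Poll: fetch the next batch past this group's committed offset.
    SELECT m.*
    FROM messages m
    JOIN consumer_offsets o
      ON o.consumer_group = 'billing'
     AND o.partition_id   = m.partition_id
    WHERE m.msg_offset > o.last_offset
    ORDER BY m.msg_offset
    LIMIT 100;

    -- Commit (or rewind) the offset after processing.
    UPDATE consumer_offsets
    SET last_offset = 4200
    WHERE consumer_group = 'billing' AND partition_id = 0;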
altcognito
Correct, offsets and sharding aren't magic. And partitions in Kafka are user-defined, just like they would be for PostgreSQL.
altcognito
The article basically states that unless you need a lot of throughput, you probably don't need Kafka. (My interpretation extends this to say:) you probably don't need offsets, because you don't need multi-threaded support, because you don't need multiple threads.
I don't know what kind of native support PG has for queue management; the assumption here is that a basic "kill the task as you see it" approach is usually good enough, and that the simplicity of writing and running a script far outweighs the development, infrastructure, and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or in 5 minutes instead of an hour, is a business decision, along with understanding the growth pattern of the workload you happen to have.
j45
PG has several queue-management extensions, and I'm working my way through trying them out.
Here is one: https://pgmq.github.io/pgmq/
Some others: https://github.com/dhamaniasad/awesome-postgres
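For a flavor of the pgmq API (a sketch based on its docs; queue names and payload are illustrative):

    SELECT pgmq.create('readings');

    -- Producer: enqueue a JSON message.
    SELECT * FROM pgmq.send('readings', '{"device": 7, "temp_c": 21.4}');

    -- Consumer: read up to 10 messages, hidden from other readers
    -- for 30 seconds (the visibility timeout).
    SELECT * FROM pgmq.read('readings', 30, 10);

    -- When done: archive (moves the row to an archive table) or delete.
    SELECT pgmq.archive('readings', 1);

    -- Fanout (per the earlier question) would be manual: send each
    -- message to one queue per downstream consumer, e.g.
    -- 'readings_tsdb', 'readings_s3', 'readings_alerts'.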
Most of my professional life I have considered Postgres folks to be pretty smart… while I, by chance, happened to go with MySQL, and it became the RDBMS I think in by default.
Heavily learning Postgres recently has been okay, not much different than learning the tweaks for MSSQL, Oracle, or others. You just have to be willing to slow down a little for a bit and enjoy it, instead of expecting to rush through everything.
misja111
> One camp chases buzzwords .. the other common sense
How is it common sense to try to re-implement Kafka in Postgres? You probably need something similar but simpler. Then implement that! But if you really need something like Kafka, then... use Kafka!
IMO the author is making the same mistake as the Kafka evangelists who try to implement a database in Kafka.
enether
I'm using a pub/sub system as the example. I'm most familiar with Kafka, so I drew parallels to it. I didn't actually implement everything Kafka offers - just two simple pub/sub-like queries.
phendrenad2
Since everyone is offering what they think the "camps" should be, here's another perspective. There are two camps: (A) those who look at performance metrics ("96 cores to get 240MB/s is terrible") and assume that performance by itself is enough to justify overruling any other concern, and (B) those who look at all of the tradeoffs, including budget, maintenance, ease of use, etc.
You see this a lot in the tech world. "Why would you use Python, Python is slow" (objectively true, but does it matter for your high-value SaaS that gets 20 logins per day?)
ownagefool
The camps are wrong.
There are poles:
1. Folks constantly adopting the new tech, whatever the motivation, and 2. "I learned a thing and shall never learn anything else, ever."
Of course nobody actually sits at either pole, but the closer you are to one, the less pragmatic you are likely to be.
wosined
I am the third pole: 3. Everything we have currently sucks and what is new will suck for some hitherto unknown reason.
ownagefool
Heh, me too.
I think it's still just two poles. However, I probably shouldn't have ascribed a motivation to the latter pole, as I purposely did not with the former.
Pole 2 is simply never adopting anything new, ever, whatever the motivation.
binarymax
This is it right here. My foil is the "replace Elasticsearch because PG has inverted indexes" argument. The ergonomics and tunability of these in PG are terrible compared to ES. Yes, it will search, but I wouldn't want to be involved in constructing or maintaining that search.
johnyzee
Seems like you would at the very least need a fairly thick application layer on top of Postgres to make it look and act like a messaging system. At that point, it seems like you have just built another messaging system.
Unless you're a five-man shop where everybody just agrees to use that one table, manage the transactions right, cron-job the retention, YOLO the clustering, etc., etc.
Performance is probably last on the list of reasons to choose Kafka over Postgres.
j45
You expose an API on top of Postgres, much like any other group of developers would, and call it a day.
There are several queue implementations available, which increases the chance of finding what one is after: https://github.com/dhamaniasad/awesome-postgres
losvedir
Maybe I missed it in the design here, but this pseudo-Kafka Postgres implementation doesn't really handle consumer groups very well. The great thing about Kafka consumer groups is that they make it easy to spread the load over several instances running your service. They all connect using the same group, and different partitions are assigned to the different instances. As you scale up or down, the partition responsibilities are updated accordingly.
You need some sort of server-side logic to manage that, and the consumer heartbeats, and generation tracking, to make sure that only the "correct" instances can actually commit the new offsets. Distributed systems are hard, and Kafka goes through a lot of trouble to ensure that you don't fail to process a message.
mrkeen
Right, the author's worldview is that Kafka is resume-driven development, used by people "for speed" (even though they are only pushing 500KB/s).
Of course an implementation based on that worldview is going to miss a bit.
My general opinion, off the cuff, from having worked at both small (hundreds of events per hour) and large (trillions of events per hour) scales for these sorts of problems:
1. Do you really need a queue? (Alternative: periodic polling of a DB)
2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)
3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST API's, maybe with async semantics and retry semantics)
4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.