Super fast aggregations in PostgreSQL 19
5 comments
·November 26, 2025aidos
yfontana
> In the examples given, it’s much faster, but is that mostly due to the missing indexes? I’d have thought that an optimal approach in the colour example would be to look at the product.color_id index, get the counts directly from there and you’re pretty much done.
So I tried to test this (my intuition being that indexes wouldn't change much, at best you could just do an index scan instead of a seq scan), and I couldn't understand the plans I was getting, until I realized that the query in the blog post has a small error:
> AND c1.category_id = c1.category_id
should really be
> AND p.category_id = c1.category_id
otherwise we're doing a cross-product on the category. Probably doesn't really change much, but still a bit of an oopsie. Anyway, even with the right join condition an index only reduces execution time by about 20% in my tests, through an index scan.
yfontana
Interestingly, "aggregate first, join later" has been the standard way of joining fact tables in BI tools for a long time. Since fact tables are typically big and also share common dimensions, multi-fact joins for drill-across are best done by first aggregating on those common dimensions, then joining on them.
Makes you wonder how many cases there are out there of optimizations that feel almost second nature in one domain, but have never been applied to other domains because no one thought of it.
pgelephant2025
[flagged]
isoprophlex
you didn't even have the decency to strip the inlined citations out of the drivel that your stochastic parrot produced.
stop it. seriously. i viscerally hate this comment and everything it represents.
what would hn look like if we'd all start slinging low effort ai output into the comment fields? i come here to read humans talking to humans. llm output has ZERO place in this comment section, especially verbatim like this.
if i wanted an llm to summarize this, i would have done so myself.
The key idea here seems to be that if you’re grouping on a column on a related table you can do your main aggregation by grouping on the foreign key id on the primary table and use that as a proxy for the data on the related table that you’re actually grouping by.
In the examples given, it’s much faster, but is that mostly due to the missing indexes? I’d have thought that an optimal approach in the colour example would be to look at the product.color_id index, get the counts directly from there and you’re pretty much done.
I have a feeling that Postgres doesn’t make that optimisation (I’ve looked before, but it was older Postgres). And I guess depending on the aggregation maybe it’s not useful in the general case. Maybe in this new world it _can_ make that optimisation?
Anyway, as ever, pg just getting faster is always good.