450× Faster Joins with Index Condition Pushdown
20 comments
August 19, 2025
flufluflufluffy
I read their website landing page but it’s still kinda confusing — what exactly is Readyset? It sounds like it’s a cache you can set up in front of MySQL/Postgres. But then this article is talking about implementing joins, which is what the database itself would do, not a cache. But then the blurbs talk about it like it’s a “CDN for your database” that brings your data to the edge. What the heck is it?!
Sesse__
It seems to be some sort of read-only reimplementation of MySQL/Postgres that can ingest their replication streams and materialize views (for caching). Complete with a really primitive optimizer, if the article is to be believed.
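A minimal sketch of what that kind of architecture could look like, assuming a toy change stream and a single incrementally maintained view; every name here is invented for illustration and is not Readyset’s actual API:

```python
# Toy sketch: maintain a materialized view from a replication/change stream.
# All names are illustrative; this is not Readyset's real interface.

from collections import defaultdict

# Materialized view: order counts per customer, kept incrementally up to date.
order_counts = defaultdict(int)

def apply_change(event):
    """Apply one upstream change event (as if read from a binlog/WAL stream)."""
    if event["table"] != "orders":
        return
    if event["op"] == "insert":
        order_counts[event["row"]["customer_id"]] += 1
    elif event["op"] == "delete":
        order_counts[event["row"]["customer_id"]] -= 1

# Reads are then served from the precomputed view instead of the upstream DB.
apply_change({"table": "orders", "op": "insert", "row": {"customer_id": 7}})
apply_change({"table": "orders", "op": "insert", "row": {"customer_id": 7}})
print(order_counts[7])  # 2
```

Serving reads out of precomputed views like this would explain why it can be described as a cache while still doing join work itself.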
ianks
I love this type of practical optimization for DB queries. I’ve always liked how [rom-rb](https://rom-rb.org/learn/core/5.2/combines/) made the combine pattern easy to use when joins are slow. Nice to see this implemented at DB layer
dangoodmanUT
Maybe it's not obvious up front, but in retrospect this feels like the obvious way to handle joins.
Push down filters to read the least data possible.
Or, know your data and be able to tell the query engine which join strategy you would like (hash vs. pushdown).
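A rough sketch of the two strategies being contrasted, with plain dicts standing in for tables and indexes; this is illustrative only, not any particular engine’s implementation:

```python
# Toy comparison of the two join strategies, using dicts as stand-ins for
# tables and indexes. Illustrative only; real engines work on disk pages.

left_rows = [{"id": 1, "user_id": 10}, {"id": 2, "user_id": 20}]
right_rows = [{"user_id": 10, "name": "a"}, {"user_id": 20, "name": "b"},
              {"user_id": 30, "name": "c"}]

def hash_join(left, right):
    # Reads *every* right row once to build the hash table, even rows that
    # will never match anything on the left.
    table = {}
    for r in right:
        table.setdefault(r["user_id"], []).append(r)
    return [(l, r) for l in left for r in table.get(l["user_id"], [])]

def index_pushdown_join(left, right_index):
    # Pushes the join key down into an index lookup, so only matching
    # right rows are ever read.
    return [(l, r) for l in left for r in right_index.get(l["user_id"], [])]

# Precomputed index on right.user_id (roughly what a B-tree gives you).
right_index = {}
for r in right_rows:
    right_index.setdefault(r["user_id"], []).append(r)

assert hash_join(left_rows, right_rows) == index_pushdown_join(left_rows, right_index)
```

The hash join pays to read the whole right side up front; the pushdown variant only ever touches right rows whose join key actually appears on the left.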
SoftTalker
Decades ago we used to provide hints in queries based on "knowing the data", but modern optimizers have much better statistics on indexes, and the need to tell the query optimizer what to do should be rare.
bdcravens
What database engine is this in? You reference your product, but I assume this is in MySQL/MariaDB?
https://dev.mysql.com/doc/refman/9.4/en/index-condition-push...
Sesse__
This isn't really the same as MySQL's ICP; it seems more like what MySQL would call a “ref” or “eq_ref” lookup, i.e. a simple lookup on an indexed value on the right side of a nested-loop join. It's bread and butter for basically any database optimizer.
ICP in MySQL (which can be built on top of ref/eq_ref, but isn't part of the normal index lookup per se) is a fairly weird concept where the storage engine is told to evaluate certain predicates on its own, without returning the row to the optimizer. This is done a) to reduce the number of round-trips (function calls) from the executor down into the storage engine, and b) because InnoDB's secondary indexes need an extra storage round-trip to return the row (secondary indexes don't point at the row you want; they contain the primary key, and then you have to look up the actual row from the PK), so if you can reject the row early, you can skip the main row lookup.
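A toy model of that difference, with dicts standing in for the clustered PK storage and the secondary index; the names are invented, just to make points a) and b) concrete:

```python
# Toy model of InnoDB-style ICP. The secondary index stores
# (indexed column values, primary key); the full row lives in the
# clustered index and costs an extra lookup to fetch.

# Clustered storage: pk -> full row.
clustered = {
    1: {"pk": 1, "city": "oslo", "age": 30},
    2: {"pk": 2, "city": "oslo", "age": 17},
    3: {"pk": 3, "city": "bergen", "age": 40},
}
# Secondary index on (city, age): list of ((city, age), pk) entries.
secondary = [(("bergen", 40), 3), (("oslo", 17), 2), (("oslo", 30), 1)]

def lookup_without_icp(city, age_min):
    rows, pk_lookups = [], 0
    for (c, a), pk in secondary:
        if c != city:
            continue
        pk_lookups += 1              # fetch the full row first...
        row = clustered[pk]
        if row["age"] >= age_min:    # ...then evaluate the predicate
            rows.append(row)
    return rows, pk_lookups

def lookup_with_icp(city, age_min):
    rows, pk_lookups = [], 0
    for (c, a), pk in secondary:
        if c != city or a < age_min:  # predicate evaluated on the index entry,
            continue                  # so non-matching rows never leave the engine
        pk_lookups += 1
        rows.append(clustered[pk])
    return rows, pk_lookups

print(lookup_without_icp("oslo", 18)[1])  # 2 PK lookups
print(lookup_with_icp("oslo", 18)[1])     # 1 PK lookup
```

The ICP variant evaluates the predicate without bouncing each candidate row back up to the executor and skips the clustered-index lookup for rows that fail it.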
LtdJorge
Seems like they are caching MySQL with their own layer built on RocksDB.
vjerancrnjak
Another example of row-based DBs somehow being insanely slow compared to column-based ones.
Just an endless sequence of misbehavior that we wave off with "rows are good for point lookups, columns for aggregations", yet here is all the other stuff that is unreasonably slow.
tharkun__
It's an example. But not of that.
It's an example of old things being new again maybe. Or reinventing the wheel because the wheel wasn't known to them.
Yes, I know, nobody wants to pay that tax or make that guy richer, but databases like Oracle have had JPPD (join predicate pushdown) for a long time. It's just something the database does; the optimizer chooses whether to apply it depending on whether it's the best plan.
rotis
Exactly. This is a basic optimization technique and all the dinosaur-era databases should have it. But if you build a new database product you have to implement these techniques from scratch; there is no way to shortcut that. Reminds me of CockroachDB and their building of a query optimizer[1]. They started with a rule-based one and then switched to cost-based. A feature that older databases already had.
[1] https://www.cockroachlabs.com/blog/building-cost-based-sql-o...
sschnei8
I feel like this is more an example of:
“We filtered first instead of reading an entire table from disk and performing a lookup”
Where both OLAP and OLTP DBMSs would benefit.
To your point, it’s clear certain workloads lend themselves to OLAP and columnar storage much better, but “an endless sequence of misbehavior” seems a bit harsh.
marceloaltmann
Straddled joins were still a bottleneck in Readyset even after switching to hash joins. By integrating Index Condition Pushdown into the execution path, we eliminated the inefficiency and achieved up to 450× speedups.
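If I'm reading the post right, the shape of the change is roughly the following. This is a hypothetical sketch with invented data structures, not Readyset's actual execution code, and it assumes "straddled" means the query filters on parameters from both sides of the join:

```python
# Hypothetical sketch of the straddled-join change; not Readyset's real code.

left_index_by_param = {"gold": [{"user_id": 10}, {"user_id": 20}]}
right_index_by_user_and_param = {
    (10, "2025"): [{"user_id": 10, "year": "2025", "total": 5}],
    (30, "2025"): [{"user_id": 30, "year": "2025", "total": 9}],
}
right_all_by_param = {"2025": [
    {"user_id": 10, "year": "2025", "total": 5},
    {"user_id": 30, "year": "2025", "total": 9},
]}

def straddled_hash_join(left_param, right_param):
    # Old shape: read *all* right rows for right_param, hash them, then probe.
    table = {}
    for r in right_all_by_param.get(right_param, []):
        table.setdefault(r["user_id"], []).append(r)
    return [(l, r)
            for l in left_index_by_param.get(left_param, [])
            for r in table.get(l["user_id"], [])]

def straddled_icp_join(left_param, right_param):
    # New shape: push each left join key (plus the right-side parameter)
    # straight into the right-side index lookup; unmatched rows are never read.
    return [(l, r)
            for l in left_index_by_param.get(left_param, [])
            for r in right_index_by_user_and_param.get((l["user_id"], right_param), [])]

assert straddled_hash_join("gold", "2025") == straddled_icp_join("gold", "2025")
```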
LtdJorge
Why downvote?
We call these pushdown joins in RonDB. They only support an equality condition for the index condition. “Joins with index condition pushdown” is a bit of a mouthful.
We also went from like 6 seconds to 50ms. Huge speedup.
Reference
https://docs.rondb.com/rondb_parallel_query/#pushdown-joins