Skip to content(if available)orjump to list(if available)

Materialized views are obviously useful

erulabs

What a great post. Humble and honest and simple and focused on an issue most developers think is so simple (“why not just vibe code SQL?”, “whatever, just scale up the RDS instance”).

Compliments aside, where this article stops is where things get exciting. Postgres shines here, as does Vitess, Cassandra, ScyllaDB, even MongoDB has materialized views now. Vitess and Scylla are so good, it’s a shame they’re not more popular among smaller startups!

What I haven’t seen yet is a really good library for managing materialized views.

malthejorgensen

Don’t you have to manually “refresh” Postgres materialized views, essentially making it an easier to implement cache (the Redis example in the blog post) rather than the type always-auto-updating materialized view the blog post author is actually touting?

lbreakjai

Out of the box, you're right, but there are extensions that do just that:

https://github.com/sraoss/pg_ivm

It's however not available on RDS, so I've never had the chance to try it myself.

xixixao

Convex's queries[0] are another example, with perhaps a somewhat simpler approach to the tracking and invalidation.

[0] https://stack.convex.dev/how-convex-works

thom

Materialize.com and Snowflake have pretty reliable incremental materialised views now, with caveats that aren’t back breaking. If you can transform in SQL rather than having to build a whole new pipeline or microservice to do the work that’s a pure operational win. I consider this alongside hybrid transactional/analytical databases to be the holy grail of data infrastructure. Finally we can stop just shuffling data around, support almost all workloads in one place, and get some work done.

aboodman

FYI, Zero uses incremental view maintenance internally as the core of its sync engine:

https://zero.rocicorp.dev/docs/introduction

IVM is what allows the core feature of our DX: that users can just do a fairly large query with sorts and joins, like:

  zero.query.posts.where('authorID', 42)
    .orderBy('created', 'desc')
    .limit(1000)
    .related('comments',
      c => c.orderBy('created', 'desc').limit(10))
... and we sync updates to this query incrementally to the client.

TanStack DB also uses IVM, but only client-side currently.

If you are interested in such things, you can take a poke around the source here: https://github.com/rocicorp/mono/tree/main/packages/zql. Or come find us in Discord: https://discord.rocicorp.dev/

quectophoton

> And then by magic the results of this query will just always exist and be up-to-date.

With PostgreSQL the materialized view won't be automatically updated though, you need to do `REFRESH MATERIALIZED VIEW` manually.

oftenwrong

There is a PostgreSQL extension that adds support for incremental updates to materialised views: https://github.com/sraoss/pg_ivm

4ndrewl

Just landed here to write this. Materialized Views are _very_ implementation specific and are definitely _not_ magic.

It's important to understand how your implementation works before committing to it.

shivasaxena

Curious if anyone know any implementation where they would be automatically updated?

Now that would be awesome!

EDIT: come to think of it, it would require going through CDC stream, and figuring out if any of the tables affected are a dependency of given materialized view. Maybe with some ast parsing as well to handle tenants/partitions. Sounds like it can work?

lsuresh

Do give us at Feldera a shot -- full IVM for arbitrary SQL + UDFs: https://github.com/feldera/feldera/

magicalhippo

MSSQL and Sybase SQLAnywhere has options for that, we use it a fair bit with both.

At least on SQLAnywhere it seems to be implemented using triggers, ie not unlike what one would do if rolling your own.

4ndrewl

Oracle will do - a couple of options, either a full rebuild or an incremental rebuild.

cyanydeez

Postgesql knows when you try to drop its dependencies so shouldnt be to hard to watch traffic

jelder

Did I miss in the article where OP reveals the magic database that actually does this?

3rd party solutions like https://readyset.io/ and https://materialize.com/ exist specifically because databases don’t actually have what we all want materialized views to be.

anon84873628

In the analytics world, BigQuery MVs seem pretty cool. You can tune the freshness parameters, it will maintain user-specific row-level security, and even rewrite regular queries to use the pre-computed aggregates if possible.

But I don't think there is anything similar in the GCP transactional db options like Spanner or CloudSQL.

rapind

You can do targeted materialized view updates via triggers. It's definitely verbose but does give you a lot of control.

I'm currently parking PostgREST behind Fastly (varnish) for pretty much the same benefits plus edge CDNs for my read APIs. I really just use materialized views for report generation now.

sophiebits

These startups (and a handful of others) are what I meant!

jelder

What others do you know of?

dalyons

Postgres materialized views are pretty terrible / useless compared to other rdbms. I’ve never found a usecase for the very limited pg version.

JohnBooty

I've only used Postgres' and (ages ago) MSSQL's materialized views. What is pg missing compared to the others?

I've found them VERY useful for a narrow range of use cases but, I probably don't realize what I'm missing.

globular-toast

In Postgres a materialized view is basically a table that remembers the query used to generate it. Useful if you want to trigger the refreshes without knowing the queries.

thrown-0825

having a dataset refresh on a timer and cache the result for future queries is pretty useful

bob1029

> I don’t know yet if the implementations of this yet are good enough to use at scale. Maybe they’re slow or maybe the bugs aren’t ironed out yet.

This technique is very well supported in the big commercial engines. In MSSQL's Indexed View case, the views are synchronously updated when the underlying tables are modified. This has implications at insert/update/delete time, so if you are going to be doing a lot of these you might want to do it on a read replica to avoid impact to production writes.

https://learn.microsoft.com/en-us/sql/relational-databases/v...

https://learn.microsoft.com/en-us/sql/t-sql/statements/creat...

sophiebits

TIL, thanks! I know Postgres and MySQL don’t include an equivalent.

jamesblonde

This triggered me in the article

'There are a few startups these days peddling a newfangled technology called “incremental view maintenance” or “differential dataflow”. '

Incremental view maintenance can change recomputation cost of view updates from O(N) to O(1). DBSP is based on z-sets, a generalization of relational algebra. The paper won best paper at SIGMOD. There is a startup, Feldera, commercializing it.

This is just ignorance to dismiss as 'new fangled'.

PerryStyle

+1. Learned about this in DB research course during grad school. Feldera is really cool.

Also I love their website design.

lsuresh

Thanks for the kind words (Feldera co-founder here). I'll pass it on to the design team. :)

anon84873628

I mean, everything you said sounds exactly like the definition of "new fangled" to me. I don't think the term is meant to be so pejorative or dismissive, just that the tech is currently intimidating to people not on the cutting edge. (Edit: e.g. taking graduate level database courses, as mentioned by a sibling comment :-)

There is constantly so much new stuff in software, you have to be a bit willfully ignorant of some things some of the time.

jamesblonde

Does going from O(N) to O(1) sound like "new fangled"? That is the smell of progress

null

[deleted]

Zen1th

You're taking "new fangled" too literally. Is it new, as in not everyone concerned is aware of it? Yes! I think the author is as enthusiastic about this as you are.

jmull

I curious why an index can't handle that first query well.

mb7733

Indexes can only help narrow down to the issues for the project (more generally: matching rows for the query). Once the index narrows down the rows, Postgres still has to count them all, and Postgres isn't particularly fast at that, especially in an active table[0]. That's what the author meant by 'complete index scan of the tasks for the project'.

Of course this isn't really relevant until there are a very large number of rows to count for a given query. Much larger than what is likely for "tasks in a project". I've run into this only with queries that end up counting 10e7/8/9 rows, i.e. more like OLAP workloads

[0] https://wiki.postgresql.org/wiki/Slow_Counting

th0ma5

This is my thing, I often thought of these views as a way to bridge organizational divides rather than technical ones. Still cool! But if you own everything you can do all kinds of other stuff just as easily.

lysergic

Here are some commercial and source-available options:

I'm not affiliated with any of these names, I'm just really interested in IVM:

Materialize Readyset Feldera RisingWave

Jupe

> (Technically speaking, if 100 people load the same page at the same time and the cache isn’t populated yet, then we’ll end up sending 100 queries to the database which isn’t amazing, but let’s just pretend we didn’t hear that.)

Isn't their tech to address that, like golang's "singleflight"?

null

[deleted]

globular-toast

This is a weird article. The author doesn't even mention what database they are talking about then just drops in some SQL that looks like Postgres. If you think Postgres will magically have the right values in it for a materialized view you will be very disappointed...

quasarj

Yeah, that was my thoughts as well. What database is this? In Postgres you definitely have to update materialized views manually....