
Making Postgres scale

192 comments

·March 14, 2025

Eikon

I run a 100-billion+ row Postgres database [0] that's around 16TB; it's pretty painless!

There are a few tricks that make it run well (PostgreSQL compiled with a non-standard block size, ZFS, careful VACUUM planning). But nothing too out of the ordinary.

ATM, I insert about 150,000 rows a second, run 40,000 transactions a second, and read 4 million rows a second.

Isn't "Postgres does not scale" a strawman?

[0] https://www.merklemap.com/

mmontagna9

People run Postgres at scales much "larger" than what you are running, which isn't to say that your workload isn't substantial; it is. But there are folks who push it harder, both in terms of writes/rows and workload complexity. It's one thing to write a ton of rows into a single table; it's another thing entirely to write into many tables, index all of them, and then query them at scale. You didn't mention much about your workload complexity, so I'm not suggesting it isn't complex, but there are certainly plenty of folks on this forum who can attest that it's tricky to scale high-write workloads in large organizations with many stakeholders and complex query loads.

Eikon

> it's another thing entirely to write into many tables, index all of them and then query them at scale

Well, that’s pretty much what I am doing.

Shorn

> PostgreSQL compiled with a non-standard block size

Do you think this could become less important for your use case with the new PG17 "I/O combining" stuff?

https://medium.com/@hnasr/combining-i-os-in-postgresql-17-39...

Eikon

No, because the custom block size is about reducing zfs r/w amplification.
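Concretely, the trick is to build Postgres with a larger page size and set the ZFS recordsize to match, so one database page maps onto one ZFS record. A sketch with illustrative sizes and dataset names (the commenter doesn't say which values they use):

```shell
# Build Postgres with a 32 kB block size (the default is 8 kB);
# --with-blocksize takes the size in kilobytes.
./configure --with-blocksize=32
make && make install

# Match the ZFS recordsize so a page write touches one record,
# avoiding read-modify-write amplification.
zfs set recordsize=32K tank/pgdata
zfs set compression=lz4 tank/pgdata
```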

kelafoja

One of the things I find challenging is understanding the meaning of the word "scales". It is used differently in different contexts.

Can it be performant in high-load situations? Certainly. Can it elastically scale up and down based on demand? As far as I'm aware, it cannot.

What I'm most interested in is how operations are handled. For example, if it's deployed in a cloud environment and you need more CPU and/or memory, you have to eat the downtime to scale it up. What if it's deployed to bare metal and it cannot handle the increasing load anymore? How costly (in terms of both time and money) is it to migrate it to bigger hardware?

nine_k

When it "scales", it usually means "scales up". A scalable solution is one that can withstand a large and increasing load, past the usual limitations of obvious solutions.

Being elastic is nice, but not always needed. In most cases of database usage, downsizing never happens, nor is it expected to: logically, data are only added, and any packaging and archiving exist only to keep the size manageable.

prng2021

You’re conflating things. The question was about scaling compute and memory up and down based on load and you’re commenting about never needing to downsize on storage.

pluto_modadic

a database scaling dramatically up and down /under load/ and expecting it to perform the same as steady state seems a bit weird, vs a single, beefy database with a beefy ingest job and a bunch of read only clients searching it?

like you're more likely to encounter two phases (building the DB in heavy growth mode, and using the DB in light growth heavy read mode).

A business that doesn't quite yet know what size the DB needs to be has a frightening RDS bill incoming.

rednafi

Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?

By the way, really cool website.

Eikon

I'll try to get a blog post out soon!

> Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?

It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.

6 NVMe drives in raidz-1, 1024GB of memory, a 96 core AMD EPYC cpu.

A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).

> By the way, really cool website.

Thank you!

tracker1

> A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).

That's kind of where I'm at now... you can vertically scale a server so much now (compared to even a decade ago) that there's really no need, IMO, to bring in a lot of complexity for databases. Simple read replicas or a hot spare should be sufficient for the vast majority of use cases, and the hardware is way cheaper than a few years ago, relatively speaking.

I spent a large part of the past decade and a half using and understanding all the no-sql options (including sharding with pg) and where they're better or not. At this point my advice is start with PG, grow that DB as far as real hardware will let you... if you grow to the point you need more, then you have the money to deal with your use case properly.

Few applications need to serve beyond a few million simultaneous users, and if you avoid certain pitfalls, it's not that hard. Especially if you're flexible enough to leverage JSONB and a bit of denormalization for fewer joins, you'll go a very, very long way.

rednafi

> I'll try to get a blog post out soon!

Please do.

> It’s self-hosted on bare metal, with standby replication, normal settings, nothing “weird” there.

16TB with nothing weird is pretty impressive. Our devops team reached for Aurora way before that.

> 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD EPYC CPU.

Since you're self-hosted, I assume you aren't on AWS. How much is this setup costing you, if you don't mind sharing?

> A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).

Beautiful!

JohnBooty

Your replies are really valuable and informative. Thank you so much.

Question - what is your peak utilization % like? How close are you to saturating these boxes in terms of CPU etc?

GordonS

I'm also self-hosting Postgres, and the project is getting to the point where a standby would be a good idea to ensure higher availability.

Did you use any particular guide for setting up replication? Also, how do you handle failover/fallback to/from standby please?
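For anyone landing here, the minimal built-in approach is roughly this (hostnames and paths are placeholders; `pg_basebackup -R` writes the replication config and `standby.signal` for you):

```shell
# On the standby host: clone the primary and mark it as a replica.
pg_basebackup -h primary.internal -U replicator \
    -D /var/lib/postgresql/17/main -R --wal-method=stream

# Start it: it stays in read-only recovery, streaming from the primary.
pg_ctl -D /var/lib/postgresql/17/main start

# Manual failover: promote the standby to accept writes.
pg_ctl -D /var/lib/postgresql/17/main promote
```

Failback is the fiddly part; tools like repmgr or Patroni automate the promote/rejoin dance.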

justinclift

> 6 NVMe drives in raidz-1

Did you benchmark io rate with different ZFS layouts?

6 NVMe drives in mirrored pairs would probably give substantially better latency and throughput.

Though you'd probably need more pairs of drives to match your current storage size. Or get higher capacity NVMe drives. :)

stuartjohnson12

> It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.

I can build scalable data storage without a flexible scalable redundant resilient fault-tolerant available distributed containerized serverless microservice cloud-native managed k8-orchestrated virtualized load balanced auto-scaled multi-region pubsub event-based stateless quantum-ready vectorized private cloud center? I won't believe it.

chrisandchris

And here I am with an ERP system at hand whose database I "should not query" with some (3-6) joins, and likely 100,000s of records in total in the larger tables, because "the DB server will go down".

WorldWideWebb

PeopleSoft?

frollogaston

"Postgres does not scale" means that you can't just give it more machines, which is true. At some point you've got the fastest available machine, maybe you've played with the settings, and that's it. Then you can embark on the kind of manual sharding journey the article describes.

But most of the time, an RDBMS is the right tool for the job anyway, you just have to deal with it.

Eikon

> "Postgres does not scale" means that you can't just give it more machines, which is true.

Well, it’s only true for writes.

frollogaston

It's still true for reads if you need them to be fully consistent, but yeah, read replicas are the answer sometimes.

null

[deleted]

MR4D

Having some issues with your numbers, but I’m probably just missing something…

If you insert 150K rows per second, that’s roughly 13 Billion rows per day.

So you’re inserting 10%+ of your database size every day?

That seems weird to me. Are you pruning somewhere? If not, is your database less than a month old? I’m confused.

Eikon

Well, that's why I said "ATM"; it's not a sustained rate all the time. And yes, there are a bunch of DELETEs too.

vanviegen

Regarding MerkleMap: Ouch! I never realized Let's Encrypt was leaking all of the subdomains I'm creating. :-/ I thought the names were giving me a bit of extra security through obscurity.

Why does it do that? I thought only revocations need to be published?

jorams

It does that because it's required to. The system is called Certificate Transparency[1] and browsers require certificates to be in CT logs to be accepted.

If you want to hide what subdomains you have you can use a wildcard certificate, though it can be a bit harder to set up.

[1]: https://developer.mozilla.org/en-US/docs/Web/Security/Certif...
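With certbot, requesting a wildcard looks roughly like this (domain is a placeholder; wildcard issuance requires a DNS-01 challenge, and a DNS provider plugin makes renewal non-interactive):

```shell
# Wildcards can't be validated over HTTP; you must prove DNS control.
certbot certonly --manual --preferred-challenges dns \
    -d 'example.com' -d '*.example.com'
```

Note the wildcard itself still appears in CT logs, but the individual subdomain names no longer do.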

vanviegen

Thanks for explaining!

aprdm

99.9% of the companies in the world will never need more than 1 beefy box running postgres with a replica for a manual failover and/or reads.

frollogaston

Availability is trickier than scalability. An async replica can lose a few recent writes during a failover, and a synchronous replica is safer but slower. A company using some platform might not even know which one they're using until it bites them.

wavemode

99.9% of companies also aren't going to feel the performance difference of synchronous replication.

That being said, the setups I typically see don't even go that far. Most companies don't mitigate for the database going down in the first place. If the db goes down they just eat the downtime and fix it.

fmajid

Skype open-sourced their architecture way back, using PL/Proxy to route calls based on shard. It works, is quite elegant, and handled 50% of all international phone calls in the noughties. My old company used it to provide real-time analytics on about 300M mobile devices.

https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.p...

https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_...
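The routing layer there is only a few lines per call; a hedged PL/Proxy sketch (cluster and function names are illustrative):

```sql
-- Runs on the proxy database. PL/Proxy forwards the call to the
-- shard selected by hashing the username, where a function with
-- the same signature does the real work.
CREATE FUNCTION get_user_email(i_username text)
RETURNS text AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
```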

Keyframe

Skype has had from the beginning the requirement that all database access must be implemented through stored procedures.

That presentation starts with hard violence.

fmajid

Righteous violence.

One of the benefits of stored procedures they don't mention is SECURITY DEFINER, which is like setuid.

You can, for instance, have a user table with login and hashed password, and have a stored procedure that verifies the login and password, without giving SELECT access to the user table to the database user your application uses.

Stored procedures also help block SQL injection attacks.
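A minimal sketch of that pattern (table, role, and function names are hypothetical; `crypt()` comes from the pgcrypto extension):

```sql
-- The application role gets no direct access to the table.
CREATE TABLE users (login text PRIMARY KEY, pw_hash text NOT NULL);
REVOKE ALL ON users FROM app_user;

-- SECURITY DEFINER runs with the function owner's privileges,
-- much like a setuid binary.
CREATE FUNCTION check_login(p_login text, p_password text)
RETURNS boolean
LANGUAGE sql SECURITY DEFINER AS $$
    SELECT EXISTS (
        SELECT 1 FROM users
        WHERE login = p_login
          AND pw_hash = crypt(p_password, pw_hash)
    );
$$;

GRANT EXECUTE ON FUNCTION check_login(text, text) TO app_user;
```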

Tostino

If the database team designed a thoughtful API with stored procedures, this can actually be a quite nice way to interact with a database for specific uses.

Being 100% hard and fast on that rule seems like a bad idea though.

dboreham

Fashionable 20 years ago but thankfully everyone who had that bee in their bonnet seems to have retired.

sgarland

I would much rather this than letting devs write their own queries, ORM or not. But then, I own the DBs, and I am constantly dealing with the result of poor schema and query decisions.

craigkerstiens

Probably as useful is the overview of what pgdog is and the docs. From their docs[1]: "PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code."

[1] https://docs.pgdog.dev/

rednafi

Another option is going full-scale with CockroachDB. We had a Django application backed by PostgreSQL, which we migrated to CockroachDB using their official backend.

The data migration was a pain, but it was still less painful than manually sharding the data or dealing with 3rd party extensions. Since then, we’ve had a few hiccups with autogenerated migration scripts, but overall, the experience has been quite seamless. We weren’t using any advanced PostgreSQL features, so CockroachDB has worked well.

skunkworker

Unless their pricing has changed, it’s quite exorbitant when you need a lot of data. To the point that one year of cockroachdb would cost 5x the cost of the server it was running on.

CharlesW

I think you're referring to the CockroachDB Cloud DBaaS offering vs. CockroachDB itself, correct?

rednafi

Not the parent, but yeah, most likely. Then again, you probably don't want to maintain your own Cockroach fleet.

rednafi

This is still true. I wouldn’t use Cockroach if it were my own business. Also, they don’t offer any free version to try out the product. All you get is a short trial period and that’s it.

CharlesW

> Also, they don’t offer any free version to try out the product.

The site makes it seem as if I can install CockroachDB on Mac, Linux, or Windows and try it out for as long as I like. https://www.cockroachlabs.com/docs/v25.1/install-cockroachdb... Additionally, they claim CockroachDB Cloud is free for use "up to 10 GiB of storage and 50M RUs per organization per month".

traceroute66

> Another option is going full-scale with CockroachDB

Just beware that CockroachDB is not a drop-in replacement for PostgreSQL.

Last time I looked it was missing basic stuff. Like stored functions. I don't call stored functions an "advanced feature".

frollogaston

I also wonder what limitations you get on queries. Cause in Spanner, there's something about cross-directory queries, or Citus can't do cross-shard FKs iirc. There's no magic way to shard a relational DB, you have to deal with it one way or another.

vvern

Cockroach really doesn’t have limitations when it comes to queries. It is kinda magic in that regard. The single partition queries are the special case and are treated as rare. Cockroach ranges are quite small compared to other systems (~256MiB).

benjiro

> Just beware that CockroachDB is not a drop-in replacement for PostgreSQL.

Oh, it's way worse than that... A ton of small functionality tends to be missing or to work differently. Sometimes it's small stuff like column[1:2] not existing in CRDB, and other times it's things like ROW LEVEL SECURITY ... you know, something you might want on a massively distributed database that will be used for tenant setups (I hear they are finally going to implement it this year).

The main issue is the performance... You spend close to ~4x the resources for a similarly performing setup, comparing 2x Postgres (sync replication) vs 3x CRDB (where every node is just a replica). It's not the replication itself but the massive overhead of the Raft protocol plus the far less optimized query planner.

To match Postgres, you're required to deploy around 10 to 12 nodes, on hardware with about twice the resources. That is the point where CRDB performs on a similar level.

The issue is, well, you add resources to your PostgreSQL server and the gap is back. You can scale PostgreSQL to insane levels with 64 or 96 CPUs ... Or imagine, you know, just planning your app in advance so that you spread your load over multiple PostgreSQL instances. Rocket science, folks ;)

CRDB is really fun to watch (the built-in GUI, the replication being very visual), but it's a resource hog. The storage system (Pebble) eats a ton of resources compacting the data, when you can simply solve that with a PostgreSQL instance on ZFS (with, ironically, often better compression).

I do not joke when I say that seeing those steep CPU spikes during the night hours, as the compactor works, is painful. Even a basic empty CRDB instance, just logging its own usage, constantly runs at between 7 and 50% on a quad-core ARM N1.

PostgreSQL? You do not even know it's running. Barely any CPU usage; what memory usage?

And we have not talked about license/payment issues ... a "free enterprise" version with FORCED telemetry on, AND you're not allowed to hide the server (aka, if it cannot call home, it goes into restricted mode in 7 days, at something like 50 queries/second ... aka, the same as just shutting down your DB). By the way, some people reported pricing of 125+ dollars/vCore. Given that even the most basic CRDB setup gets you 3x instances with a minimum of 4 cores ... do the math. Yes, that only applies above 10M income, but they change their licenses every few years, so who knows about next year or the year after that.

Interesting product, shitty sales company behind it. I am more interested in seeing when the PostgreSQL storage extension OrioleDB comes out, so it can solve the main issue that prevents PostgreSQL from scaling even more, namely the write/vacuum issue. And of course a better solution for upgrading PostgreSQL versions; there CRDB is leaps and bounds better.

levkk

I'm glad you brought up the migration, because one of the main goals behind our project is to automate migrating to a sharded deployment. You can think of your DB as the base case (num_shards = 1), and PgDog as the recursive solution.

rednafi

Automatic data transfer would be super cool to get out of the box. We had a custom multi-tenancy solution for our application that heavily used PostgreSQL schemas. One schema for each customer.

It was a pain to get that to work with Cockroach, since it doesn't optimize cross-schema queries and suggests one DB per customer. This was a deal breaker for us, and we had to duplicate data to avoid cross-schema queries.

Being able to live within Postgres has its advantages.

mindcrash

Well, of course it does! :)

Another (battle-tested*) solution is to deploy the (open source) Postgres distribution created by Citus (a subsidiary of Microsoft) on nodes running Ubuntu, Debian or Red Hat, and you are pretty much done: https://www.citusdata.com/product/community

Slap good old trusty PgBouncer in front of it if you want/need connection pooling (and you probably do): https://www.citusdata.com/blog/2017/05/10/scaling-connection...

*) Citus was purchased by Microsoft more or less solely to provide easy scale out on Azure through Cosmos DB for PostgreSQL

gigatexal

Is it really that easy? What are the edge cases?

levkk

It's not. We tried. Plus, it doesn't work on RDS, where most production databases are. I think Citus was a great first step in the right direction, but it's time to scale the 99% of databases that don't already run on Azure Citus.

mindcrash

That's because Amazon wants to do whatever they like themselves... you apparently can get stuff to work by running your own masters (with the citus extension) in EC2, backed by worker nodes in Postgres RDS:

https://www.citusdata.com/blog/2015/07/15/scaling-postgres-r... (note that this is a old blog post -- pg_shard has been succeeded by citus, but the architecture diagram still applies)

I say "apparently" because I have no experience dealing with large databases on AWS.

Personally, I've had no issues with Citus either, both on bare metal/VMs and as SaaS on Azure...

caffeinated_me

Depends on your schema, really. The hard part is choosing a distribution key to use for sharding: if you've got something like a tenant ID that's in most of your queries and big tables, it's pretty easy, but it can be a pain otherwise.

mindcrash

Same pain as with good old (native) partitioning, right? :)

As with partitioning, in my experience something like a common key (identifying data sets), a tenant ID, and/or a partial date (yyyy-mm) works pretty well.

gourneau

I’m working with several Postgres databases that share identical schemas, and I want to make their data accessible from a single interface.

Currently, I’m using Postgres FDWs to import the tables from those databases. I then create views that UNION ALL the relevant tables, adding a column to indicate the source database for each row.

This works, but I’m wondering if there’s a better way — ideally something that can query multiple databases in parallel and merge the results with a source database column included.
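For concreteness, a sketch of that setup in SQL (server, schema, and table names are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server + schema per source database.
CREATE SERVER db_a FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db-a.internal', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER db_a
    OPTIONS (user 'reader', password 'secret');
CREATE SCHEMA fdw_a;
IMPORT FOREIGN SCHEMA public LIMIT TO (events)
    FROM SERVER db_a INTO fdw_a;

-- After repeating the above for db_b into fdw_b:
CREATE VIEW all_events AS
    SELECT 'db_a' AS source_db, e.* FROM fdw_a.events e
    UNION ALL
    SELECT 'db_b', e.* FROM fdw_b.events e;
```

Whether the two servers are actually queried in parallel depends on the plan; recent Postgres versions can run the append asynchronously if the foreign servers are marked `async_capable`.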

Would tools like pgdog, pgcat, pganimal be a good fit for this? I’m open to suggestions for more efficient approaches.

Thanks!

999900000999

I'm kind of interested in why we can't make a better database with all of our modern technology.

Postgres is a fantastic workhorse, but it was also released in the late 80s. Who, who among you will create the database of the future... And not lock it behind bizarro licenses which force me to use telemetry.

HighlandSpring

There are "better" databases but they're better given some particular definition that may not be relevant to your needs. If SQL/the relational model and ACID semantics is what you need then postgres is simply the best in class. The fact it dates back to the 80s is probably an advantage (requirement?) when it comes to solving a problem really well

trescenzi

I guess I’d ask why is something having been first released in the late 80s, or any decade, as positive or negative? Some things are still used because they solve the problems people have. Some things are still used just because of industry capture. I’m not honestly sure where I’d put Postgres.

Are there specific things you’d want from a modern database?

999900000999

Relating to the article, better scaling. Saying run it on a bigger box is a very brute force way to optimize an application.

While they come up with some other tricks here, that's ultimately what scaling Postgres means.

If I imagine a better database, it would have native support for scaling, a Postgres-compatible data layer, as well as first-party support for NoSQL (JSONB columns don't cut it, since if you have simultaneous writes, unpredictable behavior tends to occur).

It needs to also have a permissible license

mike_hearn

What does permissible license mean? If you mean open source, no such database exists AFAIK. If you mean you can run it locally for free for dev purposes, on prem without telemetry etc, then Oracle is clearly the best option.

Compared to Postgres, Oracle DB:

• Scales horizontally with full SQL and transactional consistency. That means both write and read masters, not replicas - you can use database nodes with storage smaller than your database, or with no storage, and they are fully ACID.

• Has full transactional MQ support, along with many other features.

• Can scale elastically.

• Doesn't require vacuuming or have problems with XID wraparound. These are all Postgresisms that don't affect Oracle due to its better MVCC engine design.

• Has first party support for NoSQL that resolves your concern (see SODA and JSON duality views).

I should note that I have a COI because I work part time at Oracle Labs (and this post is my own yadda yadda), but you're asking why does no such database exist and whether anyone can make one. The database you're asking for not only exists but is one of the world's most popular databases. It's also actually quite cheap thanks to the elastic scaling. Spec out a managed Oracle DB in Oracle's cloud using the default elastic scaling option and you'll find it's cheaper than an Amazon Postgres RDS for similar max specs!

throwaway7783

Can you please expand on the JSONB unpredictable behavior? We are about to embark on a journey to move some of our data from MongoDB to Postgres (with some JSONB). While we don't have significant concurrent writes to a table, it would be very helpful to understand the issues.

Reefersleep

What's an example of non-brute force scaling?

aprdm

Why is it brute force and why is it bad ?

thinkingtoilet

Postgres 17.4 was released last month. Show some respect to the devs.

wmf

Anyone who creates a better database is going to want to get paid for it which either means DBaaS or those weird licenses.

HDThoreaun

Postgres is the most worked-on database in the world right now. Its original release date doesn't mean work stopped. NoSQL was a new thing a decade-ish ago, but most companies probably don't need it. New data platforms focused on scaling, like Snowflake and Cockroach, have come along too, but again, for most use cases Postgres is better.

moltar

Look at the AWS presentation/talk about Aurora DSQL [1].

It’s a Postgres facade. But everything beyond that is a complete reimagining and a rewrite to scale independently.

I personally think it’s going to eat a lot of market share when it solves some remaining limitations.

[1] https://youtu.be/huGmR_mi5dQ?si=ALw4XjdDJBxkZWRv

DoctorOW

Have you looked at CockroachDB? PostgreSQL compatibility with modern comforts (e.g. easy cloud deployments, horizontal scaling, memory safe language)

rednafi

Came here to say this. Cockroach solves the sharding issue by adopting consistent hashing-based data distribution, as described in the Dynamo paper. However, their cloud solution is a bit expensive to get started with.

I_am_tiberius

Does CockroachDB already support ltree?

eximius

Something I don't see in the pgdog documentation is how cross-shard joins work. Okay, if I do a simple `select * from users order by id`, you'll in-memory order the combined results for me. But if I have group by and aggregations and such? Will it resolve that correctly?

levkk

Aggregates are a work in progress. We're going to implement them in this order:

1. count

2. max, min, sum

3. avg (needs a query rewrite to include count)

Eventually, we'll do all of these: https://www.postgresql.org/docs/current/functions-aggregate..... If you got a specific use case, reach out and we'll prioritize.
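The avg case is a good illustration of why a rewrite is needed: averaging per-shard averages is wrong whenever shard sizes differ, so the proxy has to fetch per-shard sums and counts instead. A toy Python sketch with made-up shard results:

```python
# Hypothetical per-shard results for: SELECT sum(x), count(x) FROM t
shard_results = [
    {"sum": 300.0, "count": 3},   # shard 0: values averaging 100
    {"sum": 1000.0, "count": 1},  # shard 1: a single value, 1000
]

# Wrong: average of the per-shard averages ignores shard sizes.
naive = sum(r["sum"] / r["count"] for r in shard_results) / len(shard_results)

# Right: merge partial sums and counts, then divide once.
total = sum(r["sum"] for r in shard_results)
n = sum(r["count"] for r in shard_results)
correct = total / n

print(naive, correct)  # 550.0 325.0
```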

eximius

Heh, there's no chance I can introduce this at work, and it's hard to have a personal project requiring it. :)

I think you probably need some documentation on the current state of affairs, as well as prescriptions for how to work around it. _Most_ live workloads, even if the total dataset is huge, have a pretty small working set. So limiting DB operations to simple fetches and doing any complex operations in memory is viable, but it should be prescribed as the solution, or people will consider its omission a fault instead of a choice.

levkk

I ended up adding support for GROUP BY: https://github.com/pgdogdev/pgdog/pull/43

I had it in the back of my mind for a while, nice to have it in code. Works pretty well, as long as columns in GROUP BY are present in the result set. Otherwise, we would need to rewrite the query to include them, and remove them once we're done.
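The merge step for that case can be sketched in Python (shard rows are made up): once the grouping columns are present in each shard's result set, partial counts for the same group key just sum.

```python
from collections import defaultdict

# Hypothetical per-shard rows for:
#   SELECT country, count(*) FROM users GROUP BY country
shard_rows = [
    [("us", 10), ("de", 4)],  # shard 0
    [("us", 7), ("fr", 2)],   # shard 1
]

# Merge: partial counts for the same group key are summed.
merged = defaultdict(int)
for rows in shard_rows:
    for key, partial in rows:
        merged[key] += partial

print(sorted(merged.items()))  # [('de', 4), ('fr', 2), ('us', 17)]
```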

levkk

No worries. It's early days, the code and docs are just a few months in the making. I'm happy to keep you updated on the progress. If you want, send your contact info to hi@pgdog.dev.

- Lev

briankelly

People talk about scale frequently as a single dimension (usually volume as it relates to users), but that can be oversimplifying for many kinds of applications. For instance, thinking about non-trivial partitioning schemes (like when there is high coupling between entities of the same kind, as you see in graphs) is when you should consider alternatives like the Bigtable-inspired DBs, since those are (relatively) more batteries-included.

> It’s funny to write this. The Internet contains at least 1 (or maybe 2) meaty blog posts about how this is done

It would’ve been great to link those here. I’m guessing one refers to StackOverflow which has/had one of the more famous examples of scaled Postgres.

levkk

I was thinking of the Instagram post years ago. And maybe the Instacart one.

fourseventy

I run a PostgreSQL DB with a few billion rows at about 2TB right now. We don't need sharding yet, but when we do I was considering Citus. Does anyone have experience implementing Citus who could comment?

caffeinated_me

It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:

1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the 2 following cases, but these aren't exclusive:

    1a. A use case where most traffic is scoped to a subset of data. (e.g. a multitenant system). This is the easiest use case- just make sure most of your queries contain the column (most likely tenant ID or equivalent), and partially denormalize to have it in tables where it's implicit to make your life easier. Do not use a timestamp. 

    1b. A rollup/analytics-based use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality without too many major hot spots: in the IoT use case mentioned, this would probably be a device ID or similar.
2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to have a 20TB single-server PG database, and that's more than enough for a lot of companies these days.

3. When do you want to migrate? Logical replication into Citus should work these days (I haven't tested it myself), but the higher the update rate and the larger the database, the more painful this gets. There aren't a lot of tools that are very useful for the more difficult scenarios here, but the landscape has changed since I last had to do this.

4. Do you want to run this yourself? Azure does offer a managed service, and Crunchy offers Citus on any cloud, so you have options.

5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.

I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest out there, so there's definitely room to scale, but the migration can be tricky.
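If case 1a fits, the Citus side of the setup is mostly one call per table; a sketch with hypothetical table and column names:

```sql
-- Shard the big tables on the tenant column; tables sharing a
-- distribution column are co-located, so tenant-scoped joins stay local.
SELECT create_distributed_table('accounts', 'tenant_id');
SELECT create_distributed_table('orders',   'tenant_id');

-- Small shared lookup tables can be replicated to every node instead.
SELECT create_reference_table('plans');
```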

Disclaimer: former Citus employee

saisrirampur

Interesting technology. Similar to Citus but not built as an extension. The Citus coordinator, which is a Postgres database with the Citus extension, is replaced by a proxy layer written in Rust. That might provide more flexibility and velocity implementing distributed planning and execution than being tied to the extension ecosystem. It would indeed be a journey to catch up with Postgres on compatibility, but it's a good start.

sroussey

So like the MySQL proxies of long ago?

There are definitely advantages of not running inside the system you wish to orchestrate.

Better keep up with the parser though!

levkk

We use the Postgres parser directly, thanks to the great work of pg_query [1].

[1] https://github.com/pganalyze/pg_query.rs