
Jepsen: Amazon RDS for PostgreSQL 17.4

nijave

It's not entirely clear, but this isn't an issue in multi-instance upstream Postgres clusters?

Am I correct in understanding either AWS is doing something with the cluster configuration or has added some patches that introduce this behavior?

belter

Yes, it's different. This is a deeper overview of what they did: https://youtu.be/fLqJXTOhUg4

Specifically here: https://youtu.be/fLqJXTOhUg4?t=434

tibbar

The submitted title buries the lede: RDS for PostgreSQL 17.4 does not properly implement snapshot isolation.

aphyr

Folks on HN are often upset with the titles of Jepsen reports, so perhaps a little more context is in order. Jepsen reports are usually the product of a long collaboration with a client. Clients often have strong feelings about how the report is titled--is it too harsh on the system, or too favorable? Does it capture the most meaningful of the dozen-odd issues we found? Is it fair, in the sense that Jepsen aims to be an honest broker of database safety findings? How will it be interpreted in ten years when people link to it routinely, but the findings no longer apply to recent versions? The resulting discussions can be, ah, vigorous.

The way I've threaded this needle, after several frustrating attempts, is to have a policy of titling all reports "Jepsen: <system> <version>". HN is of course welcome to choose their own link text if they prefer a more descriptive, or colorful, phrase. :-)

dang

Given that author and submitter (and commenter!) are all the same person in this case I think we can go with your choice :)

The fact that the thread is high on HN, plus the GP comment is high in the thread, plus that the audience is familiar with how interesting Jepsen reports tend to be, is probably enough to get the needful across.

altairprime

I emailed the mods and asked them to change it to this phrase copy-pasted from the linked article:

> Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot Isolation

belter

And your comment also...In Multi-AZ clusters.

Well, this is from Kyle Kingsbury, the Chuck Norris of transactional guarantees. AWS has to reply or clarify, even if it only seems to apply to Multi-AZ clusters. Those are one of the two deployment options for RDS with Postgres: Multi-AZ deployments can have either one or two standby DB instances, and this finding concerns the two-standby configuration. [1]

They make no such promises in their documentation. Their 5,494-page manual on RDS hardly mentions isolation or serializability, except in the documentation of parameters for the different engines.

Nothing on global read consistency for Multi-AZ clusters, because why should they... :-) They talk about semi-synchronous replication, where the writer waits for one standby to confirm the log record, but the two readers can be on different snapshots?

[1] - "New Amazon RDS for MySQL & PostgreSQL Multi-AZ Deployment Option: Improved Write Performance & Faster Failover" - https://aws.amazon.com/blogs/aws/amazon-rds-multi-az-db-clus...

[2] - "Amazon RDS Multi-AZ with two readable standbys: Under the hood" - https://aws.amazon.com/blogs/database/amazon-rds-multi-az-wi...

n2d4

> They make no such promises in their documentation. Their 5494 pages manual on RDS hardly mentions isolation or serializable

Well, as a user, I wish they would mention it though. If I migrate to RDS with multi-AZ after coming from plain Postgres (which documents snapshot isolation as a feature), I would probably want to know how the two differ.

gymbeaux

Par for the course

ezekiel68

In my reading of this, the practical implication is that reads happening quickly after writes to the same row(s) might return stale data. The write transaction is marked as complete before all of the distributed layers of a multi-AZ RDS instance have been fully updated, so immediate reads from the same rows might return nothing (if the row does not exist yet) or older values (if the columns have not been fully updated).

Due to the way PostgreSQL does snapshotting, I don't believe this implies such a read might obtain a nonsense value due to only a portion of the bytes in a multi-byte column type having been updated yet.

It seems like a race condition that becomes eventually consistent. Or did anyone read this as if the later transaction(s) of a "long fork" might never complete under normal circumstances?

aphyr

This isn't just stale data, in the sense of "a point-in-time consistent snapshot which does not reflect some recent transactions". I think what's going on here is that a read-only transaction against a secondary can observe some transaction T, but also miss transactions which must have logically executed before T.
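A tiny Python sketch (illustrative only; this is not RDS's actual replication mechanism, and the key names are made up) of the anomaly described above: a replica can serve transaction T's effects while missing a write W that logically preceded T.

```python
# Primary's commit order (a stand-in for the WAL): W commits first,
# then T commits after observing W's write.
log = []
log.append(("W", {"a": 1}))   # transaction W writes a=1
log.append(("T", {"b": 1}))   # transaction T read a=1, then writes b=1

# A misbehaving replica applies T's record without W's.
replica = {"a": 0, "b": 0}
for name, write in log:
    if name == "T":           # W's record is lost or delayed
        replica.update(write)

print(replica)  # {'a': 0, 'b': 1}: T is visible, its predecessor W is not
```

Any read-only transaction served from this replica would see T while missing W, which no point-in-time snapshot of the primary could ever show.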

mikesun

"I think what's going on here is that a read-only transaction against a secondary can observe some transaction T, but also miss transactions which must have logically executed before T."

I was intuitively wondering the same, but I'm having trouble reasoning about how your example in the post with transactions 1, 2, 3, 4 exhibits this behavior. In the example, is transaction 2 the only read-only transaction, and therefore the only transaction to read from the read replica? I.e., do transactions 1, 3, 4 use the primary and transaction 2 the read replica?

mushufasa

> These phenomena occurred in every version tested, from 13.15 to 17.4.

I was worried I had made the wrong move upgrading major versions, but it looks like this is not that. This is not a regression, but just a feature request or longstanding bug.

password4321

It would be great to get all the Amazon RDS flavors Jepsen'd.

aphyr

I have actually been working on this (very slowly, in occasional nights and weekends!) Peter Alvaro and I reported on a safety issue in RDS for MySQL here too: https://jepsen.io/analyses/mysql-8.0.34#fractured-read-like-...

film42

I think AWS will need to update their documentation to communicate this. Will a snapshot isolation fix introduce a performance regression in latency or throughput? Or, maybe they stand by what they have as being strong enough. Either way, they'll need to say something.

kevincox

I think the ideal solution from AWS would be fixing the bug and actually providing the guarantees that the docs say that they do.

film42

I agree, but I have a feeling this isn't a small fix. Sounds like someone picked a mechanism that seemed to be equivalent but is not. Swapping that will require a lot of time and testing.

zaphirplane

Yet below your comment is a quote noting that this goes back to v13, and above is a comment that there is no mention of the guarantee in the docs.

Isn't using the words "bug" and "guarantee" throwing casual readers off the mark?

cr3ative

This is in such a thick academic style that it is difficult to follow what the problem actually might be and how it would impact someone. This style of writing serves mostly to remind me that I am not a part of the world that writes like this, which makes me a little sad.

glutamate

In the beginning, when you read papers like this, it can be hard work. You can either give up or put some effort in to try to understand it. Maybe look at some of the other Jepsen reports, some may be easier. Or perhaps an introductory CS textbook. With practice and patience it will become easier to read and eventually write like this.

You may not be part of that world now, but you can be some day.

EDIT: forgot to say, I had to read 6 or 7 books on Bayesian statistics before I understood the most basic concepts. A few years later I wrote a compiler for a statistical programming language.

cr3ative

I’ll look to do so, and appreciate your pointers. Thank you for being kind!

concerndc1tizen

The state of the art is always advancing, which greatly increases the burden of starting from first principles.

I somewhat feel that there was a generation that had it easier, because they were pioneers in a new field, allowing them to become experts quickly while improving year-on-year, being paid well in the process, and having a great network and exposure.

Of course, it can be done, but we should at least acknowledge that sometimes the industry is unforgiving and simply doesn't have on-ramps except for the privileged few.

_AzMoo

> I somewhat feel that there was a generation that had it easier

I don't think so. I've been doing this for nearly 35 years now, and there's always been a lot to learn. Each layer of abstraction developed makes it easier to quickly iterate towards a new outcome faster or with more confidence, but hides away complexity that you might eventually need to know. In a lot of ways it's easier these days, because there's so much information available at your fingertips when you need it, presented in a multitude of different formats. I learned my first programming language by reading a QBasic textbook trying to debug a text-based adventure game that crashed at a critical moment. I had no Internet, no BBS, nobody to help, except my Dad who was a solo RPG programmer who had learned on the job after being promoted from sweeping floors in a warehouse.

null

[deleted]

jorams

It uses a lot of very specific terminology, but the linked pages like the one on "G-nonadjacent" do a lot to clear up what it all means. It is a lot of reading.

Essentially: The configuration claims "Snapshot Isolation", which means every transaction looks like it operates on a consistent snapshot of the entire database at its starting timestamp. All transactions starting after a transaction commits will see the changes made by that transaction. Jepsen finds that the snapshot a transaction sees doesn't always contain everything that was committed before its starting timestamp. Transactions A and B can both commit their changes, then transactions C and D can start, with C seeing only the change made by A and D seeing only the change made by B.
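The scenario above is the "long fork" anomaly. A minimal Python sketch (purely illustrative; keys and replica behavior are invented, not RDS internals) shows why the two readers' snapshots are mutually incompatible:

```python
# Two committed write transactions on independent keys.
commit_a = {"x": 1}   # transaction A
commit_b = {"y": 1}   # transaction B

# Two standbys, each of which has so far applied only one of the commits.
replica_1 = {"x": 0, "y": 0}
replica_2 = {"x": 0, "y": 0}
replica_1.update(commit_a)   # replica 1 applied A but not B
replica_2.update(commit_b)   # replica 2 applied B but not A

# Read-only transactions C and D are routed to different standbys.
snapshot_c = dict(replica_1)  # sees A's write, misses B's
snapshot_d = dict(replica_2)  # sees B's write, misses A's

# Snapshot isolation requires a single commit order of A and B that is
# consistent with every snapshot. C's snapshot implies A committed before B;
# D's implies B committed before A -- a contradiction, hence "long fork".
c_implies_a_first = snapshot_c["x"] == 1 and snapshot_c["y"] == 0
d_implies_b_first = snapshot_d["y"] == 1 and snapshot_d["x"] == 0
print(c_implies_a_first and d_implies_b_first)  # True: no serial order fits
```

Each snapshot is internally consistent; the violation only becomes visible when you compare what the two readers observed.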

ZYbCRq22HbJ2y7

> such a thick academic style

Why? Because it has variables and a graph?

What sort of education background do you have?

renewiltord

It's maximal information communication. Use an LLM to distill it to your own knowledge level; that's trivial with modern LLMs, and the output is generally very good.

benatkin

It addresses the reader no matter how knowledgeable they are. It's a very good use of hypertext, making it so that a knowledgeable reader won't need to skip over much.

joevandyk

[flagged]

rezonant

Posting ChatGPT outputs directly in a post with no attribution or indication that you are doing so is not helpful or authentic.

Sesse__

Hello ChatGPT.

senderista

Great summary, could you share the prompt you used?

benatkin

Hey ChatGPT, make me a comment about <url> that will get flagged on HN. You're the best.

belter

Please remove this LLM generated post

bananapub

posting this sort of LLM-generated garbage should get a ban.

have some respect for yourself and everyone else, christ.

vlovich123

Have you tried using an LLM? I’ve found good results getting at the underlying concepts and building a mental model that works for me that way. It makes domain expertise - that often has unique terminology for concepts you already know or at least know without a specific name - more easily accessible after a little bit of a QA round.

oblio

I wonder how Aurora fares on this?

henning

I thought this kind of bullshit was only supposed to happen in MongoDB!

kabes

Then you haven't read enough Jepsen reports. Distributed-system guarantees generally can't be trusted.

__alexs

Postgres is not usually a distributed system in this configuration, though, is it?

semiquaver

The result is for “Amazon RDS for PostgreSQL multi-AZ clusters” which are certainly a distributed system.

I’m not well versed in RDS but I believe that clustered is the only way to use it.

dragonwriter

A multi-AZ cluster is necessarily a distributed system.

bananapub

I think ZooKeeper is still the only distributed system that got through Jepsen without data-loss bugs, though at a high cost: https://aphyr.com/posts/291-jepsen-zookeeper

robterrell

Didn't FoundationDB get a clean bill of health?

necubi

Aphyr didn't test FoundationDB himself, but the FoundationDB team did their own Jepsen-style testing, which they reported passing. All of this was a long time ago, before FoundationDB was bought by Apple and open-sourced.

Now members of the original Foundation team have started Antithesis (https://antithesis.com/) to make it easier for other systems to adopt this sort of testing.

MarkMarine

wasn't tested because: "haven't tested foundation in part because their testing appears to be waaaay more rigorous than mine."

https://web.archive.org/web/20150312112552/http://blog.found...

bananapub

apparently wasn't tested because Kyle thought the internal testing was better than jepsen itself: https://abdullin.com/foundationdb-is-back/

Thaxll

Those memes are 10 years old. You know that some very big tech companies use MongoDB, right? We're talking billions a year.

xmodem

Billion dollar companies lose their customer’s data all the time.

djfivyvusn

What is your point?

colesantiago

Do people still use MongoDB in production?

I was quite surprised to read that Stripe used MongoDB in the early days and still does today; I can't imagine the sheer nightmares they must have faced using it all these years.

senderista

MongoDB has come a long way. They acquired a world-class storage engine (WiredTiger) and then they hired some world-class distsys people (e.g. Murat Demirbas). They might still be hamstrung by early design and API choices but from what I can tell (never used it in anger) the implementation is pretty solid.

computerfan494

MongoDB is a very good database, and these days at scale I am significantly more confident in its correctness guarantees than any of the half-baked Postgres horizontal scaling solutions. I have run both databases at seven figure a month spend scale, and I would not choose off-the-shelf Postgres for this task again.

colechristensen

MongoDB is a public company with a market cap of $14.2 billion, so yes, people still use it in production.

djfivyvusn

I've been looking for a job the last few weeks.

Literally the only job ad I've seen talking about MongoDB was a job ad for MongoDB itself.

skywhopper

This is an unfortunate report in a lot of ways. First, the title is incomplete. Second, there’s no context as to the purpose of the test and very little about the parameters of the test. It makes no comparison to other PostgreSQL architectures except one reference at the end to a standalone system. Third, it characterizes the transaction isolation of this system as if it were a failure (see comments in this thread assuming this is a bug or a missing feature of Postgres). Finally, it never compares the promises made by the product vendors to the reality. Does AWS or Postgres promise perfect snapshot isolation?

I understand the mission of the Jepsen project but presenting results in this format is misleading and will only sow confusion.

Transaction isolation involves a ton of tradeoffs, and the tradeoffs chosen here may be fine for most use cases. The issues can be easily avoided by doing any critical transactional work against the primary read-write node only, which would be the only typical way in which transactional work would be done against a Postgres cluster of this sort.
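The mitigation described above ("critical transactional work against the primary only") amounts to a routing decision per query. A hypothetical sketch, with made-up endpoint names (RDS endpoints are not taken from the report):

```python
# Hypothetical cluster endpoints -- illustrative names only.
WRITER = "mydb.cluster-example.us-east-1.rds.amazonaws.com"
READER = "mydb.cluster-ro-example.us-east-1.rds.amazonaws.com"

def endpoint_for(query_kind: str) -> str:
    """Route queries: anything that participates in transactional logic
    goes to the writer; only snapshot-tolerant reads go to the replicas,
    whose snapshots may exhibit the anomalies described in the report."""
    if query_kind in ("write", "transactional_read"):
        return WRITER
    return READER

print(endpoint_for("transactional_read"))  # writer endpoint
print(endpoint_for("reporting_read"))      # reader endpoint
```

The tradeoff is that replica capacity then serves only reporting-style reads, which is exactly the "may be fine for most use cases" judgment the comment makes.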

Sesse__

Postgres does indeed promise perfect snapshot isolation, and Amazon does not (to the best of my knowledge) document that their managed Postgres service weakens Postgres’ promises.

billiam

New headline: AWS RDS is not CockroachDB or Spanner. And it's not trying to be.