Distributed Transactions at Scale in Amazon DynamoDB (2023)

61 comments

·January 12, 2025

samsquire

So one shot transactions can check if every timestamp in every write and item inside the transaction packet depends on data that is before the timestamp of that particular monotonic transaction timestamp?

And the pattern of including "check" transaction item is how we manually maintain data integrity (characteristic of Atomic in DBMS)

And we know which transactions are writing because they told us they wanted to write in the prepare phase (the part that the transaction manager handles separate from the one shot transaction information perspective from the client with its own communication between the transaction manager and storage nodes)
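A minimal sketch of the idea described above, not DynamoDB's actual implementation: a one-shot transaction carries a timestamp, and each item accepts the prepare step only if it is not already held by another transaction, its last write is older than that timestamp, and any "check" condition holds. The names `Item`, `Store`, and `transact_write` are hypothetical, and the sketch assumes the check keys and write keys are disjoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Item:
    value: dict = field(default_factory=dict)
    last_write_ts: int = 0           # timestamp of the last committed write
    locked_by: Optional[int] = None  # timestamp of the transaction holding the item


class Store:
    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}

    def prepare(self, key: str, ts: int,
                condition: Optional[Callable[[dict], bool]] = None) -> bool:
        item = self.items.setdefault(key, Item())
        # Reject if another transaction holds the item or the timestamp is stale.
        if item.locked_by is not None or ts <= item.last_write_ts:
            return False
        # Reject if the "check" condition does not hold on the current value.
        if condition is not None and not condition(item.value):
            return False
        item.locked_by = ts
        return True

    def release(self, key: str) -> None:
        self.items[key].locked_by = None


def transact_write(store: Store, ts: int,
                   checks: Dict[str, Callable[[dict], bool]],
                   writes: Dict[str, dict]) -> bool:
    """All-or-nothing: prepare every item, then apply the writes."""
    prepared = []
    for key in list(checks) + list(writes):
        if not store.prepare(key, ts, checks.get(key)):
            for k in prepared:       # abort: undo the prepares made so far
                store.release(k)
            return False
        prepared.append(key)
    for key, value in writes.items():
        item = store.items[key]
        item.value = value
        item.last_write_ts = ts
    for k in prepared:
        store.release(k)
    return True
```

For example, a write to `order#9` guarded by a check that `acct#1` has a sufficient balance either commits both effects or neither, and a transaction that reuses an old timestamp is rejected.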

I implemented a toy dynamodb that is a trie in front of a hash map, it handles the "begins with" query style.
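A toy along those lines might look like the sketch below. It is only a guess at the shape, since the original code is not shown: a character trie indexes the keys, a plain dict holds the values, and `begins_with` walks the trie from the prefix node. The class and method names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "is_key")

    def __init__(self):
        self.children = {}
        self.is_key = False


class ToyTable:
    def __init__(self):
        self.root = TrieNode()
        self.data = {}  # the hash map holding the actual values

    def put(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def begins_with(self, prefix):
        # Walk down to the node for the prefix; no node means no matches.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored key below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, key = stack.pop()
            if current.is_key:
                results.append((key, self.data[key]))
            for ch, child in current.children.items():
                stack.append((child, key + ch))
        return sorted(results, key=lambda pair: pair[0])
```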

XorNot

The only reason I ever used DynamoDB was because no one asked any questions if I bought an Amazon service which didn't look like a regular database, which in turn made a whole product deployment component on a deadline possible.

I couldn't really find any compelling reason to use it though: an RDBMS would've been way easier.

smashedtoatoms

I came here for the bad takes, and I have not been disappointed. Dynamo slays when you know your access patterns and need consistent performance and no operations requirements. Turns out, that's the case most of the time. Think about it as application state instead of a db. It's not key-value like Redis. GSIs with compound keys allow access to data across multiple dimensions on virtually unlimited data with consistent performance. Its weakness is querying data across dimensions you didn't plan on. If you need that regularly, it sucks. If you need that once in a while, write a migration.
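To make the "GSIs with compound keys" point concrete, here is a small in-memory model, assuming hypothetical attribute names (`customer`, `region`, `status_date`). It is not the DynamoDB API. Each index groups items by a partition key and keeps them sorted by a sort key, so a query is a partition lookup plus a prefix scan on the sort key.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class Index:
    """Models a table or GSI: partition key -> rows sorted by sort key."""

    def __init__(self, pk_attr, sk_attr):
        self.pk_attr, self.sk_attr = pk_attr, sk_attr
        self.partitions = defaultdict(list)  # pk -> sorted [(sk, item_id)]

    def add(self, item_id, item):
        # Items missing either key attribute are simply not indexed.
        if self.pk_attr in item and self.sk_attr in item:
            insort(self.partitions[item[self.pk_attr]], (item[self.sk_attr], item_id))

    def query(self, pk, sk_prefix=""):
        rows = self.partitions.get(pk, [])
        start = bisect_left(rows, (sk_prefix, ""))
        out = []
        for sk, item_id in rows[start:]:
            if not sk.startswith(sk_prefix):
                break
            out.append(item_id)
        return out


orders = {
    "o1": {"order_id": "o1", "customer": "c1", "region": "eu", "status_date": "SHIPPED#2025-01-03"},
    "o2": {"order_id": "o2", "customer": "c1", "region": "eu", "status_date": "PENDING#2025-01-04"},
    "o3": {"order_id": "o3", "customer": "c2", "region": "eu", "status_date": "SHIPPED#2025-01-09"},
    "o4": {"order_id": "o4", "customer": "c2", "region": "us", "status_date": "SHIPPED#2025-01-05"},
    "o5": {"order_id": "o5", "customer": "c1", "region": "eu", "status_date": "SHIPPED#2024-12-30"},
}

by_customer = Index("customer", "order_id")   # base table keys
by_region = Index("region", "status_date")    # "GSI" with a compound sort key
for item_id, item in orders.items():
    by_customer.add(item_id, item)
    by_region.add(item_id, item)
```

Here `by_region.query("eu", "SHIPPED#2025-01")` answers "orders shipped from the EU in January 2025" without scanning. A question nobody planned for, such as pending orders for one customer, has no index to lean on, which is the weakness described above.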

davidjfelix

Agreed. It's wild to me how many people think they need arbitrary queries on their transactional database and then go write a CRUD app with no transactional consistency between resources and everything is a projection from a user or org resource -- you can easily model that with Dynamo. You can offload arbitrary analytical queries or searches to a different database and stop conflating that need with your app's core data source.

pdhborges

Well my experience has always been the opposite. New query patterns are always appearing. The difference between an OLTP and an OLAP query is not as clear cut as one might imagine that justifies huge changes to an existing system.

cldcntrl

> Turns out, that's the case most of the time.

Here's most of the time out in the real world:

- Low-cardinality partition key leading to hot keys, trashing capacity utilization.

- Bad key design means access patterns are off the table forever, as nobody wants to take on data migration with BatchWriteItem.

- Read/write spikes causing throttling errors. The capacity concept is difficult - people don't understand how capacity relates to partitions and object sizes, or wrongly assume "On-Demand Capacity" means throttling is impossible, or that Provisioned Capacity Autoscaling is instant.

- Multiple GSIs to cover multiple access patterns = "why is our bill so high?".
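The capacity arithmetic behind several of these traps can be sketched as follows. The constants are assumptions taken from my reading of the AWS documentation for provisioned mode and are worth checking against the current docs: writes are billed in 1 KB units, reads in 4 KB units, eventually consistent reads cost half, transactional operations cost double, and a single partition tops out around 1,000 write units per second.

```python
import math

PARTITION_WCU_LIMIT = 1000  # assumed per-partition write ceiling


def write_units(item_bytes, transactional=False):
    """Write capacity units consumed by one write of an item this size."""
    units = max(1, math.ceil(item_bytes / 1024))
    return units * 2 if transactional else units


def read_units(item_bytes, consistency="strong"):
    """Read capacity units consumed by one read of an item this size."""
    blocks = max(1, math.ceil(item_bytes / 4096))
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return blocks * factor


def max_writes_per_sec_on_one_key(item_bytes):
    """Rough ceiling for a single hot key, since one key lives on one partition."""
    return PARTITION_WCU_LIMIT // write_units(item_bytes)
```

Under these assumptions a 3,500-byte item costs 4 write units, so a single hot key saturates at roughly 250 writes per second no matter how much capacity the table as a whole has.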

I've seen these issues over and over again while working with real organizations.

Of course it's impressive technology, it's just so littered with traps that I've stopped recommending it except in very specific cases.

tbarbugli

Using DynamoDB in 2025 is such a weird proposition. Horrible dev experience, no decent clients/libs, complex pricing, weird scaling in/out mechanism, slow, it only works well for well defined use-cases.

eknkc

2 times I have used DynamoDB and been extremely happy;

- In a SAAS API service we used dynamodb to look up API keys and track their daily usage data. It is fast enough to look up k/v pairs (api key => key info). And also aggregate small sets (We'd sum up call counts for current month and check if the API key had enough credits). This meant that the API itself did not need our RDBMS to function. We also had a postgresql instance for all relational data, subscriptions, user info etc. Had a trigger that would push any api key / subscription change to DynamoDB. In case of RDS issues, things kept chugging along.

- Working on a large BuzzFeed-like social media / news site in my country. We needed to store a lot of counters (reactions to articles, poll answers etc). All went into dynamodb and looked up from there. No hits on actual rdbms. There was a lot of traffic and dynamo made scaling things / keeping rds from melting easy for this kind of non critical data.

I'd not build an entire thing on DynamoDB but for specific use cases, I just loved it.
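A rough sketch of how the usage-counter pattern could be expressed, assuming a hypothetical table layout (`pk = APIKEY#<key>`, `sk = USAGE#<day>`, and a numeric `calls` attribute). The functions only build request parameters in the shape the low-level boto3 client accepts for `update_item` and `query`. They do not call AWS, and the actual setup described above may well have differed.

```python
def increment_request(table, api_key, day, amount=1):
    """kwargs for update_item: atomically add to the day's call counter."""
    return {
        "TableName": table,
        "Key": {"pk": {"S": f"APIKEY#{api_key}"}, "sk": {"S": f"USAGE#{day}"}},
        "UpdateExpression": "ADD #calls :inc",
        "ExpressionAttributeNames": {"#calls": "calls"},
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
    }


def month_query_request(table, api_key, month):
    """kwargs for query: all daily usage rows for one key in one month."""
    return {
        "TableName": table,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"APIKEY#{api_key}"},
            ":prefix": {"S": f"USAGE#{month}"},
        },
    }


def sum_calls(items):
    """Sum the 'calls' attribute over items returned by a query."""
    return sum(int(item["calls"]["N"]) for item in items)
```

With a real client this would be used as `client.update_item(**increment_request("usage", key, "2025-01-12"))` on each call and `sum_calls(client.query(**month_query_request("usage", key, "2025-01"))["Items"])` for the credit check.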

rad_gruchalski

> We also had a postgresql instance for all relational data, subscriptions, user info etc. Had a trigger that would push any api key / subscription change to DynamoDB.

Wouldn't doing it right there in postgres limit your footprint?

eknkc

We did not want postgres to be a central failure point of this API.

Needed a pretty high uptime guarantee so we decided that as long as AWS region is up and running, the API would also be available by using only completely managed aws services like dynamodb, lambda etc. Also had a bunch of beefy servers around other providers (hetzner, online.net etc) handling the actual work. They did not have any other dependencies either.

narmiouh

Redis?

eknkc

What would we gain from Redis in these use cases?

We used it extensively on the second project I mentioned and a couple of other projects for caching / rate limiting and distributed locking needs. Never enabled the persistence layer (which I believe is pretty durable). So we only treated it as an ephemeral data store, lowering the architectural complexity of things significantly. Otherwise you need to think about backups, testing backups, clustering in case of scaling needs, I have no idea how persistence works with clustering... DynamoDB is fully managed and solid.

ndr

is it as easy to make that data durable?

guiriduro

Way too many teams choose Dynamodb too soon. Scalability, 0 management, coolness whatever. They don't realise until it's too late that their application data needs are changing with feature requests and that with Ddb it implies doing 3D-chess each time to ensure the denormalised data is re-arranged the right way, rather than just using PostgreSQL with JSONB and adding an index, until/if it gets to FAANG scale, a bridge you can safely cross much later on.

mrkeen

Very often I find myself wanting to store item(s) using a key.

My items are not relations, and I don't see the point in transforming them to and from relational form. And if I did, each row would have like 5 columns set to NULL, in addition to a catch-all string 'data' column where I put the actual stuff I really need. Which is how you slow down an SQL database. So RDBMS is no good for me, and I'm no good for RDBMS.

RDBMS offers strong single-node consistency guarantees (which people leave off by default by using an isolation level of 'almost'!). But even without microservices, there are too many nodes: the DB, the backend, external partner integrations, the frontend, the customer's brain. You can't do if-this-then-that from the frontend, since 'this' will no longer be true when 'that' happens. So even if I happen to have a fully-ACID DB, I still lean into events & eventual consistency to manage state across the various nodes.
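The "'this' will no longer be true when 'that' happens" problem can be illustrated with a version-checked write, one common way to make if-this-then-that safe across nodes. This is only a sketch with hypothetical names, not something described in the comment above: the writer states which version it read, and the store rejects the write if the row has moved on.

```python
class StaleWrite(Exception):
    """Raised when the caller's view of a row is out of date."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleWrite(f"{key}: expected v{expected_version}, found v{current_version}")
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

Two clients that both read version 1 of a row can both attempt a write, but only the first succeeds. The second gets `StaleWrite` and has to re-read and decide again.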

I'm using more data than a naive CRUD/SQL app would (by storing events for state replication), and my data is stringy enough to kill my (and others') performance. So what's the solution? Make my read-writes completely independent from other read-writes - no joins, no foreign keys, etc.

The thing that would put me off using DynamoDB is the same reason I wouldn't use any other tech - can I download it? For this reason I'd probably reach for Cassandra first. That said I haven't looked at the landscape in a while and there might be much better tools.

But it also wouldn't matter what I want to use instead of DynamoDB, because the DevOps team of wherever I work will just choose whatever's native&managed by their chosen cloud provider.

throwaway82452

> The thing that would put me off using DynamoDB is the same reason I wouldn't use any other tech - can I download it?

Amazon provides a downloadable version for development. I don't know how close it is to the real thing, but it makes it easier to do local dev.

Localstack also supports it in their paid version

dygd

The downloadable version is nowhere near ready for production. Its performance is also excruciatingly slow.

Lapapapaja

> I still lean into events & eventual consistency to manage state across the various nodes.

You can get really far with an RDBMS before event sourcing etc is needed, the benefit being both your dev and user experience are going to be much simpler and easier.

If you already know your problem domain and scaling concerns up front sure. But starting with a scalable pattern like this is a premature optimization otherwise and will just slow you down.

mrkeen

> You can get really far with an RDBMS before event sourcing etc is needed

You can manage up to 0 partners easily. Once you go above that threshold, you're into "2-Generals" territory. At that point you're either inconsistent, eventually-consistent, or you're just bypassing your own database and using theirs directly.

> dev and user experience are going to be much simpler and easier.

I have objects, not relations. I'm not going to do the work of un-nesting a fat json transaction to store it in a single relation (or worse, normalise it into rows across multiple tables).

mike_hearn

Some RDBMS only offer single node consistency but others can scale write masters horizontally (e.g. Oracle).

njitbew

> Horrible dev experience, no decent clients/libs, complex pricing, weird scaling in/out mechanism, slow, it only works well for well defined use-cases.

Most of these arguments probably don't outweigh the benefits. If you're in need of a managed, highly-consistent, highly-scalable, distributed database, and you're already an AWS customer, what would you use instead?

oweiler

Aurora Serverless Postgres e.g.

belter

Completely different use cases....

andrewstuart

Postgres running in a computer.

llama-mini

> it only works well for well defined use-cases.

Sounds to me like DynamoDB works well for well defined use-cases. That to me is a plus!

lambrospetrou

Exactly this.

DynamoDB is a pain in the ass if you want to do too many relational or arbitrary queries. It's not for data exploration.

It is my favourite database though (next to S3)! For cases where my queries are pretty much known upfront, and I want predictable great performance. As Marc Brooker wrote in [1], "DynamoDB’s Best Feature: Predictability".

I consistently get single digit millisecond GETs, 10-15ms PUTs, and a few more milliseconds for TransactWriteItems.

Are you able to do complex joins? No. Are you able to do queries based on different hash/sort keys easily? Not without adding GSIs or a new table. The issue in the past few years was the whole craze around "single-table design". Folks took it literally as having to shove all their data in a single table, instead of understanding the reason and the cases that worked well. And with ongoing improvements of DynamoDB those cases were getting fewer and fewer over time.

But, that's what tradeoffs are about. With on-demand tables, one-shot transactions, actually serverless storage/scaling, and predictable performance you get very, very far.

1. https://brooker.co.za/blog/2022/01/19/predictability.html

belter

> DynamoDB is a pain in the ass if you want to do too many relational or arbitrary queries

You are using it wrong. And no, it's not irony.

moomoo11

I’ve used dynamodb before but most people have no idea what they’re talking about re: it and start blaming the tool.

Like when we implemented it me and my colleague spent a couple days understanding single table design and how to handle the access patterns we wanted to support.

Trying to explain this to paper smart but lazy colleagues who then skipped understanding and went straight to implement something wrong and blamed the tool really opened my eyes.

For us dynamo made sense. We were tracking global quantities and things like that. Didn’t need to be real time but did need to be present across regions fast.

belter

> It only works well for well defined use-cases.

This strongly hints at a misunderstanding of the purpose of a NoSQL system.

laurent_du

Scaling in and out is too slow to be usable. DynamoDB is great though, for many of my use cases it's a very good fit.

aynyc

Funny, I've just been asked to provide a basic implementation in DynamoDB for a large amount of real time transactions for fraud detection. Let's see how that goes.

rednafi

No, thank you. DynamoDB promotes a terrible dev workflow, and there’s no way around it. It’s like embedding S3 directly into your application code.

The client libraries are gigantic, and the documentation is misleading at times. Plus, Dynamo expects your access patterns to be static, which isn’t true most of the time. Hyperscaling is great, but many aren’t willing to give up everything else just for that.

UltraSane

I tried using it once and designing a schema for it makes relational databases seem fun by comparison.

rednafi

Relational databases are indeed fun. There's no such thing as schemaless data; the schema either lives in your database or in your code. In the case of Dynamo, the schema lives in your code, determined by your access patterns. The problem is that access patterns change, and this state of extreme denormalization is difficult to adapt to evolving business requirements.

UltraSane

Schemas are a lot easier to modify in code. As a network admin I have spent many hours using solarwinds and directly using SQL to query the database and it is a non-normalized mess of duplicate fields. They have MAC addresses stored in 10 different places in 4 different formats. I think the reason for this is that they simply can't easily change existing tables so have to create new ones.

Kinrany

Schemaless and denormalized are orthogonal qualities

GiorgioG

DynamoDB is great if you want to fork over a lot of money to AWS on a monthly basis. Its inability to allow you to query on arbitrary attributes/fields without creating a separate index (i.e. copy of the entire collection) is mind-boggling. There are so many better, less expensive options out there for 99.99999% of use-cases. Friends don't let friends use DynamoDB.

belter

> Its inability to allow you to query on arbitrary attributes/fields without creating a separate index (i.e. copy of the entire collection) is mind-boggling.

This attitude of everybody is incompetent at <Big Company>...I need to teach them, is what is mind-boggling.

Your statement misunderstands the design principles of a system like this one or others similar. It's precisely those design principles that led to not allowing to run arbitrary queries on the backend.

You are trying to optimize for the 99.9th percentiles at massive scale...

unethical_ban

Hey Skinny,

What kind of system/application/query pattern does DynamoDB optimize for that relational databases are worse options for?

I'm legitimately ignorant and a lot of people seem to dislike it. I remember thinking that running queries in JSON seemed silly. Heh.

belter

My name is Marco Inaros!

> What kind of system/application/query pattern does DynamoDB optimize for

Start here:

"Real-world use cases for Amazon DynamoDB" - https://d1.awsstatic.com/events/reinvent/2019/REPEAT_1_Real-...

> I'm legitimately ignorant and a lot of people seem to dislike it.

Start here:

"AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB " - https://youtu.be/HaEPXoXVf2k

Then:

"Build with DynamoDB | S1 E5 – A Data Modeling Use Case Deep Dive" - https://youtu.be/mcZwJQ7O8iw

"Model hierarchical automotive component data using Amazon DynamoDB" - https://aws.amazon.com/blogs/database/model-hierarchical-aut...

"AWS re:Invent 2024 - Advanced data modeling with Amazon DynamoDB" - https://youtu.be/hjqrDqVaiw0

GiorgioG

No, it's the morons who pick a technology (in this case DynamoDB) without understanding it is the wrong tool for the job at hand, leaving the rest of us holding the bag for years on end for their poor choice of tools.

mlhpdx

Not my experience at all. The DX is different, but not bad in my experience, and it has been extremely reliable and inexpensive for my use cases and workloads. I built mainstream stuff and a global multi-region replicated DDB table with backups is essentially free.

That said, I don’t do query time aggregation anymore, which seems to be a common challenge folks hit. Other approaches that implement streaming/incremental aggregation and make it look roughly like query time probably are simpler.
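One way to read "streaming/incremental aggregation" is sketched below, with hypothetical names and record shape: instead of summing at query time, each change event (old image, new image) adjusts a running total, so reads become a single lookup. In practice the events might come from a change stream, but here they are plain dicts.

```python
from collections import defaultdict


class RunningTotals:
    """Keeps per-group count and sum up to date as change events arrive."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def apply(self, old, new):
        # old is None for an insert, new is None for a delete.
        if old is not None:
            self.count[old["group"]] -= 1
            self.total[old["group"]] -= old["amount"]
        if new is not None:
            self.count[new["group"]] += 1
            self.total[new["group"]] += new["amount"]

    def read(self, group):
        return self.count[group], self.total[group]
```

An insert, a modification, and a delete each cost a constant amount of work at write time, and `read("eu")` returns the current count and sum without touching the underlying items.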

insaneisnotfree

And how does this work if you have multiple (single table SIC) databases on a microservices architecture?

klysm

Any time I see the words “distributed transactions” I get alarm bells in my head that the wrong thing is probably being done.

qaq

AWS needs to buy a proper newSQL database. Dynamo is just a horribly outdated product.

redwood

What's your take on Aurora DSQL?

My take is anything single cloud provider proprietary and tabular in 2025 is going to over time feel too limited. Having a json column doesn't cut it. But I'm a believer in document databases

sgarland

Relational databases (sans JSON columns) are limited in the same way that seatbelts limit your ability to be transfenestrated during a crash.

Having a rigid and well-designed schema is a mechanism to keep you or your team from doing stupid things.

tempworkac

nosql does not imply no schema

qaq

Honestly haven't really had to deal with a use case where a json column does not cut it, but I would imagine there might be some.

belter

> AWS needs to buy a proper newSQL database. Dynamo is just a horribly outdated product.

I am sorry but engaging in good faith, can you qualify a little bit? Are you aware DynamoDB works as tier one product within AWS? Meaning it's one of the core pillars of the implementation of many other products?

Did you look at these references?: https://aws.amazon.com/dynamodb/customers/

Have you seen the real world use?: https://aws.amazon.com/blogs/aws/prime-day-2023-powered-by-a...

qaq

- very low limit on number of keys in transaction

- very low limit on ops/sec on a single key

- extremely cumbersome to change the data layout

- development experience is just abysmal

Before Spanner, Cockroach etc. for some workloads you didn't have alternatives but that time is long gone.
