
AWS S3 SDK breaks its compatible services

js2

Okay, but I'd never expect an AWS SDK to remain backwards compatible with third-party services and would be leery about using an AWS SDK with anything but AWS. It's on the third parties to keep their S3-compatible API, well, compatible.

On the client side, you just have to pin to an older version of the AWS SDK till whatever compatible service you're using updates, right?

Also, this is the first I've heard of OpenDAL. Looks interesting:

https://opendal.apache.org/

It's had barely any discussion on HN:

https://hn.algolia.com/?q=opendal

saghm

> Okay, but I'd never expect an AWS SDK to remain backwards compatible with third-party services and would be leery about using an AWS SDK with anything but AWS. It's on the third parties to keep their S3-compatible API, well, compatible.

Back when Microsoft started offering CosmosDB, I was working at MongoDB on the drivers team (which develops a slew of first-party client libraries for using MongoDB), and several of the more popular drivers got a huge influx of "bug" reports from users having issues connecting to CosmosDB. Our official policy was that if a user reported a bug with a third-party database, we'd make a basic attempt to reproduce it with MongoDB; if it actually turned out to be a bug in our code, it would show up there, and we'd fix it. Otherwise, we didn't spend any time trying to figure out what the issue with CosmosDB was. In terms of backwards compatibility, we spent enough time worrying about compatibility between arbitrary versions of our own client and server software that it wasn't worth spending any time thinking about how changes might impact third-party databases.

In the immediate week or two after CosmosDB came out, a few people tried out the drivers they worked on to see if we could spot any differences. At least for basic stuff it seemed to work fine, though there were a couple of small oddities with specific fields in the responses during the connection handshake and things like that. I think as a joke someone made a silly patch to their driver that checked those fields and logged something cheeky, but management was pretty clear that they had zero interest in any sort of proactive approach like that; the stance was basically that the drivers were intentionally licensed permissively, users were free to do anything they wanted with them, and it only became our business if they actually reached out to us in some way.

jameslars

Hard agree. If AWS were offering “S3 compatibility certification” or similar I could see framing this as an AWS/S3 problem. This seems like the definition of “S3 compatible” changed, and now everyone claiming it needs to catch up again.

tabony

Agree too.

Just because you can get something to work doesn’t mean it’s supported. Using an Amazon S3 library on a non-Amazon service is “works but isn’t supported.”

Stick to only supported things if you want reliability.

julik

The interesting part here is that if the AWS docs state "We expect header X with request Y", the "compatible storage" implementors tend to add validations for the presence of header X. In that sense it is tricky for them, but I would still argue that from a Postel's law perspective they should not validate things that strictly. There are also ways to determine whether a header is supplied. The AWSv4 signature format adds signed headers, and (AFAIK) the checksum headers usually get signed. The URL signature can be decoded, and if the headers are present in the list of signed headers you can go and validate the presence of said headers. The SDK would never sign a header it doesn't supply.
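
Something like this would do it (rough, untested sketch; it assumes the standard SigV4 Authorization header layout, and a pre-signed URL carries the same list in its X-Amz-SignedHeaders query parameter):

  # Only insist on a checksum header if the client actually signed one.
  def signed_headers(authorization: str) -> set[str]:
      for part in authorization.split(","):
          part = part.strip()
          if part.startswith("SignedHeaders="):
              return set(part[len("SignedHeaders="):].split(";"))
      return set()

  def client_signed_a_checksum(headers: dict[str, str]) -> bool:
      signed = signed_headers(headers.get("Authorization", ""))
      return any(name.startswith("x-amz-checksum-") for name in signed)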

genewitch

A little less charitable take: Amazon is throwing its weight around to quash competition before it can get started, and to shove tech debt onto third parties.

I stopped giving amazon the benefit of the doubt about any aspect of their operations about 8 years ago.

jameslars

Less charitable, or more cynical? How is Amazon supposed to track a third party pulling their SDK and then reverse-engineering their own service side to work with the SDK? Assuming we're all okay with that premise to begin with, all sorts of other questions start popping up.

Do these 3rd parties get veto power over a feature they can't support?

Can they delay a launch if they need more time to make their reverse-engineered effort compatible again?

It seems like a hard-to-defend position that this is at all Amazon's problem. The OP even links to the blog post announcing this change months ago. If users pay you for your service to remain S3-compatible, it's on you to make sure you live up to that promise, not on Amazon.

Clicking through to the actual git issues, it definitely seems like the maintainers of Iceberg have the right mental model here too. This is their problem to fix. After re-reading this post this mostly feels like a click-baity way to advertise OpenDAL, which the author appears to be heavily involved in.

WatchDog

It's one thing when the changes are obviously designed to damage competition, like Microsoft's embrace-extend-extinguish strategy, but in this case the breaking changes seem to be pretty clearly motivated by a real need, and there isn't anything preventing so-called "S3-compatible" storage services from implementing this new feature.

akerl_

Did Amazon recommend that other 3rd party products use their SDK as their own client?

elchananHaas

It should be fairly easy to upgrade compatible APIs server-side just from reading the AWS docs. All that needs doing is to accept and ignore the new checksum header. Actually taking advantage of the checksum would also be reasonable; a CRC32 isn't that hard to compute.

https://aws.amazon.com/blogs/aws/introducing-default-data-in...
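
Something like this on the server side is probably all it takes (rough sketch; if I'm reading the docs right, x-amz-checksum-crc32 carries the base64 of the big-endian CRC32 of the body, and per Postel's law the check is skipped when the header is absent):

  import base64, zlib

  def crc32_ok(body: bytes, header_value: str | None) -> bool:
      if header_value is None:
          return True  # no checksum supplied: accept rather than reject
      crc = zlib.crc32(body) & 0xFFFFFFFF
      expected = base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")
      return header_value == expected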

julik

Seconded - it's also unnecessarily creative overvalidation on the part of the devs at those joints.

bandrami

Ceph does an S3 store as part of its filesystem, IIRC

tonyhart7

It's a unified storage layer and, well, you guessed it, it uses Rust.

profmonocle

Treating a proprietary API as a standard is risky - this is a good example of why. From Amazon's point of view there's no reason to keep the S3 SDK backwards compatible with old versions of the S3 service, because they control the S3 service. Once this feature was rolled out in all regions, it was safe to update the SDK to expect it.

Amazon may not be actively hostile to using their SDK with third party services, but they never promised to support that use case.

(disclaimer: I work for AWS but not on the S3 team, I have no non-public knowledge of this and am speaking personally)

freedomben

This is the correct take IMHO. I generally dislike Amazon (and when it comes to things like the Kindle, I actively hate them for the harm they are doing), but I think this is the key: S3 is not and never has been advertised as an open standard. Its API was copied/implemented by a lot of other services, but keeping those working is not Amazon's responsibility. It's on the developer of a service using those competitors to ensure they are using a compatible client.

I do think some of the vendors did themselves an active disservice by encouraging use of the AWS SDK in their documentation/examples, but again, that's on the vendor, not on Amazon, who is an unrelated third party in that arrangement.

I would guess that Amazon didn't have hostile intentions here, but truthfully their intentions are irrelevant, as Amazon shouldn't be part of the equation. For example, if I use Backblaze, the business relationship is between me and Backblaze. My choice to use the AWS SDK for that doesn't make Amazon part of it any more than it would if I found some random chunk of code on GitHub and used that instead.

pradn

Well, you do have to worry about customers using old client libraries / SDKs, even if your whole backend has migrated to a new API.

Many customers don't like to upgrade unless they need to. It can be significant toil for them. So you do see some tail traffic in the wild that comes from SDKs released years ago. For a service as big as S3, I bet they get traffic from SDKs even older than that.

arccy

the server has to be compatible with old clients, but new clients don't have to be compatible with old servers, which is the case here

pradn

Ah, I see.

freedomben

I think you've got the contract backwards. The server can't break old clients, but new clients can break the old server since Amazon controls the old server and can ensure that all of them are fully upgraded before the client updates are published.

julik

Even more so: treating a proprietary API as a standard but _also_ adding your own checks on top which crash the interaction "because it seemed more correct to you". No, you are not guaranteed a CRC and you are not guaranteed a Content-MD5. Or - you may be getting them, but then do check whether they are in the signed headers of the request at least.

dougb

I got bit by this a month ago. You can disable the new behavior by setting 2 environment variables:

  export AWS_REQUEST_CHECKSUM_CALCULATION=when_required
  export AWS_RESPONSE_CHECKSUM_CALCULATION=when_required
or by adding the following 2 lines to a profile in ~/.aws/config:

  request_checksum_calculation=when_required
  response_checksum_validation=when_required
Or just pin your AWS SDK to a version before the following releases:

<https://github.com/aws/aws-sdk-go-v2/blob/release-2025-01-15...>

<https://github.com/boto/boto3/issues/4392>

<https://github.com/aws/aws-cli/blob/1.37.0/CHANGELOG.rst#L19>

<https://github.com/aws/aws-cli/blob/2.23.0/CHANGELOG.rst#223...>

<https://github.com/aws/aws-sdk-java-v2/releases/tag/2.30.0>

<https://github.com/aws/aws-sdk-net/releases/tag/3.7.963.0>

<https://github.com/aws/aws-sdk-php/releases/tag/3.337.0>

and wait for your S3-compatible object store to add support for this.
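
If you're on boto3 and would rather not rely on environment variables or the shared config file, recent botocore versions (1.36+) also accept the same two settings per client; a sketch, with a placeholder endpoint for whatever compatible store you use:

  import boto3
  from botocore.config import Config

  # Per-client equivalent of the env vars above.
  s3 = boto3.client(
      "s3",
      endpoint_url="https://s3.your-compatible-store.example",  # placeholder
      config=Config(
          request_checksum_calculation="when_required",
          response_checksum_validation="when_required",
      ),
  )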

nicce

Oh, I just spent 4 hours debugging this today, and now there's an HN post and an even better approach in the comments….

aaronbrethorst

Many S3-compatible services are recommending that their users use the S3 SDK directly, and changing the default settings in this way can have a direct impact on their users.

This is wholly predictable; AWS isn't in the business of letting other companies OpenSearch them.

null_deref

I don’t think the article states it was malicious, at least. Also, there are some (major?) business benefits when your company sets the standard for a large market.

benatkin

That’s an odd way to describe it. Elasticsearch was a wrapper around Lucene. Had they started with a restrictive license rather than waiting until it got popular under the nonrestrictive license, Solr might have taken off more. OpenSearch is what the community needed. That Amazon did it is fine.

aaronbrethorst

Here's a good discussion from here in 2021 about the fork that colors my perception: https://news.ycombinator.com/item?id=26780848

benatkin

Yeah Amazon’s motivations aren’t great, but it’s occupying a space that opened up when Elastic changed the license on Elasticsearch. Nobody’s going to create another permissively licensed alternative to it just because they’re annoyed it’s an Amazon project.

riknos314

As a user of S3, but not of any service with an S3-compatible API, this execution of the change is perfect for me, as I get the benefits with zero effort - not even needing to learn that the feature exists.

AWS is beholden first and foremost to their paying customers, and this is the best option for most S3 customers.

genewitch

Amazon is beholden to their shareholders. The customers are an inconvenient necessity.

cyberax

[flagged]

joshstrange

I'm not sure what the other option is here? Keep old defaults and hope users update?

I wouldn't be happy to find out they did it /just/ to break third-party S3 providers, but it seems like an easy enough thing to turn off, right?

I'm just not sure how comfortable I am with the phrasing here (or maybe I'm reading too much into it).

julik

The other option is not being overly strict with the data you receive from a client, especially when dealing with a protocol which is not a standard.

joshstrange

I think the issue is that the client (the SDK) is complaining about a missing header it’s expecting to receive due to a new default in the client.

My guess is the client has options you pass in, they added a new default (or changed one, I'm not clear on that), and the new default sends something up to the server (header/param/etc.) asking for the server to send back the new checksum header; the server doesn't respond with the header, and the client errors out.

julik

Maybe I need to read more on this regression, you might be right.

benmanns

Another case of Hyrum's Law, where the entire functionality of the S3 SDK and any competing service provider borrowing from it becomes Amazon's problem to fix at their own cost. Maybe it's time for a non-Amazon but S3 API compatible library to emerge among the other cloud storage providers offering S3 compatible APIs. OpenDAL looks interesting. Also another reminder to run thorough integration tests before updating your dependencies.

smw

The problem here is that if you're providing a public S3-compatible object storage system, you likely have a number of users using the AWS SDK directly. It's not your dependencies, it's your users' dependencies that caused the issue.

onei

It's not even just your users. I work on an S3-compatible service where a good chunk of the test suite is built on the AWS SDK.

In reality, AWS are the reference S3 implementation. Every other implementation I've seen has a compatibility page somewhere stating which features they don't support. This is just another to add to the list.

immibis

Which you told them to use.

xena

This bit us pretty hard at Tigris, but we had a fix out pretty quickly. I set up some automation with some popular programming languages so that we can be aware of the next time something like this happens. It also bit me in my homelab until I patched Minio: https://xeiaso.net/notes/2025/update-minio/

femto113

> the AWS team has implemented it poorly by enforcing it

This is whiny and just wrong. Best behavior by default is always the right choice for an SDK. Libraries/tools/clients/SDKs break backwards compatibility all the time. That's exactly what semver version pinning is for, and that's a fundamental feature of every dependency management system.

AWS handled this exactly right IMO. The change was introduced in Python SDK version 1.36.0, which clearly indicates breaking API changes, and their changelog also explicitly mentions this new default:

   api-change:``s3``: [``botocore``] This change enhances integrity protections for new SDK requests to S3. S3 SDKs now support the CRC64NVME checksum algorithm, full object checksums for multipart S3 objects, and new default integrity protections for S3 requests.
https://github.com/boto/boto3/blob/2e2eac05ba9c67f0ab285efe5...

hot_gril

I want to see the author using GCP. That's where you get actual compatibility breakages.

kuschku

You mention semver, yet you also show that this API breaking change was introduced in a minor version.

Not entirely sure that's how things work?

r3trohack3r

You're not wrong - the semver doesn't indicate a breaking API change. But, to be fair, this wasn't a breaking API change.

Any consumer of this software using it for its intended purpose (S3) didn't need to make any changes to their code when upgrading to this version. As an AWS customer, knowing that when I upgrade to this version my app will continue working without any changes is exactly what this semver bump communicates to me.

I believe calling this a feature release is correct.

0x457

While I agree that the author is just whining about this situation and that AWS did nothing wrong, I'd argue that a change in defaults is a breaking change.

julik

Sometimes people forget that the S3 API is not an industry standard, but a proprietary inspectable interface the original author is at liberty to modify to their liking. And it does, indeed, have thorny edges like "which headers are expected", "which headers do you sign", what the semantics of the headers are and so forth.

It is also on the implementors of the "compatible" services to, for example, not require a header that can be assumed optional. If it is not "basic HTTP" (things like those checksums) - don't crash the PUT if you don't get the header unless you absolutely require that header. Postel's law and all.

The mention in the Tigris article is strange: is boto now doing a PUT without request content-length? That's not even valid HTTP 1.1

robocat

Any third party is one update away from an external business shock should Amazon change their API.

Setting up a business so that all your customers fail at the same moment is a poor business practice: nobody can support all their customers breaking at once. I'm guessing competitors compete on price, not reliability.

Amazon has the incentive to break third parties, since their customers are likely to switch to Amazon. Why else use the Amazon code unless you're ready to migrate or the service is low importance?

r3trohack3r

I think there is a strong incentive to support the S3 API for customers. Not having to change any of your code other than the URL the SDK points to probably makes closing sales way easier.

But if your customer remains on the S3 SDK, the same reduced switching cost you enjoyed is now enjoyed by your competitors - and you have to eat the support cost when you stop being compatible with the S3 SDK (regardless of why you are no longer compatible).

merb

Actually the new default is sane. It's WAY WAY WAY WAY better than before, especially for multipart uploads. It's basically one of the features where gcloud had an insane edge. Another thing was If-Match in Cloud Storage.

semiquaver

Just curious, what’s so much better about a different hash digest? Is CRC32 not fast enough?

merb

The past did not use CRC32; it used Content-MD5, but not everywhere. It also did not support full-object checksums for multipart uploads. And here is the thing that was problematic: if you uploaded parts, you could not check the object after it was uploaded, since AWS did not calculate a hash and save it, so you couldn't do a HEAD call and compare your locally generated checksum with the one online. There are cases where generating a checksum up front is infeasible, and with Content-MD5 it was not so easy/fast to chunk the upload and generate the checksum while uploading. And here is the biggest benefit: the CRCs of the parts do not need to be computed in order, so you can basically parallelize the hash generation, since you can hash 4 KB chunks and combine them. And in some cases the SDK did not generate a checksum at all.

Edit: I forgot - since full-object checksums are now the default, AWS can now upload multiple parts in parallel, which was not possible before (for multipart uploads).
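
For the "generate the checksum while uploading" point: zlib's CRC32 can be resumed chunk by chunk, so a single pass over the data is enough (sketch of the streaming case only; combining per-part CRCs in parallel needs a crc32_combine, which the Python stdlib doesn't expose):

  import zlib

  def streaming_crc32(parts) -> int:
      crc = 0
      for chunk in parts:          # e.g. part bodies as bytes
          crc = zlib.crc32(chunk, crc)
          # ...upload of the part would happen here...
      return crc & 0xFFFFFFFF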