It's five grand a day to miss our S3 exit

174 comments

·March 27, 2025

ifightcrime

Cloud has always been more expensive. I remember being quoted 250k/month for bandwidth when I was paying 15k with rackspace 10+ years ago. You’re paying for convenience and speed. The Math stops working when you grow to a certain point.

You can mitigate this to some extent by making some key architecture + vendor decisions upfront when first building… or just consider that some day you’ll need to do things like this. It’s not a novel problem.

vidarh

It's horrifyingly hard to convince people of this, though, even you can present them with actual numbers.

A lot of people have convinced themselves that cloud is cheap, to the point that they don't even do a cursory investigation.

A lot of those even don't do the bare minimum to reduce hosting costs within the cloud they choose, or choose one of the cheaper clouds (AWS is absolutely extortionate for anything that requires significant amount of outbound bandwith), or put caching/CDN's in front (you can trivially slash your AWS egress costs dramatically).

Most of my consultancy work is on driving cost efficiencies for cloud, and I can usually safely guarantee the fee will pay for itself within months because people don't fix even the most low hanging fruit.

tialaramex

Periodically management says we shouldn't have a DC, just put everything in the cloud.

OK says HPC, here's the quote for replacing one of the (currently three) supercomputers with a cloud service. Oh dear, that's bigger than your entire IT budget isn't it? So I guess we do need the DC for housing the supercomputers.

If we'd done that once I'd feel like well management weren't to know, but it recurs with about a 3-5 year periodicity. The perception seems to be "Cloud exists, therefore it must be cheaper, because if it wasn't cheaper why would it exist?" which reminds me of how people persuade themselves the $50 "genuine Apple" part must be better because if it wasn't better than this $15 part why would Apple charge $50 for it? Because you are a sucker is why.

azinman2

Apple may have markup, but the part is for sure more likely to be higher quality: https://www.cultofmac.com/news/apple-thunderbolt-4-cable-com...

whstl

It's just people conflating popularity with <every positive attribute>.

If <service> is popular, it must also be cheap, beautiful, well documented, have every feature that exists and make you popular with your friends.

I once had a Product Manager try to start an argument with me: "Explain to me how it is possible that the service we pay 25k a month doesn't have <feature>. You don't know what you are saying.". It just didn't do what he wanted, and getting angry with them over the phone didn't magically made the feature appear.

vidarh

Yeah, I used to be asked to price out a move to AWS every year at one position. After several years Hetzner finally got cheaper than operating our own colo's, but only basically because we were in London and London real-estate is expensive, and so colo space is accordingly expensive, while Hetzner's DC space is dirt cheap.

AWS, however, remained 2x-3x as expensive, with the devops time factored in.

> The perception seems to be "Cloud exists, therefore it must be cheaper, because if it wasn't cheaper why would it exist?

People are also blithely unaware that large customers get significant discounts, and so I regularly has to explain that BigCo X being hosted in AWS means at most that it is cost-effective for them because their spend means they're getting a significant discount over the already highest volume published pricing, and my clients usually are nowhere close to spend enough to be able to get those discounts.

magicalhippo

> If we'd done that once I'd feel like well management weren't to know, but it recurs with about a 3-5 year periodicity.

So basically every time management changes[1]?

[1]: https://maexecsearch.com/average-c-suite-tenure-and-other-im...

koliber

Regarding apple parts, I recently replaced a broken screen on a MacBook pro with an OEM part. I can’t get the color to look right. Not to mention the one vertical row where pixels look off (not dead, but not normal either). The guy at the shop said I would not notice. I am now kicking myself for not going with the real thing.

throwaway894345

I think management is just prone to wanting to believe the grass is greener on the other side. If you are already a cloud org with negotiated pricing and cost-optimization management would ask about building a data center and you would show them how much you would need to expand your IT staff in order to acquire the skills to operate the new data center never mind the upfront cost.

paulcole

[flagged]

diggan

> A lot of people have convinced themselves that cloud is cheap

I've noticed this too, freelancing/consulting around in companies. I'm not sure where this idea even comes from, because when cloud first started making the news, the reasoning went something like "We're OK paying more since it's flexible, so we can scale up/down quickly", and that made sense. But somehow today a bunch of people (even engineers) are under the belief that cloud somehow is cheaper than the alternatives. That never made sense to me, even when you take into account hiring people specifically for running the infrastructure, unless you're a one-person team or have to aggressively scale up/down during a normal day.

bostik

I can provide an example where cloud, despite its vastly higher unit costs, makes sense. Analytics in high finance (note: not HFT). Disclosure: my employer provides systems for that.

A fair number of our clients routinely spin up workloads that are CPU bound on hundreds-to-thousands of nodes. These workloads can be EXTREMELY spiky, with a baseload for routine background jobs needing maybe 3-4 worker nodes, but with peak uses generating demand for something like 2k nodes, saturating all cores.

These peak uses also tend to be relatively time sensitive, to the point where having to wait two extra minutes for a result has real business impact. So our systems spin up capacity as needed, and once the load subsides, terminates unused nodes. After all, new ones can be brought up at will. When the peak loads are high (& short) enough, and the baseload low enough, the elastic nature of cloud systems has merit.

I would note that these are the types of clients who will happily absorb the cross-zone networking costs to ensure they have highly available, cross-zone failover scenarios covered. (Eg. have you ever done the math on just how much a busy cross-zone Kafka cluster generates in zonal egress costs?) They will still crunch the numbers to ensure that their transient workload pools have sufficient minimum capacity to service small calculations without pre-warm delay, while only running at high(er) capacity when actually needed.

Optimising for availability of live CPU seconds can be a ... fascinating problem space.

SteveNuts

> paying more since it's flexible, so we can scale up/down quickly

I’ve heard this argument too and I think I’ve seen exactly one workload where it actually made sense and was tuned properly and worked reliably.

dagw

I've noticed this too, freelancing/consulting around in companies. I'm not sure where this idea even comes from

Internal company accounting can be weird and lead to unintuitive local optima. At companies I've worked at, what was objectively true was that cloud was often much cheaper than what the IT department would internally bill our department/project for the equivalent service.

vidarh

I think it's because people think their workloads are extremely spiky, and so assume they will spin up/down loads enough to save money, and that has translated into cloud being perceived as cheap.

But devs rarely pay attention to metrics. I've had clients with expensive Datadog setups where it was blatantly obvious that nobody had ever dug into the performance data, because if they did they'd have noticed that key metrics were simply not fed to it.

If they did pay attention, most of them would realise that their autoscaling rarely kicks in all that much, if at all. Often because it's poorly tuned, but also because most businesses see small enough daily cycles.

Factor in that the cost difference between instances vs. managed servers is quite significant, and you need to have significant spikes much shorter in duration than most businesses day/night variation to save money.

It can make sense to be able to spin up more capacity quickly, but then people need to consider that 1) a lot of managed hosting providers has hardware standing by and can automatically provision it for you rapidly too - unless you insist on only using your own purchased servers in a colo, you can get additional capacity quickly, 2) a lot of managed hosting providers also have cloud instances so you can mix and match, 3) worst case you can spin up cloud instances elsewhere and tie it into your network via a VPN.

Some offer the full range from colo via managed servers to cloud instances in the same datacentres.

Once you prep for a hybrid setup, incidentally, cloud becomes even less competitive, because suddenly you can risk pushing the load factor on your own/managed servers much closer to the wire, knowing you can spin up cloud instances as a fallback. As a result, the cost per request for managed servers drops significantly.

I also blame a lot of this on business often shielding engineering from seeing budgets and costs. I've been in quite senior positions in a number of companies where the CEO or CFO were flabbergasted when I asked for basics costing of staff and infra, because I saw it as essential in planning out architecture. Engineers who aren't used to seeing cost as part of their domain will never have a good picture of costs.

jinjin2

Yes. We saved ridiculous amounts of money (and made it a lot faster) by moving our analytics workloads from Snowflake to a few bare-metal nodes running Exasol. But it took months to convince management even though we had clear numbers showing the sheer magnitude of the cost reduction. They had drunk the cloud kool-aid, and were adamant that it would be cheaper, numbers be damned.

mbreese

I think one business argument for cloud is capital expenses vs operational expenses. If you’re (over) paying for cloud resources vs an in house option (or colo), those are numbers that are a straight expense. When you own hardware, those are on your books until they depreciate off. For some businesses, that can make sense.

Now, a good accountant probably wouldn’t care one way or the other. Debits and credits balance either way. And spending more still means less profit in the long term, no matter how it looks on the books. But, in addition to the flexibility, that was what I always thought of as the main cloud benefit. It’s the same with leasing vs buying cars/computers/etc…

NorwegianDude

You don't have to buy the hardware. It's very common to rent it.

vidarh

But that too is based on people not knowing the alternatives, as renting managed servers can be close to a wash vs. leasing hardware for a colo (often to the point that relatively cost of land near your preferred managed hosting providers vs. colos that work with access to staff etc. might be what makes one or the other cheaper). Buying outright can be cheaper but isn't necessary.

None of the colo'd setups I've worked on bar one used purchased servers - it's all been leased. But the majority of non-cloud workloads I've worked on have not even been leased, but rented.

tialaramex

Cloud made sense for the startup I worked for previously. If you are a startup then a $1M per year expense makes much more sense than a $5M up front purchase with 5-10 years of life - in five years you might be billionaires or you might be bankrupt and until then the Cloud was better.

ksec

Partly because AWS give out a lot of free credit for start ups, and basically allow them to grow without planning any infrastructure. VCs who are invested into Amazon also wants to push the cloud narrative. Starts up who dont want to deal with servers, want massive scale when they think the website and later an app went viral.

That was in the late 00s and early 10s. PHP, Python, Ruby and even Java were slow. Every single language and framework has had massive performance improvements in the past 15 to 20 years. Anywhere from Java 2x to Ruby 3 - 10x.

When a server max out at 6 - 8 with Xeon core, compare to today at 192 Core. Every Core is at leats 2 - 3x faster per clock, with higher clock speed we are talking about 100x difference. Especially when IO used to be on HDD, SSD is easily 1000x faster. What used to wait for I/O is no longer an issue, the aggregate difference when all things added together including software could be 300x to 500x.

What you would need 500 2U server in 2010, you could now do it in one.

Modern web developers are so abstracted with hardware I dont think many realise what sort of difference in hardware improvements. I remember someone posted before 2016 Basecamp had dozens of Racks before moving to cloud. Now they have grown a lot bigger with Hey and they are only doing it with 8 racks and room to spare.

AWS on the other hand is trying to move more workload to ARM Graviton where they have a cost advantage. Given Amazon's stock price are now dependent on AWS, I dont think they will lower their price by much in the future. And we desperately need some competition in that area.

graemep

For smaller businesses it seems to be its the safe option because its what everyone does.

I have even had it suggested that it might make selling a business or attracting investors harder if you used your own servers (not at the scale of having your own datacentre, just rented servers - smaller businesses still).

Another thing that comes up is that it might be more expensive but its a small fraction of operational expenses so no one really cares.

whstl

For smaller businesses it's often "the only thing Joe knew when he was building it".

j45

You have a great point about finding cost efficiencies - there was a time cloud was cheaper.

Maybe it's an understanding that doesn't change because the decision makers were non-techincal people (when finance oversees IT despite not understanding it)

Virtualizing and then sharing a dedicated server as a VPS was a big step forward.

Only, hardware kept getting cheaper and faster, as well as internet.

vidarh

> when finance oversees IT despite not understanding it

... and when IT often do not even get to see the spend, and/or isn't expected to.

I've had clients where only finance had permissions to get at the billing reports, and engineering only ever saw the billing data when finance were sufficiently shocked by a bill to ask them to dig into it - at which point they cared for long enough to get finance off their backs, and then stopped caring again.

peeters

The reality is when you get to another certain point (larger than the point you describe) you start negotiating directly with those cloud providers and bypass their standard pricing models entirely.

It's the time in between that's the most awkward. When the potential savings are there that hiring an engineering team to internalize infrastructure will give a good return (were current pricing to stay), but you're not so big that just threatening to leave will cause the provider to offer you low margin pricing.

All I'd say is don't assume you're getting the best price you can get. Engineers are often terrible negotiators, we'd rather spend months solving a problem than have an awkward conversation. Before you commit to leaving, take that leverage into a conversation with your cloud sales rep.

aleph_minus_one

> Engineers are often terrible negotiators, we'd rather spend months solving a problem than have an awkward conversation.

My experience is the opposite: lots of software developers ("engineers") would love to do "brutal" negotiations to fight against the "choking" done by the cloud vendors.

The reason why you commonly don't let software developers do these negotiations is thus the complete opposite: they apply (for the mentioned reasons) an ultra-hardball negotiation style (lacking all the diplomatic and business customs of politeness) that leads to vast lands of burnt soil. Thus, many (company) customers of the cloud providers fear that this hardball negotiation style destroys any future business relationship with the respective (and perhaps for reputation reasons a lot of other) cloud service provider(s).

TrueDuality

Even with the discounts of volume pricing cloud prices are still quite inflated unless you need to inherit specific controls like the P&E ones from FedRAMP High/GovCloud. The catch there is lock-in technologies that may require to re-develop large swaths of your applications if you're heavily reliant on cloud-native tools.

Even going multi-region, hiring dedicated 24/7 data center staff, and purchasing your own hardware amortizes out pretty quickly and can you a serious competitive advantage in pricing against others. This is especially true if you are a large consumer of bandwidth.

diggan

> The reality is when you get to another certain point (larger than the point you describe) you start negotiating directly with those cloud providers and bypass their standard pricing models entirely.

And even if you do, you still end up with pretty horrible pricing, still paying per GB of "premium" traffic for some outrageously stupid reason, instead of going the route of unmetered connections and actually planning your infrastructure.

selfhoster

> It's the time in between that's the most awkward.

That's an odd way to describe hemorrhaging money.

dbbk

But the article states they negotiated.

peeters

This was more a response to the comment I replied to, that cloud is always more expensive. And saying it more for everyone, not OP.

It's almost always less expensive at the start, which is super important for the early stages of a company (your capital costs are basically zero when choosing say AWS).

Then after you're established, it's still cheaper when considering opportunity costs (minor improvements in margin aren't usually the thing that will 10x a company's value, and adding headcount has a real cost).

But then your uniqueness as a company will come into play and there will be some outsized expense that seems obscene for the value you get. For the article writer, it was S3, for the OP, it's bandwidth. For me it's lambdas (and bizarrely, cloud watch alarms). That's when you need to have a hard look and negotiate. Sometimes the standard pricing model really doesn't consider how you're using a certain service, after all it's configured to optimize revenue in the general case. That doesn't mean the provider isn't going to be willing to take a much lower margin on that service if you explain why the pricing model is an issue for you.

jonatron

At what sort of scale can you do that? $1M, $10M, $100M, $1B?

peeters

So obviously this is an extreme, but I worked for a company that had long dismissed third party cloud providers as too expensive (customers would be routing all of their network traffic through our data centers, so obviously the bandwidth costs would just be too dang high). Then that company got purchased by a certain mega corporation who then negotiated an exclusive deal with GCP, and the math flipped. It was now far too expensive to run our own set of datacenters. Google was willing to take such a low margin on bandwidth that it made no sense not to.

So in this case, hundreds of billions. But the principle stands at lower company sizes, just with different numbers and amounts of leverage.

sokoloff

I don’t remember if our first enterprise agreement was at $1M or $2M, but it was low and in that neighborhood [but also 10 years ago, well before cloud was the default and had growth baked into it].

Cloud providers are looking for multi-year term, commitment to growth as much as/more than exact spend level now.

tecleandor

In my experience with GCP, go through a Google partner (that will aggregate multiple clients to get discounts) and you'll be able to get commitment discounts with $500K/year or even less. But don't save too much money during your commitment period: if you don't expend your commitment, you'll pay for it anyway, and you might even lose some discounts.

Also, one trick to inflate your commitment expenses is asking your SaaS providers if it's possible to pay them through AWS or GCP marketplaces: it often counts against your commitment minimum expense, so not everything has to be instances and storage.

ksec

Sounds like the trap for Middle Class.

binarymax

Even without your own rack or colo, The math with AWS stops working as soon as you no longer fit in the free tier, since providers like Hetzner are 40% cheaper.

gizmo

S3 is designed for 99.999999999% durability. Hetzner's Volume storage is just replication between 3 different physical servers.

In terms of durability that's a universe apart.

SteveNuts

S3 is beyond impressive, but how many workloads truly need that? I’ve never had a single instance of data loss on a NetApp or Pure array.

sebazzz

On the other hand you have transient failures in the cloud (at least on Azure - this behavior is even documented) so does that count towards the 99.99999%?

jstummbillig

I find it intuitively absolutely bizarre that Cloud does not outright win at any scale. In my mind everything about it seems more optimizable with more scale. Obviously I am missing something, but all Cloud pricing looks so significantly more expensive than I feel it should in a healthy and mature market.

djha-skin

It's not a novel problem but it _is_ a relatively novel (bad) economic environment. We've been in "let the good times roll" mode longer than ten years. In comparison to 2009-2011, it was different. Many ops professionals are younger than that and have gone their entire careers without doing anything on premise.

I remember trying to convince some very talented but newly minted ops professionals -- my colleagues -- to go on prem for cost. This was last year. They were scared. They didn't know how that would work or what safety guarantees there would be. They had a point, because the org I was at then didn't have any on prem presence, since they were such a young organization that they started in the cloud during "the good times". They always hired younger engineers for cost, so nearly no one in the org even knew how to do on prem infra. Switching then would have been a mistake for that org, even though cloud costs (even with large commit agreements) were north of six figures a month.

hodgesrm

> The Math stops working when you grow to a certain point.

That point is different for every business. For most of them it depends on how big cloud is in your COGS (cost of goods sold) which affects gross margins, which in turn is one of the most meaningful measures of company financial health. Depending on the nature of your business and the amount of revenue you collect in sales, many companies will never reach the point where there's measurable payback from repatriating. Others may reach that point, but it's a lower priority than other things like opening up new markets.

Many commenters seem to hold very doctrinaire opinions on this topic, when it's mostly basic P&L math.

t0mas88

Around that certain point you can also talk to AWS or GCP and get very significant discounts. I'm surprised 37signals and AWS didn't find a number that worked for both.

I've seen a few of these deals with other vendors up close, the difference with public pricing is huge if you spend millions per year.

dilyevsky

I worked/consulted for several companies who had multimillion per year cloud commits, sometimes with different clouds, and those discounts are not competitive with onprem like at all

bigfatkitten

DHH has said previously that they already have a very good deal when compared with list price. But AWS still couldn't come close to on prem costs.

j45

If it takes talking to them to get discounts, might as well look at all the options and get the real discount of not being on the cloud.

dangus

It’s more than mere “convenience.” You’re also paying to avoid hiring a bunch of employees to physically visit data centers around the globe.

And if you’re not doing that you are hiring a bare metal servers provider that is still taking a portion of the money you’d be paying AWS.

Even if you don’t need to physically visit data centers thanks to your server management tools, the difference in the level of control you have between cloud and bare metal servers is large. You’re paying to enable workflows that have better automation and virtual networking capabilities.

I recently stood up an entire infrastructure in multiple global locations at once and the only reason I was able to do it in days instead of weeks or months was because of the APIs that Amazon provides that I can leverage with infrastructure automation tooling.

Once you are buying AWS reservations and avoiding their most expensive specialized managed products the price difference isn’t really worth trying to recover for many types of businesses. It’s probably worth it for Hey since they are providing a basic email service to consumers who aren’t paying a whole lot. But they still need something that’s “set it and forget it” which is why they are buying a storage solution that already comes with an S3 compatible API. So then I have to ask why they don’t save even more money and just buy Supermicro servers and install their own software? We all know why: because Amazon’s APIs are where the value is.

There is a lot of profit margin in software and usually your business is best spending their effort working on their core product rather than keeping the lights on, even for large companies. Plus, large companies get the largest discounts from cloud providers which makes data centers even less appealing.

“Convenience” isn’t just convenience, it’s also the flexibility to tear it all down and instantly stop spend. If I launch a product and it fails I just turn it off and it’s gone. Not so if I have my own data center and now I’ve got excess capacity.

luckylion

I agree, but I don't think you're in the majority. I don't think most cloud-customers are utilizing all of those additional things that a big cloud provider offers.

How many are actually multi-region? How many actually do massive up/down-scaling on short notice? How many actually use many of those dozens to hundreds of services? How many actually use those complex permissions?

My experience tells me there are some, but there are more who treat AWS/GPC/Azure like a VPS-hoster that's 5-10x more expensive than other hosters. They are not multi-region, they don't do scaling, they go down entirely whenever the AZ has some issues etc. The most they do is maybe use RDS instead of installing mysql/pgsql themselves.

bigfatkitten

> How many are actually multi-region?

The fact half the internet seems to fall over whenever us-east-1 has a hiccup is quite telling.

dangus

A lot more than you’re giving them credit for.

This idea that their basic users go down entirely when the AZ has some issues is ridiculous, a standard autoscaling group and load balancer basically forces you to be multi-AZ. Very much unlike a VPS.

Using RDS instead of self-installing SQL eliminates the need for an entire full time role for DB admin. So that’s kind of a big deal despite it being a “basic” use case.

A lot of services like ECS, elastic beanstalk, can make it so that you can wait longer to hire operations people and when you do they can migrate to more scalable solutions without having to do a major migration to some other provider or build up a self hosted solution custom. If you outgrow a VPS you have to do a major migration.

And if you take a look at the maturity and usefulness of the terraform providers SDKs, and other similar integrations of VPS and bare metal providers they are very basic when comparing to BOTO and the terraform provider.

I struggle to replicate the level of automation I can achieve with these cloud tools on my own homelab with Proxmox.

scarface_74

I can’t speak too much for small companies. But there are a lot of large enterprises and smaller businesses and government agencies that do use more AWS services than just compute + storage + web services. Do need the elasticity etc.

For instance, I was surprised how large the market was for Amazon Connect - Amazon’s hosted call centers. It’s one of the Amazon services I have some experience in and I still get recruiters contacting me for those jobs even though I don’t really emphasize that specialty.

My experience is from 7 years of working with AWS. First at a startup with a lot of complex ETL and used a lot of services. But the spend wasn’t that great.

My next 5 years was between working at AWS (Professional Services) and two years at a a third party consulting company (full time) mostly as an implementation lead.

Even though my specialty is “cloud native application development” and I avoid migrations like the plague, most of the money in cloud consulting are large companies deciding to move to the cloud because they decided that the redundancy, lower maintenance overhead, and other higher level services were worth it.

j45

This might be a little incomplete.

It's trivial, to get equipment at a datacenter, where the equipment is visited for you on your behalf if you wish.

You can place your own equipment in a datacenter to manage yourself (dedicated servers).

You can have varying amounts of the hardware up to the software layer managed for you as a managed server, where others on site will do certain tasks.

Both of these can still be cheaper than cloud (which provides a convenience and a large markup to make often open source tools easy to administer from a web browser), and then paying someone to manage the cloud.

Global location at once can still be done with the reality of hybrid-cloud or cloud-agnostic setup requirements (not to be tied to one cloud only for fallback and independence).

comrade1234

Years ago (10+) when I was deciding to use aws or hosting our own hardware in a colo for my company I found a spreadsheet where you plug in your hardware and hosting costs and the current Amazon pricing for their various products and it would give you how long to break-even on the initial investment.

It was always just above three years to break even. After we decided to self-host I still kept tracking the prices in the spreadsheet and as hardware costs fluctuated Amazon adjusted their prices to match. I figured someone at Amazon must have had the same spreadsheet I was working with.

vidarh

Sounds way too long unless you use no extra services, and no outbound bandwidth to speak of.

AWS outbound bandwidth costs in particular is tens of times higher than what you can get elsewhere, to the point that when clients insist on S3, e.g. because they're worried about durability (which is a valid consideration), I usually ask if they'd be happy to put a hefty cache at a cheaper provider in front - if you use lots of bandwidth, it's not usual for it to be cost effective to cache 100% of the dataset somewhere cheaper just to avoid AWS bandwidth charges.

spockz

Is that including or excluding write off? If you have to replace the hardware every 3 years then it would be equivalent. IIRC hardware is replaced every 3-5 years because otherwise it is out of support.

sgarland

That’s an ops decision. The most common things to die are RAM and PSUs. The latter are redundant on every server I’ve ever seen. The former is dirt cheap (especially as the hardware ages), and extremely easy to replace.

I have 13 year old Dell R620s that have been running 24/7/365 for a few years in a suboptimal environment at this point (I mean, minus occasional restarts for kernel updates, brief maintenance periods, etc.). The only thing I’ve had to replace were RAM and a single PSU.

j45

Write off is am accounting term, not operational.

Hardware like cars and laptop can continue to perform after they are written off, or even after the warranty.

The grade of hardware used is critical in servers.

Hyperscaling might mean commodity based servers. Hosting a large app does not mean using commodity component servers.

Hardware, when self hosting, does not need to be replaced every 3-5 years because it does not fail every 3-5 years. Depends on load and a bunch of factors.

Why?

We wouldn’t buy the cheap and disposable components a massive cloud or social media network might use to scale faster because they have a massive budget.

Besides, do providers really replace all their servers every 3-5 years? Hosting companies don’t seem to.

The cloud is many multiples more expensive than self hosting especially at scale. Hosting and cloud tools have brought down labour costs tremendously.

For the hardware, with the extremely clean environments servers run in, plus much cleaner electricity hardware runs much longer.

Purchase actual enterprise grade servers (HP Proliant, etc) that a company would buy for themselves for maximum reliability (compared to the commodity based ones of clouds) and those have so much reliability built into them that they sometimes never die.

You can still buy used proliant servers many, many, many generations old and they hum along just fine. It is bizarre but not.

Support is a few things. Warranty on parts and software. Extended support options which amounts to (hardware warranty) are always available for a fee, and achievable on your own.

If your software is a hypervisor you will be mirrored.

If a server has an issue the affected machine moves the load elsewhere.

The server has hot swap equipment. Takes a few moments to swap components if needed.

If you are self hosting, you can buy a used server or two or theee to have a backup and mirror and spare parts. It’s like buying a few NUCs.

Hosting corporately can be done not just with buying, but leasing too (meaning hardware swapping can happen). Add to this moving older equipment to less demanding tasks (if they ever do stay at load)z

vidarh

> Write off is am accounting term, not operational.

That's the point. I've just decommisioned 10 year old servers for a client. They were still working fine, but the system had finally been replaced.

If you're calculating break-even based on the rate at which you're writing off the accounting value of the servers, you'll end up with a far longer time to break-even than if you amortise the hardware cost over the projected actual lifetime of the hardware.

rco8786

I'm sure Amazon has mountains of data to arrive at that target number. But subjectively, it also feels about right in terms of juicing just the most profit margin from customers without being obviously overpriced. It's a nice "sweet spot" where it's just too far out on the horizon for companies, especially SMBs, to really account for or to think the up front cost/effort is currently worth it vs putting that effort into revenue generating activities.

Hardware cycle is probably about 3 years also.

Gigachad

Business needs change often enough that it’s hard to justify locking something in for three years vs paying a small amount extra for flexibility.

rvz

> $1.5 million/year

That is excessive and it's already $4K a day.

Lots of teams really underestimate cloud costs since there is an assumption that the hundreds of millions they are raising will give them enough runway to survive a few years despite losing money for years.

Even scaling would be somewhat of an issue depending on the tech stack. Imagine the cost of running standard Java micro-services and the "solution" was to "spin up hundreds of more nodes". The worst that I have seen was a bank proudly having up to 8,000 - 10,000 separate micro-services.

Just imagine the daily cost of that. Unjustifiable.

But of course the AWS cloud consultants would be happy to shill you their offerings at "cheap" prices, but in reality the pricing is designed for you to accumulate millions in costs as you scale on the tiniest amount of usage, even for testing.

So before you build the software, one must think about the costs of scaling if it becomes widely used rather than taking the easy approach in just spinning up nodes and increasing more costs and act as if you don't have the capital to solve the problem. You can only do that for so long until you don't.

whstl

I remember that at a previous company somehow it leaked that the AWS cost was 50% of all the developers staff salary.

There was nowhere near the same volume of data as Basecamp/Hey, not there was much processing power needed. It was purely bad engineering accumulated over 10 years.

lpapez

I was once contracted to work on a project where the monthly GCP bill for Postgres was $60k per month - this was basically my YEARLY rate at that time, just for managed Postgres.

After some time I was quite familiar with their stack and had gathered considerable domain experience. This led to an idea how to halve the database load (and the cost would presumably fall by a similar percentage), which I wanted to use as leverage during contract renegotiation.

I boldly offered to work for free to halve their database load, in exchange for being paid half the money this optimization would save over the course of one year. This would basically triple my pay, and they would still save money.

They declined, and I moved to a better opportunity.

Last I heard they had to pay a team of 4 new consultants for a year to implement the same idea I had. Without the domain knowledge, consultants couldn't progress as fast as I suspect I could have done (my estimated was 2 months of work).

I know it's very petty, but I regret revealing too many implementation details of the idea during the pitch and allowing the company to contract other consultants to see it done.

skrebbel

If you built up that domain knowledge while being paid top dollar per hour by the same company, then I understand their reluctance to go along with your offer. It feels a little bit extortionate to be honest. I wouldn't go along with it either, not because it's a bad deal in isolation, but because it sets a bad precedent. It basically tells every employee/contractor that if they know a way to add a lot of measurable value, they can use that as a bargaining chip to 3x their pay. This also discourages trying to add any value that isn't as easily expressed in dollars (which is the case for many important things, such as product quality improvements).

I think part of the expectation when contracting somewhere long-term (or just being an employee, for that matter) is that the amount of value you add per hour worked increases sharply over time, and slower than your fee. In other words, initially you're overpaid wrt your value-add, and then that corrects itself over time as you figure out what the company is all about.

vidarh

I've made similar pitches to clients many times, and one thing I've learned is that ironically the problem is promising the actual saving, vs. offering a much smaller saving.

The challenge is that people don't believe you when you tell them they can save that much, no matter how evidence you prepare. I'm starting a sales effort for my agency right now, and one of the things we've worked on is to promise less than what we determine we can deliver after reviewing the clients costs, and raising our prices, because it's ironically easier to close on the basis of a promise to deliver 20%-30% savings at a relatively high cost than a promise to deliver 50%+ with little effort.

whstl

> I know it's very petty

No it isn't.

sgarland

My current and last jobs had monthly RDBMS bills in excess of $1 million/month. It is staggering. We could buy two fully-loaded 42U racks in separate DCs and be net positive after a few months. I’ve done the math, in great detail.

No go. “It’s hard to hire for that skill set.” Is it $9 million/year hard?! You already have a team lead – me. This shit is not that hard; people will figure it out, I promise.

j45

It's not petty. It's the profit margin average solutions get to make over expertise with individual tools, or groups of tools.

Aperocky

Is 50% that bad? If instead you hire engineer to maintain access to some kind of file storage on the internet, would it cost more or less?

Would be alarming if it is 500% the staff salary, but at 50% that just seems the cost of outsourcing to standard that likely won't be achieved in house.

whstl

Considering it was about 100 developers, it was horrible.

The two major problems were:

1. The volume of data itself was not that that big (I had a backup on my laptop for reproductions), but it was just too heavy for even the biggest things in AWS. Downtimes were very frequent. This is mostly due to decisions from 10 years ago.

2. Teams constantly busy putting out fires but still getting only 1-2% salary increases due to lack of new features.

EDIT: Since people like those war stories. The major cause for the performance issues was that each request from an internal user would sometimes trigger hundreds of queries to the database. Or worse: some GET requests would also perform gigantic writes to the Double-Entry Accounting system. It was very risky and very slow.

This was mostly due to over-reliance on abstractions that were too deep. Nobody knew which joins to make in the DB, or was too afraid, so they would instead call 5 or 6 classes and joining manually causing O(N^2) issues.

To give a dimension of how stupid it was: one specific optimization I worked on changed the rendering time of a certain table from 25 seconds to 2 miliseconds. It was nothing magic.

I'm glad I left.

vidarh

My experience from offering devops services on retainer to a number of clients is that the ones that host in cloud environments spend more money on me for similar scale setups than the ones that host on managed setups.

And even if you don't want the hassle of storing the data yourself, there are many far cheaper outsourced options than S3.

null

[deleted]

Nextgrid

> the hundreds of millions they are raising will give them enough runway to survive a few years despite losing money for years.

It's more that the decision makers at every stage are not incentivized to care, or at least, were not during the ZIRP period. This is slowly changing, as evidenced by more and more talks of "cloud exits".

Software engineers are encouraged by the job market to fill their resume with buzzwords and overengineer their solutions.

Engineering managers are encouraged by the job market to increase their headcount, so complicated solutions requiring lots of engineers actually play in their favor.

CTOs are encouraged by the job and VC funding market to make it look like their company is doing groundbreaking things and solving complex problems, so overengineering again plays in their favor. The fact these problems are self-inflicted doesn't matter, because everyone is playing the same game and has no reason to call them out for it.

Cloud providers reward companies/CTOs for behaving that way by extending invites to their conferences, which gives the people involved networking opportunities and "free" exposure for the company to hire more engineers to fuel the dumpster fire even more.

no_wizard

Testing in particular is something I hate about AWS being the most egregious.

You don’t get any testing services baked into the pricing, you’re paying production pricing for setting up / tearing down environments for testing. They have little to nothing in the ways of running emulators locally for services and it leads to other solutions of varying quality.

It’s outrageous and something i will always hold against AWS forever. Not to mention their CDK is for shit. Their APIs are terrible and poorly documented. I don’t know why anyone chooses them still other than they seem to have the “nobody got fired for choosing AWS” effect.

Azure is really good at providing emulators for lots of their core services for local testing for instance. Firebase is too, though I can’t vouch for the wider GCP ecosystem

4ndrewl

This is where your choice of which cloud services to use comes into play - Containerised web apps with Postgres on RDS? Simple to move off onto self hosting _if_ you can prove a business model that needs scaling. All-in on some proprietary services - less so.

iLoveOncall

> Even scaling would be somewhat of an issue depending on the tech stack. Imagine the cost of running standard Java micro-services and the "solution" was to "spin up hundreds of more nodes". The worst that I have seen was a bank proudly having up to 8,000 - 10,000 separate micro-services. Just imagine the daily cost of that.

I'm not going to preach for thousands of micro-services necessarily, but they also make scaling easier and cheaper.

Not every service in your application receives the same load, and being able to scale up by increasing the 20% of Lambdas that receive 80% of the traffic, will result in massive savings too.

owebmaster

> but they also make scaling easier and cheaper.

Easier is arguable but cheaper is not, for sure.

Nextgrid

> a bank proudly having up to 8,000 - 10,000 separate micro-services

Monzo in the UK?

killer32

We’ve been running Rook-Ceph in production across multiple client environments. In one example, we built a setup with 8 refurbished Dell servers (128GB RAM, 8–14 JBOD disks each) over 10G networking. It supports geo-replication between sites and has been stable for over 2 years. Total hardware cost was under $100k.

Rook simplifies the operational overhead of Ceph quite a lot, especially in Kubernetes-native stacks. For teams with large data and HA requirements, it's been a solid on-prem alternative. SLA-backed managed services are also becoming more common, which helps reduce the operational burden even further.

andrewaylett

I'd like to note that they're not swapping like-for-like here. The differences might not be material for the use-case, and AWS probably don't even offer a product that does exactly what the Pure Storage system does, but S3 has a lot more resiliency as well as other features.

That doesn't mean they're wrong to move, it means you need to be careful to make sure that you pay for what you need, and try to avoid paying extra if all it gives you is stuff you don't want. I value the extra functionality, so I'm not moving my data off S3.

diggan

I wish people were as careful when thinking about adopting S3 as they seemingly are careful when others move away from S3.

Somewhere along the line, people starting defaulting to "at least 3 nodes for backend" and "cloud services for all infrastructure" even if the product they're building haven't even found product market fit.

Sure, if you know for a fact that your traffic will go and down more than 50% during a normal day, go for something that scales up and down quickly. But for most other use cases, the extra cost of cloud doesn't really make much financial sense, unless you're a fat VC-funded startup cat.

gavinray

That reminds me of a friend who ran a SaaS business off a Heztner VPC.

It was a Node.js app he deployed via SSH and ran under a systemd job.

Used directories of JSON files as a database and the business logic was handled by a single endpoint that took JSON RPC payloads with different action types and metadata.

The app scaled to ~10,000 daily users like this.

stavros

I run multiple SaaSes, some popular, some not so much, off a single $20/mo Hetzner VPS with Dokku on it. It works great, and I've never needed to worry about anything.

Meanwhile, I see friends working on MVPs with 1-2 non-paying customers who already have costs in the thousands of dollars a month, but "it's fine because we got free money for a year". Yes, but that means that your company now has an expiration date of a year.

andrewaylett

Hear, hear.

My personal AWS bill is roughly $10/month, all for S3. We're not talking millions here :). Personal compute is a mix of OVH and on-prem.

Work is an entirely different kettle of fish, at an entirely different scale, and primarily runs compute on spot: https://aws.amazon.com/blogs/aws/capacity-optimized-spot-ins...

Being able to scale down, rather than needing to pay for peak capacity, genuinely does save us large amounts of money. But it's a capability that we needed to build out, not something that happened by magic. And it does require that our services are big enough to scale for load, not just for redundancy.

nodesocket

Any idea what software they are using for the s3 data migration? rclone?

roxolotl

Yea I’d love to know the answer to this. Of course you can just download petabytes of data but I imagine there’s gotta be more to it than just rclone. Or maybe it is that simple.

Symbiote

I used rclone to migrate about 2PB of data between datacentres last year.

There were some ways to make it go a bit faster, found by reading the rclone manual, but otherwise no surprises.

I wasn't sure what the maximum transfer speed could have been, but as one side was still the production system I didn't care to reach the limit anyway. Over 10Gb/s anyway.

nodesocket

Were the files lots of smaller files or lots of large files? I imagine if lots of small files requires a lot of concurrent threads to utilize a 10Gbps connection.

ksec

I wonder how many Racks would there be after the total Cloud Exit from Basecamp. It is a nice way to think about scale.

The whole business that used to be 1/5 of the size 10 years ago were running on 100s of racks. And today it could be less than 10.

mindcrash

David wrote about that earlier. They are running on just about 8 racks in each DC (and they use 2).

That might exclude the new Pure Storage setup, though.

https://world.hey.com/dhh/we-stand-to-save-7m-over-five-year...

ksec

I think the Pure Storage is at best half a rack.

I dont think it is 8 rack per DC, it is 4 rack per DC.

>We currently spend about $60,000/month on eight dedicated racks between our two data centers

And they are only doing 64 Core. We will have 256 Zen 6 core next year. And it seems by next year, if they were willing to pay for density they could have fitted everything inside one rack per DC.

Exciting times.

Edit: Actually if Intel were to push 18A on server it would make performance / density even better.

nijave

>Actually if Intel were to push 18A on server it would make performance / density even better

Don't you end up needing pretty insane electric capacity not to mention cooling for that kind of density?

christina97

I very much appreciate the concrete numbers in this post!

Axsuul

Cloud is expensive but hardware failures are at least handled gracefully. With coloc you'd have some serious downtime. That means you'd need to get to a certain level of redundancy in order to have coloc make sense.

I'd love to move to coloc for my SaaS but it doesn't feel as resilient. Please correct me if I'm wrong as I'd love to move off the cloud.

cullenking

Enterprise server gear is pretty reliable, and you build your infra to be fully redundant. In our setup, no single machine failure will take us offline. I have 13 machines in a rack running a > 10mm ARR business, and haven't had any significant hardware failures. We have had occasional drive failures, but everything is a RAID1 at a minimum so they are a non issue.

We just replaced our top of rack firewall/proxies that were 11 years old and working just fine. We did it for power and reliability concerns, not because there was a problem. App servers get upgraded more often, but that's because of density and performance improvements.

What does cause a service blip fairly regularly is a single upstream ISP. I will have a second ISP into our rack shortly, which means that whole class of short outage will go away. It's really the only weak spot we've observed. That being said, we are in a nice datacenter that is a critical hub in the pacific northwest. I'm sure a budget datacenter will have a different class of reliability problems that I am not familiar with.

But again, an occasional 15m outage is really not a big deal business wise. Unless you are running a banking service or something, no one cares when something happens for 15m. Heck, all my banks regularly have "maintenance" outages that are unpredictable. I promise, no one relaly cares about five nines of reliability in the strong majority of services.

Axsuul

Sounds great. Yep, what I mean is you will need to make your systems fully redundant before considering cloud if your business depends on reliability and uptime. That usually requires the business to reach a certain scale first.

sgarland

Sure, but making something redundant is not really that difficult. HAProxy in front N nodes across M racks, ideally in separate DCs, and then a floating IP in front of your HAProxies. Set up hot standby for your DB.

I used to joke that my homelab almost had better reliability than any company I’d been at, save for my ISP’s spotty availability. Now that I have a failover WAN, it literally is more reliable. In the five years of running a rack, I’ve had precisely one catastrophic hardware failure (mobo died on a Supermicro). Even then, I had a standby node, so it was more of an annoyance (the standby ran hotter and louder) than anything.

floren

Hot spares and remote hands will get you a lot.

And when you get down to it, AWS isn't actually that reliable. I thought EBS volumes had magic redundancy foo but it turns out they can fail and they fail in a less obvious way than a regular disk. AWS networking is constantly bouncing and the virtual network adapters just sometimes stop working. They're also runnung old CPUs.

Depending on your workload you may be able pay off your new hardware with just a couple months' savings.

dilyevsky

This. With AFRs as they are today and warranty options and remote hands it’s hardly as bad as most people seem to think especially if their past recollection of working with colocation is from 20 years ago

Axsuul

Got any recommended providers?

nijave

Not sure cloud is necessarily more resilient--imo it's less resilient. On the other hand, it's fully automated with robust APIs so there's easy tools to mitigate failures like node/machine sets (scale sets, scaling groups, auto scaling groups, whatever the provider calls them)

You could use an orchestration solution to help handle automatic failover. There's a handful of container-based options from heavy duty Kubernetes to Docker Swarm and Nomad.

Containers are nice since you can bypass most of the host management where you only need basic security patching and installation of your container runtime. There's also k8s distros like OpenShift to make k8s setup easier if you go that route.

Axsuul

Yep I use orchestration (Nomad) but still you would need hardware redundancy. For example, the database server is currently a single point of failure. In the cloud, if there's a hardware failure, it will simply go down and come back up with a new instance. In coloc, you'd need to have the data center debug and replace hardware which means extended downtime.

sgarland

You would not need to have extended downtime. Every major RDBMS that I’m aware of supports standby nodes, and if you want, a full active-active cluster (not recommended, personally).

The downtime is as long as you have your health check monitoring interval set up for.

NorwegianDude

When using colocation, nothing is stopping people from storing the database data externally from the server running the database like some cloud services do. But doing so, either in cloud or not, does have a serious downside: greatly increased latency.

calderwoodra

Their current rate is about ~20% cheaper (4k/day), so missing the deadline isn't that big of a deal.

albert_e

Even if they move only part of the data before the deadline ..they stop paying for that the moment they delete it from S3. (Doesn't work that way in reverse.)

They can start saving thousands of dollars even before the deadline if they are able to start moving as soon as their own infra is up and incrementally move and delete data from S3. If their data consumers can work with that, that is.

If part of the data does not need the full S3 guarantees of durability and availability ..they could probably save more by using cheaper tiers while on S3.

otterley

Is Hey not able to take advantage of S3 intelligent tiering? It’s way cheaper than S3 standard for infrequently accessed objects. I wouldn’t be surprised if Hey’s access patterns lend themselves well to it.

(Yes, I’m aware that Hey’s decision to evacuate the cloud is a fait accompli, but I also can’t help but wonder if there are loads of potential savings that are being left out of the discussion.)

HN

It's five grand a day to miss our S3 exit

It's five grand a day to miss our S3 exit