We cut our MongoDB costs by 90% by moving to Hetzner

0xbadcafebee

> Here's how we managed to cut our costs by 90%

You could cut your MongoDB costs by 100% by not using it ;)

> without sacrificing performance or reliability.

You're using a single server in a single datacenter. MongoDB Atlas is deployed to VMs on 2-3 AZs. You don't have close to the same reliability. (I'm also curious why their M40 instance costs $1000, when the Pricing Calculator (https://www.mongodb.com/pricing) says M40 is $760/month? Was it the extra storage?)

> We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers

This means you're going to have multiple outages, AND incur more cross-internet costs. How does going to Hetzner make you more resilient to outages? You have one server in one datacenter. Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB). You do not have a centralized or single point of failure design with AWS. They're not dummies; plenty of their services are operated independently per region. But they do expect you to use their infrastructure intelligently to avoid creating a single point of failure. (For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.)

I get it; these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense. There are only a few circumstances where you really have to transfer out a lot of traffic, or need very large storage, where cloud pricing is just too much of a premium. The whole point of using the cloud is to use it as a competitive advantage. Giving yourself an extra role (sysadmin) in addition to your day job (developer, data scientist, etc.) and more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc.), with lower reliability and fewer services, isn't an advantage.

toast0

> Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB).

If traffic cost is relevant (which it is for a lot of use cases), Hetzner's price of $1.20/TB ($0.0012 / GB) for internet traffic [1] is an order of magnitude less than what AWS charges between AWS locations in the same metro. If you host only at providers with reasonable bandwidth charges, most likely all of your bandwidth will be billed at less than what AWS charges for inter-zone traffic. That's obscene. As far as I can tell, clouds are balancing their budgets on the back of traffic charges, but nothing else feels under cost either.
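As a rough sanity check on the figures quoted in this thread, here is a minimal Python sketch that puts them on the same per-TB basis. It assumes a decimal terabyte (1 TB = 1000 GB) and takes the $1.20/TB, $0.02/GB, and $0.08/GB numbers as quoted above, not from any current price list; the function and constant names are made up for illustration.

```python
# Per-TB traffic cost comparison using the figures quoted in this thread.
# Assumption: 1 TB = 1000 GB (decimal units); prices are taken from the
# comments above and are not verified against any current price list.

GB_PER_TB = 1000

HETZNER_PER_TB = 1.20          # USD/TB, internet traffic, as quoted
AWS_INTER_ZONE_PER_GB = 0.02   # USD/GB, as quoted
AWS_EGRESS_PER_GB = 0.08       # USD/GB, as quoted


def per_tb(price_per_gb: float) -> float:
    """Convert a USD/GB price to USD/TB."""
    return price_per_gb * GB_PER_TB


def ratio_to_hetzner(price_per_gb: float) -> float:
    """How many times the quoted Hetzner USD/TB price a USD/GB price works out to."""
    return per_tb(price_per_gb) / HETZNER_PER_TB


if __name__ == "__main__":
    for label, price in [("AWS inter-zone", AWS_INTER_ZONE_PER_GB),
                         ("AWS egress", AWS_EGRESS_PER_GB)]:
        print(f"{label}: ${per_tb(price):.2f}/TB, "
              f"{ratio_to_hetzner(price):.1f}x Hetzner")
```

On these numbers the inter-zone rate comes out at roughly 17x the Hetzner rate, which fits "an order of magnitude".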

> For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.

This doesn't always work out. During the GCP outage, my service was running fine, but other similar services were having trouble, so we attracted more usage, which we would have scaled up for, except that the GCP outage prevented that. Cloud makes it very expensive to run scaled beyond current needs and promises that scale out will be available to do just in time...

[1] https://docs.hetzner.com/robot/general/traffic/

koakuma-chan

> You could cut your MongoDB costs by 100% by not using it ;)

I cut my MongoDB costs by 100% by piping my data to /dev/null.

goastler

> you're going to have multiple outages

Us: 0, AWS: 1. Looking good so far ;)

> AND incur more cross-internet costs

Hetzner has no traffic limit on the machine (only a speed limit), so we can go nuts.

I understand your point wrt the cloud, but I spend as much time debugging/building a cloud deployment (atlas :eyes: ) as I do a self-hosted solution. AWS gives you all the tools to build a super reliable data store, but many people just chuck something on us-east-1 and go. There's your single point of failure.

Given we're constructing a many-node decentralised system, self-hosted actually makes more sense for us because we've already had to become familiar enough to create a many-node system for our primary product.

When/if we have a situation where we need high data availability I would strongly consider the cloud, but in the situations where you can deal with a bit of downtime you're massively saving over cloud offerings.

We'll post a 6-month and 1-year follow-up to update the scoreboard above

kdazzle

I’m curious about the resilience bit. Are you planning on some sort of active-active setup with mongo? I found it difficult on AWS to even do active-passive (I guess that was docdb), since programmatically changing the primary write node instance was kind of a pain when failing over to a new region.

runako

> many people just chuck something on us-east-1 and go

Even dropping something on a single EC2 node in us-east-1 (or at Google Cloud) is going to be more reliable over time than a single dedicated machine elsewhere. This is because they run with a layer that will e.g. live migrate your running apps in case of hardware failures.

The failure modes of dedicated are quite different than those of the modern hyperscaler clouds.

chubot

It's not an apples-to-apples comparison, because EC2 and Google Cloud have ephemeral disk - persistent disk is an add-on, which is implemented with a complex and frequently changing distributed storage system

On the other hand, a Hetzner machine I just rented came with Linux software RAID enabled (md devices in the kernel)

---

I'm not aware of any comparisons, but I'd like to see some

It's not straightforward, and it's not obvious the cloud is more reliable

The cloud introduces many other single points of failure, by virtue of being more complex

e.g. human administration failure, with the Unisuper incident

https://news.ycombinator.com/item?id=40366867

https://arstechnica.com/gadgets/2024/05/google-cloud-acciden... - “Unprecedented” Google Cloud event wipes out customer account and its backups

Of course, dedicated hardware could have a similar type of failure, but I think the simplicity means there is less variety in the errors.

e.g. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable - Leslie Lamport

MobileVet

Thanks for sharing the story and committing to a 6-month and 1 year follow up. We will definitely be interested to hear further how it went over time.

In the meantime, I am curious where the time was spent debugging and building Atlas deployments. It certainly isn't the cheapest option, but it has been quite a '1 click' solution for us.

mnutt

> we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.

I think it was just luck of the draw that the failure happened in this way and not some other way. Even if APIs falling over but EC2 instances remaining up is a slightly more likely failure mode, it means you can't run autoscaling and can't depend on spot instances, which you can lose in an outage and not be able to replace.

dspillett

> You're using a single server in a single datacenter.

This is a common problem with “bare metal saved us $000/mo” articles. Bare metal is cheaper than cloud by any measure, but the comparisons given tend to be misleadingly exaggerated as they don't compare like-for-like in terms of redundancy and support, and after considering those factors it can be a much closer result (sometimes down as far as familiarity and personal preference being more significant).

Of course unless you are paying extra for multi-region redundancy things like the recent us-east-1 outage will kill you, and that single point of failure might not really matter if there are several others throughout your systems anyway, as is sometimes the case.

PenguinCoder

Premature optimization. Not every single service needs or requires 5 nines.

mads_quist

OK guys, running on a single instance is REALLY a BAD IDEA for non-pet-projects. Really bad! Change it as fast as you can.

I love Hetzner for what they offer but you will run into huge outages pretty soon. At least you need two different network zones on Hetzner and three servers.

It's not hard to set up, but you need to do it.

MaKey

I think you're being overly dramatic. In practice I've seen complexity (which HA setups often introduce) causing downtimes far more often than a service being hosted only on a single instance.

lewiscollard

Yes, any time someone says "I'm going to make a thing more reliable by adding more things to it" I either want to buy them a copy of Normal Accidents or hit them over the head with mine.

antoniojtorres

Agree on single instance, but as for Hetzner: I run 100+ large bare metal servers there, have for at least 5 years, and there’s only been one significant outage. We do spread across all their datacenter zones and replicate, so it’s all been manageable. It’s worth it for us, very worth it.

tracker1

As much as I like MongoDB as a developer, the last thing I ever want to do is manage a deployment again.

I feel like some of these articles miss a few points, even in this one. The monthly cost of the MongoDB hosting was around $2k... that's less than a full-time employee's salary, and if it can spare you the cost of an employee, that's not a bad thing.

On the flip side, if you have employee talent that is already orchestrating Kubernetes across multiple clouds, then sure it makes sense to internalize services that would otherwise be external if it doesn't add too much work/overhead to your team(s).

In either case, I don't think the primary driver in this is cost at all. Because that 90% quoted reduction in hosting costs is balanced by the ongoing salary of the person or people who maintain those systems.

rmoriz

I'm a big fan of owning the stack, but why not spend the money on redundancy? At least a couple of machines in a different data center at Hetzner or another provider (OVH, Scaleway, Vultr, …) can easily fit your budget.

arbol

We will be adding additional db servers and running our own replica set eventually. We're just not there yet. Thanks for reading!

hinkley

But then you’ll be tripling your costs.

Business people are weird about numbers. You should have claimed 70% even if the replicas do nothing and made them work later on. This is highly likely to bite you on the ass.
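For anyone wondering where the 70% comes from, here is a minimal Python sketch of the arithmetic. It assumes each replica costs the same as the current single server and that the old bill stays the baseline; both are simplifications, and the function name and dollar amounts are made up for illustration.

```python
# Savings relative to the old bill when running N identical servers.
# Assumption: every replica costs the same as the current single server.

def savings_percent(original_cost: float, cost_per_server: float, servers: int) -> float:
    """Percentage saved versus the original bill with `servers` identical machines."""
    new_total = cost_per_server * servers
    return (original_cost - new_total) / original_cost * 100


if __name__ == "__main__":
    original = 100.0    # old bill, arbitrary units
    per_server = 10.0   # 10% of the old bill, i.e. the claimed 90% saving
    for n in (1, 3):
        print(f"{n} server(s): {savings_percent(original, per_server, n):.0f}% saved")
```

One server at 10% of the old bill is the 90% headline; three of them take 30% of the old bill, leaving 70%.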

PeterZaitsev

Note, if you're looking for MongoDB Enterprise features you can find many of them with Percona Server for MongoDB, which you can use for free the same way as MongoDB Community

arbol

Nice, thanks for the tip!

kachapopopow

Always consider if 12 hours of lost revenue is worth the savings. Recently Hetzner has been flaky, with minimal or no response from support or even status updates that anything was wrong. My favorite was them blaming an issue on my side, only to post a maintenance status update the day after about congestion.

arbol

Atlas wasn't giving us any support for $3K per month. Hetzner at least have some channel to contact them, which is an improvement. That said, if their uptime is rubbish then we'll probably migrate again. Moving back to Atlas is not an option as we were getting hammered by the data transfer costs and this was only going to increase due to our architecture. Thanks for reading!

kosherhurricane

500GB isn't a lot of data, and $3K/month seems like extortion for that little data.

Having said that, the MongoDB pricing page promises 99.995% uptime, which is outstanding, and it would probably be hard to beat that doing it oneself, even after adding redundancy. But maybe you don't need that much uptime for your particular use case.
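To put that percentage in concrete terms, here is a minimal Python sketch that converts an uptime figure into allowed downtime. It assumes a 365-day year and a 30-day month, and the function name is made up for illustration; it says nothing about how any provider actually measures or credits downtime.

```python
# Convert an uptime percentage into the downtime it permits.
# Assumptions: a 365-day year and a 30-day month.

MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_minutes: int) -> float:
    """Minutes of downtime permitted in a period at the given uptime percentage."""
    return (1 - uptime_percent / 100) * period_minutes


if __name__ == "__main__":
    for uptime in (99.9, 99.995, 99.999):
        yearly = allowed_downtime_minutes(uptime, MINUTES_PER_YEAR)
        monthly = allowed_downtime_minutes(uptime, MINUTES_PER_MONTH)
        print(f"{uptime}%: {yearly:.2f} min/year, {monthly:.2f} min/month")
```

Under these assumptions, 99.995% allows about 26 minutes of downtime a year, or a little over 2 minutes in a 30-day month.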

arbol

It's more like 700GB now on the new server and we were about to have to migrate to a higher tier on Atlas.

> maybe you don't need that much uptime for your particular use case.

Correct. Thanks for reading!

zamalek

OVH is allegedly pretty good. I host all my personal stuff on Hetzner right now so I can't speak to it personally.

arbol

We also use OVH and have so far not had any downtime in about 6 months.

izacus

My Hetzner instances all have higher reliability and uptime than AWS deployments. For years now.

That was an interesting surprise.

0x073

I've been using Hetzner for 5 years and never had issues, with only 1 downtime in one data center.

zkmon

Atlas is plain robbery. I see companies paying 600K USD/month on a few clusters, mostly used for testing. The problem is they got locked in by doing a huge migration of their apps, and switching to a different tech would easily take 2 to 5 years.

CodesInChaos

MongoDB Atlas is so overpriced that you can probably already save 90% by moving to AWS.

computerfan494

Most of the cost in their bill wasn't from MongoDB, it was cost passed on from AWS

darth_avocado

Highly doubt that. MongoDB has 5000 well-paid employees and is not a big loss-making enterprise. If most of the cost was passed through to AWS, they’d not be able to do that. Their quarterly revenue is $500M+, but they also spend $200M on sales and marketing and $180M on R&D. (All based on their filings)

CodesInChaos

Was it? Assuming an M40 cluster consists of 3 m6g.xlarge machines, that's $0.46/hr on-demand compared to Atlas's $1.04/hr for the compute. Savings plans or reserved instances reduce that cost further.
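A minimal Python sketch of that comparison on a monthly basis. It assumes 730 hours per month and reads the $0.46/hr figure as the total for all three instances, which is my interpretation of the comment; the hourly prices are taken as quoted, not checked against current price lists, and the names are made up for illustration.

```python
# Monthly compute cost from the hourly figures quoted above.
# Assumptions: 730 hours per month; $0.46/hr is the combined on-demand
# price for the three EC2 instances; prices are as quoted in the comment.

HOURS_PER_MONTH = 730

EC2_HOURLY = 0.46     # USD/hr, three instances combined (assumed)
ATLAS_HOURLY = 1.04   # USD/hr, Atlas M40 compute, as quoted


def monthly_cost(hourly_rate: float) -> float:
    """USD per month for a resource billed at a flat hourly rate."""
    return hourly_rate * HOURS_PER_MONTH


def markup(atlas_hourly: float, ec2_hourly: float) -> float:
    """Atlas compute price as a multiple of the raw EC2 price."""
    return atlas_hourly / ec2_hourly


if __name__ == "__main__":
    print(f"EC2:   ${monthly_cost(EC2_HOURLY):.2f}/month")
    print(f"Atlas: ${monthly_cost(ATLAS_HOURLY):.2f}/month")
    print(f"Markup: {markup(ATLAS_HOURLY, EC2_HOURLY):.2f}x")
```

That works out to about $336 versus $759 a month, so on these figures the Atlas compute price is a bit over twice the on-demand price. The Atlas number also lands close to the $760/month calculator figure mentioned at the top of the thread.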

CodesInChaos

I don't remember the numbers (90% is probably a bit exaggerated) but our savings of going from Atlas to MongoDB Community on EC2 several years ago were big.

In addition to direct costs, Atlas also had expensive limitations. For example, we often spin up clone databases from a snapshot which have lower performance and no durability requirements, so a smaller non-replicated server suffices, but Atlas required those to be sized like the replicated high-performance production cluster.

lunias

Just host on a server in your basement. Put another instance in someone else's basement. I'm only half joking - track the downtime.

ianberdin

I’m starting to worry about this Hetzner trend. It could end up sending the price skyrocketing.

goastler

There are other providers (OVH, etc.), so I'm sure the price will remain competitive.

arbol

Hopefully not. Their console is pretty bad so I reckon that will put a lot of people off.

the_duke

The cloud console is pretty good though? Even does live sync!

The old one for dedicated servers (robot) is horribly outdated though.

arbol

Ah right, we're on robot so I've not seen the cloud one. Robot is old! :)

patrickmcnamara

The new console is completely fine.

dehrmann

EC2 is sort of a ceiling price.

righthand

We’re just going to end up with everyone moving from Amazon to Hetzner and the same issue will remain. High prices, lock-in, etc. will appear.

We need an American “get off American big tech” movement.

Differentiate, people! Reading “we moved from X to Y” does not mean everyone should move from X to Y; it means start considering the Y values and research other Y’s around you.

arbol

We also use OVH, Contabo, Hostwinds... Architect so you can be multi-provider and reduce internet centralisation!

righthand

Nice, if you write an article about it, try to leave the focus off of a single hosting provider. Encouraging the differentiation is important too (next time! I’m not dogging the movement or your efforts in this article, I love to see reduced reliance on Amazon in general).

cmrdporcupine

> We need an American “get off American big tech” movement.

As a non-American, I use Hetzner precisely to have my projects not hosted anywhere near the US.

zzzeek

Hetzner is German?

Lapel2742

> Hetzner is German?

Yes. Hetzner is a German company from Gunzenhausen.

https://en.wikipedia.org/wiki/Hetzner

tuhgdetzhh

I recently did a total cost of ownership analysis for moving off AWS to Hetzner: https://beuke.org/hetzner-aws/

euph0ria

You probably want to store the backup somewhere else, i.e. not Hetzner.

They are known to just cancel accounts and cut access.

sdoering

Any proof of that? I am a Hetzner customer and had never heard of this before. Would be good to know what I got into.

ch2026

A few years back I launched an io game and used Hetzner as my backend. An hour into launch day they null routed my account because their anti-abuse system thought my sudden surge in websocket connections was an attack (unclear if they thought it was inbound or outbound doing the attacking).

I had paid for advertising on a few game curation sites plus youtubers and streamers. Lovely failure all thanks to Hetzner. Took 3 days and numerous emails with the most arrogant Germans you’ve ever met before my account was unlocked.

I switched to OVH and while they’re not without their own faults (reliability is a big one), it’s been a far better experience.

__turbobrew__

OVH also null routes, it has happened to me.

It seems like you have to go to one of the big boys like Hurricane Electric where you are allowed to use the bandwidth you paid for without someone sticking their fingers in it.

arcanemachiner

There are a lot of such stories if you go digging around HN and reddit threads. Haven't seen a lot of these stories in a while, so it may be happening less now.

arbol

Good shout. I think we'll also run replicas on other providers. We've got some complex geo-fencing stuff to do with regards to data hence why we're just on Hetzner right now.