Why is everything so scalable?
131 comments
October 9, 2025 · BirAdam
sgarland
THANK YOU. People look at me like I’m insane when I tell them that their overly-complicated pipeline could be easily handled by a couple of beefy servers. Or at best, they’ll argue that “this way, they don’t have to manage infrastructure.” Except you do - you absolutely do. It’s just been partially abstracted away, and some parts like OS maintenance are handled (not that that was ever the difficult part of managing servers), but you still absolutely need to configure and monitor whatever XaaS you’re renting.
macNchz
Having worked on various teams operating on infrastructure that ranged from a rack in the back of the office, to a few beefy servers in a colo, a fleet of Chef-managed VMs, GKE, ECS, and various PaaSes, what I've liked the most about cloud and containerized workflows is that they wind up being a forcing function for reproducibility, at least to a degree.
While it's absolutely 100% possible to have a "big beefy server architecture" that's reasonably portable, reproducible, and documented, it takes discipline and policy to avoid the "there's a small issue preventing {something important}, I can fix it over SSH with this one-liner and totally document it/add it to the config management tooling later once we've finished with {something else important}" pattern, and once people have been doing that for a while it's a total nightmare to unwind down the line.
Sometimes I want to smash my face into my monitor the 37th time I push an update to some CI code and wait 5 minutes for it to error out, wishing I could just make that band-aid fix, but at the end of the day I can't forget to write down what I did, since it's in my Dockerfile or deploy.yaml or entrypoint.sh or Terraform or whatever.
gaoshan
Anyone who says "they don’t have to manage infrastructure" I would invite to deal with a multi-environment Terraform setup and then tell me again about what they don't have to manage.
BobbyTables2
Have always felt the same.
I’ve seen an entire company proudly proclaim a modern multicore Xeon with 32GB RAM can do basic monitoring tasks that should have been possible with little more than an Arduino.
Except the 32GB Xeon was far too slow for their implementation...
hrimfaxi
Depending on your regulatory environment, it can be cost-effective to not have to maintain your own data center with 24/7 security response, environmental monitoring, fire suppression systems, etc. (of course, the majority of businesses are probably not interested in things like SOC 2)
wongarsu
This argument comes up a lot, but it feels a bit silly to me. If you want a beefy server you start out by renting one. $150/month will give you a server with a 24-core Xeon and 256GB of RAM, in a data center with everything you mentioned plus a 24/7 hands-on technician you can book. Preferably rent two servers, because reliability. Once you outgrow renting servers you start renting rack space in a certified data center with all the same amenities. Once you outgrow that you start renting entire racks, then rows of racks or small rooms inside the DC. Then you start renting portions of the DC. Once you have outgrown that you have to seriously worry about maintaining your own data center. But at that point you have so much scale that this will be the least of your worries.
skydhash
I look at what I can do with an old Mac mini (2011) and it’s quite good. I think the only issue with hardware is technical maintenance, but at the scale of a small company, that would probably be solved by a support contract with Dell and co.
huflungdung
What I say is that we massively underestimate just how fast computers are these days
ahartmetz
Indeed - they are incredibly fast, it's just buried under layers upon layers of stuff
donatj
Exactly this! The educational product I work on is used by hundreds of thousands of students a day, and the secret to our success is how simple our architecture is. PHP monoliths + Cache (Redis/Memcached) scale super wide basically for free. We don't really think about scalability, it just happens.
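(The whole "secret" is roughly the cache-aside pattern. A minimal sketch, in Python rather than PHP, with the Redis key, query, and db helper invented for illustration:)

    import json
    import redis  # pip install redis

    cache = redis.Redis(host="localhost", port=6379)

    def get_course(course_id, db):
        # Cache-aside: check the cache first, fall back to the database,
        # then populate the cache with a short TTL.
        key = f"course:{course_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
        # db.fetch_one is a stand-in for whatever query helper the app uses.
        row = db.fetch_one("SELECT id, title FROM courses WHERE id = %s", (course_id,))
        cache.set(key, json.dumps(row), ex=300)  # expire after 5 minutes
        return row

Every web worker stays stateless, so "scaling super wide" is just adding more processes behind the load balancer.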
I have a friend whose startup had a super complicated architecture that was falling apart at 20 requests per second. I used to be his boss a lifetime ago and he brought me in for a meeting with his team to talk about it. I was just there flabbergasted at "Why is any of this so complicated?!" It was hundreds of microservices, many of them black boxes they'd paid for but had no access to the source. Your app is essentially an async chat app, a fancy forum. It could have been a simple CRUD app.
I basically told my friend I couldn't help if I couldn't get to the source of the problematic nodes; they'd need to talk to the vendor. I explained that I'd probably rewrite it from the ground up. They ran out of runway and shut down. He's an AI influencer now...
dig1
> The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica.
You'd be surprised, but the most stable setups today are run this way. The problem is that it's hard to attract investors this way; they'll assume you are running on old or outdated tech. Everything should be serverless, agentic and, at least on paper, hyperscalable, because that's what sells.
> Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.
That is actually the goal of hyperscalers: they charge you a premium for far inferior results. The article also stated a very cold truth: "every engineer wants a fashionable CV that will help her get the next job", and you definitely won't get a job by saying: "I moved everything from AWS and put it behind haproxy on one bare-metal box for a $100/mo infra bill".
ahoka
Are those over-engineered systems even actually scalable? I know teams who designed a CQRS architecture using message queues and a distributed NoSQL database and failed to sustain 10 req/s for reads in what is basically a CRUD application. Heck, once someone literally said "But we use Kafka, why aren't we fast?!".
arealaccount
Exactly this. Every time I see Kafka or similar, it's a web of 10M microprocesses that take more time in invocation alone than if you just ran the program in one go.
_kb
How very kafkaesque.
sgarland
I watched in amusement as the architecture team at $JOB eagerly did a PoC of a distributed RDBMS, only to eventually conclude that the latency was too high. Gee… if only someone had told you that would happen when you mentioned the idea. Oh wait.
CableNinja
I thought I knew about scaled deployments before I started working where I do now. After starting here, I realized I had no idea what an environment of huuuuge scale actually was. I'd been part of multi-site deployments and scaled infra, but it was basically potatoes comparatively. We have a team whose platform we, on IT, call the DoS'er of the company. It's responsible for processing hundreds of thousands of test runs a day, and data is fed to a plethora of services after. The scale is so large that they are able to take down critical services, or deeply impact them, purely due to throughput, if a developer goes too far (like, say, uploading a million small logs to an S3 bucket every minute).
We have also been contacted by AWS asking us what the hell we are doing for a specific set of operations. We do a huge prep for some operations, and the prep feeds massive amounts of data through some AWS services, so much so that they thought we were under attack or had been compromised. Nope, just doin' data ingestion!
aeyes
The architecture you describe is OK because in the end it is a fairly simple website: little user interaction, a limited amount of content (at most a few million records), few content changes per day. The most complex part is probably having some kind of search engine, but even with 10 million videos an Elasticsearch index is probably no larger than 1GB.
The only problem is that there is a lot of video data.
ben_w
This is probably also true for 98% of startups.
I think most people don't realise that "10 million" records is small, for a computer.
(That said, I have had to deal with code that included an O(n^2) de-duplication where the test data had n ~= 20,000, causing app startup to take 20 minutes; the other developer insisted there was no possible way to speed this up, later that day I found the problem, asked the CTO if there was a business reason for that de-duplication, removed the de-duplication, and the following morning's stand-up was "you know that 20 minute startup you said couldn't possibly be sped up? Yeah, well, I sped it up and now it takes 200ms")
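(For the record, the shape of the fix if the dedup had actually been needed: a seen-set turns the quadratic scan into a linear one. A sketch with an invented "id" key:)

    def dedupe_quadratic(records):
        # O(n^2): every record is compared against everything kept so far.
        unique = []
        for r in records:
            if not any(r["id"] == u["id"] for u in unique):
                unique.append(r)
        return unique

    def dedupe_linear(records):
        # O(n): remember the keys we've already seen in a set.
        seen = set()
        unique = []
        for r in records:
            if r["id"] not in seen:
                seen.add(r["id"])
                unique.append(r)
        return unique

With n ≈ 20,000 that's the difference between roughly 400 million comparisons and 20,000 set lookups.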
phkahler
I thought you were going to say you reduced O(n^2) to O(n log n), but you just deleted the operation. Normally I'd say that's great, but just how much duplicate data is being left around now? Is that OK?
bobdvb
In streaming, your website is typically totally divorced from your media serving. Media serving is just a question of cloud storage and pointing at an HLS/DASH manifest in that object store. Once playback starts, the website itself does almost nothing. Live streaming adds more complexity, but it's still not much of a website problem.
Maintaining the media lifecycle (receiving, transcoding, making it available and removing it) is the big task, but that's not real-time; it's batch/event processing on a best-effort basis.
The biggest challenge with streaming is maintaining the content catalogue, which isn't just a few million records but rich metadata about the lifecycle and content relationships. User management and payments also tend to have significant overhead, especially when you're talking about international payment processing.
gf000
As opposed to what problem?
Like, I honestly have trouble listing many business problems/areas that would fail to scale with their expected user count, given reasonable hardware and technical competence.
Like YouTube and Facebook are absolute outliers. Famously, stackoverflow used to run on a single beefy machine (and the reason they changed their architecture was not due to scaling issues), and "your" startup ain't needing more scale than SO.
yobbo
Many startup business models have no chance of becoming profitable unless they reach a certain scale, but they might have less than 1% probability of reaching that scale. Making it scalable is easy work since it is deterministic, but growing customers is not.
Another perspective is that the defacto purpose of startups (and projects at random companies) may actually be work experience and rehearsal for the day the founders and friends get to interview at an actual FAANG.
I think the author's “dress for the job you want, not the job you have” nails it.
stavros
Unfortunately, you can't really get experience from solving hypothetical problems. The actual problems you'll encounter are different, and while you can get experience in a particular "scalable" stack, it won't be worth its maintenance cost for a company that doesn't need it.
nicoburns
I guess the work is deterministic, but it often (unintentionally) makes the systems being developed non-deterministic!
potatolicious
Ah yes. I once worked at a startup that insisted on Mongo despite not having anywhere near the data volume for it to make any sense at all. Like, we're talking 5 orders of magnitude off of what one would reasonably expect to need a Mongo deployment.
I was but a baby engineer then, and the leads would not countenance anything as pedestrian as MySQL/Postgres.
Anyway, fast forward a bit and we were tasked with building an in-house messaging service. And at that point Mongo's eventual consistency became a roaring problem. Users would get notifications that they had a new message, and then when they tried to read it it was... well... not yet consistent.
We ended up implementing all kinds of ugly UX hacks to work around this, but really we could've run the entire thing off of sqlite on a single box and users would've been able to read messages instantaneously, so...
nicoburns
I've seen similar with Firebase. Luckily I took over as tech lead at this company, so I was able to migrate us to Postgres. Amusingly, as well as being more reliable, the Postgres version (on a single small database instance) was also much faster than the previous Firebase-based version (due to it enabling JOINs in the database rather than in application code).
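(The difference in practice, sketched with sqlite and made-up tables so it's self-contained; the app-side "join" does one query per parent row, the database join is a single round trip:)

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
        INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
    """)

    # Application-side "join": N+1 round trips, one query per user.
    def totals_app_side():
        users = db.execute("SELECT id, name FROM users").fetchall()
        return {u["name"]: [r["total"] for r in db.execute(
            "SELECT total FROM orders WHERE user_id = ?", (u["id"],))] for u in users}

    # Database-side join: one round trip, and the planner can use indexes.
    def totals_db_side():
        return db.execute(
            "SELECT u.name, o.total FROM users u JOIN orders o ON o.user_id = u.id"
        ).fetchall()

Against a remote database (or Firebase over HTTP) every one of those extra round trips costs real latency, which is presumably where most of the speedup came from.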
ahartmetz
>“dress for the job you want, not the job you have”
I don't think I should dress down any further :>
CaptainOfCoit
I've seen startups killed because of one or two "influential" programmers deciding they need to start architecting the project for 1000TPS and 10K daily users, as "that's the proper way to build scalable software", while the project itself hasn't even found product-market fit yet and barely has users. Inevitably, the project needs to make a drastic change which is now painful to do because it no longer fits the perfect vision the lead(s) had.
Cue the programmers blaming the product team for "always changing their mind" as they discover what users actually need, and the product team blaming the developers for being hesitant to make changes; and when the programmers do agree, it takes a long time to undo the perfect architecture they've spent weeks fine-tuning against some imaginary future user-base.
stavros
> 1000TPS and 10K daily users
I absolutely agree with your point, but I want to point out, like other commenters here, that the numbers should be much larger. We think that, because 10k daily users is a big deal for a product, they're also a big deal for a small server, but they really aren't.
It's fantastic that our servers nowadays can easily handle multiple tens of thousands of daily users on $100/mo.
smoe
In my opinion, if those influential programmers actually architected around some concrete metrics like 1,000 TPS and 10K daily users, they would end up with much simpler systems.
The problem I see is much more about extremely vague notions of scalability, trends, best practices, clean code, and so on. For example we need Kafka, because Kafka is for the big boys like us. Not because the alternatives couldn’t handle the actual numbers.
CV-driven development is a much bigger issue than people picking overly ambitious target numbers.
strken
I've seen senior engineers get fired and the business suffer a setback because they didn't have any way to scale beyond a single low spec VPS from a budget provider, and their system crashed when a hall full of students tried to sign up together during a demo and each triggered 200ms of bcrypt CPU activity.
nasmorn
This seems weird. I have a lot of experience with Rails, which is considered super slow, but the scenario you describe is trivial: just get a bigger VPS and change a single env var. Even if you fucked up everything else, like file storage etc., you can still do that. If you build your whole application in a way where you can't scale anything, you should be fired. That isn't even easy to do.
hedora
People screw up the bcrypt thing all the time. Pick a single-threaded server stack (and run on one core, because Kubernetes), then configure bcrypt so brute-forcing 8-character passwords is slow on an A100. Configure Kubernetes to run it on a midrange CPU because you have no load. Finally, leave your cloud provider's HTTP proxy timeout set to the default.
The result is that 100% of auth requests time out once the login queue depth gets above a hundred or so. At that point the users retry their login attempts, so you need to scale out fast. If you haven't tested scale-out, then it's time to implement a bcrypt thread pool, or reimplement your application.
But at least the architecture I described "scales".
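(For what it's worth, the thread-pool fix is small. A hedged sketch with asyncio and the bcrypt package; the pool size and function names are placeholders:)

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    import bcrypt  # pip install bcrypt

    # Size the pool for the worst-case login burst, not the average load.
    _pool = ThreadPoolExecutor(max_workers=8)

    async def verify_password(password: str, stored_hash: bytes) -> bool:
        # Run the deliberately slow bcrypt check off the event loop, so a
        # burst of logins can't stall every other request behind it.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            _pool, bcrypt.checkpw, password.encode(), stored_hash
        )

It doesn't make any individual login faster, but it keeps a queue of them from turning into the 100%-timeout spiral described above.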
strken
Of course you should be fired for doing that! I meant the example as an illustration of how "you don't need to scale" thinking turns into A-grade bullshit.
You do, in fact, need to scale to trivial numbers of users. You may even need to scale to a small number of users in the near future.
sgarland
That’s a skill issue, not an indictment of the architecture's limitations. You can spin up N servers and load-balance them, as TFA points out. If the server is a snowflake and has nothing in IaC, that's again not an architectural issue, but a personnel / knowledge issue.
CaptainOfCoit
Wonder which one happens more often? Personally I haven't worked in the kind of "find the person to blame" culture which would lead to something like that, so I haven't witnessed what you're talking about, but I believe you that it does happen in some places.
ipsento606
> they didn't have any way to scale beyond a single low spec VPS from a budget provider
they couldn't redeploy to a high-spec VPS instead?
kunley
I frankly don't believe that anyone was fired overnight in a workplace where the userbase can be characterized as a "hall full of students". Doesn't happen at these places. Reprimanded, maybe.
hedora
More frequently, anyone that sounded the alarm about this was let go months ago, so the one that'd be fired is the one in charge of the firing.
Instead, they celebrate "learning from running at scale" or some nonsense.
the8472
1000TPS isn't that much? Engineer for low latency and, with a 10ms CPU budget per request, that'd be 1000 × 0.01s = 10 cores if it were entirely CPU-bound; less in practice, since part of the time is usually spent in I/O wait.
hedora
I doubt much time would be in I/O wait if this was really a scale up architecture. Ignoring the 100's of GB of page cache, it should be sitting on NVMe drives, where a write is just a PCIe round trip, and a read is < 1ms.
CaptainOfCoit
> 1000TPS isn't that much?
Why does that matter? My argument is: Engineer for what you know, leave the rest for when you know better, which isn't before you have lots of users.
the8472
What I'm saying is that "building for 1000TPS" is not what gets you an overengineered 5-layer microservice architecture. If you build for a good user experience (which includes low latency) you get that not-that-big scale without sharding.
drob518
And with CPUs now being shipped with 100+ cores, you can brute force that sucker a long way.
systems
Clearly this project failed because either
1. it was scaled for a very specific use case, or
2. it hadn't even found product-market fit.
Blaming the failure on designing for scale seems misplaced; you can scale while remaining agile and open to change.
otabdeveloper4
> 1000TPS and 10K daily users
That is not a lot. You can host that on a Raspberry Pi.
jwr
I don't get this scalability craze either. Computers are stupid fast these days and unless you are doing something silly, it's difficult to run into CPU speed limitations.
I've been running a SaaS for 10 years now. Initially on a single server, after a couple of years moved to a distributed database (RethinkDB) and a 3-server setup, not for "scalability" but to get redundancy and prevent data loss. Haven't felt a need for more servers yet. No microservices, no Kubernetes, no AWS, just plain bare-metal servers managed through ansible.
I guess things look different if you're using somebody else's money.
ben_w
> unless you are doing something silly, it's difficult to run into CPU speed limitations.
Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
That said, anyone know what's up with the slow deletion of Safari history? Clearly O(n), but as shown in this blog post still only deleted at a rate of 22 items in 10 seconds: https://benwheatley.github.io/blog/2025/06/19-15.56.44.html
phkahler
>> Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
On a non-scalable system you're going to notice that big-O problem and correct it quickly. On a scalable system you're not going to notice it until you get your AWS bill.
hedora
Also, instead of having a small team of people to fight scalable infrastructure configuration, you could put 1-2 full time engineers on performance engineering. They'd find big-O and constant factor problems way before they mattered in production.
Of course, those people's weekly status reports would always be "we spent all week tracking down a dumb mistake, wrote one line of code and solved a scaling problem we'd hit at 100x our current scale".
That's equivalent to waving a "fire me" flag at the bean counters and any borderline engineering managers.
drob518
One of the silliest things you can do to cripple your performance is build something that is artificially over distributed, injecting lots of network delays between components, all of which have to be transited to fulfill a single user request. Monoliths are fast. Yes, sometimes you absolutely have to break something into a standalone service, but that’s rare.
floating-io
For how many users, and at what transaction rate?
Not disagreeing that you can do a lot on a lot less than in the old days, but your story would be much more impactful with that information. :)
crazygringo
Scalability isn't just about CPU.
It's just as much about storage and IO and memory and bandwidth.
Different types of sites have completely different resource profiles.
sreekanth850
Microservices are not a solution for scalability. There are multiple options for building scalable software; even a monolith or a modular monolith with a properly load-balanced setup will drastically reduce the complexity relative to microservices and still get massive scale. The only bottleneck will be the DB.
jcarrano
I think part of the problem is (some) programmers being unable to draw clear encapsulation boundaries when writing a monolith. I'm not even referring to imposing a discipline for a whole team, but the ability to design a clean internal API and stick to it oneself.
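(Concretely, the discipline looks something like this: one module exposes a small interface and the rest of the monolith only ever calls that. Names here are invented for illustration.)

    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class Invoice:
        customer_id: int
        amount_cents: int

    class Billing(Protocol):
        # The only surface the rest of the monolith is allowed to touch.
        def charge(self, invoice: Invoice) -> None: ...

    class InProcessBilling:
        # Today this is a plain function call inside the same process.
        def charge(self, invoice: Invoice) -> None:
            print(f"charging customer {invoice.customer_id}: {invoice.amount_cents} cents")

    def checkout(billing: Billing, customer_id: int, total_cents: int) -> None:
        # Callers depend on the interface, not the implementation, so the module
        # could later move behind a queue or an RPC call without touching them.
        billing.charge(Invoice(customer_id, total_cents))

Nothing about that requires a network hop; the boundary is the point, not the transport.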
radarsat1
> scalability needs a whole bunch of complexity
I am not sure this is true. Complexity is a function of architecture. Scalability can be achieved through abstraction; it doesn't necessarily imply a highly coupled architecture. In fact, scalability benefits from decoupling as much as possible, which effectively reduces complexity.
If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way? Scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms. On the other hand, if it suddenly needs to coordinate with 50 other Lambdas or services, then you have complexity -- usually scalability will suffer in this case, as things become more and more synchronous and interdependent.
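(A minimal sketch of what "forced to think in stateless terms" means in practice; the event shape and bucket name are hypothetical:)

    import json
    import boto3

    s3 = boto3.client("s3")  # clients can be created at import time and reused

    def handler(event, context):
        # No module-level mutable state, no local files the next invocation
        # depends on: everything comes in via the event and goes out to S3.
        order = json.loads(event["body"])
        result = {"order_id": order["id"], "total": sum(order["line_totals"])}
        s3.put_object(
            Bucket="example-results-bucket",
            Key=f"orders/{order['id']}.json",
            Body=json.dumps(result),
        )
        return {"statusCode": 200, "body": json.dumps(result)}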
> The monolith is composed of separate modules (modules which all run together in the same process).
It's of course great to have a modular architecture, but whether or not the modules run in the same process should be an implementation detail. Barriers should be explicit. By writing it all to depend on local, synchronous, same-process logic, you are likely building in all sorts of implicit barriers that will become hidden dangers when suddenly you do need to scale. And by the way, that's one of the reasons to think about scaling in advance: when the need comes, it comes quickly.
It's not that you should scale early. But if you're designing a system architecture, I think it's better to think about scaling, not because you need it, but because doing so forces you to modularize, decouple, and make synchronization barriers explicit. If done correctly, this will lead to a better, more robust system even when it's small.
Just like premature optimization -- it's better not to get caught up doing it too early, but you still want to design your system so that you'll be able to do it later when needed, because that time will come, and the opportunity to start over is not going to come as easily as you might imagine.
saidinesh5
> If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way, scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms.
What you are describing is already an example of premature optimization. The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "use S3 to store the results" and "use a queue to manage the jobs" decisions.
You don't even know if that job is the bottleneck that needs to scale. For all you know, writing a simple monolithic script to deploy onto a VM/server would be a much simpler deployment. Just use the RAM/filesystem as the cache. Write the results to the filesystem/database. When the time comes to scale you know exactly which parts of your monolith are the bottleneck that need to be split. For all you know, you can simply replicate your monolith, shard the inputs, and the scaling is already done. Or just use the DB's replication functionality.
To put things into perspective, even a cheap raspberry pi/entry level cloud VM gives you thousands of postgres queries per second. Most startups I worked at NEVER hit that number. Yet their deployment stories started off with "let's use lambdas, s3, etc..". That's just added complexity. And a lot of bills - if it weren't for the "free cloud credits".
bpicolo
> The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "Use S3 to store the results" and "use a queue to manage the jobs" decisions.
I think the most important one you get is that inputs/outputs must always be < 6MB in size. It makes sense as a limitation for Lambda's scalability, but you will definitely dread it the moment a 6.1MB use case makes sense for your application.
CaptainOfCoit
> It's of course great to have a modular architecture, but whether or not they run in the same process should be an implementation detail
It should be, but I think "microservices" somehow screwed up that. Many developers think "modular architecture == separate services communicating via HTTP/network that can be swapped", failing to realize you can do exactly what you're talking about. It doesn't really matter what the barrier is, as long as it's clear, and more often than not, network seems to be the default barrier when it doesn't have to be.
dapperdrake
The complexity that makes money is all the essential complexity of the problem domain. The "complexity in the architecture" can only add to that (and often does).
This is the part that is about math as a language for patterns as well as research for finding counter-examples. It’s not an engineering problem yet.
Once you have product-market fit, then it becomes an engineering problem.
abujazar
I've seen my share of insanely over-engineered Azure locked-in applications that could easily have been run on an open source stack on a $20 VM.
alpine01
There's a now famous Harvard lecture video on YouTube of Zuckerberg earlier in the Facebook days, where he walks through the issues they hit early on.
https://www.youtube.com/watch?v=xFFs9UgOAlE
I watched it ages ago, but I seem to remember one thing I liked was that each time they changed the architecture, it was to solve a problem they had, or were beginning to have. They seemed to stay away from premature optimization and instead took the approach of tackling problems as they appeared, rather than imagining problems long before (if ever) they occurred.
It's a bit like the "perfect is the enemy of done" concept - you could spend 2-3x the time making it much more scalable, but that might have an opportunity cost which weakens you somewhere else or makes it harder/more expensive to maintain and support.
Take it with a pinch of salt, but I thought it seemed like quite a good level-headed approach to choosing how to spend time/money early on, when there's a lot of financial/time constraints.
acron0
Ugh, there is just something so satisfying about developer cynicism. It gives me that warm, fuzzy feeling.
I basically agree with most of what the author is saying here, and my feeling is that most developers are at least aware that they should resist technical self-pleasure in pursuit of making sure the business/product they're attached to is actually performing. Are there really people out there who still reach for Meta-scale by default? Who start with microservices?
lpapez
> Are there really people out there who still reach for Meta-scale by default? Who start with microservices?
Anecdotally, the last three greenfield projects I was a part of, the Architects (distinct people in every case) began the project along the lines of "let us define the microservices to handle our domains".
Every one of those projects failed, in my opinion not primarily owing to bad technical decisions - but they surely didn't help either by making things harder to pivot, extend and change.
Clean Code ruined a generation of engineers IMO.
robertlagrant
I think this sounds more like Domain Driven Design than Clean Code.
ahoka
It kinda started with Clean Code. I remember some old colleagues walking around with the book in their hand, deleting ten-year-old comments in every commit they made: "You see, we don't need that anymore, because the code describes itself." It made a generation (generations?) of software developers think that all the architectural patterns had been found, that we could finally do real engineering and just had to find the one that fits the problem at hand! Everyone asked about the SOLID principles during interviews, because that's how real engineers design! I think "cargo cult" was getting used at that time too, to describe this phenomenon.
drob518
> The first problem every startup solves is scalability. The first problem every startup should solve is “how do we have enough money to not go bust in two months”, but that’s a hard problem, whereas scalability is trivially solvable by reading a few engineering blogs, and anyway it’s not like anyone will ever call you out on it, since you’ll go bust in two months.
I laughed. I cried. Having a back full of microservices scars, I can attest that everything said here is true. Just build an effin monolith and get it done.
treve
A bit of an alternative take on this: I talk to a lot of folks at small start-ups (in Toronto, if that matters), and it seems like most people actually get this right and understand not to bring in complexity until later. Things like microservices seem to be mostly understood as a tool that's not really meant to solve a real scalability problem, and as a massive liability early on.
The exceptions are usually just inexperienced people at the helm. My feeling is, hire someone with adequate experience and this is likely not an issue.
I do think architecture astronauts tend to talk a lot more about their houses of cards, which makes it seem like these set ups are more popular than they are.
smokel
ThoughtWorks gathers this phenomenon under the term "envy": Web Scale envy [1] or Big Data envy [2] are two relevant blips on their technology radar. It is typically better to keep things simple.
[1] https://www.thoughtworks.com/radar/techniques/high-performan...
[2] https://www.thoughtworks.com/radar/techniques/big-data-envy
Just to be honest for a bit here... we also should be asking what kind of scale?
Quite a while ago, before containers were a thing at all, I did systems for some very large porn companies. They were doing streaming video at scale before most, and the only other people working on video at that scale were at YouTube.
The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica. Storage (at that time) was usually done with glusterfs. This was scalable enough at the time for hundreds of thousands of concurrent users, though the video quality was quite a bit lower than what people expect today.
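(The primary/replica part of that is mostly just routing reads and writes to different connections. A sketch in Python rather than PHP, with hosts, credentials and queries invented:)

    import random

    import pymysql  # pip install pymysql

    PRIMARY = dict(host="db-primary", user="app", password="secret", database="tube")
    REPLICAS = [dict(host="db-replica-1", user="app", password="secret", database="tube")]

    def connect(readonly: bool):
        # Writes always go to the primary; reads fan out across the replicas.
        cfg = random.choice(REPLICAS) if readonly else PRIMARY
        return pymysql.connect(**cfg)

    def record_view(video_id: int):
        with connect(readonly=False) as conn, conn.cursor() as cur:
            cur.execute("UPDATE videos SET views = views + 1 WHERE id = %s", (video_id,))
            conn.commit()

    def top_videos(limit: int = 20):
        with connect(readonly=True) as conn, conn.cursor() as cur:
            cur.execute("SELECT id, title FROM videos ORDER BY views DESC LIMIT %s", (limit,))
            return cur.fetchall()

With the PHP tier kept stateless, adding capacity was mostly just adding boxes behind haproxy.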
Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.