Leaving serverless led to performance improvement and a simplified architecture

jjice

Like the article says, I think serverless has its place, but I don't think it's for most applications. I can't see myself _ever_ using serverless services as a core part of my application for pretty much any startup, if I can avoid it. The infrastructure overhead is actually worse, IMO.

Everything is so platform specific and it's much stranger to test and develop against locally. Each platform has a different way to test, and the abstraction layers that exist (unless this has changed recently) always had pitfalls, since there are no true standards.

I'd much rather have a docker image as my deliverable "executable" so I can test, but still abstract away some stuff, like environment setup. Giving me a minimal Linux environment and filesystem feels like the most comfortable level of abstraction for me to develop well and also deploy and run in production effectively. I can also run that on demand or (most commonly) run that as a server that sits and waits for requests.

dprotaso

That’s why Knative (Serverless on Kubernetes) accepts containers. It’s the standard packaging format that lets you lift and shift apps to many different platforms.

wredcoll

Let me tell you about all the fun I'm having trying to execute my amazon lambda app locally so I can test before deploying...

voodooEntity

As someone who worked with serverless for multiple years (mostly Amazon Lambda, but others too), I can absolutely confirm the author's points.

While it "takes away" some work from you, it adds work elsewhere to solve the "artificially induced problems".

Another example I hit was a hard upload limit. I ported an application to a serverless variant that had an import API for huge customer exports. Shouldn't be a problem, right? Just set up an ingest endpoint and some background workers to process the data.

But then I learned that I can't upload more than 100 MB at a time through the "API gateway" (basically their proxy to invoke your code), and when I asked whether I could change that somehow, I was just told to tell our customers to upload smaller file chunks.

While this sounds logical from a "technical" perspective, our customers are not going to replace all their software just so we get a "nicer upload strategy".

For me this is comparable to "it works in a vacuum" type of things. It's cool in theory, but as soon as it hits reality you will realize quite fast that the time and money you saved by changing from permanently running machines to serverless, you will spend in other ways to work around the serverless peculiarities.

akdev1l

The way to work around this issue is to provide a presigned S3 url

Have the users upload to s3 directly and then they can either POST you what they uploaded or you can find some other means of correlating the input (eg: files in s3 are prefixed with the request id or something)

I agree this is annoying and maybe I’ve been in AWS ecosystem for too long.

However, having an API that accepts an unbounded amount of data is a good recipe for DoS attacks. I suppose the 100MB limit is outdated as the internet has gotten faster, but eventually we do need some limit.
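The presigned-URL flow described above could be sketched roughly like this. It is a minimal sketch under assumptions, not anyone's production code: `start_upload`, `request_id_from_key`, and the `presign` callable are hypothetical names. On AWS, `presign` would typically wrap boto3's `generate_presigned_url("put_object", ...)`; it is injected here so the correlation logic can run without credentials.

```python
import uuid
from typing import Callable, Dict


def start_upload(filename: str, presign: Callable[[str], str]) -> Dict[str, str]:
    """Reserve an object key for one upload and return where the client should PUT the file."""
    request_id = uuid.uuid4().hex      # hypothetical correlation id
    key = f"{request_id}/{filename}"   # object key is prefixed with the request id
    return {"request_id": request_id, "key": key, "upload_url": presign(key)}


def request_id_from_key(key: str) -> str:
    """Recover the request id from the key of an uploaded object."""
    return key.split("/", 1)[0]
```

The client uploads straight to object storage, so the large body never passes through the API gateway; the backend later matches the object to the request by its key prefix.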

voodooEntity

Well, I partly agree, and if I were the one building the counterpart, I probably would have used presigned S3 URLs too.

In this specific case I'm getting old-school file upload requests from software that was partly written before the 2000s. No one is going to adjust anything any more.

And yes, just accepting giant uploads is far from good in terms of "security" like DoS, but we are talking about CSV files somewhere between 100 and 300 MB (I called them "huge" because in terms of product data, 200-300 MB of text includes quite a lot). Not great, but we try to satisfy our customers' needs.

But yes, like all the other points: everything is solvable somehow, it just needs us to spend more time solving something that technically wasn't a real problem in the first place.

Edit: Another funny example. In a similar process on another provider I downloaded files in a similar size range from S3 to parse them, which died again and again. After I contacted the hoster (because their logs literally just stopped: no error, no tracing, nothing), they told me that their setup only allows for 10 MB of local storage, and the default (in this case the AWS S3 adapter for PHP) always downloads the file even if you tell it to "stream". So I built a solution that used HTTP range requests to "fake stream" the file into memory in smaller chunks so I could process it without downloading it completely. Just another example of: yes, it's solvable, but annoying.
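The "fake streaming" idea with HTTP range requests can be sketched as follows. The original was built on a PHP S3 adapter; this is only a Python illustration of the same technique, and `byte_ranges`, `fetch_range`, `iter_chunks`, and the 8 MiB default chunk size are assumptions made for the example. It also assumes the total object size is already known (for instance from a HEAD request) and that the server honors `Range` headers.

```python
import urllib.request
from typing import Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Download one slice of the object using an HTTP Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:  # 206 Partial Content means the range was honored
            raise RuntimeError("server ignored the Range header")
        return response.read()


def iter_chunks(url: str, total_size: int, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield the object piece by piece so only one chunk is held in memory at a time."""
    for start, end in byte_ranges(total_size, chunk_size):
        yield fetch_range(url, start, end)
```

A parser fed from `iter_chunks` still has to handle records that straddle a chunk boundary, for example by carrying the trailing partial CSV line over into the next chunk.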

kunley

For the best price-to-performance ratio, create your own instances and do whatever is needed on them. Software stacks are not so complicated that you need to delegate everything to the Wizards of Cloud Overcharging.

gloomyday

I think developers are drowning in tools to make things "easy", when in truth many problems are already easy with the most basic stuff in our tool belt (a compiler, some bash scripts, and some libraries). You can always build up from there.

This tooling fetish hurts both companies and developers.

827a

Yeah; IMO Docker was our last universal improvement to productivity, in 2013, and very little we've invented since then can be said to have had such a wide-ranging positive impact, with so few drawbacks. Some systems are helpful for some companies, but then try to get applied to other companies where they don't make sense and things fall apart or productivity suffers. Cloudflare and others are trying to make v8 isolates a thing, and while they are awesome for some workloads, people want them to be the "next docker", and they aren't.

The model "give me docker image, we put it on internet" is staggeringly powerful. It'll probably still be the most OP way to host applications in 2040.

hadlock

Docker + IaC* for me; git ops, immutable servers, immutable code, immutable config, (nearly) immutable infrastructure means I haven't had to drop to the command line on a server since 2015. If something is wrong you restart the container, if that doesn't work you restart the host it's running on. The "downside" to this is my "admin" shell skills outside of personal dev laptop commands have gotten rusty.

*Terraform, imo, released in ~2014

mystifyingpoi

> If something is wrong you restart the container, if that doesn't work you restart the host it's running on

Haha, lucky you. If only the world were this beautiful :) I regularly shell into Kubernetes nodes to debug memory leaks from non-limited pods, or to check some strange network issues.

sgarland

It’s that, and the fact that precious few people seem to understand fundamentals anymore, which is itself fed by the desire to outsource everything to 3rd parties. You can build an entire stack where the only thing you’ve actually made is the core application, and even that is likely to be influenced if not built by AI.

The industry is creating learned helplessness.

akdev1l

A lot of people don’t know about compilers, bash scripts and libraries.

bamboozled

Except AWS Lambda is stupidly cheap!

mystifyingpoi

The amount of 0s in the price per second is mesmerizing, but just multiply this by 24h and 30 days, and you are well within the price range of a better EC2 with much better performance, plus you can process 1000 req/s instead of 1 req/s for the same price.
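The multiplication described above is easy to write down. The sketch below is only arithmetic: the function names are made up, and any prices fed into it are placeholders, not real AWS rates. It also ignores per-request fees and the fact that one function instance handles one request at a time.

```python
SECONDS_PER_MONTH = 24 * 60 * 60 * 30  # 24 h x 30 days, as in the comment above


def always_on_monthly_cost(price_per_second: float) -> float:
    """Cost of keeping one function instance busy for the whole month."""
    return price_per_second * SECONDS_PER_MONTH


def break_even_busy_fraction(price_per_second: float, flat_monthly_price: float) -> float:
    """Fraction of the month the function can be busy before a flat-priced server is cheaper."""
    return flat_monthly_price / always_on_monthly_cost(price_per_second)
```

With a made-up rate of 0.00002 per second, one always-busy instance comes to roughly 51.84 per month, so a hypothetical 30-per-month server wins once the function is busy more than about 58% of the time.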

Esophagus4

For certain workloads :)

And that is actually the advantage of serverless, in my mind. For some low-traffic workloads, you can host for next to nothing. Per invocation, it is expensive, but if you only have a few invocations of a workload that isn't very latency sensitive, you can run an entirely serverless architecture for pennies per month.

Where people get burned is moving high traffic volumes to serverless... then they look at their bill and go, "Oh my god, what have I done!?" Or they try to throw all sorts of duct tape at serverless to make it highly performant, which is a fool's errand.

cogman10

Exactly. I've always found that how people want to use lambda is the exact opposite of how to use it cost effectively.

I've seen a lot of people want to use lambdas as rest endpoints and effectively replace their entire API with a cluster of lambdas.

But that's about the most expensive way to use a lambda! 1 request, one lambda.

Where these things are useful is when you say "I have this daily data pull and ETL that I need to do." Then all of a sudden the cost is pretty dang competitive.

codegeek

How long is a piece of string?

"Cheap" is relevant if you are talking about a workload that is one-off and doesn't run continuously. A lot of people use serverless to run a 24-7 service, which sort of defeats the purpose. It doesn't get that cheap anymore.

Serverless is good if you have one off tasks that are used intermittently and are not consistent.

kburman

The takeaway here isn’t that serverless doesn’t work, it’s that the authors didn’t understand what they were building on. Putting a latency-critical API on a stateless edge runtime was a rookie mistake, and the pain they describe was entirely predictable.

Esophagus4

I’ve found this to be true, with one caveat.

Most cloud pain people experience is from a misunderstanding / abuse of solutions architecture and could have been avoided with a more thoughtful design. It tends to be a people problem, not a tool problem.

However, in my experience cloud vendors sell the snot out of their offerings, and the documentation is closer to marketing than truthful technical documentation. Their products’ genuine performance is a closely guarded proprietary secret, and the only way to find out… e.g. whether Lambdas are fast enough for your use case, or whether AWS RDS cross-region replication is good enough for you… is to run your own performance testing.

I’ve been burned enough times by AWS making it difficult to figure out exactly how performant their services are, and I’ve learned to test everything myself for the workloads I’ll be running.

whstl

> the documentation is closer to marketing than truthful technical documentation

I participated in AWS training and certification given by AWS for a company to obtain a government contract and I can 100% say that the PAID TRAINING itself is also 100% marketing and developer evangelism.

ecshafer

100% agree with you. I took a corporate training, and at one point crammed for the developer cert. It is just marketing. There is never a question where the answer is "Just run this service on EC2 yourself". It is about maximizing your usage of AWS services.

ivape

Infra will always be full of so much nonsense because it’s really hard to tell successful developers their code and system design is unusable. People use it because they are paid to do so usually, but it’s literally some of the worst product development I’ve ever seen.

AWS will hopefully be reduced to natural language soon enough with AI, and their product team can move on (most likely they moved on a long time ago, and the revolving door at the company meant it was going to remain a shittily thought out platform in long term maintenance).

Danjoe4

This is exactly why I'd rather get a fat VPS from a reputable provider. As long as the bandwidth is sufficient the only limitation is vertical scaling.

dlisboa

I'm partial to this, the only thing I've found that is harder to achieve is the "edge" part of cloud services. Having a server at each continent is enough for most needs but having users route to the closest one is not as clear to me.

I know about Anycast but not how to make it operational for dynamic web products (not like CDN static assets). Any tips on this?

stego-tech

You took the words right out of my mouth. Between aggressive salespeople marketing any given product as a panacea for everything and mandates from above to arbitrarily use X thing to do Y, there’s a lot of just plain bad architecture out there.

osigurdson

>> is to run your own performance testing

I think they are shooting themselves in the foot with this approach. If you have to run a monte carlo simulation on every one of their services at your own time and expense just to understand performance and costs, people will naturally shy away from such black boxes.

usui

> people will naturally shy away from such black boxes.

I don't think this is true. In fact, it seems that in the industry, many developers don't proceed with caution and go straight into usage, only to find the problems later down the road. This is a result of intense marketing on the part of cloud providers.

gonzo41

I feel like every cloud build meeting should have a moment where everyone has to defend the question "Wait! could this be a regular database with a regular app on a server with a regular cache?"

bunderbunder

I would not assume this was a "rookie mistake". I've been here once or twice, and a common story is that engineers don't want to do it a certain way, but management overrules them for some vague hand-wavy reason like, "This way is more modern." Another common story is that you know you're not choosing the most [scalable|robust|performant|whatever] design, but ancillary constraints like time and money push you into a "worse is better" decision.

Or maybe the original implementation team really didn't know what they were doing. But I'd rather give them the benefit of the doubt. Either way, I appreciate them sharing these observations because sharing these kinds of stories is how we collectively get better as a professional community.

nougati

The takeaway isn't that they didn't understand, it's that they are sharing information which you agree is valuable

ramraj07

Bo Burnham said, "self awareness does not absolve anyone of anything"

But here I don't think they (or their defenders) are even aware of the real lesson.

There's literally zero information that's valuable here. It's like saying "we used an 18 wheeler as our family car and then we switched over to a regular camry and solved all our problems." What is the lesson to be learned in that statement?

The real interesting post mortem would be if they go, "god in retrospect what a stupid decision we took; what were we thinking? Why did we not take a step back earlier and think, why are we doing it this way?" If they wrote a blog post that way, that would likely have amazing takeaways.

chronark_

I can assure you that was pretty close to the internal conversation lol

Not sure what the different takeaways would be though?

kburman

What's valuable about rediscovering that stateless architectures requiring network round-trips for state access are slower than in-memory state? This isn't new information, it's a predictable consequence of their architecture choice that anyone with distributed systems experience could have told them on day zero.

chronark_

Not everyone is born with experience in distributed systems

torginus

My personal experience is that if you want guaranteed anything (quick scaling, latency, CPU, disk or network throughput), your best bet is to manually provision EC2 instances (or use some API that does). Once you give up control hoping to gain performance for free, you usually end up with an unfixable bottleneck.

randomtoast

If you're looking for a middle ground between VMs and serverless, ECS Fargate is a good option. Because a container is always running, you won't experience any cold start times.

sgarland

Yes, though unless you’re provisioning your own EC2s for them to run on, you have no guarantee about the server generation, and IME AWS tends to provision older stuff for Fargate.

This may or may not matter to you depending on your application’s needs, but there is a significant performance difference between, say, an m4 family (Haswell / Broadwell) and an m7i family (Sapphire Rapids) - literally a decade of hardware improvements. Memory performance in particular can be a huge hit for latency-sensitive applications.

evantbyrne

ECS is good, just expensive and still requires more devops than it should. Docker Swarm is an easy way to run production container services on VMs. I built a free golang tool called Rove that provisions fresh Ubuntu VMs in one command and diffs updates. It's also easy-enough to use Swarm directly.

osigurdson

There isn't much for them to mess with in EKS either. It is very close to the metal and easy to reason about.

compiler-guy

This is basically criticizing them for admitting to being one of today's 10,000.

https://xkcd.com/1053/

Personally, I appreciate the info and the admission.

czhu12

> Putting a latency-critical API on a stateless edge runtime

Isn’t this the whole point of serverless edge?

It’s understood to be more complex, with more vendor lock-in, and more expensive.

Trade off is that it’s better supported and faster by being on the edge.

Why would anyone bother to learn a proprietary platform for non critical, latency agnostic service?

kburman

You're confusing network proximity with application architecture. Edge deployment helps connection latency. Stateless runtime destroys it by forcing every cache access through the network.

The whole point of edge is NOT to make latency-critical APIs with heavy state requirements faster. It's to make stateless operations faster. Using it for the former is exactly the mismatch I'm describing.

Their 30ms+ cache reads vs sub-10ms target latency proves this. Edge proximity can't save you when your architecture adds 3x your latency budget per cache hit.

osigurdson

Realistically, they should be able to do sub-ms cache hits which land in the same datacenter. I know Cloudflare doesn't have "named" datacenters like other providers, but at the end of the day there are servers somewhere, and if your lambda runs twice in the same one there is no reason why a pull-through cache can't experience a standard intra-datacenter latency hit.

I wonder if there is anything other than good engineering getting in the way of this, and even of sub-µs intra-process pull-through caches for busy lambda functions. After all, if my lambda is getting called 1000X per second from the same point of presence, why wouldn't they keep the process in memory?
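For reference, an intra-process pull-through cache of the kind discussed here can be very small. The sketch below is an assumption-laden illustration: `PullThroughCache`, its `loader` callback, and the TTL policy are hypothetical, and it only pays off if the runtime keeps the process alive between invocations, which is exactly the open question.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class PullThroughCache:
    """Serve values from process memory; on a miss, pull from the slow backend and keep the result."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # e.g. a network call to a remote cache or database
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # hit: no network round trip
        value = self._loader(key)  # miss or expired: go to the backend once
        self._entries[key] = (now + self._ttl, value)
        return value
```

In a long-running server the first request for a key pays the network cost and later ones are answered from memory until the TTL expires; in a stateless runtime every fresh instance starts with an empty dictionary.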

whynotmaybe

On serverless, whenever you call your code, it has to be executed but first the infrastructure has to find a place to run it and sometimes if there's no running instance available, it must fire up a new instance to run your code.

That's hot start VS cold start.
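A toy model makes the distinction concrete. This is a deliberately simplified sketch, not how any real platform schedules work: `FunctionPlatform` and its `idle_timeout` are invented for illustration, and it tracks a single instance handling requests one after another.

```python
from typing import Optional


class FunctionPlatform:
    """Toy model: one instance that is kept warm for idle_timeout seconds after each call."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self._warm_until: Optional[float] = None  # no instance is running at first

    def invoke(self, now: float) -> str:
        """Return "warm" if a running instance can serve the call, else "cold"."""
        warm = self._warm_until is not None and now <= self._warm_until
        # Either way, the instance that served this call stays around for a while.
        self._warm_until = now + self.idle_timeout
        return "warm" if warm else "cold"
```

Steady traffic keeps hitting the warm path, while sparse traffic pays the startup cost on nearly every call.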

ochronus

But but it's webscale!

yilugurlu

These two have resonated with me deeply.

- Eliminated complex caching workarounds and data pipeline overhead

- Simplified architecture from distributed system to straightforward application

We, as developers/engineers (put whatever title you want), tend to make things complex for no reason sometimes. Not all systems have to follow state-of-the-art best practices. Many times, secure, stable, durable systems outperform these fancy techs and inventions. Don't get me wrong, I love to use all of these technologies and fancy stuff, but sometimes that old, boring, monolithic API running on an EC2 solves 98% of your business problems, so no need to introduce ECS, K8S, Serverless, or whatever.

Anyway, I guess I'm getting old, or I understand the value of a resilient system, and I'm trying to find peace xD.

ramraj07

But when were serverless systems like lambda and cloud workers "best practices" for low latency apis?

hedora

According to their marketing material, when they started supporting running in edge POPs, they became the best option for low-latency APIs.

daxfohl

Last I heard (~5 years ago), lambda@edge doesn't actually run on edge POPs anyway; they're just hooks that you can put in your edge configs that execute logic in the nearest region before/after running your edge config. But it's definitely a datacenter round-trip to invoke them.

Adding that much compute to an edge POP is a big lift; even Firecracker gets heavy at scale. There is also a security risk in executing arbitrary code there, since these POPs don't have nearly the physical security of a datacenter, and their small scale makes them more vulnerable to timing attacks, etc.

pjmlp

Their problem isn't serverless, rather Cloudflare Workers and WebAssembly.

All major cloud vendors have serverless solutions based on containers, with longer managed lifetimes between requests, and naturally the ability to use properly AOT-compiled languages in the containers.

fabian2k

At that point, why should I use serverless at all? If I have to think about the lifetime of the servers running my serverless functions?

OvervCW

Serverless only makes sense if the lifetime doesn't matter to your application, so if you find that you need to think about your lifetime then serverless is simply not the right technology for your use case.

pjmlp

Because it is still less management effort than taking full control of the whole infrastructure.

Usually a decision factor between more serverless, or more DevOps salaries.

fabian2k

I would doubt that this is categorically true. Serverless inherently makes the whole architecture more complex with more moving parts in most cases compared to classical web applications.

ramraj07

Serverless is not a panacea. And the alternative isn't always "multiple devops salaries" - unless the only two options you see are serverless vs an outrageously stupid complicated kubernetes cluster to host a website.

array_key_first

There's a huge gap between serverless and full infra management. Also, IMO, serverless still requires engineers just to manage that. Your concerns shift, but then you need platform experts.

johannes1234321

For something that is permanently under load, it makes little sense.

It can make sense if you have a very variable load with a few notable spikes, or if you are all in on managed services, where the serverless pieces are event collectors for other services ("new file in object store" triggers a function to update some index).
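The "new file in object store" case could look something like the handler below. It is a sketch under assumptions: it follows the shape of S3 event notifications (`Records[].s3.bucket.name` and `Records[].s3.object.key`), while `INDEX` is a hypothetical in-memory stand-in for whatever real index or database would be updated.

```python
from typing import Any, Dict
from urllib.parse import unquote_plus

INDEX: Dict[str, int] = {}  # hypothetical index: "bucket/key" -> object size in bytes


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, int]:
    """Record every newly created object mentioned in the event."""
    indexed = 0
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        INDEX[f"{bucket}/{key}"] = record["s3"]["object"].get("size", 0)
        indexed += 1
    return {"indexed": indexed}
```

The function only runs when an object appears, which is the intermittent, event-driven shape of workload where per-invocation pricing tends to work in your favor.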

OvervCW

Agree, it seems like they decided to use Cloudflare Workers and then fought them every step of the way instead of going back and evaluating if it actually fit the use case properly.

It reminds me of the companies that start building their application using a NoSQL database and then start building their own implementation of SQL on top of it.

CuriouslyC

Ironically, I really like cloudflare but actively dislike workers and avoid them when possible. R2/KV/D1 are all fantastic and being able to shard customer data via DOs is huge, but I find myself fighting workers when I use them for non-trivial cases. Now that Cloudflare has containers I'm pushing people that way.

zaphirplane

Hey! Bet I can guess who

iainmerrick

In that scenario, how do you keep cold startup as fast as possible?

The nice thing about JS workers is that they can start really fast from cold. If you have low or irregular load, but latency is important, Cloudflare Workers or equivalent is a great solution (as the article says towards the end).

If you really need a full-featured container with AOT compiled code, won't that almost certainly have a longer cold startup time? In that scenario, surely you're better off with a dedicated server to minimise latency (assuming you care about latency). But then you lose the ability to scale down to zero, which is the key advantage of serverless.

pjmlp

Apparently not nice enough, given that they rewrote the application in Go.

Serverless with containers is basically managed Kubernetes, where someone else has the headache to keep the whole infrastructure running.

CuriouslyC

Cloudflare has containers now too, and having used AppRunner and Cloud Run, it's much easier to work with. Once they get rid of the container caps and add more flexibility in terms of container resources, I would never go back to the big cloud containers, the price and ease of use of Cloudflare's containers just destroy them.

pjmlp

I doubt that the bill would be that much cheaper, nonetheless thanks for making me aware they are a thing now.

CuriouslyC

They're much cheaper, they're just DOs, and they get billed as such. They also have faster cold start times and automatic multi-region support.

Quarrel

Indeed.

They get to the bottom of the post and drop:

> Fargate handles scaling for us without the serverless constraints

They dropped workers for containers.

keyle

You're saying serverless can have really low latency and fast 24/7?

Isn't serverless at the base the old model, of shared vms, except with a ton of people?

I'm old school I guess, baremetal for days...

pjmlp

Yes, check Cloud Run, AWS Lambda, Azure Functions with containers.

torginus

I think someone should make a timeline of software technology eras, each beginning with 'why XYZ is the future' and ending with articles like this.

codegeek

"Self-Hosting : Being tied to Cloudflare's runtime meant our customers couldn't self-host Unkey. While the Workers runtime is technically open source, getting it running locally (even in dev mode) is incredibly difficult.

With standard Go servers, self-hosting becomes trivial:"

A key point that I always make. Serverless is good if you want a simple periodic task to run intermittently without worrying about a full time server. The moment things get more complex than that (which in real world it almost always is), you need a proper server.

K0IN

After building my first Serverless/Cloudflare worker app, this is why I migrated to Deno. Deno enables you to run the same codebase in deno (self-hosted/local) and in deno deploy (serverless platform from deno).

I wanted my app to be self-hostable as well, and Cloudflare worker is a hard ecosystem lock to their platform, which makes it undesirable (imo).

Here is a link to my reasoning from back then: https://github.com/K0IN/Notify/pull/77#issuecomment-16776070...

scottydelta

I ported my worker project to Django since Cloudflare Workers wouldn’t allow selecting a region for hosting workers, which is generally required for data compliance. This is something all cloud providers provide from day one, yet Cloudflare made it an enterprise feature.

Also, the vendor lock-in with Durable Objects and D1 doesn’t help; they could simply do what Supabase and others are doing and provide Postgres or standard SQLite as a service.

seethishat

Linux servers running Go apps? Would be nice to see server cost and specs, backup strategy, etc.

gethly

What do you find so peculiar about it? A lot of people are running Go apps on VPSs.

sgarland

TFA states that they’re running on AWS Fargate.

That said, as an example, an m8g.8xlarge gives you 32 vCPU / 128 GiB RAM for about $1000/month in us-east-1 for current on-demand pricing, and that drops to just under $700 if you can do a 1-year RI. I’m guessing this application isn’t super memory-heavy, so you could save even more by switching to the c-family: same vCPU, half the RAM.

Stick two of those behind a load balancer, and you have more compute than a lot of places actually need.

Or, if you have anything resembling PMF, spend $10K or so on a few used servers and put them into some good colo providers. They’ll do hardware replacement for you (for a fee).

ape4

Next article - why we switched from our own servers to serverless for reliability. A small performance hit was worth it.

tpetry

They just use two servers and configure a load balancer within Cloudflare. Come on. Self-hosting is not rocket science. You don't have to make it seem complicated. People were doing this decades before AWS invented serverless.

kordlessagain

Yet, idiots remain.

wltr

Backup strategy? What do you mean by that?

seethishat

Servers go down. What is the plan to get them "backup" and running ;)

fabian2k

They probably don't need one for the application servers. And they probably already have a backup strategy for their DBs.

stackskipton

Most server outages are caused by hardware failures which EC2 MOSTLY abstracts from you.

Also, if it's just Golang, point Ansible or whatever deploys at new server and trigger a deploy.

wltr

But is this not needed with the so-called cloud systems?

illuminator83

I'm assuming "High Availability" is what is really meant here.

tacker2000

Incredible that these kinds of services were hosted like this.

I guess they never came out of MVP, which could warrant using serverless, but in the end it makes 0 sense to use some slow solution like this for the service they are offering.

Why didn't they go with a self-hosted backend right away?

It's funny how nowadays most devs are too scared to roll their own and just go with the cloud offerings that cost them tech debt and actual money down the road.

chronark_

We did initially but thought Cloudflare was a better solution for scalability and latency.

We believed their docs/marketing without doing extensive benchmarks, which is on us.

The appeal was also to use the same typescript stack across everything, which was nice to work with

ramraj07

Where did their marketing or documentation say this service is perfect for low latency APIs?

chronark_

I doubt they literally said “perfect for low latency APIs” but their messaging is definitely trying to convince you that they’re fast globally, just look at the workers.cloudflare.com page