
The testing pyramid is an outdated economic model

orwin

I'm pretty sure you ought to write the tests you think are needed; the 'form' is not a good metric.

Our libraries with a lot of logic and calculations are dominated by unit tests; our libraries that talk to external APIs are dominated by integration tests. That's just good testing. I'm not sure you need to imagine a pyramid or a vase to decide which tests to write.

diggan

People are addicted to metrics, so when "code coverage" becomes something to measure, people tend to go crazy with testing, covering even trivial stuff that doesn't really need to be put under tests.

My personal rule of thumb is something like: if testing makes you go slower, you're doing too little or too much of it; if it makes you go faster, you're doing the right amount.

If you find yourself having to rewrite 10% of the test suite every time the code changes, you're probably doing too much testing (or not treating your test code as production code). If you find yourself breaking stuff all over the place when doing other things, you're doing too little testing.

As with most things, it's a balance; going too far in either direction will hurt.

cogman10

Completely agree.

I think the problem is that some devs want hard-and-fast rules, when a lot of the time the right answer is "it depends" and experience dictates the action.

msoad

Here is a conundrum:

With all the AI-generated code being pushed, as a leader I wonder which is better: enforce a ton of e2e tests so that no code that hasn't been thought through in all aspects of the solution can get past CI, or does that just enable the AI to go even crazier and break all sorts of best practices just to pass the tests?

creesch

Okay? This seems like a fluff blog post, as the trophy concept was coined back in 2018 if I recall correctly. Coming from WireMock it makes sense given their product, but it is just marketing fluff.

Honestly, as long as the GUI tip remains as small as possible, I am mostly fine with whatever shape it takes below that. For modern web applications with a lot of APIs, a trophy does make sense. For other applications without such a communication layer, a more traditional pyramid makes more sense.

What a lot of people seem to completely overlook in discussions like this is that the pyramid isn't a goal in itself. It is intended as a way to think about where you place your tests; more specifically, place tests where they make sense, provide the most value, and are least fragile.

Which is why the GUI should be avoided for any tests that test logic, hence it being the smallest section of whatever shape you come up with. Everything else depends heavily on what sort of infrastructure you are dealing with, the scope of your application, etc.

hitchstory

I think the premise is correct and I think you are disagreeing with it.

Yes, the pyramid was set out as a goal in its original incarnation. That was deeply wrong. The shape ought to be emergent, determined by the nature of the app being tested (I went into detail on what should determine that here: https://news.ycombinator.com/item?id=42709404)

Some of the most useful tests I've worked with HAVE had a large GUI tip. The GUI was the most stable surface, its behavior clearly defined and agreed upon by everybody, and all the code got tested. GUI tests provided the greatest freedom to refactor, covered the most bugs, and provided by far the most value on that project.

GUI tests are not inherently fragile or inherently too slow either. That is just a tendency, and a highly context-specific one. As the "pyramid" demonstrates, if you build a rule out of a context-specific tendency, it's going to be a shit rule.

VMG

> The pyramid is also an artifact of the era in which it was created.

> Computers were slower, testing and debugging tools were rudimentary, and developer infrastructure was bloated, cumbersome, and inefficient.

What AMD giveth, Electron taketh away.

No matter how fast computers get, developers will figure out a way to use that extra compute to make the build and the test cycle slower again.

Of course it is all relative - it is hard to define what a "unit" test is when you are building on top of enormous abstractions to begin with.

No matter what you call a test, it should be fast. I feel productive when I can iterate on a piece of code with a two-second feedback loop.

giorgioz

> What AMD giveth, Electron taketh away.

This is actually true, but the moralistic negative tone with no explanation makes me think the writer did not understand why this is happening and why it has both pros and cons. It's similar to another statement I've heard on this subject: "It's pointless to add/widen roads, there will always be traffic." It's true there will always be traffic, but that doesn't make it pointless. There will always be traffic because moving more cars becomes faster, so more people do it; traffic on a single lane served 100 people, while traffic on a two-lane street serves many more.

The same is true for software development. Computers get faster, but programs tend to stay at roughly the same perceived speed, just as roads widen and yet there is still the same amount of traffic. When computers get faster, developers can write code faster, and so they can write more code and/or cheaper code. Writing programs also becomes cheaper, so developers need less expertise and training. The computer that brought astronauts to the moon was probably less powerful than today's smart thermostat, yet landing on the moon with that computer required a team of people at PhD level, intensely focused and dedicated, all socially and culturally adjacent to the inventors of the computer. By comparison, today's programs do trivial things using immense resources. And yet, because many more developers can code, there are also immensely more programs covering millions of use cases, developed all over the world, in some cases by people who do not even speak English.

So programs did become less efficient, because the true bottleneck was never the efficiency of the program. The true bottleneck was developer hours and skill.

This doesn't mean it's okay for all programs to be slow, or that you should be satisfied using programs you perceive as slow. The relationship between a program's speed and its UX is a curve of diminishing returns: at first, the faster it gets, the better the UX, but past a certain speed the UX improves only marginally. If the final user cannot distinguish between the speed of two different programs, the bottleneck is no longer speed, and some other characteristic becomes the bottleneck. That said, there will always be work for performance engineers and low-level developers writing more performant code. But not all code needs to be written as efficiently as possible.

mkoubaa

Software has gotten so slow we've forgotten how fast computers are

tobyhinloopen

It kinda depends on your architecture. If you can run integration tests for cheap, it makes sense to favor them over smaller unit tests.

I like to design my applications so that all slow components can be mocked by faster alternatives, and keep the HTTP stack as thin as possible so I can basically call a function and assert its output, with that output closely resembling the final HTTP response: either a template rendered with a blob of data, or the blob of data rendered as JSON.
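A minimal sketch of that "thin HTTP stack" idea (all names here are made up for illustration, not from any specific framework): the handler is a plain function returning a data blob, the slow dependency sits behind a tiny swappable object, and the HTTP layer only serializes. The test calls the function directly and asserts on something very close to the final response body.

```python
import json
from dataclasses import dataclass

@dataclass
class Clock:
    """Stands in for a slow component (DB, external API) that could be
    swapped for a faster fake in tests."""
    def now(self) -> str:
        return "2024-01-01T00:00:00Z"  # a real adapter would read the system clock

def get_status(clock: Clock) -> dict:
    # All the logic lives here; no request/response objects involved.
    return {"status": "ok", "time": clock.now()}

def to_json_response(blob: dict) -> str:
    # The only thing the "HTTP stack" adds: serialization.
    return json.dumps(blob)

# The "integration" test is basically a function call plus an assertion
# on output that closely resembles the final HTTP response body.
body = to_json_response(get_status(Clock()))
assert json.loads(body) == {"status": "ok", "time": "2024-01-01T00:00:00Z"}
```

Because the handler never touches a socket, these tests run at unit-test speed while still exercising the real response shape.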

ak681443

Tests are a way to write your logic twice (once in the code and once in the assertions), on the assumption that you're unlikely to make the same mistake twice.

Integration tests are better replaced by something like contract testing IMO to still retain the test parallelism.
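To make the contract-testing idea concrete, here is a hedged toy sketch (the names and the dict-based "contract" are invented for illustration; real tools like Pact use recorded request/response pacts): consumer and provider are each verified against a shared contract, so the two test suites can run in parallel without a live integration environment.

```python
# A shared contract: what the consumer expects the provider to do.
CONTRACT = {
    "request": {"method": "GET", "path": "/users/42"},
    "response": {"status": 200, "body": {"id": 42, "name": "Ada"}},
}

def provider_handle(method: str, path: str) -> tuple:
    """The provider implementation under test (no real HTTP involved)."""
    if method == "GET" and path.startswith("/users/"):
        user_id = int(path.rsplit("/", 1)[1])
        return 200, {"id": user_id, "name": "Ada"}
    return 404, {}

def verify_provider(contract: dict) -> bool:
    """Provider-side test: replay the contract's request, compare the response."""
    status, body = provider_handle(
        contract["request"]["method"], contract["request"]["path"]
    )
    return (status == contract["response"]["status"]
            and body == contract["response"]["body"])

def consumer_parse(body: dict) -> str:
    """Consumer-side code, tested against the stubbed contract response
    rather than a running provider."""
    return body["name"]

assert verify_provider(CONTRACT)
assert consumer_parse(CONTRACT["response"]["body"]) == "Ada"
```

The point is that neither side needs the other running: the contract is the meeting place, which is what preserves test parallelism.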

mrkeen

We don't need this article, because the 90%-unit/10%-integration split was only ever a goal to aspire to. Just like achieving 90% code coverage - no need for a thinkpiece to say that 40% or 60% is now 'the right amount' of coverage.

We like unit tests because they are fast, deterministic, parallelisable... all the good stuff. Relative to that ideal, integration tests are slower, flakier, more sequential, etc.

While I've never gone full-TDD, those guys have it absolutely right that testability is a design/coding activity, not a testing activity. TDD will tell you whether you're writing unit-testable code or not, but it won't tell you how. Dependency inversion / ports-and-adapters / hexagonal architecture are the topics to read on how to write testable code.

What's my personal stake in this? Firstly, our bugfix-to-prod-release window is about four hours. Way too long. Secondly, as someone relatively new to this codebase, when I stumble across some suspicious logic, I can't just spit out a new unit test to see what it does, since it's so intermingled with MS-SQL and partner integrations. Our methods pass around handles to the DB like candy.

So what I think has happened is that, as an industry, we generally don't think about writing testable code. Therefore our code is all integrations and no units, so when we go to test it, of course the classic testing pyramid is unachievable.
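A minimal ports-and-adapters sketch of the alternative to "passing DB handles around like candy" (all names here are hypothetical): the domain logic depends only on a narrow port, so a unit test can hand it an in-memory fake instead of a live MS-SQL connection.

```python
from typing import Protocol

class OrderRepository(Protocol):
    """The port: the only thing the domain logic knows about storage."""
    def balance_for(self, customer_id: int) -> int: ...

def can_place_order(repo: OrderRepository, customer_id: int, price: int) -> bool:
    # Suspicious logic like this becomes unit-testable once the DB
    # is hidden behind a port instead of a raw connection handle.
    return repo.balance_for(customer_id) >= price

class FakeRepository:
    """Test adapter: no database required."""
    def __init__(self, balances: dict):
        self.balances = balances
    def balance_for(self, customer_id: int) -> int:
        return self.balances.get(customer_id, 0)

# Now you CAN "spit out a new unit test to see what it does".
repo = FakeRepository({1: 100})
assert can_place_order(repo, 1, 50)
assert not can_place_order(repo, 1, 500)
```

In production, a real adapter implementing the same port wraps the actual database; the domain code never changes.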


weinzierl

"The pyramid is also an artifact of the era in which it was created. Computers were slower, testing and debugging tools were rudimentary, and developer infrastructure was bloated, cumbersome, and inefficient."

In addition to that, I think a major point is that the testing pyramid was conceived in a world where desktop GUI applications ruled. Testing a desktop GUI was incredibly expensive and automation extremely fragile. That is in my opinion where the pointy tip of the pyramid came from in the first place.

"But the majority of tests are of the entire service, via its API [..]"

I think this is where you get the best bang for your buck, because your goal of keeping your tests robust is well aligned with the goal of keeping the API stable. This is not the case above and below, where the goal of robust tests is always at odds with change, quick adaptation, and rapid iteration.

dtech

The pointy shape still holds, because we often have multiple services now, and testing across services is difficult and expensive.

jillesvangurp

I apply the following reasoning model to testing:

- Integration tests are expensive to run and take time to write; therefore it is important to maximize their value. The ultimate integration test is an end-to-end test, because it maximizes the scope of what is under test and the potential for weird feature interactions to trigger exactly the kind of failures you want such a test to find.

- Unit tests are orders of magnitude cheaper to run; so have lots of them but make sure they are easy to maintain and simple so they minimize time spent on them.

- Anything in between is a compromise between shooting for realism vs. execution speed. Still expensive to run and maintain but it just does not deliver a lot of value.

- Test coverage becomes exponentially harder with the size of the unit you are testing. Test coverage for integration tests is a meaningless notion. With end to end integration tests you shoot for realism, not coverage. They should cover things that users of your system would use in ways that they would use them.

- Mocking and faking is needed to unit test code that violates the SOLID principles and is otherwise hard to test. So they have the development overhead of an integration test but they deliver the value of a unit test. This is not ideal. It's better to unit test code that is very testable and cover the rest with integration tests that deliver more valuable insight. Lots of very complex unit tests are hard to develop and limited in value.

I just removed the one remaining test that used mockk in my Kotlin code base. I have hundreds of API integration tests, and lots of simple unit tests. I focus my unit tests on algorithms, parameters, and that sort of thing. My integration tests ensure the system does what it needs to.

I run integration tests concurrently so they complete quickly. This increases their value because it proves the code still works if there's more than one user in the system.
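A sketch of that concurrency argument (the `Counter` here is a toy stand-in for a real service, not anyone's actual test suite): running the same test body from many threads at once both finishes faster and catches the class of bug that only appears with more than one user.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Counter:
    """Toy stand-in for the system under test, with shared mutable state."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0
    def increment(self) -> None:
        with self._lock:          # remove the lock and concurrent runs may lose updates
            self._value += 1
    def value(self) -> int:
        return self._value

def integration_test(system: Counter) -> None:
    """One 'integration test' exercising the system."""
    system.increment()

system = Counter()
# Run 100 test invocations across 8 worker threads concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(100):
        pool.submit(integration_test, system)

# Passing concurrently is stronger evidence than passing sequentially:
# it shows the system still works with more than one user at a time.
assert system.value() == 100
```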

TypingOutBugs

This is just an ad, but in principle I agree that integration tests can usually bring more value (depending on the system, but that's been my general experience across many companies of all sizes).

sohnee

I think we all agree that the majority of tests should be fast, automated, and robust. When the pyramid was written, that probably _did_ mean unit tests. In 2025 it doesn't.

Let's keep the pyramid but rename the segments!

pyrale

I would argue that this opinion has developed because the footprint of the average service has shrunk over the years.

If you're testing large, complex services that involve many different behaviours, you're still going to have a test pyramid. If you've implemented microservices, what you used to call an integration test has now become an e2e test in your new architecture. And you still don't want to have mostly that.