Testing Without Mocks: A Pattern Language (2023)
59 comments
· March 25, 2025 · bccdee
bulatb
> It's the Thing module's responsibility to correctly set up the test doubles involved.
But why?
Maybe there's some fundamental difference of opinion here, because I'm thinking definitely not, no way, test-only code paths should never reach prod, and code that ships should never even know that testing is a thing that happens.
bccdee
`makeThingDoerForTest` is a standalone function (or static method) that never gets called in production code. I don't see why it needs to be in an entirely separate module. If you want, you could put it in a separate thing_test module, but then you'd have to expose a more powerful constructor for thing_test to access. I'd rather hide that inside the thing module, and I see no benefit to splitting this functionality across two modules just for the sake of not mixing my peas with my potatoes.
sethammons
That's the trick: it is not test-only code. You use the same objects, the same code.
The code has to be written so it takes in the altered unit via parameters or some other form of DI, or so it knows the current system is configured to short-circuit some calls.
I.e., you could have MyClient(testHTTPClient).GetResource(foo) used in tests and MyClient(realHTTPClient).GetResource(foo) in production. The testHTTPClient would get to the actual connection part and return a configured response or error.
Your entire logic is the same; your code could "receive" a forced 404 or timeout or 200 OK. It is up to the testHTTPClient, which only changes how the HTTP connection is handled.
I call these unit-integration tests. You are checking all units work together while not actually working with the outside world.
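Roughly, the shape of it in TypeScript (my example above reads like Go; the URL and other details here are illustrative):

```ts
type HTTPResponse = { status: number; body: string };

// The only thing that changes between prod and tests is this interface's implementation.
interface HTTPClient {
  get(url: string): Promise<HTTPResponse>;
}

class MyClient {
  constructor(private readonly http: HTTPClient) {}

  // All of the real logic (URL building, status handling, parsing) runs either way.
  async getResource(id: string): Promise<string> {
    const res = await this.http.get(`https://api.example.com/resources/${id}`);
    if (res.status === 404) throw new Error(`resource ${id} not found`);
    if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
    return res.body;
  }
}

// Production wiring: a real HTTP client, e.g. built on fetch.
const realHTTPClient: HTTPClient = {
  get: async (url) => {
    const res = await fetch(url);
    return { status: res.status, body: await res.text() };
  },
};

// Test wiring: the "connection" just returns whatever response it was configured with.
function makeTestHTTPClient(response: HTTPResponse): HTTPClient {
  return { get: async () => response };
}
const testHTTPClient = makeTestHTTPClient({ status: 404, body: "" });

// new MyClient(testHTTPClient).getResource("foo")  -> exercises the forced-404 path
// new MyClient(realHTTPClient).getResource("foo")  -> talks to the real API
```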
chrisoverzero
You’re responding to a comment containing a method call `thing.makeThingDoerForTest(1, 2, 3)` and saying “That's the trick - it is not test only code.”
Can you square those for me? I don’t get it.
larusso
It goes a bit in the same direction Rust went by usually shipping the unit tests inside the module. The thought of keeping the "mock" setup up to date even when doing refactors is intriguing. But I wonder if this solves the issue. I tend not to use mocks, and I try to stay away from spies. These two tie the tests too close to the implementation, like the article also describes. The reason why I don't ship test setup in production code is the simple fact that the test / spec should not be based on production code. Hard to put into words for me, but the test should be impartial. It sets up its test world and runs the subject in it. It should not use the subject to describe the test environment. I think this will lead to false positives.
I have an example where a coworker used constants from production to assert return values. I stated that the test can't use them, because the test is the specification and needs to be in control of what is correct and what is not, even if that means we write duplicate string values. Later, somebody changed a const to a wrong value, but as far as the test was concerned everything was still fine.
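A tiny sketch of what I mean (names made up):

```ts
export const GREETING_PREFIX = "Hello, "; // production constant

export function greet(name: string): string {
  return GREETING_PREFIX + name;
}

// Coupled test: still passes if someone changes GREETING_PREFIX to a wrong value.
// assert.strictEqual(greet("Ada"), GREETING_PREFIX + "Ada");

// Impartial test: duplicates the expected string on purpose, so it fails if the constant breaks.
// assert.strictEqual(greet("Ada"), "Hello, Ada");
```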
bluGill
What is the harm in a test-only path in production? If you are on an embedded system with limited memory there is obvious harm, but most code isn't that. There are other possible harms, but they are less obvious and may not apply. There are also many different ways to implement this.
I did something like this: in production, if a function pointer isn't null, I return its result instead of running the production code (with an "unlikely" compiler directive on the if). The function itself is implemented in a test-only shared library, so the cost to production is low. If you can break in you can set that function, but if you can inject arbitrary code we are already sunk.
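Roughly the shape of it, sketched here in TypeScript rather than C (names illustrative):

```ts
type PriceLookup = (sku: string) => number;

// Test-only override hook: stays null in production; only test code ever installs a function.
let testOverride: PriceLookup | null = null;

// Called only from test code (the analogue of the test-only shared library).
export function setTestOverride(fn: PriceLookup | null): void {
  testOverride = fn;
}

export function lookupPrice(sku: string): number {
  // The "unlikely" branch: short-circuit to the override when it is set.
  if (testOverride !== null) return testOverride(sku);
  // Production path: the real, expensive lookup.
  return queryPricingService(sku);
}

function queryPricingService(sku: string): number {
  // stands in for a real network call in this sketch
  throw new Error(`pricing service unreachable for ${sku}`);
}

// In a test:
// setTestOverride(() => 42);
// lookupPrice("ABC") === 42
```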
cratermoon
> no way, test-only code paths should never reach prod, and code that ships should never even know that testing is a thing that happens.
What's your reasoning for this? As a counter-example, the hardware you are using right now almost certainly has test points throughout the circuitry. Beyond all the uses the test points get during development and manufacturing, they are essential for repair and re-calibration after being sold to the customer and put into production.
sitkack
So much needless discussion because someone makes an ill thought out hot take. Most engineered objects of any substantial complexity include built in test systems. They would be impossible to build without them.
ngruhn
I guess overcoming that dogma is exactly the tradeoff.
bulatb
It's just a feeling, not a dogma. I'll happily throw it away. But why?
sitkack
> test-only code paths should never reach prod
Why? Humanity's most reliable engineered objects contain their own test and monitoring systems.
jdlshore
You nailed it.
sethammons
So this is a very thorough article. Aside from using unfamiliar terms, nearly every concept is valuable and familiar. I am a big proponent of not mocking / no mocking frameworks or libraries. I've spent 15+ years building large SaaS systems with hundreds of contributors, optimizing for testing so deployment is the best balance of safe and fast.
I use nearly every pattern mentioned to write my tests in Go. I now use the same patterns in Python where I can - this new org is less mature in its testing solution and mocks and spaghetti abound.
For similar, Go-specific resources, look up "Go fakes vs mocks."
Other related reading on reasons to avoid mocks: https://martinfowler.com/articles/mocksArentStubs.html
fwiw, not a fan of a couple techniques like nullables and prefer Fakes/Fake Constructors, but it is splitting hairs. Overall the message is correct: avoid mocking. Have a way for systems under test to have implementation details changed while interfaces remain the same.
cratermoon
> Aside from using unfamiliar terms, nearly every concept is valuable and familiar
A couple of decades ago I worked on some ideas I had for "test points" in software, modeled on the kinds of test points that chips and circuits have built in. My ideas were precursors to some of the observability tools like OpenTelemetry, but were more passive.
fhd2
Wow, a breath of fresh air. I read this before I went to sleep, hoping to wake up to some kind of revelation I might share in a comment like this, but no luck. I'm still torn.
I remember writing tests in not very powerful languages like Java in 2005. Back then the idea was that you just need to decouple your objects, add interfaces for everything, and you will reap good design, courtesy of being forced to by TDD. What I usually saw was an overly complex code base with dependency injection and difficult to debug problems at runtime.
This article seems like the natural evolution of this thinking, but turning it around: It's not saying a unit that wants to be tested needs to enable tests. It's saying a unit used by other units needs to support _them_ being tested. Maybe I'm easy to impress, but I find that refreshingly novel, wondering why I never thought of it.
OTOH, with powerful mocking frameworks and more dynamic features even in static languages, the part of the industry I've been a witness to took another turn: Let units do whatever they want, focusing just on what's needed in production, and in order to test them, we'll build an elaborate fake environment for them. The major downside being that people changing a unit need to worry about whatever elaborate fakes _other_ units have created for them in their tests.
So it's coupling vs decoupling production code and tests. The former approach makes it easier to author and maintain test suites. The latter approach makes it easier to reason about production code.
In the last decade or so, I've valued the latter more, despite the drawbacks. With this style, coupling production code and tests is suddenly on the table for me again.
bluGill
Things happened in a different order. TDD and automated tests were developed first, without test doubles. However, they eventually hit problems because some things were painful to use in tests (networks and the like). Mocks were one tool eventually invented. Then later learners heard of such things and way overused them. We are now finding the correct middle ground.
aszen
A lot of jargon and nothing new in terms of ideas. Again, testing nothing but fakes and trivial interactions.
Here's my opinion: test your code with all its side effects and infrastructure.
To do that fast and deterministically, switch to faster lighter weight versions of your production dependencies when running tests.
For example use SQLite or PgLite for databases, use in memory file system, use Mailpit for emailing etc.
Simulate your infrastructure dependencies, inject faults into them, test the real deal, SQL, file system semantics, protocols.
Don't leave the hard part of your testing to Q/A and users.
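For example, the database swap can look roughly like this (TypeScript sketch; better-sqlite3 and node-postgres are just one possible pairing):

```ts
import Database from "better-sqlite3"; // test-only dependency in this sketch
import { Pool } from "pg";

// One small interface the application codes against.
interface Db {
  query(sql: string, params?: any[]): Promise<unknown[]>;
}

// Production: real Postgres.
function makePostgresDb(connectionString: string): Db {
  const pool = new Pool({ connectionString });
  return {
    query: async (sql, params = []) => (await pool.query(sql, params)).rows,
  };
}

// Tests: in-memory SQLite, still a real engine running real SQL, just fast and local.
// (Caveats: placeholder syntax and some SQL semantics differ from Postgres, and
// .all() is for statements that return rows; writes would use .run().)
function makeSqliteDb(): Db {
  const db = new Database(":memory:");
  return {
    query: async (sql, params = []) => db.prepare(sql).all(...(params as any[])),
  };
}
```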
ajmurmann
IMO you should have a healthy test pyramid which contains both types of tests. At the very top should be a smallish set of tests that are as real as possible and it's ok if they are a little slower. At the bottom you have more fine-grained tests that focus on one unit (classes != units) and tests it quickly and in isolation. Ideally we get this isolation by designing good interfaces (not necessarily in the Java sense) and can just not hook it up to real collaborators (maybe we pass callbacks to the email API or have boring glue code that just passes the return value on to the API) but sometimes a test double is ok. If we use a test double for network interactions to third parties, it's another great idea to replay recorded responses with something like Ruby's VCR which at least allows us to exercise the client code in medium-priced tests.
This allows you to get very quick feedback while in the weeds but also gives us comprehensive test coverage.
imiric
I agree. Arguments against the classical test pyramid are common, but IME they often come from a place of laziness. It essentially boils down to "unit tests are a chore to maintain, hence ...<hand waving>... integration tests are more valuable". Laziness _is_ one of the three virtues of a good programmer, but these arguments completely miss the mark. I would actually go a step further and argue that their popularity is doing tangible harm to software quality.
For a program to be robust it needs to be tested at all layers, and the testing pyramid is still a good rule of thumb to follow. Designing the program with testing in mind is crucial, and as you say, interfaces and their equivalent in any language are the right approach to achieve this. There was a period of time when monkeypatching was popular in dynamically typed languages, and perhaps still is (I've been out of the loop for a few years), but we always knew that it was a dirty trick, and that the software should be architected better to make it testable.
If anyone is curious what this might look like, I wrote a small project in Go[1] last year that is built from the ground up with testing in mind. There's a global `App` struct that is initialized in `main() `, and serves as the main container for all I/O the program does. This means that internally no component ever makes direct OS calls, but does it via interfaces, which can be easily mocked in tests. This makes unit and integration testing a breeze, and allows other niceties like passing down a global logger. I've adopted this pattern in all Go apps I write from scratch, and advocated for it on existing projects I've worked on, and it's worked great so far.
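The shape of it, sketched here in TypeScript for illustration (the real project is Go, and App, Clock, and FileSystem are just illustrative names):

```ts
import { readFile } from "node:fs/promises";

// All I/O goes through interfaces held by one container built in main(),
// so nothing deeper in the program talks to the OS directly.
interface Clock { now(): Date; }
interface FileSystem { read(path: string): Promise<string>; }

class App {
  constructor(
    readonly clock: Clock,
    readonly fs: FileSystem,
  ) {}
}

// main() wires up the real I/O once...
function makeProductionApp(): App {
  return new App(
    { now: () => new Date() },
    { read: (p) => readFile(p, "utf8") },
  );
}

// ...and tests wire up controlled I/O, no monkeypatching needed.
function makeTestApp(fixture = "fixture contents"): App {
  return new App(
    { now: () => new Date("2024-01-01T00:00:00Z") },
    { read: async () => fixture },
  );
}
```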
bluGill
Interfaces are only useful if there is reason to believe they won't change; otherwise they get in the way. The reason is often political: this is where another group talks to my code, so the less we have to know about each other the better (not that the code is hidden, just that you can ignore it). The reason could also be that the interface is used everywhere, so any change would take months of effort, longer if anyone else needs to implement anything.
If change is easy, then it's better to make the change. You never really know all future requirements, so getting it right up front is impossible.
Unless getting it right is possible: if you are implementing a new language, you should know enough to get the interface to your strings right on the first try. That is not the normal case. (Strings will soon become impossible to change.)
maxwellg
I very much agree, but swapping out the database is a scary operation if you use the database for any sort of nontrivial logic. You need to make sure that your unique indices, partial indices, constraints, views, triggers, etc. work just as well. Another option is to start each test in an isolated transaction and roll back at the end instead of committing, but that comes with its own set of headaches too.
jbjbjbjb
I'm struggling to see how nullables are different from mocks; it just does the mocking in the production code. In the example I would just replace GitWriter with a mock of GitWriter and it achieves the same thing, no?
rco8786
Yea that's exactly what I am seeing also, but I'm open to the idea that I am missing something.
loevborg
I can say without exaggeration that this article changed the way I write software. Many of the concepts are interesting by themselves (embedded stubs, nullables, sociable tests) but together these patterns form a strong approach for writing fast-running tests.
At first I wasn't sure how well this was going to work, but 30,000 lines later I can see how Shore's approach applies beautifully to real-life projects.
The only thing I'm struggling with is getting the rest of the team to adopt Testing Without Mocks. It takes a bit of effort to learn how to use it correctly.
bulatb
Honestly I'd probably resist this too. Maybe you could practice on me? Help me see the benefit?
I'm seeing three main points in this advice:
1. Don't misuse mocks. (Mocks, fakes, whatever.)
2. Write testable, well-factored code with pluggable dependencies.
3. Don't use mocks at all. Use "nullables" instead.
I'm totally on board with (1) and (2) as good advice in general, no matter how you're testing. But (3) seems like an independent choice. What benefit does (3) specifically deliver that you wouldn't get with (1) and (2) and mocks instead of nullables?
I'm ready to admit I'm missing something, but I don't see it.
loevborg
I'll give it a try.
The beauty of the nullable approach is that when you're testing the system under test (SUT), you're testing it together with its dependencies. Not just that: you're running the real production code of the dependencies (including transitive dependencies!), just with an off switch toggled.
What the off switch does is really just prevent side effects, like making actual HTTP requests to the outside world or reading bytes from the file system. Using embedded stubs makes this a breeze (another revelation for me).
For example, while building the "/v1/add-account" endpoint, you write a test for the request handler function. If you write this test using Shore's approach, you'll exercise the whole tree of dependencies, down to the leaves.
This is what is meant by sociable tests - the SUT isn't isolated from its dependencies. It's the real code all the way down, except that you're calling createNull() instead of create() to instantiate the classes. There are many benefits here, but to me the most important ones are: (1) you don't need to write mocks as separate classes, (2) you can catch errors that arise in the interplay of the various classes or modules and (3) as a result you get a lot more confidence that everything is still working after refactoring.
A sociable microtest is a little like an integration test, but you don't exercise any out-of-process dependencies (perhaps with the exception of the RDBMS) so the tests run in milliseconds rather than seconds.
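Very roughly, in code (simplified, and not Shore's actual example; the names here are mine):

```ts
interface HttpGateway {
  post(path: string, body: unknown): Promise<{ status: number }>;
}

class RealHttpGateway implements HttpGateway {
  async post(path: string, body: unknown) {
    const res = await fetch(`https://billing.example.com${path}`, {
      method: "POST",
      body: JSON.stringify(body),
    });
    return { status: res.status };
  }
}

// Embedded stub: lives next to the real class and just returns a canned result.
class StubbedHttpGateway implements HttpGateway {
  constructor(private readonly status: number) {}
  async post() { return { status: this.status }; }
}

class AccountService {
  static create() { return new AccountService(new RealHttpGateway()); }
  // "Nulled" factory: same production logic, side effects switched off.
  static createNull({ status = 201 } = {}) {
    return new AccountService(new StubbedHttpGateway(status));
  }
  private constructor(private readonly gateway: HttpGateway) {}

  async addAccount(name: string): Promise<boolean> {
    // real production logic runs either way
    const res = await this.gateway.post("/v1/add-account", { name });
    return res.status === 201;
  }
}

// Sociable microtest: exercises AccountService's real code, no network.
// const service = AccountService.createNull({ status: 500 });
// await service.addAccount("ada") === false
```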
You commented elsewhere that you're worried about the separation of test code and prod code. Yes, this is a bit of a sacred cow that I also had trouble giving up. It turns out, having test-focused code mixed in with your prod classes is totally fine. I'd perhaps prefer a cleaner separation, but honestly it's not a big issue in practice.
Leherenn
I'm all aboard this, in fact it's how I like to write my "unit" tests, but still I'm unsure about this nullable and test code with prod.
What I do is I simply mock as little as possible (leaf nodes such as timers, remote calls, sometimes RDBMS/filesystem, ...). But I'm not sure what embedding the mock in the prod code gains you? I wonder if part of it is down to the language used?
For instance it says "Put the stub in the same file as the rest of your code so it’s easy to remember and update when your code changes.". Maybe it's because they're using a dynamic language? I'm using C++, so any change in interface will fail to compile the mock, making this less important. You could change the behaviour whilst keeping the same interface, but it's very rare in my experience (and definitely should be avoided).
Nullable I could see some value in for cases where you also want to be able to disable a functionality in prod. But in that case you have a function in the interface to turn it off, so you might as well use that. I can see how using Nullable at construction avoids having to expose this method through the whole dependency tree, but at the same time you lose the ability to dynamically turn it on/off.
bulatb
Thanks, this helped me understand some things. Unfortunately I don't have time to write a real reply right now, but I think I'm more convinced than before. Thanks for taking the time.
thom
Part of testing is risk management. With mocks you run the risk of the behaviour of your mocks being out of sync with the actual systems you’re mocking. By moving the test-time behaviour closer to the runtime behaviour, you reduce that risk somewhat. If you have other ways of managing that risk or you just never see that happening, you’re good.
kikimora
You can use mocks, however, I found in practice that hand written fake implementations (aka Nullables) are much easier to maintain.
hyperbolablabla
This article was way longer than it needed to be, and I found the overuse of jargon particularly distracting from the meat of the message.
john-tells-all
the author also wrote:
A Light Introduction to Nullables
https://www.jamesshore.com/v2/projects/nullables/a-light-int...
rat87
Personally I like record/replay style testing, where you save real network (or even hardware communication) traffic, commit it, and replay it to get quick unit tests. That, plus local file-based SQL tests.
cratermoon
In 1997 Cem Kaner wrote a paper titled "Improving the Maintainability of Automated Test Suites"[1]. One of the key pain points Kaner identified was the fragility of capture/replay tests. There's a reason why the test automation tools popular in the 90s fell out of favor: any time the system under test changes, the capture/replay tests will be out of sync. In addition, capture/replay tests aren't very good at finding defects.
globular-toast
Sure but the problem is they are slow and the combinatorial explosion of possibilities at the edge of your application will kill you. You can't enumerate every possible code path in an application of reasonable size. This is why a balanced approach is usually employed: a small number of representative happy/unhappy path integration tests to test the high level user feedback etc, with lower-level tests on the underlying layers and libraries etc. If you could just do it with high level tests, we'd all do it that way.
meowtimemania
How do you handle when fields are added or removed? Do you have to rerecord all the affected tests?
vkou
Yes.
When you change behavior you should expect to re-record.
3np
That's integration testing. Also valuable but complementary to actual unit tests.
globular-toast
The "unit" in unit test refers to the test, not the code under test. High level tests that are isolated from other tests using mechanisms like fixtures, setup/teardown etc are still unit tests.
knallfrosch
"Dependency injection (DI) frameworks work around the problem, but we don’t want to require such magic. Ensure all classes have a constructor or factory that doesn’t take any parameters"
Yeah, sure. Apart from the fact that most frameworks require constructor-based injection and make it very easy to use in tests.
"Write one or two end-to-end tests that make sure your code starts up and runs a common workflow. "
Getting the e2e tests to run is the hard part. Spinning up the external services is the slow part. Once you make everything work, you throw away all the invested time to run a single e2e test? Modern frameworks, like Playwright, allow you to simply record (!) your mouse clicks and write tests from that. You can easily 'write' 200 e2e tests like this and you don't even need a dev to do it.
The article's structure is commendable, but its ideas feel like taken straight from the 2000s.
earnestinger
If I’m reading it right, proposal is to hand-roll the mocks/test-doubles and store them in production code. (In addition to how to structure the app)
While article is interesting and idea workable, the title seems a bit misleading.
(Including the “pattern language” part. I bet 90% of readers were expecting to see some kind of DSL)
loevborg
Yes, I think it's fair to say that nullables are a form of test doubles. But the approach is meaningfully different from other approaches as it avoids the problems that you get from using mocks in the narrow sense of the word: you're only testing against the behavior of the mocks (not the real code) and you need to keep updating the mocks as the system changes, which adds maintenance cost.
knallfrosch
I don't really see how production-code mocks differ from library-based mocks. If you add but a single constructor parameter to your class, you already need to change your production-mock/stub too.
earnestinger
the benefits I see:
- nullable (test double/mock) updates would be more natural, easier, since it is located right there in prod code. (One level of indirection removed)
- with authoritative location of test double, it reduces the risk of test double proliferation/duplication
(I doubt I will use it in the proposed form though, saving test double code in prod is not for me probably)
cluckindan
> LoginClient.createNull().getUserInfo(...) should return a default response without actually talking to the third-party login service.
This seems like a security nightmare. In defense terms: here’s the side road to the castle with gates open, guarded by a single variable.
jdlshore
How are you imagining that variable gets flipped? If an attacker has the ability to run arbitrary code, you’re already screwed.
To put it another way, somewhere in your code you do authorization: `if (isAuthorized) enterTheCastle()`. The gate to the castle, guarded by a single variable. In both cases, you have to be careful to program it correctly.
In the case of Nullables, it’s easy to program it correctly. Multiple things would have to go wrong for the nulled version to be accidentally used in production. Also, unless you go out of your way to make a security vulnerability, Nullables’ default behavior is to do nothing. They default closed rather than open.
knallfrosch
> How are you imagining that variable gets flipped?
Not by an external actor, but by an internal bug.
rco8786
I'm genuinely confused about this. The very first example uses a Mock. It's just a manually defined mock that the author calls a "stub". There is no discernable difference between this and just using a mock, that I can see.
cratermoon
No,"mock" here refers to some library or toolkit external to the system under test. Those mock libraries tend to require quite a bit of convoluted setup to create, involving defining both the behavior and the expectations. After the test is run, the mock runs its checks to flag any unmet expectations. The stub defined here does not include any complex setup and post-test checking. Its behavior is entirely a "no-op" version of the production dependency. In addition, because mocking tools are general purpose, they have complex logic for handling a broad range of inputs and verification tests. This technique instead uses simpler, bespoke code and leaves the Assert part of the three As to the test code.
rco8786
I guess we have used different mocking libraries. The ones I'm most familiar with start with very simple use cases ("here is an object that looks like this Class, and when you call function X on it, return Y") and don't appear to have any noticeable amount of performance degradation.
One could just as easily have each class/file export a mocked object rather than a real one that happens to do nothing.
That angle of it though, I can see being beneficial. Where each class has the responsibility of defining how it should be used in a test case, vs each individual test case having to define the mocking behavior for any class it touches.
bluGill
Mocks check that you call a function with the correct arguments. Stubs return a canned value but don't tell you whether they were called. Spies watch something happen. Fakes model the real thing with enough complexity that you write tests for them.
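A compact sketch of the four, with a made-up Mailer dependency (all names illustrative):

```ts
interface Mailer { send(to: string, body: string): void; }

// Stub: does something canned, reports nothing back.
const stubMailer: Mailer = { send: () => {} };

// Spy: records what happened so the test can assert on it afterwards.
function makeSpyMailer() {
  const calls: Array<{ to: string; body: string }> = [];
  const mailer: Mailer = { send: (to, body) => calls.push({ to, body }) };
  return { mailer, calls };
}

// Mock: carries its own expectation and verifies it was met.
function makeMockMailer(expectedTo: string) {
  let satisfied = false;
  const mailer: Mailer = {
    send: (to) => { if (to === expectedTo) satisfied = true; },
  };
  return {
    mailer,
    verify: () => { if (!satisfied) throw new Error("send() not called as expected"); },
  };
}

// Fake: a simplified but working implementation, complex enough to deserve its own tests.
class FakeMailer implements Mailer {
  readonly outbox: Array<{ to: string; body: string }> = [];
  send(to: string, body: string) { this.outbox.push({ to, body }); }
}
```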
vkou
This is very interesting for greenfield work, but would be a pig to implement in a large brownfield application - because it requires you to design your production code to include all of these hooks, and to manually implement stubs and wrapper code for every third-party dependency.
---
The 'performance' gain promised by this is one I can't take seriously. Computers are fast, and test execution is parallelizable. I really don't care if spinning up a mock in a test takes 0.0005 ms or 0.5 ms. Any mental effort I have to expend to go from one to the other is absolutely not worth it.
Especially when 99.99% of my test runtime is spent on the kind of E2E tests that this would not help with (Because I actually want to verify production behavior - not the behavior of my fake implementations).
I'm pretty fond of this article. I think it's an interesting approach, and I think you can generalize it out into some very useful principles.
The big takeaway, I think, is that a particular module of code should encapsulate its own test double configuration & injection behind a set of testing factories. So rather than each test wiring up ThingDoer's test doubles itself, have the test ask the Thing module for a test-ready instance, e.g. `thing.makeThingDoerForTest(1, 2, 3)` (a rough sketch of the contrast is below).
Shore's "nullables" are a version of this, but to me it doesn't really matter what makeThingDoerForTest is doing internally to build the mocked-up test version of a ThingDoer. What matters is that the test doesn't have to worry about it: It's the Thing module's responsibility to correctly set up the test doubles involved. The unit test can use a special constructor that exposes only the minimum necessary configuration, and then implementation details about how ThingDoer interacts with Foo can be hidden inside the Thing module, as they should be.
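Something like this (ThingDoer, Foo, and makeThingDoerForTest as above; everything else is illustrative, with the factory rendered as a static method):

```ts
interface Foo {
  fetchValue(key: string): number;
}

class ThingDoer {
  constructor(private readonly foo: Foo) {}

  doubleIt(key: string): number {
    return this.foo.fetchValue(key) * 2;
  }

  // The Thing module owns the test wiring. Tests pass only the values they
  // care about; how Foo gets stubbed out stays hidden in here.
  static makeThingDoerForTest(...values: number[]): ThingDoer {
    let i = 0;
    const stubFoo: Foo = { fetchValue: () => values[i++ % values.length] ?? 0 };
    return new ThingDoer(stubFoo);
  }
}

// Before: every test knows ThingDoer's wiring and builds its own doubles.
// const doer = new ThingDoer({ fetchValue: () => 1 });

// After: the test just asks the module for a test-ready instance.
// const doer = ThingDoer.makeThingDoerForTest(1, 2, 3);
// doer.doubleIt("a") === 2
```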