Testing Without Mocks: A Pattern Language (2023)
59 comments
· March 25, 2025 · bccdee
bulatb
> It's the Thing module's responsibility to correctly set up the test doubles involved.
But why?
Maybe there's some fundamental difference of opinion here, because I'm thinking definitely not, no way, test-only code paths should never reach prod, and code that ships should never even know that testing is a thing that happens.
bccdee
`makeThingDoerForTest` is a standalone function (or static method) that never gets called in production code. I don't see why it needs to be in an entirely separate module. If you want, you could put it in a separate thing_test module, but then you'd have to expose a more powerful constructor for thing_test to access. I'd rather hide that inside the thing module, and I see no benefit to splitting this functionality across two modules just for the sake of not mixing my peas with my potatoes.
sethammons
That's the trick: it is not test-only code. You use the same objects, the same code.
The code has to be written so it takes in the altered unit via parameters or some other form of DI, or so it knows the current system is configured to short-circuit some calls.
I.e., you could have MyClient(testHTTPClient).GetResource(foo) used in tests and MyClient(realHTTPClient).GetResource(foo) in production. The testHTTPClient would get to the actual connection part and return a configured response or error.
Your entire logic is the same; your code could "receive" a forced 404 or timeout or 200 OK. It is up to the testHTTPClient, which only changes how the HTTP connection is handled.
I call these unit-integration tests. You are checking all units work together while not actually working with the outside world.
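Roughly, the shape of it in TypeScript (my example above reads like Go; the URL and other details here are illustrative):

```ts
type HTTPResponse = { status: number; body: string };

// The only thing that changes between prod and tests is this interface's implementation.
interface HTTPClient {
  get(url: string): Promise<HTTPResponse>;
}

class MyClient {
  constructor(private readonly http: HTTPClient) {}

  // All of the real logic (URL building, status handling, parsing) runs either way.
  async getResource(id: string): Promise<string> {
    const res = await this.http.get(`https://api.example.com/resources/${id}`);
    if (res.status === 404) throw new Error(`resource ${id} not found`);
    if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
    return res.body;
  }
}

// Production wiring: a real HTTP client, e.g. built on fetch.
const realHTTPClient: HTTPClient = {
  get: async (url) => {
    const res = await fetch(url);
    return { status: res.status, body: await res.text() };
  },
};

// Test wiring: the "connection" just returns whatever response it was configured with.
function makeTestHTTPClient(response: HTTPResponse): HTTPClient {
  return { get: async () => response };
}
const testHTTPClient = makeTestHTTPClient({ status: 404, body: "" });

// new MyClient(testHTTPClient).getResource("foo")  -> exercises the forced-404 path
// new MyClient(realHTTPClient).getResource("foo")  -> talks to the real API
```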
chrisoverzero
You’re responding to a comment containing a method call `thing.makeThingDoerForTest(1, 2, 3)` and saying “That's the trick - it is not test only code.”
Can you square those for me? I don’t get it.
larusso
It goes a bit in the same direction Rust went by usually shipping the unit tests inside the module. The thought of keeping the "mock" setup up to date even when doing refactors is intriguing. But I wonder if this solves the issue. I tend not to use mocks, and I try to stay away from spies. These two tie the tests too close to the implementation, like the article also describes. The reason why I don't ship test setup in production code is the simple fact that the test / spec should not be based on production code. Hard to put into words for me, but the test should be impartial. It sets up its test world and runs the subject in it. It should not use the subject to describe the test environment. I think this will lead to false positives.
I have an example where a coworker used constants from production to assert return values. I stated that the test can't use them, because the test is the specification and needs to be in control of what is correct and what is not, even if that means we write duplicate string values. Later, somebody changed a const to a wrong value, but as far as the test was concerned everything was still fine.
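A tiny sketch of what I mean (names made up):

```ts
export const GREETING_PREFIX = "Hello, "; // production constant

export function greet(name: string): string {
  return GREETING_PREFIX + name;
}

// Coupled test: still passes if someone changes GREETING_PREFIX to a wrong value.
// assert.strictEqual(greet("Ada"), GREETING_PREFIX + "Ada");

// Impartial test: duplicates the expected string on purpose, so it fails if the constant breaks.
// assert.strictEqual(greet("Ada"), "Hello, Ada");
```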
bluGill
What is the harm in a test-only path in production? If you are on an embedded system with limited memory there is obvious harm, but most code isn't that. There are other possible harms, but they are less obvious and may not apply. There are also many different ways to implement this.
I did something like this: in production, if a function pointer isn't null, I return its result instead of running the production code (with an "unlikely" compiler directive on the if). The function itself is implemented in a test-only shared library, so the cost to production is low. If you can break in you can set that function, but if you can inject arbitrary code we are already sunk.
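Roughly the shape of it, sketched here in TypeScript rather than C (names illustrative):

```ts
type PriceLookup = (sku: string) => number;

// Test-only override hook: stays null in production; only test code ever installs a function.
let testOverride: PriceLookup | null = null;

// Called only from test code (the analogue of the test-only shared library).
export function setTestOverride(fn: PriceLookup | null): void {
  testOverride = fn;
}

export function lookupPrice(sku: string): number {
  // The "unlikely" branch: short-circuit to the override when it is set.
  if (testOverride !== null) return testOverride(sku);
  // Production path: the real, expensive lookup.
  return queryPricingService(sku);
}

function queryPricingService(sku: string): number {
  // stands in for a real network call in this sketch
  throw new Error(`pricing service unreachable for ${sku}`);
}

// In a test:
// setTestOverride(() => 42);
// lookupPrice("ABC") === 42
```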
cratermoon
> no way, test-only code paths should never reach prod, and code that ships should never even know that testing is a thing that happens.
What's your reasoning for this? As a counter-example, the hardware you are using right now almost certainly has test points throughout the circuitry. Beyond all the uses the test points get during development and manufacturing, they are essential for repair and re-calibration after being sold to the customer and put into production.
sitkack
So much needless discussion because someone makes an ill thought out hot take. Most engineered objects of any substantial complexity include built in test systems. They would be impossible to build without them.
ngruhn
I guess overcoming that dogma is exactly the tradeoff.
bulatb
It's just a feeling, not a dogma. I'll happily throw it away. But why?
sitkack
> test-only code paths should never reach prod
Why? Humanity's most reliable engineered objects contain their own test and monitoring systems.
jdlshore
You nailed it.
sethammons
So this is a very thorough article. Aside from using unfamiliar terms, nearly every concept is valuable and familiar. I am a big proponent of not mocking / no mocking frameworks or libraries. I've spent 15+ years building large SaaS systems with hundreds of contributors, optimizing for testing so deployment is the best balance of safe and fast.
I use nearly every pattern mentioned to write my tests in Go. I now use the same patterns in Python where I can - this new org is less mature in its testing solution and mocks and spaghetti abound.
For similar, Go-specific resources, look up "Go fakes vs mocks."
Other related reading on reasons to avoid mocks: https://martinfowler.com/articles/mocksArentStubs.html
fwiw, not a fan of a couple techniques like nullables and prefer Fakes/Fake Constructors, but it is splitting hairs. Overall the message is correct: avoid mocking. Have a way for systems under test to have implementation details changed while interfaces remain the same.
cratermoon
> Aside from using unfamiliar terms, nearly every concept is valuable and familiar
A couple of decades ago I worked on some ideas I had for "test points" in software, modeled on the kinds of test points that chips and circuits have built in. My ideas were precursors to some of the observability tools like OpenTelemetry, but were more passive.
fhd2
Wow, a breath of fresh air. I read this before I went to sleep, hoping to wake up to some kind of revelation I might share in a comment like this, but no luck. I'm still torn.
I remember writing tests in not very powerful languages like Java in 2005. Back then the idea was that you just need to decouple your objects, add interfaces for everything, and you will reap good design, courtesy of being forced to by TDD. What I usually saw was an overly complex code base with dependency injection and difficult to debug problems at runtime.
This article seems like the natural evolution of this thinking, but turning it around: It's not saying a unit that wants to be tested needs to enable tests. It's saying a unit used by other units needs to support _them_ being tested. Maybe I'm easy to impress, but I find that refreshingly novel, wondering why I never thought of it.
OTOH, with powerful mocking frameworks and more dynamic features even in static languages, the part of the industry I've been a witness to took another turn: Let units do whatever they want, focusing just on what's needed in production, and in order to test them, we'll build an elaborate fake environment for them. The major downside being that people changing a unit need to worry about whatever elaborate fakes _other_ units have created for them in their tests.
So it's coupling vs decoupling production code and tests. The former approach makes it easier to author and maintain test suites. The latter approach makes it easier to reason about production code.
In the last decade or so, I've valued the latter more, despite the drawbacks. With this style, coupling production code and tests is suddenly on the table for me again.
bluGill
Things happened in a different order. TDD and automated tests were developed first, without test doubles. However, they eventually hit problems because some things were painful to use in tests (networks and the like). Mocks were one tool eventually invented. Then later learners heard of such things and way overused them. We are now finding the correct middle ground.
aszen
A lot of jargon and nothing new in terms of ideas. Again, testing nothing but fakes and trivial interactions.
Here's my opinion: test your code with all its side effects and infrastructure.
To do that fast and deterministically, switch to faster lighter weight versions of your production dependencies when running tests.
For example use SQLite or PgLite for databases, use in memory file system, use Mailpit for emailing etc.
Simulate your infrastructure dependencies, inject faults into them, test the real deal, SQL, file system semantics, protocols.
Don't leave the hard part of your testing to Q/A and users.
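For example, the database swap can look roughly like this (TypeScript sketch; better-sqlite3 and node-postgres are just one possible pairing):

```ts
import Database from "better-sqlite3"; // test-only dependency in this sketch
import { Pool } from "pg";

// One small interface the application codes against.
interface Db {
  query(sql: string, params?: any[]): Promise<unknown[]>;
}

// Production: real Postgres.
function makePostgresDb(connectionString: string): Db {
  const pool = new Pool({ connectionString });
  return {
    query: async (sql, params = []) => (await pool.query(sql, params)).rows,
  };
}

// Tests: in-memory SQLite, still a real engine running real SQL, just fast and local.
// (Caveats: placeholder syntax and some SQL semantics differ from Postgres, and
// .all() is for statements that return rows; writes would use .run().)
function makeSqliteDb(): Db {
  const db = new Database(":memory:");
  return {
    query: async (sql, params = []) => db.prepare(sql).all(...(params as any[])),
  };
}
```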
ajmurmann
IMO you should have a healthy test pyramid which contains both types of tests. At the very top should be a smallish set of tests that are as real as possible and it's ok if they are a little slower. At the bottom you have more fine-grained tests that focus on one unit (classes != units) and tests it quickly and in isolation. Ideally we get this isolation by designing good interfaces (not necessarily in the Java sense) and can just not hook it up to real collaborators (maybe we pass callbacks to the email API or have boring glue code that just passes the return value on to the API) but sometimes a test double is ok. If we use a test double for network interactions to third parties, it's another great idea to replay recorded responses with something like Ruby's VCR which at least allows us to exercise the client code in medium-priced tests.
This allows you to get very quick feedback while in the weeds but also gives us comprehensive test coverage.
imiric
I agree. Arguments against the classical test pyramid are common, but IME they often come from a place of laziness. It essentially boils down to "unit tests are a chore to maintain, hence ...<hand waving>... integration tests are more valuable". Laziness _is_ one of the three virtues of a good programmer, but these arguments completely miss the mark. I would actually go a step further and argue that their popularity is doing tangible harm to software quality.
For a program to be robust it needs to be tested at all layers, and the testing pyramid is still a good rule of thumb to follow. Designing the program with testing in mind is crucial, and as you say, interfaces and their equivalent in any language are the right approach to achieve this. There was a period of time when monkeypatching was popular in dynamically typed languages, and perhaps still is (I've been out of the loop for a few years), but we always knew that it was a dirty trick, and that the software should be architected better to make it testable.
If anyone is curious what this might look like, I wrote a small project in Go[1] last year that is built from the ground up with testing in mind. There's a global `App` struct that is initialized in `main() `, and serves as the main container for all I/O the program does. This means that internally no component ever makes direct OS calls, but does it via interfaces, which can be easily mocked in tests. This makes unit and integration testing a breeze, and allows other niceties like passing down a global logger. I've adopted this pattern in all Go apps I write from scratch, and advocated for it on existing projects I've worked on, and it's worked great so far.
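The shape of it, sketched here in TypeScript for illustration (the real project is Go, and App, Clock, and FileSystem are just illustrative names):

```ts
import { readFile } from "node:fs/promises";

// All I/O goes through interfaces held by one container built in main(),
// so nothing deeper in the program talks to the OS directly.
interface Clock { now(): Date; }
interface FileSystem { read(path: string): Promise<string>; }

class App {
  constructor(
    readonly clock: Clock,
    readonly fs: FileSystem,
  ) {}
}

// main() wires up the real I/O once...
function makeProductionApp(): App {
  return new App(
    { now: () => new Date() },
    { read: (p) => readFile(p, "utf8") },
  );
}

// ...and tests wire up controlled I/O, no monkeypatching needed.
function makeTestApp(fixture = "fixture contents"): App {
  return new App(
    { now: () => new Date("2024-01-01T00:00:00Z") },
    { read: async () => fixture },
  );
}
```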
bluGill
Interfaces are only useful if there is reason to believe they won't change; otherwise they get in the way. The reason is often political: this is where another group talks to my code, so the less we have to know about each other the better (not that the code is hidden, just that you can ignore it). The reason could also be that the interface is used everywhere, so any change would take months of effort, longer if anyone else needs to implement anything.
If change is easy, then it's better to make the change. You never really know all future requirements, so getting it right up front is impossible.
Unless getting it right is possible: if you are implementing a new language, you should know enough to get the interface to your strings right on the first try. That is not the normal case. (Strings will soon become impossible to change.)
maxwellg
I very much agree, but swapping out the database is a scary operation if you use the database for any sort of nontrivial logic. You need to make sure that your unique indices, partial indices, constraints, views, triggers, etc. work just as well. Another option is to start each test in an isolated transaction and roll back at the end instead of committing, but that comes with its own set of headaches too.
jbjbjbjb
I'm struggling to see how nullables are different from mocks; it just does the mocking in the production code. In the example I would just replace GitWriter with a mock of GitWriter and it achieves the same thing, no?
rco8786
Yea that's exactly what I am seeing also, but I'm open to the idea that I am missing something.
loevborg
I can say without exaggeration that this article changed the way I write software. Many of the concepts are interesting by themselves (embedded stubs, nullables, sociable tests) but together these patterns form a strong approach for writing fast-running tests.
At first I wasn't sure how well this was going to work, but 30,000 lines later I can see how Shore's approach applies beautifully to real-life projects.
The only thing I'm struggling with is getting the rest of the team to adopt Testing Without Mocks. It takes a bit of effort to learn how to use it correctly.
bulatb
Honestly I'd probably resist this too. Maybe you could practice on me? Help me see the benefit?
I'm seeing three main points in this advice:
1. Don't misuse mocks. (Mocks, fakes, whatever.)
2. Write testable, well-factored code with pluggable dependencies.
3. Don't use mocks at all. Use "nullables" instead.
I'm totally on board with (1) and (2) as good advice in general, no matter how you're testing. But (3) seems like an independent choice. What benefit does (3) specifically deliver that you wouldn't get with (1) and (2) and mocks instead of nullables?
I'm ready to admit I'm missing something, but I don't see it.
loevborg
I'll give it a try.
The beauty of the nullable approach is that when you're testing the system under test (SUT), you're testing it together with its dependencies. Not just that: you're running the real production code of the dependencies (including transitive dependencies!), just with an off switch toggled.
What the off switch does is really just prevent side effects, like making actual HTTP requests to the outside world or reading bytes from the file system. Using embedded stubs makes this a breeze (another revelation for me).
For example, while building the "/v1/add-account" endpoint, you write a test for the request handler function. If you write this test using Shore's approach, you'll exercise the whole tree of dependencies, down to the leaves.
This is what is meant by sociable tests - the SUT isn't isolated from its dependencies. It's the real code all the way down, except that you're calling createNull() instead of create() to instantiate the classes. There are many benefits here, but to me the most important ones are: (1) you don't need to write mocks as separate classes, (2) you can catch errors that arise in the interplay of the various classes or modules and (3) as a result you get a lot more confidence that everything is still working after refactoring.
A sociable microtest is a little like an integration test, but you don't exercise any out-of-process dependencies (perhaps with the exception of the RDBMS) so the tests run in milliseconds rather than seconds.
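Very roughly, in code (simplified, and not Shore's actual example; the names here are mine):

```ts
interface HttpGateway {
  post(path: string, body: unknown): Promise<{ status: number }>;
}

class RealHttpGateway implements HttpGateway {
  async post(path: string, body: unknown) {
    const res = await fetch(`https://billing.example.com${path}`, {
      method: "POST",
      body: JSON.stringify(body),
    });
    return { status: res.status };
  }
}

// Embedded stub: lives next to the real class and just returns a canned result.
class StubbedHttpGateway implements HttpGateway {
  constructor(private readonly status: number) {}
  async post() { return { status: this.status }; }
}

class AccountService {
  static create() { return new AccountService(new RealHttpGateway()); }
  // "Nulled" factory: same production logic, side effects switched off.
  static createNull({ status = 201 } = {}) {
    return new AccountService(new StubbedHttpGateway(status));
  }
  private constructor(private readonly gateway: HttpGateway) {}

  async addAccount(name: string): Promise<boolean> {
    // real production logic runs either way
    const res = await this.gateway.post("/v1/add-account", { name });
    return res.status === 201;
  }
}

// Sociable microtest: exercises AccountService's real code, no network.
// const service = AccountService.createNull({ status: 500 });
// await service.addAccount("ada") === false
```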
You commented elsewhere that you're worried about the separation of test code and prod code. Yes, this is a bit of a sacred cow that I also had trouble giving up. It turns out, having test-focused code mixed in with your prod classes is totally fine. I'd perhaps prefer a cleaner separation, but honestly it's not a big issue in practice.
Leherenn
I'm all aboard this, in fact it's how I like to write my "unit" tests, but still I'm unsure about this nullable and test code with prod.
What I do is I simply mock as little as possible (leaf nodes such as timers, remote calls, sometimes RDBMS/filesystem, ...). But I'm not sure what embedding the mock in the prod code gains you? I wonder if part of it is down to the language used?
For instance it says "Put the stub in the same file as the rest of your code so it’s easy to remember and update when your code changes.". Maybe it's because they're using a dynamic language? I'm using C++, so any change in interface will fail to compile the mock, making this less important. You could change the behaviour whilst keeping the same interface, but it's very rare in my experience (and definitely should be avoided).
Nullable I could see some value in for cases where you also want to be able to disable a functionality in prod. But in that case you have a function in the interface to turn it off, so you might as well use that. I can see how using Nullable at construction avoids having to expose this method through the whole dependency tree, but at the same time you lose the ability to dynamically turn it on/off.
bulatb
Thanks, this helped me understand some things. Unfortunately I don't have time to write a real reply right now, but I think I'm more convinced than before. Thanks for taking the time.
thom
Part of testing is risk management. With mocks you run the risk of the behaviour of your mocks being out of sync with the actual systems you’re mocking. By moving the test-time behaviour closer to the runtime behaviour, you reduce that risk somewhat. If you have other ways of managing that risk or you just never see that happening, you’re good.
kikimora
You can use mocks, however, I found in practice that hand written fake implementations (aka Nullables) are much easier to maintain.
hyperbolablabla
This article was way longer than it needed to be, and I found the overuse of jargon particularly distracting from the meat of the message.
john-tells-all
the author also wrote:
A Light Introduction to Nullables
https://www.jamesshore.com/v2/projects/nullables/a-light-int...
rat87
Personally I like record/replay style testing, where you save real network (or even hardware communication) traffic, commit it, and replay it to get quick unit tests. That, plus local file-based SQL tests.
cratermoon
In 1997 Cem Kaner wrote a paper titled "Improving the Maintainability of Automated Test Suites"[1]. One of the key pain points Kaner identified was the fragility of capture/replay tests. There's a reason why the test automation tools popular in the 90s fell out of favor: any time the system under test changes, the capture/replay tests will be out of sync. In addition, capture/replay tests aren't very good at finding defects.
globular-toast
Sure but the problem is they are slow and the combinatorial explosion of possibilities at the edge of your application will kill you. You can't enumerate every possible code path in an application of reasonable size. This is why a balanced approach is usually employed: a small number of representative happy/unhappy path integration tests to test the high level user feedback etc, with lower-level tests on the underlying layers and libraries etc. If you could just do it with high level tests, we'd all do it that way.
meowtimemania
How do you handle when fields are added or removed? Do you have to rerecord all the affected tests?
vkou
Yes.
When you change behavior you should expect to re-record.
3np
That's integration testing. Also valuable but complementary to actual unit tests.
globular-toast
The "unit" in unit test refers to the test, not the code under test. High level tests that are isolated from other tests using mechanisms like fixtures, setup/teardown etc are still unit tests.
knallfrosch
"Dependency injection (DI) frameworks work around the problem, but we don’t want to require such magic. Ensure all classes have a constructor or factory that doesn’t take any parameters"
Yeah, sure. Apart from the fact that most frameworks require constructor-based injection and make it very easy to use in tests.
"Write one or two end-to-end tests that make sure your code starts up and runs a common workflow. "
Getting the e2e tests to run is the hard part. Spinning up the external services is the slow part. Once you make everything work, you throw away all the invested time to run a single e2e test? Modern frameworks, like Playwright, allow you to simply record (!) your mouse clicks and write tests from that. You can easily 'write' 200 e2e tests like this and you don't even need a dev to do it.
The article's structure is commendable, but its ideas feel like taken straight from the 2000s.
earnestinger
If I’m reading it right, proposal is to hand-roll the mocks/test-doubles and store them in production code. (In addition to how to structure the app)
While article is interesting and idea workable, the title seems a bit misleading.
(Including the “pattern language” part. I bet 90% of readers were expecting to see some kind of DSL)
loevborg
Yes, I think it's fair to say that nullables are a form of test doubles. But the approach is meaningfully different from other approaches as it avoids the problems that you get from using mocks in the narrow sense of the word: you're only testing against the behavior of the mocks (not the real code) and you need to keep updating the mocks as the system changes, which adds maintenance cost.
knallfrosch
I don't really see how production-code mocks differ from library-based mocks. If you add but a single constructor parameter to your class, you already need to change your production-mock/stub too.
earnestinger
the benefits I see:
- nullable (test double/mock) updates would be more natural, easier, since it is located right there in prod code. (One level of indirection removed)
- with authoritative location of test double, it reduces the risk of test double proliferation/duplication
(I doubt I will use it in the proposed form though, saving test double code in prod is not for me probably)
cluckindan
> LoginClient.createNull().getUserInfo(...) should return a default response without actually talking to the third-party login service.
This seems like a security nightmare. In defense terms: here’s the side road to the castle with gates open, guarded by a single variable.
jdlshore
How are you imagining that variable gets flipped? If an attacker has the ability to run arbitrary code, you’re already screwed.
To put it another way, somewhere in your code you do authorization: `if (isAuthorized) enterTheCastle()`. The gate to the castle, guarded by a single variable. In both cases, you have to be careful to program it correctly.
In the case of Nullables, it’s easy to program it correctly. Multiple things would have to go wrong for the nulled version to be accidentally used in production. Also, unless you go out of your way to make a security vulnerability, Nullables’ default behavior is to do nothing. They default closed rather than open.
knallfrosch
> How are you imagining that variable gets flipped?
Not by an external actor, but by an internal bug.
rco8786
I'm genuinely confused about this. The very first example uses a Mock. It's just a manually defined mock that the author calls a "stub". There is no discernable difference between this and just using a mock, that I can see.
cratermoon
No,"mock" here refers to some library or toolkit external to the system under test. Those mock libraries tend to require quite a bit of convoluted setup to create, involving defining both the behavior and the expectations. After the test is run, the mock runs its checks to flag any unmet expectations. The stub defined here does not include any complex setup and post-test checking. Its behavior is entirely a "no-op" version of the production dependency. In addition, because mocking tools are general purpose, they have complex logic for handling a broad range of inputs and verification tests. This technique instead uses simpler, bespoke code and leaves the Assert part of the three As to the test code.
rco8786
I guess we have used different mocking libraries. The ones I'm most familiar with start with very simple use cases ("here is an object that looks like this Class, and when you call function X on it, return Y") and don't appear to have any noticeable amount of performance degradation.
One could just as easily have each class/file export a mocked object rather than a real one that happens to do nothing.
That angle of it though, I can see being beneficial. Where each class has the responsibility of defining how it should be used in a test case, vs each individual test case having to define the mocking behavior for any class it touches.
bluGill
Mocks check that you call a function with the correct arguments. Stubs return a canned value but don't tell you whether they were called. Spies watch something happen. Fakes model the real thing with enough complexity that you write tests for them.
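A compact sketch of the four, with a made-up Mailer dependency (all names illustrative):

```ts
interface Mailer { send(to: string, body: string): void; }

// Stub: does something canned, reports nothing back.
const stubMailer: Mailer = { send: () => {} };

// Spy: records what happened so the test can assert on it afterwards.
function makeSpyMailer() {
  const calls: Array<{ to: string; body: string }> = [];
  const mailer: Mailer = { send: (to, body) => calls.push({ to, body }) };
  return { mailer, calls };
}

// Mock: carries its own expectation and verifies it was met.
function makeMockMailer(expectedTo: string) {
  let satisfied = false;
  const mailer: Mailer = {
    send: (to) => { if (to === expectedTo) satisfied = true; },
  };
  return {
    mailer,
    verify: () => { if (!satisfied) throw new Error("send() not called as expected"); },
  };
}

// Fake: a simplified but working implementation, complex enough to deserve its own tests.
class FakeMailer implements Mailer {
  readonly outbox: Array<{ to: string; body: string }> = [];
  send(to: string, body: string) { this.outbox.push({ to, body }); }
}
```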
vkou
This is very interesting for greenfield work, but would be a pig to implement in a large brownfield application - because it requires you to design your production code to include all of these hooks, and to manually implement stubs and wrapper code for every third-party dependency.
---
The 'performance' gain promised by this is one I can't take seriously. Computers are fast, and test execution is parallelizable. I really don't care if spinning up a mock in a test takes 0.0005 ms or 0.5 ms. Any mental effort I have to expend to go from one to the other is absolutely not worth it.
Especially when 99.99% of my test runtime is spent on the kind of E2E tests that this would not help with (Because I actually want to verify production behavior - not the behavior of my fake implementations).
I'm pretty fond of this article. I think it's an interesting approach, and I think you can generalize it out into some very useful principles.
The big takeaway, I think, is that a particular module of code should encapsulate its own test double configuration & injection behind a set of testing factories. So rather than each test wiring up ThingDoer's test doubles itself, have the test ask the Thing module for a test-ready instance, e.g. `thing.makeThingDoerForTest(1, 2, 3)` (a rough sketch of the contrast is below).
Shore's "nullables" are a version of this, but to me it doesn't really matter what makeThingDoerForTest is doing internally to build the mocked-up test version of a ThingDoer. What matters is that the test doesn't have to worry about it: It's the Thing module's responsibility to correctly set up the test doubles involved. The unit test can use a special constructor that exposes only the minimum necessary configuration, and then implementation details about how ThingDoer interacts with Foo can be hidden inside the Thing module, as they should be.
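Something like this (ThingDoer, Foo, and makeThingDoerForTest as above; everything else is illustrative, with the factory rendered as a static method):

```ts
interface Foo {
  fetchValue(key: string): number;
}

class ThingDoer {
  constructor(private readonly foo: Foo) {}

  doubleIt(key: string): number {
    return this.foo.fetchValue(key) * 2;
  }

  // The Thing module owns the test wiring. Tests pass only the values they
  // care about; how Foo gets stubbed out stays hidden in here.
  static makeThingDoerForTest(...values: number[]): ThingDoer {
    let i = 0;
    const stubFoo: Foo = { fetchValue: () => values[i++ % values.length] ?? 0 };
    return new ThingDoer(stubFoo);
  }
}

// Before: every test knows ThingDoer's wiring and builds its own doubles.
// const doer = new ThingDoer({ fetchValue: () => 1 });

// After: the test just asks the module for a test-ready instance.
// const doer = ThingDoer.makeThingDoerForTest(1, 2, 3);
// doer.doubleIt("a") === 2
```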