The mock discussion still misses the real solution, which is to refactor the code so that you have a function that simply reads the file and returns the JSON: essentially a wrapper around open that doesn't need to be tested.
Then have your main function take in that JSON as a parameter (or a class wrapping that JSON).
Then your code becomes the ideal code: stateless, with no interaction with the outside world. Then it's trivial to test, just like any other function that simply translates inputs to outputs (i.e. pure).
Every time you see the need for a mock, your first thought should be "how can I take the 90% or 95% of this function that is pure and pull it out, and separate the impure portion (side effects and/or stateful) that now has almost no logic or complexity left in it and push it to the boundary of my codebase?"
Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
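A minimal sketch of that split, assuming a JSON config file (load_config_file and apply_config are invented names, not from the article):

    import json

    def load_config_file(path):
        # Thin impure wrapper around open(); nothing here worth unit testing.
        with open(path) as f:
            return json.load(f)

    def apply_config(config: dict) -> dict:
        # Pure: takes the parsed JSON, returns derived settings. Easy to test.
        return {
            "host": config.get("host", "localhost"),
            "port": int(config.get("port", 8080)),
            "debug": bool(config.get("debug", False)),
        }

    # Production wiring: the impure call sits at the boundary.
    # settings = apply_config(load_config_file("app.json"))

    # Test: no file, no mock.
    assert apply_config({"port": "9000"})["port"] == 9000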
Funnily enough, I am preparing a simple presentation at work to speak about exactly that. The idea of separating "logic" from I/O and side effects is an old one and can be found in many architectures (like hexagonal architecture). There is plenty of benefit doing this, but testing is a big one.
It should be obvious, but this is not something that seems to be taught in school or in most workplaces, and when it is, it's often through the lens of functional programming, which most people treat as a curiosity and not a practical thing to use at work. So I started to teach this simple design principle to all my junior devs, because it is actually quite easy to implement, does not need a complete shift of architecture or a big refactor when working on existing code, and is actually practical and useful.
> Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
In practice the issues I see with this are that the "side effect" part is usually both extensive enough to still justify mocking when testing it, and intertwined enough with your logic to make it hard to pull all the "pure" logic out. I rarely see functions that are 90-95% pure logic vs side effects.
E.g. for the first, you could have an action that requires several sequenced side effects, and then your "wrapper over APIs" still needs validation that it calls the right APIs in the right order with the right params, for various scenarios. Enter mocks or fakes. (And sometimes people will get clever and say use pubsub or events for this, but... you're usually just making the full-system-level testing harder, as well as introducing less determinism around your consistency.)
For the second, something like "do steps I and J. If the API you call in step J fails, unwind the change in I." Now you've got some logic back in there. And it's not uncommon for the branching to get more complex. Were you building everything in the system from first principles, you could try to architect something where I and J can be combined or consolidated in a way to work around this; when I and J are third party dependencies, that gets harder.
You're right that it's not always easy to separate pure from effectful code. But you definitely can (model it as a state machine) and I think it's worth it, especially if those steps can take a long time, can fail, might be executed in parallel, etc.
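A rough sketch of what "model it as a state machine" can look like; the states and events below are invented, not taken from the payment system described next, but they show how the transition logic stays pure while the effectful API calls live elsewhere:

    # Pure transition table: (state, event) -> (new_state, command to run next)
    TRANSITIONS = {
        ("initiated", "gateway_ok"): ("requested", "request_payment"),
        ("requested", "gateway_ok"): ("pending", "poll_status"),
        ("requested", "gateway_error"): ("failed", "cancel_initiation"),
        ("pending", "confirmed"): ("paid", None),
        ("pending", "timeout"): ("failed", "cancel_initiation"),
    }

    def next_step(state, event):
        # Unknown combinations fail loudly instead of corrupting records.
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"illegal transition: {state} + {event}")

    # Tests need no mocks at all:
    assert next_step("requested", "gateway_error") == ("failed", "cancel_initiation")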
For instance, I once worked on payment-related code at a large online retailer. The steps I and J from your example would have been calls to the payment gateway's API (payment initiation, actual payment request). There was also a step K (polling for payment confirmation) and even a step K' (a payment confirmation callback the gateway might or might not call before or after we get around polling for the payment status ourselves). And often there was even user interaction in between (the 3DS/3DS2 credit card payment scheme that's common here in the EU). Every single one of those steps could fail for a myriad of reasons, e.g. time out, be rejected, … and we had to make sure we always failed gracefully and, most importantly, didn't mess up our payment or order records.
Of course this was an old enterprise Java code base, created by people who had long left the company, and all this had been written just the way you imagine it. It was an absolute mess.
Every single time I worked on this code base I secretly wished one of the original authors had heard of state machines, pure vs. effectful code, and unit tests.
I've had great success swapping in an in-memory database via dependency injection and just running 100% of the application, end to end.
In the ideal case my tests start by writing some randomised data using the external API, then update it (if applicable) using the external API, and finally read it, also using the external API, and compare the actual result with what I expected.
I use randomised data to avoid collisions with other tests, which might cause flakiness and/or prevent running the tests concurrently. I avoid having seed data in the database if at all possible.
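Roughly this shape, where create_app and InMemoryDatabase are hypothetical stand-ins for your own app factory and database fake (nothing here comes from the thread):

    import uuid

    def test_item_roundtrip():
        # create_app / InMemoryDatabase are hypothetical: your app factory and DB fake.
        app = create_app(db=InMemoryDatabase())
        client = app.test_client()

        # Randomised data avoids collisions with other tests.
        name = f"item-{uuid.uuid4()}"

        created = client.post("/items", json={"name": name}).json()
        client.put(f"/items/{created['id']}", json={"name": name + "-v2"})
        fetched = client.get(f"/items/{created['id']}").json()

        # Only the external API is used; internals are free to be refactored.
        assert fetched["name"] == name + "-v2"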
It's the only approach I've found that can survive a major refactor of the codebase. Anything short of breaking the external API, which is typically a no-no anyway, shouldn't break these tests.
Doing a refactor and being able to rely on the test suite for finding bugs and inconsistencies is amazing. Of course they won't find 100% of all bugs, but this way at least you know that a failing test means there's a problem in your production code.
Thou shalt never mock the database, for thou shalt anger the database when thou mockest it.
In all seriousness, I have found this to be a useful suggestion, because the purpose of a test is to make sure invariants don't break in real code. When you mock the database, you're excluding large amounts of real code from test.
The risk to that approach is that you end up writing code that cannot deal with the real world problems of I/O, such as timeouts, failed reads, jitter, and other weird behaviour.
Separating I/O from logic makes a lot of sense and makes tests much easier to write and code much easier to reason about, but you'll still need to implement some sort of mocking interface if you want to catch I/O problems.
> how can I take the 90% or 95% of this function that is pure and pull it out, and separate the impure portion (side effects and/or stateful) that now has almost no logic or complexity left in it
They addressed this concern already. These are not contradicting approaches.
I think the disconnect is that the impure portion often has substantial amounts of inherent complexity. If you're working at a high level, that complexity is handled by a lower level (OS, framework, etc.), but somebody is writing that framework or OS or HAL. And they need to test too, and ultimately someone is going to have to make a test double of the hardware interface unless all testing is going to require hardware-in-the-loop.
I think the important thing is that the code look pure from a testing perspective.
Say you've got a function that accesses a key-value store. Ideally, you can factor out the i/o so that you do all your reads up front and all your writes at the end, leaving a pure function in the middle. But if the code is too tangled up in its side effects for that, the next best thing is to create a fake KV store and then wrap the function like this:
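The code blocks from this comment didn't survive, so here is a hedged reconstruction of the idea: doThing is tangled up with a key-value store, and doTest wraps it with an in-memory fake so the test only ever sees data in, data out:

    class FakeKV:
        # Minimal in-memory stand-in for the real key-value store.
        def __init__(self, data=None):
            self.data = dict(data or {})

        def get(self, key):
            return self.data.get(key)

        def put(self, key, value):
            self.data[key] = value

    def doThing(kv, user_id):
        # Tangled up with its side effects: reads and writes the store directly.
        visits = (kv.get(f"visits:{user_id}") or 0) + 1
        kv.put(f"visits:{user_id}", visits)
        return visits

    def doTest(initial_data, user_id):
        # Looks pure from the test's point of view: data in, (result, final state) out.
        kv = FakeKV(initial_data)
        return doThing(kv, user_id), kv.data

    assert doTest({"visits:42": 2}, 42) == (3, {"visits:42": 3})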
doThing isn't a pure function, but doTest is, so now you can write your tests against doTest. I guess you could call that "imperative core, functional shell," lol.
I agree with you. However, convincing an entire team of devs to explicitly separate the interface of the impure parts of code is very difficult.
If you introduce a mocking library to the test portion of the codebase, most developers will start to use it as a way to shortcut any refactoring they don't want to do. I think articles like this that try to explain how to better use mocks in tests are useful, although I wish they weren't necessary.
This sounds great until one of your other functions calls that function.
You're just describing dependency injection, but if you say that, people won't want to listen cause doing that all the time sucks.
Dependency injection frameworks can be a pain, but basic dependency injection (e.g. pass a DB handle to anything that needs the DB) is a must. What's the alternative? Having everyone independently create their own DB connections?
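For example (sqlite standing in for "the DB"; handle-passing is the whole trick):

    import sqlite3

    def count_users(db):
        # Takes a DB handle instead of creating its own connection.
        return db.execute("SELECT count(*) FROM users").fetchone()[0]

    # Production: one place owns the connection.
    # db = sqlite3.connect("prod.db")

    # Test: hand it an in-memory database instead.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    db.execute("INSERT INTO users DEFAULT VALUES")
    assert count_users(db) == 1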
It's my preference to a point. I think you can over-do it. I've worked in systems that didn't use it and worked just fine. I would bet most folks err on the side of too little, unless they have some kind of framework doing the heavy lifting.
"sans-I/O" is one term for that style. I like it a lot but it's not a free lunch.
I always liked the phrase 'Hoist your I/O' [1] but yes, you can only hoist it up so many times until it's outside of your application completely (making it completely pure, and now someone else's responsibility).
[1] https://www.youtube.com/watch?v=PBQN62oUnN8
Or "functional core, imperative shell".
You're just moving the problem one layer up.
You still need to write a test for how it all comes together and you should write tests for your error handling. You need a mock to respond with an error.
Not necessarily. If all your error handling does is bubble up the errors from the other libraries' functions you call, then you probably don't need to test it.
Now you have N+1 tests!
This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
In the example the author walks through, a cleaner way would be to have the second function take the Options as a parameter and decouple those two functions. You can then test both in isolation.
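The article's actual code isn't reproduced here, so this is only a guess at the shape being suggested: one function that does the I/O and returns Options, and a second function that takes Options and can be tested without any file:

    import json
    from dataclasses import dataclass

    @dataclass
    class Options:
        retries: int = 3
        verbose: bool = False

    def read_options(path) -> Options:
        # The only part that touches the filesystem.
        with open(path) as f:
            return Options(**json.load(f))

    def run(options: Options) -> str:
        # Decoupled from where the options came from; trivial to test in isolation.
        return f"running with {options.retries} retries"

    assert run(Options(retries=5)) == "running with 5 retries"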
> If you are mocking anything in the standard library your code is probably structured poorly.
I like Hynek Schlawack's 'Don’t Mock What You Don’t Own' [1] phrasing, and while I'm not a fan of adding too many layers of abstraction to an application that hasn't proved that it needs them, the one structure I find consistently useful is to add a very thin layer over the parts that do I/O, converting to/from types that you own to whatever is needed for the actual thing.
These layers should be boring and narrow (for example, never mock past validation you depend upon), doing as little conversion as possible. You can also rephrase the general purpose open()-type usage into application/purpose-specific usages of that.
Then you can either unittest.mock.patch these or provide alternate stub implementations for tests in a different way, with this approach also translating easily to other languages that don't have the (double-edged sword) flexibility of Python's own unittest.mock.
[1] https://hynek.me/articles/what-to-mock-in-5-mins/
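A sketch of such a thin, owned layer (ConfigStore and StubConfigStore are invented names); the test swaps in the stub instead of patching open:

    import json

    class ConfigStore:
        # Boring, narrow wrapper over the bit of I/O we actually use.
        def __init__(self, path):
            self.path = path

        def load(self) -> dict:
            with open(self.path) as f:
                return json.load(f)

    class StubConfigStore:
        # Test double we own; no unittest.mock.patch of the stdlib needed.
        def __init__(self, data):
            self._data = data

        def load(self) -> dict:
            return dict(self._data)

    def startup_banner(store) -> str:
        cfg = store.load()
        return f"{cfg['app']} v{cfg['version']}"

    assert startup_banner(StubConfigStore({"app": "demo", "version": "1.0"})) == "demo v1.0"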
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
Valgrind is a mock of standard library/OS functions and I think its existence is a good thing. Simulating OOM is also only possible by mocking stuff like open.
> Valgrind is a mock of standard library/OS functions and I think its existence is a good thing.
That is mostly wrong.
Valgrind wraps syscalls. For the most part it just checks the arguments and records any reads or writes to memory. For a small number of syscalls it replaces the syscall rather than wrapping it (for instance calls like getcontext where it needs to get the context from the VEX synthetic CPU rather than the real CPU).
Depending on the tool it can also wrap or replace libc and libpthread functions. memcheck will replace all allocation functions. DRD and Helgrind wrap all pthread functions.
    $ cat test.c
    void main (void) {
        malloc (1000);
    }
    $ make test
    cc test.c -o test
    $ valgrind --leak-check=full --show-leak-kinds=all -s ./test
    Memcheck, a memory error detector
    Command: ./test
    HEAP SUMMARY:
        in use at exit: 1,000 bytes in 1 blocks
        total heap usage: 1 allocs, 0 frees, 1,000 bytes allocated
    1,000 bytes in 1 blocks are still reachable in loss record 1 of 1
        at 0x483877F: malloc (vg_replace_malloc.c:307)
        by 0x109142: main (in test.c:2)
    LEAK SUMMARY:
        definitely lost: 0 bytes in 0 blocks
        indirectly lost: 0 bytes in 0 blocks
        possibly lost: 0 bytes in 0 blocks
        still reachable: 1,000 bytes in 1 blocks
        suppressed: 0 bytes in 0 blocks
> vg_replace_malloc.c:307
What do you think that is? Valgrind tracks allocations by providing other implementations for malloc/free/... .
Are you trying to explain to me how Valgrind works? If you do know more than me then please join us and become a Valgrind developer.
Mostly it wraps system calls and library calls. Wrapping means that it does some checking or recording before and maybe after the call. Very occasionally it needs to modify the arguments to the call. The rest of the time it passes the arguments on to the kernel or libc/libpthread/C++ lib.
There are also functions and syscalls that it needs to replace. That needs to be a fully functional replacement, not just looking the same as in mocking.
I don’t have any exact figures. The number of syscalls varies quite a lot by platform and on most platforms there are many obsolete syscalls that are not implemented. At a rough guess, I’d say there are something like 300 syscalls and 100 lib calls that are handled of which 3/4 are wrapped and 1/4 are replaced.
> Are you trying to explain to me how Valgrind works?
Sorry that wasn't my intention. You are a Valgrind developer? Thanks, it's a good project.
It seems like I have a different understanding of mocking than other people in the thread and it shows. My understanding was that Valgrind provides function replacements via dynamic linking that then call into the real libc. I would call that mocking, but YMMV.
All rules exist to be broken in the right circumstances. But in 99.9% of test code, there's no reason to do any of that.
I think when testing code with an open call, it is a good idea to test what happens on different return values of open. If that is not what you intend to test for this method, then that method shouldn't contain open at all, as already pointed out by other comments.
That depends on what your error recovery plan is.
If the code's running in a space shuttle, you probably want to test that path.
If it's bootstrapping a replicated service, it's likely desirable to crash early if a config file couldn't be opened.
If it's plausible that the file in question is missing, you can absolutely test that code path, without mocking open.
If you want to explicitly handle different reasons for why opening a file failed differently, by all means, stress all of that in your tests. But if all you have is a happy path and an unhappy path, where your code doesn't care why opening a file failed, all you need to test is the case where the file is present, and one where it is not.
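Both of those cases can be exercised with real files, for example via pytest's tmp_path fixture (read_greeting is a made-up function for illustration):

    def read_greeting(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "hello, default"   # we don't care *why* the open failed

    def test_file_present(tmp_path):
        p = tmp_path / "greeting.txt"
        p.write_text("hello, file\n")
        assert read_greeting(p) == "hello, file"

    def test_file_missing(tmp_path):
        assert read_greeting(tmp_path / "nope.txt") == "hello, default"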
Modifying the file system would be kind of like mocking to me. I very much don't want my daemons or user-facing applications to just crash when a file is missing. That's kind-of the worst thing you can do.
> Modifying the file system would be kind of like mocking to me.
Modifying the file system's implementation would be. Including a valid_testdata.txt and an invalid_testdata.txt file in your test's directory, however, is not 'modifying the file system', any more than declaring a test input variable is 'mocking memory access'.
> don't want my daemons or user-facing applications to just crash, when a file is missing
If the file is important, it's the best kind of thing you can do when implementing a non-user-facing service. The last thing you want to do is to silently and incorrectly serve traffic because you are missing configuration.
You want to crash quickly and let whatever monitoring system you have in place escalate the problem in an application-agnostic manner.
Details matter, but good test doubles here are important. You want to capture all calls to IO and do something different. You don't want tests to break because someone has a different filesystem, didn't set up their home directory the way you expect, or, worse, is running two different tests at the same time and one test is changing files the other wants.
Note that I said test doubles. Mocks are a bit over specific - they are about verifying functions are called at the right time with the right arguments, but the easy ability to set return values makes it easy to abuse them for other things (this abuse is good, but it is still abuse of the intent).
In this case you want a fake: a smart service that, when you are in a test, sets up a temporary directory tree containing all the files that particular test needs in the state it needs them, and destroys it when the test is done (with an optional mode to keep it - useful for debugging when a test fails). Depending on your situation you may need something similar for network services, time, or other such things. Note that in most cases the filesystem itself is more than fast enough to use in tests, but you need isolation from other tests. There are a number of ways to create this fake: overriding open, or overriding a GetMyProgramDir function, are two that I can think of.
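One way to sketch that kind of fake as a pytest fixture (the directory layout here is invented): it builds the tree a test needs in a unique temporary location, so parallel tests can't trample each other:

    import pytest

    @pytest.fixture
    def program_dir(tmp_path):
        # Build the tree this test expects; tmp_path is unique per test,
        # so concurrent tests never share files.
        (tmp_path / "config").mkdir()
        (tmp_path / "config" / "settings.ini").write_text("[main]\nmode=test\n")
        (tmp_path / "data").mkdir()
        yield tmp_path
        # pytest keeps the most recent tmp_path directories around,
        # which helps when debugging a failed test.

    def test_reads_settings(program_dir):
        text = (program_dir / "config" / "settings.ini").read_text()
        assert "mode=test" in text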
Your tests are either hermetic, or they're flaky. That means the test environment needs to be defined and versioned with the code.
Even in the case you mention, you really shouldn't be overriding these methods. Your load-settings method should take the path of the settings file as an argument, and then your test can set up all the fake files you want with something like Python's tempfile package.
There are a number of different ways to solve this problem. I too use the path of settings in my code, but I'm not against overriding open and all the other file IO functions. Of course this article is about Python, which has different abilities than other languages; what is best in Python is not what is best in other languages, and I'm trying to stay at a higher level than a particular language.
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do.
It does. And this is exactly the problem, here!
> TFA: The thing we want to avoid is opening a real file
No! No, no, no! You do not 'want to avoid opening a real file' in a test.
It's completely fine to open a real file in a test! If your code depends on reading input files, then your test should include real input files in it! There's no reason to mock any of this. All of this stuff is easy to set up in any unit test library worth its salt.
> then your test should include real input files in it! There's no reason to mock any of this.
That's okay for testing some branches of your code. But not all. I don't want to have to actually crash my hard drive to test that I am properly handling hard drive crashes. Mocking[1] is the easiest way to do that.
[1] For some definition of mock. There is absolutely no agreement found in this space as to what the terms used mean.
> In Why your mock doesn’t work I explained this rule of mocking:
> Mock where the object is used, not where it’s defined.
For anyone looking for generic advice, this is a quirk of python due to how imports work in that language (details in the linked post) and shouldn't be considered universal.
Honestly I don't buy it.
Worse, this is one of the reasons I prefer to do "minimal integration tests" instead of unit tests.
Take the example snippet of code, and the very first comment just below it:
>>> The thing we want to avoid is opening a real file
and then the article goes around patching stdlib stuff etc.
But instead I would suggest the real way to test it is to actually create the damn file, fill it with the "normal" (fixed) content and then run the damn test.
This is because after years of battling against mocks of various sort I find that creating the "real" resource is actually less finicky than monkeypatching stuff around.
Apart from that; yeah, sure the code should be refactored and the paths / resources moved out of the "pure logical" steps, but 1) this is an example and 2) this is the reality of most of the actual code, just 10x more complex and 100x more costly to refactor.
That works fine for files, but what if the integration is with a third party service for example?
You can create an actual mock networked service but it's much more work.
I think this is an example explaining what seems like a good practice for using mocks in python to me, the actual code in the post is barely "supporting cast".
If it's HTTP you can create the fixtures and serve them with a mock server. I'm a frontend dev, so backend APIs are like 3rd parties to me.
I use a browser extension for scraping actual backend responses, which downloads them with a filename convention the mock server understands. I mostly use it for development, but also for setting up screenshot tests. For example, screenshot the app and pixel diff it…
> I use a browser extension for scraping actual backend responses
Can you tell the name of the extension?
https://github.com/ericfortis/download-http-requests-browser...
You can automate it with the right libraries, such as https://github.com/kevin1024/vcrpy
This one runs the real request and saves the response, faking it later by returning what it saved instead of making the request again.
If you make the function pure it will be easier to test. Pass the moving parts in as function parameters; then you can pass in mocks for those parameters when testing. Example: refactor for easier testing so that in your test you can now mock a and b.
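The example code didn't survive in the thread, so here is a hedged guess at its shape: the moving parts a (how to read) and b (where to send the result) are passed in, and the test supplies fakes for both:

    import io

    def summarize(path, a=open, b=print):
        # a and b are the moving parts: an injectable reader and writer.
        with a(path) as f:
            total = sum(len(line) for line in f)
        b(f"{path}: {total} chars")
        return total

    # In the test, pass fakes for a and b instead of patching anything.
    captured = []
    result = summarize(
        "fake.txt",
        a=lambda _path: io.StringIO("ab\ncd\n"),
        b=captured.append,
    )
    assert result == 6
    assert captured == ["fake.txt: 6 chars"]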
> An overly aggressive mock can work fine, but then break much later. Why?
Because you are testing against implementation, not specification.
You’re welcome.
If you’re doing TDD, you could just view this as moving the “open” call to your unit test. As others point out, that encourages pure functions that can pipe in input from other sources than just file paths.
Great article. In addition, updating your mocking code can often be time-consuming. To try to make this easier, I built mock[1], which streamlines the process of setting up mock services for testing.
[1] https://dhuan.github.io/mock/latest/examples.html
Arguably this is a problem with when the patch is unapplied.
Presumably in the coverage case it’s being called by a trace function, which inevitably runs during test execution — and while we want the trace function to be called during the test function, we really want it without any patches the test function is using. But this arguably requires both an ability for the trace function to opt-out of patches and for the patcher to provide a way to temporarily disable all of them.
I feel like the #1 reason mocks break looks nothing like this and instead looks like: you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks. Which highlights how awful mocking as a concept is; it is of truly limited usefulness for anything but the most brittle of tests.
Don't test the wrong things; if you care about some precondition, that should be an input. If you need to measure a side effect, that should be an output. Don't tweak global state to do your testing.
Most of the real world is about manipulating the real world. For algorithms it is fine to say depend on the pure inputs/outputs. However what we care about is that global state is manipulated correctly and so the integration tests that verify that are what are important. In most cases your algorithm shouldn't be unit tested separately since it is only used in one place and changes when the users change: there is no point in extra tests. If the algorithm is used in many places comprehensive unit tests are important, but they get in the way when the algorithm is used only once and so the tests just inhibit changes to the algorithm as requirements change (you have to change the user, the integration tests, and the unit tests that are redundant).
As such I disagree. Global state is what you should be testing - but you need to be smart about it. How you setup and verify global state matters. Don't confuse global state above with global state of variables, I mean the external state of the program before and after, which means network, file, time, and other IO things.
IO and global state is also just inputs that can be part of arrange-act-assert. Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
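For example, with an in-memory sqlite database the arrange-act-assert version needs no mock at all (the schema and query are illustrative only):

    import sqlite3

    def find_titles(db, author):
        return [r[0] for r in db.execute(
            "SELECT title FROM posts WHERE author = ?", (author,))]

    def test_find_titles():
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE posts (author TEXT, title TEXT)")
        db.execute("INSERT INTO posts VALUES ('foo', 'hello world')")  # arrange: a real row
        assert find_titles(db, "foo") == ["hello world"]               # act + assert: a real query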
Again I've heard "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time", in which case you ought to take a moment to write down what you're trying to accomplish, rather than using mocks to pave over poor architectural decisions. Eventually, the query fails and the mock succeeds, because they were completely unrelated.
So far I've only seen mocks fail eventually and mysteriously. With setups and DI you can treat things mostly as a black box from a testing point of view, but when mocks are involved you need surgical precision to hit the right target at the right time.
> global state
In my experience global state is the testing bug farm. Tests that depend on global state are usually taking dependencies they aren’t even aware of. Test initializations grow into complex “poke this, prod that, set this value to some magic number” setups that attempt to tame the global state but as global state grows, this becomes more and more difficult and inconsistent. Inter-test dependencies sneak in, parallelism becomes impossible, engineers start turning off “flaky” tests because they’ve spent hours or days trying to reproduce failures only to eventually give up.
This sort of development is attractive when starting up a project because it’s straightforward and the testing “bang for the buck” is high. But it degrades quickly as the system becomes complex.
> Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
Consider not sprinkling “select” statements throughout your code instead. This tight coupling makes good testing much more difficult (requiring the “set all the global state” model of testing) but is also just generally not good code structure. The use of SQL is an implementation detail that most of your code shouldn’t need to know about.
A thin layer around the DB interactions gives you a smaller set of code that needs to be tested with state, gives you a scoped surface area for any necessary mocking, makes it much easier if you need to change storage systems, and also gives you a place where you can reason over all the possible DB interactions. This is just good separation of concerns.
You didn't use global state in the same sense that I did. I mean global as in state of the entire world, not just the computer you are on.
I tamed our inter-test dependencies by doing things like starting dbus on a different port for each test - now I can test with real dbus in the loop and my tests are fast and isolated. We have a strict rule about which directories we are allowed to write to (embedded system - the others are read only in production), so it is easy to point those to a temp dir. It was some work to set that up, but it tames most of your issues with global state and allows me to verify what really counts: the system works.
For a CRUD web app your database separation of concerns makes sense. However in my domain we have lots of little data stores and nobody else needs access to that store. As such we put it on each team to develop the separation that makes sense for them - I don't agree with all their decisions, but they get to deal with it.
Sure. I wasn’t responding to your statements but I can understand what you mean. If your code manipulates the physical world (and not in a “sets this bit on a disk” sort of way) then you have to validate that action. Whether that validation should be decoupled from the business logic that drives that piece really depends on the complexity of the system. At some point you are going to do full integration validation, so it’s a question of balance: how much is tested that way vs more isolated.
Tests that work and verify the system works are the Pri0 requirement. Most of the conversations about how best to test are structured for the benefit of people who are struggling with meeting the Pri0 because of maintainability. With enough effort any strategy can work.
> However in my domain we have lots of little data stores and nobody else needs access to that store.
If the little data stores are isolated to small individual areas of code then you probably already have the necessary isolation. Introducing the lightweight data store isolation layer might be useless (or not, context dependent). Now if these individual areas are doing things like handing off result sets to other code then I would have something different to say.
I don't understand the "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time". Are they changing your database schema at runtime (not just at software update time, when you can do migrations)? Otherwise you should have a test helper that creates your entire database schema, so that when it changes you can update the schema in one place and all tests are re-run against the new one. (You should also have some helpers that insert common test data and are easy to update if the new schema affects them.)
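One possible shape for that helper, again with sqlite standing in for the real database (the schema is invented):

    import sqlite3

    SCHEMA = """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    """

    def create_test_db():
        # The single place that knows the schema; every test gets a fresh copy,
        # so a migration only needs to be reflected here.
        db = sqlite3.connect(":memory:")
        db.executescript(SCHEMA)
        return db

    def insert_common_test_data(db):
        db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
        return db

    def test_orders_start_empty():
        db = insert_common_test_data(create_test_db())
        assert db.execute("SELECT count(*) FROM orders").fetchone()[0] == 0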
I haven't seen mocks fail mysteriously. I've seen them fail often though because requirements change and instead of updating the callers (generally a small number) you end up with 200 tests failing and give up because updating all the tests is too hard. Mocks are always about implementation details - sometimes you have no choice, but the more you can test actual behavior the better.
> you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks
Rarely should a mock be “interacting with the underlying code”, because it should be a dead end that returns canned data and makes no other calls.
If your mock is calling back into other code you’ve probably not got a mock but some other kind of “test double”. Maybe a “fake” in Martin Fowler’s terminology.
If you have test doubles that are involved in a bunch of calls back and forth between different pieces of code then there’s a good chance you have poorly factored code and your doubles are complex because of that.
Now, I won’t pretend changes don’t regularly break test doubles, but for mocks it’s usually method changes or additions and the fix is mechanical (though annoying). If your mocks are duplicating a bunch of logic, though, then something else is going on.
To fully show my hand: What I do not like are mocks which have you outline a series of behaviors/method calls which are expected to be called, typically with specific input parameters, often with methods in order. In python, this looks like setting up Mocks and then using `assert_called` methods. In Go, this means using Gomock-generated structs which have an EXPECT.
A test should not fail when the outputs do not change. In pursuit of this ideal I often end up with fakes (to use Martin Fowler's terms) of varying levels of complexity, but not "mocks" as many folks refer to them.
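To make the contrast concrete: the first test below pins the interaction (which call was made, with which arguments), while the second pins only the outcome via a tiny fake, so an internal refactor won't break it. unittest.mock is real; send_welcome and EmailFake are invented for the example.

    from unittest.mock import Mock

    def send_welcome(mailer, user):
        mailer.send(to=user, subject="Welcome!")

    # Interaction-style mock: fails whenever the call's shape changes.
    def test_with_mock():
        mailer = Mock()
        send_welcome(mailer, "a@example.com")
        mailer.send.assert_called_once_with(to="a@example.com", subject="Welcome!")

    # State-style fake: only cares that a welcome email ended up in the outbox.
    class EmailFake:
        def __init__(self):
            self.outbox = []

        def send(self, to, subject):
            self.outbox.append((to, subject))

    def test_with_fake():
        mailer = EmailFake()
        send_welcome(mailer, "a@example.com")
        assert ("a@example.com", "Welcome!") in mailer.outbox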
Yeah I think true mocks often lead to fragile, hard-to-read tests. A test which expects a bunch of specific interactions with a mock is white-box, whereas a test which checks the state of a fake after the operation is complete is mostly black-box. Then, if your dependency gets updated, only the fake breaks, rather than all of your test expectations.
A particularly complex fake can even be unit-tested, if need be. Of course, if you're writing huge fakes, there's probably something wrong with your architecture, but I feel like good testing practices should give you options even when you're working with poorly architected code.
So this I agree with. Mocks are often used to over-constrain and end up validating that the implementation does not change rather than does not regress.
There are some specific cases, such as validating that caching is working as expected, where it can make sense to fully validate every call. Most of the time, though, this is a pointless exercise that serves mostly to make it difficult to maintain tests.
It can sometimes also be useful as part of writing new code, because it can help validate that your mental model for the code is correct. But it’s a nightmare for maintenance and committing over-constrained tests just creates future burden.
In Fowler terminology I think I tend to use Stubs rather than Mocks for most cases.
I see a lot of times people (read: me) are lazy and make a mock that does not have anywhere near the behavior of the original. It's more like a very partial stub. I will mock an api with 20 possible calls with the 2 that I use. Unsurprisingly, this mock is closely tied to the current implementation.
I think making narrow use-case-specific test doubles is often better than a big shared one. You can even wrap the API with an interface that only exposes the methods you need.
Why even mock anything in this example? You need to read the source code to work out what to mock, reaching deep inside the code to name some method to mock.
But what if you just passed in the contents of the file or something?
Edit: oh wait actually this is what the very last line in the blog post says! But I think it should be emphasized more!
The main reason why your mock breaks later is because you refactored the code. You did the one thing tests are supposed to help you do, and the tests broke. If code was never modified, you wouldn't need automated tests. You'd just test it manually one time and never touch it again. The whole point of tests is you probably will rewrite internals later as you add new features, improve performance or just figure out better ways to write things. Mock-heavy tests are completely pointless in this respect. You end up rewriting the code and the test every time you touch it.
There are really only a few reasons to use mocks at all. Like avoiding network services, nondeterminism, or performance reasons. If you need to do a lot of mocking in your tests this is a red flag and a sign that you could write your code differently. In this case you could just make the config file location an optional argument and set up one in a temp location in the tests. No mocking required and you're testing the real API of the config file module.
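That suggestion in sketch form (load_config and the default path are illustrative, not from the article):

    import json
    import os

    DEFAULT_CONFIG = os.path.expanduser("~/.myapp/config.json")

    def load_config(path=DEFAULT_CONFIG):
        # Real API of the config module; no knowledge of tests baked in.
        with open(path) as f:
            return json.load(f)

    def test_load_config(tmp_path):
        cfg_file = tmp_path / "config.json"
        cfg_file.write_text('{"theme": "dark"}')
        # No mocking: just point the real code at a temp location.
        assert load_config(cfg_file)["theme"] == "dark"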
The mock discussion still misses the real solution, which is to refactor the code so that you have a function that simple reads the file and returns json that is essentially a wrapper around open and doesn't need to be tested.
Then have your main function take in that json as a parameter (or class wrapping that json).
Then your code becomes the ideal code. Stateless and with no interaction with the outside world. Then it's trivial to test just like and other function that is simple inputs translated outputs (ie pure).
Every time you see the need for a mock, you're first thought should be "how can I take the 90% or 95% of this function that is pure and pull it out, and separate the impure portion (side effects and/or stateful) that now has almost no logic or complexity left in it and push it to the boundary of my codebase?"
Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
Funnily enough, I am preparing a simple presentation at work to speak about exactly that. The idea of separating "logic" from I/O and side effects is an old one and can be found in many architectures (like hexagonal architecture). There is plenty of benefit doing this, but testing is a big one.
It should be obvious, but this is not something that seem to be thought in school or in most workplaces, and when it is, it's often through the lens of functional programming, which most just treat as a curiosity and not a practical thing to use at work. So I started to teach this simple design principle to all my junior dev because this is something that is actually quite easy to implement, does not need a complete shift of architecture/big refactor when working on existing code, and is actually practical and useful.
> Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
In practice the issues I see with this are that the "side effect" part is usually either: extensive enough to still justify mocking around testing it, and also intertwined enough with your logic to be hard to remove all the "pure" logic. I rarely see 90-95% of functions being pure logic vs side effects.
E.g. for the first, you could have an action that requires several sequenced side effects and then your "wrapper over APIs" still needs validation of calling the right APIs in the right oder with the right params, for various scenarios. Enter mocks or fakes. (And sometimes people will get clever and say use pubsub or events for this, but... you're usually just making the full-system-level testing there harder, as well as introducing less determinism around your consistency.)
For the second, something like "do steps I and J. If the API you call in step J fails, unwind the change in I." Now you've got some logic back in there. And it's not uncommon for the branching to get more complex. Were you building everything in the system from first principles, you could try to architect something where I and J can be combined or consolidated in a way to work around this; when I and J are third party dependencies, that gets harder.
You're right that it's not always easy to separate pure from effectful code. But you definitely can (model it as a state machine) and I think it's worth it, especially if those steps can take a long time, can fail, might be executed in parallel, etc.
For instance, I once worked on payment-related code at a large online retailer. The steps I and J from your example would have been calls to the payment gateway's API (payment initiation, actual payment request). There was also a step K (polling for payment confirmation) and even a step K' (a payment confirmation callback the gateway might or might not call before or after we get around polling for the payment status ourselves). And often there was even user interaction in between (the 3DS/3DS2 credit card payment scheme that's common here in the EU). Every single one of those steps could fail for a myriad of reasons, e.g. time out, be rejected, … and we had to make sure we always failed gracefully and, most importantly, didn't mess up our payment or order records.
Of course this was an old enterprise Java code base, created by people who had long left the company, and all this had been written just the way you imagine it. It was an absolute mess.
Every single time I worked on this code base I secretly wished one of the original authors had heard of state machines, pure vs. effectful code, and unit tests.
I've had great success swapping in a in-memory database via dependency injection and just running 100% of the application, end to end.
in the ideal case my tests start by writing some randomised data using the external API, I then update it(if applicable) using the external API, and finally read it, also using the external API, and compare the actual result with what I expected.
I use randomised data to avoid collisions with other tests, which might cause flakiness and/or prevent running the tests concurrently. I avoid having seed data in the database if at all possible.
It's the only approach I've found that can survive a major refactor of the codebase. Anything short of breaking the external API, which is typically a no-no anyway, shouldn't break these tests.
Doing a refactor and being able to rely on the test suite for finding bugs and inconsistencies is amazing. Of course they won't find 100% of all bugs,but this way at least you know that a failing test means there's a problem in your production code.
Thou shalt never mock the database, for thou shalt anger the database when thou moketh it.
In all seriousness, I have found this to be a useful suggestion, because the purpose of a test is to make sure invariants don't break in real code. When you mock the database, you're excluding large amounts of real code from test.
The risk to that approach is that you end up writing code that cannot deal with the real world problems of I/O, such as timeouts, failed reads, jitter, and other weird behaviour.
Separating I/O from logic makes a lot of sense and makes tests much easier to write and code much easier to reason about, but you'll still need to implement some sort of mocking interface if you want to catch I/O problems.
> how can I take the 90% or 95% of this function that is pure and pull it out, and separate the impure portion (side effects and/or stateful) that now has almost no logic or complexity left in it
They addressed this concern already. These are not contradicting approaches.
I think the disconnect is that the impure portion often has substantial amounts of inherent complexity. If you're working at a high level that's handled by a lower level (OS, framework, etc.) but somebody is writing that framework or OS or HAL. And they need to test too, and ultimately someone is going to have to make a test double of the hardware interface unless all testing is going to require hardware-in-the-loop.
I think the important thing is that the code look pure from a testing perspective.
Say you've got a function that accesses a key-value store. Ideally, you can factor out the i/o so that you do all your reads up front and all your writes at the end, leaving a pure function in the middle. But if the code is too tangled up in its side effects for that, the next best thing is to create a fake KV store and then wrap the function like this:
doThing isn't a pure function, but doTest is. Now you can write tests like this: I guess you could call that "imperative core, functional shell," lol.I agree with you, however convincing an entire team of devs to explicitly separate the interface of impure parts of code is very difficult.
If you introduce a mocking library to the test portion of the codebase, most developers will start to use it as a way to shortcut any refactoring they don't want to do. I think articles like this that try to explain how to better use mocks in tests are useful, although I wish they weren't necessary.
This sounds great until one of your other functions calls that function.
You're just describing dependency injection, but if you say that, people won't want to listen cause doing that all the time sucks.
Dependency injection frameworks can be a pain, but basic dependency injection (e.g. pass a DB handle to anything that needs the DB) is a must. What's the alternative? Having everyone independently create their own DB connections?
It's my preference to a point. I think you can over-do it. I've worked in systems that didn't use it and worked just fine. I would bet most folks err on the side of too little, unless they have some kind of framework doing the heavy lifting.
"sans-I/O" is one term for that style. I like it a lot but it's not a free lunch.
I always liked the phrase 'Hoist your I/O' [1] but yes, you can only hoist it up so many times until its outside of your application completely (making it completely pure, and now someone else's responsibility).
[1] https://www.youtube.com/watch?v=PBQN62oUnN8
Or "functional core, imperative shell".
Your just moving the problem one layer up.
You still need to write a test for how it all comes together and you should write tests for your error handling. You need a mock to respond with an error.
Not necessarily. If your error handling looks like this:
then you probably don't need to test it. All you're doing is bubbling up the error handling from other libraries' functions.now you have N+1 tests!
This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
In the example the author walks through, a cleaner way would be to have the second function take the Options as a parameter and decouple those two functions. You can then test both in isolation.
> If you are mocking anything in the standard library your code is probably structured poorly.
I like Hynek Schlawak's 'Don’t Mock What You Don’t Own' [1] phrasing, and while I'm not a fan of adding too many layers of abstraction to an application that hasn't proved that it needs them, the one structure I find consistently useful is to add a very thin layer over parts that do I/O, converting to/from types that you own to whatever is needed for the actual thing.
These layers should be boring and narrow (for example, never mock past validation you depend upon), doing as little conversion as possible. You can also rephrase the general purpose open()-type usage into application/purpose-specific usages of that.
Then you can either unittest.mock.patch these or provide alternate stub implementations for tests in a different way, with this this approach also translating easily to other languages that don't have the (double-edged sword) flexibility of Python's own unittest.mock.
[1] https://hynek.me/articles/what-to-mock-in-5-mins/
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.
Valgrind is a mock of standard library/OS functions and I think its existence is a good thing. Simulating OOM is also only possible by mocking stuff like open.
> Valgrind is a mock of standard library/OS functions and I think its existence is a good thing.
That is mostly wrong.
Valgrind wraps syscalls. For the most part it just checks the arguments and records any reads or writes to memory. For a small number of syscalls it replaces the syscall rather than wrapping it (for instance calls like getcontext where it needs to get the context from the VEX synthetic CPU rather than the real CPU).
Depending on the tool it can also wrap or replace libc and libpthread functions. memcheck will replace all allocation functions. DRD and Helgrind wrap all pthread functions.
What do you think that is? Valgrind tracks allocations by providing other implementations for malloc/free/... .
Are you trying to explain to me how Valgrind works? If you do know more than me then please join us and become a Valgrind developer.
Mostly it wraps system calls and library calls. Wrapping means that it does some checking or recording before and maybe after the call. Very occasionally it needs to modify the arguments to the call. The rest of the time it passes the arguments on to the kernel or libc/libpthread/C++ lib.
There are also functions and syscalls that it needs to replace. That needs to be a fully functional replacement, not just looking the same as in mocking.
I don’t have any exact figures. The number of syscalls varies quite a lot by platform and on most platforms there are many obsolete syscalls that are not implemented. At a rough guess, I’d say there are something like 300 syscalls and 100 lib calls that are handled of which 3/4 are wrapped and 1/4 are replaced.
> Are you trying to explain to me how Valgrind works?
Sorry that wasn't my intention. You are a Valgrind developer? Thanks, it's a good project.
It seems like I have a different understanding of mocking than other people in the thread and it shows. My understanding was, that Valgrind provides function replacements via dynamic linking, that then call into the real libc. I would call that mocking, but YMMV.
All rules exist to be broken in the right circumstances. But in 99.9% of test code, there's no reason to do any of that.
I think when testing code with an open call, it is a good idea to test what happens on different return values of open. If that is not what you intent to test for this method, then that method shouldn't contain open at all, as already pointed out by other comments.
That depends on what your error recovery plan is.
If the code's running in a space shuttle, you probably want to test that path.
If it's bootstrapping a replicated service, it's likely desirable to crash early if a config file couldn't be opened.
If it's plausible that the file in question is missing, you can absolutely test that code path, without mocking open.
If you want to explicitly handle different reasons for why opening a file failed differently, by all means, stress all of that in your tests. But if all you have is a happy path and an unhappy path, where your code doesn't care why opening a file failed, all you need to test is the case where the file is present, and one where it is not.
Modifying the file system to be would be kind of like mocking to me. I very much, don't want my daemons or user-facing applications to just crash, when a file is missing. That's kind-of the worst thing you can do.
> Modifying the file system to be would be kind of like mocking to me.
Modifying the file system's implementation would be. Including a valid_testdata.txt and an invalid_testdata.txt file in your test's directory, however, is not 'modifying the file system', any more than declaring a test input variable is 'mocking memory access'.
> don't want my daemons or user-facing applications to just crash, when a file is missing
If the file is important, it's the best kind of thing you can do when implementing a non-user-facing service. The last thing you want to do is to silently and incorrectly serve traffic because you are missing configuration.
You want to crash quickly and let whatever monitoring system you have in place escalate the problem in an application-agnostic manner.
Details matters, but good test doubles here are important. You want to capture all calls to IO and do something different. You don't want tests to break because someone has a different filesystem, didn't set their home directory as you want it setup, or worse is trying to run two different tests at the same time and the other test is changing files the other wants.
Note that I said test doubles. Mocks are a bit over specific - they are about verifying functions are called at the right time with the right arguments, but the easy ability to set return values makes it easy to abuse them for other things (this abuse is good, but it is still abuse of the intent).
In this case you want a fake: a smart service that when you are in a test setups a temporary directory tree that contains all the files you need in the state that particular test needs, and destroys that when the test is done (with an optional mode to keep it - useful if a test fails to see debug). Depending on your situation you may need something for network services, time, or other such things. Note that in most cases a filesystem itself is more than fast enough to use in tests, but you need isolation from other tests. There are a number of ways to create this fake, it could override open, or it could just be a GetMyProgramDir function that you override are two that I can think of.
Your tests are either hermetic, or they're flaky.
That means the test environment needs to be defined and versioned with the code.
Even in the case you mention you really shouldn't be overriding these methods. Your load settings method should take the path of the settings file as an argument, and then your test can set up all the fake files you want with something like python's tempfile package
There are a number of different ways to solve this problem. I too use the path of settings in my code, but I'm not against overriding open and all the other file io functions. Of course this article is about python which has different abilities than other languages, what is best in python is not what is best in other languages, and I'm trying to stay at a higher level that a particular language.
> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do.
It does. And this is exactly the problem, here!
> TFA: The thing we want to avoid is opening a real file
No! No, no, no! You do not 'want to avoid opening a real file' in a test.
It's completely fine to open a real file in a test! If your code depends on reading input files, then your test should include real input files in it! There's no reason to mock any of this. All of this stuff is easy to set up in any unit test library worth it's salt.
> then your test should include real input files in it! There's no reason to mock any of this.
That's okay for testing some branches of your code. But not all. I don't want to have to actually crash my hard drive to test that I am properly handling hard drive crashes. Mocking[1] is the easiest way to do that.
[1] For some definition of mock. There is absolutely no agreement found in this space as to what the terms used mean.
> In Why your mock doesn’t work I explained this rule of mocking:
> Mock where the object is used, not where it’s defined.
For anyone looking for generic advice, this is a quirk of python due to how imports work in that language (details in the linked post) and shouldn't be considered universal.
Honestly I don't buy it. Worse, this is one of the reason I prefer to do "minimal integration tests" instead of unit tests. Take the example snippet of code
and the very first comment just below>>> The thing we want to avoid is opening a real file
and then the article goes and goes around patching stdlib stuff etc.
But instead I would suggest the real way to test it is to actually create the damn file, fill it with the "normal" (fixed) content and then run the damn test.
This is because after years of battling against mocks of various sort I find that creating the "real" resource is actually less finicky than monkeypatching stuff around.
Apart from that; yeah, sure the code should be refactored and the paths / resources moved out of the "pure logical" steps, but 1) this is an example and 2) this is the reality of most of the actual code, just 10x more complex and 100x more costly to refactor.
That works fine for files, but what if the integration is with a third party service for example?
You can create an actual mock networked service but it's much more work.
I think this is an example explaining what seems like a good practice for using mocks in python to me, the actual code in the post is barely "supporting cast".
If it's HTTP you can create the fixtures and serve them with a mock server. I'm a frontend dev, so backend APIs are like 3rd parties to me.
I use a browser extension for scraping actual backend responses, which downloads them with a filename convention the mock server understands. I mostly use it for development, but also for setting up screenshot tests. For example,
screenshot the app and pixel diff it screenshot…> I use a browser extension for scraping actual backend responses
Can you tell the name of the extension ?
https://github.com/ericfortis/download-http-requests-browser...
You can automate it with the right libraries, such as https://github.com/kevin1024/vcrpy
This one runs the real request and saves the response, faking it later by returning what it saved instead of making the request again.
If you make the function pure it will be easier to test. Pass the moving parts as function parameters, then you can pass in the mocks in the actual functions when testing. Example:
refactor for easier testing in your test you can now mock a and b> An overly aggressive mock can work fine, but then break much later. Why?
Because you are testing against implementation, not specification.
You’re welcome.
If you’re doing TDD, you could just view this as moving the “open” call to your unit test. As others point out, that encourages pure functions that can pipe in input from other sources than just file paths.
Great article. In addition, updating your mocking code can often be time-consuming. To try to make this easier, I built mock[1], which streamlines the process of setting up mock services for testing.
https://dhuan.github.io/mock/latest/examples.html
Arguably this is a problem in when the patch is unapplied.
Presumably in the coverage case it’s being called by a trace function, which inevitably runs during test execution — and while we want the trace function to be called during the test function, we really want it without any patches the test function is using. But this arguably requires both an ability for the trace function to opt-out of patches and for the patcher to provide a way to temporarily disable all of them.
I feel like the #1 reason mocks break looks nothing like this and instead looks like: you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks. Which highlights how awful mocking as a concept is; it is of truly limited usefulness for anything but the most brittle of tests.
Don't test the wrong things; if you care about some precondition, that should be an input. If you need to measure a side effect, that should be an output. Don't tweak global state to do your testing.
Most of the real world is about manipulating the real world. For algorithms it is fine to say depend on the pure inputs/outputs. However what we care about is that global state is manipulated correctly and so the integration tests that verify that are what are important. In most cases your algorithm shouldn't be unit tested separately since it is only used in one place and changes when the users change: there is no point in extra tests. If the algorithm is used in many places comprehensive unit tests are important, but they get in the way when the algorithm is used only once and so the tests just inhibit changes to the algorithm as requirements change (you have to change the user, the integration tests, and the unit tests that are redundant).
As such I disagree. Global state is what you should be testing - but you need to be smart about it. How you setup and verify global state matters. Don't confuse global state above with global state of variables, I mean the external state of the program before and after, which means network, file, time, and other IO things.
IO and global state is also just inputs that can be part of arrange-act-assert. Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
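For example, with an in-memory SQLite database (the table and query are made up for illustration):

    import sqlite3

    def find_names(conn, min_id):
        # Production code runs a real query against whatever connection it is given.
        return [row[0] for row in
                conn.execute("SELECT name FROM items WHERE id >= ?", (min_id,))]

    def test_find_names():
        # Arrange: a real (in-memory) database with a real "foo" row in it.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO items VALUES (1, 'foo'), (2, 'bar')")
        # Act/assert: the real query runs; nothing about the driver is mocked.
        assert find_names(conn, 2) == ["bar"]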
Again I've heard "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time", in which case you ought to take a moment to write down what you're trying to accomplish, rather than using mocks to pave over poor architectural decisions. Eventually, the query fails and the mock succeeds, because they were completely unrelated.
So far I've only seen mocks fail eventually and mysteriously. With setups and DI you can treat things mostly as a black box from a testing point of view, but when mocks are involved you need surgical precision to hit the right target at the right time.
> global state
In my experience global state is the testing bug farm. Tests that depend on global state are usually taking dependencies they aren’t even aware of. Test initializations grow into complex “poke this, prod that, set this value to some magic number” setups that attempt to tame the global state but as global state grows, this becomes more and more difficult and inconsistent. Inter-test dependencies sneak in, parallelism becomes impossible, engineers start turning off “flaky” tests because they’ve spent hours or days trying to reproduce failures only to eventually give up.
This sort of development is attractive when starting up a project because it’s straightforward and the testing “bang for the buck” is high. But it degrades quickly as the system becomes complex.
> Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
Consider not sprinkling “select” statements throughout your code instead. This tight coupling makes good testing much more difficult (requiring the “set all the global state” model of testing) but is also just generally not good code structure. The use of SQL is an implementation detail that most of your code shouldn’t need to know about.
A thin layer around the DB interactions gives you a smaller set of code that needs to be tested with state, gives you a scoped surface area for any necessary mocking, makes it much easier if you need to change storage systems, and also gives you a place where you can reason over all the possible DB interactions. This is just good separation of concerns.
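Roughly, as a sketch (OrderStore and the schema are made-up names):

    class OrderStore:
        """Thin layer: the only place in the codebase that knows SQL exists."""
        def __init__(self, conn):
            self._conn = conn

        def unpaid_orders(self):
            rows = self._conn.execute("SELECT id, total FROM orders WHERE paid = 0")
            return [{"id": r[0], "total": r[1]} for r in rows]

    # Business logic depends on the store, not on SQL or a particular driver.
    def total_outstanding(store):
        return sum(o["total"] for o in store.unpaid_orders())

    # In tests a tiny fake is enough -- no query strings to match on.
    class FakeOrderStore:
        def __init__(self, orders):
            self._orders = orders
        def unpaid_orders(self):
            return self._orders

    def test_total_outstanding():
        assert total_outstanding(FakeOrderStore([{"id": 1, "total": 5}])) == 5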
You didn't use global state in the same sense that I did. I mean global as in state of the entire world, not just the computer you are on.
I tamed our inter-test dependencies by doing things like starting dbus on a different port for each test - now I can test with real dbus in the loop and my tests are fast and isolated. We have a strict rule about which directories we are allowed to write to (embedded system - the others are read-only in production), so it is easy to point those at a temp dir. It was some work to set that up, but it tames most of your issues with global state and allows me to verify what really counts: the system works.
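The temp-dir part of that is cheap to replicate in most test frameworks; a pytest sketch (DATA_DIR is a hypothetical env var the application reads to decide where it may write):

    import pytest

    @pytest.fixture(autouse=True)
    def isolated_data_dir(tmp_path, monkeypatch):
        # Each test gets its own unique writable directory, so tests can't see
        # each other's files and can safely run in parallel.
        monkeypatch.setenv("DATA_DIR", str(tmp_path))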
For a CRUD web app your database separation of concerns makes sense. However in my domain we have lots of little data stores and nobody else needs access to that store. As such we put it on each team to develop the separation that makes sense for them - I don't agree with all their decisions, but they get to deal with it.
Sure. I wasn’t responding to your statements but I can understand what you mean. If your code manipulates the physical world (and not in a “sets this bit on a disk” sort of way) then you have to validate that action. Whether that validation should be decoupled from the business logic that drives that piece really depends on the complexity of the system. At some point you are going to do full integration validation, so it’s a question of balance: how much is tested that way vs more isolated.
Tests that work and verify the system works are the Pri0 requirement. Most of the conversations about how best to test are structured for the benefit of people who are struggling with meeting the Pri0 because of maintainability. With enough effort any strategy can work.
> However in my domain we have lots of little data stores and nobody else needs access to that store.
If the little data stores are isolated to small individual areas of code then you probably already have the necessary isolation. Introducing the lightweight data store isolation layer might be useless (or not, context dependent). Now if these individual areas are doing things like handing off result sets to other code then I would have something different to say.
I don't understand the "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time". Are they changing your database schema at runtime (not just at software update time, when you can do migrations)? Otherwise you should have a test helper that creates your entire database schema, so that when it changes you can update the schema in one place and all tests are re-run against the new one. (You should also have helpers that insert common test data and are easy to update if the new schema affects them.)
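Something like this (SQLite-flavored sketch; the schema is obviously made up):

    import sqlite3

    SCHEMA = """
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """

    def make_test_db():
        # The one place to update when the schema changes; every test re-runs
        # against the new schema automatically.
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        return conn

    def insert_common_test_data(conn):
        conn.execute("INSERT INTO users VALUES (1, 'alice')")
        conn.execute("INSERT INTO orders VALUES (1, 1, 9.99)")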
I haven't seen mocks fail mysteriously. I've seen them fail often though because requirements change and instead of updating the callers (generally a small number) you end up with 200 tests failing and give up because updating all the tests is too hard. Mocks are always about implementation details - sometimes you have no choice, but the more you can test actual behavior the better.
> you change the internal behaviors of a function/method and now the mocks interact differently with the underlying code, forcing you to change the mocks
Rarely should a mock be “interacting with the underlying code”, because it should be a dead end that returns canned data and makes no other calls.
If your mock is calling back into other code you’ve probably not got a mock but some other kind of “test double”. Maybe a “fake” in Martin Fowler’s terminology.
If you have test doubles that are involved in a bunch of calls back and forth between different pieces of code then there’s a good chance you have poorly factored code and your doubles are complex because of that.
Now, I won’t pretend changes don’t regularly break test doubles, but for mocks it’s usually method changes or additions and the fix is mechanical (though annoying). If your mocks are duplicating a bunch of logic, though, then something else is going on.
To fully show my hand: What I do not like are mocks which have you outline a series of behaviors/method calls which are expected to be called, typically with specific input parameters, often requiring the methods to be called in a specific order. In Python, this looks like setting up Mocks and then using `assert_called` methods [0]. In Go, this means using Gomock-generated structs which have an EXPECT [1].
A test should not fail when the outputs do not change. In pursuit of this ideal I often end up with fakes (to use Martin Fowler's terms) of varying levels of complexity, but not "mocks" as many folks refer to them.
[0] - https://docs.python.org/3/library/unittest.mock.html#unittes...
[1] - https://pkg.go.dev/github.com/golang/mock/gomock
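To make the difference concrete, a sketch with unittest.mock (notify and the mailer API are made up):

    from unittest.mock import Mock

    # Hypothetical function under test.
    def notify(mailer, address):
        mailer.send(address, subject="Welcome")

    # Interaction-style test: pinned to the exact call the implementation makes.
    def test_notify_interaction_style():
        mailer = Mock()
        notify(mailer, "alice@example.com")
        mailer.send.assert_called_once_with("alice@example.com", subject="Welcome")

    # State-style test with a fake: only the observable outcome is checked, so an
    # internal refactor that still sends the mail keeps the test green.
    class FakeMailer:
        def __init__(self):
            self.sent = []
        def send(self, to, subject):
            self.sent.append((to, subject))

    def test_notify_state_style():
        mailer = FakeMailer()
        notify(mailer, "alice@example.com")
        assert ("alice@example.com", "Welcome") in mailer.sent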
Yeah, I think true mocks often lead to fragile, hard-to-read tests. A test which expects a bunch of specific interactions with a mock is white-box, whereas a test which checks the state of a fake after the operation is complete is mostly black-box. Then, if your dependency gets updated, only the fake breaks, rather than all of your test expectations.
A particularly complex fake can even be unit-tested, if need be. Of course, if you're writing huge fakes, there's probably something wrong with your architecture, but I feel like good testing practices should give you options even when you're working with poorly architected code.
So this I agree with. Mocks are often used to over-constrain and end up validating that the implementation does not change rather than does not regress.
There are some specific cases, such as validating that caching is working as expected, where it can make sense to fully validate every call. Most of the time, though, this is a pointless exercise that serves mostly to make it difficult to maintain tests.
It can sometimes also be useful as part of writing new code, because it can help validate that your mental model for the code is correct. But it’s a nightmare for maintenance and committing over-constrained tests just creates future burden.
In Fowler terminology I think I tend to use Stubs rather than Mocks for most cases.
A lot of times I see people (read: me) being lazy and making a mock that does not have anywhere near the behavior of the original. It's more like a very partial stub. I will mock an API that has 20 possible calls with just the 2 that I use. Unsurprisingly, this mock is closely tied to the current implementation.
I think making narrow use-case-specific test doubles is often better than a big shared one. You can even wrap the API with an interface that only exposes the methods you need.
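A sketch of that wrapping idea (the third-party client and its get_user method are hypothetical):

    class UserLookup:
        """Narrow wrapper exposing only what this use case needs."""
        def __init__(self, client):          # client: the big third-party API object
            self._client = client

        def email_for(self, user_id):
            return self._client.get_user(user_id)["email"]

    # A use-case-specific fake only has to honour that one method.
    class FakeUserLookup:
        def __init__(self, emails):
            self._emails = emails
        def email_for(self, user_id):
            return self._emails[user_id]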
Why even mock anything in this example? You need to read the source code to work out what to mock, reaching deep inside the code to name some method to mock.
But what if you just passed in the contents of the file or something?
Edit: oh wait actually this is what the very last line in the blog post says! But I think it should be emphasized more!
The main reason why your mock breaks later is because you refactored the code. You did the one thing tests are supposed to help you do, and the tests broke. If code was never modified, you wouldn't need automated tests. You'd just test it manually one time and never touch it again. The whole point of tests is you probably will rewrite internals later as you add new features, improve performance or just figure out better ways to write things. Mock-heavy tests are completely pointless in this respect. You end up rewriting the code and the test every time you touch it.
There are really only a few reasons to use mocks at all. Like avoiding network services, nondeterminism, or performance reasons. If you need to do a lot of mocking in your tests this is a red flag and a sign that you could write your code differently. In this case you could just make the config file location an optional argument and set up one in a temp location in the tests. No mocking required and you're testing the real API of the config file module.
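A sketch of that, using pytest's tmp_path fixture (the default path is a made-up placeholder):

    import json

    DEFAULT_CONFIG_PATH = "/etc/myapp/config.json"   # hypothetical default

    def load_config(path=DEFAULT_CONFIG_PATH):
        with open(path) as f:
            return json.load(f)

    def test_load_config(tmp_path):
        # No mocking: point the loader at a real file in a temp location.
        cfg_file = tmp_path / "config.json"
        cfg_file.write_text('{"retries": 3}')
        assert load_config(cfg_file) == {"retries": 3}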
It is worth pointing out that you can often use containerized services as an alternative to mocking.
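For instance, with testcontainers-python (sketch only; assumes Docker is available locally):

    import sqlalchemy
    from testcontainers.postgres import PostgresContainer

    def test_against_real_postgres():
        # Spins up a throwaway Postgres container for the duration of the test.
        with PostgresContainer("postgres:16") as pg:
            engine = sqlalchemy.create_engine(pg.get_connection_url())
            with engine.connect() as conn:
                assert conn.execute(sqlalchemy.text("SELECT 1")).scalar() == 1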