I know that Martin Fowler has written about this extensively, but I wanted to write a concise version, since this point seems to bear reiterating from time to time.
Mocks aren’t stubs.
Specifically, I recommend against setting an expectation on a method (“mocking” it) when you really want to simulate a specific response from that method for the current testing (“stubbing” it). That said, I must confess than I used mocks in place of stubs early in my test-driven development practice, and I believe that all programmers who try the so-called “mockist” style of design eventually use a mock where a stub would work better. If you’ve done this, or are doing this now, then don’t feel shame. Instead, consider changing your practice.
When one uses a mock in place of a stub, one tends to over-specify the collaboration between a Client and its Suppliers, and that leads to brittle tests. These tests tend to make hyperactive assertions, meaning assertions that fail even when the underlying production code works as expected. Such hyperactive assertions play the role of the boy who cried “wolf!”, distracting us with needless concern. Even when these superfluous mocks don’t over-specify a collaboration, they often duplicate specifications in other tests. Even when they don’t do that, they tend to encourage us to make two assertions (method expectation and checking the ModelAndView) or check irrelevant details multiple times, such as:
Writing one such test doesn’t seem to matter much, but when the method in question returns a collection, and you write the “zero, one, many, lots, oops” tests, you might mock that method exactly the same way five times. How many times do you need to check the same thing?
Now please go read, or re-read, Martin’s excellent article.
Tweet 1 year agoI have long preferred to start writing tests by writing the assertion. I use this technique in my classes, and I teach it as a Novice technique, but I have found the technique useful long past the Novice stage. I use a handful of guidelines to help me keep my tests clean.
I combine these specific guidelines for writing tests with the general guidelines I use to guide my design: the four elements of simple design. I find these guidelines help me write tests with a variety of valuable properties —
I recommend you try applying this technique diligently in your work for at least two weeks, just to see how it benefits you. Please comment with any observations you’ve made about the results, including how you feel about writing tests, differences in the design of your tests, and especially differences in the design of your production code.
Tweet 1 year agoI strongly recommend that especially Novices only refactor when all tests pass. I can offer a straightforward explanation for this strong recommendation.
There are many, many more code bases that fail your tests than pass them.
For the mathematically-minded among you, I strongly believe that the space of “failing code bases” is much larger than the space of “passing code bases”, relative to your tests. I felt like the difference corresponded well to the difference between uncountably infinite and countably infinite, but as one of my commenters points out, the universe of code bases is countable, so that’s not it. I’d like to say “clearly” and move on, but I’d find that intellectually dishonest. I’ll think about how to support this hypothesis, but in the meantime, please suspend any disbelief you have about it.
The result of any change to a failing code base falls into one of these categories:
Now consider that result of any change to a passing code base falls into one of these categories:
First, notice the difference in the variety of results: four cases for changing a failing code base, and two for changing a passing code base. Next, notice which of these results you can detect just by looking at the red/green bar. In the case of the failing code base, you’ll only notice 1 of the 4 results. In the case of the passing code base, you’ll notice both results. This last point wins the day for me.
If you prefer pictures, then these illustrate the difference.
When I try to working in a failing code base, it looks like this.
When I work in a passing code base, it looks like this.
Especially when I refactor, I prefer to have as few surprises as I can, and the passing code base seems to offer fewer surprises. Therefore, I remind myself to refactor only when all tests pass, and I recommend you do the same.
Tweet 1 year agoI have noticed a recent swell of comments about the pain of dependency injection, and if this has caused you problems on your project, then I think I can help you by offering a few simple guidelines and a single goal for injecting dependencies.
I have formulated these as Novice or Advanced Beginner rules, and so I have worded them more strongly than I tend to word my advice.
All these guidelines have their roots in “remove duplication” and “fix bad names”, two of the four elements of simple design.
Some people inject dependencies in order to stub, mock or spy on a method for testing. I used to use that as a primary motivation, but over the years that motivation has evolved into something more widely useful. I inject dependencies for one reason: to move the things that change in the direction of the client and the things that don’t in the direction of the supplier. Or, if you prefer, “abstractions in code, details in data”. Or, if you prefer, to avoid abstractions depending on details. Any of those will do.
No problem. I’ll write more articles in this space showing examples of each of the guidelines, and you’ll have plenty of opportunity to share your opinions about them.
Tweet 1 year agoI have found only one truly universally useful definition for “it works”:
It passes all the design tests we’ve thought of so far.
Pithy, unambiguous, and not to be confused with “it’s done”:
It passes all the examples (business tests) we’ve thought of so far.
or with “it’s ready”:
It passes enough examples to make a significant stakeholder happy.
Notice that “it works”, “it’s done” and “it’s ready” might or might not overlap with each other.
Tweet 1 year agoIn the course of leading the Simple Approach to Modular Design workshop at YOW! Melbourne, I thought of a small number of TDD exercises that I encourage you to try. If you’d like to share your results, please fork the Gist below and show us where you reached.
This is part of the implementation of a first story in a point of sale application. The first story involves selling a single item by handling a barcode event and displaying the product’s price on a display, which I model as a wrapper for a text property. Working straightforwardly by implementing everything relatively simply, I end up at this point:
I argue that this implements the entire story, at least enough to satisfy customer-me.
From this point, I offer a few simple exercises. If you want to share your results, then please take frequent snapshots or add a comment to your work telling us what you were thinking and how you approached the problem.
emptyBarcode() by passing null in for pricesByBarcode and the test still passes. What does that tell you about the design and how do you propose to improve it?pricesByBarcode in each test. Start removing duplication. Keep removing duplication. Where did you stop?If enough people participate, providing their solutions, then I’ll provide my solutions soon thereafter.
Tweet 1 year agoConsider the following code:
Someone, somewhere, on a mailing list, and I apologise for forgetting who, posed the question, “What should I test here?” Here is one solution that someone offered:
It’s not bad, but I don’t like duplicating the irrelevant detail of where the current_user comes from. I would rather avoid that extra step by checking the user more directly:
Checking this results in the following design change:
I like this more, but I notice that we don’t check the special case of user.blank anywhere, so I add something to check that.
Of course, I don’t know the circumstances under which current_user could be blank, and frankly the mere possibility makes me nervous. I’d rather fix current_user so that it couldn’t answer blank, but I don’t know enough about the design to understand the impact of that assumption, so I’ll mark it as a smell and move on.
Now we have the problem that no-one has checked the other roles a user could have, so I add a check for that.
This check implements a kind of double-entry book-keeping for the list of acceptable roles in the production code. It makes me want to parameterize the list of acceptable roles in the production code, so I try this:
Now that I look at that, I highly doubt that a blank user would have any of those roles, so I look at the code for current_user to figure out how it gives me a blank user, then throw 1000 random roles at it to see whether it has any of them. It doesn’t, so I’ll spare you the code. Checking this assumption allows me to simplify the code, but complicates the checks.
I really dislike this revised check, because I don’t know whether to stub the blank? message or the role? message. I’d rather not know the difference. In fact, since I’m no longer checking for a blank user with a guard clause, I don’t have to check what happens with a blank user. If, somehow, we have a blank user with one of the acceptable roles, then something else is definitely broken, and I have already checked that. This means that I can delete the check for rejecting a blank user, leaving only the check for the specific roles.
Now I see that I’ve duplicated the acceptable roles in both the production code and the tests, and my experience has taught me to remove this duplication by moving the copy from the production code towards the copy in the tests. This involves passing the acceptable roles in to manage_samples_link_for as a parameter. This results in:
To avoid migrating clients too soon, I make the list of acceptable roles an optional parameter.
Once I see that pass, I push the optional parameter out towards the client.
And then once I see that pass, I remove the default value for acceptable_roles.
Now the list of acceptable roles in the test accidentally matches the list of acceptable role in manage_samples_link, which means that I should probably change the test, otherwise someone might believe that those values matter.
I could check that manage_samples_link_for invokes :role? on the user for each role until it finds the first acceptable role, but I don’t think I need to check specifically that manage_samples_link_for short-circuits its evaluation or not, so I stop here with manage_samples_link. Now I suppose I would benefit from checking that manage_samples_link uses manage_samples_link_for correctly.
Now I feel like I’ve separated the application-specific concerns from the more generic behavior. I suppose another step would include making the link_to itself a parameter. Since I don’t have enough evidence that that would help anything, I don’t do it yet. I do, however, notice that I could improve the production code a little.
I feel good about the results, so I stop here. Here are the checks I wrote:
I suppose that, if I felt the slightest bit concerned, I’d add this check:
Of course, this seems like designing a :roles? method on user, and since I don’t know enough about the user library here to make that decision, I remind myself to look into that later. In the meantime, I really ought to stop and commit these changes.
I received this question by email, which asked about how to write contract tests for an interface whose implementers themselves have complex collaborators.
When you talked about contract test of the
Listinterface it was pretty clear. Just calladd()then check size of list and fact that it contains new item. However (“in reality”) object under test will often have its own collaborators. And here’s my confusion—how do we write contract test when we need to know implementation details? Wouldn’t it be just another collaboration test? Let me give you example.
Be careful: an interface has no collaborators. Classes declare their collaborators in the parameter list of the constructor (“constructor injection”), and interfaces have no constructors, so interfaces do not have collaborators, in general. An interface should describe what to do and with which inputs, but not which tools it needs to do the job (the collaborators).
Implementation details and the contract are disjoint: all behavior is either an implementation detail or part of the contract. No behavior can be both at the same time.
In our project we work with external system through two kind of API’s - Web Services and PL/SQL stored procedures. Over time PL/SQL API being substituted with Web Services so we need to make clients of these API’s independent from implementation. We introduced interface called Gateway (not exactly this name but meaning should be clear). Let’s assume for now that it has only one method
doExternalCall(). Let’s also assume for sake of brevity that we interested only in one aspect of this method’s contract - handling of errors. It should throwItemNotFoundExceptionif there were no actual results returned from external system.
So I have a contract test: http://gist.github.com/627705
There are two implementation of
Gatewayinterface -DBGatewayandWSGateway.DBGatewayusesDAOinterface to call PL/SQL procedure andWSGatewayusesWSProxyinterface to call concrete Web Service. Information returned from each of these interfaces is completely different thus implementation ofDBGatewayandWSGatewaydiffers significantly.
So each implementation’s tests have to know how to create a Gateway that will return no items. I suppose that could be with stubs for DBGateway and WSProxy.
Our goal would be to test implementation of some
BusinessObjectwhich usesGatewayas collaborator.First we will do collaboration test. We will create mock object for
Gatewayinterface and will set expectation to check that its methods were called with correct parameters. Also we would set return values in order to check thatBusinessObjectbehaves correctly whenGatewaythrowsItemNotFoundException.
This sounds correct.
Next we need to write contract test for
Gatewayin order to make sure that it behaves according to expected contract. We can’t just instantiateDBGatewayorWSGateway. We actually need to set stubs for their collaborators (DAOandWSProxy). Moreover, we would need to set return values from these collaborators to be different for different implementation ofGateway. And now it looks like we don’t test contract anymore, instead we test collaboration of concreteGatewayimplementation with its dependency. If we have test forDBGatewaywe can’t use it as a test forWSGateway! It’s not like testingArrayListandLinkedList. It turned out that instead of separate collaboration/contract tests we have just set of collaboration tests for different objects.
Aha! I think I see. Let me try to sketch a solution.
Here is DbGatewayContractTest: http://gist.github.com/627710
WsGatewayContractTest will be very similar.
Conclusion: using a stub doesn’t always mean “collaboration test”. I would like to see a test for Dao, however, that shows when find() returns null. This is a contract test for Dao.