Here is the Testing Pyramid ...
The testing pyramid is a widely used approach to testing in agile projects, yet it is starting to get a bad reputation with testers like Richard Bradshaw, who, according to John Stevenson, proclaimed the pyramid dead at MEWT [1], or James Bach, who states it “has little to do with testing” [2]. So why is that?
To elaborate on this, we have to look a little into the pyramid’s history.
It started out as the test automation pyramid [3] by Mike Cohn, with just three layers: unit, service and UI (see Fig. 1). Mike used it to express that you should have a lot of automated unit test scripts, a smaller number of service test scripts and only a few UI test scripts, because as you go up the pyramid the respective automation scripts become less cost effective. This is in fact a valid point.
Over time the three layers have essentially grown to five by expanding the service layer, a cloud for all manual test sessions has been added on top, and the “automation” in the title has been dropped; a prominent example [4] can be found on Alister Scott’s blog. I will not go into further detail regarding the pyramid’s history since other people have already done this. If you want more information, just watch this 10-minute video by Richard and John [5], which also turns the infamous ice-cream cone from an anti-pattern into a pattern.
Fig. 1: From Test Automation Pyramid to Testing Pyramid
… and here is what’s wrong with it.
The first clue that something might be wrong is right there in the name and visual, hidden in plain sight: the “automation” in the name got lost over time, yet the pyramid is still predominantly about automation, stating that all layers are automated while all manual testing is shoved into this big, cloudy thing on top. This tremendously overstates the value of automation and underrates any manual testing effort by basically saying: “Do very, very much automation … oh, and some other stuff.” This automation-heavy approach to testing might be the target of James Bach’s criticism when you look at his definition of testing and checking [6]. The pyramid is all about (machine) checking.
The strong focus on automation - ironically - might also be one reason why the pyramid is so popular among agile practitioners, since it resonates well with people whose roots lie deep in software development, and I was no exception. Another reason for the pyramid’s popularity is most likely the strong visualisation and the directly actionable advice you can take from it: a pyramid is the perfect shape to say “do lots of the stuff at the bottom and less of the rest as you reach the top”.
A second argument made against the pyramid is that it marks the top layers as less important than the bottom layers. This is why John Stevenson proposes an alternative model: the test execution model, which treats all layers equally and is described here [7] (blog) or, for the more audio-visual among you, here [8] (YouTube). Although I think this is a very good model for learning and adjusting your testing during execution, I also see a difference to the pyramid. The pyramid is targeted at a strategic level and wants to express how you can distribute your testing, while the test execution model to me focuses more on ... well ... test execution. Hence I do not really see them as alternatives to each other, nor as mutually exclusive. The testing pyramid makes no statement about each layer’s importance; the sole reasoning behind it is cost effectiveness.
This brings us to the third main critique, which is for example addressed by Todd Gartner in his talk about Case Studies in Terrible Testing [9]. Note that Todd seems to use Mike Cohn’s pyramid, but with the UI layer renamed to system test, so it is possibly not entirely the same. Furthermore, Todd’s slides and his subsequent interview with Joe Colantonio [10] at least indicate a mild misconception about testing on his side, too. In his third case study he states that they had the perfect pyramid, meaning the right amount of and best technologies for automated checks on each layer, yet the project was a failure because nobody addressed the market risk; a layer for user testing was missing. One might argue that Todd fell prey to the automation-heavy approach the pyramid indicates, and that a good tester, focused on manual testing, might have told them that. Which of course cannot be counted on the pyramid’s pro side either, as I explained above.
Nevertheless the point he makes is very valid: the pyramid does not take risk into account at all and takes its distribution advice solely from a cost analysis of check creation. His best example for this are the websites he creates with little to no algorithmic or functional risks, but mainly user-related ones.
In conclusion, the testing pyramid heavily narrows testing down to automation and has a distribution of layers that is basically defined by money alone, taking neither risks nor anything else into account. So when you look at the evidence, is Richard right, and is it time for the testing pyramid to die?
Another way to look at it
In his talk at Testbash Brighton 2016 John Stevenson encouraged us to take existing models and make them our own, changing and adapting them to better suit our project needs. I ask myself: why not give the pyramid this very courtesy instead of digging its grave? Especially since I think the pyramid’s success did not come out of nowhere: the pyramidical form is a powerful visualisation of your testing approach, easily explained to others and a very good framing device for yourself. Just print it out, hang it on the wall, and you have a very good reminder of how you want to organise your testing efforts right above your desk, in one picture. Definitely beats a 20-page test strategy document. So there are merits.
A lot of the criticism revolves around the concrete way the pyramid looks right now: the specific layers it consists of, the reasoning for this exact build-up. Here is the point where we can adapt the model. I look at the pyramid mainly as one instance of a possible testing approach; when I step back and abstract it, I see the following:
- do lots of this, less of that for a specific reason
- don’t skip anything on here entirely
- everything not on here is not in your focus (at least for now)
If you look at it that way, the initial pyramid from Mike Cohn becomes an instance of this approach:
- do lots of unit tests, fewer service tests, because unit tests are the most cost effective
- do few UI tests, but don’t skip them entirely; they add value to your project
- this pyramid makes no statement about manual testing efforts
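To make the abstraction tangible, here is a minimal sketch in Python of a pyramid as an ordered list of layers, each with a relative share of effort and the reason behind it. The class, the field names and the concrete numbers are purely my own illustration, not part of any of the models discussed:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str    # e.g. "unit", "service", "UI"
    share: int   # relative amount: "do lots of this, less of that"
    reason: str  # the specific reason behind this layer's position

# Mike Cohn's original pyramid expressed as one instance of the
# abstract model (shares are illustrative, bottom layer first).
cohn_pyramid = [
    Layer("unit",    60, "most cost effective to automate"),
    Layer("service", 30, "less cost effective than unit"),
    Layer("UI",      10, "least cost effective, but don't skip it"),
]

def is_pyramid(layers):
    """Check the defining rule: shares must shrink from bottom to top."""
    shares = [layer.share for layer in layers]
    return all(lower > upper for lower, upper in zip(shares, shares[1:]))

print(is_pyramid(cohn_pyramid))  # True
```

Swapping the layer names and reasons - say, beta testing at the bottom because customer acceptance is the dominant risk - produces a different instance of the same abstract model, which is exactly the flexibility argued for below.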
This way of looking at pyramids makes them far more flexible: I do not need to slavishly stick to the layers the initial testing pyramids consist of, and I no longer need to focus entirely on automation, yet the pyramid as a strong visualisation and framing device stays intact. I can even change the reasoning behind the layer distribution: “I do lots of beta testing, less unit testing, because customer acceptance and usage are really what makes or breaks my app.” This way you can build up your pyramid of layers addressing certain project risks if you want.
Take Todd’s fifth case study as an example: he claims the testing pyramid does not help him here, since he faces mostly market and orchestration risks, but nearly no interface risks. I agree with Todd that the original distribution does not help him much, but what did he end up doing? He invested heavily in user and integration tests and has some mild unit testing going on, while skipping system tests completely. I would argue that he still has a pyramid in place, but it is assembled differently:
Fig. 2: A pyramid for Todd's 5th project
Immediately you have a strong visual representation of Todd’s testing efforts and a guideline to this project’s testing approach as shown in Figure 2. You can verbally express it like this:
- do lots of user tests, fewer integration tests, since user acceptance poses more risk than integrating services
- do few unit tests, but don’t skip them entirely, since there are some algorithmically challenging parts
- don’t invest in system tests, since there is no risk here
Note that this pyramid says nothing about automation anymore; how exactly the user or unit tests are composed, and which role automation plays there, is entirely up to the testers on this project. You might even try to use John’s test execution model [7] while executing your tests. Furthermore, the cloud on top is gone, because it always just seemed like an add-on for testing efforts that were simply forgotten in the pyramid itself.
And you can tweak your pyramid even further without losing its benefits with another simple trick: how about color coding the layers to emphasise certain aspects you want to focus on? However, be careful not to overdo it: a five-layered pyramid with each layer in a different color might end up confusing you. I tend to use color to indicate how much automated checking I want to do on a specific layer.
A wonderful and popular real-life example of an adjusted pyramid is the mobile testing pyramid [11] by Daniel Knott: it flips several stages around and uses color coding to emphasise on which stages automation might be put to good use and which are dominated by manual testing - albeit still tool-enhanced, of course.
So instead of clinging to the very specific current instances of the testing pyramid, another way to look at it might be to lift the model to a more generic and abstract level, as depicted in the picture below:
Fig. 3: Template for a project-specific testing pyramid
The degree of automation in Figure 3 is just one example of information you can convey via color coding; you may choose something different. If you approach the testing pyramid like this, you lose one of the things that made it successful: you can no longer just take a pyramid as a manual that tells you to build lots of unit tests and only a few UI tests in your project.
Instead you have to come up with a specific pyramid of your own. There are a lot of different ways to do this, e.g. the simple risk analysis Todd did, or James Bach’s Heuristic Test Strategy Model [12]. Once you have done this, you benefit from the pyramid’s strengths: it is easy to explain to and discuss with others, and the simple yet strong visual helps you keep your testing efforts in line with your chosen approach. If, for example, you spend way more effort on UI testing than your pyramid indicates you should, you can easily see this and reassess your project by asking yourself some questions: Are you still testing the right thing? Shouldn’t you spend this effort on another layer, where it adds more value to your project? Or is there a flaw in your pyramid, and you really should spend all this effort on UI testing? If so: what other implications does this realisation have for your approach?
What does it mean: lots of this and less of that?
When you frame the pyramid like I did in the last section, you stumble over a quite interesting follow-up question. You visualised, for example, that you do lots of user testing and less unit testing. But what does it actually mean to do lots of something, or less of something else?
In the traditional test automation pyramid it basically boiled down to the number of test scripts: the number of test scripts on the unit layer should be significantly higher than the number on the service layer, and you should have only a few test scripts on the UI layer. So figuring out whether you are applying the pyramid as intended is a task as simple as counting. Although, to be fair, this practice is criticised itself, since the number of test scripts does not necessarily reflect how much work went into them. Yet it remains a good indicator.
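If you want to automate that counting, a few lines suffice. Here is a sketch in Python; the directory layout (`unit/`, `service/`, `ui/` with `test_*.py` files) and the function names are my own assumptions, not a standard convention:

```python
from pathlib import Path

def count_test_scripts(root, layers=("unit", "service", "ui")):
    """Count test scripts per layer, assuming one directory per layer
    under `root` (an assumed convention - adapt it to your project)."""
    return {layer: sum(1 for _ in Path(root, layer).glob("test_*.py"))
            for layer in layers}

def follows_pyramid(counts, order=("unit", "service", "ui")):
    """True if script counts shrink as you go up the layers."""
    values = [counts[layer] for layer in order]
    return all(lower > upper for lower, upper in zip(values, values[1:]))

# Example with made-up numbers instead of a real directory scan:
print(follows_pyramid({"unit": 120, "service": 35, "ui": 8}))   # True
print(follows_pyramid({"unit": 10, "service": 35, "ui": 80}))   # False (ice-cream cone)
```

A check like this only captures automated scripts, which is precisely where counting stops working, as the next paragraph argues.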
However, counting test scripts does not work anymore when every layer can consist of both automated checks and manual testing tasks. You can still count the number of automated test scripts, but you cannot count manual testing tasks as easily. As a first approach you might want to count the number of manually performed test cases, but this is flawed even if you overlook the “apples to oranges” comparison between automated and manual test scripts. In today’s projects, designed test cases for testers to manually execute hardly capture everything a tester does; some even argue test cases do not reflect testing very well [13]. A tester does much more than creating and then ticking off checklists, for example performing test sessions, product explorations or interacting with other team members. As a result there is no easy measurement and comparison anymore that expresses the difference between the layers in easy-to-digest numbers.
One might think that effort spent on a layer is a good metric here, but this is a false friend, too: 250 beta testers will definitely outmatch every other layer within hours regarding effort, even if the beta test is not the bottom layer. What might come closest is effort spent in the core team for that layer - for example not measuring the testing effort of all 250 beta testers (which is admittedly a little unjust), but rather taking into account the setup of the beta tests, deciding which purpose they should have in the project, deciding which groups get which versions, analysing the feedback, scripting testers with different charters, ...
The truth is, I have no easy answer to this. As a rule of thumb, I say I am willing to spend the most manpower, brainpower or money on the bottom layer, since this is the one closest to my heart with regard to the reasoning behind my pyramid in my project. And I am aware that this is the exact opposite of why Mike Cohn put unit tests at the bottom.
Conclusion
The test automation pyramid is criticised for very valid reasons and is not a very good model to use anymore in its exact appearance. However, I think if you take a step back and look at the pyramidical form in general, you can still use its benefits to add value to your project - and that is why I still like pyramids.
Sources: