Monday, 22 August 2016

A tester’s thoughts on characterization testing

Michael Feathers recently posted something about characterization testing on his blog. The term is not new; in fact it has been in use since at least 2007. Still, I stumbled over something in this particular post. Since I was reading Katrina Clokie’s post about human-centered automation at the same time, the two topics merged in my head and got me thinking.
So what is this blog post going to be? Basically it is my stream of thought on characterization testing, written down to see if I can make sense of it. Hopefully someone else benefits from it, too.

characterization testing is an exploratory testing technique

Let’s start with what characterization testing actually is, at least to my understanding. Characterization testing is a technique for writing unit tests that check and document what existing source code actually does, which makes it especially useful when dealing with legacy code. Note that it does not matter whether the checked behaviour is also the desired behaviour.
The resulting checks are used to find out what a system does and then to check automatically that it still works as before while you refactor the code base. If you want to dig deeper into characterization tests and how to create them, I suggest you read Michael’s initial blog post or Alberto Saviola’s four-part article series, which ends with a tool that can create characterization tests automatically.
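To make the technique a bit more concrete, here is a minimal sketch in Python. The function legacy_shipping_cost_cents and its odd rules are hypothetical, invented only for illustration, and the recipe in the comments (write an assertion you expect to fail, then pin the value the failure reports) is the general idea as I understand it, not Michael’s or Alberto’s actual code.

```python
# Hypothetical legacy function: we only know WHAT it does, not WHY.
def legacy_shipping_cost_cents(weight_kg: float) -> int:
    if weight_kg <= 0:
        return 0
    cost = 499 + 150 * int(weight_kg)  # fractional kilos are silently dropped
    if weight_kg > 20:
        cost = cost * 9 // 10          # unexplained 10 % reduction for heavy parcels
    return cost


# Characterization checks: each one pins the behaviour observed today.
# Recipe: first write an assertion you expect to fail, e.g.
#     assert legacy_shipping_cost_cents(2.5) == -1
# run it, read the actual value from the failure message (799 here),
# then replace the guess with that observed value.
def test_non_positive_weight_costs_nothing():
    assert legacy_shipping_cost_cents(0) == 0
    assert legacy_shipping_cost_cents(-3) == 0


def test_fractional_weight_is_truncated():
    # 2.5 kg is charged like 2 kg. Desired or a bug? The check does not say.
    assert legacy_shipping_cost_cents(2.5) == 799


def test_heavy_parcel_gets_reduction():
    assert legacy_shipping_cost_cents(30) == 4499


if __name__ == "__main__":
    # Runs without a framework; a runner such as pytest can collect the test_* functions instead.
    for check in (
        test_non_positive_weight_costs_nothing,
        test_fractional_weight_is_truncated,
        test_heavy_parcel_gets_reduction,
    ):
        check()
    print("all characterization checks pass")
```

None of these checks says whether truncating 2.5 kg or reducing the price of heavy parcels is right; they only make a change in behaviour visible while the code is being refactored.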

Michael opens his blog post with the following statement before moving on to characterization testing itself: “We use [the word testing] for many things, from exploratory testing and manual testing to unit testing and other forms of automation. The core issue is that we need to know that our code works, and we’ve lumped together a variety of practices and put them under this banner.”

I have a problem with that statement, and it was the kicker that started my thoughts. Namely, I disagree that exploratory testing is there to make sure “that our code works”, because this is not how I see exploratory testing. I use exploratory testing to find out how a system works. Whether my findings represent desired behaviour or result in a series of bugs is often up for debate with my teammates.

To me, the ultimate reference for exploratory testing is Elisabeth Hendrickson’s book Explore It!. I own the German translation, so I cannot quote it directly and will summarise instead. Right at the beginning of the book she writes that a test strategy should answer two questions:
  1. Does the software behave as planned under the specified conditions?
  2. Are there further risks?
The first one is largely about knowing that “our code works”, as Michael puts it. The second one goes further and explores (sic!) the system in more detail than just checking it against a specification. Risks are found by learning what the system actually does and using this as input for even further exploration.
I think you already know where I am going with this: if exploratory testing is about discovering risks by learning how the system at hand behaves, doesn’t that make characterization testing an exploratory testing technique? Elisabeth’s book even has a whole chapter dedicated to exploring existing (aka legacy) systems, which is precisely what Michael uses characterization testing for.

Here I think the terms black box testing and white box testing are helpful: while Elisabeth mainly describes black box techniques in her book, I see characterization testing as a white box technique for exploration at the unit level. Combine Elisabeth’s techniques with Michael’s characterization testing and you have a very powerful framework for starting work on a legacy system. Still, I see characterization testing as a part of exploratory testing rather than an addition to it.

You can read Meike Mertsch’s blog post Exploratory testing while writing code to see how a tester with an exploratory mind works with code while testing, although it might not be characterization testing in the strictest sense. Meike also translated Explore It! into German.

If you view characterization testing as a white box exploratory testing technique, it has a unique property compared to all the black box techniques in Elisabeth’s book: it creates automated checks, which can be seen as a form of documentation of the current system behaviour.

characterization tests are fascinating for testers

This is the point where I have to say that I am a big fan of characterization testing when dealing with legacy systems. Developers who have to refactor the system benefit directly, because the checks give them confidence that they did not change the system behaviour in unexpected ways. Testers can use existing characterization tests as a starting point for finding out more about the system.

I don’t know about you, but to me, finding or writing characterization checks raises the question of why the system behaves that way. What is this behaviour good for, and what does it lead to when you put it in the bigger picture of the overall system? Characterization checks can be an input for exploratory testing sessions or fuel discussions with developers, product managers or users. They are an invitation to explore even when they don’t fail, and are therefore a good example of checks that help you learn about the system even while the build is green.

As a tester, I have encountered two fallacies regarding characterization tests. The first is not fixing a bug because the bugfix breaks a characterization test. Remember that a characterization check alone cannot tell you whether the checked behaviour is correct or wrong. I once saw someone about to commit a change revert it because it broke some checks; only later did we find out that the checked behaviour was actually faulty.
The second is the exact opposite: you know the checks only pin down the current state, and you are very confident that your new code works better than the old code, so when the checks break you adjust them to your code and commit everything together. Guess what: the old behaviour was correct and you just introduced a bug.
Since characterization testing comes with all the pros and cons of unit testing (fast and cheap vs. checking only a small part of the system), the situation can even change over time: the checked behaviour is correct until a new feature is implemented, after which it is wrong. The build, however, stays green.

ageing characterization and regular checks 

Characterization checks do not just come into existence; in fact, Michael and Alberto both wrote down rules for when and how to create them. While developers work on a legacy system, however, characterization checks are not the only unit checks they create. There are also regular checks for new code, which are created using TDD and check for desired behaviour. Both kinds of checks end up in the code base and in continuous integration. Over time you may no longer know whether a check stems from characterization testing or from TDD. In this sense, characterization checks themselves can become legacy code that is hard to deal with.

Imagine joining a project and finding 1000 automated checks, 250 of which are characterization checks while the remaining 750 are regular checks. If one of the characterization checks fails, it is not necessarily a bug; if one of the others fails, it most certainly is. Only you cannot see which is which. If the person who wrote a check is no longer on the project, you have to treat every failing check as a characterization check and investigate each time whether you have found a bug or not. One way to mitigate this is to follow Richard Bradshaw’s advice to state the intent of each check. If you do, you know whether a check is a characterization check or not.
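One way to state the intent right in the code is sketched below in plain Python. The decorator intent, the kinds "characterization" and "specified", and the helpers triage and run are hypothetical names made up for this illustration, not an established API; with pytest you could get something similar from custom markers such as @pytest.mark.characterization. The point is only that a failing check then tells you whether to expect a bug or a behaviour change worth investigating.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CheckIntent:
    kind: str  # "characterization" (observed behaviour) or "specified" (desired behaviour)
    note: str  # why the check exists, in its author's own words


def intent(kind: str, note: str) -> Callable[[Callable], Callable]:
    """Hypothetical decorator that records a check's intent on the function."""
    if kind not in ("characterization", "specified"):
        raise ValueError(f"unknown check kind: {kind!r}")

    def mark(check: Callable) -> Callable:
        check.intent = CheckIntent(kind, note)
        return check

    return mark


def triage(check: Callable) -> str:
    """Explain what a failure of this check most likely means."""
    info = getattr(check, "intent", None)
    if info is None:
        return "unknown intent: investigate like a characterization check"
    if info.kind == "characterization":
        return f"behaviour changed, not necessarily a bug: {info.note}"
    return f"likely a bug: {info.note}"


def run(checks: List[Callable]) -> List[Tuple[str, str]]:
    """Run all checks and return (name, triage hint) for each failing one."""
    failures = []
    for check in checks:
        try:
            check()
        except AssertionError:
            failures.append((check.__name__, triage(check)))
    return failures


# Hypothetical refactored code: it now rounds where the legacy code truncated.
def to_whole_units(value: float) -> int:
    return round(value)


@intent("characterization", "legacy code turned 2.7 into 2; reason unknown")
def check_fraction_handling():
    assert to_whole_units(2.7) == 2


@intent("specified", "written test-first: whole numbers stay unchanged")
def check_whole_numbers_unchanged():
    assert to_whole_units(3.0) == 3


def check_of_unknown_origin():  # nobody remembers why this check exists
    assert to_whole_units(4.6) == 4


if __name__ == "__main__":
    for name, hint in run([check_fraction_handling,
                           check_whole_numbers_unchanged,
                           check_of_unknown_origin]):
        print(f"{name}: {hint}")
```

Run as is, the specified check passes, the characterization check is reported as a behaviour change, and the unlabelled check falls back to "unknown intent", which is exactly the situation of the project with 1000 unmarked checks.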

Furthermore, I have the feeling that many checks become characterization checks over time. When they were first written, there was a reason for creating them exactly as they are, checking for a specific behaviour. Now, one or two generations of project members later, they are still there and document a specific system behaviour. The people who knew why they were created and why the system behaves like this are gone. The checks have become characterization checks.

This may be what Katrina is facing in her project. She writes about a test suite that has been on the project longer than any of the current testers, so they don’t know why certain logic is coded into it. Katrina uses this as an example of why they do not automate after mastery. I tend to disagree a little: the initial team members might very well have automated after mastery (I cannot know for sure), yet the knowledge of why has been lost over time. Beyond Katrina’s example, this happens quite often: testers inherit checks from previous testers.

I like to think of a project as a body of knowledge: not just the people, but the project itself. A lot of knowledge about the system, the users and the workflows lives in the project’s Confluence, in the specific build setup and in the automated checks. From the project’s perspective, I see the automated checks as a form of codified prior knowledge.
The current team is left with this form of prior knowledge and now faces the problem of finding out why the system behaves the way it does. Otherwise they risk running into one of the two problems I mentioned earlier: being reluctant to change behaviour that needs changing, or introducing bugs by ignoring the checks. This is a tough exercise, because finding out why a system does what it does is usually much harder than finding out what it does.


Characterization testing is a white box exploratory testing technique and a very powerful tool when dealing with legacy systems. As a tester, you should make sure that characterization checks are marked as such, and try to find out why the system behaves the way a characterization check says it does.