Why you can’t test a product and why you will never be able to verify anything that matters

I make two pretty drastic claims in the title:
– You can’t test a product!
– You will never be able to verify anything that matters

In this post I will explain what I mean by those statements and provide my arguments for why I believe this to be so.

Our understanding of the world is full of simplifications. I like to think of our understanding of the world as a very large set of mental models, all more or less connected with each other. Some helpful, but all more or less incorrect. The testing world is no exception to this phenomenon. Models like “testing the product against the spec” can be very helpful. But to evolve our understanding we need to question it and come up with something that more accurately describes what is actually happening. A more accurate model might be more complex and a bit more difficult to work with in day-to-day testing. But it helps us understand how our simplified models can sometimes get us into trouble.

You don’t test the product, you test your understanding of the product
A simple model of testing could be that you test the product against a specification (or some other oracle of your choosing). Basically comparing the product against some spec. But it’s not quite that easy. What is closer to the truth is that you compare your understanding of the specification with your understanding of the product. The “compare” is a pretty big simplification, but we will return to that later. For now we simply want to compare our understandings. There are many ways one can improve one’s understanding of specifications, but that’s out of scope for this article. Here we focus on improving our understanding of the product. So what do we do? Look at the specification? No! Looking at the specification would not build our understanding of the product, it would build understanding of the spec. Ahh, so we start testing the product! No!

Wtf? Not test the product!?!

Nope. The way I see it, we start off by guessing about the product. More or less consciously we theorize about the product. In a fraction of a second our brain has built its first model of the product. Then we start testing, not the product, but our understanding of the product: a technical, empirical investigation to reveal flaws in our understanding of the product. Which investigations or tests you choose to perform is governed by a set of different models and emotions. I hope to write an article about those models soon as well.

The rock
Imagine a big rock on a flat surface. Is it heavy? Your previous experience with rocks of that size tells you immediately that rocks of that size are heavy. The reasonable thing to do is to guess that it is heavy. You’re a tester, so you decide to test. Based on your guess that this is a heavy rock, you decide to lift with your knees instead of your back. It was a heavy rock. Now, could you have made that test without any idea of what to expect? Even with minimal experience of rock lifting, any reasonably sane person who has encountered at least a few rocks would expect it to lie still as you approach it. Because that’s what rocks on flat surfaces do, right? A rock with anthropophobia is unheard of. Your model of rocks says that they don’t roll off when approached by humans. When approaching products we have all kinds of ideas about what we “know” about them based on previous experiences with similar products. Even with no experience of anything like it, we still expect it to conform to basic models of the laws of nature.

A reasonable person makes good guesses about stuff. A good tester recognises that a large part of what we think we know about stuff is guesswork, and tries to challenge those guesses with empirical investigations, aka tests. If our investigation is coherent (coherence being another model) with our understanding, have we verified that what we think we know is actually true? Nope.

Once upon a time there was a splendid theory about sunrise and sunset. The sun rises in the east in the morning and sets in the west in the evening. A good model of how the world worked. Partly because it excluded things that cannot happen (a model that explains everything explains nothing) and partly because it was very testable. This theory was verified over and over again. The model mapped extremely well to observations and all was fine, until one guy decided to explore the northern part of the globe. Imagine his surprise when he realized that the sun didn’t set at all way up in the north. How many white swans do you need to observe before you can verify that all swans are white?

You will never be able to verify anything that matters
Can we know that all rocks on flat surfaces lie still when approached by humans? No, we can’t. Even if we test all rocks in the universe we can’t be sure. Maybe something changed since we tested? Maybe we approached the rocks from the wrong direction. We can’t know. And that is actually just fine. Even if we were wrong, it would not be a disaster, albeit a very exciting discovery. So what can we know? Only that ONCE, when we approached a CERTAIN rock in EXACTLY this way, it didn’t move in any way that we were able to observe. We can’t verify anything that actually matters.

So what can we do?
We can build as good an understanding of a product as possible by guessing, thinking critically about whether our guesses make sense, and testing them as ruthlessly as we possibly can. By testing we can remove the faulty bits in our understanding and replace them with something that hopefully stands up better to critical analysis and harsh tests.

A very simple yet more helpful and accurate model
This is a simple high level model of how I see testing. Is it a good model? Well, it helps me think a bit deeper about testing and to me it represents more accurately what’s really going on when I test stuff. It’s wrong, but to me it is helpful. It helps me realize the complexity of testing. And from this simple model I can add more complicated models.


As I see it, the more traditional view of testing actually consists of many different parts: testing our product model to refine it, and testing all of our other models about other stuff in the world, like specifications. And refining all the models that help us make sense of what these models of the product and its context are telling us. All this to be able to act in a way that achieves the goals we want to achieve as testers.

I hope that you think this model is okay but decide to come up with a better, more accurate, more helpful one. And that you share it with the world so that we can build on each other’s models. To verify the correctness of this model is impossible, just like verifying anything else. But it is also a really crappy model of how testing works, since there is no way to refute it. It is just the way one tester represented his mental model of testing at one point in time.

Enabling serendipity to test your mental product models

As testers, one of our most important tasks is to find problems with the product or service that we are testing. The things which reduce value for any stakeholder that matters. To be able to distinguish between what is a problem and what is not, we need to compare our understanding of the product with our understanding of what threatens value for any stakeholder who matters. In this post I will focus on our understanding, our model, of the product. And especially on how to find flaws in our product model at later stages, when we believe that we have a pretty firm grasp of the product.

Understanding the product

When we first start testing a new product we have a very basic understanding of it. Our mental model might be as crude as knowing that it is an application and what its name is. As we start exploring the product we learn, and our model gets fuller and more detailed. Our understanding of the product increases as our mental models can more and more reliably and accurately predict different aspects of the product. Or to be a bit more precise, the understanding resides in a small system comprising three main components: the product, the product environment and ourselves.

Gaining understanding about the product

Initially, any observation of the product gives us information that requires us to change our mental model to be able to explain the product. Any newly observed element will be added to the model. But within just a few moments, “random” observations will no longer be a very efficient source of information to build, refine and refactor our model. At this stage a more systematic approach to observing is necessary. One popular such approach is touring the software.

As we observe the product, two different scenarios can occur. The observation can be explained with our current model, which builds confidence that our model is accurate. Although this might feel good, we do not learn anything new about the product. The second outcome, that the observation cannot be explained by our current model, is much more interesting and is the one that I will elaborate on.

Adapting our model

When observations cannot be explained with our current model of the product we don’t throw away the whole model. We fix whatever flaw we see in it. We can remove parts, add parts or replace parts of it, so that the new model does a better job of explaining the product. In order to explain a complex product with a simpler model, a model that is useful for testers, who have well-developed but still limited cognitive capacity, we need to limit the amount of information that our model contains. This can be done by choosing, more or less consciously, which aspects are important to us. Of these aspects, model granularity is often the one that is substantially reduced. Sometimes we make generalizations: we assume, based on inductive reasoning, that similar things are identical. We assume that very similar buttons are in fact identical. That the behavior observed for the stimuli received in this test will be the same as the behavior observed in a later test with the same stimuli. Which aspects are important we decide from yet another model, of which our product model is one part, and which can also be amended as new information is revealed. Our product model is not the sum of our observations. Many observations are not included in the model, and some parts of the model are based on inductive reasoning.

“Our product model is not the sum of our observations.”

That the model is not an exact representation of the product is important, since it will be an integral part when we test the product for value or the lack of it. Decisions about which aspects are important to test, which risks there are and how to design powerful tests will depend on the product model. As we test for value we will continue to improve our product model. But some of our assumptions that are false might be very hard to refute, especially as our focus is more on testing for value than on understanding the product. Assumptions, even false ones, which are hard to falsify tend to grow stronger over time. Since they have survived for such a long time, it is tempting to regard them as absolutely true, or even to cling to them despite contradicting evidence.

Detecting false assumptions

To increase our chances of detecting false assumptions we first of all need to know what our assumptions are. But here we need to restrict ourselves to certain kinds of assumptions, for every piece of our model is either built from something that we assume to have observed correctly or inferred from such assumptions. The kind of assumptions we are after are those that have no empirical evidence as support. The assumptions that are made wholly from our own reasoning, reasoning that makes sense in the light of our own faulty model. The things we “know” that aren’t so.

As a first step we need to identify these assumptions about what is not, what never can happen. It is probably easier to detect these when one is learning a new aspect of a product. Example: While testing an embedded system I notice that the load on an internal bus is about 10%. I do some testing, trying to provoke the system so that the bus load goes up. My best effort stresses the system to a bus load of 15%. Not much. Here I make an assumption: bus load will be low. I would be very surprised to see bus loads higher than 20%. Through thorough critical thinking about your product model you can also discover these assumptions long after making them.

Given that we have some way of observing the aspect that we make an assumption about, we can add checks for it. Performance checks for things like bus load, memory usage or CPU usage. And also logical restrictions, like that this and that element will always be shown, or never both be shown at the same time, or that this variable will never exceed 10. Then, with the right tool support, we can design continuous checks, restrictions, for these aspects. And be notified by the tool if these restrictions are ever violated. These background checks lie dormant while you carry on testing, only to notify you if a restriction is violated. Such checks are more often used in automated regression check suites. There they also help us detect violations, but they aren’t as effective, since they run the same tests over and over, whereas continuous testing makes use of new tests.
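To make the idea concrete, here is a minimal sketch of such a restriction monitor. It is not from any real tool; the `RestrictionMonitor` class, the metric names and the 20% bus-load threshold are all illustrative, taken from the example above.

```python
class RestrictionMonitor:
    """A background checker: each restriction encodes one assumption
    about the product, expressed as a predicate over an observation."""

    def __init__(self):
        self.restrictions = []  # list of (name, predicate) pairs
        self.violations = []    # names of restrictions violated so far

    def add_restriction(self, name, predicate):
        """Register an assumption as a checkable restriction.
        predicate(sample) should return True while the assumption holds."""
        self.restrictions.append((name, predicate))

    def check(self, sample):
        """Evaluate all restrictions against one observation; record
        and return any violations so the tool can raise an alert."""
        violated = [name for name, predicate in self.restrictions
                    if not predicate(sample)]
        self.violations.extend(violated)
        return violated


monitor = RestrictionMonitor()
# Assumption from the exploratory session: bus load stays below 20%.
monitor.add_restriction("bus_load_below_20", lambda s: s["bus_load"] < 20)
# A logical restriction: free memory never drops to zero.
monitor.add_restriction("free_memory_positive", lambda s: s["free_mem_kb"] > 0)

monitor.check({"bus_load": 15, "free_mem_kb": 4096})  # assumption holds
violated = monitor.check({"bus_load": 21, "free_mem_kb": 4096})  # it doesn't
```

In a real setup the `check` call would be driven by the monitoring tool on every new sample from the product, while you carry on testing something else.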

So what if these restrictions are violated? Do we have a problem? The answer is both yes and no. It isn’t necessarily a problem that the bus load jumped up to 21%; that could still be an acceptable bus load which doesn’t result in any significant risk. But the fact that the bus load is 21% exposes an error in our model of the product. A serendipitous finding that gives us an excellent opportunity to learn more about the product in terms of what influences bus load. This adaptation of our mental product model reveals new potential risks, gives us new test ideas and thereby affects another model of ours, the coverage model.

Enablers of Continuous Model Checks



To be able to add the power of these Continuous Model Checks, or Restrictions if you will, three things are needed: (1) known assumptions, (2) observability of the aspect that we are making an assumption about, so that (3) a tool can help us monitor this aspect of the product.

Designing Alerts

That a tool detects a violation of a restriction does us little good if we are too focused on the aspects of the product that we are currently testing to notice it. To make sure these serendipitous events catch our immediate attention, the alert needs to be forceful. Good alerts use both audio and graphics. And the graphical notification should be big, use signaling colors with high contrast against the background and, if possible, move around (jumping or shaking). All these properties of an alert increase our chances of noticing it in a variety of situations, like when the headphones are lying on the desk or we are looking away from the screen of our monitoring tool.
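As a small illustration of combining an audio and a high-contrast visual signal, here is a sketch for a terminal-based monitoring tool. The `format_alert` helper is hypothetical; it uses the ASCII bell character for sound and standard ANSI SGR escape codes (bold, blink, red on white) for the visual part. A GUI tool would use its own notification API instead.

```python
def format_alert(message):
    """Build a hard-to-miss terminal alert string: a terminal bell for
    audio plus bold, blinking, red-on-white text for high visual contrast."""
    bell = "\a"                  # ASCII BEL: most terminals play a sound
    style = "\033[1;5;31;47m"    # SGR: bold, blink, red foreground, white background
    reset = "\033[0m"            # restore normal terminal colors
    return f"{bell}{style} RESTRICTION VIOLATED: {message} {reset}"


print(format_alert("bus load reached 21%"))
```

Whether blink is honored depends on the terminal, which is exactly the kind of limitation that makes a dedicated desktop notification (with movement and sound) preferable in practice.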

As a closing remark I want to add that we need to be careful when adding monitoring to our system. Our monitoring test system is affecting the product that we are testing to some extent. It is our job to figure out how to balance the value of the information extracted and the risk associated with introducing monitoring.

This became a slightly longer post than I originally planned for. Maybe two years of non-blogging does that to you. Please share your thoughts and own experiences!

How ISTQB foundation made me feel bad about good testing

In the beginning of my career I was assigned to test a part of a very complex automotive system. The project was behind schedule and a fixed release date was closing fast. In the automotive industry there are fixed release dates and patching is very expensive since it requires the car to be brought to a workshop. The system developed was much more complex than the ones currently in production; unfortunately it was in terrible condition. The system crashed on a regular basis, features were missing or not working and there were a lot of re-design and requirement updates.

My approach was to divide the system by its different GUI views. I listed these in Excel. Then I added the subparts of the views to my Excel spreadsheet. By reading the different specifications and talking to the system owners I came up with a lot of things to test and added them to the items. The reporting was done by color-marking the items based on my opinion.

  • Green: No problem found
  • Yellow: Small issue
  • Red: Major issue or many small issues (or that feature was untestable because of other defects)
  • Grey: Not tested

I reported issues with the system or specifications in a bug tracking tool. The severities of these issues were discussed with the system owners and the project manager. If needed, I updated the colors of my Excel items to reflect any new information regarding severity. I also added references to the bug reports next to the colored items. When they were retested OK I crossed them out but kept them in my spreadsheet for regression testing. Both the system owner and the project manager were very satisfied with my work.

The next step

Eager to learn more about testing, I signed up for a course. At this point an internationally known course like ISTQB Foundation sounded like a good idea. It wasn’t… at least not for me…

The course was packed with text-heavy PowerPoint slides. There were almost no discussion around concepts and no actual practice. Since the goal of the training was to pass an exam there was a big focus on memorizing answers to questions. When I challenged some of the “facts” in the training material the teacher told me to just remember it and take it with a grain of salt.

The importance of salt

Unfortunately, as a rookie in software testing I did not bring enough salt. There was very little discussion about the situations in which a certain approach would be recommendable. Context was not considered at all. When I got back to my assignment I tried to implement some of the stuff we had learned at the training.

I started to make detailed test case specifications and map these to requirements. I also spent time thinking about how to measure requirement coverage. After a while I realized that even if I did test all the requirements, they would have changed by the time I was done. And updating the test specifications would take a lot of the time I could have used for actual testing. And the requirements were nowhere near a full description of the system. So I stopped, and went back to my former way of testing and reporting. All good? Unfortunately not…

Everyone’s happy, except me

Although the project manager, system owners and suppliers thought I was doing a great job, and there was really good progress, I couldn’t shake the feeling that I was cheating. I did not do it the “right” way… How could they be happy about my work? Was it because they didn’t know anything about testing? For a couple of months I got nothing but praise for my excellent work but still felt bad. Then I attended a seminar called “Beyond Test Cases” by James Bach… He made me realize that there is no one right way of doing testing and that testing is not about presenting a number of failed or passed test cases. Nor is it about producing requirement coverage metrics. It’s about questioning the product to reveal information about it which is useful for the stakeholders. And the way you approach this challenging task is different in every situation. When the assignment ended and the customer made a performance review of me, I got 4.7/5 with the motivation “I don’t want to give you all 5’s because you might get cocky”. That was great, but the best part was that I finally felt that I had done a great job and that I deserved the praise.

Thank you James for showing me a better path!

Perfect software and other great books about testing

I just finished reading “Perfect Software and Other Illusions About Testing” by Gerald M. Weinberg. This book is highly recommended in the context-driven testing community. I had high expectations, but I am happy to announce that the book fulfilled them all. I read the book in a couple of evenings and have probably understood only a small part of it so far. But the book is full of interesting concepts and is written in a way that makes you just want to read on and on… I hope to read this book many times…

Jerry has great offers on Leanpub bundles right now. This book is included in “The Testers Library”. It consists of the following books:

  • Perfect Software and Other Illusions About Testing
  • Are Your Lights On: How to Know What the Problem Really Is
  • Handbook of Technical Reviews (4th edition)
  • General Systems Thinking: An Introduction
  • What Did You Say?: The Art of Giving and Receiving Feedback
  • More Secrets of Consulting: The Consultant’s Tool Kit
  • Becoming a Technical Leader
  • The Aremac Project

I’ve started to read the second book in the bundle and it’s as interesting as the first. This bundle of e-books is now available for $49.99, which so far seems like a bargain.

I actually bought several bundles now that they were discounted. A total of 29 books. And I am really excited to read them all… And I could probably do that in a month… But just reading books is not, for me, a worthwhile activity… learning from books is… So these books will probably last decades…

Actually, solving my problem of desiring these books by purchasing them created a lot of new problems for me. Fortunately, the second book in the series is about solving problems… First of all I need to figure out what the problem really is… which I’ve learned is a very challenging task…