This is the third post in the QA myth busting series.
Some managers believe that testing slows work down and, as a result, they cut back on testing. On the other end of the spectrum, we have managers who believe that more testing means better quality. This myth results in two inefficiencies: performing unnecessary testing, and leaning on testing while neglecting other QA measures.
I’ve encountered managers who universally apply so-called “testing best practices,” such as mandating code reviews, setting 80% code-coverage goals, or demanding full manual regression testing for every change. Some managers take it even further by mandating two consecutive code review phases or 100% code coverage, believing that the additional testing will automatically result in higher quality.
To unravel the myth that more testing means higher quality, let’s start with some definitions.
What is testing?
Testing is an inspection: an examination of what we are producing to verify that it is “done” and functions as expected. Testing aims to surface defects, and if any are found (or the work is deemed incomplete), the item is sent back to production for further work.
As we agreed in the article QA myth busting: QA slows work down, testing is an integral part of production: we don’t want to give customers anything that is incomplete or poorly produced. Shipping unfinished or defective products will usually cost us much more and slow down development due to rework and bug fixing.
So, how do we make sure that the work we’re testing is “done” and has no defects?
We need to understand what amount of testing will give us the desired confidence in the product we’re building.
Testing depends on complexity and risks
The required amount of testing depends on product complexity: the more complex the product is, the harder it is to gain confidence that we made it well.
The required amount of testing also depends on the risks the product can impose: the higher the risk, the harder it is to gain confidence that we made the product well.
For example, my leatherwork hobby involves a very simple product: a leather wallet. I can gain enough confidence in its quality through a simple visual review of the stitches, leather edges, and snaps and by performing basic user operations such as opening and closing the wallet and inserting and removing money.
My leather wallet also imposes very little risk. In the worst case scenario — if it breaks — my friend (who receives the wallet as a gift) will only be slightly dissatisfied. Considering this small risk, there’s little sense in employing testing methods like load-testing the stitches, or testing it under extreme weather conditions.
Now let’s consider the software of a radiation therapy machine like the Therac-25. The complexity of such software is obviously enormous: it not only has to control the hardware, but also stay functional for years and prevent unintended harm from an inexperienced operator. At this level of complexity, there is a huge area for potential error during production. Testing such software requires an extreme variety of verifications: from ensemble (mob) manual testing of all the modules to every applicable type of automated testing at every level.
A malfunction in such a machine can cost human lives or cause irreversible health damage. Given this risk, virtually every available testing measure should be considered.
The difference between the Therac-25 software and a leather wallet in terms of complexity and risks is obviously huge, so gaining confidence requires very different testing approaches.
This means that applying the Therac-25 testing approach to my leather wallet makes little to no sense. For example, ensemble (mob) testing of each individual leather piece will likely find no additional bugs in the wallet, so it would be a waste of time and effort. It also doesn’t matter how many testers open and close the snaps: one is enough.
Testing strategy must be designed for your product
These examples of two very different products clearly show that there is no universally applicable “best practice” for how much testing is needed. Different products require different testing strategies. A testing strategy is composed for the specific product: its complexity, the company’s economic model, and the risks the product carries and imposes.
Just because one company does X type of testing, aims for Y% coverage, insists on a zero bug policy, only tests in production, etc., this does not necessarily mean that these practices will fit your product.
Blindly applying testing approaches you saw or read about elsewhere, or simply adding more testing to your product will not guarantee better quality.
With testing, we should stop thinking in terms of “more” or “less” and instead focus on finding the right amount of testing needed for each situation.
In economics, there is a principle called the law of diminishing returns: when you keep increasing investment in one factor of production while keeping all others constant, you eventually reach a point where each further increase yields progressively smaller returns.
For example, when I’m inspecting my leather wallet for issues, I find most of them within the first 6-7 minutes. After the 7th minute, further inspection yields little to nothing new.
Software products are clearly much more complex than leather wallets. However, for each type of testing there will be a point of diminishing returns.
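This dynamic can be illustrated with a toy model (my own, purely hypothetical numbers, not from any study): assume each extra hour of testing finds a fixed fraction of the defects that remain undiscovered, so every hour surfaces fewer defects than the last.

```python
# A toy model of diminishing returns in testing (hypothetical numbers):
# each hour of testing finds a fixed fraction of the defects that are
# still undiscovered, so every hour yields less than the one before it.

def defects_found(total_defects: float, find_rate: float, hours: int) -> list[float]:
    """Cumulative number of defects found after each hour of testing."""
    found = 0.0
    cumulative = []
    for _ in range(hours):
        found += (total_defects - found) * find_rate  # find a share of what's left
        cumulative.append(found)
    return cumulative

# With 100 latent defects and a 50% hourly find rate:
# hour 1 finds 50 defects, hour 2 only 25, hour 3 only 12.5, and so on.
print(defects_found(total_defects=100, find_rate=0.5, hours=6))
```

Past some point, the cost of another hour of testing exceeds the value of the handful of defects it would surface, and that budget is better spent on other QA measures.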
For instance, code reviews remain a popular testing practice despite studies showing they have low efficiency in finding defects. While a code review process can benefit some products, I’ve yet to see a product where adding an additional stage of such a low-efficiency practice makes sense. Usually, a second code review is akin to me testing my wallet for 10 hours instead of just 7 minutes: little to no new defects will be found as a result.
The higher the product’s complexity and risks, the more diverse the set of testing approaches usually should be.
The recent CrowdStrike incident showed that while the company employed certain types of testing and invested quite a lot in them, the diversity of its testing approaches clearly wasn’t adequate for the complexity and risks of its product. To improve quality, they will have to employ a more diverse set of testing practices instead of simply putting more effort into testing the resulting product.
More testing and little QA can harm the product
Testing is an inspection: it does not improve quality on its own; it simply reports defects.
If the dev team finds multiple defects in each testing session and fixes the issues, these defects do not cause any harm to the customer. This means the development team excels in testing. However, if the dev team doesn’t employ QA measures, the defects will keep appearing and their number will only grow, requiring more and more investment in testing.
There’s a story on Reddit from a developer who supposedly worked at a major tech company, describing how bad the company’s codebase is and how much testing each change requires to reduce the risk of bugs slipping through to users. If the story is true, it serves as a good example of a testing strategy under which system quality got progressively worse. The company invested so much in testing, while ignoring other QA measures, that they created a system where any change to the codebase takes 2-8 weeks!
One of the possible QA measures for this situation would be to invest in continuous refactoring and continuous training in an ensemble (mob) environment. This would allow the company to slowly progress in cleaning the “unimaginable horror” of the codebase, and ultimately lead to having to invest less and less in testing.
One of Dr. Deming’s 14 points for quality management describes exactly this point:
3. Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.
Testing finds the defects, QA prevents the defects.
However, even if the product is built well and has little to no defects, there still may not be enough investment in QA.
QA starts with understanding the client
In 2009, Google Wave was introduced as “a new communication platform for a new Web”: a framework for real-time collaborative online editing, with email, chats, groups, and various other media all in one place. As usual, Google managed to create a huge amount of hype for the product, but after only one year it was discontinued. There weren’t many reports of product defects, and code quality for a new product at Google is usually quite good. According to Google, the reason for the decision was that “Wave has not seen the user adoption [they] would have liked.”
If you have a well-built product with very few defects and users still complain or simply avoid using the product, it’s a clear sign that you need to invest more in QA efforts.
As quality is a match between what’s desired and what’s produced, misunderstanding the clients’ desires is a quality problem. No matter how well you execute production and testing, if the product is not what the clients desire, you’ve invested all of that money in vain. Google Wave is one of thousands of examples where a lack of understanding of the customers’ needs and wants led to a project’s death.
One of the main QA measures is constant communication with the clients. Google Wave was known for limiting the number of beta testers outside the company, significantly reducing the flow of feedback from real users until it was too late. Had Google invested more in QA measures focused on communicating with customers, the product’s chance of success would have been higher.
Good testing strategies require context
Clearly, the myth that more testing means higher quality is not only false, it’s a risky mindset. There is no definitive list of “best testing practices” one should follow. Each testing strategy should be designed for the specific context of a product, a company, and its people, and should define the exact amount of testing required to gain good confidence in product quality. Adding more testing might actually harm the product, since it can mask an overall decline in internal quality. Worse still, relying too much on testing might stop us from investing in essential QA measures like customer communication, continuous learning, and refactoring.