Bugs aren't the problem, they're symptoms of a problem
In IT, when we find a bug, we fix it. A fix is a solution, so it seems logical that a bug is a problem, and as soon as it's fixed, we're all good, right?
Not quite. Bugs are not problems but rather symptoms of a problem.
Think of it this way: if you have foot pain and consider the pain to be the problem, you might take painkillers as a solution. But if a fractured bone causes the foot pain, the painkillers aren't the solution. Taking them without addressing the actual problem — a fractured bone — will only mask the problem and could lead to further injury.
In IT, fixing bugs without understanding the underlying problem (or problems) can also be risky. Bugs are often symptoms that something is wrong in the development process. If that's the case, fixing bugs will only mask the more serious issues and potentially cost the company more money in the long run.
For example, if management constantly rushes the development team to ship code as soon as possible, the team will inevitably produce buggy code. If testers are also rushed, they won't have time to find and report all the bugs. Even if testers manage to find all the bugs, developers will have to interrupt their work to address them, and that context switching will make the whole process more expensive.
In this scenario, the prevalence of bugs is a symptom of a deeper problem with the team's processes. Constant bug fixing can successfully mask this problem. However, if the process problem were addressed, the team wouldn't waste so much time on testing and fixing and would have more time for valuable work.
Remember: bugs aren't the problem; they are a symptom. Treating the symptom without addressing the problem is illogical and risky.
To define a problem, we have to identify and analyze the symptoms
If you go to the doctor with foot pain, they won't just prescribe a painkiller and send you home. First, they must understand all your symptoms (including the pain) and run tests to determine the problem. If the imaging and examination confirm that a shinbone fracture is the cause of your symptoms, the doctor will immobilize the leg to prevent further injury and prescribe treatment for that specific type of fracture.
In IT, there's a temptation to rush straight to the "painkiller" option (fixing the bug) and stop there. But I believe it's essential to look deeper into the issue, particularly when bugs occur regularly.
In my experience diagnosing these problems, I have seen many reasons why bugs occur, and I group them into three categories:
- Lack of understanding of what needs to be done: The feature isn't working as expected. For example, the payment feature is supposed to open the payment dialog for both authenticated and guest users when they click the "Pay now" button. However, nothing happens when the tester clicks "Pay now" as a guest user. In this case, the developer did not understand that the payment dialog should open for both authenticated and guest users.
- Lack of understanding of the conditions of use, or "failing to satisfy Non-Functional Requirements (NFRs)." For example, a tester would understandably treat it as a bug if the payment dialog only opens after the page has been unresponsive for 5 minutes. If the developer does not understand the conditions of use (the dialog should open immediately upon clicking, regardless of whether the user is authenticated or a guest), fixing this one bug will not fix the root problem.
- Lack of understanding of engineering. This usually occurs when features interact unpredictably with other features. For example, the tester logs in as an authenticated user and opens the payment dialog. Then, the tester logs out, but the payment window with all the previously logged-in user's details remains open. In this case, the developer lacked information about how the module should interact with the login/logout functionality.
These are all cases of information loss in communication and development. Most of them could have been prevented if all the necessary information had been shared between team members.
We must also consider how culture affects these situations. If people have conflicts or aren't genuinely interested in product quality, the information loss will be even more significant. Think of it like a severe infection on top of the fractured bone: the infection will not only slow recovery but also affect the choice of medication and treatment.
Bugs should be collected and categorized. Then the most prominent category should be analyzed to find its underlying cause, so that the cause can be addressed and future bugs prevented.
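To make the collect-and-categorize step concrete, here is a minimal Python sketch. It assumes each bug has already been tagged during triage with one of the three root-cause categories above. The `Bug` record, the category labels, and the sample bug IDs are hypothetical; they are not taken from Qase or any other tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical bug record: an ID plus the root-cause category
    # assigned during triage.
    bug_id: str
    category: str


# Illustrative labels for the three categories described above.
WHAT_TO_DO = "what needs to be done"
CONDITIONS_OF_USE = "conditions of use (NFRs)"
ENGINEERING = "engineering / feature interaction"


def most_prominent_category(bugs: list[Bug]) -> tuple[str, int]:
    """Return the category with the most bugs, together with its count."""
    if not bugs:
        raise ValueError("no bugs to categorize")
    counts = Counter(bug.category for bug in bugs)
    # most_common(1) returns a one-element list of (category, count);
    # ties go to the category encountered first.
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Hypothetical sample data modeled on the payment-dialog examples.
    bugs = [
        Bug("BUG-1", WHAT_TO_DO),         # guest users can't open the dialog
        Bug("BUG-2", CONDITIONS_OF_USE),  # dialog opens only after 5 minutes
        Bug("BUG-3", ENGINEERING),        # dialog stays open after logout
        Bug("BUG-4", WHAT_TO_DO),
    ]
    category, count = most_prominent_category(bugs)
    print(f"Analyze first: {category} ({count} bugs)")
```

In practice, the records would come from a bug tracker or TMS export. The tally only tells you where to start digging, and the root-cause analysis itself still has to be done by the team.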
To solve the problem, we must choose the optimal solution
In medicine, one problem can often be treated in many different ways. After the doctor examines your leg and diagnoses a fractured bone, one of the many treatment options will be prescribed: from simple cast immobilization to surgery or even bone grafting.
Similarly, in IT, each problem can have multiple solutions. To choose the best solution, the team must have a solid understanding of the context, including the team structure and maturity level, current processes, the nature of the business, and the product in question.
For example, I’ve had experience with various treatments for problems linked to “lack of understanding of what needs to be done,” such as:
- A kickoff meeting in which developers and testers agree with a product manager on their understanding of the feature and write down a more detailed specification, including NFRs, edge and negative cases, and all happy paths. This meeting can also be used for requirements testing activities.
- A developer pairs with a product manager to build a functioning prototype of the feature so that the understanding of the feature is clarified during the work.
- The team uses ensemble (mob) work to build and test the feature so the specification is written collectively alongside the feature development.
All solutions to the information loss problem require that testing information (expectations, NFRs, edge cases, happy paths, and even testing scenarios) be stored and shared among team members. A test management system like Qase is the bare minimum. Similarly, in medicine, patient medical records were invented so that all medical staff could record and share all necessary medical information.
Each of the solutions has its own advantages, drawbacks, and costs. For instance, even though ensemble work is often the most efficient, not all managers will agree to devote the whole team to just one feature. In many cases, I've seen that even pair work is too much of a process change for a company.
For each problem, there are many solutions; we have to use context to determine which is best for each specific need and team.
To gauge the effectiveness of a solution, we must analyze and revise it
Going back to the fractured bone example, if the prescribed treatment is not improving your condition, the doctor might send you for additional tests such as blood work or an MRI. Each medical professional involved in the treatment will rely on your medical records to analyze what has been done and determine what to try next. Imagine how inefficient your treatment plan would be without those shared records. It would be nearly impossible for doctors to investigate and treat your issues properly.
In IT, if many bugs persist, we need to investigate and revise the solution. For instance, if kickoff meetings involving developers and testers have not significantly decreased bugs or improved shared understanding, they are not the best solution.
Let's walk through the analysis process. First, we use our test management system to review the test cases and plans devised during the kickoff meeting. We discover that the bugs could have been prevented if the developers had followed these test cases and test plans. We can now confidently say that the chosen "kickoff meeting" solution didn't improve the situation, and we must devise a different approach or treatment.
But why didn't the solution work? Further analysis showed that we had assumed everyone would be motivated to participate in the kickoff meeting and follow through on the agreed-upon tasks. However, a closer look at our processes revealed that testers had KPIs linked to finding bugs, which naturally discouraged them from attending meetings that took time away from bug hunting.
Luckily, because we used the test management system (TMS) to store the test cases and test plans, we had the information and context to determine what went wrong and where to go next.
No diagnosis is perfect. Using a TMS and documenting processes is like keeping medical records up to date: storing information for future analysis makes it much easier to reassess and experiment with new treatments and solutions.