In IT, we talk about bugs all the time. Users get irritated by bugs, developers try to avoid them, testers help developers find and exterminate them, and Quality Assurance specialists help improve the processes and culture so that fewer bugs emerge.
So, what is a bug? A bug is when something doesn’t work as we wanted or expected. However, bugs do not occur only in code or mechanisms. Processes are often buggy as well.
For example, many companies introduce Friday Code Freezes. The goal of this process seems quite reasonable: dev teams do not want a new release to expose users to bugs over the weekend while the team is off. However, this approach has unintended consequences. First, every release should deliver value to the clients, and if a release is ready by Friday but not deployed, that value is withheld over the weekend. Second, when the dev team knows that there are no releases on Friday, there’s very little motivation to invest in proper QA measures that would reduce the risk of bugs in every release.
As these consequences are unintended and hurt the product and the company, I call them process bugs.
Buggy processes introduce risk
In human labor, mistakes are inevitable. We learn and improve our skills by discovering mistakes and bugs, and adjusting our approach.
In code, bugs are inevitable. When we see that the code doesn’t work or doesn’t work as we intended, we change the code and, as a result, learn to write better code.
In a business, we certainly do not want bugs in the code to affect our customers. This is a risk for us: we can lose reputation, customers, and money.
To reduce the risks of bugs in code, we apply Quality Assurance and Quality Control measures:
- Quality Control measures are aimed at detecting bugs before they reach the client, and provide us with information on how and what to improve.
- Quality Assurance measures reduce the likelihood of bugs (e.g., continuous learning or quality circles) and lessen their impact (e.g., CI/CD with quick rollback).
The benefit of detected bugs in the code is that we can learn from them. Undetected bugs, however, represent a risk.
Some bugs, in both code and processes, are quite obvious.
For example, in JavaScript development, a common bug occurs when you don’t check the type of an argument passed to a function and use it straight away:
function add(a, b) {
  return a + b; // no type check: a string argument turns + into concatenation
}
let result = add(5, "10"); // "510" instead of 15
Missing the type check increases the risk of unexpected behavior of the code. This risk is one of the reasons why TypeScript was introduced.
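A minimal sketch of the type check mentioned above, assuming we want the function to fail loudly rather than concatenate. The name addChecked and the error message are hypothetical and chosen only for illustration:

```javascript
// Hypothetical safer variant of add(): rejects non-numeric arguments
// instead of silently concatenating strings.
function addChecked(a, b) {
  if (typeof a !== "number" || typeof b !== "number") {
    throw new TypeError("addChecked expects two numbers");
  }
  return a + b;
}

console.log(addChecked(5, 10)); // 15

try {
  addChecked(5, "10"); // the string argument is rejected
} catch (err) {
  console.log(err.message); // "addChecked expects two numbers"
}
```

TypeScript moves the same check to compile time: with both parameters annotated as number, the call with "10" is reported as an error before the code ever runs.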
Similarly, in processes, some approaches and activities are well known in the industry for causing unintended consequences, i.e., bugs.
One of the most well-studied examples of a buggy process is introducing a KPI like “number of found bugs” for testers. The mere introduction of such an approach causes a few peculiar unintended consequences.
First, it encourages behavior that focuses on hitting the metric instead of doing the job well. Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.” Consciously or subconsciously, given the measurable incentive of finding more bugs, human beings will focus on finding more bugs. This sometimes becomes quite comical: I’ve seen teams whose KPIs for finding bugs led testers to log extremely trivial issues as bugs, issues that could have been solved straight away if the tester had simply told the developer about them.
Second, activities that aren’t measured may be ignored by the testers. If the goal is logging more bugs, naturally there’s no incentive to help others or do anything else that is not rewarded.
This buggy process isn’t new at all. In 1902, under French colonial rule in Hanoi, Vietnam, the colonial government introduced a bounty program to control the rat population, essentially a KPI for the rat catchers. The initiative offered a reward of one cent for each rat killed, with proof required in the form of a severed rat tail. However, colonial officials soon observed an unintended consequence: rats with no tails began appearing throughout the streets of Hanoi. The local rat catchers, seeking to maximize their earnings, would catch rats, sever their tails, and then release them back into the sewers, allowing the rats to continue breeding and thereby perpetuating the cycle.
For more information on bugs caused by KPIs, please watch my talk, Metrics and KPIs: Managing upwards and measuring success or read the article.
Addressing bugs in processes is just as important as addressing them in code (or even more important)
Processes are the way people work and interact. Code is the result of human work, usually a result of a team collaboration. Thus, code is created by people within the framework of work processes.
If the work processes have bugs, the interactions between team members will be less optimal or even corrupted, and as a result, there’s a higher risk that the code delivered to customers will have more bugs.
In my article about the CrowdStrike outage, I explore how poorly designed processes lead to bugs in the code reaching the customers, causing everything to go downhill.
Bugs in processes often lead to bugs in code affecting the customers. A very common example of a buggy work process is imposing deadlines on developers and testers. When rushed, people naturally will either cut corners or simply miss the bugs which could be found.
Both bugs in code and bugs in processes introduce risks of unintended behavior and consequences. Often bugs in processes have a much wider effect than bugs in code, sometimes even causing casualties.
The recent Titan Submersible disaster is a good example of how buggy processes led to the deaths of five people: it seems the entire research and production process was riddled with flaws — rushed research and development, inadequate testing, little regard for safety standards, poor risk management, and insufficient training.
There’s also a clear dependency: the fewer bugs we have in our processes, the more efficiently we work.
I believe that exterminating bugs in processes is even more critical than fixing bugs in code. By prioritizing the elimination of process bugs, we enhance overall efficiency, improve collaboration, and significantly reduce the occurrence of code bugs, too!
How to address bugs in processes
We easily detect bugs in code, but detecting bugs in processes is often much more difficult.
First, processes rarely become the object of analysis. Many frameworks (like Scrum) postulate the unquestionable necessity of process activities. We also get used to processes and stop “noticing” them, and even if we see bugs in them, we don’t want to fix the bugs since we are accustomed to working in a certain way.
Second, it is much harder for us to analyze human interactions than to analyze code. Code does not change its behavior if we run it through a debugger (in most cases), but people, when observed or questioned, start to behave differently.
And even if you sit down and analyze a process and think you’ve found a bug — for example, noticing how individual bonuses negatively impact teamwork — often, your personal initiative to change the process won’t be enough. Everyone is allowed and encouraged to fix bugs in code, but processes are often dictated by management.
When we encounter a bug in the code that results in significant financial losses, as in the CrowdStrike case, we urgently fix it, either by rolling back to the previous version or by releasing a patch, and then conduct a postmortem to understand how the bug occurred. After releasing a fix, we decide what needs to be done to prevent similar bugs in the future. For example, we might decide to further train developers in handling infrastructure and change our approach to testing code before deployment.
However, when we encounter a process bug that results in significant financial losses (for example, many people leaving the company due to a return-to-office mandate), we usually neither roll back to the previous version of the process nor fix the current one. We also do not conduct a postmortem to understand how and why the decision to implement the flawed process activity was made. And since no postmortem is done, we do not decide what needs to be done to prevent similar bugs in the future, either. We certainly do not decide to further train managers in dealing with people or change our decision-making approach.
I believe these incident investigation activities should be carried out for each discovered process bug, just as they are for incidents caused by bugs in code. This is a reactive measure, taken when something bad has already happened. But we are not limited to reactive measures, for bugs in either code or processes.
There are three approaches to reducing the risks of bugs in code:
- Reactive: when a bug has reached the customers, we roll back or ship a quick fix, then hold a postmortem
- Interactive: when the work on a feature is not finished yet and we want to quickly verify it for bugs through various types of testing, from unit testing to canary testing
- Proactive: before the bug happens, we embrace continuous upskilling for developers and testers and ensemble (mob) programming.
The same three approaches apply to reducing the risks of bugs in the processes that affect the team and the product:
- Reactive: when we have witnessed the effect of the bug, we roll back and do a postmortem
- Interactive: when we are following a certain process and we want to review it for bugs, we use quality circles, retrospectives, or an approach I devised, the Process Decision Record. The approach is simple: for each process activity you define the problem, the solution (process activity) with its pros and cons, and the review date. When the review date comes, you reassess the problem and the process activity, and try to analyze its efficiency. This way you provoke critical thinking and reevaluation of each process activity.
- Proactive: before the bug occurs, study the areas of knowledge I listed in my continuous learning for QA professionals article in order to better understand how humans interact with each other and with information systems. The better you understand the scientific principles of teamwork, the fewer bugs in processes you will have.
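The Process Decision Record described in the list above can be kept as plain data. The sketch below is only one possible shape: the field names, the sample dates, and the dueForReview helper are assumptions made for illustration, not a prescribed format.

```javascript
// Hypothetical Process Decision Records; field names and dates are
// illustrative only.
const records = [
  {
    activity: "Friday code freeze",
    problem: "A release before the weekend may expose users to bugs while the team is off",
    pros: ["Fewer weekend incidents"],
    cons: ["Value ready on Friday is delayed", "Less incentive to invest in QA"],
    reviewDate: "2025-03-01",
  },
  {
    activity: "Bug-count KPI for testers",
    problem: "Management wants a way to measure testing work",
    pros: ["Easy to count"],
    cons: ["Trivial issues get logged as bugs", "Unmeasured work is ignored"],
    reviewDate: "2025-06-01",
  },
];

// Returns the records whose review date is on or before `today`.
function dueForReview(allRecords, today) {
  return allRecords.filter((record) => new Date(record.reviewDate) <= today);
}

const due = dueForReview(records, new Date("2025-04-01"));
console.log(due.map((record) => record.activity)); // only the code freeze is due
```

Keeping the pros and cons next to the review date means that whoever reassesses the activity sees the original reasoning, not just the rule.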
The worst bugs appear because companies are treating people like machines. We must acknowledge people for their beauty and messiness, and find ways to build reliable processes where people will happily work together and collaborate efficiently.
Don’t let buggy processes slow you down
No one likes bugs in code, and we in IT do our best to prevent them or fix them as soon as they are found. However, code is not the only thing that can be buggy: processes are often buggy too. While bugs in code certainly cause harm, I believe that bugs in processes can cause more, and they are much harder to find. To start dealing with bugs in processes, we must gain solid knowledge of how people interact, and apply proactive, reactive, and interactive QA measures to the processes.