Back in 2005, the team supporting Google Web Server — the server that handles Google Search queries — was writing code riddled with bugs. According to Adam Bender, now Principal Engineer at Google, at one point over 80% of code pushed to production contained user-facing bugs that they then had to roll back.
Eighty percent is a scary ratio for any company and any product, but this is Google, and, according to Bender, GWS was “as important to Google Search as air traffic control is to an airport.”
The GWS team solved this problem using a strategy that eventually swept the entire tech industry: automated testing led by suites of unit tests. Eventually, every new code change had to include tests, and those tests ran continuously. Within a year, Google had cut the number of emergency pushes in half (even though the team was shipping a record-high number of pushes).
By now, unit testing has become an industry-wide norm. Though there might be debates about test-driven development (TDD) and the ratio of one test type to another, very few disagree with the importance of unit tests.
The downside of this normalization is that developers who haven’t witnessed the before and after might not appreciate the impact. Worse, the normalization of unit testing doesn’t mean there’s a consensus on how to do it well. This guide will take you from basics to best practices, but first, let’s start at the beginning.
What is unit testing?
Unit tests, as the name implies, test individual units in isolation – primarily classes, modules, and functions.
If end-to-end tests, which test how an entire system functions together, are on one end of the spectrum, unit tests are on the other. They are, as Chris Coyier, founder of CodePen, writes, “little itty bitty sanity testers.”
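To make “itty bitty” concrete, here is a minimal sketch of a unit and its tests using Python’s built-in unittest module. The slugify function is a hypothetical example invented for illustration; it does not come from any of the sources quoted here.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical unit under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Unit Testing Basics"), "unit-testing-basics")

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(slugify("  unit   tests "), "unit-tests")
```

Saved in a file, tests like these can be run with `python -m unittest`. Each one checks a single small behavior and finishes in well under a second.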
Even though an individual unit test is often itty bitty, the accumulated effect of using an entire suite of them is anything but. Jeff Atwood, co-founder of Stack Overflow, wrote back in 2006, “The general adoption of unit testing is one of the most fundamental advances in software development in the last 5 to 7 years.” Few would disagree.
The consensus on the importance of unit tests, however, doesn’t translate into a precise consensus about what a “unit” really is. Martin Fowler explains, “If you ask three different people what ‘unit’ means in the context of unit tests, you'll probably receive four different, slightly nuanced answers.”
In a functional language, for example, a unit will probably be a single function. But in an object-oriented language, a unit could be a single method on the small side or an entire class on the larger side.
Different people also disagree about how to test classes called by the class you’re testing (sometimes called “collaborators”). Some argue that you should substitute all collaborators with mocks or stubs, and some argue you should only substitute the slowest, biggest collaborators.
Luckily, Fowler cuts through the debate, writing, “To a certain extent, it's a matter of your own definition, and it's okay to have no canonical answer.” Ultimately, the most important part of unit testing is that you do it, and the second most important part is that you do plenty of it.
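The collaborator question is easier to picture with an example. The sketch below assumes a hypothetical PriceCalculator class whose collaborator, a rate provider, might be slow because it talks to a database or a network. The test swaps it for a stand-in built with unittest.mock from the Python standard library. All class and method names are invented for illustration.

```python
import unittest
from unittest.mock import Mock


class PriceCalculator:
    """Hypothetical class under test; rate_provider is its collaborator."""

    def __init__(self, rate_provider):
        self._rate_provider = rate_provider

    def total_with_tax(self, net_amount: float, region: str) -> float:
        rate = self._rate_provider.tax_rate_for(region)
        return round(net_amount * (1 + rate), 2)


class PriceCalculatorTest(unittest.TestCase):
    def test_adds_regional_tax_to_net_amount(self):
        # Stand-in for a collaborator that might hit a database or network.
        rate_provider = Mock()
        rate_provider.tax_rate_for.return_value = 0.25

        calculator = PriceCalculator(rate_provider)

        self.assertEqual(calculator.total_with_tax(100.0, "FI"), 125.0)
        rate_provider.tax_rate_for.assert_called_once_with("FI")
```

Whether a fast, simple collaborator deserves the same treatment is exactly the kind of judgment call that teams settle differently.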
Unit testing is the foundation of the software testing pyramid
The test pyramid is a mental model that visualizes the ratio of test types developers should use.
In the pyramid, unit tests form the foundation and end-to-end tests form the top, with integration tests in the middle. Generally speaking, you should have many unit tests and relatively few end-to-end tests.
At Google, for example, engineers are strongly encouraged to focus on unit tests. As Bender writes, “We tend to aim to have a mix of around 80% of our tests being narrow-scoped unit tests that validate the majority of our business logic; 15% medium-scoped integration tests that validate the interactions between two or more components; and 5% end-to-end tests that validate the entire system.”
The test pyramid isn’t without its detractors. The Spotify team, for example, argues for a honeycomb shape that emphasizes integration tests. That said, most debate tends to circle around how to define and how best to operate integration and end-to-end tests. Different teams will disagree about the importance of these tests, but few dispute that unit testing is the foundation.
As Bender writes, “Unit tests form an excellent base because they are fast, stable, and dramatically narrow the scope and reduce the cognitive load required to identify all the possible behaviors a class or function has.”
Two types of testing
Unit tests only really come in two different forms: manual and automated.
Manual unit testing is laborious. Developers have to isolate each independent unit by hand and work through all the possible faults that unit could have.
Thankfully, few developers have to test units manually. The major change Google made to its testing policy back in 2005, for example, relied on automated unit testing. Bender, like many other developers, considers automation a necessity rather than a nice-to-have feature. Testing frameworks like JUnit and testing tools like Selenium and Qase have become standard.
“Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale,” Bender writes. “When it comes to testing, there is one clear answer: automation.”
The advantage of automation, beyond sheer convenience, is reduced cognitive load and the ability to unit test continuously. By now, the advantages of unit testing and the advantages of automated unit testing, in particular, are typically considered synonymous.
Advantages and disadvantages of unit testing
The advantages of unit testing are plentiful and significant. Most of the disadvantages of unit testing tend to come from testing poorly or having an unhealthy test suite.
The advantages of unit testing can be broken down into two categories: Functional advantages and psychological advantages.
The functional advantages are captured by the example from Google: Automated unit testing cut the number of emergency pushes in half. In the process, the GWS team became much more productive, and now, GWS has tens of thousands of tests and pushes releases almost every day.
Another example is Wolt, a food and merchandise delivery platform. The company had long used unit tests, but until recently, Wolt engineers used a mixture of spreadsheets and documents to manage their tests. Now, Wolt runs over 5,000 tests per month, and almost one-fifth of them are automated, which has significantly raised the speed of product development.
The psychological benefits of unit testing tend to emerge as feelings of courage and confidence. For example, Tim Bray, formerly a Distinguished Engineer at Amazon, explains, “Working in a well-unit-tested codebase gives developers courage. If a little behavior change would benefit from re-implementing an API or two you can be bold, can go ahead and do it. Because with good unit tests, if you screw up, you’ll find out fast.”
“Fail fast,” a core concept of Lean and Agile, can be threaded throughout the software development cycle via testing. Immediate failure (or, really, feedback) ensures developers are iterating as they code rather than debugging an entire block of code at once.
Bray explains that this advantage is even more beneficial as companies scale. “Writing good tests helps the developer during the first development pass and doesn’t slow them down,” he writes. “But I know, as well as I know anything about this vocation, that unit tests give a major productivity and pain-reduction boost to the many subsequent developers who will be learning and revising this code.”
In this way, well-written unit tests help developers in much the same way well-written documentation does. Bender agrees, writing, “Clear, focused tests that exercise one behavior at a time function as executable documentation.” If a developer wants to know what a given piece of code does, they should look at the test that covers it.
The disadvantages of unit tests (and here, we really mean unit testing done poorly) result from flaky tests and unhealthy test suites.
A flaky test is a test case that can pass or fail despite the tested code remaining the same. Flaky tests are unreliable, defeating the purpose of testing and, worse, discouraging developers from testing entirely. According to 2022 research, developers experience flaky tests every month — at least.
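One common source of flakiness is a hidden input such as the current time. The sketch below is an illustration under assumed names, not an example from the research mentioned above. It shows a hypothetical greeting function, a flaky test for it (left as a comment), and a deterministic alternative that passes the time in explicitly.

```python
import unittest
from datetime import datetime


def greeting(now: datetime) -> str:
    """Hypothetical unit: pick a greeting from the hour of the day."""
    return "Good morning" if now.hour < 12 else "Good afternoon"


class GreetingTest(unittest.TestCase):
    # Flaky variant, kept as a comment on purpose: its result depends on
    # when the suite happens to run, not on the code under test.
    #
    #   def test_morning(self):
    #       self.assertEqual(greeting(datetime.now()), "Good morning")

    def test_morning(self):
        self.assertEqual(greeting(datetime(2024, 1, 1, 9, 0)), "Good morning")

    def test_afternoon(self):
        self.assertEqual(greeting(datetime(2024, 1, 1, 15, 0)), "Good afternoon")
```

With fixed inputs, the tests give the same answer on every run, so a failure always points at the code rather than at the clock.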
Flaky tests annoy developers trying to stay in flow, and unreliable tests of any kind become dangerous at scale. In 2020, Benoit Baudry and Martin Monperrus found that Jitsi Meet, a Zoom competitor, had several pseudo-tested methods, a term the authors use for methods that the test suite executes but whose behavior it only appears to test. One method, for example, was covered by 11 test cases, but “no test case [called] the method directly.”
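A test that only appears to check a method can look like the first test in the sketch below. The Cart class is hypothetical and much simpler than anything in Jitsi Meet; it is only meant to illustrate the general idea, not to reproduce the cases from the study.

```python
import unittest


class Cart:
    """Hypothetical production class."""

    def __init__(self):
        self.items = []
        self.discount = 0.0

    def add(self, price: float) -> None:
        self.items.append(price)

    def apply_discount(self, fraction: float) -> None:
        self.discount = fraction

    def total(self) -> float:
        return sum(self.items) * (1 - self.discount)


class CartTest(unittest.TestCase):
    def test_discount_only_appears_tested(self):
        # Runs apply_discount, so coverage counts it, but never checks its
        # effect; this would still pass if apply_discount did nothing.
        cart = Cart()
        cart.add(10.0)
        cart.apply_discount(0.5)
        self.assertEqual(len(cart.items), 1)

    def test_discount_reduces_total(self):
        # Checks the observable result of applying the discount.
        cart = Cart()
        cart.add(10.0)
        cart.apply_discount(0.5)
        self.assertEqual(cart.total(), 5.0)
```

Both tests pass and both raise coverage numbers, but only the second would fail if the discount logic broke.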
Flaky tests can be discouraging, which makes maintaining a healthy test suite an essential part of effective unit testing (and testing in general). A healthy test suite isn’t flaky, meaning developers can trust the testing process, and a healthy test suite is stable and fast, meaning developers can run tests continuously without having the tests disrupt their workflow.
“Tests derive their value from the trust engineers place in them,” Bender writes. As a result, he argues, “A bad test suite can be worse than no test suite at all” because developers can start pursuing workarounds rather than returning to bad tests.
How to write unit test cases
Writing good unit tests starts with a standard but reliable rule of thumb: Use one test class for every production class.
As Ham Vocke, an engineer at Stack Overflow, writes, “You can write [unit tests] for all your production code classes, regardless of their functionality or which layer in your internal structure they belong to. You can unit test controllers just like you can unit test repositories, domain classes or file readers.” Just stick to the rule of thumb.
Beyond this basic rule, you’ll also want to closely monitor what you’re trying to test. The most effective unit tests cover all relevant code paths, including the ideal cases and edge cases, but don’t try to capture all of the implementation details.
If your unit tests are too close to production code, Vocke explains, your unit tests will break when you refactor. Bender agrees, stating, “A test should contain only the information required to exercise the behavior in question.”
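As a rough illustration of the difference, the sketch below uses a hypothetical Stack class. The test asserts only on what callers can observe, so it would keep passing if the private list were later replaced with another data structure. A test that inspected the private attribute directly would break on that refactor even though the behavior stayed the same.

```python
import unittest


class Stack:
    """Hypothetical production class."""

    def __init__(self):
        self._items = []  # internal detail; could become a deque later

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class StackTest(unittest.TestCase):
    def test_pop_returns_most_recently_pushed_item(self):
        stack = Stack()
        stack.push(1)
        stack.push(2)
        # Asserts on observable behavior, not on stack._items.
        self.assertEqual(stack.pop(), 2)
        self.assertEqual(stack.pop(), 1)
```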
When writing tests, many people recommend a structure called Arrange, Act, Assert. When you Arrange, you set up the test: its inputs, the unit under test, and any collaborators. When you Act, you run the thing you’re testing (e.g., a function, method, or API call). When you Assert, you state the expected outcome so that, after the test runs, you can see whether it passed or failed.
Following this pattern, you want to focus on testing the observable behavior that results from your code instead of mirroring your internal code structure inside your unit tests.
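Here is a minimal sketch of the Arrange, Act, Assert structure, paired with the one-test-class-per-production-class rule of thumb. Account and AccountTest are hypothetical names chosen for illustration. The first test covers the ideal case and the second covers an edge case.

```python
import unittest


class Account:
    """Hypothetical production class."""

    def __init__(self, balance: int = 0):
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountTest(unittest.TestCase):
    def test_withdraw_reduces_balance(self):
        # Arrange: build the object under test with known inputs.
        account = Account(balance=100)

        # Act: run the one behavior this test is about.
        account.withdraw(30)

        # Assert: check the observable outcome.
        self.assertEqual(account.balance, 70)

    def test_withdraw_more_than_balance_is_rejected(self):
        # Arrange
        account = Account(balance=100)

        # Act and Assert: the call itself is expected to raise.
        with self.assertRaises(ValueError):
            account.withdraw(130)

        # Assert: the failed withdrawal left the balance untouched.
        self.assertEqual(account.balance, 100)
```

Keeping the three steps visually separate makes it easier to see, at a glance, which behavior a failing test was exercising.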
Perhaps more important than the quality of any individual unit test is the health of your test suite. Some amount of failure is inevitable, so your test suite needs to make it easy for developers to address test failures.
Good tests might fail correctly, but a bad test suite can still create a bad testing experience. Bender writes, for example, that “Allowing failing tests to pile up quickly defeats any value they were providing, so it is imperative not to let that happen.” The most effective testing suites function alongside a software development process that enables developers to fix broken tests within minutes and address failed tests rapidly.
The end goal is confidence. As Bray writes, developers should feel so comfortable with their unit tests that they can “run them every few seconds.”
“If you liked it, then you shoulda put a test on it”
We started with Google and are ending with Google not because the company is some legendary, peerless organization that everyone should copy but because the company has been around long enough to show the before and after of the testing revolution.
As Bender writes, “The changes in GWS marked a watershed for testing culture at Google as teams in other parts of the company saw the benefits of testing and moved to adopt similar tactics.”
Similar moments happened throughout the industry and across many companies; testing has now become normalized. By 2006, Jeff Atwood came to consider unit tests “a first-class language construct,” and in 2023, over 60% of developers had automated testing processes available.
Google captures the philosophy best in the cheekily named Beyoncé Rule: “If you liked it, then you shoulda put a test on it.” Do you like the code you’ve written? Do you like the product you’re contributing to? Then, test, test, test.