Integration testing: A guide to the most confusing portion of the test pyramid

Integration tests are often seen as the awkward middle child of the software testing world. In the test pyramid, integration tests are literally in the middle, and even amongst developers who prioritize testing, few agree on a tight definition of integration testing. 

At first glance, the meaning is clear: While unit tests cover the smallest units of code (primarily classes and methods) and end-to-end tests cover how the whole system works, integration tests cover how different components of the app work together. 

From this standpoint, it seems clear, but the confusion is in the details. Even Martin Fowler — one of the people who popularized the software testing pyramid — is wary of the term “integration test,” writing, “when I read it, I look for more context so I know which kind the author really means.” 

Table of contents

  • What is integration testing?
  • Integration tests and the test pyramid
  • Types of integration testing
  • Advantages and disadvantages of integration testing
  • How to write integration tests
  • Test your understanding before testing your code

What is integration testing?

Before we dive into defining and explaining, let’s take a step back. Some of the confusion that tends to erupt from discussions of integration testing comes from a history of testing that precedes modern ways of thinking about software development. 

Essentially, as Martin Fowler covers, integration testing was popularized in the 1980s, when the waterfall method of software development was the standard. With large projects, Fowler writes, a design phase would start the development process, and developers would each take on modules to build — all in isolation from each other.

Testing, far from the test-driven development often done today, wasn’t done by developers at all. “When the programmer believed it was finished, they would hand it over to QA for testing,” Fowler writes.

Unit testing tended to take precedence, as it does now, and integration testing usually came after. Here, Fowler writes, QA teams tested how these modules fit together, “either into the entire system or into significant sub-systems.” 

Already, you might see how this model shows its age. Fowler points out, writing decades in retrospect, that this process conflated two different forms of testing: Tests meant to cover how separate modules work together, and tests meant to see whether a system composed of many modules works as expected.

The devil is in the details, but still, there’s an anchor here: Integration tests, whether you want to consider them one type of test or a bundle of slightly different tests, focus on testing how multiple components and modules work together.

Integration tests and the test pyramid

Another way to provide some clarity around integration tests is to look at the testing types that complement them: unit tests and end-to-end tests. 

In the test pyramid, integration tests take the middle section. The test pyramid doesn’t capture which tests are most important; instead, it models the proportion of tests teams should run. Most of your tests should be unit tests, and they should run rapidly, whereas relatively few tests should be end-to-end, and those should run less often.

Integration tests lie somewhere in the middle. With the pyramid model, the bounds of what is and isn’t an integration test can be a little unclear, so Chris Coyier, co-founder of CodePen, suggests seeing the three testing types as a spectrum.

He writes that unit tests are “little itty bitty sanity testers” on one end of the spectrum and that “big beefy” end-to-end tests, which “prove that lots and lots of systems are working together as intended and that a user experience is in good shape” are on the other end. 

Integration tests, Coyier writes, “are a dot somewhere in the middle of that spectrum.” The vagueness here is purposeful. With integration tests, even more so than with the other testing types, the primary difference isn’t the sheer amount of code included in a test — the differentiator lies in what is being tested and why.

Returning to the pyramid, another way to find clarity is to see how teams can slip into constructing a poor testing suite. Adam Bender, who wrote the chapter on Google’s testing processes in the company’s famous Software Engineering at Google book, shares two of the most common ways the testing pyramid can break.

In the first example, which we covered more in an article on end-to-end tests, the test pyramid becomes an ice cream cone. Developers write too many end-to-end tests and too few integration tests and unit tests. “Such suites tend to be slow, unreliable, and difficult to work with,” writes Bender.

In the second example, the test pyramid becomes an hourglass. Developers write too many end-to-end tests and unit tests but too few integration tests. Bender explains that this anti-pattern “results in many end-to-end test failures that could have been caught quicker and more easily with a suite of medium-scope tests.”

Here, you can see how that sometimes vague middle ground serves a useful purpose. End-to-end tests are almost invariably slow (hence why there should be relatively few of them, run less frequently), but unit tests can’t cover the interconnected functions that make an application work as expected.

For example, if a test suite primarily uses unit tests that cover single classes, the test suite may pass even when the feature as a whole doesn’t work. Without integration tests, failures between modules can remain invisible until an end-to-end test runs.
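
To make this concrete, here’s a minimal sketch (JUnit 4 assumed; all class names are hypothetical) of how two classes can each look correct in isolation yet fail together. The final assertion fails by design: a unit test of either class alone would never catch the mismatch.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class PriceContractTest {
    // Unit-tested on its own, this class is "correct": it returns cents.
    static class PriceService {
        long priceOf(String sku) {
            return 1999; // $19.99, expressed in cents
        }
    }

    // Also "correct" in isolation, but it assumes the price is in dollars.
    static class ReceiptFormatter {
        String format(long price) {
            return "$" + price + ".00";
        }
    }

    @Test
    public void modulesAgreeOnThePriceContract() {
        long price = new PriceService().priceOf("sku-1");
        String receipt = new ReceiptFormatter().format(price);
        // Fails with "$1999.00": only a test that wires the two classes
        // together exposes the cents-vs-dollars mismatch.
        assertEquals("$19.99", receipt);
    }
}
```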

Delays like these are inefficient, slowing down development momentum, but they’re also demoralizing. Bender argues that, even though the test pyramid provides a general heuristic, the best mix will be context-dependent. 

“When considering your own mix,” he writes, “you might want a different balance.” An emphasis on integration tests might make for a slower-to-run test suite, but you might find more issues that would have otherwise gone unseen. An emphasis on unit tests will make your test suite fast, but the discovery of connective failures might be delayed. 

As Bender writes, “A good test suite contains a blend of different test sizes and scopes that are appropriate to the local architectural and organizational realities.”

Types of integration testing

There are two primary approaches to writing integration test cases. The first, the big bang strategy, involves integrating all modules and testing them all at once. The second, incremental integration, is subdivided into three strategies — bottom-up, top-down, and sandwich/hybrid — each of which tests the system in stages.

Big bang

In the big bang testing strategy, the development team integrates all of the modules to be tested and tests them as a single unit. 

The big bang strategy is most effective for small systems or systems that have little interdependence between modules. More complex systems lend themselves to more incremental approaches.

For example, when a big bang integration test case fails, it can often be difficult to locate the source of any given error. This can be especially frustrating, given how long a big bang integration test can take. Even with these issues, the big bang approach remains useful when the systems to be tested are small. 
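
As a rough sketch of the approach (JUnit 4 assumed; all module names are hypothetical), a big bang test wires every module together with its real implementation and verifies the combined result in one pass:

```java
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

public class BigBangOrderTest {
    // All three modules are real; nothing is stubbed or mocked.
    static class Inventory {
        private final Map<String, Integer> stock = new HashMap<>();

        void add(String sku, int qty) {
            stock.merge(sku, qty, Integer::sum);
        }

        void remove(String sku, int qty) {
            stock.merge(sku, -qty, Integer::sum);
        }

        int available(String sku) {
            return stock.getOrDefault(sku, 0);
        }
    }

    static class Pricing {
        double priceOf(String sku) {
            return 5.0;
        }
    }

    static class OrderService {
        private final Inventory inventory;
        private final Pricing pricing;

        OrderService(Inventory inventory, Pricing pricing) {
            this.inventory = inventory;
            this.pricing = pricing;
        }

        double place(String sku, int qty) {
            inventory.remove(sku, qty);
            return pricing.priceOf(sku) * qty;
        }
    }

    @Test
    public void wholeFlowWorksInOnePass() {
        Inventory inventory = new Inventory();
        OrderService orders = new OrderService(inventory, new Pricing());

        inventory.add("sku-1", 10);
        double total = orders.place("sku-1", 2);

        // A failure here could sit in any of the three modules: the big bang
        // trade-off is exactly this loss of localization.
        assertEquals(10.0, total, 0.001);
        assertEquals(8, inventory.available("sku-1"));
    }
}
```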

Top-down

In the top-down integration testing strategy, developers layer the modules to be tested in a hierarchy and, as the name implies, write and run test cases for the higher-level modules first. As each module is tested, proceeding from the top layer to the bottom, developers integrate the freshly tested module into the already-tested modules above it and test the combination. Along the way, developers use stubs to simulate lower-level modules while testing the higher-level ones.

Since the top-down approach emphasizes incremental testing, as opposed to the big bang strategy, it’s often easier to trace an error to its source. Developers are also more likely to find the most impactful bugs because the hierarchy approach requires testing the most important modules first. Developers who prefer using stubs to drivers also tend to prefer the top-down approach, which primarily uses the former and not the latter.
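
Here’s a minimal sketch of what that looks like (JUnit 4 assumed; names are hypothetical): the higher-level ReportService is tested first, while a stub stands in for the lower-level repository that hasn’t been integrated yet.

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

public class ReportServiceTopDownTest {
    // Contract of the lower-level module, which isn't integrated yet.
    interface SalesRepository {
        List<Double> salesFor(String region);
    }

    // The higher-level module under test.
    static class ReportService {
        private final SalesRepository repository;

        ReportService(SalesRepository repository) {
            this.repository = repository;
        }

        double totalFor(String region) {
            return repository.salesFor(region)
                    .stream()
                    .mapToDouble(Double::doubleValue)
                    .sum();
        }
    }

    @Test
    public void totalsSalesForARegion() {
        // The stub simulates the missing lower-level module with canned data.
        SalesRepository stub = region -> List.of(10.0, 20.0);
        assertEquals(30.0, new ReportService(stub).totalFor("EMEA"), 0.001);
    }
}
```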

Bottom-up

In the bottom-up integration testing strategy, developers lay out a plan much like the top-down one, but invert it: instead of testing from the highest layer to the lowest, they write and run test cases from the lowest module to the highest.

Developers sometimes prefer the bottom-up approach — beyond the advantages it shares with top-down over big bang — because of its flexibility. The higher levels often take the longest to build, so, with a bottom-up strategy, developers can first test the modules most likely to be complete and ready. Bottom-up testing also typically involves using test drivers instead of stubs, which some developers prefer.
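
And a matching bottom-up sketch (JUnit 4 assumed; names are hypothetical): here the test class itself acts as the driver, calling the finished low-level module directly in place of the higher-level service that doesn’t exist yet.

```java
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

public class InventoryBottomUpTest {
    // The lowest-level module, finished before anything above it.
    static class Inventory {
        private final Map<String, Integer> stock = new HashMap<>();

        void add(String sku, int qty) {
            stock.merge(sku, qty, Integer::sum);
        }

        int available(String sku) {
            return stock.getOrDefault(sku, 0);
        }
    }

    // This test is the "driver": it plays the role of the not-yet-built
    // higher-level service by calling the low-level module directly.
    @Test
    public void stockAccumulatesAcrossDeliveries() {
        Inventory inventory = new Inventory();
        inventory.add("sku-1", 5);
        inventory.add("sku-1", 3);
        assertEquals(8, inventory.available("sku-1"));
    }
}
```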

Sandwich/hybrid

In the sandwich/hybrid testing strategy, another incremental approach, the top-down and bottom-up strategies are combined. 

Both top-down and bottom-up strategies face a similar limitation: They each need either the highest level or lowest level module to be coded and unit tested before integration testing can begin. In the sandwich testing approach, developers use both stubs and drivers so that they can test however the product demands, potentially even testing higher- and lower-level modules in parallel. 

This hybrid integration testing approach tends to be more complex than previous approaches, but it often pays off when writing test cases for especially complex or otherwise large systems. 

Advantages and disadvantages of integration testing

There are many advantages to running integration tests, and most of the disadvantages tend to come about because the tests were written poorly or because the overall design of the test suite is weak.

First, let’s talk about the advantages. 

The primary advantage of integration testing is confidence. When you run unit tests, you’re deliberately staying focused and small. You’re finding errors, correcting them, and testing again — every test rapid enough to keep you in your seat. But this rapidity doesn’t always create confidence.

With integration tests done regularly but less frequently, you can ensure that all those modules come together — that your app or feature is actually working. And while you want a plethora of unit tests, as the test pyramid implies, a complementary series of integration tests makes the whole suite more efficient because you can cover more than a single class at a time. 

Alin Turcu, Data and AI Engineering Director at Cognizant, also shares two advantages drawn from realities that often get forgotten when testing is discussed at an abstract level.

Code is rarely written with a clean slate, so he shares that integration testing can be particularly useful when it’s time to refactor. “Attempting to refactor a system that only has single-class tests is often painful,” he writes, “because developers usually have to completely refactor the test suite at the same time, invalidating the safety net.” When you’re refactoring, integration tests then become that safety net. 

Developers rarely have as much time as they wish to have for testing, too, so Turcu also explains how integration tests can offer more coverage when you need it. “In the ideal case,” he writes, “you would have all your classes unit tested, but we all know that in real case scenarios, you probably don’t have enough time to do that.” Integration tests come to the rescue: “Even without a full coverage by unit tests, integration tests are useful to get more confidence in high-level user stories working well.”

The disadvantages of integration tests tend to be a result of bad experiences with them, rather than anything inherent to the test type itself.

Many developers complain about tests in general, for example, because tests are often flaky and unreliable, not because testing itself isn’t worthwhile. In a 2022 research paper on flaky tests, researchers found that more than half of developers experience flaky tests “on at least a monthly basis.”

The study found that flaky tests can actually encourage poor code quality. “Developers who experience flaky tests more often,” writes Abi Noda, summarizing the study, “are more likely to take no action in response to them, and less likely to attempt to repair them.”

This dynamic isn’t inherent to integration tests, but it’s a likely reason why developers might have a bad association with them. Earlier, we covered how confusing and vague the term can be. But despite this lack of consensus, a bad approach to integration testing — however a given team defines it — can cause real damage. If the tests aren’t written well, code might become more unreliable than before.

Even if the tests themselves aren’t flaky, developers can find other problems. Gergely Orosz, for example, writes that “Integration testing usually results in more verbose tests, since they have to set up more components up front.” Some developers will naturally be drawn to more concise unit tests, especially when there’s room to stack multiple unit tests and mimic some of the advantages of integration tests.

Some developers are also drawn to unit tests, sometimes many unit tests, because integration tests have a reputation for being slow. Bender explains that “Tests often start fast enough but slow down as the system grows.” 

As an example, he describes an integration test that exercises a single dependency. Over time, he says, an integration test that once took five seconds to run could eventually take five minutes as that same dependency eventually connects to dozens of services. 

These disadvantages, however, tend to point toward writing integration tests well and maintaining them (as well as the rest of the test suite) rather than disregarding them.

How to write integration tests

The details of how best to write an integration test will be context-specific, but you can avoid most of the pitfalls by following a few best practices.

First, you’ll want to carefully define the purpose of an integration test before you write it. Fowler splits integration tests into two categories with two distinct purposes:

  • Narrow integration tests cover code in one service that communicates with a separate service. These narrow integration tests use test doubles and often have a scope similar to unit tests.
  • Broad integration tests, unlike narrow ones, focus on testing live versions of every service covered. That means testers need to go beyond the code that handles interactions and include, as Fowler writes, “substantial test environment and network access” so as to “exercise code paths through all services.”
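
As an illustration of the narrow style, here’s a minimal sketch (JUnit 4 and Java 11+ assumed; the gateway and endpoint are hypothetical): only the code path that talks to the remote service runs, and the live service is replaced by an in-process test double built on the JDK’s HttpServer.

```java
import static org.junit.Assert.assertEquals;

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.Test;

public class PaymentGatewayClientTest {
    @Test
    public void readsApprovalFromTheGateway() throws Exception {
        // In-process test double standing in for the real payment gateway.
        HttpServer fakeGateway = HttpServer.create(new InetSocketAddress(0), 0);
        fakeGateway.createContext("/charge", exchange -> {
            byte[] body = "APPROVED".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        fakeGateway.start();
        try {
            URI uri = URI.create(
                    "http://localhost:" + fakeGateway.getAddress().getPort() + "/charge");
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(HttpRequest.newBuilder(uri).build(),
                            HttpResponse.BodyHandlers.ofString());
            // Only the integration seam is verified, not the gateway itself.
            assertEquals("APPROVED", response.body());
        } finally {
            fakeGateway.stop(0);
        }
    }
}
```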

You can further break integration tests down into different layers, which tend to depend on the tools you’re using. Turcu, for example, because he focuses on Android development, uses Robolectric for running integration tests locally and JUnit to test the integrations between various features and components.

(Your tooling might differ here, but the underlying guidance, thinking of integration tests as covering either your local code or code with framework dependencies, applies beyond these particular tools and Android development more broadly.)

Turcu makes this distinction so that you can use the Robolectric layer to, for example (see the sketch after this list):

  • validate how local code interacts without worrying about how it interacts with the entire UI framework
  • have the freedom to mock or not mock external dependencies and run or not run these tests in parallel 
  • run integration tests quickly
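
Here’s the Robolectric-layer sketch referenced above (Robolectric and AndroidX test-core assumed; ThemeStore is entirely hypothetical): a real local class runs against Robolectric’s simulated Android framework on the local JVM, with no emulator involved.

```java
import static org.junit.Assert.assertTrue;

import android.content.Context;
import android.content.SharedPreferences;
import androidx.test.core.app.ApplicationProvider;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.robolectric.RobolectricTestRunner;

@RunWith(RobolectricTestRunner.class)
public class ThemeStoreTest {
    // Hypothetical local class that persists a setting via the framework.
    static class ThemeStore {
        private final SharedPreferences prefs;

        ThemeStore(Context context) {
            prefs = context.getSharedPreferences("settings", Context.MODE_PRIVATE);
        }

        void setDarkMode(boolean enabled) {
            prefs.edit().putBoolean("dark_mode", enabled).commit();
        }

        boolean isDarkMode() {
            return prefs.getBoolean("dark_mode", false);
        }
    }

    @Test
    public void darkModeRoundTripsThroughTheFramework() {
        ThemeStore store = new ThemeStore(ApplicationProvider.getApplicationContext());
        store.setDarkMode(true);
        // Real local code, simulated framework, JVM speed.
        assertTrue(store.isDarkMode());
    }
}
```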

With that separation, you can then use the JUnit layer to, for example (see the sketch after this list):

  • test multiple classes that comprise a particular feature or component 
  • mock all dependencies
  • mock network and file systems
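
And here’s the JUnit-layer sketch referenced above (JUnit 4 and Mockito assumed; all names are hypothetical): two real classes that make up a checkout feature are exercised together, while the network-backed price lookup is mocked.

```java
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.ArrayList;
import java.util.List;
import org.junit.Test;

public class CheckoutFeatureTest {
    // Network-backed dependency, mocked in this layer.
    interface PriceApi {
        double priceOf(String sku);
    }

    // Two real classes that together form the feature under test.
    static class Cart {
        private final List<String> skus = new ArrayList<>();

        void add(String sku) {
            skus.add(sku);
        }

        List<String> skus() {
            return skus;
        }
    }

    static class CheckoutService {
        private final PriceApi api;

        CheckoutService(PriceApi api) {
            this.api = api;
        }

        double total(Cart cart) {
            return cart.skus().stream().mapToDouble(api::priceOf).sum();
        }
    }

    @Test
    public void cartAndCheckoutWorkTogether() {
        // Mock only the network seam; the feature's own classes stay real.
        PriceApi api = mock(PriceApi.class);
        when(api.priceOf("sku-1")).thenReturn(4.5);

        Cart cart = new Cart();
        cart.add("sku-1");
        cart.add("sku-1");

        assertEquals(9.0, new CheckoutService(api).total(cart), 0.001);
    }
}
```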

As you write tests across these distinctions and purposes, Tim Bray outlines a few best practices that will keep you on track.

Integration tests, he explains, are “particularly hard in a microservices context” but still need to pass 100% of the time. “It’s not OK for there to be failures that are ignored,” he writes. Similarly, developers have to be careful not to let test suites get flaky. “Either the tests exercise something that might fail in production, in which case you should treat failures as blockers,” he writes, “or they don’t, in which case you should take them out of the damn test suite, which will then run faster.”

Benchmarks are useful here: gather test data so you can see how fast your integration tests run, how much coverage you have, and how flaky they are.

Test your understanding before testing your code

Integration tests, more than unit tests and end-to-end tests, require discussion before design and implementation. Even when you’re reading about integration tests on a developer blog, you’ll want to double- and triple-check precisely what kind of integration testing the person in question is writing about. 

Martin Fowler, who has deep experience in testing, is “wary” when he reads about integration test practices, so the rest of us should be, too. “The takeaway here,” he writes, “is when anyone starts talking about various testing categories, dig deeper on what they mean by their words, as they probably don't use them the same way as the last person you read did.”

Beyond the tips and practices already covered, we recommend this takeaway as well. Testing is a broad field with many different practitioners. As you learn, practice, and gain experience, agreeing on definitions before doing any actual testing will be essential to improving your test suite.
