The test pyramid: A complete guide

Learn about the test pyramid, including its history, purpose, advantages, and disadvantages, in this complete guide.

Back in 2006, Jeff Atwood, co-founder of Stack Overflow, wrote, “The general adoption of unit testing is one of the most fundamental advances in software development in the last 5 to 7 years.” 

Since then, testing tools have evolved, testing methodologies have matured, and test-driven development has become a more popular way of developing software. Even more importantly, unit tests have been recognized as just one part of the testing process and one component of the test suite. 

With this maturity and popularity has come confusion: How many unit tests should you run? When should you run them? What’s the difference between unit tests and integration tests? Should we even run end-to-end tests?

The biggest anchor in this discussion is the test pyramid, and depending on your perspective, it either provides clarity that developers should keep coming back to or confusion that warrants abandoning it. 

In this article, we’ll trace the origins of the test pyramid through its purpose, the test typology it presents, and its advantages and disadvantages. In the end, whether you adhere to the test pyramid or abandon it, understanding its origins and influence is essential to understanding modern testing. 

Origins of the test pyramid

Software testing has always called for some level of systematization, but the test pyramid became popular – and eventually the norm – through Mike Cohn and his 2009 book Succeeding with Agile. 

According to Martin Fowler, who summarized the history, the original model emerged from discussions in 2003 and 2004. Meanwhile, Jason Huggins independently thought about the same idea in 2006. 

Handwritten draft labeled "Testing pyramid for web apps"

Thanks to Fowler’s influential blog, this first-hand summary remains one of the most popular touchstones. To this day, his 2012 post (which includes the graphic below) is one of the most frequently linked sources for understanding the test pyramid. 

Right: Vertical arrow, rabbit on bottom, turtle on top. Center: Pyramid labeled top to bottom: UI, Service, Unit. Left: Vertical arrow with cents sign at bottom and three dollar signs at top

The popularity of the test pyramid and the semi-standardization that sprung from it has also led to many permutations. Even in Huggins’s 2006 version, functional tests replace integration tests, and UI tests replace end-to-end tests. Fowler’s version, too, prefers UI tests to end-to-end tests. But that’s not all.

At Google, the testing team classifies tests by size and scope. The team splits tests into small, medium, and large tests that roughly map onto the traditional test pyramid, but scope matters much more to them than type. Alex Party, a senior web developer, similarly prefers a different typology, recommending developer interface tests (instead of unit tests), consumer interface tests, and user interface tests.

The shape of the pyramid, which often no longer resembles a pyramid at all, also differs. Kent C. Dodds, creator of epicweb.dev, for example, recommends a trophy shape that emphasizes integration tests and adds static testing.

Screenshot of Kent C. Dodds tweet explaining the Testing Trophy. Image of trophy shape labeled top to bottom: end to end, integration, unit, static

Meanwhile, Spotify’s testing team recommends a testing honeycomb designed for microservices environments. 

Image of a honeycomb labeled top to bottom: Integrated, integration, implementation detail

The clarity and confusion resulting from the test pyramid — as well as from the people trying to use it, break it, and remake it — stem from the endurance of its original purpose. 

Purpose of the test pyramid

According to Tidelift research, the average developer spends 12% of their time testing, and according to StackOverflow research, about 61% of developers use automated testing tools. 

At the risk of stating the obvious, testing takes a lot of time and resources, and developers (and their managers) have always wanted a way to systematize this effort. 

Testing suffers from a disagreement between its why and its how: Most developers agree testing, broadly, is good because it improves code quality and the functionality of the final product. But few developers agree on how to test, when to test, how much to test, which tests to use, etc. 

The original purpose of the test pyramid was to provide clarity and structure, and the multi-decade argument about it proves that its original purpose is a worthy one. 

Fowler sums up the fundamental purpose of the test pyramid straightforwardly, writing, “The test pyramid is a way of thinking about how different kinds of automated tests should be used to create a balanced portfolio. Its essential point is that you should have many more low-level UnitTests than high-level BroadStackTests running through a GUI.”

That’s it. There’s a valuable discussion to be had from getting into the granular details, from putting the test pyramid into practice and seeing where, when, and how it runs into limitations, but the essential purpose is a simple one. On the one hand, developers should think about a portfolio of test types, and on the other hand, developers should have many small tests and few big tests. 

Three types of tests

Three types of tests comprise the test pyramid: Unit tests (the base), integration tests (the middle), and end-to-end tests (the top). The shape of the pyramid corresponds to how many of each test a test suite should include: Many unit tests, fewer integration tests, and even fewer end-to-end tests. 

Unit tests

Unit tests are likely the most familiar tests to any given developer, from a junior developer to a senior engineer. Unfortunately, this familiarity doesn’t necessarily create clarity.

Ham Vocke, Principal Software Engineer at Stack Overflow, writes, “If you ask three different people what ‘unit’ means in the context of unit tests, you'll probably receive four different, slightly nuanced answers.” In other words, there’s no canonical definition, which could be freeing or annoying, depending on your perspective.

Broadly, however, unit tests cover a single function (if you’re using a functional language) or a single method or class (if you’re using an object-oriented language). Unit tests cover the smallest amount of meaningful code possible, and developers run unit tests frequently — often on an automated basis as they code. 

Unit tests are meant to be small enough to address these atomic elements of code and enable speedy workflows. No automated testing tool or process is truly effective if developers feel the pain of waiting for it. 

Size is often emphasized throughout the test pyramid. At Google, for example, the testing team “encourages engineers to always write the smallest possible test for a given piece of functionality.” According to Adam Bender, author of the testing chapter in Software Engineering at Google, “A test’s size is determined not by its number of lines of code, but by how it runs, what it is allowed to do, and how many resources it consumes.”

Fowler emphasizes a golden rule: “Stick to the one test class per production class rule of thumb, and you're off to a good start.” With this in mind, he writes, you can write unit tests for all your production code classes, and you can write unit tests for controllers, repositories, domain classes, and file readers. 
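As a minimal sketch of that rule of thumb, consider one test class paired with one small production class. The `ShoppingCart` class and its tests below are purely illustrative (none of these names come from Fowler or the article); each test covers a single behavior:

```python
import unittest


class ShoppingCart:
    """A tiny illustrative production class."""

    def __init__(self):
        self._items = {}  # name -> (unit_price, quantity)

    def add(self, name, unit_price, quantity=1):
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        price, existing = self._items.get(name, (unit_price, 0))
        self._items[name] = (unit_price, existing + quantity)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


class ShoppingCartTest(unittest.TestCase):
    """One test class per production class, one behavior per test."""

    def test_total_of_empty_cart_is_zero(self):
        self.assertEqual(ShoppingCart().total(), 0)

    def test_total_sums_price_times_quantity(self):
        cart = ShoppingCart()
        cart.add("pen", 2, quantity=3)
        cart.add("pad", 5)
        self.assertEqual(cart.total(), 11)

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            ShoppingCart().add("pen", 2, quantity=0)
```

Run with `python -m unittest`. Because each test touches one behavior of one class and no external resources, the whole suite finishes in milliseconds, which is what makes running it constantly practical.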

Explore our dedicated unit testing article for a deeper explanation of unit testing along with guidance on how to write them. 

Integration tests

If unit tests are the clearest and most agreed-upon test type in the pyramid, integration tests are the least. 

End-to-end tests, which we’ll explain more in the next section, cover the feature or application as a whole, and unit tests, as we just explained, cover the most atomic elements. Integration tests lie somewhere between. 

Chris Coyier, co-founder of CodePen, cuts through this confusion, explaining, “Unit and end-to-end tests are the extremes on two sides of a spectrum. On the left, unit tests, little itty bitty sanity testers. On the right, end-to-end tests, big beefy (cough; sometimes flakey) tests that prove that lots and lots of systems are working together as intended and what a user experiences is in good shape.” Integration tests, he writes, are “somewhere in the middle of that spectrum.” 

The vagueness serves a purpose: Integration tests check how different components and modules integrate and function. That connected functionality is, by definition, a moving target. 

“API tests seem like a quintessential example,” Coyier says. You could use unit tests to test different parts of an API, he continues, but the tests that use the API the way it would actually work in your application are “the juicy ones.” 

The critical differentiator for integration tests is that they cover multiple components so developers can validate how they work together and check that the resulting functionality works as expected. If you’re building functionality that enables users to add a new credit card during a purchasing flow, for example, an integration test will likely be the best way to validate that the credit card number, expiration date, and CVV fields work together correctly. 
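That credit card scenario might look like the sketch below: a hypothetical `CardValidator` wired into a hypothetical `CheckoutFlow` (both names invented for illustration), with the test exercising the two components together rather than each in isolation:

```python
import re
from datetime import date


class CardValidator:
    """Illustrative component: validates the three card fields."""

    def validate(self, number, expiry, cvv):
        errors = []
        if not re.fullmatch(r"\d{16}", number):
            errors.append("number")
        month, year = expiry
        today = date.today()
        if date(year, month, 1) < date(today.year, today.month, 1):
            errors.append("expiry")
        if not re.fullmatch(r"\d{3,4}", cvv):
            errors.append("cvv")
        return errors


class CheckoutFlow:
    """Illustrative component that depends on the validator."""

    def __init__(self, validator):
        self.validator = validator

    def add_card(self, number, expiry, cvv):
        errors = self.validator.validate(number, expiry, cvv)
        return {"accepted": not errors, "errors": errors}


# The integration test: wire the real validator into the real flow and
# check that the components cooperate as a user-facing feature would.
flow = CheckoutFlow(CardValidator())
assert flow.add_card("4242424242424242", (12, 2099), "123")["accepted"]
rejected = flow.add_card("1234", (1, 2000), "9")
assert rejected["errors"] == ["number", "expiry", "cvv"]
```

A unit test could check the validator’s regexes on their own; the integration test earns its keep by confirming the checkout flow actually consults the validator and surfaces its errors.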

Clear up confusion about integration testing in our in-depth post, which covers different types of integration testing, pros and cons, and tips for writing integration tests. 

End-to-end tests

End-to-end tests (sometimes called E2E tests) take the top of the pyramid because developers are meant to run them infrequently. 

As the name implies, these tests exercise an application end to end — typically via its user interface — and often take a long time to run. Unlike unit tests, which developers can run rapidly as they work, end-to-end tests take long enough that it’s worth scheduling them and resisting the urge to run them too frequently. 

The primary benefit of running end-to-end tests is confidence. Ham Vocke writes that end-to-end tests “give you the biggest confidence when you need to decide if your software is working or not.” 

That said, as Vocke continues, end-to-end tests are often problematic: “They are notoriously flaky and often fail for unexpected and unforeseeable reasons. Quite often their failure is a false positive. The more sophisticated your user interface, the more flaky the tests tend to become. Browser quirks, timing issues, animations and unexpected popup dialogs are only some of the reasons that got me spending more of my time with debugging than I'd like to admit.”

End-to-end tests, even more so than integration tests, tend to develop a bad reputation, which dissuades some developers from using them. If these tests aren’t set up right, you can spend a lot of time waiting for them to run, only to get meaningless results. 

Adrian Sutton, now a staff protocol engineer at OP Labs, provides a few good practices from his time working at LMAX. His team, for example:

  • Ran end-to-end tests constantly throughout the day
  • Ensured tests could complete in about 50 minutes
  • Required end-to-end tests to pass before considering new versions releasable 
  • Stored test results in a database for easy query and search
  • Isolated tests so they could run them in parallel to speed things up

With practices like these in mind and use cases that other test types can’t really cover, figuring out end-to-end tests quickly becomes a worthwhile endeavor. 

If you’re building an ecommerce application, for example, your company might live and die based on the user journey that runs from searching for a product to adding that product to a cart to checking out with that product. That’s a relatively simple user journey, and still, its importance demands an end-to-end test to ensure it works. More complex journeys might require more complex end-to-end tests, but that doesn’t make the effort any less necessary.

Review our comprehensive guide to end-to-end testing for a more detailed explanation. 

How to implement the test pyramid 

The test pyramid is at its most valuable when developers consider it a model instead of a didactic recipe. Remember the purpose of the test pyramid: Consider the composition of your test suite and think through the proportion of test types. 

If you do that thinking, there’s a chance that in your particular context, your test suite might not resemble the test pyramid. That’s all right: the test pyramid, by prompting that deliberate thinking, has served its purpose. 

And while the test pyramid isn’t scripture, it helps to notice when and why you’re deviating. The Google testing team, for example, warns about two antipatterns: the ice cream cone and the hourglass.

Illustration of ice cream cone. Scoop labeled manual tests, cone labeled top-to-bottom: Automated GUI tests, integration tests, unit tests. Hourglass illustration labeled top to bottom: unit, integration, e2e

In the ice cream cone antipattern, developers write too many end-to-end tests and too few integration and unit tests. “Such suites tend to be slow, unreliable, and difficult to work with,” Bender writes.

In the hourglass antipattern, developers write many end-to-end tests and many unit tests but too few integration tests. This antipattern, Bender writes, “results in many end-to-end test failures that could have been caught quicker and more easily with a suite of medium-scope tests.”

The antipatterns, like the test pyramid itself, are not absolute; there are contexts where these shapes can be better than the typical pyramid distribution. Bender writes that the hourglass pattern is most likely when systems are so tightly coupled that it’s “difficult to instantiate individual dependencies in isolation.” Even if you want to work toward a more pyramid-like distribution, the solution will be a long-term, systems-level one — not merely rewriting your test suite. 

A significant reason the test pyramid persists is that, one, it’s mostly correct for most teams, and two, when it doesn’t work, your disagreements with it will be productive. Kent C. Dodds, for example, argues for more integration tests than other test types. He writes, “Integration tests strike a great balance on the trade-offs between confidence and speed/expense. This is why it's advisable to spend most (not all, mind you) of your effort there.” 

Looking at this discussion from a big-picture perspective, you can see which permutations of the pyramid fit which contexts and use cases, and select between them. The test pyramid has been put through its paces, which means it’s fairly easy to see its advantages and disadvantages from other testers and developers.

Advantages and disadvantages of following the test pyramid

There are various pros and cons to following the test pyramid, but many of them will depend on how well the overall test suite is designed and how well each test type is implemented. 

Integration tests, for example, can be slow with the wrong implementation, but that’s not the fault of the test type or the test pyramid. Similarly, some advantages of the test pyramid might benefit one company more than another. 

But with a broad view of what works and what might not, you can better understand what you want and what value you can further wring out of your testing strategy. 

Holistic view of the test suite

Perhaps the greatest benefit of the test pyramid is how it reinforces a holistic view of your test suite. Many developers have bought into unit tests and feel, because of that, bought into testing. 

But without a holistic testing strategy, integration and end-to-end tests can fall by the wayside. With the test pyramid model, it’s easy to show and explain why, yes, developers should keep prioritizing unit tests while running less frequent but equally important integration and end-to-end tests. 

The test pyramid gets more useful over time, too, as your test requirements change. Bender writes, “Creating and maintaining a healthy test suite takes real effort. As a codebase grows, so too will the test suite. It will begin to face challenges like instability and slowness.” Referring back to the test pyramid can help you keep the proportions of each test in check and refocus you toward a holistic strategy. 

Confidence

One of the biggest advantages of testing, in general, is confidence. However, with the test pyramid, developers can distribute that confidence throughout development. 

For example, Tim Bray, previously a distinguished engineer at AWS, says, “Working in a well-unit-tested codebase gives developers courage.” As codebases get more complex and dependencies spawn and multiply, making changes can be scary. But with good unit tests, Bray writes, you can be confident: “If a little behavior change would benefit from re-implementing an API or two you can be bold, can go ahead and do it. Because with good unit tests, if you screw up, you’ll find out fast.”

Many developers test for this reason above others. The test pyramid shows, however, that this same confidence can scale. With integration tests, you can be more confident about changing how modules interact or inserting new steps into a workflow. If things go wrong, you’ll have a better idea of why. 

And with end-to-end tests, you can be more reliably confident, over time, that your work is contributing to a functional whole. This confidence is cumulative: As you make tests and confidence-building part of your daily workflow, your trust in yourself and your team’s work only grows.

Documentation

Software documentation is, Bender writes, echoing many developers, engineering leaders, and technical writers, “notoriously unreliable.” Code takes precedence, but over time, documentation can fall more and more out of date. At a certain point, it can feel like there are more missing edge cases and outdated requirements than reliable information. 

Good tests, however — especially if there are unit, integration, and end-to-end tests — complement documentation. “Clear, focused tests that exercise one behavior at a time function as executable documentation,” Bender writes. For developers unfamiliar with a given part of the codebase, he recommends first looking at the tests that cover it.

Bray echoes this advice, focusing on the particular issue of new developers (new to the company or the team) trying to understand a chunk of the codebase. “Writing good tests helps the developer during the first development pass and doesn’t slow them down,” Bray writes. “But I know, as well as I know anything about this vocation, that unit tests give a major productivity and pain-reduction boost to the many subsequent developers who will be learning and revising this code.”

False confidence 

Counter to the advantage of confidence is the disadvantage of false confidence. False confidence tends to come when tests are flaky or purport to cover more than they do.

For example, software researchers Benoit Baudry and Martin Monperrus studied Apache’s Commons Collections and found 40 pseudo-tested methods (in the researchers’ terms, methods that appear to be covered by tests but whose behavior no test actually checks). They found that a particular method, ensureCapacity, was “invoked over 11,000 times and covered by more than 900 test cases,” but “the body of ensureCapacity can be completely deleted without breaking any test case.”
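The failure mode is easy to reproduce. In the hypothetical sketch below (invented for illustration, not the researchers’ actual Java code), a test exercises an ensureCapacity-style method thousands of coverage-counted times yet never asserts its effect, so the method body could be deleted without failing the test:

```python
class GrowableBuffer:
    """Illustrative class; not the Apache Commons code from the study."""

    def __init__(self):
        self.capacity = 8
        self.items = []

    def ensure_capacity(self, minimum):
        # Delete this body and pseudo_test_append below still passes.
        while self.capacity < minimum:
            self.capacity *= 2

    def append(self, item):
        self.ensure_capacity(len(self.items) + 1)
        self.items.append(item)


def pseudo_test_append():
    buf = GrowableBuffer()
    for i in range(100):
        buf.append(i)
    # ensure_capacity ran 100 times, so coverage tools count it as
    # covered -- but nothing here asserts that capacity actually grew.
    assert len(buf.items) == 100


def real_test_ensure_capacity():
    buf = GrowableBuffer()
    buf.ensure_capacity(100)
    # Asserting the observable effect is what makes the method tested.
    assert buf.capacity >= 100


pseudo_test_append()
real_test_ensure_capacity()
```

Coverage tools report both versions identically; only the second test would catch the deleted body. Mutation testing tools automate exactly this check by deleting or mutating method bodies and seeing which tests notice.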

Pseudo-tested methods can arise in many different ways, and the broader problem of apparent test coverage lending false confidence arises in many more. The advantages of the test pyramid then invert: if developers write a comprehensive suite of tests that don’t work, false confidence can lead to merging seemingly small changes that cause more chaos than anticipated, and to shipping features that only appear to work. 

Testing dogma

Finally, one of the major flaws with the test pyramid — one that applies to almost any model — is that it can lend itself to dogmatic usage. 

Oleksii Holub, a software developer, writes that “Aggressively popularized ‘best practices’ often have a tendency of manifesting cargo cults around them, enticing developers to apply design patterns or use specific approaches without giving them a much-needed second thought.” Here, Holub is primarily targeting unit tests but the criticism applies to the whole test pyramid. 

When testers are dogmatic about a particular way of testing, they’re unlikely to see the costs and even less likely to compare potentially large costs to relatively small benefits. Martin Sústrik, a developer at Anapaya Systems, for example, argues that tests can actually “ossify the internal architecture.” 

If you write a test suite covering three components and later refactor so that two components absorb the work of the third, now-eliminated component, your tests are “suddenly rendered useless.” If you’re staring down that possibility, you might be dissuaded from refactoring at all; your product might become “resistant to internal change.”

Bray, writing about testing, mentions dogma four times, concluding, “There’s room for argument here, none for dogma.” Therein lies the best benefit of the test pyramid and the worst tradeoff: If the test pyramid inspires thoughtful consideration, then it has served its purpose; if the test pyramid inspires dogma, then it has hurt more than helped. 

Related read: The test pyramid and its discontents, or, why no one agrees on anything

Respect the tests, and they’ll respect you

If we circle back to Jeff Atwood’s 2006 article on unit testing, you can find a useful principle that serves developers across the test suite, even in 2024: “Unit tests are so important that they should be a first-class language construct.”

Bender, representing Google’s influential testing team, agrees, “The secret to living with a large test suite is to treat it with respect.” That means incentivizing developers to own and care about their tests, and rewarding them for writing good tests in much the same way you’d reward them for building good features. “Basically, treat your tests like production code,” he writes.

Most problems with testing in general, and with the test pyramid in particular, result from developers treating tests as a secondary goal, a side quest from the main mission. When a test suite is given the priority it deserves, developers can start to realize its potential. 


