What happens when we turn our users into a QA department?


In April 2024, Marques Brownlee, a technology reviewer with almost 20 million followers, reviewed a new device — the Humane AI Pin — and deemed it “the worst product I’ve ever reviewed.”

A controversy erupted: Was he being too critical of a new company? Was it unfair to act like a new product, especially in a new field, is truly complete enough to dismiss? Do reviewers and users have too much power?

Tweet from @dvassallo: "I find it distasteful, almost unethical, to say this when you have 18 million subscribers. Hard to explain why, but with great reach comes great responsibility. Potentially killing someone else's nascent project reeks of carelessness. First, do no harm." (Posted with a screenshot of Marques Brownlee's video, titled "The Worst Product I've Ever Reviewed... For Now.")

The controversy created a lot of froth that distracted people from Brownlee’s point (beyond his opinions of the device in question): How should users feel about a device or software product that’s released for purchase but doesn’t feel “complete”?

The question is perhaps more pointed in hardware — when you’re literally holding a device that doesn’t meet expectations — but it’s even more relevant in software. For years, the software industry has considered it a norm to release “minimum viable products” (MVPs), to release partially completed software and update it over time, and to focus on shipping speed over the quality of what’s initially shipped.

But the tradeoffs tend to go understated: What happens when users form an initial, bad impression of your product? What are the consequences of a narrative forming around your product’s incompleteness?

These questions apply as much to enterprises as to startups. In May 2024, Microsoft announced Windows Recall, an AI-based tool that indexes everything users do on their computers, and many called it a PR disaster.

Subsequent reporting from Zac Bowden, Senior Editor at Windows Central, revealed that Microsoft had been secretive about Recall. According to Bowden, “Microsoft has the Windows Insider Program, yet to maintain secrecy, it chose not to test this feature openly.” Now, Microsoft has an issue: Many people, including Bowden, like the feature, but because Microsoft didn’t test it openly, a narrative formed around it before anyone had even gotten to use it.

These cases — the Humane AI Pin and Microsoft Recall — are quite different, but both contain the same tension: Are companies doing less QA and testing in favor of turning users into de facto QA departments? And even more interestingly, is this a good idea or not?

Professional quality assurance vs. user feedback

In the old days, before SaaS, companies developed software via what we now call the waterfall method and used QA departments to make sure it was good enough to ship. This wasn’t a choice but a necessity: Software shipped on physical disks, so vendors couldn’t easily update it if something was wrong.

Now, with SaaS, companies have the freedom to release “less complete” products and update them on the fly. Further, with DevOps and CI/CD, companies can ship, test, and iterate on products in production with live users.

This is surely an improvement, but this capability leads to temptation. As Scott Hanselman, Vice President of Developer Community at Microsoft, writes, “It's too easy to ship crap and it's too easy to update that crap [...] Technology companies are outsourcing QA to the customer and we're doing it using frequent updates as an excuse.”

Before we address the risks of shipping crap, we have to address why it’s often a good idea to do so anyway. Then, we can see when user feedback is best and when professional QA is necessary.

What users can do and QA can’t

It bears repeating: In the early days, we didn’t get continuous feedback because we couldn’t. Now that we can, we’ve learned there are some serious advantages to feedback that even a good QA department can’t beat.

User feedback is high-volume: If you release a product and market it well enough online, you can get implicit feedback (through A/B testing, for example) at a volume no QA department can replicate.
Users are representative: If you’re targeting your initial user base correctly, the feedback you get is more relevant than anything a QA department can offer. QA professionals are proxies, but users are, quite directly, the users.
User feedback is more practical: It’s a little counterintuitive at first glance, but with the growth of the SaaS industry, it’s incredibly easy to release a product, attach some analytics to it, and capture some user feedback. Hiring a QA professional carries all the costs of hiring and is inevitably slower to get started.

Note that none of these reasons ends with “and that’s why user feedback is better,” and the next section won’t make that claim for QA either. The two are fundamentally different.

What users can’t do and QA can

There are certain things that users can’t do or won’t do, and the nature of the feedback they offer will always be more ambiguous than what a QA person can offer.

Users aren’t professional critics: Users offer great high-level feedback (e.g., do people even stay on this screen? What do they tend to click?), but they’re not professionals. If you want more granular feedback, you need someone who knows more.
Users don’t have an incentive to help: Most user feedback is implicit, but some of the most useful feedback needs to be explicit. If users bounce at a given page, is it because it loads slowly, or because it looks bad, or because it’s just not useful? Usage stats won’t tell you.
Users don’t always have the scale: Counter to one of the advantages of user feedback, A/B testing (and similar methods) only works at a certain scale. Access to a few hundred users might feel significant, but a sample that small often can’t produce statistically significant results.
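
To make the scale point concrete, here is a minimal sketch that estimates how many users each variant of an A/B test needs. It uses the standard normal-approximation formula for comparing two proportions. The function name `required_sample_size`, the 5% significance level, the 80% power target, and the example conversion rates are all illustrative assumptions, not figures from any product discussed here.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline_rate and expected_rate are conversion rates between 0 and 1.
    alpha and power are conventional defaults, assumed here for illustration.
    """
    if baseline_rate == expected_rate:
        raise ValueError("The two rates must differ to compute a sample size.")

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the Bernoulli variances of the two variants.
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate

    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: detecting a lift from a 10% to a 12% conversion rate.
small_lift = required_sample_size(0.10, 0.12)
# Hypothetical scenario: detecting a doubling from 10% to 20%.
large_lift = required_sample_size(0.10, 0.20)
print(small_lift, large_lift)
```

Under these assumptions, a modest lift calls for a few thousand users per variant, while only a very large change (such as a doubling) is detectable with a couple hundred. That is why a few hundred users can feel significant without being statistically significant.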

Some of the slippage here concerns how we define “feedback,” “bugs,” and “quality.”

Jeff Atwood, for example, writes, “It is not the user's job to tell you about errors in your software! If users have to tell you when your app crashes, and why, you have utterly failed your users. I cannot emphasize this enough.” Here, he’s talking about quality.

David R. MacIver explains, however, that revealed preference shows users would rather do a little QA work than wait for better software. He writes, “Users have stated the price that they’re willing to pay, and that price does not include correctness, so they’re getting software that is not correct [...] The option to ship correct software is simply not on the table, so why on earth should we feel bad about not taking it?”

We end up in a messy middle ground. There are ways to ship near-perfect software (MacIver cites NASA’s process, which is necessarily careful and slow). Is that overkill for other use cases? And when we deem high quality to be too much work, what risks are we taking?

Evaluate all possible consequences when you “just ship it”

The technology industry has largely embraced a “just ship it” philosophy. Especially for startups, it makes a lot of sense to ship and iterate rather than work on your software in a private laboratory and hope for the best once it finally comes out. But there are limits to this perspective, and companies that don’t account for the risks often have more at stake than they realize.

Good reasons to just ship

Users, as we’ve covered, have a few advantages over a QA department: You can sometimes get feedback in large volumes, and that feedback is often from people who have the opinions you care most about. But there is another reason to ship: speed to market.

In a retrospective post about why his company RethinkDB failed, Slava Akhmechet shares many reasons, one of which is relevant to this point. He writes that his team “wanted to build an elegant, robust, and beautiful product,” so they optimized for correctness, simplicity, and consistency.

In a competitive market, however, they found that users preferred done over perfect. As a result, their primary competitor, MongoDB, overtook them despite having a worse product. “Correct, simple, and consistent software takes a very long time to build,” Akhmechet writes. “That put us three years behind the market.”

In this case, shipping worse software sooner was better than shipping better software later. As much as we might want to care about craftsmanship and pride, the realities of running a business surface eventually.

Risks of just shipping

Especially given the possibility of total business failure, the risks of just shipping tend to go understated. As a result, companies can end up surprised when a “raw” product is “sent back to the kitchen,” so to speak.

In our first two examples, Humane and Microsoft had great teams, but potentially good products were damaged by bad PR, which was only possible because of how they tested and shipped. Months after release, Humane is looking to be sold. As for Microsoft Recall, Bowden writes, “There's a very dark cloud hanging over this feature right now, and a lot of privacy-conscious people are simply not going to be able to subscribe to the idea of Windows Recall in its current form.”

These examples, however, are dramatic versions of consequences that are often more subtle. If a startup has an MVP finished and starts publicizing it, it’s hard to determine the ratio of good feedback to turned-off users. When a user deletes your app, how do you weigh the advantages of feedback vs. the disadvantages of forming a bad first impression?

You can see some of the risks by looking at how a good beta program works. In 2016, for example, eero — a startup providing home Wi-Fi devices — ran a successful beta program that fueled a huge PR win (as proven when the media deemed them “the future of home networking”).

Reflecting on the success of the beta, Head of Product Paul Nangeroni shares, “If you have immature hardware with plenty of known bugs, software that has major stability and performance issues, or a green support team without the right tools, take pause. Any one of these on its own, and you’re fine to start. Even two is ok if you have a tolerant set of beta users. But any more than that and it will be difficult to gather meaningful insights above the noise created by the product immaturity.”

There’s a lesson for hardware and software companies about the benefits and the risks of shipping: If you ship too early, you’ll get feedback that isn’t useful, but if you take the time to test, you can iterate before full release and achieve much more lasting success.

Users are not a shortcut to quality

If there’s one takeaway from this article, let it be this: User feedback is not a shortcut to making a quality product, and QA has a place in shipping good software at practical speeds. The two, ultimately, are complements that work best together.

Ideally, QA ensures the product is “good enough” to ship. In certain contexts, crossing a quality threshold can be make-or-break. As Lassi A. Liikkanen, Director of Product Design and Insight at Qvik, writes, “There are clear signs that warn you when you should forget about developing an MVP.”

For him, a strong customer base, clear customer expectations, established competition, and proven demand for existing features all signal companies to forgo shipping an incomplete or shoddy product. In these cases, prioritizing user feedback instead of QA doesn’t result in a shortcut to anything but having a generally disliked product.

Even then, let’s say you have an ideal case for shipping a product that is incomplete but still delivers value, one that really needs user feedback to fuel iteration. Even in this case, user feedback is not a shortcut.

Principal software engineer Rouan Wilsenach, for example, shares many of the things you need to do QA in production, including fine-tuned logging, well-researched metrics, and an established monitoring system. User feedback can be great, and iterating on an in-production service can be the right path, but skipping QA doesn’t result in a shortcut. User feedback, too, needs to be done right.
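
As a rough illustration of what that groundwork involves, the sketch below combines structured logging with a single metric (the error rate over recent requests) and a simple alert threshold. The class name `ErrorRateMonitor`, the window size, and the 5% threshold are hypothetical choices for this example and are not taken from Wilsenach’s writing. A real setup would typically rely on a dedicated monitoring system.

```python
import json
import logging
from collections import deque

logger = logging.getLogger("prod_qa")


class ErrorRateMonitor:
    """Tracks the error rate over the most recent `window` requests."""

    def __init__(self, window=100, threshold=0.05):
        # Each entry is True if the request failed, False otherwise.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, endpoint, failed):
        """Store one request outcome and emit a structured log line."""
        self.outcomes.append(bool(failed))
        # JSON log lines are easy to search and aggregate later.
        logger.info(json.dumps({"endpoint": endpoint, "failed": bool(failed)}))

    def error_rate(self):
        """Return the fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """Return True when the error rate exceeds the configured threshold."""
        return self.error_rate() > self.threshold


# Hypothetical usage: 7 successful requests followed by 3 failures.
monitor = ErrorRateMonitor(window=10, threshold=0.2)
for _ in range(7):
    monitor.record("/checkout", failed=False)
for _ in range(3):
    monitor.record("/checkout", failed=True)
print(monitor.error_rate(), monitor.should_alert())
```

Even this toy version shows the point: before user behavior can stand in for testing, someone has to decide what to log, which metric matters, and what level of failure should trigger a response.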

It’s hard to resist slipping back into a black-and-white perspective, to frame the choice as either perfectionistic and slow or sloppy and fast. One helpful framework here is from developer Sam Rose, who recommends focusing on when and where bugs occur instead of worrying about the sheer number of them.

Diagram: Five boxes labeled, from left to right, Compiler, Tests, Code Review, QA/Early Access, User

“The further to the left of that diagram a bug is found, the happier everyone will be,” he writes. “The worst-case scenario is for a user to discover a bug, shown on the far right.” With this in mind, we can focus on practicality: A QA department can help protect users from bugs and other quality issues, but it’s one filter among many, and some bugs will inevitably reach users. Quality is about making this less likely and ensuring that user feedback is actually useful.

Retention is cheaper than acquisition

This entire discussion can get wrapped up in high-level notions of software quality and craftsmanship when it should narrow down to business value. Ultimately, user retention is almost always cheaper than user acquisition (by as much as five times).

Shipping serves user acquisition, but quality serves user retention. As we’ve argued throughout, there’s a balance: A product could acquire many users and lose them all because it’s too incomplete or broken, and another product could theoretically retain all the users it acquires but can’t because it never sees the light of day.

It’s a balance that the newest startup, debuting its first product, and the oldest enterprise, launching its latest update, need to dial in. The need for quality never rests.
