What’s the point of doing user research? Why not just design a product based on your own point of view and your anecdotal understanding of people’s needs?
Approaching design that way will only lead to disappointment for the user — and by extension, for everyone else involved.
Instead, we approach design from the perspective of human-centeredness, and we aim to reach a deep understanding of the humans who will be impacted by the design in development. The more complete and accurate understanding we gain of the people who will be using the product, the better we can incorporate their needs into the design.
When we do user research, we amass a lot of data about people and how they behave. A key part of valid user research is being able to discern which data, drawn from our observations and other data collection tools, is meaningful and representative of a broader population. One way we do this is by using a mixed-methods approach that blends qualitative and quantitative methods.
Both qualitative and quantitative methods have benefits and limitations that we’ll go into below. Together they complement and balance each other to help generate a more complete picture of your users.
Natural Variability: We Contain Multitudes
One challenge we face when drawing conclusions about users is that humans vary widely across a multitude of dimensions — sometimes the differences are meaningful, and sometimes they’re due to “randomness” in nature.
In fact, many human characteristics and behaviors approximately fit what we call a normal distribution (also called a bell curve or a Gaussian curve), with most people grouped at or around the average value (also called the mean value). Extreme data points that are very different from the mean exist, but they are far less common.
For example, consider human height: the majority of humans are fairly close to the average human height (5’2″ for women, 5’6″ for men), and fewer people are very tall or very short.
More specifically, in a normal distribution about 68% of the population falls within one standard deviation of the average height, and about 95% of the population falls within two standard deviations.
Standard deviation is a way to quantify just how much the population varies. The standard deviation of a data set represents how far the data is spread out from the mean, on average.
So, to continue the example, if the standard deviation for human population height were four inches and the mean were 5’4″, 95% of humans would be between 4’8″ and 6’0″.
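As a rough check of that arithmetic, here is a minimal Python sketch. It assumes heights follow an exact normal distribution, which is a simplification, and the helper names `share_within` and `to_feet_inches` are ours.

```python
import math

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def to_feet_inches(total_inches: float) -> str:
    """Format a height in inches as feet and inches, e.g. 56 -> 4'8"."""
    feet, inches = divmod(round(total_inches), 12)
    return f"{feet}'{inches}" + '"'

mean_in = 5 * 12 + 4  # the example mean of 5'4", in inches
sd_in = 4             # the example standard deviation, in inches

# Two standard deviations on either side of the mean.
low, high = mean_in - 2 * sd_in, mean_in + 2 * sd_in

print(to_feet_inches(low), "to", to_feet_inches(high))
print(f"within 1 SD: {share_within(1):.1%}")
print(f"within 2 SD: {share_within(2):.1%}")
```

Under these assumptions the range comes out to 4'8" to 6'0", and the shares are about 68.3% and 95.4%, which is where the rounded 68% and 95% figures come from.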
Natural variability is not just limited to simple examples like human height. If you’ve ever run a usability study, you know that users’ interactions with and perceptions of a product can at times feel unpredictable, and looking across participants can often reveal contradictory sentiments. However, in many cases even this “unpredictable” behavior may actually fall into a normal distribution of what we expect to see due to chance.
As user research experts, we are skilled in uncovering key patterns and themes in qualitative data despite the inherent inconsistencies in how different humans behave — but there is also another tool we use to identify “true” patterns in user data and eliminate noise!
Clarity with Quant!
This is where quantitative user research can help complement traditional qualitative testing. Using inferential statistics, quant research is able to separate signal from noise to identify true patterns of data, and extrapolate the data collected from your sample to predict what your entire population of users is likely to experience.
From a business perspective, quant research enables us to feel confident that we are not making decisions based on results that are due to random chance, bias in qualitative data analysis, or differences we may not be interested in (e.g., the mood of the test taker at the time).
A common example is testing prototype concepts. We often have clients who wish to evaluate usability or user preference among two or three early concepts to decide which one to move forward with. Results that are driven by random chance could have major negative impacts on safety and user acceptance after market release. By applying quantitative analysis to the data, we can check that the insights we derive from the assessment are statistically significant (i.e., not likely to be due to chance) and representative.
This holds true for any comparison being made in a user research study: quantitative analysis allows us to understand whether a 10% difference in the rate of use errors between two prototypes is likely to be representative of real-life future use or specific to our participant sample.
To make this a little more concrete, let’s look at a simplified example.
Coin Flip, Inc.: A Cautionary Tale
Coin Flip, Inc. was planning the budget for the next fiscal year and wanted to understand users’ behavior as it relates to coin flips.
To address this, they decided to conduct a five-person usability study with their current users. Each user was asked to flip a standard “production-equivalent” coin just one time, and the result (heads or tails) was noted. The raw data from their study is below.
They summarized their data to look for patterns, without taking into account natural variability.
They concluded that the chance of a coin flip resulting in “heads” is 20 percentage points greater than the chance of a coin flip resulting in “tails.” As a result, they decided to divert the majority of their resources to the Heads department.
If Coin Flip, Inc. had spoken to the User Research team at Bresslergroup, they would have understood the benefits of using statistical analyses to take into account variability.
Below, we show the same chart, but this time we’ve added error bars that show standard deviation — or the average amount the result varies around the mean. These error bars depict one standard deviation above the mean and one standard deviation below the mean for each outcome (Heads, Tails).
You may sometimes see error bars that reach below 0, as in the Tails example above. Of course, you won’t ever see a negative proportion of Tails; the bar simply shows that the variability in the sample is larger than the mean of the data.
You can see that these error bars are large and overlap quite a bit — one standard deviation around the Heads mean reaches from 0.12 to 1.08 and one standard deviation around the Tails mean reaches from -0.08 to 0.88 (as you can see, these ranges overlap between 0.12 and 0.88).
This gives us visual and numerical evidence that the results were highly variable. In general, looking at how large the error bars are and how much they overlap can give us a quick estimate of whether a difference between two means is due to random chance or not. The larger and more overlapping the error bars are, the more likely the difference is due to chance (note: overlapping error bars are just an indication of this; further statistical testing is needed to confirm whether the difference is real).
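To see where error bars of this size come from, here is a minimal Python sketch. The raw data table is not reproduced here, so the sketch assumes three heads and two tails, the only five-flip split that gives a 60/40 result. That split, and the choice of the population standard deviation (`statistics.pstdev`), are our assumptions.

```python
import statistics

# Assumed raw data: 3 heads and 2 tails, coded 1 = heads, 0 = tails.
flips = [1, 1, 1, 0, 0]

heads = flips                   # 1 where the flip was heads
tails = [1 - f for f in flips]  # 1 where the flip was tails

for label, outcome in (("Heads", heads), ("Tails", tails)):
    mean = statistics.mean(outcome)
    sd = statistics.pstdev(outcome)  # population standard deviation
    print(f"{label}: mean={mean:.2f}, SD={sd:.2f}, "
          f"range=({mean - sd:.2f}, {mean + sd:.2f})")
```

The standard deviation comes out to about 0.49 for both outcomes, close to the half-width of roughly 0.48 implied by the ranges quoted above. Using the sample standard deviation (`statistics.stdev`) instead would give about 0.55 and even wider bars.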
In this coin flip example, statistical analysis (and expert knowledge of the product) confirms that the difference was just due to random chance.
If Coin Flip, Inc. were our client, we’d have recommended that they continue to fund the Heads and Tails departments equally.
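One way to run that kind of check is an exact two-sided binomial test against a fair coin. The sketch below is only an illustration: the function name `binomial_two_sided_p` is ours, the split of three heads and two tails is assumed from the 60/40 result, and we are not claiming this is the specific test behind the conclusion above.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k, assuming the true chance of success is p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so floating-point ties are counted as ties.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Assumed data: 3 heads out of 5 flips, tested against a fair coin.
p_value = binomial_two_sided_p(k=3, n=5)
print(f"p-value for 3 heads in 5 flips: {p_value:.2f}")
```

Under these assumptions the p-value is 1.00. A 3-2 split is among the most likely results a fair coin can produce in five flips, so it offers no evidence of a real difference between Heads and Tails.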
Can you think of any examples from your user research where you wanted to know if the way your concepts performed in users’ hands was due to chance or design?
What Does It Mean to Perform a Statistical Analysis?
We use inferential statistics to identify true (i.e., not due to chance) patterns in user behavior and perceptions. We perform a wide variety of statistical evaluations, depending on the study design and type of data collected.
Broadly speaking, all analyses give us two types of results: (1) a p-value, which we compare against a preset threshold to reach a “pass/fail” decision about whether there is a relationship between two variables, or whether two or more groups differ from each other, and (2) an effect size, which tells us about the magnitude of the result.
To provide you an example of how these might apply to your research, let’s imagine you wanted to know if there was a relationship between a user’s age and how easily they are able to open your product’s packaging. Our statistical analysis would tell you whether a relationship exists or not, and if it does, how much more difficult it is to open the product’s packaging with each one-year increase in age.
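To make the “each one-year increase in age” idea concrete, here is a minimal Python sketch of an ordinary least-squares slope. The ages and opening times are invented purely for illustration and are not study results, and the function name `least_squares_fit` is ours. The sketch estimates only the size of the relationship; deciding whether it is statistically significant would take a further test.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for illustration only.
ages = [30, 40, 50, 60, 70]                  # participant age in years
seconds_to_open = [4.0, 5.5, 5.0, 7.5, 8.0]  # time to open the packaging

slope, intercept = least_squares_fit(ages, seconds_to_open)
print(f"Each extra year of age adds about {slope:.2f} s to opening time")
```

With these made-up numbers the slope is 0.10, which would read as roughly one extra second of opening time per decade of age.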
More Key Quant Terms
Quantitative research can sometimes feel intimidating because there is so much terminology and mathematics involved. While we focus on presenting our clients with the digestible and actionable insights we learn from statistical analyses rather than the equations themselves, there are a few key terms that we think are helpful to know:
Confidence Intervals allow us to make statements about how likely it is that the true value of a metric in the population lies between two numbers.
For example, “we can be 95% confident that the population’s true average score for this metric will be between values X and Y.”
Just like standard deviation, you can visualize confidence intervals as error bars on a graph. The more the error bars overlap, the less likely it is that there is a real difference between two outcomes.
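As an illustration of where “values X and Y” might come from, here is a minimal Python sketch of a confidence interval for a mean. It uses a normal approximation, which is an assumption on our part; with small samples a t-distribution would give a somewhat wider interval. The ratings are invented for illustration, and the function name `mean_confidence_interval` is ours.

```python
import math
import statistics

def mean_confidence_interval(scores, confidence=0.95):
    """Normal-approximation confidence interval for the mean of scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    # z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical ease-of-use ratings on a 1-7 scale, invented for illustration.
ratings = [5, 6, 4, 7, 5, 6, 5, 6, 4, 6, 7, 5]

low, high = mean_confidence_interval(ratings)
print(f"95% confidence interval for the mean rating: {low:.2f} to {high:.2f}")
```

Collecting more data shrinks the standard error, which narrows the interval and makes it easier to tell two outcomes apart.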
A p-value is a probability: specifically, the chance of seeing a result at least as extreme as the one we observed if there were truly no effect in the population and only random chance were at work. A value of less than .05 is typically considered to be “significant,” i.e., a result like ours would be unlikely to arise from random chance alone.
Effect Size is a measure of magnitude and the application varies by test. Even if a test shows that two results are different in a statistically significant way, the magnitude of the effect may be quite small, so it is important to determine what effect size is “meaningful” to the business when making decisions.
For example, if we can prove that a device does cut down on treatment time by 10 seconds compared to the predecessor, it still may not be clinically valuable enough to change the design. If the device is used in emergency situations, however, a reliable 10-second difference could have a huge impact. There is no absolute answer for what effect size is meaningful and it varies by product and use, so we always recommend determining what is going to be considered meaningful before conducting the research.
SMALL EFFECT SIZE
The difference in the bell curve of results from two different tests may be statistically significant, but still not matter to a device’s usability.
LARGE EFFECT SIZE
These two bell curves are further apart, meaning that there is a bigger difference in usability between the two devices. Even in this case, the difference may or may not be meaningful to the user or business, so that needs to be determined on a case-by-case basis.
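One common way to put a number on how far apart two bell curves sit is Cohen's d, the difference between two means divided by their pooled standard deviation. It is only one of several effect size measures, and we use it here as an example rather than as the measure behind the figures above. The treatment times below are invented for illustration.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d: difference in means in units of the pooled standard deviation."""
    pooled_variance = (
        (n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2
    ) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)

# Hypothetical summary statistics: the same 10-second saving in treatment
# time, measured once with little spread and once with a lot of spread.
low_spread = cohens_d(mean_a=130, sd_a=20, n_a=30, mean_b=120, sd_b=20, n_b=30)
high_spread = cohens_d(mean_a=130, sd_a=100, n_a=30, mean_b=120, sd_b=100, n_b=30)

print(f"d with SD of 20 s:  {low_spread:.2f}")
print(f"d with SD of 100 s: {high_spread:.2f}")
```

With these made-up numbers the same 10-second difference gives d = 0.50 when times vary little and d = 0.10 when they vary a lot. Neither number says whether 10 seconds matters for the product; that judgment still depends on the use case.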
Where in the Design and Development Process Do We Recommend Using Statistical Tools?
Quantitative usability metrics are useful throughout the design process. Here are a few examples of how they fit into various stages of the product development process:
Early in the process: Quantitative data can help to prioritize user needs and gauge user interest and preference among several rough prototype concepts.
Iterative design: Quantitative metrics are particularly useful for tracking the usability of successive designs and comparing among your final 2-3 design options.
Final Design: After your design is finalized, quantitative data can help validate that the design is meeting the user needs it was designed to meet.
A Mixed-Methods Approach for the Win!
That all sounds great — should I switch all of my research to quantitative methods? What’s the catch?
Both qualitative and quantitative methods have benefits and limitations. With qualitative research, it can be harder to separate signal from noise and to understand population-level user behavior.
With quantitative research alone, we can sometimes lose the rich insights that come from in-depth conversations about users’ experiences or observing them in their “natural habitat.” Quantitative research also often requires larger sample sizes, so it may be difficult to execute if you have a small or difficult-to-reach population. However, quantitative research is frequently done remotely, which can extend the reach of the research.
That’s why we typically recommend a mixed-methods approach to user research. Mixed methods can be applied in innumerable ways. A few examples include:
• Using qual and quant sequentially, for example, with quantitative testing being conducted after a round of qualitative interviews to validate and prioritize results. The qualitative interviews help you craft the survey questions appropriately for the audience and cover all the relevant topics.
• Using qual and quant in parallel, combined into a single study that might involve conducting traditional usability testing with a slightly larger number of participants and using both qualitative and quantitative analysis techniques.
It all depends on your product and what you’re hoping to achieve. If you’re working on a project and you’re not sure which approach to take, please get in touch! We’re happy to answer all your quant questions and help you craft a research program that uses these tools in the way that best fits your needs.