The Good and the Bad of Measuring User Satisfaction

Last month’s newsletter talked about measuring efficiency. It only seems fair that this month’s newsletter should talk about measuring satisfaction.

Ever since the international standard ISO 9241 added the phrase “with effectiveness, efficiency, and satisfaction” to the definition of “usability,” people have assumed that these three elements are the components that constitute usability. As a result, most people believe it is necessary to measure all three, and some even go so far as to suggest a way to combine the three into a single über measure of usability. Combining them suggests that the three elements must somehow be related to each other or, as it would be more technically termed, must correlate with each other. Unfortunately, this is not the case for user satisfaction.

According to one published paper analyzing satisfaction data involving thousands of System Usability Scale (SUS) scores across hundreds of products: “Post-test SUS scores do correlate with task performance, although the correlation is modest (around r = .24 for completion rates and time), which means that only around 6% of the SUS scores are explained by what happens in the usability test. This is the same level of correlation found with other post-test questionnaires.”
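The “around 6%” in that quote comes from squaring the correlation coefficient. As a quick check of the arithmetic (the function name here is just an illustrative label, not something from the paper):

```python
def variance_explained(r: float) -> float:
    """Return r squared, the share of variance in one measure
    that a linear relationship with the other accounts for."""
    return r * r

r = 0.24  # correlation reported in the quoted paper
share = variance_explained(r)

# 0.24 * 0.24 = 0.0576, which rounds to the "around 6%" in the quote
print(f"r = {r:.2f} -> r^2 = {share:.4f} ({share:.1%} of variance)")
```

Put the other way around, roughly 94% of the variation in SUS scores is left unaccounted for by task performance.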

Caution should be used here. Correlation calculations can be tricky to interpret. For example, there is a strong correlation coefficient between ice cream sales and drownings, even though ice cream sales have no real connection to drowning. Ice cream sales are correlated with warmer temperatures. Drowning is also correlated with warmer temperatures, since more people go swimming when it’s warmer. So each one is tied to warm weather rather than to the other. This is known as a spurious correlation, but the mathematical formula has no way to take this into account. If asked to calculate it, the formula will not recognize the correlation as spurious and will return a positive correlation coefficient between ice cream sales and drowning.
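The ice cream example can be reproduced with a small simulation. This is only a sketch on made-up numbers: the temperature range, slopes, and noise levels are assumptions chosen for illustration, not real sales or drowning data. Both series are generated from temperature alone, yet the formula still reports a strong correlation between them. Holding temperature constant (a partial correlation) makes the relationship all but vanish.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily data: temperature drives both series independently.
temps = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temps]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temps]

r_raw = pearson(ice_cream, drownings)
r_controlled = partial_corr(
    r_raw,
    pearson(ice_cream, temps),
    pearson(drownings, temps),
)

print(f"ice cream vs. drownings:          r = {r_raw:.2f}")
print(f"same, controlling for temperature: r = {r_controlled:.2f}")
```

The formula does exactly what it is asked to do in both cases. Knowing that temperature needs to be controlled for is something the analyst has to bring to the calculation.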

If satisfaction and success were correlated with a positive .24 correlation coefficient, you would need to see a large, positive change in performance in order to see a positive change in satisfaction, or vice versa. Is this actually true?

My firm tested two different voting systems using populations of approximately 100 per system. The two systems were tested against each other in eight rounds of testing. That’s a total population of approximately 1600 users. The performance difference between these two systems was not only measurable but statistically significant. And the performance level on both of these systems was less than perfect. Error rates were shown to be close to 30% on each system. Some people made single errors and others made multiple errors, but approximately 480 of the 1600 participants made errors. When asked about their performance and their satisfaction with the product, not one single person in 1600 was dissatisfied with the product, and all of them stated they would like to use it in an upcoming election. In this project, satisfaction was completely independent of performance.

In another project we worked on, the most consistent (least variation in scores) and highest SUS scores ever obtained were for a product that no one was able to figure out or use successfully. The published paper even states that “Users may encounter problems (even severe problems) with an application and provide SUS scores which seem high.” So in projects like these, satisfaction would seem to have a negative correlation with performance.

There are certainly times when the satisfaction scores and performance scores are consistent with each other. But in these projects, that agreement is not correlation, nor is it the case that, as one person once described it, “[satisfaction and performance] are correlated some of the time.” (If you ever want to see if someone understands statistics and correlation, use that line on them. If they laugh, you can keep talking statistics.) It is a coincidence.

The issue is that the formula for the correlation coefficient doesn’t take any of this into account. When asked to find a correlation coefficient, the formula will perform its task anyway. A correlation coefficient of .24 across thousands of scores across hundreds of products is a mathematical anomaly, not a sign of correlation. It hides all of the details and forces one overall trend despite obvious differences, and those differences are important to know. Sometimes satisfaction and success data suggest no relationship, sometimes they suggest a negative relationship, and sometimes they suggest a positive relationship; however, when you combine all the different cases, they show a small, positive correlation. To ignore those differences is misleading about the underlying facts. It’s the statistical equivalent of telling someone that your watch is accurate twice a day without telling them it stopped running.
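To make the pooling effect concrete, here is a sketch using three invented products. All of the numbers are synthetic and chosen only for illustration; they are not the data from the paper or from our projects. In one product, satisfaction rises with performance, in another it falls, and in the third the two are unrelated. Pooled together, the formula returns a single small, positive coefficient.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def simulate(intercept, slope, noise, n=1000):
    """Hypothetical product: completion rate in 0.2..1.0 and a
    satisfaction score that depends on it linearly, plus noise."""
    perf = [rng.uniform(0.2, 1.0) for _ in range(n)]
    sat = [intercept + slope * p + rng.gauss(0, noise) for p in perf]
    return perf, sat

products = {
    "A (positive)": simulate(30, 60, 5),
    "B (negative)": simulate(85, -20, 5),
    "C (none)": simulate(70, 0, 10),
}

per_product = {name: pearson(p, s) for name, (p, s) in products.items()}

# Pool every participant from every product into one data set.
all_perf = [x for perf, _ in products.values() for x in perf]
all_sat = [y for _, sat in products.values() for y in sat]
pooled_r = pearson(all_perf, all_sat)

for name, r in per_product.items():
    print(f"product {name}: r = {r:+.2f}")
print(f"pooled across products: r = {pooled_r:+.2f}")
```

The pooled figure is not wrong as arithmetic. It simply describes none of the three products.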

The good news is that satisfaction is pretty stable. People develop their impression of a product almost immediately. The fight-or-flight response occurs nearly a half second before cognitive processes even start to be engaged. By contrast, it takes time working with the product to determine how easy it actually is to use. This means that our subsequent experience with the product has to try to overwrite our initial impression. That’s not very easy to do. As we all know from experience, there is no second chance to make a first impression. That means that satisfaction can be reliably measured even if it does not correlate well with the actual product usability.

I would never tell someone not to measure satisfaction. User satisfaction is indeed an important component when it comes to the user experience. However, it does not correlate with how well a person can actually use a product (an effectiveness measurement). So do collect it, but realize it is an independent measure of the user experience that is related to the person’s perception of the product’s usability, not its actual usability.