Last month we discussed the definition of "usability" and what data can be obtained from a usability evaluation. We noted that focusing on the effectiveness, efficiency, and satisfaction of the user is an indirect method of assessment. Observing the apparent effectiveness and efficiency of participants, and recording their subjective satisfaction, can seem logical and useful. However, we concluded that these indirect measures can be highly misleading, particularly with the small number of participants typically used in an evaluation. Finally, we concluded that collecting this data isn't helpful unless we can identify the attributes of the product that led to the observed level of effectiveness, efficiency, and satisfaction.
There are seven attributes of a product that are of interest in performing usability evaluations. These attributes are:

- Functional suitability
- Functional discoverability
- Ease-of-learning
- Ease-of-use
- Ease-of-recall
- Safety
- Subjective preference
Not all of these attributes can be assessed during a typical usability evaluation. Many cannot be assessed at the same time.
We hope that, when products are submitted for a usability evaluation, the designers have properly addressed the functional suitability of the product. However, this is something that should be verified. We cannot observe functional suitability directly. Assessment of this attribute comes from a dialogue with the participant, often after the evaluation.
So why do we even care about functional suitability in a usability evaluation? We care because people do not want to use a product that doesn't perform or lacks the functionality needed. Too few functions and the user will lose interest in the product. Too many functions and the user may find the interface overly complicated, or may prefer to use other products that don't contain "bloatware."
Functional discoverability is critical to a product, but rarely tested. Telling the user to perform a specific task or to find specific information precludes ascertaining if the user can figure this out on their own. Usability assessments also tend to focus on specific tasks and hardly ever cover all the capabilities of a product. We can't tell anything about aspects of a product that are not included in an assessment if we are relying only on observation of the participants.
If you're testing a new design, the assessment will be looking only at issues of ease-of-learning. Even if the product can be learned quickly, you can't stop to test each user's skill level to confirm that they are past the learning curve. If you're testing an existing product, you can test ease-of-learning with new users and try to test ease-of-use with existing users. However, you would need a set of users with equivalent experience to be able to assess ease-of-use accurately.
Ease-of-recall is the most difficult attribute to test, since you have to account for both the user's level of knowledge of the product and the length of time they have not used the product. Even then, you cannot account for other products that may have influenced their memory of this product (a phenomenon known as proactive interference).
Even safety (in the form of observed errors) is not necessarily helpful to observe. The nature of the errors that occur varies, depending on the user's experience with the product. Certain errors occur during the learning phase of a product, when the user is engaged in conscious decision making. If they are benign errors that disappear after learning the product, they may not matter in the long run, even though they are treated as gold in a typical usability assessment.
Other, more important types of errors are likely to occur after the learning curve, when the user is more likely to be relying on unconscious decision making. And these errors occur in the real world of distractions, deadlines, and fatigue that are rarely found in a lab. Even if they can be assessed in situ, these errors are harder to detect, because they occur quickly and are often ignored by the participants.
Errors that occur in an assessment of ease-of-recall are different still, because they include a combination of factors, including faulty memory formation, natural memory fading (which occurs at different rates for different people), and the mutation of memories over time.
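The point that natural memory fading "occurs at different rates for different people" can be made concrete with the classic exponential forgetting-curve model. This is a simplified sketch; the stability values used below are illustrative assumptions, not empirically fitted figures.

```python
import math

# Ebbinghaus-style forgetting curve: retention R = exp(-t / S), where t is
# the time since last use and S is memory "stability" (higher = slower fading).
def retention(t_days: float, stability: float) -> float:
    """Fraction of product knowledge retained after t_days away from it."""
    return math.exp(-t_days / stability)

# Two hypothetical users returning after the same 30-day gap: the user with
# lower stability has forgotten far more, so identical recall tasks are not
# measuring the same thing for both of them.
for stability in (10.0, 40.0):
    print(f"stability={stability}: retention after 30 days = "
          f"{retention(30, stability):.0%}")
```

The sketch shows only the fading factor; it deliberately ignores the other two factors named above (faulty memory formation and mutation of memories), which do not reduce to a single decay parameter.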
Subjective preference is one of the most complicated attributes of a product to evaluate. It, too, changes over time and is influenced by multiple factors. Functional suitability certainly determines whether people like a product, but it's actually rare that users base their subjective preference on ease-of-learning or even ease-of-use, unless these are dramatically outside of their expectations. In fact, a product that performs functions that users desperately want, or provides data never before accessible, will show high subjective preference, even if users have great difficulty using the product.
Visual design can influence subjective preference without having an effect on usability (though it can also negatively affect usability).
Then there are the intangibles. Products can have high functional suitability and functional discoverability, and be easy to learn and safe to operate, but still be rated poorly by users.
It would be a far simpler world if five users uncovered 80% of our usability problems for us; or if all we had to do was to interview users about their thoughts on the usability of a product; or if all we had to do was write down the comments of users as they try to use a product. Unfortunately, none of this is true.
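The "five users uncover 80%" figure traces back to a simple binomial discovery model. A minimal sketch of that model follows, assuming the commonly cited per-user detection probability of about 0.31; that value is itself an assumption, and exactly the kind of generalization this article cautions against.

```python
# Binomial problem-discovery model behind the "five users" rule of thumb:
#   found(n) = 1 - (1 - p) ** n
# where p is the probability that a single participant encounters a given
# problem (p ~= 0.31 is the oft-cited value, assumed here for illustration).

def proportion_found(n_users: int, p: float = 0.31) -> float:
    """Expected proportion of problems seen by at least one of n users."""
    return 1 - (1 - p) ** n_users

for n in (1, 3, 5, 10):
    print(f"{n} users: {proportion_found(n):.0%}")
```

Note that the model assumes every problem is equally detectable by every user, which is precisely why the 80% claim fails in practice: real detection probabilities vary widely by problem and by participant.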
The data collected from observing users is just that: data. That data is not necessarily representative, is not likely to be inclusive of the entire product, and is certainly not definitive. The ability of a trained observer to understand why a behavior might have occurred is far more important than the mere fact that it did occur.
The generalization of observed behaviors to the larger population is not based on statistical calculations, but on prior experience and predictions based on solid constructs. Mapping the attributes of the product to the specific observed or collected data on effectiveness, efficiency, and satisfaction, and then separating these facts from other facts (such as the observer effect or other test artifacts) is a rigorous task requiring both training and experience. Caveat emptor.