An examination of four user-based software evaluation methods
Ron Henderson, John Podd, Mike Smith, and Hugo Varela-Alvarez
Since the focus of the last few class periods has been evaluation, I decided to go ahead and read a paper on different methods rather than yet another hand tracking algorithm. The paper was written in 1995, but focuses on evaluation methods that are still used (data logging, questionnaires, interviews, 'verbal protocol analyses').
For their study, the authors recruited 148 participants, assigned each of them one of the four evaluation methods, and had them test one of three pieces of software (a spreadsheet, a word processor, or a database). The subjects used the software and then applied their assigned evaluation method.
Data Logging: internal software logged keystrokes with timestamps, to be examined after the test.
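The paper doesn't reproduce the logging software itself, so here's a minimal sketch of what timestamped keystroke logging could look like; all the names here (KeystrokeLogger, record, dump) are my own, not the paper's:

    import time

    class KeystrokeLogger:
        # Minimal sketch of a timestamped keystroke log; the paper's actual
        # logging software was internal and isn't described in detail.
        def __init__(self):
            self.start = time.monotonic()
            self.events = []  # (seconds_since_start, key) pairs

        def record(self, key):
            # Store each keystroke with its offset from session start.
            self.events.append((time.monotonic() - self.start, key))

        def dump(self, path):
            # Write one "timestamp<TAB>key" line per event for later analysis.
            with open(path, "w") as f:
                for t, key in self.events:
                    f.write(f"{t:.3f}\t{key}\n")

Whatever the real format was, the important property is the same: every event is recorded automatically, with no reliance on the user's memory.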
Questionnaire: used a 7-point scale with 'not applicable' and 'don't understand' options (sketched in code after this list). Questions covered topics such as:
Program self-descriptiveness
User control of the program
Ease of learning the program
Completeness of the program
Correspondence with user expectations
Flexibility in task handling
Fault tolerance
Formatting
Open-ended questions followed, asking about specific problems and calling for comments/suggestions.
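To make that response format concrete, here's a small sketch of how one such item could be scored, keeping the 'not applicable' and 'don't understand' answers out of the numeric average. The representation is my own assumption, not the paper's:

    # Sentinel values for the two non-numeric response options.
    NOT_APPLICABLE = "NA"
    DONT_UNDERSTAND = "DU"

    def mean_rating(responses):
        # Average only the 1-7 ratings, skipping NA/DU answers.
        numeric = [r for r in responses if isinstance(r, int) and 1 <= r <= 7]
        return sum(numeric) / len(numeric) if numeric else None

    # Example: responses to "Ease of learning the program".
    print(mean_rating([6, 7, NOT_APPLICABLE, 5, DONT_UNDERSTAND, 4]))  # 5.5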
Interview: semi-structured format of scripted questions, with follow-up on unique interviewee comments.
Verbal Protocol: users were videotaped while they evaluated the software, then asked to 'think aloud' as they watched the tapes play back.
Conclusions:
Data logging is nice because it's pretty much as objective as you can get; however, it's tedious to analyze (a sketch of that kind of analysis follows these conclusions).
Questionnaires can give vague results if the wording of each question is not extremely specific, and it's difficult to write questionnaires that everybody will understand completely.
Interviews are good for getting relevant information quickly, but are subject to the problem of memory decay.
The verbal protocol method tends to be good at finding problem areas because the playback reminds users of when they were having trouble with a particular exercise. However, it's very time-consuming.
The authors note that combining these methods will most likely give the best results, since each adds a unique contribution, but that combining methods is subject to diminishing returns, so blindly adding more of them is not the best approach.
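As promised above, here's a sketch of the kind of post-hoc analysis a keystroke log invites, assuming the hypothetical tab-separated format from the earlier logging sketch. Scanning for long pauses is only one of many passes an analyst would make by hand, which is where the tedium comes from:

    def long_pauses(path, threshold=5.0):
        # Flag gaps between consecutive keystrokes longer than `threshold`
        # seconds as candidate trouble spots worth inspecting by hand.
        pauses = []
        prev_t = None
        with open(path) as f:
            for line in f:
                t = float(line.split("\t")[0])
                if prev_t is not None and t - prev_t > threshold:
                    pauses.append((prev_t, t))
                prev_t = t
        return pauses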