Measuring user experience is not an easy task. I have strong opinions about quantifying human behavior and recognize its limitations, but I also recognize that translating a product's features into tangible, clear metrics has real advantages when communicating their impact on the overall user experience. This experiment was designed to uncover guidelines for the design of speech interfaces. I suspected that the capabilities of the Automatic Speech Recognition (ASR) in GPS devices had a big impact on UX, but by how much? Together with a fellow researcher, I took it upon myself to find out.
Project: Benchmarking of GPS devices enabled with Automatic Speech Recognition in terms of UX.
Role: Experiment design, evaluation, report
Deliverable: Guidelines, paper submission.
Note: Paper published at Interact 2011.
Garmin Nüvi 3790T (System 1), TomTom Go Live 1000 Europe (System 2), Navigon 8450 Live (System 3).
Three portable navigation devices were provided to nine participants (1 female, 8 male) in a lab environment, who performed five tasks on each device using voice commands. Each task was designed to explore a different aspect of the system. Participants had no prior experience with any of the systems or interfaces.
After interacting with each system, each participant was asked to complete a user satisfaction questionnaire scoring 34 subjective-response items on a 7-point Likert scale. This questionnaire was based on the Subjective Assessment of Speech System Interfaces (SASSI) project.
A one-way repeated measures ANOVA was used to test for preference differences among the three systems. It revealed that participants preferred the ASR of System 2 overall. One likely contributing factor was its audio tips: when participants used a wrong command, the system gave progressively more detailed hints, which was observed to help them solve the tasks. The one-shot address and Point of Interest (POI) entry of System 2 also made interaction faster, with POI search on route (task 2) taking an average of 35 seconds, compared to 81 and 146 seconds for Systems 1 and 3 respectively. However, since System 2 offered no way to refine an exploratory POI search (task 4), participants preferred the step-by-step POI search method of Systems 1 and 3 for that task.
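For readers unfamiliar with the test, the one-way repeated-measures ANOVA can be computed by partitioning the total sum of squares into condition, subject, and error components. The sketch below uses made-up placeholder scores (the real data were the participants' SASSI questionnaire responses), shown only to illustrate the computation: one row per participant, one column per system.

```python
# Sketch of a one-way repeated-measures ANOVA, as used to compare the three
# systems. Scores below are hypothetical placeholders on a 7-point scale.

def repeated_measures_anova(scores):
    """scores: one row per subject, one column per condition.
    Returns (F statistic, df_between, df_error)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of conditions (systems)
    grand = sum(sum(row) for row in scores) / (n * k)

    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition total variability: conditions + subjects + error
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error

# Hypothetical scores: 9 participants x 3 systems (not the study's data)
scores = [
    [4, 6, 3], [5, 6, 4], [4, 5, 3], [5, 7, 4], [3, 6, 2],
    [5, 6, 3], [4, 5, 4], [5, 6, 3], [4, 6, 2],
]
f_stat, df1, df2 = repeated_measures_anova(scores)
print(f"F({df1},{df2}) = {f_stat:.2f}")
```

Because every participant rated every system, the subject component is removed from the error term, which is what makes the repeated-measures design more sensitive than an independent-groups ANOVA with the same nine participants.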
Interaction with System 1 was perceived as faster than with the other two. A possible explanation is its lack of a prompt sound, which allowed a more natural, uninterrupted conversation. On the other hand, the missing audio feedback about system state left participants confused about how to proceed; they changed the way they spoke to the device, which worsened ASR performance. Having to switch between spelling aloud and typing on the keyboard during Dutch address entry (task 5) was not appreciated either.
Although System 3 provides a soft button for going one step back in the ASR menu, no voice command triggered that action, so participants resorted to the touch screen in 100% of cases to complete the tasks. This, together with its cumbersome step-by-step error correction method, may explain why the system was rated lowest.
Guidelines for design of ASR systems
- Provide voice and soft button shortcuts for canceling/restarting the ASR session.
- Support voice commands for all soft buttons.
- Provide users with immediate feedback (confirmation and/or system status) after they speak a command.
- Always ask for confirmation before performing an action that requires long loading time.
- Implement more than one voice-enabled error recovery method.
- Support audio help messages that gradually provide more information to the user as error rate increases.
- Support alternative voice commands for the same function.
- Avoid reaction time delays and prompt sounds.
- Support one-shot address entry.
- Support choice selection for POIs.
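The escalating-help guideline above can be sketched in a few lines. The message texts and class/method names here are invented for illustration; a real system would play these as audio prompts rather than return strings.

```python
# Sketch of "audio help that gradually provides more information as error
# rate increases". Prompts and names are hypothetical.

HELP_LEVELS = [
    "Sorry, I didn't get that.",
    "Sorry, I didn't get that. Try a command such as 'Navigate to an address'.",
    "Sorry, I didn't get that. You can say 'Navigate to an address', "
    "'Find a point of interest', or 'Cancel'. Say 'Help' for the full list.",
]

class SpeechSession:
    def __init__(self):
        self.error_count = 0

    def on_recognition_error(self):
        """Return the next help prompt, more detailed after each error."""
        msg = HELP_LEVELS[min(self.error_count, len(HELP_LEVELS) - 1)]
        self.error_count += 1
        return msg

    def on_recognition_success(self):
        # Recovered: go back to the terse prompt for future errors.
        self.error_count = 0

session = SpeechSession()
first = session.on_recognition_error()
second = session.on_recognition_error()
```

Resetting the counter on success keeps the terse prompt as the default, so experienced users are not punished with long help messages after a single slip.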
* Photo Credit: TomTom provided stock images for PND devices.