Best Practices for Enabling Voice Recognition
4/8/2010
To be successful, voice recognition must achieve accuracy rates over 90% in normal conditions and over 80% in extreme conditions. Even a 10% drop in the success rate has a profoundly negative effect.
Devices vary widely in their ability to recognize voice due to variations in the microphone hardware, headset hardware, and audio driver.
Successful recognition requires testing the audio subsystem to ensure that, under normal usage conditions, the 16-kHz audio signal arrives at the recognizer unclipped. Clipping is not easy to detect by listening to recorded audio, but it is easy to detect with an audio analysis tool or a software waveform editor.
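The clipping check can also be automated by scanning captured samples for runs pinned at the limits of the sample range. The sketch below is a minimal illustration, not part of any recognizer API. It assumes signed 16-bit PCM samples; the names `clipped_fraction` and `make_tone`, and the three-sample run length, are hypothetical choices made for the example.

```python
import math


def clipped_fraction(samples, limit=32767, min_run=3):
    """Fraction of samples lying in runs of at least min_run consecutive
    samples whose magnitude reaches limit (assumed 16-bit full scale)."""
    if not samples:
        return 0.0
    clipped = 0
    run = 0
    for s in samples:
        if abs(s) >= limit:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # count a run that reaches the end of the capture
        clipped += run
    return clipped / len(samples)


def make_tone(amplitude, freq_hz=440, rate_hz=16000, seconds=0.1):
    """Synthetic 16-kHz sine used as stand-in audio. Values beyond full
    scale are clamped, the way a saturated converter would flatten them."""
    count = int(rate_hz * seconds)
    tone = []
    for i in range(count):
        v = amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
        v = max(-1.0, min(1.0, v))
        tone.append(int(round(v * 32767)))
    return tone


if __name__ == "__main__":
    print(clipped_fraction(make_tone(0.5)))  # tone at half of full scale
    print(clipped_fraction(make_tone(2.0)))  # tone driven past full scale
```

In practice the synthetic tone would be replaced by samples captured from the device under each usage condition; any nonzero result is worth inspecting in a waveform editor.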
Normal usage conditions
- Device held 9 inches away from the user, who is looking at the screen and talking to the device
- Device held like a phone, with the base of the unit very close to the user's mouth and the screen toward the cheek
- Device in a car cradle in a running automobile
- Device in a pocket, used with an in-ear headset and its microphone
Normal noise conditions
- A quiet office
- A noisy hallway, meeting room, or restaurant
- A car driving at highway speeds (extreme but still very common)
The microphone gain level is the most critical element to tune.
Automatic Gain Control (AGC) generally distorts the audio through gain pumping and can reduce recognition accuracy rates by as much as 20%. If AGC is provided, it is strongly recommended that it be turned off by default, with an option to turn it on.
Setting the microphone gain level too high can reduce accuracy rates by as much as 50%. High microphone gain clips the voice signal in two ways: background noise saturates the signal, and normal speech can saturate the signal as well.
The default microphone gain should produce a 75% peak-to-peak signal when the user speaks in a normal voice with the device held close to the mouth like a phone. It is also useful to provide several gain options so that the user can configure the device for the way it is used.
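The 75% target can be checked numerically on a captured utterance. The sketch below assumes that the figure refers to the share of the converter's full-scale range (signed 16-bit here) spanned by the signal. That reading, and the names `peak_to_peak_percent` and `gain_change_db`, are assumptions made for illustration.

```python
import math


def peak_to_peak_percent(samples, bits=16):
    """Peak-to-peak swing of the capture as a percentage of the full
    range of a signed converter with the given bit depth (assumed)."""
    full_range = 2 ** bits - 1  # 65535 for 16-bit: -32768..32767
    return 100.0 * (max(samples) - min(samples)) / full_range


def gain_change_db(measured_percent, target_percent=75.0):
    """Gain change in dB that would scale the measured swing to the
    target, assuming linear gain and an unclipped capture."""
    return 20.0 * math.log10(target_percent / measured_percent)


if __name__ == "__main__":
    capture = [-12288, 0, 12287]  # made-up samples, not a measurement
    level = peak_to_peak_percent(capture)
    print(level, gain_change_db(level))
```

A positive result from `gain_change_db` suggests raising the default gain and a negative result suggests lowering it. The estimate is only meaningful when the capture itself is free of clipping.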
To eliminate saturation caused by low-frequency background noise, use an analog high-pass filter at 200 Hz between the microphone and the analog-to-digital converter. Omitting this filter severely increases audio clipping in noisy environments such as a car.
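For a sense of the component values involved, the sketch below assumes a first-order RC high-pass stage, which the text does not specify. The filter order, the 100 nF capacitor, and the function names are illustrative assumptions. The calculation uses the standard relation f_c = 1 / (2 * pi * R * C).

```python
import math


def resistor_for_cutoff(cutoff_hz, capacitance_f):
    """Resistance in ohms giving the requested -3 dB cutoff for a
    first-order RC high-pass filter (assumed topology)."""
    return 1.0 / (2.0 * math.pi * cutoff_hz * capacitance_f)


def highpass_gain_db(freq_hz, cutoff_hz=200.0):
    """Magnitude response in dB of an ideal first-order high-pass
    filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))


if __name__ == "__main__":
    r = resistor_for_cutoff(200.0, 100e-9)  # 100 nF is an example value
    print(f"R = {r:.0f} ohms")
    for f in (50.0, 100.0, 200.0, 1000.0):
        print(f"{f:6.0f} Hz: {highpass_gain_db(f):6.2f} dB")
```

With a 100 nF capacitor the resistor comes out at roughly 8 kilohms. A single RC stage rolls off gently below the cutoff, so a real design may call for a higher-order filter; that choice depends on the noise measured on the actual hardware.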
The microphone should be most sensitive to sound originating in front of the device. Take care that the docking cradle, carrying case, user's hand, or other obstructions do not block sound from entering the microphone.
Consider offering a car cradle that provides a quality external microphone and loudspeaker output into the car stereo system, so users can use voice recognition with your device in their cars.
To enable selecting songs by voice and controlling the media player while the device is in a pocket or resting in a car, provide a stereo headset jack that includes a microphone circuit.
In addition, consider providing a button on your headset that can be user-configured to control a voice recognition program.
Test all aspects of the device's microphone and audio circuitry with a voice recognition program such as Microsoft Voice Command under the usage conditions above to ensure that the device achieves greater than 90% accuracy.
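Results from such a test run can be tallied against the 90% and 80% targets stated at the top of this article. The following sketch is illustrative only. The record layout, the condition labels, and the counts are made up for the example and are not measurements.

```python
# Accuracy targets from this article: over 90% in normal conditions,
# over 80% in extreme conditions.
THRESHOLDS = {"normal": 0.90, "extreme": 0.80}


def evaluate(results):
    """Take (name, kind, correct, total) records and return
    (name, accuracy, passed) tuples. The record layout is hypothetical."""
    report = []
    for name, kind, correct, total in results:
        accuracy = correct / total
        # "Over" the target is read as strictly greater than.
        report.append((name, accuracy, accuracy > THRESHOLDS[kind]))
    return report


if __name__ == "__main__":
    sample_run = [  # invented counts for illustration
        ("phone-style hold, quiet office", "normal", 95, 100),
        ("car cradle, highway speed", "extreme", 78, 100),
    ]
    for name, accuracy, passed in evaluate(sample_run):
        print(f"{name}: {accuracy:.0%} {'pass' if passed else 'FAIL'}")
```

Keeping one record per combination of usage condition and noise condition makes it easier to see which scenario pulls the overall accuracy down.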