Speech

Kinect for Windows 1.5, 1.6, 1.7, 1.8

Speech recognition is one of the key functionalities of the NUI API. The Kinect sensor’s microphone array is an excellent input device for speech recognition-based applications. It provides better sound quality than a comparable single microphone and is much more convenient to use than a headset. Managed applications can use the Kinect microphone with the Microsoft.Speech API, which supports the latest acoustic algorithms. The Kinect for Windows SDK includes a custom acoustic model that is optimized for the Kinect's microphone array.

Important

By default, the AdaptationOn flag in the speech engine is set to ON, which means that the speech engine is actively adapting its speech models to the current speaker(s). This can cause problems over time in noisy environments or where there are a great number of speakers (in a kiosk environment, for example). Therefore we recommend that the AdaptationOn flag be set to OFF in such applications. For more details, see here.

In addition, because the speech engine runs continuously, its RAM requirement grows over time. For long recognition sessions, we also recommend recycling the SpeechRecognitionEngine (destroying and recreating it) periodically, for example every 2 minutes, depending on your resource constraints.
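The two recommendations above can be sketched together as follows. This is a minimal illustration, not the SDK's own implementation: the RecyclingRecognizer class and its delegates are hypothetical names introduced here, and the audio format (16 kHz, 16-bit, mono PCM) is an assumption about the stream you supply.

```csharp
using System;
using System.IO;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

// Hypothetical helper (not part of the SDK): owns one engine and rebuilds it on request.
sealed class RecyclingRecognizer : IDisposable
{
    private readonly RecognizerInfo recognizer;
    private readonly Func<Grammar> createGrammar;      // builds a fresh Grammar for each new engine
    private readonly Func<Stream> openAudioStream;     // returns the audio stream to recognize from
    private readonly EventHandler<SpeechRecognizedEventArgs> onRecognized;
    private SpeechRecognitionEngine engine;

    public RecyclingRecognizer(
        RecognizerInfo recognizer,
        Func<Grammar> createGrammar,
        Func<Stream> openAudioStream,
        EventHandler<SpeechRecognizedEventArgs> onRecognized)
    {
        this.recognizer = recognizer;
        this.createGrammar = createGrammar;
        this.openAudioStream = openAudioStream;
        this.onRecognized = onRecognized;
    }

    public void Start()
    {
        engine = new SpeechRecognitionEngine(recognizer.Id);

        // Turn off acoustic model adaptation (0 = OFF), as recommended for
        // noisy or many-speaker environments.
        engine.UpdateRecognizerSetting("AdaptationOn", 0);

        engine.LoadGrammar(createGrammar());
        engine.SpeechRecognized += onRecognized;

        // Assumed stream format: 16 kHz, 16-bit, mono PCM.
        engine.SetInputToAudioStream(
            openAudioStream(),
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    // Destroys the current engine and creates a new one.
    public void Recycle()
    {
        StopEngine();
        Start();
    }

    public void Dispose()
    {
        StopEngine();
    }

    private void StopEngine()
    {
        if (engine == null)
        {
            return;
        }

        engine.SpeechRecognized -= onRecognized;
        engine.RecognizeAsyncCancel();
        engine.Dispose();
        engine = null;
    }
}
```

An application would typically call Recycle() from a timer, with an interval chosen to suit its memory budget. The sketch is not thread-safe, so Start(), Recycle(), and Dispose() are assumed to be called from a single thread.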

Note

There is a known issue regarding the default gain setting for the microphone array.

MSDN provides limited documentation for the Microsoft.Speech API. As an alternative, install the Microsoft Speech Platform - Software Development Kit (SDK) and use the installed HTML Help file (.chm), which can be found at Program Files\Microsoft Speech Platform SDK\Docs. (See the software requirements to download the Speech SDK.)

Note

To support dictation in your application, use the Kinect microphone array with the System.Speech API (instead of Microsoft.Speech, which does not support dictation). When doing so, you must use the standard Windows language pack. (The Windows Runtime Language Pack is not supported with the System.Speech API.) Be aware that the standard Windows language pack is not optimized for the Kinect microphone array and might not provide the same level of recognition accuracy.
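As a rough sketch of the dictation setup described in this note, the following uses System.Speech with its built-in DictationGrammar. The DictationExample class is a hypothetical name, the audio stream is assumed to come from the Kinect audio source, and the 16 kHz, 16-bit, mono PCM format is an assumption about that stream.

```csharp
using System;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

static class DictationExample
{
    // kinectAudio: an audio stream supplied by the caller (assumed to be
    // 16 kHz, 16-bit, mono PCM from the Kinect microphone array).
    public static SpeechRecognitionEngine StartDictation(Stream kinectAudio)
    {
        // The parameterless constructor uses the default recognizer from the
        // standard Windows language pack.
        var engine = new SpeechRecognitionEngine();

        // Free-form dictation instead of a fixed command grammar.
        engine.LoadGrammar(new DictationGrammar());

        engine.SpeechRecognized += (sender, e) =>
            Console.WriteLine("Heard: {0}", e.Result.Text);

        engine.SetInputToAudioStream(
            kinectAudio,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

        // Keep recognizing until the caller cancels or disposes the engine.
        engine.RecognizeAsync(RecognizeMode.Multiple);
        return engine;
    }
}
```

The caller owns the returned engine and should dispose it when dictation ends.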

Supported Languages for Speech Recognition

Acoustic models have been created to allow speech recognition in several locales in addition to the default locale of en-US. These are runtime components that are packaged individually and are available here. The following locales are now supported:

  • de-DE
  • en-AU
  • en-CA
  • en-GB
  • en-IE
  • en-NZ
  • es-ES
  • es-MX
  • fr-CA
  • fr-FR
  • it-IT
  • ja-JP
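To use one of these locales, an application first has to find the matching installed recognizer. The sketch below shows one way to do that. The RecognizerLookup class is a hypothetical name, and the check on the "Kinect" entry in AdditionalInfo follows the pattern used in the SDK samples, so treat it as an assumption about how the Kinect acoustic models are tagged.

```csharp
using System;
using Microsoft.Speech.Recognition;

static class RecognizerLookup
{
    // Returns the first installed Kinect recognizer for the given locale
    // (for example "fr-CA"), or null if no matching language pack is installed.
    public static RecognizerInfo FindKinectRecognizer(string cultureName)
    {
        foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
        {
            string isKinect;
            recognizer.AdditionalInfo.TryGetValue("Kinect", out isKinect);

            if (string.Equals(isKinect, "True", StringComparison.OrdinalIgnoreCase) &&
                string.Equals(recognizer.Culture.Name, cultureName, StringComparison.OrdinalIgnoreCase))
            {
                return recognizer;
            }
        }

        return null;
    }
}
```

A null result usually means the runtime language pack for that locale has not been installed, so callers should check for it before constructing a SpeechRecognitionEngine.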

In This Section

  • Speech Tasks in C#
    Here are some speech user tasks, such as identifying a speech recognition engine, creating a grammar, and using a confidence level to recognize user commands.
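As a brief, hedged preview of two of those tasks, the sketch below builds a small command grammar and filters results by confidence. The CommandGrammarExample class, the command words, and the 0.3 threshold are illustrative assumptions, not values prescribed by the SDK.

```csharp
using System;
using Microsoft.Speech.Recognition;

static class CommandGrammarExample
{
    // Illustrative threshold; tune it for your environment and grammar.
    private const double ConfidenceThreshold = 0.3;

    // Builds a grammar that recognizes a few single-word commands.
    public static Grammar BuildCommandGrammar(RecognizerInfo recognizer)
    {
        var commands = new Choices("forward", "back", "stop");

        // The grammar's culture must match the culture of the recognizer it is loaded into.
        var builder = new GrammarBuilder { Culture = recognizer.Culture };
        builder.Append(commands);

        return new Grammar(builder);
    }

    // Handler for SpeechRecognitionEngine.SpeechRecognized.
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Ignore low-confidence results to reduce false triggers.
        if (e.Result.Confidence >= ConfidenceThreshold)
        {
            Console.WriteLine("Command: {0}", e.Result.Text);
        }
    }
}
```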