
Speech Tasks in C#

Kinect for Windows 1.5, 1.6, 1.7, 1.8

Overview

This page describes basic speech user tasks, such as identifying a speech recognition engine, creating a grammar, and using a confidence level to recognize user commands.

Important

By default, the AdaptationOn flag in the speech engine is set to ON, which means that the speech engine actively adapts its speech models to the current speaker(s). Over time this can cause problems in noisy environments, or where there are a great number of speakers (in a kiosk environment, for example). We therefore recommend setting the AdaptationOn flag to OFF in such applications.

In addition, when the speech engine runs continuously, its memory usage grows. For long recognition sessions, we also recommend periodically recycling (destroying and recreating) the SpeechRecognitionEngine, for example every 2 minutes, depending on your resource constraints.
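The recommendation above can be sketched as follows. This is a minimal sketch, assuming the engine supports the "AdaptationOn" setting (0 = off); the setting name and its availability may vary by recognizer version, so verify against your installed recognizer.

```csharp
// Sketch: turn off acoustic-model adaptation on an existing engine,
// as recommended for noisy or multi-speaker (kiosk) scenarios.
private static void DisableAdaptation(SpeechRecognitionEngine engine)
{
    // 0 disables ongoing adaptation to the current speaker(s).
    engine.UpdateRecognizerSetting("AdaptationOn", 0);
}
```

Call this once after constructing the SpeechRecognitionEngine and before starting recognition.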

Code It

Get the most suitable speech recognizer (acoustic model)

  private static RecognizerInfo GetKinectRecognizer()
  {
      foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
      {
          string value;
          recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
          if ("True".Equals(value, StringComparison.OrdinalIgnoreCase) &&
              "en-US".Equals(recognizer.Culture.Name, StringComparison.OrdinalIgnoreCase))
          {
              return recognizer;
          }
      }
            
      return null;
  }
      

Create a speech recognition engine

  private SpeechRecognitionEngine speechEngine;

  RecognizerInfo ri = GetKinectRecognizer();

  if (null != ri)
  {
    this.speechEngine = new SpeechRecognitionEngine(ri.Id);
  }
      

The speech recognition engine uses audio data from the Kinect sensor.

Create and load a grammar

  /****************************************************************
  * Use this code to create grammar programmatically rather than from
  * a grammar file.
  * var directions = new Choices();
  * directions.Add(new SemanticResultValue("forward", "FORWARD"));
  * directions.Add(new SemanticResultValue("forwards", "FORWARD"));
  * directions.Add(new SemanticResultValue("straight", "FORWARD"));
  * directions.Add(new SemanticResultValue("backward", "BACKWARD"));
  * directions.Add(new SemanticResultValue("backwards", "BACKWARD"));
  * directions.Add(new SemanticResultValue("back", "BACKWARD"));
  * directions.Add(new SemanticResultValue("turn left", "LEFT"));
  * directions.Add(new SemanticResultValue("turn right", "RIGHT"));
  * var gb = new GrammarBuilder { Culture = ri.Culture };
  * gb.Append(directions);
  * var g = new Grammar(gb);
  ****************************************************************/

  // Create a grammar from grammar definition XML file.
  using (var memoryStream = new MemoryStream(Encoding.ASCII.GetBytes(Properties.Resources.SpeechGrammar)))
  {
      var g = new Grammar(memoryStream);
      speechEngine.LoadGrammar(g);
  }
      

There is a known issue with support for standard numbers and dates, which may require changes to grammars built with a Beta version of the SDK.
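For reference, the grammar definition file loaded above might look like the following. This is a hedged sketch of a minimal SRGS XML grammar, written to mirror the commands in the commented-out Choices example; your actual SpeechGrammar.xml resource may differ.

```xml
<grammar version="1.0" xml:lang="en-US" root="rootRule"
         tag-format="semantics/1.0-literals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <!-- Several phrasings map to one semantic tag. -->
      <item>
        <tag>FORWARD</tag>
        <one-of><item>forward</item><item>forwards</item><item>straight</item></one-of>
      </item>
      <item>
        <tag>BACKWARD</tag>
        <one-of><item>backward</item><item>backwards</item><item>back</item></one-of>
      </item>
      <item><tag>LEFT</tag>turn left</item>
      <item><tag>RIGHT</tag>turn right</item>
    </one-of>
  </rule>
</grammar>
```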

Initialize the speech recognition engine

  speechEngine.SpeechRecognized += SpeechRecognized;
  speechEngine.SpeechRecognitionRejected += SpeechRejected;

  // The Kinect audio stream is 16-kHz, 16-bit, mono PCM (32,000 bytes/sec).
  speechEngine.SetInputToAudioStream(
      sensor.AudioSource.Start(), new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
                    
  speechEngine.RecognizeAsync(RecognizeMode.Multiple);
      

Add a SpeechRecognized event handler

  private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    // Speech utterance confidence below which we treat speech as if it hadn't been heard
    const double ConfidenceThreshold = 0.3;

    if (e.Result.Confidence >= ConfidenceThreshold)
    {
      ...
    }
  }
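Inside the confidence check, a handler typically branches on the recognized phrase's semantic value. A sketch, assuming a grammar that attaches the FORWARD/BACKWARD/LEFT/RIGHT tags from the Choices example earlier on this page:

```csharp
// Sketch of a handler body that branches on the semantic tag
// attached by the grammar (tags assumed from the Choices example).
switch (e.Result.Semantics.Value.ToString())
{
    case "FORWARD":
        // Drive forward.
        break;
    case "BACKWARD":
        // Drive backward.
        break;
    case "LEFT":
        // Turn left.
        break;
    case "RIGHT":
        // Turn right.
        break;
}
```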
      

Shut down the speech recognition engine

  if (null != this.sensor)
  {
      this.sensor.AudioSource.Stop();

      this.sensor.Stop();
      this.sensor = null;
  }

  if (null != this.speechEngine)
  {
      this.speechEngine.SpeechRecognized -= SpeechRecognized;
      this.speechEngine.SpeechRecognitionRejected -= SpeechRejected;
      this.speechEngine.RecognizeAsyncStop();
  }      

For additional examples, see the SpeechBasics samples (Speech Basics-WPF C# Sample, Speech Basics-D2D C++ Sample, and Speech Basics-WPF-VB Sample), which show how to use the Kinect sensor’s microphone array with the Microsoft.Speech API to recognize voice commands. Also see the AudioCaptureRaw-Console C++ Sample, which demonstrates how to capture an audio stream from the Kinect sensor’s microphone array.