Starting speech recognition for Windows Phone 8
[ This article is for Windows Phone 8 developers. If you’re developing for Windows 10, see the latest documentation. ]
The Windows.Phone.Speech.Recognition API contains two speech recognizers. Each one has its own method for starting a recognition operation.
SpeechRecognizerUI.RecognizeWithUIAsync() starts recognition for a SpeechRecognizerUI object.
SpeechRecognizer.RecognizeAsync() starts recognition for a SpeechRecognizer object.
A recognition operation may be initiated by your app logic or by user input, such as pressing a button.
Using the RecognizeWithUIAsync method
On each call to SpeechRecognizerUI.RecognizeWithUIAsync(), the SpeechRecognizerUI object attempts to match one utterance from the user to an enabled grammar in the speech recognizer's grammar set. To help the user succeed with speech recognition, the RecognizeWithUIAsync() method provides a GUI with screens that prompt the user, provide feedback, and let the user initiate multiple recognition attempts.
If a recognition attempt fails, the SpeechRecognizerUI object presents an error handling screen that lets the user initiate another recognition attempt, repeatedly if necessary.
If a user's utterance can be matched with similar confidence to more than one phrase defined in enabled grammars, the SpeechRecognizerUI object presents a "Did you say" screen that lets the user choose from among the possible matches.
You can use the properties of the SpeechRecognizerUISettings class to customize the screens that prompt the user and provide confirmation. See Presenting prompts, confirmations, and disambiguation choices for Windows Phone 8 for more info.
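For instance, a minimal sketch (assuming an already-initialized SpeechRecognizerUI instance named recoUI, a hypothetical name) that sets the prompt text and turns on readout and confirmation might look like this:

// A minimal sketch, assuming an initialized SpeechRecognizerUI named recoUI (hypothetical name).
recoUI.Settings.ListenText = "Which city?";          // Heading shown on the Listening screen.
recoUI.Settings.ExampleText = " 'Rome', 'Tokyo' ";   // Example input shown beneath the heading.
recoUI.Settings.ReadoutEnabled = true;               // Speak the recognized text back to the user.
recoUI.Settings.ShowConfirmation = true;             // Show the confirmation screen after recognition.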
RecognizeWithUIAsync method example
The following example shows a use of SpeechRecognizerUI.RecognizeWithUIAsync() and the preparation of the text for the GUI screens in an app that recognizes city names.
public partial class MainPage : PhoneApplicationPage
{
    // Declare the SpeechRecognizerUI object at the class level.
    SpeechRecognizerUI recoWithUI;

    // Construct the main page.
    public MainPage()
    {
        InitializeComponent();

        // Initialize objects ahead of time to avoid delays when starting recognition.
        recoWithUI = new SpeechRecognizerUI();

        // Set the path to an SRGS-compliant XML file.
        Uri citiesGrammar = new Uri("ms-appx:///CitiesList.grxml", UriKind.Absolute);

        // Add the SRGS grammar to the grammar set.
        recoWithUI.Recognizer.Grammars.AddGrammarFromUri("cities", citiesGrammar);

        // Let the user know what to say.
        recoWithUI.Settings.ListenText = "Fly to what city?";

        // Give an example of expected input.
        recoWithUI.Settings.ExampleText = " 'Barcelona', 'Montreal', 'Santiago' ";
    }

    // Handle the button click event.
    private async void Reco1_Click(object sender, RoutedEventArgs e)
    {
        // Start recognition.
        SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
    }
}
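In a real app you would typically check the outcome before using the result; RecognizeWithUIAsync() returns a SpeechRecognitionUIResult whose ResultStatus property indicates how the operation ended. A minimal sketch of such a check, which could follow the await in the Reco1_Click handler above:

// A minimal sketch; could be placed after the await in Reco1_Click above.
if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
{
    // Use the recognized phrase, for example to confirm the chosen city.
    MessageBox.Show("Flying to: " + recoResult.RecognitionResult.Text);
}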
The following code example shows the corresponding CitiesList.grxml SRGS grammar file.
<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="en-US" root="Main" tag-format="semantics/1.0" version="1.0" xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Defines an SRGS grammar for choosing a city for flight departures and destinations. -->
  <rule id="Main">
    <item repeat="0-1"> I want to fly to </item>
    <ruleref uri="#Cities" />
  </rule>

  <rule id="Cities" scope="public">
    <one-of>
      <item> Seattle </item>
      <item> Sao Paulo </item>
      <item> Rome </item>
      <item> Tokyo </item>
      <item> Barcelona </item>
      <item> Montreal </item>
      <item> Santiago </item>
    </one-of>
  </rule>
</grammar>
Using the RecognizeAsync method
On each call to SpeechRecognizer.RecognizeAsync(), the SpeechRecognizer object attempts to match one utterance from a user to an enabled grammar in a speech recognizer's grammar set. If any of the following occurs, the recognition operation is finalized:
Recognition was successful. The user's utterance was matched to a phrase in an enabled grammar.
Recognition was unsuccessful. The user's utterance could not be matched to a phrase in an enabled grammar.
A recognition timeout interval expired. If the interval set by SpeechRecognizerSettings.InitialSilenceTimeout, SpeechRecognizerSettings.BabbleTimeout, or SpeechRecognizerSettings.EndSilenceTimeout expires, the recognition operation is finalized with or without a match to an enabled grammar.
For more info about timeout settings, see Customizing speech recognizer settings for Windows Phone 8.
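For instance, a minimal sketch (assuming an initialized SpeechRecognizer named myRecognizer, as in the example below) that adjusts the timeout intervals before starting recognition:

// A minimal sketch; each timeout is a TimeSpan on the recognizer's Settings object.
myRecognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0); // Silence allowed before speech starts.
myRecognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);         // Non-speech noise tolerated before giving up.
myRecognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);     // Trailing silence that ends the utterance.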
You will typically use SpeechRecognizer.RecognizeAsync() if you don't require a GUI to support the recognition operation, or if you provide your own custom GUI.
RecognizeAsync method example
The following example shows a use of SpeechRecognizer.RecognizeAsync() that provides a recognition flow similar to what a user would experience with the GUI screens included in a call to SpeechRecognizerUI.RecognizeWithUIAsync().
This app excerpt includes code to check the results of recognition attempts and to present prompts and confirmations to the user in a text block. The app also subscribes to the SpeechRecognizer.AudioCaptureStateChanged event and uses the info returned by the event to display "Listening" and "Thinking" messages to the user. In addition to the checks that a call to SpeechRecognizerUI.RecognizeWithUIAsync() performs for you, this example checks whether speech input was recognized with low confidence and prompts the user to speak again. The user can initiate recognition with a tap of a button.
public partial class MainPage : PhoneApplicationPage
{
    // Declare the SpeechRecognizer object at the class level.
    SpeechRecognizer myRecognizer;

    // Construct the main page.
    public MainPage()
    {
        InitializeComponent();

        // Initialize the SpeechRecognizer and add the WebSearch grammar.
        myRecognizer = new SpeechRecognizer();
        myRecognizer.Grammars.AddGrammarFromPredefinedType("citySearch", SpeechPredefinedGrammar.WebSearch);

        // Prompt the user for a city name.
        displayText.Text = "What's your destination city?";

        // Subscribe to the AudioCaptureStateChanged event.
        myRecognizer.AudioCaptureStateChanged += myRecognizer_AudioCaptureStateChanged;
    }

    // Detect capture state changes and write the capture state to the text block.
    void myRecognizer_AudioCaptureStateChanged(SpeechRecognizer sender, SpeechRecognizerAudioCaptureStateChangedEventArgs args)
    {
        if (args.State == SpeechRecognizerAudioCaptureState.Capturing)
        {
            this.Dispatcher.BeginInvoke(delegate { displayText.Text = "Listening"; });
        }
        else if (args.State == SpeechRecognizerAudioCaptureState.Inactive)
        {
            this.Dispatcher.BeginInvoke(delegate { displayText.Text = "Thinking"; });
        }
    }

    // Handle the button click event.
    private async void Reco1_Click(object sender, RoutedEventArgs e)
    {
        // Start recognition.
        SpeechRecognitionResult recoResult = await myRecognizer.RecognizeAsync();

        // Check to see if speech input was rejected and prompt the user.
        if (recoResult.TextConfidence == SpeechRecognitionConfidence.Rejected)
        {
            displayText.Text = "Sorry, didn't catch that. \n\nSay again.";
        }
        // Check to see if speech input was recognized with low confidence and prompt the user to speak again.
        else if (recoResult.TextConfidence == SpeechRecognitionConfidence.Low)
        {
            displayText.Text = "Not sure what you said. \n\nSay again.";
        }
        // Check to see if speech input was recognized and confirm the result.
        else if (recoResult.TextConfidence == SpeechRecognitionConfidence.High ||
                 recoResult.TextConfidence == SpeechRecognitionConfidence.Medium)
        {
            displayText.Text = "Heard you say: \n\n" + recoResult.Text;
        }
    }
}