Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Purpose of Grammars (Microsoft.Speech)
A speech recognition grammar is a container of language rules that define a set of constraints that a speech recognizer can use to perform recognition. These rules provide the guidelines that an application uses when collecting spoken input.
A grammar helps an application perform speech recognition in the following ways:
Limits Vocabulary. A grammar contains only the exact words or phrases that an application needs to match for successful recognition of spoken input. An application might need to recognize only the few words that appear in a grammar structure; therefore, the speech recognition engine does not need to search the entire dictionary. Explicitly providing words in a grammar also improves recognition accuracy, because the speech recognition engine must process speech only to the extent of confirming a match.
Customizes Vocabulary. A grammar customizes a vocabulary for a particular application. Although grammar structures need to be flexible and accommodate a multitude of phrases and phrasing, grammar structures also need to restrict the spoken input to a specific situation or task. Each application has its own natural language. A coffee ordering system, for example, concentrates on language used to order coffee, not language used to order airline tickets.
Filters Recognition Results. A grammar filters the results of recognition that are sent to an application. The speech recognition engine processes all audio signals it receives, regardless of what is contained in the grammars. The engine identifies and matches a word or phrase in the spoken input with a word or phrase defined in the grammar. The advantage of a grammar is that the speech recognition engine returns a successful recognition event only if the grammar is matched. Without a grammar, the application would receive many additional recognition results, few of which would have meaning to the application.
Identifies Rules. Each rule in a grammar contains an identifier (ID) that is unique within the grammar. When a successful recognition occurs, the speech recognition engine processes the rule ID as part of the recognition result and passes this information back to the speech application.
A grammar may define all the variations for expressing a specific intent in a single rule, which has a unique name. For example, an application may allow a customer to order coffee by saying any of the following:
"I would like a coffee"
"I'd like coffee"
"Get me a coffee"
"Coffee please"
Each phrase is different, but the intent is the same: the customer wants coffee. It makes no difference which phrase in the rule is actually spoken. If the spoken phrase is defined within that rule, the application considers the rule successfully matched. The speech recognition engine returns the recognition result to the application with a single rule name. The application uses that name to process the coffee order. For more information, see Grammar Rules (Microsoft.Speech).
Defines Semantics. A grammar can assign an alternate meaning (semantic) to a word or phrase used to recognize spoken input. The semantic meaning of a word or phrase can be more useful to an application than its literal content. For example, a grammar may assign a semantic meaning of "yes" to any of the speech inputs "yeah", "yup", "yep", "affirmative", or "uh-huh". The semantics assigned to a word or phrase by a grammar are returned in the recognition result when that word or phrase is recognized. Semantics allow an application to identify and parse the text returned by recognition. For more information, see Using the tag Element (Microsoft.Speech), Referencing Grammar Rule Variables (Microsoft.Speech), SML Output Overview (Microsoft.Speech), and Add Semantics to a GrammarBuilder Grammar (Microsoft.Speech).
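The roles described above (filtering results, identifying rules, and defining semantics) can be illustrated with a small conceptual model. The following Python sketch is not the Microsoft.Speech API and does not process audio. It assumes the input has already been transcribed to text, and the names Rule, Grammar, RecognitionResult, normalize, and recognize are hypothetical, chosen only for this illustration. The rule IDs "orderCoffee" and "confirm" are likewise assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical model of a grammar rule: an ID plus the phrases it accepts."""
    rule_id: str             # must be unique within the grammar
    phrases: Dict[str, str]  # normalized phrase -> semantic value


@dataclass
class RecognitionResult:
    """What this sketch hands back to the application on a match."""
    rule_id: str
    text: str
    semantic: str


def normalize(utterance: str) -> str:
    """Lowercase the text and drop punctuation, keeping apostrophes and hyphens."""
    cleaned = re.sub(r"[^a-z'\- ]", "", utterance.lower())
    return " ".join(cleaned.split())


class Grammar:
    def __init__(self, rules: List[Rule]) -> None:
        ids = [rule.rule_id for rule in rules]
        if len(ids) != len(set(ids)):
            raise ValueError("rule IDs must be unique within a grammar")
        self.rules = list(rules)

    def recognize(self, utterance: str) -> Optional[RecognitionResult]:
        """Return a result only when the utterance matches a phrase in some rule."""
        text = normalize(utterance)
        for rule in self.rules:
            if text in rule.phrases:
                return RecognitionResult(rule.rule_id, text, rule.phrases[text])
        return None  # out-of-grammar input is filtered out


grammar = Grammar([
    Rule("orderCoffee", {
        "i would like a coffee": "coffee",
        "i'd like coffee": "coffee",
        "get me a coffee": "coffee",
        "coffee please": "coffee",
    }),
    Rule("confirm", {
        "yeah": "yes",
        "yup": "yes",
        "yep": "yes",
        "affirmative": "yes",
        "uh-huh": "yes",
    }),
])

for spoken in ["Get me a coffee", "Coffee, please", "Uh-huh", "Book a flight to Paris"]:
    print(spoken, "->", grammar.recognize(spoken))
```

In this model, every coffee phrasing yields the same rule ID, so the application can branch on the rule name rather than on the literal words, and every affirmative phrasing carries the semantic value "yes". Input that matches no rule returns None, which mirrors how a grammar keeps unrelated speech from reaching the application as a successful recognition.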