Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft Speech Platform
Introduction to XML Grammar Elements
An XML-format grammar that conforms to the Speech Recognition Grammar Specification (SRGS) Version 1.0 contains <rule> elements that define the speech input that a speech recognition engine will recognize. Rule elements contain the sets of words or phrases that the speech recognition engine uses to match user input, and also specify the required sequence of phrases that a user can speak. Any SRGS element or unmarked text sequence within a <rule> element is called a rule expansion.
A grammar rule must contain at least one rule expansion that contains text that a user can speak. The order in which you place elements within a rule, such as <item> elements, <token> elements, and <ruleref> elements (which contain references to other rules, including those in other grammars), determines the sequence in which the phrases must be spoken. Combining these elements allows a grammar to recognize multiple variations of word combinations.
Follow the links below to examples for the most commonly used SRGS elements:
- item element
- one-of element
- ruleref element
- tag element
- token element
Examples
The following sections describe the grammar elements that you can use to define recognizable phrases within a <rule> element in an SRGS grammar.
item element
The <item> element may contain unmarked text that can be spoken, a <one-of> element, a <ruleref> element, a <tag> element, a <token> element, or any logical combination of these. See item Element (Microsoft.Speech) for more information.
When an <item> element contains a combination of rule expansions (for example, a combination of words), the sequence of the words in that <item> element must match the sequence of the words spoken by the user for recognition to be successful. For example, given the following grammar, the spoken input must contain the phrase "metallic red" for recognition to be successful.
```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <item> metallic red </item>
  </rule>
</grammar>
```
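An <item> element can also hold a combination of rule expansions, not only plain text. The following is a minimal sketch of that case, assuming a hypothetical rule named rulePaint and illustrative phrases that are not part of the original example. The <item> element contains unmarked text, then a <one-of> element, then more unmarked text, so the spoken words must arrive in that same order.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="rulePaint"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <!-- rulePaint and its phrases are hypothetical, for illustration only. -->
  <rule id="rulePaint" scope="public">
    <!-- One item holding a sequence: text, then a choice, then more text. -->
    <item>
      paint it
      <one-of>
        <item> metallic </item>
        <item> matte </item>
      </one-of>
      red
    </item>
  </rule>
</grammar>
```

With this sketch, "paint it metallic red" and "paint it matte red" follow the required sequence, whereas "paint it red metallic" does not.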
one-of element
The <one-of> element contains a set of alternative rule expansions, any of which can be used to recognize spoken input. This increases the flexibility of the grammar by requiring that the input match only one of the alternatives. See one-of Element (Microsoft.Speech) for more information.
In the following grammar, the input must contain the initial phrase "I would like the car in" for recognition to be successful. The phrase must then be completed by exactly one of the words "red", "white", or "green".
```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <item> I would like the car in </item>
    <one-of>
      <item> red </item>
      <item> white </item>
      <item> green </item>
    </one-of>
  </rule>
</grammar>
```
ruleref element
The <ruleref> element specifies a pointer to another rule that spoken input must match as part of a successful recognition of the current rule. See ruleref Element (Microsoft.Speech) for more information.
The following example defines a <rule> element named ruleColors that contains alternative selections for colors. The root rule, buyShirt, then uses a <ruleref> element to reference the ruleColors rule twice.
```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="buyShirt"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="buyShirt" scope="public">
    <item>
      Get me a <ruleref uri="#ruleColors" /> shirt and a <ruleref uri="#ruleColors" /> tie
    </item>
  </rule>
  <rule id="ruleColors" scope="public">
    <one-of>
      <item> red </item>
      <item> white </item>
      <item> green </item>
    </one-of>
  </rule>
</grammar>
```
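A <ruleref> element can also point to a rule in another grammar, as noted in the introduction. The following is a sketch only: the grammar file location is hypothetical, and it assumes that file defines a rule named ruleColors with scope="public", because only public rules can be referenced from outside the grammar that defines them. The fragment after the # character names the rule within the external grammar.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="buyShirt"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="buyShirt" scope="public">
    <item>
      Get me a
      <!-- Hypothetical location: colors.grxml is assumed to define
           a public rule whose id is ruleColors. -->
      <ruleref uri="https://www.contoso.com/grammars/colors.grxml#ruleColors" />
      shirt
    </item>
  </rule>
</grammar>
```

Keeping shared rules such as color lists in a separate grammar lets several grammars reuse them without duplicating the list.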
tag element
The <tag> element allows the grammar author to enter additional information, called semantics, in an SRGS grammar. The speech recognition engine will return the semantics when it recognizes the phrase to which the <tag> element refers. The contents of a <tag> element can be either text or ECMAScript (JavaScript) that defines the semantics for a phrase. Your application can use the semantic results returned by the speech recognition engine to initiate actions in response to recognized phrases. See tag Element (Microsoft.Speech) for more information.
Semantics are a powerful authoring tool because they allow users to express themselves in a variety of ways, but restrict the language that the application must process. The following example of a simple yes/no grammar illustrates how a user may have multiple ways of expressing an affirmative or negative response, but thanks to semantic assignment, the application need only understand "yes" or "no".
```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" mode="voice" root="yesNo"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="yesNo">
    <one-of>
      <item>
        <tag> out="yes"; </tag>
        <one-of>
          <item> yes </item>
          <item> yep </item>
          <item> yeah </item>
          <item> affirmative </item>
        </one-of>
      </item>
      <item>
        <tag> out="no"; </tag>
        <one-of>
          <item> no </item>
          <item> nope </item>
          <item> neah </item>
          <item> negative </item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```
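Semantics can also be combined with rule references. The following sketch extends the shirt-and-tie grammar from the ruleref section, assuming the semantics/1.0 tag format in which rules.latest() returns the semantic value of the most recently matched rule reference. The property names shirt and tie are hypothetical and chosen only for illustration.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="buyShirt"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="buyShirt" scope="public">
    <item>
      Get me a
      <ruleref uri="#ruleColors" />
      <!-- Store the color matched by the preceding rule reference. -->
      <tag> out.shirt = rules.latest(); </tag>
      shirt and a
      <ruleref uri="#ruleColors" />
      <tag> out.tie = rules.latest(); </tag>
      tie
    </item>
  </rule>
  <rule id="ruleColors">
    <one-of>
      <!-- Each alternative sets the value returned by this rule. -->
      <item> red <tag> out = "red"; </tag> </item>
      <item> white <tag> out = "white"; </tag> </item>
      <item> green <tag> out = "green"; </tag> </item>
    </one-of>
  </rule>
</grammar>
```

Under these assumptions, the phrase "Get me a red shirt and a green tie" would produce a semantic result in which shirt is "red" and tie is "green", so the application can read the two colors without parsing the recognized text.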
token element
The <token> element contains a word or a short phrase that the speech recognition engine can use to perform recognition. You can use the attributes of the <token> element to specify a custom pronunciation for the contained word or an alternate display form for the word. A custom pronunciation specified inline in a grammar using the <token> element overrides the default engine pronunciation and pronunciations in application lexicons, but applies only to that single instance of the word. See token Element (Microsoft.Speech) for more information.
The following example uses the <token> element to specify a custom pronunciation for the word "etouffee" inline in the grammar.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="etouffee" xml:lang="en-US"
         tag-format="semantics/1.0" xml:base="https://www.contoso.com/"
         xmlns="http://www.w3.org/2001/06/grammar"
         sapi:alphabet="x-microsoft-ups"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <rule id="etouffee" scope="public">
    <item> Display a recipe for shrimp </item>
    <token sapi:pron="EI . T U . F EI"> etouffee. </token>
  </rule>
</grammar>
```
For more information about specifying custom pronunciations, see Using Custom Pronunciations.
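The <token> element can also specify an alternate display form, as mentioned above. The following is a minimal sketch, assuming the attribute is named sapi:display as listed in the token Element (Microsoft.Speech) reference; confirm the attribute name and behavior there before relying on it. The accented display text is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="etouffee" xml:lang="en-US"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <rule id="etouffee" scope="public">
    <item> Display a recipe for shrimp </item>
    <!-- Assumption: sapi:display sets the text returned for display
         when the contained word is recognized. -->
    <token sapi:display="étouffée"> etouffee </token>
  </rule>
</grammar>
```

In this sketch the engine still matches the spoken word "etouffee", while the application would receive the accented spelling for presentation to the user.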
Follow the links below for more information about authoring SRGS grammars:
- How to Create a Basic XML Grammar (Microsoft.Speech)
- Microsoft Grammar Development Tools
- SRGS Grammar XML Reference (Microsoft.Speech)
- Speech Recognition Grammar Specification (SRGS) Version 1.0
- Semantic Interpretation Markup (Microsoft.Speech)