

Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

grammar Element (Microsoft.Speech)

Specifies the top-level container for an XML grammar definition. This element is required in every valid grammar.

Syntax

<grammar
   mode = (voice | dtmf)
   root = "string"
   sapi:alphabet = (ipa | x-microsoft-ups | x-microsoft-sapi)
   tag-format = (semantics/1.0 | semantics-ms/1.0 | semantics/1.0-literals)
   version = "1.0"
   xml:base = "grammarBaseUri"
   xml:lang = "language code-country/region code"
   xmlns = "http://www.w3.org/2001/06/grammar"
   xmlns:sapi = "https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
</grammar>

Attributes

Attribute

Description

mode

Optional. Specifies the input mode of the grammar. The mode can be one of the following values:

  • voice for spoken input

  • dtmf for dual tone multi-frequency (DTMF) input

If omitted, the default value is voice.
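As an illustration of mode, the following sketch declares a DTMF grammar that accepts a single keypad press. The rule id MenuChoice and the choice of keys are hypothetical. xml:lang is omitted because, as described under xml:lang, it is optional when mode is dtmf. This only illustrates the markup; it does not show how an application delivers DTMF input to a recognizer.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" mode="dtmf" root="MenuChoice"
   xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Hypothetical rule: accepts one keypress of 1, 2, or 3 -->
  <rule id="MenuChoice">
    <one-of>
      <item>1</item>
      <item>2</item>
      <item>3</item>
    </one-of>
  </rule>
</grammar>
```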

root

Optional, but recommended. Specifies the name (the value of the id attribute) of the rule that is active when a speech recognition engine loads the grammar. If root is omitted, the grammar passes validation checks and compiles, but does not trigger recognition. The rule named as the root rule must be defined in the same grammar document. The root rule can have either public or private scope.
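A minimal sketch of the root attribute: root names a rule by its id, and that rule is defined in the same document. The rule id Colors and its words are hypothetical; the rule is explicitly marked private to show that a private rule can serve as the root.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="Colors"
   xmlns="http://www.w3.org/2001/06/grammar">
  <!-- root="Colors" matches the id of the rule below -->
  <rule id="Colors" scope="private">
    <one-of>
      <item>red</item>
      <item>green</item>
      <item>blue</item>
    </one-of>
  </rule>
</grammar>
```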

sapi:alphabet

Required if the grammar uses the sapi:pron attribute of the token Element (Microsoft.Speech). Specifies the phonetic alphabet used for pronunciations defined in the sapi:pron attribute. Valid values are ipa, x-microsoft-ups, and x-microsoft-sapi. When using sapi:alphabet, the grammar element must also contain the following namespace declaration: xmlns:sapi="https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
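The following sketch shows how sapi:alphabet, the xmlns:sapi declaration, and sapi:pron fit together, using the namespace value given above. The rule id, the word, and especially the IPA pronunciation string are hypothetical; the exact phone-string format a recognizer expects for each alphabet is an assumption here, so check the token element documentation before relying on it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="Fruit"
   sapi:alphabet="ipa"
   xmlns="http://www.w3.org/2001/06/grammar"
   xmlns:sapi="https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <rule id="Fruit">
    <!-- Illustrative only: custom IPA pronunciation for the word "tomato" -->
    <token sapi:pron="təˈmɑːtoʊ">tomato</token>
  </rule>
</grammar>
```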

tag-format

Required if the grammar contains tag elements. Specifies the content type of all tag elements in the grammar. This attribute takes one of the following values:

  • semantics/1.0 declares that the content within tag elements is ECMAScript.

  • semantics-ms/1.0 declares that the content within tag elements is ECMAScript as implemented by Microsoft.

  • semantics/1.0-literals declares that the content within tag elements is a literal value: a boolean, an integer, a float, or a string. Do not enclose a string in double quotation marks.
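To make the difference between the tag-format values concrete, here is a sketch of a grammar that uses tag-format="semantics/1.0", where each tag contains ECMAScript that assigns a value to the rule variable out. The rule id Drink and the values COFFEE and TEA are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="Drink"
   tag-format="semantics/1.0"
   xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="Drink">
    <one-of>
      <!-- ECMAScript: assign a string to the rule variable -->
      <item>coffee <tag>out = "COFFEE";</tag></item>
      <item>tea <tag>out = "TEA";</tag></item>
    </one-of>
  </rule>
</grammar>
```

With tag-format="semantics/1.0-literals", the same tags would contain only the bare literal, for example <tag>COFFEE</tag>, with no quotation marks, assignment, or semicolon.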

version

Required. Specifies the version number of the Speech Recognition Grammar Specification used. The only accepted value is 1.0.

xml:base

Optional. Specifies a grammar document's base Uniform Resource Identifier (URI). The value of xml:base is used to resolve relative URIs in the grammar document. For example, suppose a grammar file declares:
xml:base="https://www.contoso.com/"
and contains a relative reference to another document, such as:
<ruleref uri="ExternalGrammar.grxml"/>
The relative reference then resolves to the following absolute URI:
https://www.contoso.com/ExternalGrammar.grxml
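The xml:base scenario described above looks like this in a complete grammar. This is a sketch: the rule id Main is hypothetical, and ExternalGrammar.grxml is assumed to exist at the resolved location. Because the uri has no #fragment, the ruleref refers to the root rule of the external grammar.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="Main"
   xml:base="https://www.contoso.com/"
   xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="Main">
    <!-- Resolves to https://www.contoso.com/ExternalGrammar.grxml -->
    <ruleref uri="ExternalGrammar.grxml"/>
  </rule>
</grammar>
```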

xml:lang

Required if the value of the mode attribute is voice (including when mode is omitted, because voice is the default); optional if the value of the mode attribute is dtmf. Declares the single language for the content of the containing grammar document. The value can be a lowercase two-letter language code, such as "en" for English or "fr" for French, optionally followed by an uppercase country/region code or other variant. Examples with a country/region code include "es-US" for Spanish as spoken in the US and "fr-CA" for French as spoken in Canada. See the Remarks section for additional information.

xmlns

Required. Specifies the XML namespace for W3C speech recognition grammars. The XML namespace is http://www.w3.org/2001/06/grammar.

xmlns:sapi

Required if the grammar uses any Microsoft-proprietary extensions to SRGS, such as the sapi:alphabet attribute of the grammar element or the sapi:pron attribute of the token element.

The value must be https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions.

Remarks

The model and syntax indicated by the tag-format values semantics/1.0 and semantics/1.0-literals are defined in the W3C Recommendation Semantic Interpretation for Speech Recognition (SISR) Version 1.0. The tag-format value semantics-ms/1.0 indicates a model and syntax defined by Microsoft. See Support for Semantic Markup (Microsoft.Speech) for more information.

The content of tag elements in a grammar must be of the type declared in the grammar element's tag-format attribute. Using string literal syntax when the value of tag-format is semantics/1.0 or semantics-ms/1.0 generally results in a runtime error. Using ECMAScript syntax when the value of tag-format is semantics/1.0-literals does not produce a runtime error, but it erroneously populates Rule Variables with the ECMAScript code itself rather than with the values that code would compute. See tag Element (Microsoft.Speech) for more information and for examples of the syntax for each tag-format value.

For a given language code declared in the xml:lang attribute, a Runtime Language that supports that language code must be installed for the grammar to be loaded successfully. The Microsoft Speech Platform Runtime 11 and Microsoft Speech Platform SDK 11 do not include any Runtime Languages. You must download a Runtime Language for each language on which you want to perform speech recognition. A Runtime Language includes the language model, acoustic model, and other data necessary to provision a speech engine to perform speech recognition in a particular language.

The Runtime Languages are different for each version of the Speech Platform Runtime. You must download the Runtime Language version that matches the version of the Speech Platform Runtime that you have installed. The Runtime Languages for the Speech Platform SDK 11 are redistributable and are different from the languages that ship with Windows Vista or Windows 7. Use the following link to download Runtime Languages for version 11 of the Speech Platform Runtime:

See Language Support for a list of languages for which you can download Runtime Languages.

The Speech Platform SDK 11 accepts all valid language-country/region codes. If the xml:lang attribute of the grammar element specifies only a language code, without a country/region code (such as xml:lang="en"), any installed recognizer that supports that generic, region-independent language can load the grammar. See Language Identifier Constants and Strings for a comprehensive list of language codes.

The Speech Platform SDK 11 does not currently support grammars that specify multiple languages. This is a departure from the Speech Recognition Grammar Specification (SRGS) Version 1.0, which allows a grammar processor to optionally support multiple languages. For example, the SDK does not accept the following grammar, which declares en-GB for the grammar but fr-FR for one of its items:

<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-GB" xmlns="http://www.w3.org/2001/06/grammar" root="Digits">
  <rule id="Digits">
    <one-of>
      <item xml:lang="fr-FR"> deux </item>
    </one-of>
  </rule>
</grammar>

To support multiple languages in your application, you can use multiple grammars in parallel, each declaring a single language. An application's recognition engine can load one or more grammar files and enable or disable each of them independently.
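One way to restructure the unsupported mixed-language example is to move the French item into its own grammar whose xml:lang is fr-FR, as sketched below. The file name DigitsFr.grxml is hypothetical. English items would go in a separate grammar that declares only en-GB, and, as noted earlier, a Runtime Language for each declared language must be installed for the corresponding grammar to load.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical file DigitsFr.grxml: contains only French content -->
<grammar version="1.0" xml:lang="fr-FR" root="Digits"
   xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="Digits">
    <one-of>
      <item> deux </item>
    </one-of>
  </rule>
</grammar>
```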

Note

The Speech Platform SDK 11 does support multiple languages in Speech Synthesis Markup Language (SSML) documents used to create prompts for synthesized speech. See speak Element (Microsoft.Speech).

Example

The following is an example of a simple grammar that declares every attribute of the grammar element except sapi:alphabet:

<?xml version="1.0" encoding="utf-8"?>
<grammar
   version="1.0" mode="voice" root="Welcome"
   tag-format="semantics/1.0" xml:lang="en-US"
   xml:base="https://www.contoso.com/"
   xmlns="http://www.w3.org/2001/06/grammar"
   xmlns:sapi="https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">

   <rule id="Welcome">
      <item>
         Welcome to the managed code API for speech on servers.
      </item>
   </rule>

</grammar>