
Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Confusability Input and Output File Format

The Confusability tool accepts as input a grammar file and a configuration file for a recognition engine. Optionally, you can specify the identifier of a rule in the grammar, or supply a list of phrases to analyze in a separate XML document. The tool analyzes the entire grammar, the specified rule within the grammar, or the supplied list of phrases. It writes a summary of the analysis to the console and writes the complete results to an XML file.

Input File Format

Confusability accepts the following file formats for input grammar files:

  • Grammar documents in XML format that conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0, for example “MyGrammar.grxml”. It is an accepted convention to use the “.grxml” file extension for XML-based grammar documents that conform to the SRGS specification.

  • Compiled grammar files in CFG format, for example “MyCompiledGrammar.cfg”. See the Compile Grammar Reference Manual for information about compiling grammar files.

Note

Input files in other formats will generate an error.

Example Input Grammar File

The following is an example of a valid input grammar document in XML format that contains only the minimum required elements and attributes in the document’s header, and a simple rule in the body of the document.

<?xml version="1.0"?>
<grammar version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="main">

   <rule id="main">
      <one-of>
         <item> to </item>
         <item> two </item>
      </one-of>
   </rule>

</grammar>

A valid XML-format grammar document consists of a legal header followed by a body that contains a set of legal rule definitions. The header ends, and the body begins, at the first rule element. Grammar files used as input for Confusability can contain optional elements and attributes in the document header; see SRGS Grammar XML Reference (Microsoft.Speech). For more information about elements and attributes in SRGS grammars, see Speech Recognition Grammar Specification Version 1.0.
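As an illustration of optional header content, the following sketch extends the example grammar with two standard SRGS header attributes, mode and tag-format, and adds SISR semantic tags to each item. This is an assumption-based sketch, not an example taken from the Confusability documentation: the semantic values ("to" and 2) are arbitrary, and it is assumed, not confirmed here, that such values would populate the TranscriptSemantics and RecoResultSemantics elements of the output file (they are empty in the output example because the example grammar has no tags).

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: mode and tag-format are optional SRGS header attributes. -->
<grammar version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">

  <!-- The header ends here; the first rule element begins the body. -->
  <rule id="main" scope="public">
    <one-of>
      <!-- Arbitrary semantic values, chosen only for illustration. -->
      <item> to <tag> out = "to"; </tag> </item>
      <item> two <tag> out = 2; </tag> </item>
    </one-of>
  </rule>

</grammar>
```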

Note

  • The Confusability tool does NOT use custom pronunciations (if specified) to create the pronunciation of a word. It creates pronunciations only from the lexical form of a word.

  • Confusability DOES use custom pronunciations (if specified) when it searches for phrases that could be confused with the pronunciations it created.

  • You can specify custom pronunciations either inline in XML-format grammars, or in a lexicon that is linked to a grammar. See Using Custom Pronunciations for more information.

  • The lexical form of a word may be unmarked text, or the contents of an item Element (Microsoft.Speech) or a token Element (Microsoft.Speech).
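The following sketch shows the two lexical forms mentioned in the notes (unmarked text inside an item element, and the contents of a token element) together with a standard SRGS lexicon element that links a lexicon of custom pronunciations. The lexicon path is hypothetical. Based on the notes, the tool would be expected to build its own pronunciations from the lexical forms "to" and "two" rather than from the lexicon, while still consulting the lexicon's pronunciations when it searches for confusable phrases.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US"
         root="main">

  <!-- Hypothetical path: a PLS lexicon that supplies custom pronunciations.
       The lexicon element belongs in the header, before the first rule. -->
  <lexicon uri="file:///C:/temp/grammars/MyLexicon.pls"/>

  <rule id="main">
    <one-of>
      <!-- Lexical form given as unmarked text inside an item element. -->
      <item> to </item>
      <!-- Lexical form given as the contents of a token element. -->
      <item> <token>two</token> </item>
    </one-of>
  </rule>

</grammar>
```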

Specifying a List of Phrases

Using the /PhraseFile option, you can specify a list of phrases on which to perform the confusability analysis. The list may include phrases that are not in the grammar. We recommend that you use Phrase Generator to produce the list of phrases for input to the Confusability tool; see the Phrase Generator Reference Manual. The following is an example of an input file containing a list of phrases to check against the grammar shown above. Note that the second phrase, "too", is not in the grammar.

<?xml version="1.0" encoding="utf-8"?>
<Scenario>
  <Utterances>
    <Utterance>
      <TranscriptText>to</TranscriptText>
    </Utterance>
    <Utterance>
      <TranscriptText>too</TranscriptText>
    </Utterance>
  </Utterances>
</Scenario>

Note

  • The Confusability tool processes at most 100,000 phrases. If it reaches this limit, the tool exits.

  • The Confusability tool ignores custom pronunciations included in a list of phrases supplied with the /PhraseFile option.

Output File Format

The Confusability tool writes a summary of the analysis to the console and generates complete results of the analysis to an XML file.

Summary Output to the Console

When the analysis is complete, a summary containing the following information displays in the console:

  • The total number of phrases analyzed. Depending on the options specified on the command line, this is the number of phrases in the entire grammar, in a rule within the grammar, or in the separate list of phrases.

  • A count of phrases that generated in-grammar false accepts.

  • Whether the output file was successfully generated.

The following is an example of the summary output to the console:

    Info: 3 phrases analyzed.

    Info: 2 phrases found with confusability greater than zero.

    Info: Successfully output details to "MyOut.xml"

    Processing Complete: 0 Error(s). 0 Warning(s).

Confusability Output File

The Confusability tool writes detailed results of the analysis to an output file in XML format. The output file contains each input phrase (from the phrase file or generated from the grammar) that the recognition engine might confuse with another phrase in the grammar. An input phrase that could be confused with more than one phrase appears in a separate Utterance element for each possible confusion. The output file includes the following information for each input phrase:

  • TranscriptText: The input phrase on which the analysis was performed.

  • TranscriptSemantics: The semantics of the input phrase (in-grammar phrases only).

  • TranscriptPronunciation: The pronunciation of the input phrase (in-grammar phrases only).

  • RecoResultText: The phrase that the engine may confuse with the input phrase, causing a false acceptance.

  • RecoResultSemantics: The semantics of the phrase that the engine may confuse with the input phrase.

  • RecoResultRuleTree: The grammar URI and the ID of the rule that produced the confused result.

  • RecoResultPronunciation: The pronunciation of the phrase that the engine may confuse with the input phrase.

  • RecoResultConfidence: A metric that indicates the probability that the input phrase will be confused with the phrase in RecoResultText.

  • InGrammar: Indicates whether the input phrase is in the grammar. This element appears only for out-of-grammar phrases, where its value is False.

Example

The following example illustrates the output that the Confusability tool generates using the grammar and phrase file examples shown above as input.

<?xml version="1.0" encoding="utf-8"?>
<Scenario xml:space="preserve">
  <Utterances>
    <Utterance id="1">
      <TranscriptText>to</TranscriptText>
      <TranscriptSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <TranscriptPronunciation>T O</TranscriptPronunciation>
      <RecoResultText>two</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">two</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>0.4486466</RecoResultConfidence>
    </Utterance>
    <Utterance id="2">
      <TranscriptText>too</TranscriptText>
      <RecoResultText>to</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">to</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>1.0000000</RecoResultConfidence>
      <InGrammar>False</InGrammar>
    </Utterance>
    <Utterance id="3">
      <TranscriptText>too</TranscriptText>
      <RecoResultText>two</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">two</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>0.4822275</RecoResultConfidence>
      <InGrammar>False</InGrammar>
    </Utterance>
  </Utterances>
</Scenario>

Remarks

Do not assume that the value of RecoResultConfidence is the same when the roles of the two phrases are reversed. In the example above, submitting the word "to" returned the word "two" in the first utterance with a RecoResultConfidence of 0.4486466. If you submit "two" as input for analysis, the tool returns "to", but the value of RecoResultConfidence may be different.

Item weights affect the simulated set of recognition results: items with low weights are less likely to be returned as confused results.
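Weights are set with the standard SRGS weight attribute on item elements inside a one-of element; an item without an explicit weight has the default weight of 1.0. The fragment below is a sketch that modifies the rule from the example grammar. The value 0.2 is arbitrary, and this document does not say by how much a given weight changes RecoResultConfidence, so no specific output values are implied.

```xml
<!-- Fragment of the example grammar's "main" rule, with item weights added. -->
<rule id="main">
   <one-of>
      <!-- No weight attribute: the default weight of 1.0 applies. -->
      <item> to </item>
      <!-- Lower weight: "two" should be less likely to appear as a confused result. -->
      <item weight="0.2"> two </item>
   </one-of>
</rule>
```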