Note
See the Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Confusability Input and Output File Format
The Confusability tool accepts two inputs: a grammar file and a configuration file for a recognition engine. Optionally, you can specify the identifier of a rule in the grammar, or provide a list of phrases to analyze in a separate XML document. The tool performs the confusability analysis on one of three targets: the entire grammar, a single rule within the grammar (if specified), or a separate list of phrases (if supplied). It writes a summary of the analysis to the console and the complete results to an XML file.
Input File Format
Confusability accepts the following file formats for input grammar files:
Grammar documents in XML format that conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0, for example “MyGrammar.grxml”. By convention, XML-based grammar documents that conform to the SRGS specification use the “.grxml” file extension.
Compiled grammar files in CFG format, for example “MyCompiledGrammar.cfg”. See the Compile Grammar Reference Manual for information about compiling grammar files.
Note
Input files in other formats will generate an error.
Example Input Grammar File
The following is an example of a valid input grammar document in XML format. Its header contains only the minimum required elements and attributes, and its body contains a single simple rule.
<?xml version="1.0"?>
<grammar version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="main">
  <rule id="main">
    <one-of>
      <item> to </item>
      <item> two </item>
    </one-of>
  </rule>
</grammar>
A valid XML-format grammar document consists of a legal header followed by a body made up of legal rule definitions. The header ends, and the body begins, with the first rule element. Grammar files used as input for Confusability can contain optional elements and attributes in the document header; see SRGS Grammar XML Reference (Microsoft.Speech). For more information about elements and attributes in SRGS grammars, see Speech Recognition Grammar Specification Version 1.0.
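Before running the tool, it can help to catch header mistakes early. The sketch below is a hypothetical pre-check and is not part of the Confusability tool. It assumes that the minimal header shown in the example (the SRGS namespace, `version="1.0"`, `xml:lang`, and a `root` attribute) is what must be present. It also checks that the `root` attribute names a rule defined in the body.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml:lang attribute


def check_srgs_header(xml_text: str) -> list:
    """Return a list of header problems; an empty list means none were found.

    Hypothetical pre-check: it only covers the minimal header shown in the example.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SRGS_NS}}}grammar":
        return ["root element is not an SRGS <grammar> element"]
    problems = []
    if root.get("version") != "1.0":
        problems.append("version attribute is not 1.0")
    if root.get(f"{{{XML_NS}}}lang") is None:
        problems.append("xml:lang attribute is missing")
    # The header's root attribute must name a <rule> defined in the body.
    rule_ids = {rule.get("id") for rule in root.findall(f"{{{SRGS_NS}}}rule")}
    if root.get("root") not in rule_ids:
        problems.append(f"root rule {root.get('root')!r} is not defined")
    return problems
```

Run against the example grammar, the function returns an empty list. A grammar whose `root` attribute names a missing rule produces one message.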
Note
- The Confusability tool does NOT use custom pronunciations (if specified) to create the pronunciation of a word. It creates pronunciations from the lexical form of a word only.
- Confusability DOES use custom pronunciations (if specified) when it searches for phrases that could be confused with the pronunciations it created.
- You can specify custom pronunciations either inline in XML-format grammars or in a lexicon that is linked to a grammar. See Using Custom Pronunciations for more information.
- The lexical form of a word may be unmarked text, or the contents of an item Element (Microsoft.Speech) or a token Element (Microsoft.Speech).
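The notes above say that pronunciations are built from lexical forms taken from unmarked text or from item and token elements. The following is a simplified, hypothetical illustration of where those forms come from; it is not the tool's own algorithm. It assumes the following:

- Unmarked text at the start of a `rule` or `item` splits into one form per whitespace-separated word.
- A `token` is one form even if it contains spaces.
- Text after child elements, `tag` contents, and `ruleref` targets are ignored.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
RULE, ITEM, TOKEN = (f"{{{SRGS_NS}}}{name}" for name in ("rule", "item", "token"))


def lexical_forms(xml_text: str) -> list:
    """List the lexical forms found in a grammar, in document order (simplified sketch)."""
    forms = []
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if not text:
            continue
        if elem.tag == TOKEN:
            # A token is a single lexical unit, even when it contains spaces.
            forms.append(" ".join(text.split()))
        elif elem.tag in (RULE, ITEM):
            # Unmarked text: each whitespace-separated word is its own lexical form.
            forms.extend(text.split())
    return forms
```

For the example grammar, this yields the forms "to" and "two".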
Specifying a List of Phrases
Using the /PhraseFile option, you can specify a list of phrases on which to perform the confusability analysis. The list may contain phrases that are not in the grammar. We recommend that you use Phrase Generator to produce the list of phrases for input to the Confusability tool; see the Phrase Generator Reference Manual. The following example input file contains a list of phrases to check against the grammar shown above. Note that the second phrase, "too", is not in the grammar.
<?xml version="1.0" encoding="utf-8"?>
<Scenario>
  <Utterances>
    <Utterance>
      <TranscriptText>to</TranscriptText>
    </Utterance>
    <Utterance>
      <TranscriptText>too</TranscriptText>
    </Utterance>
  </Utterances>
</Scenario>
Note
- The Confusability tool processes at most 100,000 phrases. The tool exits if it reaches this threshold.
- The Confusability tool ignores custom pronunciations included in a list of phrases used as input to the /PhraseFile option.
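If you build a phrase list yourself rather than with Phrase Generator, the following sketch writes it in the `Scenario`/`Utterances`/`Utterance`/`TranscriptText` layout shown above. It refuses lists longer than the documented 100,000-phrase maximum. The function name `build_phrase_file` is hypothetical, and the sketch assumes that the elements shown in the example are all the tool needs.

```python
import xml.etree.ElementTree as ET

MAX_PHRASES = 100_000  # documented maximum number of phrases the tool processes


def build_phrase_file(phrases: list) -> bytes:
    """Serialize phrases in the Scenario/Utterances/Utterance/TranscriptText layout."""
    if len(phrases) > MAX_PHRASES:
        raise ValueError(f"{len(phrases)} phrases exceeds the limit of {MAX_PHRASES}")
    scenario = ET.Element("Scenario")
    utterances = ET.SubElement(scenario, "Utterances")
    for phrase in phrases:
        utterance = ET.SubElement(utterances, "Utterance")
        ET.SubElement(utterance, "TranscriptText").text = phrase
    # Emits an XML declaration with encoding='utf-8', matching the example file's encoding.
    return ET.tostring(scenario, encoding="utf-8", xml_declaration=True)
```

For example, write the result of `build_phrase_file(["to", "too"])` to a file with `pathlib.Path.write_bytes`, and pass that file to the /PhraseFile option.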
Output File Format
The Confusability tool writes a summary of the analysis to the console and the complete results to an XML file.
Summary Output to the Console
When the analysis is complete, the console displays a summary containing the following information:
The total number of phrases analyzed. Depending on the options specified on the command line, the total represents either all the phrases in the grammar, all the phrases from a rule within the grammar, or all the phrases in the separate list of phrases.
The number of phrases that generated in-grammar false accepts.
Whether the output file was successfully generated.
The following is an example of the summary output to the console:
Info: 3 phrases analyzed.
Info: 2 phrases found with confusability greater than zero.
Info: Successfully output details to "MyOut.xml"
Processing Complete: 0 Error(s). 0 Warning(s).
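When you run the tool from a build script, you may want to act on these counts, for example by failing the build when any phrase is confusable. The sketch below extracts the counts with regular expressions. It assumes that the message wording matches the example above exactly; other tool versions may word these lines differently. The singular form "phrase" is also accepted as a guess. A missing line yields `None`.

```python
import re

# Patterns assume the message wording shown in the example summary.
SUMMARY_PATTERNS = {
    "analyzed": r"Info: (\d+) phrases? analyzed\.",
    "confusable": r"Info: (\d+) phrases? found with confusability greater than zero\.",
    "errors": r"(\d+) Error\(s\)\.",
    "warnings": r"(\d+) Warning\(s\)\.",
}


def parse_summary(console_text: str) -> dict:
    """Extract counts from the console summary; a value is None if its line is missing."""
    summary = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, console_text)
        summary[key] = int(match.group(1)) if match else None
    summary["output_written"] = "Successfully output details to" in console_text
    return summary
```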
Confusability Output File
The Confusability tool writes the detailed results of the analysis to an output file in XML format. The output file contains each input phrase (from the phrase file or generated from the grammar) that the recognition engine might confuse with another phrase in the grammar. The output file includes the following information for each input phrase:
Output File Element | Description |
---|---|
TranscriptText | The input phrase on which the analysis was performed. |
TranscriptSemantics | The semantics of the input phrase (in-grammar phrases only). |
TranscriptPronunciation | The pronunciation of the input phrase (in-grammar phrases only). |
RecoResultText | The phrase that the engine may confuse with the input phrase, causing a false acceptance. |
RecoResultSemantics | The semantics of the phrase that the engine may confuse with the input phrase. |
RecoResultRuleTree | The grammar (uri) and rule (id) that produced the recognition result. |
RecoResultPronunciation | The pronunciation of the phrase that the engine may confuse with the input phrase. |
RecoResultConfidence | A metric that indicates the probability that the input phrase will be confused with the other phrase. |
InGrammar | Whether the input phrase is in the grammar. This element appears only for out-of-grammar phrases, where its value is False. |
Example
The following example illustrates the output that the Confusability tool generates using the grammar and phrase file examples shown above as input.
<?xml version="1.0" encoding="utf-8"?>
<Scenario xml:space="preserve">
  <Utterances>
    <Utterance id="1">
      <TranscriptText>to</TranscriptText>
      <TranscriptSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <TranscriptPronunciation>T O</TranscriptPronunciation>
      <RecoResultText>two</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">two</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>0.4486466</RecoResultConfidence>
    </Utterance>
    <Utterance id="2">
      <TranscriptText>too</TranscriptText>
      <RecoResultText>to</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">to</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>1.0000000</RecoResultConfidence>
      <InGrammar>False</InGrammar>
    </Utterance>
    <Utterance id="3">
      <TranscriptText>too</TranscriptText>
      <RecoResultText>two</RecoResultText>
      <RecoResultSemantics ms:typespace="ECMA-262" xmlns:ms="https://www.microsoft.com/xmlns/webreco" xmlns:emma="http://www.w3.org/2003/04/emma" />
      <RecoResultRuleTree>
        <Rule id="main" uri="file:///C:/temp/grammars/Test.grxml">two</Rule>
      </RecoResultRuleTree>
      <RecoResultPronunciation>T U</RecoResultPronunciation>
      <RecoResultConfidence>0.4822275</RecoResultConfidence>
      <InGrammar>False</InGrammar>
    </Utterance>
  </Utterances>
</Scenario>
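To review results from a large analysis, you can load the output file and rank the confusions. The following sketch reads the elements described in the table above and sorts utterances by RecoResultConfidence, highest first. The `Confusion` class and `read_confusions` function are hypothetical names. The handling of InGrammar is an assumption based on the example: the element appears only for out-of-grammar phrases, so a missing element is treated as in-grammar.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Confusion:
    utterance_id: str
    transcript_text: str    # <TranscriptText>: the analyzed input phrase
    reco_result_text: str   # <RecoResultText>: the phrase it may be confused with
    confidence: float       # <RecoResultConfidence>
    in_grammar: bool        # derived from <InGrammar>


def read_confusions(xml_data) -> list:
    """Parse a Confusability output document (str or bytes), most likely confusions first."""
    confusions = []
    for utt in ET.fromstring(xml_data).iter("Utterance"):
        confusions.append(Confusion(
            utterance_id=utt.get("id"),
            transcript_text=(utt.findtext("TranscriptText") or "").strip(),
            reco_result_text=(utt.findtext("RecoResultText") or "").strip(),
            confidence=float(utt.findtext("RecoResultConfidence", default="0")),
            # Assumption: <InGrammar> is written only for out-of-grammar phrases,
            # so a missing element is treated as in-grammar.
            in_grammar=utt.findtext("InGrammar", default="True").strip() == "True",
        ))
    return sorted(confusions, key=lambda c: c.confidence, reverse=True)
```

For example, `read_confusions(pathlib.Path("MyOut.xml").read_bytes())` loads the file named in the console summary. Applied to the output above, it lists utterance 2 first because its confidence is 1.0000000.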
Remarks
Do not assume that RecoResultConfidence stays the same when the confused phrase is used as the input phrase. In the example above, the input phrase "to" returned "two" in the first utterance with a RecoResultConfidence of 0.4486466. If you submit "two" as the input phrase, the tool returns "to", but RecoResultConfidence may have a different value.
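If you analyze both phrases of a pair, you can check for this asymmetry directly. The sketch below takes `(input phrase, confused phrase, confidence)` tuples, read from TranscriptText, RecoResultText, and RecoResultConfidence in the output file. It reports pairs that appear in both directions with different confidences. The function name and the default tolerance are hypothetical choices for illustration.

```python
def asymmetric_pairs(results, tolerance=1e-6):
    """Find phrase pairs analyzed in both directions whose confidences differ.

    results: iterable of (input phrase, confused phrase, RecoResultConfidence) tuples.
    Returns (a, b, confidence a->b, confidence b->a) tuples, each pair reported once.
    """
    by_direction = {(src, dst): conf for src, dst, conf in results}
    report = []
    for (src, dst), conf in by_direction.items():
        reverse = by_direction.get((dst, src))
        # src < dst keeps only one ordering of each pair.
        if reverse is not None and src < dst and abs(conf - reverse) > tolerance:
            report.append((src, dst, conf, reverse))
    return report
```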
Item weights affect the simulated results: items with low weights are less likely to appear as confused results.