Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft Speech Platform
Legacy SAPI Phone Sets
The first languages that Microsoft developed for speech recognition and text-to-speech on computers specified word pronunciations using the SAPI Phone Set. A phone consists of one or more characters from a phonetic alphabet that describe a discrete sound in a spoken language. The following languages use the SAPI Phone Set to describe their sounds.
Language-Culture Code | Language Name | Language ID (hexadecimal) |
---|---|---|
zh-TW | Chinese (Taiwan) | 404 |
zh-CN | Chinese (PRC) | 804 |
en-US | English (United States) | 409 |
fr-FR | French (Standard) | 40c |
de-DE | German (Standard) | 407 |
ja-JP | Japanese | 411 |
es-ES | Spanish (Spain, Traditional Sort) | 40a |
Table 1: Languages that use the SAPI Phone Set
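The Language ID values in Table 1 are hexadecimal Windows locale identifiers (LCIDs); for example, 409 is 0x0409, or 1033 in decimal. The following is a minimal sketch of that conversion for a few rows of the table. The names `LANGUAGE_IDS` and `to_decimal_lcid` are hypothetical and are not part of the Speech Platform.

```python
# Illustrative subset of Table 1: language-culture code -> Language ID (hex string).
LANGUAGE_IDS = {
    "en-US": "409",
    "fr-FR": "40c",
    "de-DE": "407",
}


def to_decimal_lcid(hex_id: str) -> int:
    """Interpret a Language ID from the table as a hexadecimal number."""
    return int(hex_id, 16)


for culture, hex_id in LANGUAGE_IDS.items():
    # Show the zero-padded hex form next to the decimal value.
    print(f"{culture}: 0x{to_decimal_lcid(hex_id):04X} = {to_decimal_lcid(hex_id)}")
```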
More recently, Microsoft developed the Universal Phone Set (UPS), a machine-readable phonetic alphabet that is based on the International Phonetic Alphabet (IPA). UPS is a more powerful phonetic alphabet than the SAPI Phone Set, can be used to describe the sounds of any new language, and is easier to map to other machine-readable phonetic alphabets such as X-SAMPA, SAMPA or ASCII-IPA.
You can specify custom word pronunciations programmatically for any language in the psz parameter of ISpGrammarBuilder::AddWordTransition, using phones from either the UPS (recommended) or SAPI (deprecated) phonetic alphabets. The Speech Platform performs phone conversion automatically between UPS and SAPI formats as required for the target language.
ISpGrammarBuilder::AddWordTransition does not accept pronunciation strings that contain phones from the International Phonetic Alphabet (IPA). However, you can use phones from the IPA (as well as from UPS and SAPI) to specify custom pronunciations for the Speech Platform in grammars that conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0, prompts that conform to the Speech Synthesis Markup Language (SSML) Version 1.0, or PLS lexicons that conform to the Pronunciation Lexicon Specification (PLS) Version 1.0. See the following:
- Use the token Element (Microsoft.Speech) to specify an inline pronunciation in an XML-format SRGS grammar document.
- Use the phoneme Element (Microsoft.Speech) to specify an inline pronunciation in an SSML prompt document.
- Use the phoneme Element PLS (Microsoft.Speech) to specify a word pronunciation in a PLS lexicon.
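As an illustration of the inline-pronunciation approach listed above, the following sketch builds a small SSML 1.0 prompt that contains a `phoneme` element with an IPA pronunciation. It uses only the Python standard library and the `alphabet="ipa"` value defined by the SSML specification. The helper name `build_ssml_with_phoneme`, the sample word, and its IPA transcription are assumptions chosen for illustration, and the sketch only produces the markup; it does not call the Speech Platform.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml_with_phoneme(word: str, ipa: str, lang: str = "en-US") -> str:
    """Return an SSML 1.0 document that pronounces `word` using IPA phones."""
    # Serialize SSML elements without a prefix (as the default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    # The phoneme element carries the pronunciation in its "ph" attribute;
    # the element text is the written form of the word.
    phoneme = ET.SubElement(
        speak,
        f"{{{SSML_NS}}}phoneme",
        {"alphabet": "ipa", "ph": ipa},
    )
    phoneme.text = word
    return ET.tostring(speak, encoding="unicode")


# Hypothetical example word and transcription.
ssml = build_ssml_with_phoneme("tomato", "təˈmeɪtoʊ")
print(ssml)
```

The same `ph` string could not be passed to ISpGrammarBuilder::AddWordTransition, which does not accept IPA phones.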
We recommend that you do not use the SAPI Phone Set to specify pronunciations for any language in your applications. Instead, use the UPS for ISpGrammarBuilder grammars; the Speech Platform automatically converts UPS phones to the SAPI Phone Set for legacy languages, as required. Use phones from the UPS or IPA to define pronunciations for XML-format SRGS grammars, SSML prompts, or PLS lexicons. See Lexicons and Phonetic Alphabets (Microsoft.Speech) for detailed information about phonetic alphabets and specifying custom pronunciations.
Information about legacy SAPI Phone Sets is provided for reference in the following topics: