Namespace Microsoft::CognitiveServices::Speech

Summary

Members Descriptions
enum PropertyId Defines speech property ids. Changed in version 1.4.0.
enum OutputFormat Output format.
enum ProfanityOption Defines how profanity (swearing) is handled in recognition results: removed, masked with asterisks, or left unchanged. Added in version 1.5.0.
enum ResultReason Specifies the possible reasons a recognition result might be generated.
enum CancellationReason Defines the possible reasons a recognition result might be canceled.
enum CancellationErrorCode Defines error code in case that CancellationReason is Error. Added in version 1.1.0.
enum NoMatchReason Defines the possible reasons a recognition result might not be recognized.
enum ActivityJSONType Defines the possible types for an activity JSON value. Added in version 1.5.0.
enum SpeechSynthesisOutputFormat Defines the possible speech synthesis output audio formats. Updated in version 1.19.0.
enum StreamStatus Defines the possible status of audio data stream. Added in version 1.4.0.
enum ServicePropertyChannel Defines channels used to pass property settings to service. Added in version 1.5.0.
enum VoiceProfileType Defines voice profile types.
enum RecognitionFactorScope Defines the scope that a Recognition Factor is applied to.
enum PronunciationAssessmentGradingSystem Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0.
enum PronunciationAssessmentGranularity Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0.
enum SynthesisVoiceType Defines the type of synthesis voices. Added in version 1.16.0.
enum SynthesisVoiceGender Defines the gender of synthesis voices. Added in version 1.17.0.
enum SpeechSynthesisBoundaryType Defines the boundary type of speech synthesis boundary events. Added in version 1.21.0.
enum SegmentationStrategy The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic".
class AsyncRecognizer AsyncRecognizer abstract base class.
class AudioDataStream Represents audio data stream used for operating audio data as a stream. Added in version 1.4.0.
class AutoDetectSourceLanguageConfig Class that defines auto-detection source language configuration. Updated in version 1.13.0.
class AutoDetectSourceLanguageResult Contains the auto-detected source language result. Added in version 1.8.0.
class BaseAsyncRecognizer BaseAsyncRecognizer class.
class CancellationDetails Contains detailed information about why a result was canceled.
class ClassLanguageModel Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0.
class Connection Connection is a proxy class for managing the connection to the speech service of the specified Recognizer. By default, a Recognizer autonomously manages the connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. The use of Connection is optional. It is intended for scenarios where fine-tuning of application behavior based on connection status is needed. Users can optionally call Open() to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After starting a recognition, calling Open() or Close() might fail. This will not impact the Recognizer or the ongoing recognition. The connection might drop for various reasons; the Recognizer will always try to re-establish the connection as required to guarantee ongoing operations. In all these cases Connected/Disconnected events indicate the change of the connection status. Updated in version 1.17.0.
class ConnectionEventArgs Provides data for the ConnectionEvent. Added in version 1.2.0.
class ConnectionMessage ConnectionMessage represents implementation specific messages sent to and received from the speech service. These messages are provided for debugging purposes and should not be used for production use cases with the Azure Cognitive Services Speech Service. Messages sent to and received from the Speech Service are subject to change without notice. This includes message contents, headers, payloads, ordering, etc. Added in version 1.10.0.
class ConnectionMessageEventArgs Provides data for the ConnectionMessageEvent.
class EmbeddedSpeechConfig Class that defines embedded (offline) speech configuration.
class EventArgs Base class for event arguments.
class EventSignal Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events.
class EventSignalBase Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events.
class Grammar Represents base class grammar for customizing speech recognition. Added in version 1.5.0.
class GrammarList Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0.
class GrammarPhrase Represents a phrase that may be spoken by the user. Added in version 1.5.0.
class HybridSpeechConfig Class that defines hybrid (cloud and embedded) configurations for speech recognition or speech synthesis.
class KeywordRecognitionEventArgs Class for the events emitted by the KeywordRecognizer.
class KeywordRecognitionModel Represents keyword recognition model used with StartKeywordRecognitionAsync methods.
class KeywordRecognitionResult Class that defines the results emitted by the KeywordRecognizer.
class KeywordRecognizer Recognizer type that is specialized to only handle keyword activation.
class NoMatchDetails Contains detailed information for NoMatch recognition results.
class PersonalVoiceSynthesisRequest Class that defines the speech synthesis request for personal voice (aka.ms/azureai/personal-voice). This class is in preview and is subject to change. Added in version 1.39.0.
class PhraseListGrammar Represents a phrase list grammar for dynamic grammar scenarios. Added in version 1.5.0.
class PronunciationAssessmentConfig Class that defines pronunciation assessment configuration. Added in version 1.14.0.
class PronunciationAssessmentResult Class for pronunciation assessment results.
class PronunciationContentAssessmentResult Class for content assessment results.
class PropertyCollection Class to retrieve or set a property value from a property collection.
class RecognitionEventArgs Provides data for the RecognitionEvent.
class RecognitionResult Contains detailed information about result of a recognition operation.
class Recognizer Recognizer base class.
class SessionEventArgs Base class for session event arguments.
class SmartHandle Smart handle class.
class SourceLanguageConfig Class that defines source language configuration, added in 1.8.0.
class SourceLanguageRecognizer Class for source language recognizers. You can use this class for standalone language detection. Added in version 1.17.0.
class SpeechConfig Class that defines configurations for speech / intent recognition, or speech synthesis.
class SpeechRecognitionCanceledEventArgs Class for speech recognition canceled event arguments.
class SpeechRecognitionEventArgs Class for speech recognition event arguments.
class SpeechRecognitionModel Speech recognition model information.
class SpeechRecognitionResult Base class for speech recognition results.
class SpeechRecognizer Class for speech recognizers.
class SpeechSynthesisBookmarkEventArgs Class for speech synthesis bookmark event arguments. Added in version 1.16.0.
class SpeechSynthesisCancellationDetails Contains detailed information about why a result was canceled. Added in version 1.4.0.
class SpeechSynthesisEventArgs Class for speech synthesis event arguments. Added in version 1.4.0.
class SpeechSynthesisRequest Class that defines the speech synthesis request. This class is in preview and is subject to change. Added in version 1.37.0.
class SpeechSynthesisResult Contains information about the result from text-to-speech synthesis. Added in version 1.4.0.
class SpeechSynthesisVisemeEventArgs Class for speech synthesis viseme event arguments. Added in version 1.16.0.
class SpeechSynthesisWordBoundaryEventArgs Class for speech synthesis word boundary event arguments. Added in version 1.7.0.
class SpeechSynthesizer Class for speech synthesizer. Updated in version 1.14.0.
class SpeechTranslationModel Speech translation model information.
class SynthesisVoicesResult Contains information about the result of a voices list request on a speech synthesizer. Added in version 1.16.0.
class VoiceInfo Contains information about a synthesis voice. Updated in version 1.17.0.

Members

enum PropertyId

Values Descriptions
SpeechServiceConnection_Key The Cognitive Services Speech Service subscription key. If you are using an intent recognizer, you need to specify the LUIS endpoint key for your particular LUIS app. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription.
SpeechServiceConnection_Endpoint The Cognitive Services Speech Service endpoint (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromEndpoint. NOTE: This endpoint is not the same as the endpoint used to obtain an access token.
SpeechServiceConnection_Region The Cognitive Services Speech Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription, SpeechConfig::FromEndpoint, SpeechConfig::FromHost, SpeechConfig::FromAuthorizationToken.
SpeechServiceAuthorization_Token The Cognitive Services Speech Service authorization token (aka access token). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromAuthorizationToken, SpeechRecognizer::SetAuthorizationToken, IntentRecognizer::SetAuthorizationToken, TranslationRecognizer::SetAuthorizationToken.
SpeechServiceAuthorization_Type The Cognitive Services Speech Service authorization type. Currently unused.
SpeechServiceConnection_EndpointId The Cognitive Services Custom Speech or Custom Voice Service endpoint id. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetEndpointId. NOTE: The endpoint id is available in the Custom Speech Portal, listed under Endpoint Details.
SpeechServiceConnection_Host The Cognitive Services Speech Service host (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromHost.
SpeechServiceConnection_ProxyHostName The host name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyPort The port of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyUserName The user name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyPassword The password of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_Url The URL string built from speech configuration. This property is intended to be read-only. The SDK is using it internally. NOTE: Added in version 1.5.0.
SpeechServiceConnection_ProxyHostBypass Specifies the list of hosts for which proxies should not be used. This setting overrides all other configurations. Hostnames are separated by commas and are matched in a case-insensitive manner. Wildcards are not supported.
SpeechServiceConnection_TranslationToLanguages The list of comma separated languages used as target translation languages. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::AddTargetLanguage and SpeechTranslationConfig::GetTargetLanguages.
SpeechServiceConnection_TranslationVoice The name of the Cognitive Services Text to Speech Service voice. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::SetVoiceName. NOTE: Valid voice names can be found in the Speech service language and voice support documentation.
SpeechServiceConnection_TranslationFeatures Translation features. For internal use.
SpeechServiceConnection_IntentRegion The Language Understanding Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead use LanguageUnderstandingModel.
SpeechServiceConnection_RecoMode The Cognitive Services Speech Service recognition mode. Can be "INTERACTIVE", "CONVERSATION", "DICTATION". This property is intended to be read-only. The SDK is using it internally.
SpeechServiceConnection_RecoLanguage The spoken language to be recognized (in BCP-47 format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetSpeechRecognitionLanguage.
Speech_SessionId The session id. This id is a universally unique identifier (aka UUID) representing a specific binding of an audio input stream and the underlying speech recognition instance to which it is bound. Under normal circumstances, you shouldn't have to use this property directly. Instead use SessionEventArgs::SessionId.
SpeechServiceConnection_UserDefinedQueryParameters The query parameters provided by users. They will be passed to service as URL query parameters. Added in version 1.5.0.
SpeechServiceConnection_RecoBackend The string to specify the backend to be used for speech recognition; allowed options are online and offline. Under normal circumstances, you shouldn't use this property directly. Currently the offline option is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0.
SpeechServiceConnection_RecoModelName The name of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0.
SpeechServiceConnection_RecoModelKey This property is deprecated.
SpeechServiceConnection_RecoModelIniFile The path to the ini file of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0.
SpeechServiceConnection_SynthLanguage The spoken language to be synthesized (e.g. en-US). Added in version 1.4.0.
SpeechServiceConnection_SynthVoice The name of the TTS voice to be used for speech synthesis. Added in version 1.4.0.
SpeechServiceConnection_SynthOutputFormat The string to specify the TTS output audio format. Added in version 1.4.0.
SpeechServiceConnection_SynthEnableCompressedAudioTransmission Indicates whether to use a compressed audio format for speech synthesis audio transmission. This property only takes effect when SpeechServiceConnection_SynthOutputFormat is set to a PCM format. If this property is not set and GStreamer is available, the SDK uses a compressed format for synthesized audio transmission and decodes it. You can set this property to "false" to use raw PCM format for transmission on the wire. Added in version 1.16.0.
SpeechServiceConnection_SynthBackend The string to specify TTS backend; valid options are online and offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths to set the synthesis backend to offline. Added in version 1.19.0.
SpeechServiceConnection_SynthOfflineDataPath The data file path(s) for offline synthesis engine; only valid when synthesis backend is offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths. Added in version 1.19.0.
SpeechServiceConnection_SynthOfflineVoice The name of the offline TTS voice to be used for speech synthesis. Under normal circumstances, you shouldn't use this property directly. Instead, use EmbeddedSpeechConfig::SetSpeechSynthesisVoice and EmbeddedSpeechConfig::GetSpeechSynthesisVoiceName. Added in version 1.19.0.
SpeechServiceConnection_SynthModelKey This property is deprecated.
SpeechServiceConnection_VoicesListEndpoint The Cognitive Services Speech Service voices list API endpoint (URL). Under normal circumstances, you don't need to specify this property; the SDK constructs it based on the region/host/endpoint of SpeechConfig. Added in version 1.16.0.
SpeechServiceConnection_InitialSilenceTimeoutMs The initial silence timeout value (in milliseconds) used by the service. Added in version 1.5.0.
SpeechServiceConnection_EndSilenceTimeoutMs The end silence timeout value (in milliseconds) used by the service. Added in version 1.5.0.
SpeechServiceConnection_EnableAudioLogging A boolean value specifying whether audio logging is enabled in the service or not. Audio and content logs are stored either in Microsoft-owned storage, or in your own storage account linked to your Cognitive Services subscription (Bring Your Own Storage (BYOS) enabled Speech resource). Added in version 1.5.0.
SpeechServiceConnection_LanguageIdMode The speech service connection language identifier mode. Can be "AtStart" (the default), or "Continuous". See the Language Identification documentation. Added in version 1.25.0.
SpeechServiceConnection_TranslationCategoryId The speech service connection translation categoryId.
SpeechServiceConnection_AutoDetectSourceLanguages The auto-detect source languages. Added in version 1.8.0.
SpeechServiceConnection_AutoDetectSourceLanguageResult The auto-detect source language result. Added in version 1.8.0.
SpeechServiceResponse_RequestDetailedResultTrueFalse The requested Cognitive Services Speech Service response output format (simple or detailed). Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetOutputFormat.
SpeechServiceResponse_RequestProfanityFilterTrueFalse The requested Cognitive Services Speech Service response output profanity level. Currently unused.
SpeechServiceResponse_ProfanityOption The requested Cognitive Services Speech Service response output profanity setting. Allowed values are "masked", "removed", and "raw". Added in version 1.5.0.
SpeechServiceResponse_PostProcessingOption A string value specifying which post processing option should be used by the service. The only allowed value is "TrueText". Added in version 1.5.0.
SpeechServiceResponse_RequestWordLevelTimestamps A boolean value specifying whether to include word-level timestamps in the response result. Added in version 1.5.0.
SpeechServiceResponse_StablePartialResultThreshold The number of times a word has to be in partial results to be returned. Added in version 1.5.0.
SpeechServiceResponse_OutputFormatOption A string value specifying the output format option in the response result. Internal use only. Added in version 1.5.0.
SpeechServiceResponse_RequestSnr A boolean value specifying whether to include SNR (signal to noise ratio) in the response result. Added in version 1.18.0.
SpeechServiceResponse_TranslationRequestStablePartialResult A boolean value to request for stabilizing translation partial results by omitting words in the end. Added in version 1.5.0.
SpeechServiceResponse_RequestWordBoundary A boolean value specifying whether to request WordBoundary events. Added in version 1.21.0.
SpeechServiceResponse_RequestPunctuationBoundary A boolean value specifying whether to request punctuation boundary in WordBoundary Events. Default is true. Added in version 1.21.0.
SpeechServiceResponse_RequestSentenceBoundary A boolean value specifying whether to request sentence boundary in WordBoundary Events. Default is false. Added in version 1.21.0.
SpeechServiceResponse_SynthesisEventsSyncToAudio A boolean value specifying whether the SDK should synchronize synthesis metadata events (e.g. word boundary, viseme, etc.) to the audio playback. This only takes effect when the audio is played through the SDK. Default is true. If set to false, the SDK fires the events as they come from the service, which may be out of sync with the audio playback. Added in version 1.31.0.
SpeechServiceResponse_JsonResult The Cognitive Services Speech Service response output (in JSON format). This property is available on recognition result objects only.
SpeechServiceResponse_JsonErrorDetails The Cognitive Services Speech Service error details (in JSON format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use CancellationDetails::ErrorDetails.
SpeechServiceResponse_RecognitionLatencyMs The recognition latency in milliseconds. Read-only, available on final speech/translation/intent results. This measures the latency between when an audio input is received by the SDK, and the moment the final result is received from the service. The SDK computes the time difference between the last audio fragment from the audio input that is contributing to the final result, and the time the final result is received from the speech service. Added in version 1.3.0.
SpeechServiceResponse_RecognitionBackend The recognition backend. Read-only, available on speech recognition results. This indicates whether cloud (online) or embedded (offline) recognition was used to produce the result.
SpeechServiceResponse_SynthesisFirstByteLatencyMs The speech synthesis first byte latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the first byte audio is available. Added in version 1.17.0.
SpeechServiceResponse_SynthesisFinishLatencyMs The speech synthesis all bytes latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the whole audio is synthesized. Added in version 1.17.0.
SpeechServiceResponse_SynthesisUnderrunTimeMs The underrun time for speech synthesis in milliseconds. Read-only, available on results in SynthesisCompleted events. This measures the total underrun time from when the playback buffer (see PropertyId::AudioConfig_PlaybackBufferLengthInMs) is filled until synthesis is completed. Added in version 1.17.0.
SpeechServiceResponse_SynthesisConnectionLatencyMs The speech synthesis connection latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the HTTP/WebSocket connection is established. Added in version 1.26.0.
SpeechServiceResponse_SynthesisNetworkLatencyMs The speech synthesis network latency in milliseconds. Read-only, available on final speech synthesis results. This measures the network round trip time. Added in version 1.26.0.
SpeechServiceResponse_SynthesisServiceLatencyMs The speech synthesis service latency in milliseconds. Read-only, available on final speech synthesis results. This measures the service processing time to synthesize the first byte of audio. Added in version 1.26.0.
SpeechServiceResponse_SynthesisBackend Indicates which backend the synthesis was finished by. Read-only, available on speech synthesis results, except for the result in the SynthesisStarted event. Added in version 1.17.0.
SpeechServiceResponse_DiarizeIntermediateResults Determines if intermediate results contain speaker identification.
CancellationDetails_Reason The cancellation reason. Currently unused.
CancellationDetails_ReasonText The cancellation text. Currently unused.
CancellationDetails_ReasonDetailedText The cancellation detailed text. Currently unused.
LanguageUnderstandingServiceResponse_JsonResult The Language Understanding Service response output (in JSON format). Available via IntentRecognitionResult.Properties.
AudioConfig_DeviceNameForCapture The device name for audio capture. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromMicrophoneInput. NOTE: This property id was added in version 1.3.0.
AudioConfig_NumberOfChannelsForCapture The number of channels for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_SampleRateForCapture The sample rate (in Hz) for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_BitsPerSampleForCapture The number of bits of each sample for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_AudioSource The audio source. Allowed values are "Microphones", "File", and "Stream". Added in version 1.3.0.
AudioConfig_DeviceNameForRender The device name for audio render. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromSpeakerOutput. Added in version 1.14.0.
AudioConfig_PlaybackBufferLengthInMs Playback buffer length in milliseconds, default is 50 milliseconds.
AudioConfig_AudioProcessingOptions Audio processing options in JSON format.
Speech_LogFilename The file name to write logs. Added in version 1.4.0.
Speech_SegmentationSilenceTimeoutMs A duration of detected silence, measured in milliseconds, after which speech-to-text will determine a spoken phrase has ended and generate a final Recognized result. Configuring this timeout may be helpful in situations where spoken input is significantly faster or slower than usual and default segmentation behavior consistently yields results that are too long or too short. Segmentation timeout values that are inappropriately high or low can negatively affect speech-to-text accuracy; this property should be carefully configured and the resulting behavior should be thoroughly validated as intended.
Speech_SegmentationMaximumTimeMs The maximum length of a spoken phrase when using the "Time" segmentation strategy. As the length of a spoken phrase approaches this value, Speech_SegmentationSilenceTimeoutMs is progressively reduced until either the phrase silence timeout is reached or the phrase reaches the maximum length.
Speech_SegmentationStrategy The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic".
Conversation_ApplicationId Identifier used to connect to the backend service. Added in version 1.5.0.
Conversation_DialogType Type of dialog backend to connect to. Added in version 1.7.0.
Conversation_Initial_Silence_Timeout Silence timeout for listening. Added in version 1.5.0.
Conversation_From_Id The from id to be used on speech recognition activities. Added in version 1.5.0.
Conversation_Conversation_Id ConversationId for the session. Added in version 1.8.0.
Conversation_Custom_Voice_Deployment_Ids Comma separated list of custom voice deployment ids. Added in version 1.8.0.
Conversation_Speech_Activity_Template Speech activity template. Properties from the template are stamped onto the activity generated by the service for speech. Added in version 1.10.0.
Conversation_ParticipantId Your participant identifier in the current conversation. Added in version 1.13.0.
Conversation_Request_Bot_Status_Messages
Conversation_Connection_Id
DataBuffer_TimeStamp The timestamp associated with the data buffer written by the client when using Pull/Push audio input streams. The timestamp is a 64-bit value with a resolution of 90 kHz. It is the same as the presentation timestamp in an MPEG transport stream. See https://en.wikipedia.org/wiki/Presentation_timestamp Added in version 1.5.0.
DataBuffer_UserId The user id associated with the data buffer written by the client when using Pull/Push audio input streams. Added in version 1.5.0.
PronunciationAssessment_ReferenceText The reference text of the audio for pronunciation evaluation. For this and the following pronunciation assessment parameters, see the table Pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create or PronunciationAssessmentConfig::SetReferenceText. Added in version 1.14.0.
PronunciationAssessment_GradingSystem The point system for pronunciation score calibration (FivePoint or HundredMark). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0.
PronunciationAssessment_Granularity The pronunciation evaluation granularity (Phoneme, Word, or FullText). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0.
PronunciationAssessment_EnableMiscue Defines whether to enable miscue calculation. With this enabled, the pronounced words are compared to the reference text and marked with omission/insertion based on the comparison. The default setting is False. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0.
PronunciationAssessment_PhonemeAlphabet The pronunciation evaluation phoneme alphabet. The valid values are "SAPI" (default) and "IPA". Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetPhonemeAlphabet. Added in version 1.20.0.
PronunciationAssessment_NBestPhonemeCount The pronunciation evaluation nbest phoneme count. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetNBestPhonemeCount. Added in version 1.20.0.
PronunciationAssessment_EnableProsodyAssessment Whether to enable prosody assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableProsodyAssessment. Added in version 1.33.0.
PronunciationAssessment_Json The JSON string of pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0.
PronunciationAssessment_Params Pronunciation assessment parameters. This property is intended to be read-only. The SDK is using it internally. Added in version 1.14.0.
PronunciationAssessment_ContentTopic The content topic of the pronunciation assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableContentAssessmentWithTopic. Added in version 1.33.0.
SpeakerRecognition_Api_Version Speaker Recognition backend API version. This property is added to allow testing and use of previous versions of Speaker Recognition APIs, where applicable. Added in version 1.18.0.
SpeechTranslation_ModelName The name of a model to be used for speech translation. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used.
SpeechTranslation_ModelKey This property is deprecated.
KeywordRecognition_ModelName The name of a model to be used for keyword recognition. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used.
KeywordRecognition_ModelKey This property is deprecated.
EmbeddedSpeech_EnablePerformanceMetrics Enable the collection of embedded speech performance metrics which can be used to evaluate the capability of a device to use embedded speech. The collected data is included in results from specific scenarios like speech recognition. The default setting is "false". Note that metrics may not be available from all embedded speech scenarios.
SpeechSynthesisRequest_Pitch The pitch of the synthesized speech.
SpeechSynthesisRequest_Rate The rate of the synthesized speech.
SpeechSynthesisRequest_Volume The volume of the synthesized speech.

Defines speech property ids. Changed in version 1.4.0.
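
Most of these property ids are set indirectly through dedicated SpeechConfig methods, but they can also be set or read directly through the property bag. A minimal sketch, with a placeholder subscription key and region and an illustrative timeout value:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    // Placeholder credentials; substitute your own subscription key and region.
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Set a property directly by id (here: the end-of-phrase silence timeout, in milliseconds).
    config->SetProperty(PropertyId::SpeechServiceConnection_EndSilenceTimeoutMs, "800");

    auto recognizer = SpeechRecognizer::FromConfig(config);
    auto result = recognizer->RecognizeOnceAsync().get();

    // Read-only ids such as the raw JSON response are exposed on the result's property bag.
    std::cout << result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult) << std::endl;
    return 0;
}
```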

enum OutputFormat

Values Descriptions
Simple
Detailed

Output format.
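
A short sketch of requesting the detailed output format via SpeechConfig::SetOutputFormat; with Detailed, the JSON payload on the result carries word-level detail such as the N-best list (key and region are placeholders):

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    // Placeholder credentials.
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Request detailed results instead of the default simple format.
    config->SetOutputFormat(OutputFormat::Detailed);

    auto recognizer = SpeechRecognizer::FromConfig(config);
    auto result = recognizer->RecognizeOnceAsync().get();

    // The detailed payload is carried in the JSON result property.
    std::cout << result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult) << std::endl;
    return 0;
}
```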

enum ProfanityOption

Values Descriptions
Masked Replaces letters in profane words with star characters.
Removed Removes profane words.
Raw Does nothing to profane words.

Defines how profanity (swearing) is handled in recognition results: removed, masked with asterisks, or left unchanged. Added in version 1.5.0.
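
A minimal sketch of selecting a profanity handling mode on a SpeechConfig; the choice of Masked here is only an example:

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Masks profane words (for example "d***") in recognition results.
// ProfanityOption::Removed drops them entirely; ProfanityOption::Raw leaves them untouched.
void ConfigureProfanityHandling(std::shared_ptr<SpeechConfig> config)
{
    config->SetProfanity(ProfanityOption::Masked);
}
```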

enum ResultReason

Values Descriptions
NoMatch Indicates speech could not be recognized. More details can be found in the NoMatchDetails object.
Canceled Indicates that the recognition was canceled. More details can be found using the CancellationDetails object.
RecognizingSpeech Indicates the speech result contains hypothesis text.
RecognizedSpeech Indicates the speech result contains final text that has been recognized. Speech Recognition is now complete for this phrase.
RecognizingIntent Indicates the intent result contains hypothesis text and intent.
RecognizedIntent Indicates the intent result contains final text and intent. Speech Recognition and Intent determination are now complete for this phrase.
TranslatingSpeech Indicates the translation result contains hypothesis text and its translation(s).
TranslatedSpeech Indicates the translation result contains final text and corresponding translation(s). Speech Recognition and Translation are now complete for this phrase.
SynthesizingAudio Indicates the synthesized audio result contains a non-zero amount of audio data.
SynthesizingAudioCompleted Indicates the synthesized audio is now complete for this phrase.
RecognizingKeyword Indicates the speech result contains (unverified) keyword text. Added in version 1.3.0.
RecognizedKeyword Indicates that keyword recognition completed recognizing the given keyword. Added in version 1.3.0.
SynthesizingAudioStarted Indicates the speech synthesis is now started Added in version 1.4.0.
TranslatingParticipantSpeech Indicates the transcription result contains hypothesis text and its translation(s) for other participants in the conversation. Added in version 1.8.0.
TranslatedParticipantSpeech Indicates the transcription result contains final text and corresponding translation(s) for other participants in the conversation. Speech Recognition and Translation are now complete for this phrase. Added in version 1.8.0.
TranslatedInstantMessage Indicates the transcription result contains the instant message and corresponding translation(s). Added in version 1.8.0.
TranslatedParticipantInstantMessage Indicates the transcription result contains the instant message for other participants in the conversation and corresponding translation(s). Added in version 1.8.0.
EnrollingVoiceProfile Indicates the voice profile is being enrolled and customers need to send more audio to create a voice profile. Added in version 1.12.0.
EnrolledVoiceProfile The voice profile has been enrolled. Added in version 1.12.0.
RecognizedSpeakers Indicates successful identification of some speakers. Added in version 1.12.0.
RecognizedSpeaker Indicates successfully verified one speaker. Added in version 1.12.0.
ResetVoiceProfile Indicates a voice profile has been reset successfully. Added in version 1.12.0.
DeletedVoiceProfile Indicates a voice profile has been deleted successfully. Added in version 1.12.0.
VoicesListRetrieved Indicates the voices list has been retrieved successfully. Added in version 1.16.0.

Specifies the possible reasons a recognition result might be generated.
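
A sketch of branching on the result reason after a single-shot recognition; only a few of the reasons listed above are handled, and the helper name is illustrative:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Inspects the reason attached to a recognition result and reacts accordingly.
void HandleResult(std::shared_ptr<SpeechRecognitionResult> result)
{
    switch (result->Reason)
    {
    case ResultReason::RecognizedSpeech:
        std::cout << "Recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        std::cout << "Speech could not be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        std::cout << "Recognition was canceled." << std::endl;
        break;
    default:
        break;
    }
}
```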

enum CancellationReason

Values Descriptions
Error Indicates that an error occurred during speech recognition.
EndOfStream Indicates that the end of the audio stream was reached.
CancelledByUser Indicates that the request was cancelled by the user. Added in version 1.14.0.

Defines the possible reasons a recognition result might be canceled.

enum CancellationErrorCode

Values Descriptions
NoError No error. If CancellationReason is EndOfStream, CancellationErrorCode is set to NoError.
AuthenticationFailure Indicates an authentication error. An authentication error occurs if subscription key or authorization token is invalid, expired, or does not match the region being used.
BadRequest Indicates that one or more recognition parameters are invalid or the audio format is not supported.
TooManyRequests Indicates that the number of parallel requests exceeded the number of allowed concurrent transcriptions for the subscription.
Forbidden Indicates that the free subscription used by the request ran out of quota.
ConnectionFailure Indicates a connection error.
ServiceTimeout Indicates a time-out error when waiting for response from service.
ServiceError Indicates that an error is returned by the service.
ServiceUnavailable Indicates that the service is currently unavailable.
RuntimeError Indicates an unexpected runtime error.
ServiceRedirectTemporary Indicates the Speech Service is temporarily requesting a reconnect to a different endpoint.
ServiceRedirectPermanent Indicates the Speech Service is permanently requesting a reconnect to a different endpoint.
EmbeddedModelError Indicates the embedded speech (SR or TTS) model is not available or corrupted.

Defines error code in case that CancellationReason is Error. Added in version 1.1.0.
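
A sketch showing how CancellationDetails exposes both the CancellationReason and, when the reason is Error, the CancellationErrorCode; the helper name is illustrative:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Reports why a recognition result was canceled and, on error, which error code applies.
void ReportCancellation(std::shared_ptr<SpeechRecognitionResult> result)
{
    auto cancellation = CancellationDetails::FromResult(result);
    if (cancellation->Reason == CancellationReason::Error)
    {
        std::cout << "ErrorCode: " << static_cast<int>(cancellation->ErrorCode) << std::endl;
        std::cout << "ErrorDetails: " << cancellation->ErrorDetails << std::endl;
    }
    else if (cancellation->Reason == CancellationReason::EndOfStream)
    {
        std::cout << "End of the audio stream was reached." << std::endl;
    }
}
```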

enum NoMatchReason

Values Descriptions
NotRecognized Indicates that speech was detected, but not recognized.
InitialSilenceTimeout Indicates that the start of the audio stream contained only silence, and the service timed out waiting for speech.
InitialBabbleTimeout Indicates that the start of the audio stream contained only noise, and the service timed out waiting for speech.
KeywordNotRecognized Indicates that the spotted keyword has been rejected by the keyword verification service. Added in version 1.5.0.
EndSilenceTimeout Indicates that the audio stream contained only silence after the last recognized phrase.

Defines the possible reasons a recognition result might not be recognized.
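
A sketch of inspecting NoMatchDetails when a result carries ResultReason::NoMatch; the helper name is illustrative:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Distinguishes why a NoMatch result carried no recognized text.
void ReportNoMatch(std::shared_ptr<SpeechRecognitionResult> result)
{
    auto noMatch = NoMatchDetails::FromResult(result);
    if (noMatch->Reason == NoMatchReason::InitialSilenceTimeout)
    {
        std::cout << "Only silence was detected at the start of the audio." << std::endl;
    }
    else if (noMatch->Reason == NoMatchReason::NotRecognized)
    {
        std::cout << "Speech was detected but could not be recognized." << std::endl;
    }
}
```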

enum ActivityJSONType

Values Descriptions
Null
Object
Array
String
Double
UInt
Int
Boolean

Defines the possible types for an activity JSON value. Added in version 1.5.0.

enum SpeechSynthesisOutputFormat

Values Descriptions
Raw8Khz8BitMonoMULaw raw-8khz-8bit-mono-mulaw
Riff16Khz16KbpsMonoSiren riff-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value.
Audio16Khz16KbpsMonoSiren audio-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value.
Audio16Khz32KBitRateMonoMp3 audio-16khz-32kbitrate-mono-mp3
Audio16Khz128KBitRateMonoMp3 audio-16khz-128kbitrate-mono-mp3
Audio16Khz64KBitRateMonoMp3 audio-16khz-64kbitrate-mono-mp3
Audio24Khz48KBitRateMonoMp3 audio-24khz-48kbitrate-mono-mp3
Audio24Khz96KBitRateMonoMp3 audio-24khz-96kbitrate-mono-mp3
Audio24Khz160KBitRateMonoMp3 audio-24khz-160kbitrate-mono-mp3
Raw16Khz16BitMonoTrueSilk raw-16khz-16bit-mono-truesilk
Riff16Khz16BitMonoPcm riff-16khz-16bit-mono-pcm
Riff8Khz16BitMonoPcm riff-8khz-16bit-mono-pcm
Riff24Khz16BitMonoPcm riff-24khz-16bit-mono-pcm
Riff8Khz8BitMonoMULaw riff-8khz-8bit-mono-mulaw
Raw16Khz16BitMonoPcm raw-16khz-16bit-mono-pcm
Raw24Khz16BitMonoPcm raw-24khz-16bit-mono-pcm
Raw8Khz16BitMonoPcm raw-8khz-16bit-mono-pcm
Ogg16Khz16BitMonoOpus ogg-16khz-16bit-mono-opus
Ogg24Khz16BitMonoOpus ogg-24khz-16bit-mono-opus
Raw48Khz16BitMonoPcm raw-48khz-16bit-mono-pcm
Riff48Khz16BitMonoPcm riff-48khz-16bit-mono-pcm
Audio48Khz96KBitRateMonoMp3 audio-48khz-96kbitrate-mono-mp3
Audio48Khz192KBitRateMonoMp3 audio-48khz-192kbitrate-mono-mp3
Ogg48Khz16BitMonoOpus ogg-48khz-16bit-mono-opus Added in version 1.16.0
Webm16Khz16BitMonoOpus webm-16khz-16bit-mono-opus Added in version 1.16.0
Webm24Khz16BitMonoOpus webm-24khz-16bit-mono-opus Added in version 1.16.0
Raw24Khz16BitMonoTrueSilk raw-24khz-16bit-mono-truesilk Added in version 1.17.0
Raw8Khz8BitMonoALaw raw-8khz-8bit-mono-alaw Added in version 1.17.0
Riff8Khz8BitMonoALaw riff-8khz-8bit-mono-alaw Added in version 1.17.0
Webm24Khz16Bit24KbpsMonoOpus webm-24khz-16bit-24kbps-mono-opus Audio compressed by the OPUS codec in a WebM container, with a bitrate of 24 kbps, optimized for IoT scenarios. (Added in 1.19.0)
Audio16Khz16Bit32KbpsMonoOpus audio-16khz-16bit-32kbps-mono-opus Audio compressed by the OPUS codec without container, with a bitrate of 32 kbps. (Added in 1.20.0)
Audio24Khz16Bit48KbpsMonoOpus audio-24khz-16bit-48kbps-mono-opus Audio compressed by the OPUS codec without container, with a bitrate of 48 kbps. (Added in 1.20.0)
Audio24Khz16Bit24KbpsMonoOpus audio-24khz-16bit-24kbps-mono-opus Audio compressed by the OPUS codec without container, with a bitrate of 24 kbps. (Added in 1.20.0)
Raw22050Hz16BitMonoPcm raw-22050hz-16bit-mono-pcm Raw PCM audio at 22050Hz sampling rate and 16-bit depth. (Added in 1.22.0)
Riff22050Hz16BitMonoPcm riff-22050hz-16bit-mono-pcm PCM audio at 22050Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0)
Raw44100Hz16BitMonoPcm raw-44100hz-16bit-mono-pcm Raw PCM audio at 44100Hz sampling rate and 16-bit depth. (Added in 1.22.0)
Riff44100Hz16BitMonoPcm riff-44100hz-16bit-mono-pcm PCM audio at 44100Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0)
AmrWb16000Hz amr-wb-16000hz AMR-WB audio at 16kHz sampling rate. (Added in 1.24.0)
G72216Khz64Kbps g722-16khz-64kbps G.722 audio at 16kHz sampling rate and 64kbps bitrate. (Added in 1.38.0)

Defines the possible speech synthesis output audio formats. Updated in version 1.19.0.
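
A minimal sketch of selecting one of these formats before creating a SpeechSynthesizer; the MP3 format chosen here is only an example:

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Requests compressed MP3 output instead of the default RIFF PCM format.
void ConfigureSynthesisOutputFormat(std::shared_ptr<SpeechConfig> config)
{
    config->SetSpeechSynthesisOutputFormat(
        SpeechSynthesisOutputFormat::Audio24Khz48KBitRateMonoMp3);
}
```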

enum StreamStatus

Values Descriptions
Unknown The audio data stream status is unknown.
NoData The audio data stream contains no data.
PartialData The audio data stream contains partial data of a speak request.
AllData The audio data stream contains all data of a speak request.
Canceled The audio data stream was canceled.

Defines the possible status of audio data stream. Added in version 1.4.0.
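
A sketch that drains an AudioDataStream created from a synthesis result and checks its final status; the file path and buffer size are illustrative:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Writes the synthesized audio to a file and verifies that the speak request completed.
void SaveSynthesizedAudio(std::shared_ptr<SpeechSynthesisResult> result, const std::string& path)
{
    auto stream = AudioDataStream::FromResult(result);
    std::ofstream file(path, std::ios::binary);

    uint8_t buffer[4096];
    uint32_t bytesRead = 0;
    while ((bytesRead = stream->ReadData(buffer, sizeof(buffer))) > 0)
    {
        file.write(reinterpret_cast<const char*>(buffer), bytesRead);
    }

    if (stream->GetStatus() != StreamStatus::AllData)
    {
        // PartialData or Canceled indicates the speak request did not finish normally.
    }
}
```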

enum ServicePropertyChannel

Values Descriptions
UriQueryParameter Uses URI query parameter to pass property settings to service.
HttpHeader Uses HttpHeader to set a key/value pair in an HTTP header.

Defines channels used to pass property settings to service. Added in version 1.5.0.
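
A minimal sketch of passing an extra setting to the service over the URI query parameter channel; the parameter name and value are purely illustrative:

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Forwards an arbitrary key/value setting to the service as a URL query parameter.
void AddServiceProperty(std::shared_ptr<SpeechConfig> config)
{
    config->SetServiceProperty("punctuation", "explicit", ServicePropertyChannel::UriQueryParameter);
}
```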

enum VoiceProfileType

Values Descriptions
TextIndependentIdentification Text independent speaker identification.
TextDependentVerification Text dependent speaker verification.
TextIndependentVerification Text independent verification.

Defines voice profile types.

enum RecognitionFactorScope

Values Descriptions
PartialPhrase A Recognition Factor will apply to grammars that can be referenced as individual partial phrases.

Defines the scope that a Recognition Factor is applied to.

enum PronunciationAssessmentGradingSystem

Values Descriptions
FivePoint Five point calibration.
HundredMark Hundred mark.

Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0.

enum PronunciationAssessmentGranularity

Values Descriptions
Phoneme Shows the score on the full text, word and phoneme level.
Word Shows the score on the full text and word level.
FullText Shows the score on the full text level only.

Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0.
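
A sketch combining the grading system and granularity enums in a PronunciationAssessmentConfig and applying it to an existing recognizer; the reference text and the enabled miscue flag are illustrative:

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Builds a pronunciation assessment configuration with a hundred-mark score,
// phoneme-level granularity and miscue calculation, then attaches it to a recognizer.
void ConfigurePronunciationAssessment(std::shared_ptr<SpeechRecognizer> recognizer)
{
    auto pronConfig = PronunciationAssessmentConfig::Create(
        "reference text to read",
        PronunciationAssessmentGradingSystem::HundredMark,
        PronunciationAssessmentGranularity::Phoneme,
        true /* enable miscue calculation */);
    pronConfig->ApplyTo(recognizer);
}
```

After recognition completes, PronunciationAssessmentResult::FromResult can be used to read the resulting scores.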

enum SynthesisVoiceType

Values Descriptions
OnlineNeural Online neural voice.
OnlineStandard Online standard voice.
OfflineNeural Offline neural voice.
OfflineStandard Offline standard voice.

Defines the type of synthesis voices. Added in version 1.16.0.

enum SynthesisVoiceGender

Values Descriptions
Unknown Gender unknown.
Female Female voice.
Male Male voice.

Defines the gender of synthesis voices. Added in version 1.17.0.
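
A sketch that filters the voices list by voice type and gender; it assumes the VoiceInfo members (VoiceType, Gender, ShortName) exposed by recent SDK versions:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Lists the short names of online neural female voices for a given locale.
void ListNeuralFemaleVoices(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    auto voicesResult = synthesizer->GetVoicesAsync("en-US").get();
    for (const auto& voice : voicesResult->Voices)
    {
        if (voice->VoiceType == SynthesisVoiceType::OnlineNeural &&
            voice->Gender == SynthesisVoiceGender::Female)
        {
            std::cout << voice->ShortName << std::endl;
        }
    }
}
```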

enum SpeechSynthesisBoundaryType

Values Descriptions
Word Word boundary.
Punctuation Punctuation boundary.
Sentence Sentence boundary.

Defines the boundary type of speech synthesis boundary events. Added in version 1.21.0.
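
A sketch of subscribing to WordBoundary events and branching on the boundary type; it assumes the Text and BoundaryType members available on SpeechSynthesisWordBoundaryEventArgs in recent SDK versions:

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Prints word and sentence boundaries as they are reported during synthesis.
void SubscribeToBoundaries(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    synthesizer->WordBoundary += [](const SpeechSynthesisWordBoundaryEventArgs& e)
    {
        if (e.BoundaryType == SpeechSynthesisBoundaryType::Word)
        {
            std::cout << "Word boundary: " << e.Text << std::endl;
        }
        else if (e.BoundaryType == SpeechSynthesisBoundaryType::Sentence)
        {
            std::cout << "Sentence boundary" << std::endl;
        }
    };
}
```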

enum SegmentationStrategy

Values Descriptions
Default Use the default strategy and settings as determined by the Speech Service. Use in most situations.
Time Uses a time-based strategy where the amount of silence between speech is used to determine when to generate a final result.
Semantic Uses an AI model to determine the end of a spoken phrase based on the content of the phrase.

The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic".
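
A minimal sketch of selecting the semantic strategy through the Speech_SegmentationStrategy property id listed above:

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

// Switches phrase segmentation from the service default to the semantic strategy.
void UseSemanticSegmentation(std::shared_ptr<SpeechConfig> config)
{
    config->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");
}
```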