Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

ISpStreamFormatConverter

ISpStreamFormatConverter is the primary interface implemented by the audio data format converter in the Speech Platform. The Speech Platform uses the format converter to compensate for differences between the stream formats supported by the speech recognition and text-to-speech (TTS) engines and the I/O formats requested by the application. Typically, applications and engines do not use this object directly. The format converter is a wrapper object that encapsulates a specified base stream and converts the audio data on the fly during read and write operations. The conversion itself is performed by the Audio Compression Manager (ACM) layer.
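The wrapper idea described above can be illustrated with a minimal, self-contained C++ sketch. It does not use the Speech Platform API or the ACM. The classes `BaseStream8Bit` and `Converter16Bit` are hypothetical stand-ins, and the fixed 8-bit to 16-bit PCM conversion is an assumption chosen only because it needs no codec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a base stream holding 8-bit unsigned mono PCM.
class BaseStream8Bit {
public:
    explicit BaseStream8Bit(std::vector<std::uint8_t> samples)
        : samples_(std::move(samples)) {}

    // Copies up to `count` samples into `out` and returns how many were read.
    std::size_t Read(std::uint8_t* out, std::size_t count) {
        std::size_t n = 0;
        while (n < count && pos_ < samples_.size()) {
            out[n++] = samples_[pos_++];
        }
        return n;
    }

    void Rewind() { pos_ = 0; }

private:
    std::vector<std::uint8_t> samples_;
    std::size_t pos_ = 0;
};

// Hypothetical wrapper: presents the base stream as 16-bit signed PCM and
// converts each sample at read time instead of converting the stream up front.
class Converter16Bit {
public:
    explicit Converter16Bit(BaseStream8Bit& base) : base_(base) {}

    // Reads up to `count` converted samples and returns how many were produced.
    std::size_t Read(std::int16_t* out, std::size_t count) {
        assert(out != nullptr || count == 0);
        std::vector<std::uint8_t> raw(count);
        const std::size_t n = base_.Read(raw.data(), count);
        for (std::size_t i = 0; i < n; ++i) {
            // 8-bit PCM is unsigned with a midpoint of 128; recenter and widen.
            out[i] = static_cast<std::int16_t>((static_cast<int>(raw[i]) - 128) * 256);
        }
        return n;
    }

    // Moves the wrapper (and therefore the base stream) back to the start.
    void ResetSeekPosition() { base_.Rewind(); }

private:
    BaseStream8Bit& base_;
};
```

A caller reads from `Converter16Bit` exactly as it would from any 16-bit stream; the base stream is never copied or converted ahead of time.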

In addition to the methods inherited from the ISpStreamFormat interface, ISpStreamFormatConverter defines several methods that support data conversion.

Implemented By

  • SpStreamFormatConverter

Methods in Vtable Order

ISpStreamFormatConverter Methods    Description
ISpStreamFormat interface           Inherits from ISpStreamFormat; all of its methods are accessible from an ISpStreamFormatConverter object.
SetBaseStream                       Sets the audio stream to be wrapped by the format converter.
GetBaseStream                       Gets the base audio stream that is being wrapped.
SetFormat                           Sets the conversion (output) format.
ResetSeekPosition                   Resets the format converter's stream seek position to the start of the stream.
ScaleConvertedToBaseOffset          Maps an offset in the converted stream to an offset in the base stream.
ScaleBaseToConvertedOffset          Maps an offset in the base stream to an offset in the converted stream.
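To make the two offset-mapping methods more concrete, the following sketch shows one plausible way such a mapping could work for uncompressed PCM streams. This page does not describe how the converter computes offsets, so the rule used here (scale by the ratio of average bytes per second, then round down to a whole sample frame) is an assumption. The `PcmFormat` struct and the free functions are hypothetical and only borrow the names of the interface methods.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified description of a PCM format.
struct PcmFormat {
    std::uint32_t samplesPerSec;
    std::uint16_t bitsPerSample;
    std::uint16_t channels;
};

// Size in bytes of one sample frame (all channels).
std::uint64_t BlockAlign(const PcmFormat& f) {
    return static_cast<std::uint64_t>(f.channels) * (f.bitsPerSample / 8u);
}

// Average data rate of the format in bytes per second.
std::uint64_t BytesPerSecond(const PcmFormat& f) {
    return static_cast<std::uint64_t>(f.samplesPerSec) * BlockAlign(f);
}

// Assumed rule: scale by the byte-rate ratio, then round down to a frame boundary.
std::uint64_t ScaleOffset(std::uint64_t offset, const PcmFormat& from, const PcmFormat& to) {
    assert(BytesPerSecond(from) > 0 && BlockAlign(to) > 0);
    const std::uint64_t scaled = offset * BytesPerSecond(to) / BytesPerSecond(from);
    return scaled - scaled % BlockAlign(to);
}

// Hypothetical free-function analogue of the interface method of the same name.
std::uint64_t ScaleConvertedToBaseOffset(std::uint64_t convertedOffset,
                                         const PcmFormat& base,
                                         const PcmFormat& converted) {
    return ScaleOffset(convertedOffset, converted, base);
}

// Hypothetical free-function analogue of the interface method of the same name.
std::uint64_t ScaleBaseToConvertedOffset(std::uint64_t baseOffset,
                                         const PcmFormat& base,
                                         const PcmFormat& converted) {
    return ScaleOffset(baseOffset, base, converted);
}
```

Under these assumptions, one second of 44.1 kHz 8-bit mono converted audio (44,100 bytes) maps to one second of an 8 kHz 16-bit stereo base stream (32,000 bytes), and the reverse mapping returns the original value.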

Development Helpers

Enumeration, Function, or Class    Description
SPSTREAMFORMAT                     The stream formats supported by the Speech Platform.
CSpStreamFormat                    Class for managing stream formats and WAVEFORMATEX structures supported by the Speech Platform.

Remarks

The Speech Platform uses the audio codecs installed on the host system to perform the conversion. The Speech Platform currently supports 1-stage and 2-stage stream conversions, but does not support conversions that require 3 or more stages.

An example of a 1-stage stream format conversion is the conversion of a Pulse Code Modulation (PCM) format to another PCM format (for example, 8kHz 16-bit Stereo PCM [SPSF_8kHz16BitStereo] to 44kHz 8-bit Mono [SPSF_44kHz8BitMono]). This requires only one codec (that is, "Microsoft PCM Converter").

An example of a 2-stage stream conversion is the conversion of a compressed format to a PCM format (for example, TrueSpeech 8kHz 1-Bit Mono [SPSF_TrueSpeech_8kHz1BitMono] to 8kHz 8-bit Mono PCM [SPSF_8kHz8BitMono] to 44kHz 16-bit Stereo [SPSF_44kHz16BitStereo]). This requires two codecs (that is, "DSP Group TrueSpeech(TM) Audio" and "Microsoft PCM Converter"). Note that either the source format or the destination format must be a PCM format.

An example of an unsupported 3-stage stream conversion is the conversion of a compressed format to another compressed format (for example, TrueSpeech 8kHz 1-Bit Mono [SPSF_TrueSpeech_8kHz1BitMono] to 8kHz 8-bit Mono PCM [SPSF_8kHz8BitMono] to 8kHz 8-bit Stereo PCM [SPSF_8kHz8BitStereo] to ALaw 8kHz Stereo [SPSF_CCITT_ALaw_8kHzStereo]). This would require three codecs (that is, "DSP Group TrueSpeech(TM) Audio", "Microsoft PCM Converter", and "Microsoft CCITT G.711 Audio"). Note that the Speech Platform can convert between two compressed non-PCM formats if a single codec can perform the entire conversion.
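The staging rules in these remarks can be summarized in a short, self-contained sketch. It is a simplified model, not the Speech Platform's actual logic: the `Codec` enumeration and both functions are hypothetical, the source and destination formats are assumed to differ, every conversion between a compressed format and PCM is counted as 2 stages (as in the example above), and "a single codec can perform the entire conversion" is approximated by both formats belonging to the same codec.

```cpp
#include <cassert>

// Hypothetical codec families, named after the examples in the remarks.
enum class Codec { Pcm, TrueSpeech, CcittALaw };

// Returns the number of conversion stages under the simplified model:
//   PCM -> PCM                      : 1 stage (PCM converter only)
//   compressed <-> PCM              : 2 stages (codec plus PCM converter)
//   compressed -> compressed        : 1 stage if the same codec handles both,
//                                     otherwise 3 stages
int StagesRequired(Codec source, Codec target) {
    const bool sourceIsPcm = (source == Codec::Pcm);
    const bool targetIsPcm = (target == Codec::Pcm);
    if (sourceIsPcm && targetIsPcm) {
        return 1;
    }
    if (sourceIsPcm || targetIsPcm) {
        return 2;
    }
    return (source == target) ? 1 : 3;
}

// Conversions that need 3 or more stages are not supported.
bool IsConversionSupported(Codec source, Codec target) {
    return StagesRequired(source, target) <= 2;
}

// Replays the three examples from the remarks against the model.
void CheckDocumentedExamples() {
    assert(StagesRequired(Codec::Pcm, Codec::Pcm) == 1);
    assert(StagesRequired(Codec::TrueSpeech, Codec::Pcm) == 2);
    assert(!IsConversionSupported(Codec::TrueSpeech, Codec::CcittALaw));
}
```

An application that knows its source and destination formats could apply a check like `IsConversionSupported` before configuring a converter. If the direct path is unsupported, it can convert to an intermediate PCM format itself.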