What input formats does Batch Speech to Text support?

日立s　018 20 Reputation points

Is there any documentation or guidance on the input formats supported by the Batch Speech to Text service? I have two MP4 files with different properties: one can be transcribed (bitrate 62 kbps, mono, 16,000 Hz), while the other cannot (168 kbps, stereo, 48,000 Hz).

navba-MSFT 27,065 Reputation points Microsoft Employee

2024-12-19T04:57:46.9366667+00:00
@日立s　018 Welcome to the Microsoft Q&A forum, and thank you for posting your query here!


The batch transcription API (and fast transcription API) supports multiple formats and codecs, such as:

WAV

MP3

OPUS/OGG

FLAC

WMA

AAC

ALAW in WAV container

MULAW in WAV container

AMR

WebM

M4A

SPEEX


The same is documented here.

Hope this answers your question.


Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
日立s　018 20 Reputation points

2024-12-19T07:14:46.5466667+00:00

@navba-MSFT Thanks for your answer. I tried with two files of the same format and codecs, and one was successful while the other failed. Can you explain why?
日立s　018 20 Reputation points

2024-12-19T07:17:51.39+00:00

These are the two files that I tried.
navba-MSFT 27,065 Reputation points Microsoft Employee

2024-12-19T07:34:10.0566667+00:00

@日立s　018 Thanks for getting back. Could you please confirm whether the files are MP4? If so, that format is not supported. Could you instead try an MP3 version of the file and confirm the result? Awaiting your reply.
日立s　018 20 Reputation points

2024-12-19T07:44:52.6833333+00:00

@navba-MSFT Yes, they are MP4 files. I tried with an MP3 file, and it was successful. Why is it that when I try two MP4 files, one succeeds and the other fails, as I described in the topic?

Accepted answer

navba-MSFT 27,065 Reputation points Microsoft Employee

2024-12-19T09:31:10.32+00:00

@日立s　018 Thanks for getting back. While the MP4 container is generally supported, the service can only process streamable files. Not all MP4 files are streamable, and the file that failed is probably one that is not.

The file can either be converted to a different format or fixed while keeping the same format by running the following ffmpeg command:

ffmpeg -i inputvideo.mp4 -movflags faststart -acodec copy -vcodec copy outputvideo.mp4
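For anyone who wants to check a file before uploading it, here is a minimal Python sketch. ffmpeg's faststart flag moves the MP4 index (the "moov" box) to the front of the file, ahead of the media data (the "mdat" box), so the sketch simply reports whether "moov" already comes before "mdat". It assumes that "streamable" in this answer refers to that layout, which the thread does not confirm, and the function names are made up for illustration; they are not part of any Speech SDK.

```python
import struct
from typing import BinaryIO, List


def top_level_boxes(stream: BinaryIO) -> List[str]:
    """Return the types of the top-level MP4 boxes in file order."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file (or a truncated header)
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the box type.
            extended = stream.read(8)
            if len(extended) < 8:
                break
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        boxes.append(box_type.decode("latin-1"))
        if size == 0:
            break  # a size of 0 means the box runs to the end of the file
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        stream.seek(size - header_len, 1)  # skip the payload
    return boxes


def moov_before_mdat(stream: BinaryIO) -> bool:
    """True if both boxes exist and 'moov' precedes 'mdat' (assumed meaning of streamable)."""
    boxes = top_level_boxes(stream)
    if "moov" not in boxes or "mdat" not in boxes:
        return False
    return boxes.index("moov") < boxes.index("mdat")
```

To use it, open the file in binary mode, for example `with open("inputvideo.mp4", "rb") as f: print(moov_before_mdat(f))`. If it prints False, the ffmpeg command above may be worth trying. Treat the result as a hint only; the service may reject a file for other reasons.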

Hope this answers your question.

日立s　018 20 Reputation points

2024-12-19T09:41:59.7966667+00:00

@navba-MSFT Thank you for your support.

Tong Viet Anh 0 Reputation points

2024-12-26T10:33:15.1433333+00:00

@navba-MSFT Hi, is there any documentation describing the parameters (bit rate, channel, sample rate, etc.) supported by the conversion service? If not, could you provide me with related information? Thank you.