Question 1

What file formats are supported?

Accepted Answer

Any video or audio file, including MP4, MOV, AVI, MP3, WAV, M4A, and FLAC, among others. For video files, the pipeline automatically extracts the audio track.
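
To illustrate what extracting the audio track from a video involves, here is a minimal sketch that builds an ffmpeg command for the job. This is not the pipeline's actual implementation, which is not documented here. The function name `build_extract_audio_command`, the 16 kHz mono WAV target, and the file names are all assumptions chosen for illustration.

```python
from typing import List


def build_extract_audio_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that writes only the audio track of a video.

    Assumption: 16 kHz mono 16-bit PCM WAV output, a common input format
    for speech models. The real pipeline may use different settings.
    """
    return [
        "ffmpeg",
        "-i", video_path,        # input video (MP4, MOV, AVI, ...)
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # encode audio as 16-bit PCM
        "-ar", "16000",          # resample to 16 kHz (assumed)
        "-ac", "1",              # downmix to mono (assumed)
        audio_path,
    ]


cmd = build_extract_audio_command("meeting.mp4", "meeting.wav")
# To actually run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Audio-only uploads such as MP3 or WAV would not need this step.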

Question 2

How accurate is the transcription?

Accepted Answer

Word error rate (WER) is under 5% for clear audio in major languages. Accuracy depends on audio quality, background noise, and speaker clarity.
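
For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. It is an illustration of the metric, not the evaluation code behind the figure above; the function name `word_error_rate` and the lowercase, whitespace-split tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five reference words.
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
```

By this definition, a rate under 5% means fewer than one word error per 20 words of reference text.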

Question 3

What is speaker detection?

Accepted Answer

When enabled, the transcript labels each segment with the speaker who said it (Speaker A, Speaker B, etc.). Use the Number of Speakers hint to improve accuracy when you know how many people are in the recording.

Question 4

Which languages are supported?

Accepted Answer

99 languages with automatic detection. Set a specific language for best accuracy when you know the source language.

Question 5

How long does it take?

Accepted Answer

Typically 10-30% of the audio duration. For example, a 10-minute recording takes 1-3 minutes to transcribe.
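
The estimate is simple proportional arithmetic, sketched below for other recording lengths. The function name `estimate_transcription_minutes` is hypothetical, and the 10% and 30% bounds are taken from the typical range stated above, so actual times may fall outside them.

```python
from typing import Tuple


def estimate_transcription_minutes(
    audio_minutes: float,
    low_fraction: float = 0.10,   # lower bound of the typical range
    high_fraction: float = 0.30,  # upper bound of the typical range
) -> Tuple[float, float]:
    """Return (fastest, slowest) typical processing time in minutes."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * low_fraction, audio_minutes * high_fraction


fastest, slowest = estimate_transcription_minutes(10)
print(f"{fastest:.0f} to {slowest:.0f} minutes")
```

By the same arithmetic, a 60-minute recording would typically take between 6 and 18 minutes.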
