MP4 video format · Auto audio extraction

Transcribe MP4 Video to Text — Free + Accurate

Upload your MP4. We extract the audio track, run it through the same Whisper/ElevenLabs engine that powers our sermon tool, and hand you back a transcript plus matched .srt and .vtt captions. About 5 minutes for a 45-minute sermon video.

Processing

5 min / 45-min MP4

Caption sync

.srt + .vtt included

Privacy

Video deleted after 30d

What is an MP4 file?

MP4 (MPEG-4 Part 14) is a container format — it holds a compressed video stream (typically H.264 or H.265), one or more audio streams (typically AAC), and metadata in a single .mp4 file. It's the standard output of nearly every phone camera, livestream encoder, and video editor on the market.

For transcription, only the audio track matters. We use ffmpeg to extract it at upload time, then discard the video and run the audio through speech-to-text. You're billed per audio minute — a 45-minute sermon video costs the same $0.27 (Standard) or $0.90 (Premium) as a 45-minute MP3.
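
The per-minute arithmetic behind those figures can be sketched in a few lines. This is an illustration only: the function name and the rounding to whole cents are assumptions, not a description of how our billing system is implemented.

```python
from decimal import Decimal

# Per-minute rates quoted on this page (USD).
RATES = {"standard": Decimal("0.006"), "premium": Decimal("0.02")}

def transcription_cost(audio_minutes, tier="standard"):
    """Return the cost in USD for a given audio length, rounded to cents.

    Rounding to the nearest cent is an assumption made for this sketch.
    """
    rate = RATES[tier]
    return (Decimal(str(audio_minutes)) * rate).quantize(Decimal("0.01"))

print(transcription_cost(45, "standard"))  # 0.27
print(transcription_cost(45, "premium"))   # 0.90
```

The file type never enters the calculation, which is why an MP4 and an MP3 of the same length cost the same.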

Step-by-step: MP4 video to text

  1. Open the transcription tool

    Hit /transcribe in the top nav. The same upload zone handles both audio and video. No special toggle needed.

  2. Drag your .mp4 file into the upload zone

    Drop straight from your video editor's export folder, Dropbox, or local disk. Free tier supports up to 25MB. For full-length sermon video (~200–500MB), upgrade to Pro or extract audio first.

  3. We extract the audio track automatically

    On upload, our backend runs the equivalent of: ffmpeg -i sermon.mp4 -vn -ac 1 -ar 16000 -c:a libmp3lame audio.mp3. The video stream is dropped, so you pay only for audio minutes.

  4. Pick Standard or Premium tier

    Standard ($0.006/min) is fine for a single-camera sermon with a head-worn mic. Premium ($0.02/min) is better for panel discussions, multi-camera interviews, or anytime you want speaker labels in the output.

  5. Wait about a tenth of the runtime

    A 10-minute clip transcribes in roughly 1 minute. A 45-minute service in about 5. You can close the tab; results are saved to your dashboard and emailed when ready.

  6. Download .srt or .vtt for captions, .txt for blog

    Drop the .srt into YouTube Studio (Subtitles → Add → Upload file) for instant captions. Reference the .vtt from a <track> element inside your church website's <video> tag. Use the .txt for a sermon-page blog post.
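
If you would rather script the audio extraction from step 3 yourself, the same ffmpeg command can be assembled in Python. This is a minimal sketch: the helper name build_extract_command is made up for illustration, and the block only builds and prints the command rather than running it.

```python
import shlex
from pathlib import Path

def build_extract_command(video_path, audio_path=None):
    """Build (but do not run) the ffmpeg audio-extraction command from step 3."""
    video = Path(video_path)
    # Default output name: same path as the video, with an .mp3 suffix.
    audio = Path(audio_path) if audio_path else video.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(video),      # input video file
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "libmp3lame",  # encode the audio as MP3
        str(audio),
    ]

cmd = build_extract_command("sermon.mp4", "audio.mp3")
print(shlex.join(cmd))
# To actually run it (requires ffmpeg on your PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a file name contains spaces.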

Video format & size compatibility

| Format | Free tier max | Pro max | Notes |
|---|---|---|---|
| MP4 (H.264 + AAC) | 25 MB | 500 MB | Most common, works natively |
| MOV (QuickTime) | 25 MB | 500 MB | iPhone/Mac native, works |
| MKV (Matroska) | 25 MB | 500 MB | Sometimes needs HandBrake convert |
| WEBM | 25 MB | 500 MB | YouTube downloads native |
| AVI | 25 MB | 500 MB | Old format, works but convert if errors |
| MP3 / WAV / M4A (audio only) | 25 MB | 500 MB | Skip the video step entirely |
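
Since every format shares the same two limits, checking a file before you upload comes down to one comparison. A rough sketch, assuming the limits are inclusive (the function name is hypothetical):

```python
# Per-file upload limits from the table above, in megabytes.
LIMITS_MB = {"free": 25, "pro": 500}

def smallest_tier_for(size_mb):
    """Return the lowest tier whose per-file limit fits the file, or None.

    Treating the limits as inclusive is an assumption of this sketch.
    """
    for tier in ("free", "pro"):
        if size_mb <= LIMITS_MB[tier]:
            return tier
    return None

print(smallest_tier_for(20))   # free
print(smallest_tier_for(350))  # pro
print(smallest_tier_for(800))  # None
```

A result of None means the file needs to be shrunk first, for example by extracting the audio.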

MP4-specific tips

  • Full-length sermon video over 25MB? Extract the audio first. In ffmpeg (free, command-line):
      ffmpeg -i sermon.mp4 -vn -ac 1 -b:a 64k sermon.mp3
    That gives you a ~20MB MP3 of a 45-minute sermon, which fits under the 25MB free tier limit.
  • No command line? Use VLC (free, GUI): Media → Convert/Save, add the MP4, choose the "Audio - MP3" profile, then upload the resulting MP3.
  • Recording sermon video for transcription? Use a wireless lav mic on the speaker, going into the camera's 3.5mm input or a separate field recorder. Room mics pick up too much ambient noise. The video can be any resolution; the audio is what matters.
  • Want YouTube chapters? The .srt file includes timestamps. Use the timestamps to find natural section breaks in the sermon and paste them into your YouTube description (e.g., "3:42 The Big Idea — Trusting in suffering").
  • Multi-camera service with overlapping audio? Use only the camera that has the main mic feed. Don't upload a multi-cam edit before locking the audio bed — the alternate angle audio will confuse diarization.
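
The size estimate in the first tip follows directly from the bitrate. Here is a small sketch of that arithmetic; it assumes constant-bitrate encoding and ignores the few kilobytes of tag and header overhead.

```python
def mp3_size_mb(duration_minutes, bitrate_kbps=64):
    """Estimate constant-bitrate audio size in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * duration_minutes * 60
    return bits / 8 / 1_000_000

size = mp3_size_mb(45, 64)
print(f"{size:.1f} MB")  # 21.6 MB
```

At 64 kbps, a 45-minute sermon comes out at about 21.6 MB (roughly 20.6 MiB), which fits under 25MB. For a longer service, lowering the bitrate shrinks the file proportionally.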

The MP4 transcription workflow

[Diagram: .mp4 video file → ffmpeg (audio extracted) → AI engine (5 min / 45-min video) → .srt, .vtt, .txt]

MP4 transcription pricing vs alternatives

| Service | Cost / 45-min video | Accuracy | Caption files | Free tier |
|---|---|---|---|---|
| Sermon Transcription (Std) | $0.27 | 99.0–99.5% | .srt .vtt .txt .docx | 10 min free |
| Sermon Transcription (Premium) | $0.90 | 99.5%+ with diarization | .srt .vtt .txt .docx | 10 min free |
| Rev AI | $11.25 | 90–95% | .srt .vtt + JSON | 5 hours free |
| Rev human | $67.50 | 99%+ | .srt + Word | None |
| Otter Pro | ~$0.64 effective | 90–95% | .srt .vtt | 300 min / mo |
| HappyScribe AI | ~$9.00 | 85–92% | .srt .vtt | None |

Pricing as of early 2026. Rev AI $0.25/min; Rev human $1.50/min. Otter Pro $16.99/mo ÷ 1,200 included min × 45 min. HappyScribe AI ~$0.20/min.
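
To check the comparison figures yourself, the footnote's arithmetic can be reproduced as follows. The rates are the ones quoted above; the helper name is made up for this sketch.

```python
def cost_per_video(rate_per_min, minutes=45):
    """Cost in USD of one video at a flat per-minute rate, rounded to cents."""
    return round(rate_per_min * minutes, 2)

rev_ai = cost_per_video(0.25)
rev_human = cost_per_video(1.50)
happyscribe = cost_per_video(0.20)
# Otter Pro: monthly price spread across the included minutes.
otter_effective = cost_per_video(16.99 / 1200)

print(f"Rev AI:      ${rev_ai:.2f}")           # $11.25
print(f"Rev human:   ${rev_human:.2f}")        # $67.50
print(f"HappyScribe: ${happyscribe:.2f}")      # $9.00
print(f"Otter Pro:   ${otter_effective:.2f}")  # $0.64
```

The Otter figure is an effective rate: it only holds if you use all 1,200 included minutes in the month.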

MP4 transcription FAQ

Can I upload MP4 video directly, or do I need to extract the audio first?

Upload the MP4 directly. We extract the audio track on our end (ffmpeg under the hood), drop the video, and run only the audio through the transcription engine. You don't pay for video processing — only for the audio length.

What is the maximum MP4 file size I can upload?

25MB on the free tier — that fits a ~5-minute sermon clip at typical 720p compression. For full-length sermon video (45 minutes is usually 200–500MB MP4), upgrade to Pro (500MB per file), or extract the audio to MP3 first with a free tool like Audacity or ffmpeg.

Does video quality affect transcription accuracy?

No. We discard the video entirely. What matters is the audio track inside the MP4. A 480p video with crisp lavalier audio transcribes more accurately than a 4K video with distant room mic audio. Mic placement is everything.

Can I get an SRT caption file from an MP4?

Yes — that's the most common request. Every MP4 transcription includes an .srt and .vtt file with timestamps synced to your original video. Upload the .srt to YouTube, Vimeo, or any video editor for auto-aligned captions.
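
Because each cue in an .srt file carries its own start time, it is easy to pull those times out and reuse them, for example as YouTube chapter markers. A minimal sketch, assuming standard SRT timestamps (HH:MM:SS,mmm); the sample cues are invented for illustration and are not real output from our engine.

```python
import re

# Matches the start timestamp at the beginning of an SRT timing line.
SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def srt_start_times(srt_text):
    """Yield the start time of each cue, in seconds."""
    for line in srt_text.splitlines():
        if "-->" in line:
            match = SRT_TIME.match(line.strip())
            if match:
                h, m, s, ms = map(int, match.groups())
                yield h * 3600 + m * 60 + s + ms / 1000

def chapter_stamp(seconds):
    """Format seconds as M:SS or H:MM:SS, the style used in video descriptions."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

# Hypothetical two-cue sample.
sample = """1
00:00:00,000 --> 00:00:04,200
Good morning, church.

2
00:03:42,500 --> 00:03:47,000
Here is the big idea.
"""

print([chapter_stamp(t) for t in srt_start_times(sample)])  # ['0:00', '3:42']
```

If you use the stamps for YouTube chapters, keep in mind that YouTube expects the first chapter to start at 0:00.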

Does it work for MOV, AVI, MKV, and WebM too?

Yes. The same workflow handles MOV (QuickTime), AVI, MKV, and WebM. ffmpeg extracts the audio from any common container. If a file fails to upload, convert it to MP4 in HandBrake (free) and try again.

How fast is MP4 to text transcription?

About one-tenth the audio length. A 10-minute MP4 transcribes in around 1 minute; a 45-minute sermon video in about 5 minutes. Upload time depends on file size — a 500MB MP4 takes 2–5 minutes to upload on home broadband, then processes in the background.
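
For a rough estimate of your own turnaround, you can combine the one-tenth rule with your upload speed. The 20 Mbps figure below is only an assumed example of a home broadband upload speed, and both function names are hypothetical.

```python
def processing_minutes(audio_minutes, speedup=10):
    """Approximate processing time, using the one-tenth rule of thumb."""
    return audio_minutes / speedup

def upload_minutes(size_mb, upload_mbps):
    """Transfer time for size_mb megabytes at upload_mbps megabits per second."""
    return size_mb * 8 / upload_mbps / 60

print(processing_minutes(45))             # 4.5
print(round(upload_minutes(500, 20), 1))  # 3.3
```

The 2–5 minute upload range quoted above corresponds to sustained upload speeds of roughly 13 to 33 Mbps for a 500MB file; slower connections will take proportionally longer.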

Drop your MP4. Get captions back in 5 minutes.

First 10 minutes free. .srt and .vtt files generated automatically. No editor required.

Upload MP4
