Transcribe MP4 Video to Text — Free + Accurate
Upload your MP4. We strip the audio track, run it through the same Whisper/ElevenLabs engine that powers our sermon tool, and hand you back a transcript plus matched .srt captions. About 5 minutes for a 45-minute sermon video.
Processing
5 min / 45-min MP4
Caption sync
.srt + .vtt included
Privacy
Video deleted after 30d
What is an MP4 file?
MP4 (MPEG-4 Part 14) is a container format — it holds a compressed video stream (typically H.264 or H.265), one or more audio streams (typically AAC), and metadata in a single .mp4 file. It's the standard output of nearly every phone camera, livestream encoder, and video editor on the market.
For transcription, only the audio track matters. We use ffmpeg to extract it at upload time, then discard the video and run the audio through speech-to-text. You're billed per audio minute — a 45-minute sermon video costs the same $0.27 (Standard) or $0.90 (Premium) as a 45-minute MP3.
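The per-minute billing described above reduces to a single multiplication. The sketch below is illustrative only: the rates are the ones quoted on this page, while the names `RATES_PER_MIN` and `transcription_cost`, and the choice to round to the nearest cent, are assumptions rather than a documented API or billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates quoted on this page (USD). The dictionary name is illustrative.
RATES_PER_MIN = {"standard": Decimal("0.006"), "premium": Decimal("0.02")}

def transcription_cost(audio_minutes, tier="standard"):
    """Return the cost in USD for a given audio length, rounded to cents.

    Rounding to the nearest cent is an assumption, not a documented rule.
    """
    rate = RATES_PER_MIN[tier]
    cost = Decimal(str(audio_minutes)) * rate
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(transcription_cost(45))             # 0.27
print(transcription_cost(45, "premium"))  # 0.90
```

Because only the audio length enters the calculation, a 45-minute MP4 and a 45-minute MP3 produce the same figure.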
Step-by-step: MP4 video to text
1. Open the transcription tool
Hit /transcribe in the top nav. The same upload zone handles both audio and video, so no special toggle is needed.
2. Drag your .mp4 file into the upload zone
Drop straight from your video editor's export folder, Dropbox, or local disk. Free tier supports up to 25MB. For full-length sermon video (~200–500MB), upgrade to Pro or extract audio first.
3. We extract the audio track automatically
On upload, our backend runs the equivalent of: ffmpeg -i sermon.mp4 -vn -ac 1 -ar 16000 -c:a libmp3lame audio.mp3. The video stream is dropped. You don't pay for video minutes, only audio minutes.
4. Pick Standard or Premium tier
Standard ($0.006/min) is fine for a single-camera sermon with a head-worn mic. Premium ($0.02/min) is better for panel discussions, multi-camera interviews, or anytime you want speaker labels in the output.
5. Wait about a tenth of the runtime
A 10-minute clip transcribes in roughly 1 minute, a 45-minute service in about 5. You can close the tab: results are saved to your dashboard and emailed when ready.
6. Download .srt or .vtt for captions, .txt for blog
Drop the .srt into YouTube Studio (Subtitles → Add → Upload file) for instant captions. Add the .vtt to your church website's player as a <track> element inside the <video> tag. Use the .txt for a sermon-page blog post.
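The extraction command quoted in the steps above can also be run locally from a script. The following is a minimal sketch that assumes ffmpeg is installed and on the PATH. The helper names `build_extract_command` and `extract_audio` are hypothetical, and defaulting the output to the input's file stem is a convenience added here, not something the service is known to do.

```python
import subprocess
from pathlib import Path

def build_extract_command(video_path, audio_path=None):
    """Build the ffmpeg argument list for a mono, 16 kHz MP3 extraction."""
    video = Path(video_path)
    # Assumption: default to the same file stem with an .mp3 extension.
    audio = Path(audio_path) if audio_path else video.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(video),        # input container (MP4, MOV, MKV, ...)
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        "-c:a", "libmp3lame",    # encode the audio as MP3
        str(audio),
    ]

def extract_audio(video_path, audio_path=None):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    cmd = build_extract_command(video_path, audio_path)
    subprocess.run(cmd, check=True)
    return cmd[-1]

print(build_extract_command("sermon.mp4", "audio.mp3"))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems with file names that contain spaces.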
Video format & size compatibility
| Format | Free tier max | Pro max | Notes |
|---|---|---|---|
| MP4 (H.264 + AAC) | 25 MB | 500 MB | Most common — works natively |
| MOV (QuickTime) | 25 MB | 500 MB | iPhone/Mac native — works |
| MKV (Matroska) | 25 MB | 500 MB | If the upload fails, convert to MP4 in HandBrake first |
| WebM | 25 MB | 500 MB | Common format for YouTube downloads |
| AVI | 25 MB | 500 MB | Legacy format; works, but convert to MP4 if you hit errors |
| MP3 / WAV / M4A (audio only) | 25 MB | 500 MB | Skip the video step entirely |
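The limits in the table can be checked before uploading. This is a rough sketch under stated assumptions: the extension list and size caps are copied from the table, the function name `check_upload` is hypothetical, and treating "MB" as decimal megabytes (1,000,000 bytes) is a guess, since the page does not say which definition applies.

```python
from pathlib import Path

# Per-file upload limits from the table above, in megabytes.
TIER_LIMIT_MB = {"free": 25, "pro": 500}

# Extensions listed in the table (lowercase, with leading dot).
SUPPORTED = {".mp4", ".mov", ".mkv", ".webm", ".avi", ".mp3", ".wav", ".m4a"}

def check_upload(filename, size_bytes, tier="free"):
    """Return (ok, reason) for a candidate upload."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED:
        return False, f"unsupported extension {ext or '(none)'}"
    size_mb = size_bytes / 1_000_000  # assumption: decimal megabytes
    limit = TIER_LIMIT_MB[tier]
    if size_mb > limit:
        return False, f"{size_mb:.0f} MB exceeds the {limit} MB {tier} limit"
    return True, "ok"

print(check_upload("sermon.mp4", 300_000_000))
print(check_upload("sermon.mp4", 300_000_000, tier="pro"))
```

A failed check on the free tier is the cue to either upgrade or extract the audio track first.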
MP4-specific tips
- Full-length sermon video over 25MB? Extract the audio first. In ffmpeg (free, command-line):
ffmpeg -i sermon.mp4 -vn -ac 1 -b:a 64k sermon.mp3
That gives you a roughly 22MB MP3 of a 45-minute sermon, which fits under the 25MB free tier limit.
- No command line? Use VLC (free, GUI): Media → Convert/Save, add the MP4, and choose the "Audio - MP3" profile. HandBrake is a video transcoder with no audio-only export, so use VLC or ffmpeg for this step.
- Recording sermon video for transcription? Use a wireless lav mic on the speaker, going into the camera's 3.5mm input or a separate field recorder. Room mics pick up too much ambient noise. The video can be any resolution; the audio is what matters.
- Want YouTube chapters? The .srt file includes timestamps. Use them to find natural section breaks in the sermon and paste them into your YouTube description (e.g., "3:42 The Big Idea - Trusting in suffering"). YouTube only creates chapters when the list starts at 0:00 and contains at least three timestamps.
- Multi-camera service with overlapping audio? Use only the camera that has the main mic feed. Don't upload a multi-cam edit before locking the audio bed — the alternate angle audio will confuse diarization.
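The MP3 size quoted in the first tip follows from bitrate multiplied by duration. A minimal sketch of that arithmetic, assuming a constant-bitrate encode and ignoring tag and frame overhead; the function name is illustrative:

```python
def estimated_mp3_size_mb(duration_minutes, bitrate_kbps=64):
    """Approximate size of a constant-bitrate MP3 in decimal megabytes."""
    bits = bitrate_kbps * 1000 * duration_minutes * 60
    return bits / 8 / 1_000_000

print(estimated_mp3_size_mb(45))       # 21.6
print(estimated_mp3_size_mb(45, 128))  # 43.2
```

At 64 kbps a 45-minute sermon fits under the 25MB free tier limit, while the same recording at 128 kbps does not.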
MP4 transcription pricing vs alternatives
| Service | Cost / 45-min video | Accuracy | Caption files | Free tier |
|---|---|---|---|---|
| Sermon Transcription (Std) | $0.27 | 99.0–99.5% | .srt .vtt .txt .docx | 10 min free |
| Sermon Transcription (Premium) | $0.90 | 99.5%+ with diarization | .srt .vtt .txt .docx | 10 min free |
| Rev AI | $11.25 | 90–95% | .srt .vtt + JSON | 5 hours free |
| Rev human | $67.50 | 99%+ | .srt + Word | None |
| Otter Pro | ~$0.64 effective | 90–95% | .srt .vtt | 300 min / mo |
| HappyScribe AI | ~$9.00 | 85–92% | .srt .vtt | None |
Pricing as of early 2026. Rev AI $0.25/min; Rev human $1.50/min. Otter Pro $16.99/mo ÷ 1,200 included min × 45 min. HappyScribe AI ~$0.20/min.
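The per-video figures in the comparison table can be reproduced from the per-minute rates in the footnote. The sketch below simply restates that arithmetic; the rates are the ones quoted above, and the function names are illustrative.

```python
# Per-minute rates from the footnote above (USD).
PER_MINUTE = {
    "Rev AI": 0.25,
    "Rev human": 1.50,
    "HappyScribe AI": 0.20,
}

def cost_for(minutes, rate_per_min):
    """Pay-per-minute cost, rounded to cents."""
    return round(minutes * rate_per_min, 2)

def otter_effective_cost(minutes, monthly_fee=16.99, included_minutes=1200):
    """Monthly fee spread evenly over the included minutes."""
    return round(monthly_fee / included_minutes * minutes, 2)

for name, rate in PER_MINUTE.items():
    print(f"{name}: ${cost_for(45, rate):.2f}")
print(f"Otter Pro: ${otter_effective_cost(45):.2f}")
```

The Otter figure is an effective rate that only holds if all 1,200 included minutes are used in the month; lighter usage makes each video cost more.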
MP4 transcription FAQ
Can I upload MP4 video directly, or do I need to extract the audio first?
Upload the MP4 directly. We extract the audio track on our end (ffmpeg under the hood), drop the video, and run only the audio through the transcription engine. You don't pay for video processing — only for the audio length.
What is the maximum MP4 file size I can upload?
25MB on the free tier — that fits a ~5-minute sermon clip at typical 720p compression. For full-length sermon video (45 minutes is usually 200–500MB MP4), upgrade to Pro (500MB per file), or extract the audio to MP3 first with a free tool like Audacity or ffmpeg.
Does video quality affect transcription accuracy?
No. We discard the video entirely. What matters is the audio track inside the MP4. A 480p video with crisp lavalier audio transcribes more accurately than a 4K video with distant room mic audio. Mic placement is everything.
Can I get an SRT caption file from an MP4?
Yes — that's the most common request. Every MP4 transcription includes an .srt and .vtt file with timestamps synced to your original video. Upload the .srt to YouTube, Vimeo, or any video editor for auto-aligned captions.
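Each cue in an .srt file carries a start and end time in HH:MM:SS,mmm form. As a rough sketch of how those timestamps could be reused elsewhere (for example, as chapter markers in a video description), the hypothetical helper below converts a cue's timing line into a short M:SS or H:MM:SS stamp. The function name and regular expression are illustrative and not part of the service's output.

```python
import re

# SRT cue timing line, e.g. "00:03:42,100 --> 00:03:45,900"
CUE_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->")

def srt_start_to_chapter(timing_line):
    """Convert an SRT cue start time to an H:MM:SS or M:SS stamp."""
    match = CUE_TIME.match(timing_line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    hours, minutes, seconds, _millis = (int(g) for g in match.groups())
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"

print(srt_start_to_chapter("00:03:42,100 --> 00:03:45,900"))  # 3:42
print(srt_start_to_chapter("01:02:05,000 --> 01:02:09,250"))  # 1:02:05
```

Milliseconds are dropped rather than rounded, so the stamp never points past the start of the cue.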
Does it work for MOV, AVI, MKV, and WebM too?
Yes. The same workflow handles MOV (QuickTime), AVI, MKV, and WebM. ffmpeg extracts the audio from any common container. If a file fails to upload, convert it to MP4 in HandBrake (free) and try again.
How fast is MP4 to text transcription?
About one-tenth the audio length. A 10-minute MP4 transcribes in around 1 minute; a 45-minute sermon video in about 5 minutes. Upload time depends on file size — a 500MB MP4 takes 2–5 minutes to upload on home broadband, then processes in the background.
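Total turnaround is upload time plus processing time. The sketch below estimates both under simple assumptions: processing takes one-tenth of the audio length, as stated above, and the upload runs at a steady rate with no protocol overhead. The 20 Mbps upload speed is an illustrative value, not a measurement.

```python
def processing_seconds(audio_minutes, speed_factor=0.1):
    """Transcription time, assuming roughly one-tenth of the audio length."""
    return audio_minutes * 60 * speed_factor

def upload_seconds(file_size_mb, upload_mbps):
    """Transfer time for a file, ignoring protocol overhead."""
    return file_size_mb * 8 / upload_mbps

print(processing_seconds(45) / 60)   # 4.5 (minutes)
print(upload_seconds(500, 20) / 60)  # a little over 3 minutes at an assumed 20 Mbps
```

For large video files the upload can take as long as the transcription itself, which is another reason to extract the audio first on slow connections.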
Drop your MP4. Get captions back in 5 minutes.
First 10 minutes free. .srt and .vtt files generated automatically. No editor required.
Upload MP4
Related
Alternative
Sermon Transcription vs HappyScribe
Same captions, lower price.
Alternative
Sermon Transcription vs TurboScribe
Church-aware vocabulary, no subscription.
Guide
Sermon Transcription with Timestamps
Why SRT files matter for sermon video.
Roundup
Best Church Media Tools 2026
Where MP4 transcription fits in the workflow.