Transcribe Video to Text Online — Any Format
Upload an MP4, MOV, MKV, WebM, or AVI. We extract the audio, run it through the same 99.5%-accurate transcription engine that powers our sermon tool, and hand you back captions in SRT and VTT plus a clean text transcript. About 5 minutes for a 45-minute video.
- Processing: about 5 min per 45-min video
- Caption sync: .srt + .vtt
- Formats: MP4 · MOV · MKV · WebM
What this is
Video-to-text transcription extracts the speech from your video file and turns it into a written transcript plus time-coded caption files (.srt, .vtt). The video itself isn't transcribed, only the audio track, because speech-to-text models work on sound waves, not pixels.
This is the universal video page. For a deeper guide on the specific MP4 workflow see Transcribe MP4 to Text; for YouTube URLs, see YouTube Link to Transcript.
Step-by-step: video to text
1. Open /transcribe
   Use the link in the top nav. The same upload zone takes both audio and video, so there is no separate workflow.
2. Drag your video file in
   MP4, MOV, MKV, WebM, or AVI. Free tier accepts up to 25MB; Pro accepts 500MB. The browser shows upload progress live.
3. Server extracts the audio
   ffmpeg pulls the audio track out, downsamples to 16kHz mono, and discards the video. You're billed per audio minute, not video minute, so a 4K file and a 480p file with identical audio cost the same.
4. Pick Standard or Premium
   Standard ($0.006/min) is fine for single-camera sermon or talking-head video. Premium ($0.02/min) labels who said what; pick it for panel discussions, interviews, or roundtables.
5. Wait ~5 minutes for a 45-minute video
   Processing time scales with audio duration, not file size. A 45-minute 4K video and a 45-minute 480p video both finish in about 5 minutes.
6. Download .srt for YouTube, .vtt for the web, .txt for blog
   Upload the .srt to YouTube Studio for instant synced captions. Reference the .vtt from a <track> element inside your church website's <video> tag. Use the .txt for a sermon blog post or searchable archive entry.
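To make steps 4 and 5 concrete, here is a rough sketch of the arithmetic. The per-minute rates and the "about 5 minutes per 45 minutes" figure come from the steps above. The function name, the dictionary keys, and the assumption that processing time scales strictly linearly are illustrative, not a description of how billing is actually implemented. The free minutes are not modeled.

```python
# Rates quoted in step 4 (USD per audio minute).
RATES_PER_MIN = {"standard": 0.006, "premium": 0.02}

# Step 5 quotes about 5 minutes of processing per 45 minutes of audio.
# Treating that as a fixed linear ratio is an assumption.
PROCESSING_RATIO = 5 / 45


def estimate(audio_minutes: float, tier: str = "standard") -> dict:
    """Return a rough cost (USD) and processing time (minutes) for one file."""
    if tier not in RATES_PER_MIN:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "cost_usd": round(audio_minutes * RATES_PER_MIN[tier], 2),
        "processing_min": round(audio_minutes * PROCESSING_RATIO, 1),
    }


if __name__ == "__main__":
    print(estimate(45, "standard"))
    print(estimate(45, "premium"))
```

Because both numbers depend only on audio minutes, the video's resolution and file size never enter the calculation.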
Video format compatibility
| Format | Free tier max | Pro max | Common sources |
|---|---|---|---|
| MP4 (H.264 + AAC) | 25 MB | 500 MB | YouTube Studio, OBS, Premiere |
| MOV (QuickTime) | 25 MB | 500 MB | iPhone, Mac, Final Cut Pro |
| MKV (Matroska) | 25 MB | 500 MB | OBS Studio, Plex |
| WebM | 25 MB | 500 MB | YouTube downloads, browser recordings |
| AVI | 25 MB | 500 MB | Older Windows video editors |
| FLV / WMV | 25 MB | 500 MB | Legacy formats — convert if errors |
| 3GP | 25 MB | 500 MB | Old phones — convert to MP4 first |
Video transcription tips
- Video over 25MB on free tier? Extract audio to MP3 first. ffmpeg:
  ffmpeg -i video.mp4 -vn -ac 1 -b:a 64k audio.mp3
  That gets a 45-min sermon under 25MB (about 21.6MB at 64 kbps). Upload the .mp3 instead.
- No command line? Use VLC (free GUI): Media → Convert/Save → choose the "Audio - MP3" profile. HandBrake is built around video output, so it is a poor fit for audio-only export.
- Want YouTube chapters? Use the .srt timestamps to mark natural section breaks in the sermon, then paste them into your YouTube description (one per line, e.g., "3:42 The big idea"). The list must start at 0:00 and contain at least three timestamps in ascending order; YouTube then auto-detects it and creates chapter markers.
- Burned-in vs sidecar captions? Most platforms (YouTube, Vimeo, Squarespace, WordPress with the right plugin) take sidecar .srt or .vtt files and overlay captions live. Only burn captions in (with HandBrake or DaVinci Resolve) if you're distributing the raw .mp4 standalone.
- Want to clip a section? Trim the video first in iMovie, CapCut, or Premiere to just the section you want transcribed. You pay only for the audio minutes you actually upload — trim aggressively for a tighter, cheaper transcript.
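The YouTube chapters tip above can be sketched in a few lines. The snippet assumes you have already picked the section breaks by hand from the .srt file. The function names and the sample titles are illustrative only. It simply reformats SRT timestamps (HH:MM:SS,mmm) into the short form used in a YouTube description, and the sample starts at 0:00 because YouTube's chapter rules expect that.

```python
def srt_time_to_chapter(ts: str) -> str:
    """Convert an SRT timestamp like '00:03:42,500' to '3:42'."""
    clock = ts.split(",")[0]  # drop the milliseconds
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def chapter_lines(breaks) -> str:
    """breaks: list of (srt_timestamp, title) pairs chosen by hand."""
    return "\n".join(f"{srt_time_to_chapter(ts)} {title}" for ts, title in breaks)


if __name__ == "__main__":
    # Hypothetical section breaks for a sermon video.
    breaks = [
        ("00:00:00,000", "Welcome"),
        ("00:03:42,120", "The big idea"),
        ("00:31:05,900", "Closing prayer"),
    ]
    print(chapter_lines(breaks))
```

Paste the printed lines into the video description, one chapter per line.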
The video-to-text workflow
Video transcription pricing vs alternatives
| Service | Cost / 45-min video | Accuracy | Caption files | Free tier |
|---|---|---|---|---|
| Sermon Transcription (Std) | $0.27 | 99.0–99.5% | .srt .vtt .txt .docx | 10 min free |
| Sermon Transcription (Premium) | $0.90 | 99.5%+ with diarization | .srt .vtt .txt .docx | 10 min free |
| YouTube auto-captions | Free | 75–88% | Built-in only | Unlimited |
| Rev AI | $11.25 | 90–95% | .srt .vtt | 5 hours free |
| Rev human captioning | $67.50 | 99%+ | .srt .vtt + burned-in | None |
| HappyScribe AI | ~$9.00 | 85–92% | .srt .vtt | None |
Pricing as of early 2026. Rev AI $0.25/min; Rev human $1.50/min. HappyScribe AI ~$0.20/min. YouTube auto-captions are free but limited to YouTube videos and rarely meet WCAG accessibility standards.
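The "Cost / 45-min video" column follows directly from the per-minute rates quoted on this page. A small sketch, with illustrative names, that recomputes it for any video length (YouTube auto-captions are omitted because they are free):

```python
# Per-minute rates (USD) as quoted on this page.
PER_MINUTE_USD = {
    "Sermon Transcription (Std)": 0.006,
    "Sermon Transcription (Premium)": 0.02,
    "Rev AI": 0.25,
    "Rev human captioning": 1.50,
    "HappyScribe AI": 0.20,
}


def cost_table(minutes: float = 45) -> dict:
    """Return {service: cost in USD} for a video of the given length."""
    return {name: round(minutes * rate, 2) for name, rate in PER_MINUTE_USD.items()}


if __name__ == "__main__":
    for name, cost in cost_table(45).items():
        print(f"{name:32s} ${cost:.2f}")
```

Changing the `minutes` argument shows how the gap widens for longer recordings, since every service here is priced per minute.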
Video to text FAQ
Which video formats can I transcribe?
MP4 (H.264/H.265), MOV (QuickTime), MKV (Matroska), WebM, AVI, and FLV all upload natively. We strip the audio track using ffmpeg on our end and transcribe only the audio. If a format isn't accepted, run it through HandBrake (free) to MP4 first.
Do I need to extract audio before uploading?
No — we do that step server-side. You upload the video, and the audio track is extracted, downsampled to 16kHz mono, and fed to the transcription engine. The video is dropped (no video minutes billed; only audio minutes).
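For readers curious what that server-side step might look like, here is a minimal sketch. The exact flags the service uses are not published on this page, so this is an assumption built from standard ffmpeg options (`-vn`, `-ac`, `-ar`) that match the description: drop the video, keep one audio channel, resample to 16 kHz. The function names and file names are hypothetical.

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that keeps only 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",           # discard the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run the extraction; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ffmpeg.
    print(" ".join(build_extract_command("sermon.mp4", "sermon.wav")))
```

You do not need to run this yourself; it only illustrates why video resolution never affects billing or accuracy.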
What's the maximum video file size?
25MB on the free tier, which is about 5 minutes of 720p MP4. A typical 45-minute sermon video at 720p is 200–500MB. To transcribe full-length video on the free tier, extract the audio to MP3 first (see the tips above). Or upgrade to Pro for 500MB per file.
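To check whether an extracted MP3 will fit under the free-tier limit, multiply bitrate by duration. The sketch below assumes a constant bitrate, decimal megabytes, and no container overhead, so treat the result as an approximation. The function names are illustrative.

```python
def mp3_size_mb(duration_min: float, bitrate_kbps: int = 64) -> float:
    """Approximate MP3 size in MB for a constant-bitrate encode."""
    bits = bitrate_kbps * 1000 * duration_min * 60
    return round(bits / 8 / 1_000_000, 1)


def fits_free_tier(duration_min: float, bitrate_kbps: int = 64, limit_mb: float = 25) -> bool:
    """True if the estimated file size is within the upload limit."""
    return mp3_size_mb(duration_min, bitrate_kbps) <= limit_mb


if __name__ == "__main__":
    print(mp3_size_mb(45))          # 64 kbps mono, 45 minutes
    print(fits_free_tier(45))       # fits under 25MB
    print(fits_free_tier(45, 128))  # 128 kbps would not fit
```

At 64 kbps a 45-minute sermon comes to roughly 21.6MB, which is why that bitrate is suggested for the free tier.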
Will I get .srt and .vtt caption files back?
Yes. Every video transcription produces an SRT file (for YouTube, Vimeo, video editors) and a WebVTT file (for HTML5 <video> on the web). You also get plain text (.txt) and a Word doc (.docx) with timestamps every 30 seconds.
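You get both files from the service, so no conversion is needed. Still, it can help to see how close the two formats are. The sketch below is an illustration only, not the service's own converter: WebVTT adds a `WEBVTT` header and uses a period instead of a comma before the milliseconds. The sample cues are made up.

```python
import re

# Matches the HH:MM:SS,mmm form used on SRT timing lines.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT."""
    out = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; caption text is left untouched.
            line = TIMING.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n"
        "00:00:00,000 --> 00:00:04,200\n"
        "Good morning, church.\n"
        "\n"
        "2\n"
        "00:00:04,200 --> 00:00:09,000\n"
        "Turn with me to John 3:16.\n"
    )
    print(srt_to_vtt(sample))
```

The cue numbers from the SRT file are kept, since WebVTT accepts them as optional cue identifiers.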
How does video accuracy compare to audio?
Identical — because we transcribe only the audio track, video resolution has zero effect on accuracy. A 480p webcam recording with a good lavalier mic transcribes more accurately than a 4K cinema rig with a distant shotgun mic. Mic placement and signal quality are what matter.
Can I add the captions to my YouTube video?
Yes. Download the .srt, go to YouTube Studio → your video → Subtitles → Add language → English → Upload file → Select with timing → choose the .srt. Captions appear within minutes and remain editable in YouTube's caption editor.
Drop your video. Get captions back.
First 10 minutes free. SRT and VTT generated automatically. Drop into YouTube Studio for instant captions.
Upload video
Related
Alternative
Sermon Transcription vs Rev.com
Same SRT output, $0.27 vs $11.25 per 45-minute video.
Alternative
Sermon Transcription vs SermonShots
Transcript-first vs clip-first.
Guide
Sermon Transcription with Timestamps
Why SRT matters for accessibility and SEO.
Strategy
Repurposing Sermon Transcripts: 15 Ideas
From one video to fifteen content pieces.