How to transcribe a YouTube sermon — the right way

YouTube's auto-captions miss scripture references, theological terms, and speaker turns. Here's a 15-minute workflow to download a YouTube sermon's audio, transcribe it with Whisper, and end up with a clean, publishable transcript.

Why YouTube auto-captions aren't enough

Auto-captions miss vocabulary

Words like 'propitiation', 'eschatology', 'pneumatology', or 'Septuagint' rarely come through correctly. YouTube's captioning isn't tuned for theological vocabulary.

No paragraph or punctuation structure

YouTube outputs a single wall of caption text with rough punctuation. For a blog post or research notes, you need clean paragraphs — which is what Whisper produces.

No clean export

You can copy captions from the transcript panel, but you get raw text with no SRT, VTT, or JSON export. Sermon Transcription gives you all of these formats with one click.

The complete YouTube-to-transcript workflow

Total time: 15 minutes for a 45-minute sermon. Total cost: $0.27.

1. Find your YouTube sermon URL

Copy the full URL from the address bar: youtube.com/watch?v=XXXXX or youtu.be/XXXXX. For your own channel, you can also download the video directly from YouTube Studio (Content → ... menu → Download).
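
If you keep a list of sermon links, it can help to normalize them before downloading. The sketch below pulls the video ID out of the two URL forms mentioned above. The function name `youtube_video_id` is hypothetical, and the sketch only handles these two forms; other URL shapes (Shorts, embeds, playlists) are deliberately out of scope.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if not recognized."""
    # Accept URLs pasted without a scheme, e.g. "youtu.be/XXXXX".
    if "://" not in url:
        url = "https://" + url
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short form: youtu.be/<id>
    if host == "youtu.be":
        video_id = parts.path.lstrip("/").split("/")[0]
        return video_id or None

    # Long form: youtube.com/watch?v=<id>
    if host in ("youtube.com", "m.youtube.com") and parts.path == "/watch":
        values = parse_qs(parts.query).get("v")
        return values[0] if values else None

    return None


print(youtube_video_id("https://www.youtube.com/watch?v=XXXXX"))  # XXXXX
print(youtube_video_id("youtu.be/XXXXX"))                         # XXXXX
```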

2. Download as audio (3 options)

Option A (recommended): yt-dlp. Install with brew install yt-dlp or pip install yt-dlp (audio extraction also requires ffmpeg). Then run: yt-dlp -x --audio-format mp3 --audio-quality 5 URL. Quality 5 is a variable-bitrate setting, so file sizes vary; to pin the bitrate, pass a value such as --audio-quality 96K instead. At 96kbps, a file stays under our 25MB cap for sermons up to ~35 min.
Option B: Browser extension. Free Video Downloader (Firefox) or a similar extension grabs the file while you stream. Look for the MP4 option, then convert to MP3 with Audacity.
Option C: Screen record and extract audio. Record the video playback with QuickTime (Mac), Game Bar (Windows), or OBS, then export the audio with ffmpeg or Audacity.
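
For Option A, a small Python wrapper can assemble the yt-dlp command so the same settings are used every week. This is a minimal sketch, not part of the site's tooling: `build_ytdlp_command` is a hypothetical helper, and it assumes yt-dlp's documented behavior that `--audio-quality` accepts a fixed bitrate such as `96K` in addition to the 0-10 quality scale. It only builds and prints the command; it does not run it.

```python
import shlex
from typing import List


def build_ytdlp_command(url: str, bitrate_kbps: int = 96) -> List[str]:
    """Build the argument list for an audio-only MP3 download at a fixed bitrate."""
    return [
        "yt-dlp",
        "-x",                                   # extract audio only
        "--audio-format", "mp3",                # convert to MP3 (needs ffmpeg)
        "--audio-quality", f"{bitrate_kbps}K",  # fixed bitrate instead of VBR level 5
        url,
    ]


command = build_ytdlp_command("https://youtu.be/XXXXX", bitrate_kbps=64)
# shlex.join quotes the URL so the printed line is safe to paste into a shell.
print(shlex.join(command))
```

To actually run the download from Python, the list can be passed to `subprocess.run(command, check=True)`, provided yt-dlp and ffmpeg are installed and on your PATH.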

3. Verify file size (under 25MB for free tier)

Check the resulting file size. If it's over 25MB, re-export at a lower bitrate. Quick ffmpeg recipe: ffmpeg -i input.mp3 -b:a 64k -ac 1 output.mp3. This produces a 64kbps mono file (about 21.6MB for a 45-minute sermon), which is sufficient for accurate transcription.
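
The size of a constant-bitrate MP3 is just bitrate times duration, so you can predict whether a sermon will fit before exporting. The sketch below assumes "25MB" means 25 million bytes (an assumption; the site may count in MiB) and ignores the small overhead of MP3 headers and tags. The function names are hypothetical.

```python
from typing import Optional, Sequence


def estimated_size_mb(bitrate_kbps: int, minutes: float) -> float:
    """Approximate size of a constant-bitrate file in decimal megabytes."""
    bytes_total = bitrate_kbps * 1000 / 8 * minutes * 60
    return bytes_total / 1_000_000


def highest_bitrate_under_cap(
    minutes: float,
    cap_mb: float = 25.0,
    candidates: Sequence[int] = (96, 64, 48, 32),
) -> Optional[int]:
    """Return the highest candidate bitrate whose estimated size fits the cap."""
    for kbps in sorted(candidates, reverse=True):
        if estimated_size_mb(kbps, minutes) <= cap_mb:
            return kbps
    return None  # nothing fits; split the file or use a larger plan


print(estimated_size_mb(96, 45))      # 32.4 (too large for the free tier)
print(estimated_size_mb(64, 45))      # 21.6
print(highest_bitrate_under_cap(45))  # 64
print(highest_bitrate_under_cap(60))  # 48
```

The chosen bitrate can then be used as the `-b:a` value in the ffmpeg recipe above (for example `-b:a 48k`).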

4. Upload to /transcribe

Drag and drop the MP3. Whisper auto-detects the language. The Standard tier costs $0.006/min, so a 45-minute sermon costs $0.27. The Premium tier at $0.02/min adds speaker ID, which is useful if the video has multiple speakers (panel, interview, Q&A).
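
The per-minute pricing above is easy to check for other sermon lengths. This sketch uses the two rates quoted on this page and `Decimal` to avoid floating-point rounding in currency; it assumes billing is linear per minute and does not model the free first 10 minutes. `sermon_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as quoted on this page, in dollars per minute.
RATES_PER_MINUTE = {
    "standard": Decimal("0.006"),
    "premium": Decimal("0.02"),
}


def sermon_cost(minutes: int, tier: str = "standard") -> Decimal:
    """Return the cost in dollars, rounded to the nearest cent."""
    cost = RATES_PER_MINUTE[tier] * Decimal(minutes)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(sermon_cost(45))             # 0.27
print(sermon_cost(45, "premium"))  # 0.90
print(sermon_cost(30))             # 0.18
```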

5. Get a clean, paragraph-structured transcript

Within 3-5 minutes you'll have TXT, SRT, VTT, and JSON exports. The TXT version has proper sentence punctuation, paragraph breaks, and accurate scripture references — far cleaner than YouTube's auto-captions.

6. Re-upload SRT to YouTube as captions (if it's your channel)

In YouTube Studio: Subtitles → Add language → English → Upload file → With timing (the SRT already contains timestamps; Without timing is for plain-text transcripts). Publish. Your video now has accurate captions, which can improve YouTube ranking and viewer retention.
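
If you edit the SRT by hand before uploading, a quick sanity check can catch broken timing lines. The sketch below assumes the standard SubRip layout (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with a blank line between cues). It is a rough check under those assumptions, not a full validator, and the sample cues are made up for illustration.

```python
import re
from typing import List

TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the cues look sane."""
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        return ["no cues found"]
    problems = []
    previous_end = 0
    for number, block in enumerate(blocks, start=1):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            problems.append(f"cue {number}: expected index, timing, and text lines")
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"cue {number}: bad timing line {lines[1]!r}")
            continue
        parts = match.groups()
        start, end = _to_ms(*parts[:4]), _to_ms(*parts[4:])
        if end <= start:
            problems.append(f"cue {number}: end time is not after start time")
        if start < previous_end:
            problems.append(f"cue {number}: overlaps the previous cue")
        previous_end = end
    return problems


# Made-up sample cues, for illustration only.
sample = (
    "1\n00:00:00,000 --> 00:00:04,200\nGood morning, church.\n\n"
    "2\n00:00:04,200 --> 00:00:09,000\nTurn with me to Romans 3.\n"
)
print(check_srt(sample))  # []
```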

Cost comparison: YouTube workflow vs. alternatives

For one 45-minute YouTube sermon, here's what you'd pay across the popular options.

Method | Cost / sermon | Turnaround | Output quality
YouTube auto-captions | Free | Minutes after upload | ~75-85% accuracy, no formatting
Sermon Transcription (this site) | $0.27 | 5 minutes | 95-98% accuracy, paragraph-clean
Otter.ai (Business) | $20-30/mo subscription | Real-time | ~90% accuracy, monthly minute cap
Rev.com (AI) | $11.25 (at $0.25/min) | Same day | ~95% accuracy
Rev.com (Human) | $67.50 (at $1.50/min) | 12-24 hours | 99%+ accuracy

For sermon use cases, where 95-98% machine accuracy plus 10 minutes of editing gets you a result functionally equivalent to human transcription, Whisper-based transcription is the most cost-effective option in this comparison.

The YouTube-to-transcript pipeline

Three stages (download, transcribe, review), 15 minutes, $0.27.

[Diagram: YouTube sermon video → yt-dlp → MP3 file (96 kbps mono, ~20 MB) → /transcribe → transcript (TXT, SRT, VTT, JSON). About 15 minutes total: 3 min download, 5 min transcribe, 7 min review. $0.27 per 45-minute sermon; first 10 minutes free.]

Frequently asked questions

Can Sermon Transcription pull a YouTube URL directly?
Not currently. YouTube's terms of service restrict third-party downloads, so we don't operate a URL-paste endpoint. Instead, you download the audio file yourself (the same way you might download for offline listening), then upload that file to /transcribe. The workflow is documented step-by-step on this page.
Are YouTube's auto-generated captions good enough?
For most sermons, no. YouTube's auto captions tend to be 70-85% accurate on conversational speech, but they struggle with theological vocabulary, scripture references, and the cadence of preaching. They also lack speaker labels, punctuation in long sentences, and clean paragraph structure. For a publishable transcript, you'll want a Whisper-based transcription.
What's the cheapest way to download YouTube sermon audio?
Three good options. (1) yt-dlp, a free open-source command-line tool that can download a YouTube video as audio-only MP3. (2) A browser extension like Free Video Downloader, which grabs the file as it streams (verify legality in your region). (3) Screen-recording the playback in QuickTime or OBS and exporting the audio. yt-dlp is the fastest and cleanest if you're comfortable with a terminal.
What's the file size limit on Sermon Transcription?
25MB on the free tier. Whether a sermon fits depends on its length and bitrate. At 96kbps the cap is reached at about 35 minutes; at 64kbps mono, at about 52 minutes. A 60-minute sermon at 64kbps mono is about 29MB, which is over the limit, so re-export at 48kbps (about 22MB) or use a paid plan for larger uploads.
Is it legal to transcribe a sermon from someone else's YouTube channel?
For personal study, generally yes. To republish the transcript publicly, you need the rights holder's permission, usually the church or pastor. If you're transcribing your own church's YouTube uploads, your church typically holds those rights. Transcribing a well-known pastor for personal Bible study or seminary research is often treated as fair use, but this is not legal advice; consult your school's research office.
Can I use this to caption MY church's YouTube sermons?
Yes, this is the cleanest use case. Export your sermon audio from your recording software, transcribe it with us, download the SRT file, and upload it to YouTube as a caption track. Total time: under 15 minutes per sermon. Captions improve accessibility and can help your videos surface in YouTube search.
How accurate is transcription on YouTube-quality audio?
Generally 95-98% on clean church livestream audio (mixer board → recording). It drops to 90-94% on audience-captured handheld video with room reverb and ambient noise. Whisper holds up well on noisy audio: even at 90% accuracy, a 30-minute sermon costs about $0.18, and 5 minutes of cleanup gets you to a publishable transcript.
Can I transcribe a long playlist of sermons in one batch?
Yes, but you'll need to download each video as a separate audio file. yt-dlp supports playlist URLs natively: 'yt-dlp -x --audio-format mp3 PLAYLIST_URL' downloads every video in the playlist as MP3. Then upload each file to /transcribe individually (or batch via the API for higher volumes).
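
After a playlist download, it helps to sort the MP3s into those that already fit the free-tier cap and those that need re-encoding first. A minimal sketch, assuming the files sit in one folder and that 25MB means 25 million bytes (an assumption); `split_by_size` is a hypothetical helper and makes no calls to the site's API.

```python
from pathlib import Path
from typing import List, Tuple

CAP_BYTES = 25 * 1_000_000  # assumed meaning of the 25MB free-tier cap


def split_by_size(folder: str, cap_bytes: int = CAP_BYTES) -> Tuple[List[Path], List[Path]]:
    """Return (files that fit the cap, files that are too large), sorted by name."""
    ready: List[Path] = []
    too_large: List[Path] = []
    for path in sorted(Path(folder).glob("*.mp3")):
        if path.stat().st_size <= cap_bytes:
            ready.append(path)
        else:
            too_large.append(path)
    return ready, too_large


if __name__ == "__main__":
    # "." is a placeholder; point this at your download folder.
    ready, too_large = split_by_size(".")
    for path in ready:
        print(f"ready to upload: {path.name}")
    for path in too_large:
        print(f"re-encode first: {path.name}")
```

Files in the second list can be re-exported at a lower bitrate with ffmpeg before uploading.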

Turn any YouTube sermon into clean text

Download, upload, get a transcript in 15 minutes. First 10 minutes free.
