Audio to Text · MP3 · WAV · M4A · FLAC · OGG · 10 min free

Audio to Text Converter — Free Online Transcription

Drag in any audio file (MP3, WAV, M4A, FLAC, OGG) and get back searchable text with timestamps, at up to 99.5% AI accuracy on sermons, interviews, podcasts, lectures, and meetings. The first 10 minutes are free; after that, Standard transcription is $0.006/min and Premium is $0.02/min.

Processing: ~5 min for a 45-min file

Accuracy: 99.0–99.5%

Formats: MP3 · WAV · M4A · FLAC · OGG

What this is

Audio-to-text conversion (also called speech-to-text or transcription) uses a neural network to turn recorded speech into written words. Modern AI models like OpenAI Whisper achieve near-human accuracy on clear voice recordings at a tiny fraction of the cost of professional human transcription.

This page is the universal entry point: upload anything from a podcast MP3 to a lecture WAV to an iPhone Voice Memo M4A, and the same backend handles it. If you have a specific format in mind, we have dedicated guides for MP3, WAV, M4A, MP4 video, and YouTube links.

Step-by-step: audio file to text

  1. Open the upload tool

    Go to /transcribe in the nav. No signup needed for your first 10 minutes of audio.

  2. Drop the audio file in

    Drag and drop from Finder, File Explorer, Dropbox, or Google Drive. Files can be up to 25 MB on the free tier and 500 MB on Pro. Accepted: MP3, WAV, M4A, AAC, FLAC, OGG, OPUS, AIFF.

  3. Pick Standard or Premium

    Standard ($0.006/min, OpenAI Whisper) is the right call for most single-speaker recordings. Premium ($0.02/min, ElevenLabs Scribe) adds speaker diarization, which labels who said what. Pick Premium if you're transcribing a panel, interview, or roundtable.

  4. Let the engine run

    Processing takes ~10% of audio length. A 45-minute file transcribes in about 5 minutes. You can close the tab; results land in your dashboard and an email goes out when done.

  5. Review the transcript in the browser

    Hit play in the inline player; click any word to jump the audio to that timestamp. Edit typos directly, and use search to jump to specific phrases.

  6. Download the output

    Choose .txt (plain text), .srt (subtitle format), .vtt (HTML5 captions), or .docx (Word document with formatted timestamps). Pro accounts also get JSON with word-level timestamps for custom processing.

Supported audio formats

Format | Free tier max | Pro max | Best for
MP3 | 25 MB (≈ 52 min at 64 kbps, ≈ 26 min at 128 kbps) | 500 MB | Podcasts, sermons, lectures (most common)
WAV | 25 MB (≈ 2.4 min at 16-bit/44.1 kHz stereo) | 500 MB | Studio masters, mixer recordings
M4A / AAC | 25 MB | 500 MB | iPhone Voice Memos, QuickTime, Zoom
FLAC | 25 MB | 500 MB | Lossless audio, smaller than WAV
OGG / OPUS | 25 MB | 500 MB | Discord exports, WhatsApp voice notes
AIFF | 25 MB | 500 MB | Apple Logic Pro masters
MP4 / MOV (video) | 25 MB | 500 MB | Audio extracted automatically
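How much audio fits under a size cap depends entirely on bitrate. A quick sketch of the arithmetic, assuming constant bitrate and decimal megabytes (1 MB = 1,000,000 bytes); VBR files and binary megabytes will land slightly differently, and `minutes_that_fit` is a hypothetical helper name.

```python
def minutes_that_fit(limit_mb: float, bitrate_kbps: float) -> float:
    """Minutes of constant-bitrate audio that fit in limit_mb (1 MB = 1,000,000 bytes)."""
    limit_bits = limit_mb * 1_000_000 * 8
    seconds = limit_bits / (bitrate_kbps * 1000)
    return seconds / 60


# CD-quality PCM WAV: 44,100 samples/s x 16 bits x 2 channels = 1,411.2 kbps
WAV_CD_KBPS = 44_100 * 16 * 2 / 1000

for label, kbps in [("MP3 64 kbps", 64), ("MP3 128 kbps", 128), ("WAV 16-bit/44.1 kHz stereo", WAV_CD_KBPS)]:
    print(f"{label}: ~{minutes_that_fit(25, kbps):.1f} min in 25 MB")
```

This is why a spoken-word MP3 at 64 kbps stretches roughly twice as far as a music-grade 128 kbps file, and why uncompressed WAV hits the free-tier cap in a couple of minutes.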

Tips that lift accuracy 1–3%

  • Mic placement is everything. A $20 lavalier within 6 inches of the speaker beats a $2,000 condenser at 6 feet, every time. The biggest accuracy gains come from before the recording, not after.
  • Trim silence at the start and end with Audacity (Effect → Truncate Silence) before upload. Smaller files upload faster and you don't pay to transcribe dead air.
  • Normalize the audio if levels are quiet. Audacity Effect → Normalize → -1.0 dB. This makes a measurable accuracy difference on under-gained recordings.
  • Mono, not stereo, for voice. The model collapses to mono internally, but a properly summed mono export skips one step and gives marginally cleaner input.
  • Two voices speaking over each other? Use the Premium tier. Standard treats overlapping speech as a single stream and can't tell the speakers apart.
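Downmixing, trimming, and normalizing all come down to simple arithmetic on samples. The sketch below shows the idea on raw floating-point samples in the range -1.0 to 1.0; it is a crude illustration, not how Audacity's Truncate Silence or Normalize effects are implemented, and the 0.01 silence threshold is an arbitrary assumption. A -1.0 dB peak target corresponds to a linear amplitude of 10^(-1/20) ≈ 0.891.

```python
from typing import List, Sequence, Tuple


def downmix_to_mono(stereo: Sequence[Tuple[float, float]]) -> List[float]:
    """Average left and right channels into a single mono channel."""
    return [(left + right) / 2 for left, right in stereo]


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold (assumed value)."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0] : loud[-1] + 1]


def peak_normalize(samples: List[float], target_dbfs: float = -1.0) -> List[float]:
    """Scale so the loudest sample peaks at target_dbfs (-1.0 dBFS is about 0.891 linear)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence: nothing to scale
    gain = (10 ** (target_dbfs / 20)) / peak
    return [s * gain for s in samples]


stereo = [(0.0, 0.0), (0.2, 0.1), (-0.3, -0.1), (0.0, 0.0)]
cleaned = peak_normalize(trim_silence(downmix_to_mono(stereo)))
print(len(cleaned), max(abs(s) for s in cleaned))
```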

The audio-to-text workflow

MP3, WAV / M4A, FLAC / OGG → upload → AI engine (Whisper / ElevenLabs), ~5 min per 45-min file → .txt, .srt / .vtt, .docx

Audio-to-text pricing vs alternatives

Service | Cost / 45-min audio | Accuracy | Output formats | Free tier
Sermon Transcription (Std) | $0.27 | 99.0–99.5% | .txt .srt .vtt .docx | 10 min free
Sermon Transcription (Premium) | $0.90 | 99.5%+ with diarization | .txt .srt .vtt .docx + JSON | 10 min free
Rev AI | $11.25 | 90–95% | .txt .srt .vtt + JSON | 5 hours free
Rev human | $67.50 | 99%+ | .txt .docx | None
Otter Pro | ~$0.64 effective | 90–95% | .txt .docx .srt | 300 min / mo
HappyScribe AI | ~$9.00 | 85–92% | .txt .srt .vtt | None

Pricing as of early 2026. Rev AI $0.25/min; Rev human $1.50/min. Otter Pro $16.99/mo ÷ 1,200 included min × 45 min. HappyScribe AI ~$0.20/min.
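The cost column is just rate × minutes. A small sketch reproducing it from the per-minute rates in the pricing note, using `Decimal` to avoid floating-point rounding surprises in dollar amounts; the helper names are hypothetical, and free-tier minutes are not deducted.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates from the pricing note (early 2026).
RATES_PER_MIN = {
    "Sermon Transcription (Std)": Decimal("0.006"),
    "Sermon Transcription (Premium)": Decimal("0.02"),
    "Rev AI": Decimal("0.25"),
    "Rev human": Decimal("1.50"),
    "HappyScribe AI": Decimal("0.20"),
    # Otter Pro is a subscription: $16.99/mo spread over 1,200 included minutes.
    "Otter Pro (effective)": Decimal("16.99") / Decimal("1200"),
}


def cost_for(minutes: int, rate: Decimal) -> Decimal:
    """Cost of transcribing `minutes` of audio at `rate` dollars/min, rounded to cents."""
    return (rate * minutes).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for service, rate in RATES_PER_MIN.items():
    print(f"{service}: ${cost_for(45, rate)}")
```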

Audio to text FAQ

What audio formats can I convert to text?

MP3, WAV, M4A (AAC), FLAC, OGG, AIFF, and OPUS upload natively. Video formats (MP4, MOV, MKV, WebM, AVI) work too; we extract the audio track on upload. If your format isn't listed, convert it to MP3 with VLC (Media → Convert / Save) or ffmpeg first.
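If you script conversions instead, ffmpeg can do the job from the command line. A sketch that builds and runs the command from Python, assuming ffmpeg is on your PATH and was built with the libmp3lame encoder; the 64 kbps mono default is a speech-oriented assumption, not a requirement of the upload tool, and `mp3_command` / `convert_to_mp3` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List


def mp3_command(src: Path, dst: Path, bitrate: str = "64k", mono: bool = True) -> List[str]:
    """Build an ffmpeg command that drops any video stream and encodes the audio as MP3."""
    cmd = ["ffmpeg", "-y", "-i", str(src), "-vn"]  # -y: overwrite dst; -vn: ignore video
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single channel
    cmd += ["-codec:a", "libmp3lame", "-b:a", bitrate, str(dst)]
    return cmd


def convert_to_mp3(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(mp3_command(src, dst), check=True)
```

For example, `convert_to_mp3(Path("meeting.wma"), Path("meeting.mp3"))` would write a mono 64 kbps MP3 (file names illustrative).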

Is it really free to convert audio to text?

Yes — the first 10 minutes of audio are free for every user, no credit card and no signup required. After that, Standard tier is $0.006/min (about $0.27 per 45-minute file) and Premium is $0.02/min (about $0.90).

How accurate is the audio-to-text conversion?

99.0–99.5% on clear voice recordings. The model performs best on English but also handles Spanish, French, German, Mandarin, Portuguese, and 90+ other languages, with accuracy close to English on widely spoken languages and lower on less-resourced ones. Heavily accented speech, reverb-heavy rooms, and background music are the main accuracy killers.

How long does audio-to-text conversion take?

Roughly one-tenth of the audio length. A 10-minute audio file converts in about 1 minute. A 45-minute sermon in about 5. A 90-minute podcast episode in about 10.

What output formats do I get?

Plain text (.txt) for blog posts and search indexing. SRT (.srt) for video captions. WebVTT (.vtt) for HTML5 video players. Word document (.docx) with timestamps every 30 seconds. JSON with word-level timestamps available on Pro accounts.
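Once you have word-level timestamps, blocking a transcript into fixed intervals, similar to the 30-second timestamps in the .docx output, is straightforward. A sketch assuming a hypothetical `(start_seconds, word)` pair per word; the JSON export's actual field names are not documented here, so adapt the input shape to the file you download.

```python
from typing import Dict, List, Tuple

# Hypothetical word record: (start_seconds, word). Adapt to the real JSON export.
Word = Tuple[float, str]


def group_by_interval(words: List[Word], interval: float = 30.0) -> List[Tuple[str, str]]:
    """Group words into blocks of `interval` seconds, labeled with an MM:SS start stamp."""
    blocks: Dict[int, List[str]] = {}
    for start, word in words:
        blocks.setdefault(int(start // interval), []).append(word)
    result = []
    for index in sorted(blocks):  # intervals with no words are skipped
        seconds = int(index * interval)
        stamp = f"{seconds // 60:02d}:{seconds % 60:02d}"
        result.append((stamp, " ".join(blocks[index])))
    return result


words = [(0.4, "Welcome"), (0.9, "back."), (31.2, "Today's"), (31.6, "text"), (65.0, "Amen.")]
for stamp, text in group_by_interval(words):
    print(f"[{stamp}] {text}")
```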

Can I batch-convert multiple audio files at once?

Yes — Pro accounts support folder upload. Drop in a folder of mixed MP3/WAV/M4A files and each is processed in parallel. Results land in your dashboard, downloadable individually or as a zipped bundle.
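Before dropping a folder in, you can pre-check it locally. A sketch that sorts files into ready and skipped lists, assuming the extensions and per-file limits listed on this page and decimal megabytes; the server's actual rules may differ, and `preflight` is a hypothetical name.

```python
from pathlib import Path
from typing import List, Tuple

# Extensions and per-file limits as listed on this page (assumed; server rules may differ).
ACCEPTED = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg", ".opus", ".aiff",
            ".mp4", ".mov", ".mkv", ".webm", ".avi"}
LIMIT_BYTES = {"free": 25 * 1_000_000, "pro": 500 * 1_000_000}


def preflight(folder: Path, plan: str = "pro") -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Split a folder's files into (ready, skipped-with-reason) before uploading."""
    ready: List[Path] = []
    skipped: List[Tuple[Path, str]] = []
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        if path.suffix.lower() not in ACCEPTED:
            skipped.append((path, "unsupported format"))
        elif path.stat().st_size > LIMIT_BYTES[plan]:
            skipped.append((path, "over size limit"))
        else:
            ready.append(path)
    return ready, skipped
```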

Convert your audio to text now

MP3, WAV, M4A, FLAC, OGG. First 10 minutes free. No signup until you go beyond that.

Start free
