How the comparison works
- Tokenize both sermons. Each transcript is lowercased and split into word tokens. Stopwords (such as a, the, and, you) and tokens shorter than three letters are dropped, leaving only meaningful theme words.
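- Compute Jaccard similarity. We take the set of unique words in each sermon, then divide the number of words the two sets share (the intersection) by the number of words that appear in either set (the union). Multiplied by 100, the result is a percentage from 0 (no overlap) to 100 (identical vocabulary).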
- Rank themes and detect references. Shared and exclusive themes are ranked by combined frequency. Bible references in both sermons are matched against the same 66-book regex used by the Scripture Extractor.
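The first two steps and the theme ranking can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the four-word stopword list, and the token pattern are assumptions, and reference detection is left out because the 66-book regex is not shown here.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; the tool's real list is longer.
STOPWORDS = {"a", "the", "and", "you"}


def tokenize(text):
    """Lowercase, split into word tokens, drop stopwords and tokens under 3 letters."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if len(w) >= 3 and w not in STOPWORDS]


def jaccard_percent(tokens_a, tokens_b):
    """Shared unique words divided by all unique words, as a percentage."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty transcripts: report no overlap
    return 100.0 * len(set_a & set_b) / len(union)


def rank_themes(tokens_a, tokens_b):
    """Return (shared, only_a, only_b), each sorted by combined frequency."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    combined = count_a + count_b

    def by_frequency(words):
        # Highest combined count first; ties broken alphabetically.
        return sorted(words, key=lambda w: (-combined[w], w))

    shared = by_frequency(count_a.keys() & count_b.keys())
    only_a = by_frequency(count_a.keys() - count_b.keys())
    only_b = by_frequency(count_b.keys() - count_a.keys())
    return shared, only_a, only_b


sermon_a = tokenize("Grace and mercy: the grace of God never fails you.")
sermon_b = tokenize("Faith and grace: the faith of Abraham pleased God.")

score = jaccard_percent(sermon_a, sermon_b)
shared, only_a, only_b = rank_themes(sermon_a, sermon_b)
print(f"{score:.1f}% overlap")  # 2 shared words out of 8 unique words
print("shared:", shared)
print("only in A:", only_a)
print("only in B:", only_b)
```

In this toy pair, the sermons share "grace" and "god" out of eight unique theme words, so the score is 25 percent. Real transcripts are far longer, which is why the percentage bands discussed below only make sense for full sermons.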
When you'd use this tool
The most common use is series planning. Halfway through preaching a four-part series, drop the first two sermons in here to see if you're repeating themes too tightly or, conversely, drifting too far from the series spine. A series should usually score in the 30 to 45 percent Jaccard range — high enough to feel connected, low enough that each sermon adds something genuinely new.
The second use is self-audit. A pastor with a healthy preaching ministry covers a remarkably narrow vocabulary band — pull any two of your sermons from the last six months and you'll likely see 25 to 35 percent overlap regardless of topic. That's not bad; it reflects a coherent theological voice. But if every comparison comes back at 50 percent or higher, your sermons may be starting to blur. The "only in A" and "only in B" columns become the most useful output: they show what was actually unique about each message.
The third use is reference-checking. If you've drawn heavily on a source sermon (with attribution), this tool quantifies how much your message echoes the source. Anything above 40 percent vocabulary overlap when topics differ may suggest closer paraphrase than intended. Anything above 60 percent on a similar topic warrants careful review of citation practices.
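The rules of thumb from the three use cases above can be collected into one small helper. This is a hypothetical sketch: the function name, the use-case labels, and the wording of the messages are assumptions for illustration, and only the percentage thresholds come from the text.

```python
def interpret_score(score, use_case, similar_topic=False):
    """Map a Jaccard percentage to the guidance given for each use case."""
    if use_case == "series":
        if score < 30:
            return "below 30%: may be drifting from the series spine"
        if score <= 45:
            return "30-45%: connected, with room for something new"
        return "above 45%: may be repeating themes too tightly"

    if use_case == "self_audit":
        if score >= 50:
            return "50% or higher: sermons may be starting to blur"
        if 25 <= score <= 35:
            return "25-35%: typical of a coherent theological voice"
        return "no specific guidance for this range"

    if use_case == "reference_check":
        # The threshold depends on whether the two sermons share a topic.
        if similar_topic and score > 60:
            return "above 60% on a similar topic: review citation practices"
        if not similar_topic and score > 40:
            return "above 40% on different topics: may be closer paraphrase than intended"
        return "below the review threshold"

    raise ValueError(f"unknown use case: {use_case!r}")


print(interpret_score(38, "series"))
print(interpret_score(52, "self_audit"))
print(interpret_score(45, "reference_check", similar_topic=False))
```

These bands are judgment aids rather than hard limits, so a helper like this is best used to flag comparisons worth a closer look at the "only in A" and "only in B" columns.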
Related tools
- Sermon Tag Cloud — visualize each sermon's themes.
- Scripture Density — compare refs per minute.
- Readability Analyzer — grade level for each sermon.
- Sermon Word Counter — raw counts.