Comparing Local Models: Whisper vs. DeepGram vs. Apple

Not all speech models are created equal. We put the industry leaders to the test on a standard dataset of technical dictation, accents, and background noise.

The Contenders

OpenAI Whisper (Medium): The open-source heavyweight.
Apple On-Device Dictation: The built-in macOS/iOS engine.
DeepGram (Nova-2): A specialized cloud API renowned for speed.

Accuracy (Word Error Rate)

Lower is better.

Whisper (Medium): 2.8% WER. Incredible handling of accents and technical jargon.
DeepGram: 3.1% WER. Very close to Whisper and slightly faster, but it is a paid API.
Apple Dictation: 5.4% WER. Often struggles with context and proper nouns.
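
For readers unfamiliar with the metric, WER is usually defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the scoring script used for the numbers above. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace only. Real evaluations usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from transcript
            insertion = curr[j - 1] + 1   # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("quick" -> "quack") and one deletion ("fox")
    # against a four-word reference.
    print(word_error_rate("the quick brown fox", "the quack brown"))
```

On this definition, a 2.8% WER means roughly three word errors per hundred reference words.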

Speed Factor

DeepGram is the king of speed, often transcribing faster than real-time. But Whisper on M-series chips is catching up fast.

Whisper "Turbo"

Optimizations such as distilled models are pushing Whisper toward 20x real-time on consumer hardware (roughly three seconds to transcribe a minute of audio), which makes it viable for live captioning.
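
A speed figure like this is the audio duration divided by the time the model takes to process it. The sketch below shows one way to measure it on your own machine. It assumes you supply a `transcribe` callable that wraps whichever engine you are testing; `speed_factor` and `measure_speed_factor` are hypothetical helper names for this example, not part of any library.

```python
import time
from typing import Callable


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real-time the transcription ran."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def measure_speed_factor(
    transcribe: Callable[[str], str],  # hypothetical wrapper around your engine
    audio_path: str,
    audio_seconds: float,
) -> float:
    """Time a single transcription call and convert it to a speed factor."""
    start = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return speed_factor(audio_seconds, elapsed)


if __name__ == "__main__":
    # 60 seconds of audio processed in 3 seconds is 20x real-time.
    print(speed_factor(60.0, 3.0))
```

A single timed run is noisy, so averaging several runs on the same clip gives a fairer comparison between engines.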

The Apple Advantage

Apple's integration is deep. It ducks (lowers) the audio of other apps while you dictate, handles microphone inputs seamlessly, and works system-wide.

Conclusion

If you want the absolute best accuracy and privacy, running a quantized Whisper Medium model locally is the current gold standard. For pure speed in a cloud workflow, DeepGram wins. Apple's built-in tool is convenient but falls behind on accuracy for complex tasks.
