Not all speech models are created equal. We tested three leading engines on the same dataset of technical dictation, accented speech, and background noise.
The Contenders
- OpenAI Whisper (Medium): The open-source heavyweight.
- Apple On-Device Dictation: The built-in macOS/iOS engine.
- Deepgram (Nova-2): A specialized cloud API known for its speed.
Accuracy (Word Error Rate)
Lower is better.
- Whisper (Medium): 2.8% WER. Incredible handling of accents and technical jargon.
- Deepgram: 3.1% WER. Very close to Whisper and slightly faster, but it is a paid API.
- Apple Dictation: 5.4% WER. Often struggles with context and proper nouns.
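For readers who want to reproduce this kind of comparison, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions for illustration; they are not necessarily the scoring setup behind the numbers above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real benchmarks often also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("open the pull request", "open the poll request"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.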
Speed Factor
Deepgram is the fastest of the three, often transcribing faster than real time. But Whisper on Apple M-series chips is catching up fast.
Whisper "Turbo"
New optimizations such as distilled models are pushing Whisper to run at 20x real-time on consumer hardware (a minute of audio transcribed in about three seconds), which makes it viable for live captioning.
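A figure like "20x real-time" is the ratio of audio duration to processing time. The sketch below shows one way to measure it for any engine. The names `speed_factor` and `time_transcription` are hypothetical helpers, and `transcribe` stands in for whatever function wraps your engine of choice; none of these come from a specific library.

```python
import time
from typing import Callable, Tuple


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the transcription ran."""
    if audio_seconds <= 0 or processing_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / processing_seconds


def time_transcription(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
) -> Tuple[str, float]:
    """Run a transcription callable and return (text, speed factor).

    `transcribe` is a placeholder for your engine's wrapper; it takes a
    file path and returns the transcript as a string.
    """
    start = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return text, speed_factor(audio_seconds, elapsed)


if __name__ == "__main__":
    # 60 seconds of audio processed in 3 seconds.
    print(speed_factor(60.0, 3.0))  # 20.0
```

A factor above 1.0 means the engine keeps up with live audio. For live captioning, latency to the first word matters as well, which this ratio does not capture.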
The Apple Advantage
Apple's integration is deep. It ducks (lowers the volume of) audio from other apps, handles microphone inputs seamlessly, and works system-wide.
Conclusion
If you want the best accuracy and privacy, running a quantized Whisper Medium model locally is the current gold standard. For pure speed in a cloud workflow, Deepgram wins. Apple's built-in tool is convenient but falls behind on accuracy for complex tasks.
