VTT and Superwhisper have more in common than most pairs compared here: both are Mac-native, and both can run speech recognition locally. The differences are in engine flexibility, how cloud transcription is handled, and the overall feel. Here’s the breakdown.
01 At a glance
| | VTT | Superwhisper |
|---|---|---|
| On-device option | Yes (Apple Speech) | Yes (local Whisper models) |
| Works offline | Yes | Yes (local models) |
| Cloud engines | Deepgram, OpenAI, ElevenLabs — your key | Cloud option available |
| Per-language engine routing | Yes | Limited |
| Account required | No | No |
| Pricing | Free to start; pay only for cloud you use | Free tier + paid plan |
| Platform | Native macOS | Native macOS |
General positioning in mid-2026; Superwhisper updates often — check its site for the latest models, tiers, and pricing.
02 On-device approach
Both keep audio local when you want it. Superwhisper is built around running Whisper models locally — you download a model size that fits your Mac and accuracy/speed needs. VTT leans on Apple’s on-device Speech (including the macOS 26 models), which is tightly integrated with the system and needs no separate model juggling for the default path. Both are valid; it comes down to whether you prefer Whisper locally or Apple’s native stack.
03 Cloud & engine choice
Where VTT pushes further is letting you bring multiple cloud engines under your own key — Deepgram, OpenAI, ElevenLabs — and choose the engine per language. That flexibility is useful if you dictate in several languages and want the best engine for each, rather than one model for everything.
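To make per-language routing concrete, here is a minimal sketch of the idea in Python. It is not VTT's actual configuration format or code: the engine identifiers, the `ROUTES` table, and the `pick_engine` helper are all hypothetical, chosen only to show how a language tag could map to an engine with the on-device engine as the fallback.

```python
# Hypothetical sketch: engine names and the routing table are illustrative,
# not VTT's real settings or API.
ON_DEVICE = "apple-speech"

# Map lowercase language tags to a preferred engine (all choices are assumptions).
ROUTES = {
    "en": "deepgram",
    "de": "openai",
    "ja": "elevenlabs",
}


def pick_engine(language_tag, routes=ROUTES, offline=False):
    """Return the engine for a language, falling back to the on-device engine."""
    if offline:
        # Cloud engines are unreachable offline, so always stay local.
        return ON_DEVICE
    tag = language_tag.lower()
    # Try the full tag first ("en-gb"), then the base language ("en").
    for key in (tag, tag.split("-")[0]):
        if key in routes:
            return routes[key]
    # No route configured for this language: use the on-device default.
    return ON_DEVICE
```

With this table, `pick_engine("en-GB")` resolves to the engine configured for `en`, while a language with no entry, such as `fr`, stays on the on-device engine.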
04 Feel & workflow
Both are menu-bar-friendly Mac apps with hotkey dictation. Superwhisper offers modes and prompt-style transformations of your dictation; VTT focuses on fast, faithful capture with a native feel and minimal ceremony. If you love configurable transformation modes, Superwhisper is rich there; if you want it to get out of the way, VTT’s simplicity is the draw.
05 Pricing
Both have a free path. The distinction is how cloud transcription is paid for: with VTT, the cloud cost is just your provider’s per-minute rate, billed through your own key with nothing added on top, while Superwhisper’s paid plan bundles features into a tier. If you intend to stay mostly on-device, both keep ongoing costs low.
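For a rough sense of what bring-your-own-key pricing means in practice, the arithmetic is simply minutes dictated times the provider's per-minute rate. The sketch below assumes a flat per-minute rate, and `monthly_cloud_cost` is a hypothetical helper. Any rate you pass in is a placeholder, not a real price from any provider, so check your provider's current pricing.

```python
from decimal import Decimal


def monthly_cloud_cost(minutes_per_day, rate_per_minute, days=30):
    """Estimate monthly cloud spend, assuming a flat per-minute rate.

    rate_per_minute is whatever your provider charges; no real price is
    assumed here.
    """
    minutes = Decimal(str(minutes_per_day)) * days
    cost = minutes * Decimal(str(rate_per_minute))
    # Round to cents for display.
    return cost.quantize(Decimal("0.01"))
```

For example, with a placeholder rate of 0.005 per minute and 20 minutes of dictation a day, `monthly_cloud_cost(20, "0.005")` comes to 3.00 for a 30-day month. Minutes transcribed on-device contribute nothing to this total.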
06 Which should you pick?
- Choose Superwhisper if: you specifically want local Whisper models and configurable transformation modes.
- Choose VTT if: you want Apple’s native on-device engine by default, multiple cloud engines under your own key, and per-language engine selection — in a minimal, native app.