A push-to-talk speech transcription tool powered by whisper.cpp. Press a hotkey, speak, release; your speech is transcribed locally and typed/pasted into the active window.
Built with Go, using whisper.cpp's C library with Metal GPU acceleration on Apple Silicon.
I built and tested this in the environment below. It should build on Linux, and perhaps even on Windows, without significant changes:
- macOS on Apple Silicon (M-series)
- Go 1.21+
- CMake
- Xcode Command Line Tools (`xcode-select --install`)
Clone the repo with submodules:
```
git clone --recurse-submodules git@github.com:lhk/push2whisper.git
cd push2whisper
```

Then run the following scripts in order:
```
bash go-whisper/download_models.sh
```

This downloads the default model (large-v3-turbo, q5-quantized), which works best for me. You can also pass a specific model name, e.g. `bash go-whisper/download_models.sh ggml-base.en.bin` for a smaller model.
```
bash go-whisper/build_whisper.sh
```

This builds the whisper.cpp static libraries with Metal and BLAS acceleration.
```
bash go-whisper/build_wrapper.sh
```

This produces the `go-whisper/whisper-client` executable.
```
cd go-whisper
./whisper-client
```

| Hotkey | Action |
|---|---|
| Ctrl + Shift + S | Start/stop recording |
| Ctrl + Shift + Q | Re-transcribe last recording |
| Ctrl + Shift + 2 | Retype last transcription |
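The start/stop hotkey acts as a toggle: one press begins recording, the next press ends it. As a minimal sketch (names here are illustrative, not the tool's actual implementation), the state machine behind that binding can be modeled like this:

```go
package main

import "fmt"

// recorder models the toggle bound to Ctrl+Shift+S:
// one press starts recording, the next press stops it.
type recorder struct {
	recording bool
}

// toggle flips the recording state and reports whether
// the recorder is now recording.
func (r *recorder) toggle() bool {
	r.recording = !r.recording
	return r.recording
}

func main() {
	var r recorder
	fmt.Println(r.toggle()) // first press: starts recording (true)
	fmt.Println(r.toggle()) // second press: stops recording (false)
}
```

On the stop transition, the captured audio would be handed to the transcription step.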
Set `WHISPER_MODEL_PATH` to use a different model:

```
WHISPER_MODEL_PATH=../whisper.cpp/models/ggml-base.en.bin ./whisper-client
```

To try CoreML acceleration (runs the transcription on Apple's Neural Engine):
```
uv venv
source .venv/bin/activate
uv pip install -r whisper.cpp/models/requirements-coreml.txt
bash go-whisper/build_whisper_coreml.sh
bash go-whisper/convert_coreml_model.sh large-v3-turbo
bash go-whisper/build_wrapper_coreml.sh
```

See the scripts for details on model naming and symlinks for quantized models.