Real-time voice dictation using the Google Cloud Speech-to-Text streaming API. Text is typed into the focused application via xdotool as you speak.
- Real-time streaming: Text appears progressively as you speak
- Smart display updates: Handles interim results with minimal flicker
- Toggle mode: Run once to start, run again to stop
- Continuous dictation: Stream stays open for natural pauses
- Linux with X11 (uses xdotool for typing)
- Python 3.8+
- Google Cloud account with Speech-to-Text API enabled
# Install dependencies
pip install google-cloud-speech pyaudio
# On Debian/Ubuntu, you may also need:
sudo apt install python3-pyaudio xdotool portaudio19-dev
# Copy the script to your PATH
cp dictate-google ~/.local/bin/
chmod +x ~/.local/bin/dictate-google
- Create a Google Cloud project and enable the Speech-to-Text API
- Create a service account and download the JSON credentials
- Place credentials at ~/.config/stt-credentials.json or set GOOGLE_APPLICATION_CREDENTIALS
# Option 1: Default location
cp your-credentials.json ~/.config/stt-credentials.json
# Option 2: Environment variable
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# Start dictation (English)
dictate-google
# Start dictation (German)
dictate-google --lang=de-DE
# Stop dictation (run again or Ctrl+C)
dictate-google
Bind dictate-google to a key (e.g., Super+D) in your desktop environment for quick toggle.
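One way a run-again-to-stop toggle can work is with a PID file. The sketch below is an illustration only: how dictate-google actually tracks its running instance is not described here, and the `toggle` function and the PID file path are hypothetical.

```python
import os
import signal
from pathlib import Path


def toggle(pid_file: Path) -> str:
    """Stop a recorded instance if there is one, otherwise register this process.

    Returns "stopped" or "started". Hypothetical helper, not taken from the script.
    """
    try:
        pid = int(pid_file.read_text())
        os.kill(pid, signal.SIGTERM)  # ask the running instance to exit
        pid_file.unlink()
        return "stopped"
    except (FileNotFoundError, ValueError, ProcessLookupError):
        # No PID file, unreadable contents, or a stale PID: start fresh.
        pid_file.write_text(str(os.getpid()))
        return "started"
```

A key binding would then invoke the same command each time; the first press records the process and the second press signals it to stop.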
- Opens microphone stream and sends audio to Google Cloud Speech-to-Text
- Receives interim results (may change) and final results (committed)
- Types text into the focused application using xdotool
- Tracks what's typed to handle corrections without flickering
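The correction step above can be sketched as a prefix comparison: keep the part of the typed text that still matches the newest result, erase the rest with backspaces, and type the new ending. This is a minimal sketch assuming that approach; the function names are hypothetical and the script's own logic may differ.

```python
import os
from typing import List, Tuple


def plan_update(typed: str, new_text: str) -> Tuple[int, str]:
    """Return (backspaces, suffix) that turn `typed` into `new_text`."""
    common = os.path.commonprefix([typed, new_text])
    return len(typed) - len(common), new_text[len(common):]


def xdotool_commands(backspaces: int, suffix: str) -> List[List[str]]:
    """Build the xdotool invocations for one update (not executed here)."""
    cmds: List[List[str]] = []
    if backspaces:
        # One BackSpace keystroke per character that must be erased.
        cmds.append(["xdotool", "key", "--clearmodifiers"] + ["BackSpace"] * backspaces)
    if suffix:
        cmds.append(["xdotool", "type", "--clearmodifiers", "--", suffix])
    return cmds


if __name__ == "__main__":
    print(plan_update("hello word", "hello world"))  # (1, 'ld')
```

Each command list could be passed to `subprocess.run`. Because only the differing tail is retyped, an interim result that merely extends the previous one produces no backspaces at all, which is what keeps flicker low.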
- Streams have a 5-minute maximum duration (Google API limit)
- Requires X11 (Wayland users need XWayland or an alternative input method)
- Microphone must be accessible to the script
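A common way to live with the stream duration limit is to end each streaming request shortly before the limit and open a new one on the same microphone feed. The helper below is a sketch under that assumption; `chunks_until_deadline` and the 290-second margin are illustrative choices, not values from the script.

```python
import time
from typing import Callable, Iterator

STREAM_LIMIT_SECONDS = 290.0  # assumed safety margin below the 5-minute limit


def chunks_until_deadline(
    chunks: Iterator[bytes],
    limit_seconds: float = STREAM_LIMIT_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[bytes]:
    """Yield audio chunks from `chunks` until `limit_seconds` have elapsed.

    The deadline is checked before each chunk is pulled, so no audio is
    consumed and dropped; the next call continues from the same iterator.
    """
    deadline = clock() + limit_seconds
    while clock() < deadline:
        try:
            yield next(chunks)
        except StopIteration:
            return
```

The caller would loop, feeding each bounded generator to a fresh streaming request until dictation is stopped. Audio spoken across a restart boundary may still be split between two requests.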
MIT