JoeNerdan/dictate-google

dictate-google

Real-time voice dictation using the Google Cloud Speech-to-Text streaming API. Text appears as you speak, typed directly into the focused application via xdotool.

Features

  • Real-time streaming: Text appears progressively as you speak
  • Smart display updates: Handles interim results with minimal flicker
  • Toggle mode: Run once to start, run again to stop
  • Continuous dictation: Stream stays open for natural pauses

Requirements

  • Linux with X11 (uses xdotool for typing)
  • Python 3.8+
  • Google Cloud account with Speech-to-Text API enabled

Installation

# Install dependencies
pip install google-cloud-speech pyaudio

# On Debian/Ubuntu, you may also need:
sudo apt install python3-pyaudio xdotool portaudio19-dev

# Copy the script to a directory on your PATH
mkdir -p ~/.local/bin
cp dictate-google ~/.local/bin/
chmod +x ~/.local/bin/dictate-google

Setup

  1. Create a Google Cloud project and enable the Speech-to-Text API
  2. Create a service account and download the JSON credentials
  3. Place the credentials at ~/.config/stt-credentials.json, or set GOOGLE_APPLICATION_CREDENTIALS to the path of the file

# Option 1: Default location
cp your-credentials.json ~/.config/stt-credentials.json

# Option 2: Environment variable
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
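
The two options can be combined into a single lookup. The following is a minimal sketch, not the script's actual code: the function name `resolve_credentials` is hypothetical, and the precedence (environment variable first, then the default location) is an assumption.

```python
import os
from pathlib import Path

# Default location named in the setup steps above.
DEFAULT_CREDENTIALS = Path.home() / ".config" / "stt-credentials.json"


def resolve_credentials(env=None, default=DEFAULT_CREDENTIALS):
    """Return the credentials path to use, or None if none is found.

    Assumption: an explicit GOOGLE_APPLICATION_CREDENTIALS wins over
    the default file.
    """
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    default = Path(default)
    if default.is_file():
        return default
    return None
```

Returning None instead of raising lets the caller print a setup hint that points back to the steps above.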

Usage

# Start dictation (English)
dictate-google

# Start dictation (German)
dictate-google --lang=de-DE

# Stop dictation (run again or Ctrl+C)
dictate-google
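
One common way to implement a "run again to stop" toggle is a PID file. The sketch below only illustrates that pattern; the README does not say how the script does it, and the function name `toggle`, the PID file path, and the use of SIGTERM are all assumptions.

```python
import os
import signal
from pathlib import Path


def toggle(pid_file, run):
    """Stop a recorded running instance, otherwise start one.

    Returns "stopped" if a running instance was signalled, or
    "started" after run() has finished.
    """
    pid_file = Path(pid_file)
    if pid_file.exists():
        try:
            # Ask the recorded process to shut down.
            os.kill(int(pid_file.read_text().strip()), signal.SIGTERM)
            pid_file.unlink(missing_ok=True)
            return "stopped"
        except (ValueError, ProcessLookupError):
            # Corrupt or stale PID file: discard it and start fresh.
            pid_file.unlink(missing_ok=True)
    pid_file.write_text(str(os.getpid()))
    try:
        run()  # hypothetical entry point for the dictation loop
    finally:
        pid_file.unlink(missing_ok=True)
    return "started"
```

Removing the file in a `finally` block means a Ctrl+C exit also clears the marker, so the next invocation starts dictation rather than trying to stop a process that is already gone.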

Keyboard Shortcut

Bind dictate-google to a key (e.g., Super+D) in your desktop environment for quick toggle.

How It Works

  1. Opens microphone stream and sends audio to Google Cloud Speech-to-Text
  2. Receives interim results (may change) and final results (committed)
  3. Types text into the focused application using xdotool
  4. Tracks what's typed to handle corrections without flickering
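
Step 4 can be illustrated with a common-prefix diff: keep whatever part of the already typed text still matches the new transcript, delete only the tail that changed, and type the rest. This is a sketch of that general technique, not the script's actual code; `plan_update` and `xdotool_commands` are hypothetical names.

```python
def plan_update(typed, new_text):
    """Return (backspaces, suffix) that turn `typed` into `new_text`."""
    common = 0
    for old_char, new_char in zip(typed, new_text):
        if old_char != new_char:
            break
        common += 1
    return len(typed) - common, new_text[common:]


def xdotool_commands(backspaces, suffix):
    """Build the xdotool argument lists for one update (not executed here)."""
    commands = []
    if backspaces:
        # One BackSpace keystroke per character that must be removed.
        commands.append(
            ["xdotool", "key", "--clearmodifiers"] + ["BackSpace"] * backspaces
        )
    if suffix:
        # "--" keeps text that starts with a dash from being read as an option.
        commands.append(["xdotool", "type", "--clearmodifiers", "--", suffix])
    return commands


# Example: the interim result "hello word" is corrected to "hello world".
backspaces, suffix = plan_update("hello word", "hello world")
print(backspaces, repr(suffix))  # 1 'ld'
```

Each returned list could be passed to `subprocess.run`. Because only the changed tail is retyped, most interim updates touch a few characters rather than the whole phrase, which is what keeps flicker low.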

Limitations

  • Streams have a 5-minute maximum duration (Google API limit)
  • Requires X11 (Wayland users need XWayland or an alternative input method)
  • Microphone must be accessible to the script
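
A possible workaround for the stream duration limit is to cap each streaming session and open a new one before the limit is reached. The sketch below shows only the capping part and is an assumption about how this could be done, not a description of what the script does; `chunks_until` and the 290-second margin are hypothetical.

```python
import time


def chunks_until(chunks, max_seconds, clock=time.monotonic):
    """Yield audio chunks from `chunks` until `max_seconds` have elapsed.

    The deadline is checked before each chunk is pulled, so no chunk is
    consumed and then dropped when the time runs out.
    """
    deadline = clock() + max_seconds
    source = iter(chunks)
    while clock() < deadline:
        try:
            yield next(source)
        except StopIteration:
            return


# Usage idea: share one microphone iterator across sessions.
# mic = iter(microphone_chunks())          # hypothetical audio source
# while dictating:
#     session = chunks_until(mic, 290)     # stay under the 5-minute limit
#     ...send `session` to a new streaming request...
```

Passing the same iterator to each session means audio captured between sessions stays queued in the source instead of being discarded.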

License

MIT
