feat(voice): dual-provider TTS (Supertonic local + ElevenLabs cloud) with Linux support by Trei-D · Pull Request #1301 · danielmiessler/Personal_AI_Infrastructure

Trei-D · 2026-05-24T06:25:27Z

Problem

The v5.0.0 voice module is macOS-only (uses afplay + osascript) and ElevenLabs-only (requires API key + quota). This means:

Linux PAI users have no voice — afplay doesn't exist on Linux
Voice costs money — every notification burns ElevenLabs API credits
Voice requires internet — no offline/local option

Solution

Dual-provider TTS architecture with cross-platform audio playback.

New: Supertonic as local-first provider

Zero cost, zero internet, zero API key. Supertonic runs TTS inference on CPU using ONNX models that auto-download on first use.

Installation

```bash
cd ~/.claude/PAI/PULSE/VoiceServer

# Create Python venv and install Supertonic
python3 -m venv .venv
.venv/bin/pip install supertonic

# Verify installation
.venv/bin/python supertonic-tts.py --text "Hello from PAI" --voice M1 --output /tmp/test.wav

# Play the result (Linux)
paplay /tmp/test.wav
```
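On the server side, synthesis comes down to running the same wrapper command as the verification step, but from code. The sketch below is a minimal illustration and not the PR's implementation. It assumes the wrapper accepts exactly the `--text` / `--voice` / `--output` flags shown above. `buildSupertonicInvocation` and `Invocation` are hypothetical names.

```typescript
// Hypothetical shape of a command to spawn (no shell involved).
interface Invocation {
  cmd: string;
  args: string[];
}

// Hypothetical helper: builds argv for supertonic-tts.py using only the
// --text / --voice / --output flags shown in the verification command.
function buildSupertonicInvocation(
  serverDir: string,
  text: string,
  voice: string,
  outputPath: string,
): Invocation {
  const dir = serverDir.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    // Use the venv interpreter so the supertonic package is importable.
    cmd: `${dir}/.venv/bin/python`,
    args: [
      `${dir}/supertonic-tts.py`,
      // Each value is its own argv element, so apostrophes or quotes in the
      // message need no shell escaping.
      "--text", text,
      "--voice", voice,
      "--output", outputPath,
    ],
  };
}
```

Under Node or Bun, `spawnSync(inv.cmd, inv.args)` from `node:child_process` runs this without a shell. A non-zero `status` in the result is where a caller would report `Supertonic TTS failed`.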

Requirements:

Python 3.10+ (tested with 3.12)
~158 MB disk for the venv
~386 MB disk for model cache (~/.cache/supertonic3/, downloaded on first run)

Available voices

| Voice | Gender | Notes |
|---|---|---|
| M1–M5 | Male | 5 distinct male voices |
| F1–F5 | Female | 5 distinct female voices |

Configure in settings.json:

```json
{
  "daidentity": {
    "voices": {
      "provider": "supertonic",
      "main": {
        "supertonicVoice": "M1"
      }
    }
  }
}
```
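Reading this block can be sketched as a small typed lookup that rejects voice names outside M1–M5 / F1–F5. This is only a sketch. `PaiSettings` and `readMainVoice` are hypothetical names. Two behaviors are assumptions rather than documented PR behavior: a missing `provider` defaults to Supertonic, and an unknown voice falls back to `M1`.

```typescript
// "provider" values accepted in settings.json.
type VoiceProvider = "supertonic" | "elevenlabs";

// The ten Supertonic voices listed in the voices table.
const SUPERTONIC_VOICES = ["M1", "M2", "M3", "M4", "M5", "F1", "F2", "F3", "F4", "F5"] as const;
type SupertonicVoice = (typeof SUPERTONIC_VOICES)[number];

// Hypothetical subset of settings.json: only the keys shown in the example.
interface PaiSettings {
  daidentity?: {
    voices?: {
      provider?: string;
      main?: { supertonicVoice?: string };
    };
  };
}

function isSupertonicVoice(v: string | undefined): v is SupertonicVoice {
  return v !== undefined && (SUPERTONIC_VOICES as readonly string[]).indexOf(v) !== -1;
}

function readMainVoice(settings: PaiSettings): { provider: VoiceProvider; supertonicVoice: SupertonicVoice } {
  const voices = settings.daidentity?.voices;
  // Assumption: anything other than "elevenlabs" (including a missing key) selects Supertonic.
  const provider: VoiceProvider = voices?.provider === "elevenlabs" ? "elevenlabs" : "supertonic";
  const requested = voices?.main?.supertonicVoice;
  // Assumption: an unknown or missing voice name falls back to M1.
  const supertonicVoice: SupertonicVoice = isSupertonicVoice(requested) ? requested : "M1";
  return { provider, supertonicVoice };
}
```

In practice this would be called on `JSON.parse` of the settings file's contents.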

Performance (CPU-only, no GPU required)

Benchmarked on a 2-core Intel Skylake VM, a deliberately low-end setup; most desktops should be faster:

| Message | Synthesis time | End-to-end (synthesis + playback) |
|---|---|---|
| Short (3 words) | ~1.6 s | ~3.5 s |
| Medium (8 words) | ~2.0 s | ~5.5 s |
| Long (12 words) | ~2.0 s | ~5.5 s |

First run: adds ~10–30 s while the models (~386 MB) download to ~/.cache/supertonic3/; later runs reuse the cache
CPU: saturates all available cores during synthesis (~8 s of user CPU time on the 2-core VM), then returns to idle
Memory: ~200 MB RSS during synthesis

For comparison, ElevenLabs cloud TTS takes ~1–2s network round-trip but costs $0.30/1K characters.
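A short worked calculation shows what the $0.30/1K-character rate quoted above means for notification traffic. It assumes a flat per-character price; real plans may bill differently. The traffic figures in the usage note are illustrative, not measured.

```typescript
// Rate quoted in this PR, treated as a flat per-character price (assumption).
const USD_PER_1K_CHARS = 0.3;

// .length counts UTF-16 code units, which equals the character count for
// plain ASCII notification text.
function costPerMessageUSD(text: string): number {
  return (text.length / 1000) * USD_PER_1K_CHARS;
}

// Total characters sent over a period, converted at the same rate.
function monthlyCostUSD(avgChars: number, messagesPerDay: number, days = 30): number {
  const totalChars = avgChars * messagesPerDay * days;
  return (totalChars / 1000) * USD_PER_1K_CHARS;
}
```

For example, 100 notifications a day averaging 60 characters add up to 180,000 characters over 30 days. At this rate that is about $54 a month. The same traffic through Supertonic has no per-request cost.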

New: Cross-platform audio playback

Audio player discovery chain (first available wins):

| Player | Platform | Package |
|---|---|---|
| `paplay` | Linux (PulseAudio/PipeWire) | `pulseaudio-utils` or `pipewire-pulse` |
| `ffplay` | Universal (FFmpeg) | `ffmpeg` |
| `afplay` | macOS | Built-in |
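The discovery chain is a first-match search over an ordered list. The sketch below assumes availability means "the binary is on PATH", and injects that check as `isOnPath` so it can be tested. The `ffplay` flags are standard FFmpeg options picked for headless playback; they are not necessarily the ones `voice.ts` passes. `pickPlayer` is a hypothetical name.

```typescript
interface PlayerCommand {
  cmd: string;
  args: string[];
}

interface PlayerSpec {
  cmd: string;
  argsFor: (file: string) => string[];
}

// Same order as the table: the first player found on PATH wins.
const PLAYER_CHAIN: PlayerSpec[] = [
  { cmd: "paplay", argsFor: (file) => [file] },
  // -nodisp: no video window; -autoexit: quit when playback ends; -loglevel quiet: no console noise
  { cmd: "ffplay", argsFor: (file) => ["-nodisp", "-autoexit", "-loglevel", "quiet", file] },
  { cmd: "afplay", argsFor: (file) => [file] },
];

function pickPlayer(file: string, isOnPath: (cmd: string) => boolean): PlayerCommand | null {
  for (const spec of PLAYER_CHAIN) {
    if (isOnPath(spec.cmd)) {
      return { cmd: spec.cmd, args: spec.argsFor(file) };
    }
  }
  // Nothing available: the caller reports "No audio player found".
  return null;
}
```

Under Bun, `isOnPath` could be `(cmd) => Bun.which(cmd) !== null`. Keeping the check injectable lets you test the ordering without any of the players installed.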

Linux system dependencies:

```bash
# Ubuntu/Debian
sudo apt install pulseaudio-utils libnotify-bin

# Fedora
sudo dnf install pulseaudio-utils libnotify

# Arch
sudo pacman -S libpulse libnotify
```

New: Linux desktop notifications

notify-send on Linux (libnotify) — visual popup alongside audio
osascript on macOS (existing behavior preserved)
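Both notification paths reduce to building one command per platform. This is a hypothetical sketch, not the PR's code. It assumes `platform` takes Node/Bun `process.platform` values (`"linux"`, `"darwin"`). The macOS branch uses AppleScript's `display notification … with title …`, so quotes inside the message must be escaped within the AppleScript string literal.

```typescript
interface NotifyCommand {
  cmd: string;
  args: string[];
}

// Quote a value as an AppleScript string literal (escape backslashes, then double quotes).
function appleScriptString(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function notificationCommand(platform: string, title: string, message: string): NotifyCommand | null {
  if (platform === "linux") {
    // notify-send SUMMARY BODY, passed as argv, so no shell quoting is involved
    return { cmd: "notify-send", args: [title, message] };
  }
  if (platform === "darwin") {
    const script = `display notification ${appleScriptString(message)} with title ${appleScriptString(title)}`;
    return { cmd: "osascript", args: ["-e", script] };
  }
  // Other platforms: skip the visual popup.
  return null;
}
```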

Homeserver → Desktop audio routing

For users running PAI on a headless server (VM, NAS, homelab), voice audio can play on a remote desktop machine via PulseAudio/PipeWire network streaming:

On the desktop (audio sink):

```bash
# PulseAudio: allow network connections
# (auth-anonymous=1 accepts any client that can reach TCP port 4713)
pactl load-module module-native-protocol-tcp auth-anonymous=1

# PipeWire: add to ~/.config/pipewire/pipewire-pulse.conf.d/network.conf
# context.modules = [{ name = libpipewire-module-protocol-pulse
#   args = { server.address = ["unix:native", "tcp:4713"] } }]
# then restart the service: systemctl --user restart pipewire-pulse
```

On the server (PAI host):

```bash
# Add to ~/.claude/.env or shell profile
export PULSE_SERVER=tcp:<DESKTOP_IP>:4713
```

Audio from paplay/ffplay on the server routes to the desktop's speakers over the LAN. Works with both WAV (Supertonic) and MP3 (ElevenLabs).
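When audio does not arrive, first check what the server process thinks `PULSE_SERVER` points at. Below is a small parser for the `tcp:<host>:<port>` form used above. `parsePulseServer` is a hypothetical helper. It handles hostnames and IPv4 addresses only. When no port is given, it assumes 4713, PulseAudio's default native-protocol port.

```typescript
interface PulseTarget {
  host: string;
  port: number;
}

const DEFAULT_PULSE_PORT = 4713; // PulseAudio's default native-protocol TCP port

// Parses PULSE_SERVER values of the form "tcp:<host>" or "tcp:<host>:<port>".
// Returns null when unset or not a TCP address (e.g. a local unix socket).
// Bracketed IPv6 addresses are out of scope for this sketch.
function parsePulseServer(value: string | undefined): PulseTarget | null {
  if (!value || !value.startsWith("tcp:")) return null;
  const match = /^([^:]+)(?::(\d+))?$/.exec(value.slice("tcp:".length));
  if (!match) return null;
  const port = match[2] !== undefined ? Number(match[2]) : DEFAULT_PULSE_PORT;
  if (port < 1 || port > 65535) return null;
  return { host: match[1], port };
}
```

A preflight check could then attempt a TCP connection to that host and port, for example with `createConnection({ host, port })` from `node:net`. That separates a refused connection from a wrong address before you start suspecting the player.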

Troubleshooting

| Issue | Fix |
|---|---|
| `No audio player found` | Install `pulseaudio-utils` (Linux) or `ffmpeg` |
| `Supertonic TTS failed` | Check that `.venv/bin/python` exists; re-run `.venv/bin/pip install supertonic` |
| `Voice: Supertonic not installed — falling back to elevenlabs` | Expected if you haven't installed Supertonic; set `"provider": "elevenlabs"` to suppress it |
| No sound when PAI runs on a headless server | Set `PULSE_SERVER=tcp:<desktop-ip>:4713` in `~/.claude/.env` |
| `Connection refused` on PulseAudio TCP | Run `pactl load-module module-native-protocol-tcp auth-anonymous=1` on the desktop and check that TCP port 4713 is reachable from the server |

Backward compatibility

ElevenLabs users: set "provider": "elevenlabs" in settings.json — everything works exactly as before
macOS users: afplay + osascript still in the discovery chain — zero behavior change
No Supertonic installed: auto-fallback to ElevenLabs with a log warning
All existing HTTP endpoints (/notify, /notify/personality, /voice, /voice/health) unchanged
3-tier config resolution preserved (caller body → voice_id lookup → defaults)
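The fallback rule and the 3-tier resolution order can be sketched together. This is a minimal illustration, not the PR's code. The `VoiceOptions` fields, the `byVoiceId` table, and both function names are hypothetical. Only the tier order and the warning text come from this description.

```typescript
type VoiceProvider = "supertonic" | "elevenlabs";

// Hypothetical per-request and per-voice options (only two fields shown).
interface VoiceOptions {
  provider?: VoiceProvider;
  supertonicVoice?: string;
}

interface ResolvedVoice {
  provider: VoiceProvider;
  supertonicVoice: string;
}

// Tier 1: caller's request body; tier 2: entry looked up by voice_id; tier 3: defaults.
// Each field resolves independently, so a body can override just one of them.
function resolveVoiceOptions(
  body: VoiceOptions,
  voiceId: string | undefined,
  byVoiceId: Record<string, VoiceOptions>,
  defaults: ResolvedVoice,
): ResolvedVoice {
  const looked: VoiceOptions = (voiceId !== undefined ? byVoiceId[voiceId] : undefined) ?? {};
  return {
    provider: body.provider ?? looked.provider ?? defaults.provider,
    supertonicVoice: body.supertonicVoice ?? looked.supertonicVoice ?? defaults.supertonicVoice,
  };
}

// Auto-fallback: Supertonic requested but not installed -> ElevenLabs, with a log warning.
function effectiveProvider(
  requested: VoiceProvider,
  supertonicInstalled: boolean,
  warn: (msg: string) => void = console.warn,
): VoiceProvider {
  if (requested === "supertonic" && !supertonicInstalled) {
    warn("Voice: Supertonic not installed — falling back to elevenlabs");
    return "elevenlabs";
  }
  return requested;
}
```

For the `supertonicInstalled` flag, one plausible check is `existsSync` (from `node:fs`) on the venv's `.venv/bin/python`. That is the same path the troubleshooting table points at.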

Files changed

| File | Change |
|---|---|
| `VoiceServer/voice.ts` | Dual-provider architecture, Linux audio/notification support |
| `VoiceServer/supertonic-tts.py` | NEW: Python wrapper for Supertonic TTS synthesis |

Testing

Verified on:

Ubuntu 24.04 (paplay + notify-send) with Supertonic provider
Homeserver → desktop audio routing via PulseAudio TCP (VM → desktop over LAN)
ElevenLabs fallback when Supertonic not installed
macOS compatibility preserved (afplay + osascript in discovery chain)

…with Linux support

- Add Supertonic as local CPU-based TTS provider (zero cost, no API key needed)
- Add Linux audio playback: paplay (PulseAudio) → ffplay (FFmpeg) → afplay (macOS)
- Add Linux desktop notifications via notify-send
- Add VoiceProvider type for provider selection in settings.json
- Add per-voice Supertonic voice mapping (M1-M5, F1-F5)
- Add supertonic-tts.py wrapper script
- Preserve full backward compatibility with ElevenLabs-only setups
- Auto-fallback: if Supertonic not installed, falls back to ElevenLabs

Trei-D wants to merge 1 commit into danielmiessler:main from Trei-D:feat/dual-provider-voice-linux
