feat(voice): dual-provider TTS (Supertonic local + ElevenLabs cloud) with Linux support #1301
Open
Trei-D wants to merge 1 commit into
…with Linux support
- Add Supertonic as local CPU-based TTS provider (zero cost, no API key needed)
- Add Linux audio playback: paplay (PulseAudio) → ffplay (FFmpeg) → afplay (macOS)
- Add Linux desktop notifications via notify-send
- Add VoiceProvider type for provider selection in settings.json
- Add per-voice Supertonic voice mapping (M1-M5, F1-F5)
- Add supertonic-tts.py wrapper script
- Preserve full backward compatibility with ElevenLabs-only setups
- Auto-fallback: if Supertonic is not installed, falls back to ElevenLabs
Problem
The v5.0.0 voice module is macOS-only (uses afplay + osascript) and ElevenLabs-only (requires API key + quota). This means:
- afplay doesn't exist on Linux
Solution
Dual-provider TTS architecture with cross-platform audio playback.
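The provider selection with auto-fallback described in this PR can be sketched roughly as follows. This is a minimal illustration in Python, not the code in VoiceServer/voice.ts; the function name `resolve_provider` and the `supertonic_installed` flag are hypothetical, and how the real module detects a Supertonic install is not shown here.

```python
from typing import Literal

# Assumed to mirror the provider values used in settings.json.
VoiceProvider = Literal["supertonic", "elevenlabs"]


def resolve_provider(configured: VoiceProvider, supertonic_installed: bool) -> VoiceProvider:
    """Pick the TTS provider, falling back to ElevenLabs if Supertonic is missing.

    Hypothetical helper: an illustration of the fallback rule only.
    """
    if configured == "supertonic" and not supertonic_installed:
        # Auto-fallback: Supertonic was requested but is not installed.
        return "elevenlabs"
    # Either ElevenLabs was requested explicitly, or Supertonic is available.
    return configured
```

Under these assumptions, `resolve_provider("supertonic", False)` yields `"elevenlabs"`, while an explicit `"elevenlabs"` setting is returned unchanged regardless of whether Supertonic is installed.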
New: Supertonic as local-first provider
Zero cost, zero internet, zero API key. Supertonic runs TTS inference on CPU using ONNX models that auto-download on first use.
Installation
Requirements:
~/.cache/supertonic3/, downloaded on first run)
Available voices
Configure in settings.json:
    {
      "daidentity": {
        "voices": {
          "provider": "supertonic",
          "main": { "supertonicVoice": "M1" }
        }
      }
    }
Performance (CPU-only, no GPU required)
Benchmarked on a 2-core Intel Skylake VM (worst case — most desktops will be faster):
For comparison, ElevenLabs cloud TTS takes ~1–2s network round-trip but costs $0.30/1K characters.
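To make the cost comparison concrete, here is a small worked calculation using only the $0.30 per 1K characters figure quoted above. The helper name `elevenlabs_cost_usd` is hypothetical, and real billing may depend on plan and quota details not covered in this PR.

```python
# Price quoted in this PR description; actual pricing may vary by plan.
ELEVENLABS_USD_PER_1K_CHARS = 0.30


def elevenlabs_cost_usd(char_count: int) -> float:
    """Estimate the cloud TTS cost in USD for a given number of characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count / 1000 * ELEVENLABS_USD_PER_1K_CHARS


# Example: 100 notifications of 200 characters each is 20,000 characters.
monthly_chars = 100 * 200
estimated_cost = elevenlabs_cost_usd(monthly_chars)
```

With these numbers the estimate is 20 × $0.30 = $6.00, whereas the local Supertonic path has no per-character charge.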
New: Cross-platform audio playback
Audio player discovery chain (first available wins):
- paplay (package: pulseaudio-utils or pipewire-pulse)
- ffplay (package: ffmpeg)
- afplay
Linux system dependencies:
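To illustrate the player discovery chain listed earlier in this section (paplay, then ffplay, then afplay, first available wins), here is a minimal Python sketch. The real logic lives in VoiceServer/voice.ts and may differ. `find_audio_player` and the injectable `which` parameter are hypothetical names, added so the lookup can be exercised without the binaries installed.

```python
import shutil
from typing import Callable, Optional, Sequence

# Order taken from this PR: PulseAudio first, then FFmpeg, then macOS.
PLAYER_CHAIN: Sequence[str] = ("paplay", "ffplay", "afplay")


def find_audio_player(
    which: Callable[[str], Optional[str]] = shutil.which,
    chain: Sequence[str] = PLAYER_CHAIN,
) -> Optional[str]:
    """Return the first player in the chain found on PATH, or None if none exist."""
    for name in chain:
        # shutil.which returns the executable's path, or None when not found.
        if which(name) is not None:
            return name
    return None
```

A caller would treat a `None` result as the "No audio player found" case and report it rather than attempt playback.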
New: Linux desktop notifications
- notify-send on Linux (libnotify): visual popup alongside audio
- osascript on macOS (existing behavior preserved)
Homeserver → Desktop audio routing
For users running PAI on a headless server (VM, NAS, homelab), voice audio can play on a remote desktop machine via PulseAudio/PipeWire network streaming:
On the desktop (audio sink):
On the server (PAI host):
Audio from paplay/ffplay on the server routes to the desktop's speakers over the LAN. Works with both WAV (Supertonic) and MP3 (ElevenLabs).
Troubleshooting
- "No audio player found": install pulseaudio-utils (Linux) or ffmpeg
- "Supertonic TTS failed": check that .venv/bin/python exists; re-run pip install supertonic
- "Voice: Supertonic not installed — falling back to elevenlabs": set provider: "elevenlabs" to suppress
- PULSE_SERVER=tcp:<desktop-ip>:4713 in .env
- "Connection refused" on PulseAudio TCP: run pactl load-module module-native-protocol-tcp on the desktop
Backward compatibility
- "provider": "elevenlabs" in settings.json: everything works exactly as before
- afplay + osascript still in the discovery chain: zero behavior change
- Endpoints (/notify, /notify/personality, /voice, /voice/health) unchanged
Files changed
- VoiceServer/voice.ts
- VoiceServer/supertonic-tts.py
Testing
Verified on: