Feature Contribution: Add support for ElevenLabs TTS voices.#1073

Open
evetzyokozuna wants to merge 26 commits into agent0ai:main from evetzyokozuna:main

Conversation

@evetzyokozuna

Summary

This PR adds and stabilizes an ElevenLabs-based voice output path in Agent Zero, alongside existing browser/Kokoro speech behavior.
It updates backend API handling, frontend speech routing, and settings UX so ElevenLabs can be enabled as an optional provider without regressing default behavior.

Files covered

  1. python/api/el11_tts.py
  2. webui/components/chat/speech/speech-store.js
  3. webui/components/settings/agent/speech.html
  4. requirements.txt

Problem Statement

Voice output through Kokoro works well enough, but users who want a more human-like voice for their Agent Zero deployment have no built-in way to use custom ElevenLabs voices.

This PR implements an additional capability to use ElevenLabs voices.


What this PR changes

1) python/api/el11_tts.py — ElevenLabs proxy API endpoint

Purpose

Provide a server-side TTS proxy endpoint (/el11_tts) that:

  • accepts text input from the UI
  • resolves active voice profile configuration
  • calls ElevenLabs with server-side credentials
  • returns playable audio/mpeg data to the client

Behavior

  • Expects payload like:
    • text (required)
    • profile (optional, defaults to active profile)
  • Loads per-agent voice config from:
    • agents/<profile>/elevenlabs_voice.json
  • Uses environment key:
    • EL11_API_KEY
  • Returns:
    • audio stream bytes (MPEG) on success
    • structured JSON error payload on failure

Why this matters

  • Keeps API key off the browser
  • Enables profile-specific voice identity
  • Creates a clean TTS backend interface that can be reused for telephony paths later
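
For orientation, the sketch below shows what a proxy handler with this behavior could look like. It assumes a Flask-style route (one commit message mentions a Flask hybrid), the agents/<profile>/elevenlabs_voice.json layout described above, and the public ElevenLabs text-to-speech REST endpoint; the helper names, default values, and error shapes are illustrative, not the PR's actual code.

```python
# Minimal sketch of a server-side ElevenLabs TTS proxy (illustrative, not the PR's exact code).
import json
import os

import requests
from flask import Flask, Response, jsonify, request

app = Flask(__name__)  # hypothetical app object; the real project wires routes its own way


def load_voice_config(profile: str) -> dict:
    """Read agents/<profile>/elevenlabs_voice.json (path layout taken from this PR description)."""
    path = os.path.join("agents", profile, "elevenlabs_voice.json")
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


@app.post("/el11_tts")
def el11_tts():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text")
    profile = payload.get("profile", "default")  # placeholder for "defaults to active profile"
    if not text:
        return jsonify({"error": "missing 'text'"}), 400

    api_key = os.environ.get("EL11_API_KEY")
    if not api_key:
        return jsonify({"error": "EL11_API_KEY not configured"}), 500

    cfg = load_voice_config(profile)
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{cfg['voice_id']}",
        headers={"xi-api-key": api_key, "Accept": "audio/mpeg"},
        json={
            "text": text,
            "model_id": cfg.get("model", "eleven_multilingual_v2"),
            "voice_settings": {
                "stability": cfg.get("stability", 0.5),
                "similarity_boost": cfg.get("similarity_boost", 0.75),
                "style": cfg.get("style", 0.0),
            },
        },
        timeout=60,
    )
    if resp.status_code != 200:
        return jsonify({"error": "elevenlabs request failed", "detail": resp.text}), 502
    return Response(resp.content, mimetype="audio/mpeg")
```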

2) webui/components/chat/speech/speech-store.js — speech provider routing + playback

Purpose

Add real speech routing support for ElevenLabs in the existing TTS flow.

Behavior added

  • provider gating checks for ElevenLabs mode (via local settings/toggle)
  • new ElevenLabs speech path that calls /el11_tts
  • robust audio playback for returned audio blobs
  • fallback behavior retained:
    • if ElevenLabs fails, existing Kokoro/browser behavior still works
  • existing stream/chunk speech flow remains intact

Why this matters

  • The UI can now actually use ElevenLabs audio, not just display a toggle
  • Preserves backward compatibility for users not enabling ElevenLabs

3) webui/components/settings/agent/speech.html — settings UX

Purpose

Expose a clear user-facing toggle for ElevenLabs proxy TTS in the Speech settings panel.

Behavior added

  • an explicit “Enable ElevenLabs TTS Proxy” control
  • UX text clarifying this uses the server proxy route and requires configured key/config

Why this matters

  • Makes the behavior discoverable and controllable from the UI
  • Aligns user intent with actual provider routing in speech-store

4) requirements.txt — dependency/runtime parity

Purpose

Align dependency set with runtime expectations for the ElevenLabs integration path and live environment stability.

Why this matters

  • Reduces “works in one environment but not another” drift
  • Supports reproducible deployments and clean runtime behavior

Configuration and Usage

Required env

  • EL11_API_KEY=<your_elevenlabs_key>

Required voice config

Place elevenlabs_voice.json in relevant agent directories, e.g.:

  • agents/agent0/elevenlabs_voice.json
  • agents/default/elevenlabs_voice.json
  • etc.

Example fields:

  • voice_id
  • model
  • stability
  • similarity_boost
  • style
  • optional quality-related settings as supported by endpoint
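
To make these fields concrete, a hypothetical agents/agent0/elevenlabs_voice.json could look like the file below; the voice_id is a placeholder and the numeric values are illustrative, not values shipped by this PR.

```json
{
  "voice_id": "<your_elevenlabs_voice_id>",
  "model": "eleven_multilingual_v2",
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": 0.0
}
```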

Enable in UI

  1. Open Settings -> Agent -> Speech
  2. Enable ElevenLabs TTS Proxy
  3. Trigger any voice output path in chat

Backward Compatibility

  • Default speech behavior remains unchanged unless ElevenLabs mode is enabled.
  • Kokoro/browser fallback paths remain available.
  • Existing speech chunking and stream sequencing logic is preserved.

Security Considerations

  • ElevenLabs API key remains server-side (not exposed to browser code).
  • Frontend calls local authenticated endpoint (/el11_tts) rather than external API directly.
  • Profile-based config loading is constrained to expected agent config files.
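
One way to implement the last point is to resolve the profile name against a fixed agents root and reject anything that escapes it. The sketch below is a hedged illustration of that idea; the function name and directory constant are invented, not the PR's actual code.

```python
import os

AGENTS_ROOT = os.path.abspath("agents")  # assumed repo-relative root for agent profiles


def resolve_voice_config_path(profile: str) -> str:
    """Map a profile name to its elevenlabs_voice.json, rejecting path traversal."""
    candidate = os.path.abspath(os.path.join(AGENTS_ROOT, profile, "elevenlabs_voice.json"))
    if not candidate.startswith(AGENTS_ROOT + os.sep):
        raise ValueError("profile resolves outside the agents directory")
    return candidate
```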

Validation / Test Notes

Manual checks performed

  • endpoint registration and availability for /el11_tts
  • valid audio response path (content-type: audio/mpeg)
  • frontend served assets include ElevenLabs routing logic
  • settings toggle rendered and persisted in UI
  • fallback behavior sanity checked
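
A quick way to reproduce the audio-response check is a local smoke test like the one below; the base URL, profile name, and lack of explicit auth handling are assumptions to adapt to the local deployment (the endpoint is described as authenticated, so session credentials may be required).

```python
import requests

BASE_URL = "http://localhost:50001"  # placeholder; use your Agent Zero web UI address

resp = requests.post(
    f"{BASE_URL}/el11_tts",
    json={"text": "Hello from the ElevenLabs proxy.", "profile": "agent0"},
    timeout=60,
)
assert resp.status_code == 200, resp.text
assert resp.headers.get("Content-Type", "").startswith("audio/mpeg")
with open("el11_test.mp3", "wb") as f:
    f.write(resp.content)
print("wrote", len(resp.content), "bytes of audio")
```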

Suggested reviewer checks

  • verify speech quality changes when ElevenLabs toggle is enabled
  • verify fallback when ElevenLabs key/config is missing
  • verify no regressions in browser/Kokoro modes
  • verify multi-agent profile voice switching behavior

Known Limitations / Follow-ups

  • provider mode is currently toggle-based; a future refinement could consolidate it into a single tts_mode setting for clarity.
  • telemetry around provider selection/fallback reason could be added for troubleshooting.
  • future telephony integration may reuse /el11_tts shape or move to provider abstraction layer.

Why this PR is valuable

This change turns ElevenLabs support from “partial wiring + config files” into a working, testable, user-selectable voice path in Agent Zero.
It is designed to preserve current behavior while enabling higher-quality voice output now and cleaner voice-provider extensibility going forward.
