feat: add xai asr and tts extensions #2146

Open
BenWeekes wants to merge 5 commits into main from dev/xai

@BenWeekes
Contributor

Summary

  • add xai_asr_python and xai_tts_python TEN extensions
  • add xAI graph variants to the voice-assistant example
  • add xAI env/config wiring and usage notes for local and demo testing
  • add standalone test coverage for both new extensions

What Changed

  • added ai_agents/agents/ten_packages/extension/xai_asr_python
    • WebSocket STT client with explicit audio.done finalize flow
    • bounded reconnect handling with fatal escalation for non-recoverable auth/config failures
    • result mapping for interim, locked, and final transcript states
    • dump support, metrics, and vendor error classification
  • added ai_agents/agents/ten_packages/extension/xai_tts_python
    • WebSocket TTS client using PCM output
    • bounded reconnect / backoff with auth failures treated as fatal
    • cancellation and request-finalization behavior aligned with AsyncTTS2BaseExtension
    • config redaction, lifecycle logging, and invalid punctuation-only request handling
  • updated ai_agents/agents/examples/voice-assistant/tenapp/manifest.json
    • add xAI extension dependencies
  • updated ai_agents/agents/examples/voice-assistant/tenapp/property.json
    • add voice_assistant_xai_asr
    • add voice_assistant_xai_tts
    • add voice_assistant_xai_full
  • updated ai_agents/agents/examples/voice-assistant/README.md
    • document graph selection and xAI usage
  • updated ai_agents/.env.example
    • add XAI_API_KEY

Testing

  • standalone TTS tests: 23 passed
  • standalone ASR tests: 17 passed
  • validated the demo graph wiring on voice_assistant_xai_full

Notes

  • companion repo docs updates were pushed to docs/progressive-disclosure in TEN Framework
  • companion portal guide updates were pushed to improve/tts-extension-guide in TEN Portal

@github-actions

test

@github-actions

github-actions Bot commented Apr 24, 2026

placeholder being replaced

@github-actions

github-actions Bot commented Apr 24, 2026

edit test

@github-actions

github-actions Bot commented Apr 24, 2026

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.


Bugs / Correctness

1. XAIASRConfig.to_json() returns Python repr, not JSON

In xai_asr_python/config.py, to_json() ends with return str(config_dict). str() on a dict produces Python __repr__ output (single-quoted keys, True/False instead of true/false) which is not valid JSON. This should be return json.dumps(config_dict). If this string is ever parsed downstream it will fail.
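
A quick way to see the difference, as a minimal sketch. The dict contents below are made up for illustration and are not the extension's actual fields:

```python
import json

# Hypothetical config contents, for illustration only.
config_dict = {"api_key": "xai-***", "dump": True, "language": None}

as_repr = str(config_dict)         # Python repr: single quotes, True, None
as_json = json.dumps(config_dict)  # JSON: double quotes, true, null


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(as_repr, is_valid_json(as_repr))
    print(as_json, is_valid_json(as_json))
```

Only the json.dumps output round-trips through json.loads; the str() output is rejected by any JSON parser.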

2. _message_handler re-fires on_open() on duplicate transcript.created

In recognition.py, when a transcript.created event arrives inside the running _message_handler loop, callback.on_open() is called again. That re-triggers the buffer-flush and timeline reset in the extension. If the server re-sends this event mid-session it will corrupt in-flight audio state. Guard this branch so on_open() is only forwarded once per connection.
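
One possible shape for the guard, as a sketch only. OpenOnceGuard is a hypothetical helper, not a class from the PR, and the callback is kept synchronous for brevity (the real one may be a coroutine):

```python
from typing import Callable


class OpenOnceGuard:
    """Forwards on_open at most once per connection (hypothetical helper)."""

    def __init__(self, on_open: Callable[[], None]) -> None:
        self._on_open = on_open
        self._opened = False

    def reset(self) -> None:
        """Call when a new connection is established."""
        self._opened = False

    def handle_event(self, event_type: str) -> bool:
        """Return True if on_open was forwarded for this event."""
        if event_type != "transcript.created" or self._opened:
            return False
        self._opened = True
        self._on_open()
        return True
```

A repeated transcript.created on the same connection is then ignored, while reset() re-arms the guard after a reconnect.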

3. Redundant copy.deepcopy after json.loads in _flush_buffered_audio_frames

json.loads already returns a fresh object tree, so wrapping it in copy.deepcopy is unnecessary overhead on every buffered frame.

4. _ensure_connection does not verify the socket is still open (TTS)

self._ws could reference a closed connection if stop() or cancel() raised during the finally block. Consider also checking self._ws.state == State.OPEN, mirroring how XAIASRRecognition.is_connected() works.


API Key Validation Leaks Test Logic into Production

5. xai_tts_python/config.py validate() accepts "test" prefix

The validation accepts api_key values starting with "test" to bypass format checking in tests, but the error message says the key must start with "xai-". Remove the "test" branch and update tests to use a properly formatted dummy key like "xai-test-key" (which already passes the xai- check).


Missing Manifest Properties

6. xai_asr_python/manifest.json omits dump, dump_path, and finalize_timeout_ms

These three properties are present in property.json and documented in the README but absent from api.property.properties in the manifest. Other extensions in this repo declare all settable properties in the manifest so TMAN can discover and validate them. Add: dump (bool), dump_path (string), finalize_timeout_ms (int32).
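
As a rough sketch of the addition, expressed as a Python dict so it can be checked. The property names and types come from the comment above; the {"type": ...} layout and the existing params entry are assumptions about the manifest schema, so verify them against another extension's manifest.json:

```python
import json

# Names and types are from the review comment; the {"type": ...} layout
# is an assumption about the manifest schema.
missing_properties = {
    "dump": {"type": "bool"},
    "dump_path": {"type": "string"},
    "finalize_timeout_ms": {"type": "int32"},
}


def add_properties(manifest: dict, new_props: dict) -> dict:
    """Merge new_props into manifest["api"]["property"]["properties"]."""
    props = (
        manifest.setdefault("api", {})
        .setdefault("property", {})
        .setdefault("properties", {})
    )
    for name, schema in new_props.items():
        props.setdefault(name, schema)  # keep any existing declaration
    return manifest


if __name__ == "__main__":
    # Hypothetical starting manifest that only declares params.
    manifest = {"api": {"property": {"properties": {"params": {"type": "object"}}}}}
    print(json.dumps(add_properties(manifest, missing_properties), indent=2))
```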


Fatal-Error String Matching is Fragile

7. ASR _is_fatal_connection_error and on_error

Both use substring scans on free-form error strings, including the literal string "api key" which could match unrelated messages. The TTS extension handles this correctly by catching InvalidStatus and threading a numeric status_code through XAITTSConnectionException. The ASR path should do the same - surface the status code through the recognition callback rather than scanning the message text.
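
A sketch of what status-code-based classification could look like. ConnectionFailure and FATAL_STATUS_CODES are hypothetical names, and treating 401 and 403 as fatal is an assumption, not something taken from the PR:

```python
from typing import Optional


class ConnectionFailure(Exception):
    """Hypothetical exception carrying the HTTP status of a failed handshake."""

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code


# Assumed set of statuses that retrying cannot fix.
FATAL_STATUS_CODES = frozenset({401, 403})


def is_fatal(error: Exception) -> bool:
    """Classify by numeric status only; the message text is never inspected."""
    status = getattr(error, "status_code", None)
    return status in FATAL_STATUS_CODES
```

With this shape, a transient error whose message happens to contain "api key" is no longer misclassified as fatal.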


Minor / Style

8. Missing license header in xai_asr_python/__init__.py - the TTS counterpart has the Apache-2.0 header; the ASR one does not.

9. Missing newlines at end of files: xai_asr_python/requirements.txt and xai_asr_python/tests/bin/start both show no trailing newline (git diff reports no newline at end of file) in the diff.

10. validate() shadows Pydantic's built-in in both configs. Both XAIASRConfig and XAITTSConfig define a validate() method that shadows the Pydantic class-level validator name. Consider renaming to validate_config() or check_required_fields().

11. Duplicated conftest.py - both test directories contain byte-for-byte identical conftest.py. Consider extracting to a shared fixture module.

12. Stale test log string in xai_tts_python/tests/test_basic.py: text="Hello, this is a test of the Deepgram TTS extension." should reference xAI.


Summary

Severity                 Count
Bug / Correctness        4
Validation / Security    2
Missing config surface   1
Minor / Style            5

The top priorities before merge are items 1 (invalid JSON from to_json()), 5 (test bypass in production validate()), and 6 (missing manifest properties). The rest are good-to-fix but not blockers.

@github-actions

PR Review: xAI ASR and TTS Extensions - test post

@github-actions

PR Review: xAI ASR and TTS Extensions

Issues to fix:

  • (1) on_open called twice in recognition.py: both start() and _message_handler call the callback on transcript.created, causing duplicate metrics and buffer flushes if the server re-sends the event.
  • (2) buf potentially unbound in xai_asr_python send_audio: if lock_buf raises, buf is undefined in the except handler. Initialize buf = None before the try block.
  • (3) property_en_hotwords.json is identical to property_en.json, with no hotword implementation. Remove it or implement hotwords.

Issues to address:

  • (4) XAIASRConfig.validate() shadows the Pydantic class method. Rename it to validate_config().
  • (5) The TTS API key startswith('test') check is too broad.
  • (6) turn_status values 1 and 2 are undocumented magic numbers.
  • (7) The isinstance str check for text_normalization is dead code, since the field is typed bool.
  • (8) Missing trailing newlines in requirements.txt and tests/bin/start.
  • (9) test_basic.py has a copy-paste error: it says Deepgram, not xAI.

Positives:

  • great reconnect logic with backoff and fatal escalation
  • API key redaction
  • _contains_spoken_content guard
  • comprehensive test coverage (timeout, flush, punctuation, reconnect-after-error, auth failure)
  • clean agent.py update for the new event types

Generated by Claude Code

@github-actions

Code Review: xAI ASR + TTS Extensions

Great overall structure. The two extensions follow existing repo conventions closely and come with thorough test coverage (17 ASR + 23 TTS tests). The reconnection logic, error classification, and sensitive-data redaction are all solid. A few issues worth addressing before merge.

Bugs

1. Potential UnboundLocalError in xai_asr_python/extension.py send_audio

If frame.lock_buf() itself raises (e.g. the frame is already invalid), buf is never bound, and the except block's frame.unlock_buf(buf) will throw a second exception masking the original. Prefer the standard try/finally pattern:

buf = frame.lock_buf()
try:
    audio_data = bytes(buf)
    ...
    return True
except Exception as e:
    self.ten_env.log_error(f"Error sending audio: {e}")
    return False
finally:
    frame.unlock_buf(buf)

Code Quality

2. _connection_successful flag in ReconnectManager is dead code

self._connection_successful is assigned in __init__ and handle_reconnect but never read. Remove it.

3. try/except TypeError for dynamic dispatch is fragile (reconnect_manager.py)

handle_reconnect catches TypeError to detect whether the error handler accepts 1 or 2 arguments. This silently swallows unrelated TypeErrors raised inside the handler. Since the only call-site is XAIASRExtension._handle_reconnect, use a consistent 2-arg signature throughout.

4. Empty on_deinit override in TTS extension (extension.py)

The override only calls super() with no additional logic. Delete it.

5. BUILD.gn in xai_tts_python/manifest.json package includes

BUILD.gn is auto-generated and in .gitignore. AGENTS.md says not to include auto-generated files. xai_asr_python correctly omits it. Remove from xai_tts_python's manifest.

Completeness / Consistency

6. dump, dump_path, and finalize_timeout_ms missing from ASR manifest property schema

xai_asr_python/manifest.json only declares params under api.property.properties. The fields dump, dump_path, and finalize_timeout_ms are used by XAIASRConfig but are absent from the schema, so they are invisible to TMAN Designer and TEN's property system. Compare with xai_tts_python/manifest.json which correctly declares dump and dump_path.

7. property_en_hotwords.json is byte-for-byte identical to property_en.json

Both files in xai_asr_python/tests/configs/ have the same content. The name implies hotwords support which xAI STT does not expose here. Either add a meaningful difference or remove the file.

Minor Nits

8. Misleading validation error message for TTS API key (config.py)

validate() allows keys starting with xai- OR test, but the error message only mentions xai-. Update the message to reflect both accepted prefixes.

9. Copy-paste error in TTS test (test_basic.py)

The basic test sends "Hello, this is a test of the Deepgram TTS extension." — should reference xAI, not Deepgram.

Summary

Severity       Item
Bug            lock_buf exception path can produce UnboundLocalError in send_audio
Code quality   Dead _connection_successful flag in ReconnectManager
Code quality   try/except TypeError for dispatch in handle_reconnect
Code quality   Pointless on_deinit override in TTS extension
Code quality   BUILD.gn in TTS manifest package includes
Completeness   dump/dump_path/finalize_timeout_ms missing from ASR manifest schema
Completeness   property_en_hotwords.json identical to property_en.json
Nit            Misleading API key error message
Nit            "Deepgram" copy-paste in xAI TTS test

def _calculate_request_event_interval_ms(self) -> int:
    if self.sent_ts is None:
        return 0
    return int((datetime.now() - self.sent_ts).total_seconds() * 1000)

The interval should be calculated as "current time - first audio receive time", not from the time the first request was sent.

code=ModuleErrorCode.NON_FATAL_ERROR,
vendor_info=ModuleErrorVendorInfo(vendor=self.vendor()),
)
await self._finalize_request(TTSAudioEndReason.ERROR, error=error)

@wangyimin-agora wangyimin-agora Apr 30, 2026


_finalize_request is only for handling the last request of the turn:
if t.text_input_end:

if start_ms <= 0:
    start_ms = int(datetime.now().timestamp() * 1000)

transcript_result = TTSTextResult(

TTSTextResult is not necessary here. We only use it when the vendor can provide word-level timestamps; if this TTS vendor cannot provide them, we can remove it.

message=e.body,
),
)
await self._finalize_request(TTSAudioEndReason.ERROR, error=error)

_finalize_request is only for handling the last request of the turn:
if t.text_input_end:

@github-actions

Code Review: PR #2146 — Add xAI ASR and TTS Extensions

Overview

This PR adds two well-structured TEN extensions — xai_asr_python (WebSocket STT) and xai_tts_python (WebSocket TTS) — along with three new voice-assistant graph variants and comprehensive test coverage. Overall this is high-quality work following the project's existing patterns. A few issues worth addressing before merge.


Issues

Bug: TypeError swallowed in ReconnectManager.handle_reconnect

reconnect_manager.py:

try:
    await error_handler(error, vendor_info)
except TypeError:
    await error_handler(error)

Using exception handling to detect function signatures is fragile — if error_handler raises a genuine TypeError internally, the manager silently retries without vendor_info. The callers (XAIASRExtension.send_asr_error) have a stable signature, so this safety valve isn't needed. Call await error_handler(error, vendor_info) directly (or await error_handler(error) when vendor_name is None).
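
To illustrate the failure mode, here is a small self-contained sketch. The handler and both dispatch functions are hypothetical stand-ins, not code from the PR:

```python
import asyncio

calls = []


async def handler(error, vendor_info=None):
    """Stand-in error handler with a genuine bug on the vendor_info path."""
    calls.append(error)
    if vendor_info is not None:
        # Bug unrelated to arity: str + dict raises TypeError.
        return "vendor: " + vendor_info


async def dispatch_with_fallback(error, vendor_info):
    # The pattern under review: any TypeError is read as "wrong arity".
    try:
        await handler(error, vendor_info)
    except TypeError:
        await handler(error)


async def dispatch_direct(error, vendor_info):
    # Single fixed signature: the real TypeError surfaces to the caller.
    await handler(error, vendor_info)


if __name__ == "__main__":
    asyncio.run(dispatch_with_fallback("boom", {"vendor": "xai"}))
    print(calls)  # the handler ran twice and the bug stayed hidden
```

With the fallback, the handler runs twice and the internal TypeError never reaches the caller. The direct call raises it on the first attempt.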

Bug: TIMEOUT_CODE = 10105 in const.py is defined but never used

xai_asr_python/const.py defines TIMEOUT_CODE = 10105 but nothing in the ASR extension uses it. Should be wired up or removed.

Design smell: _ensure_client accesses private internals

xai_tts_python/extension.py:

ws = getattr(self.client, "_ws", None)
if ws is not None and getattr(ws, "state", None) == State.OPEN:
    return

This reaches into XAITTSClient's private state. XAITTSClient should expose an is_connected() -> bool method (as XAIASRRecognition already does) and _ensure_client should call that instead.
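
A minimal sketch of that accessor. The State enum below is a local stand-in for the WebSocket library's connection-state enum, and TTSClientSketch is a hypothetical class, so the block runs without third-party imports:

```python
from enum import Enum
from typing import Any, Optional


class State(Enum):
    """Stand-in for the WebSocket library's connection-state enum."""
    OPEN = 1
    CLOSED = 3


class TTSClientSketch:
    """Hypothetical client that keeps its socket private."""

    def __init__(self) -> None:
        self._ws: Optional[Any] = None

    def is_connected(self) -> bool:
        """True only when a socket exists and reports the OPEN state."""
        ws = self._ws
        return ws is not None and getattr(ws, "state", None) == State.OPEN


def needs_new_connection(client: TTSClientSketch) -> bool:
    """What _ensure_client could ask instead of reading client._ws."""
    return not client.is_connected()
```

The extension then depends only on the public method, so the client is free to change how it stores or tracks its socket.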

Fragile API key prefix validation in TTS config

xai_tts_python/config.py:

if not self.api_key.startswith("xai-"):
    raise ValueError("API key must start with 'xai-'")

This will break if xAI changes their key format, and there is no equivalent check in the ASR extension. Either apply it consistently or drop it — the server will reject invalid keys with a 401 anyway.

Test config: property_en_hotwords.json is a copy of property_en.json

xai_asr_python/tests/configs/property_en_hotwords.json and property_en.json are byte-for-byte identical. If hotwords aren't implemented yet, remove the duplicate rather than leaving a misleading placeholder.


Minor Issues

  • Docstring/code mismatch: ReconnectManager docstring says max_attempts=4, but extension.py instantiates it with max_attempts=10. Update the docstring.
  • Unnecessary override: xai_tts_python/extension.py's on_deinit does nothing except call super() — remove it.
  • Reconnect cost on cancel: XAITTSClient.cancel() closes the WebSocket and sets _needs_reconnect = True, causing a full TLS + WS handshake on the next utterance after every barge-in. Consider draining the socket instead of closing it, or pre-warming the connection after cancel. Not a blocker, but a known latency trade-off worth documenting.
  • State reset scattered across request_tts: The ~6 fields reset in multiple code paths (_request_text, _request_metadata, total_audio_bytes, etc.) would benefit from a private _reset_request_state() helper to avoid future divergence.
  • Duplicate api_key check: start_connection in xai_asr_python/extension.py re-checks for an empty API key even though config.validate() already raised on it during on_init. The guard is harmless but adds noise.
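
For the state-reset point, a sketch of the suggested helper. Only _request_text, _request_metadata, and total_audio_bytes are named in the review; the class name, field types, and defaults are assumptions:

```python
from typing import Any, Dict, Optional


class RequestStateSketch:
    """Hypothetical holder for the per-request fields of the TTS extension."""

    def __init__(self) -> None:
        self._reset_request_state()

    def _reset_request_state(self) -> None:
        """Single place that restores per-request fields to their defaults."""
        self._request_text: str = ""
        self._request_metadata: Optional[Dict[str, Any]] = None
        self.total_audio_bytes: int = 0
```

Every code path that ends a request (completion, cancel, error) would call the helper, so a field added later only needs to be reset in one place.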

What's Good

  • Reconnection logic is well-designed: bounded retries with exponential backoff, auth failures correctly escalated as fatal, successful connections reset the counter.
  • Both extensions handle the audio.done/finalize flow correctly, including timeout handling and the _close_expected guard against spurious reconnects.
  • Config redaction with encrypt() before logging is consistently applied in both extensions.
  • Excellent test coverage — 40 tests covering: 401 errors, timeout, reconnect-after-error, sequential requests, flush, dump verification, TTFB metrics, config redaction, and language normalization.
  • The agent.py fix silently dropping tts_audio_start/tts_audio_end is a clean way to handle the new data events without polluting the log with warnings.
  • Property graphs are correct and follow the established pattern.
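
For readers unfamiliar with the pattern, the reconnect behavior praised above can be sketched roughly as follows. BackoffSketch is a hypothetical class; max_attempts=10 matches the value quoted in this review, while the delay values are assumptions:

```python
class BackoffSketch:
    """Bounded exponential backoff with a reset on success (illustrative)."""

    def __init__(
        self,
        max_attempts: int = 10,
        base_delay: float = 0.3,  # assumed value
        max_delay: float = 5.0,   # assumed value
    ) -> None:
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.attempts = 0

    def can_retry(self) -> bool:
        return self.attempts < self.max_attempts

    def next_delay(self) -> float:
        """Return the delay before the next attempt and count that attempt."""
        if not self.can_retry():
            raise RuntimeError("reconnect attempts exhausted")
        delay = min(self.base_delay * (2 ** self.attempts), self.max_delay)
        self.attempts += 1
        return delay

    def mark_success(self) -> None:
        """A successful connection resets the attempt counter."""
        self.attempts = 0
```

A fatal error such as an auth failure would bypass this loop entirely and be reported immediately rather than retried.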

Summary

The implementation is solid and well-tested. The TypeError-swallowing pattern in ReconnectManager and the private-field access in _ensure_client are the most important items to address. The rest are cleanup/polish.


2 participants