feat: add xai asr and tts extensions by BenWeekes · Pull Request #2146 · TEN-framework/ten-framework

BenWeekes · 2026-04-24T08:51:35Z

Summary

- add xai_asr_python and xai_tts_python TEN extensions
- add xAI graph variants to the voice-assistant example
- add xAI env/config wiring and usage notes for local and demo testing
- add standalone test coverage for both new extensions

What Changed

- added ai_agents/agents/ten_packages/extension/xai_asr_python
  - WebSocket STT client with explicit audio.done finalize flow
  - bounded reconnect handling with fatal escalation for non-recoverable auth/config failures
  - result mapping for interim, locked, and final transcript states
  - dump support, metrics, and vendor error classification
- added ai_agents/agents/ten_packages/extension/xai_tts_python
  - WebSocket TTS client using PCM output
  - bounded reconnect / backoff with auth failures treated as fatal
  - cancellation and request-finalization behavior aligned with AsyncTTS2BaseExtension
  - config redaction, lifecycle logging, and invalid punctuation-only request handling
- updated ai_agents/agents/examples/voice-assistant/tenapp/manifest.json
  - add xAI extension dependencies
- updated ai_agents/agents/examples/voice-assistant/tenapp/property.json
  - add voice_assistant_xai_asr
  - add voice_assistant_xai_tts
  - add voice_assistant_xai_full
- updated ai_agents/agents/examples/voice-assistant/README.md
  - document graph selection and xAI usage
- updated ai_agents/.env.example
  - add XAI_API_KEY

Testing

- standalone TTS tests: 23 passed
- standalone ASR tests: 17 passed
- validated the demo graph wiring on voice_assistant_xai_full

Notes

- companion repo docs updates were pushed to docs/progressive-disclosure in TEN Framework
- companion portal guide updates were pushed to improve/tts-extension-guide in TEN Portal

github-actions · 2026-04-24T09:05:50Z

Code Review: xAI ASR and TTS Extensions

Good overall implementation - both extensions follow the TEN Framework patterns well, have bounded reconnect logic, proper fatal/non-fatal error classification, config redaction, and solid test coverage (17 ASR + 23 TTS tests). The graph wiring and README updates are clean.

Below are the issues worth addressing before merge, ordered by severity.


Bugs / Correctness

1. XAIASRConfig.to_json() returns Python repr, not JSON

In xai_asr_python/config.py, to_json() ends with return str(config_dict). str() on a dict produces Python __repr__ output (single-quoted keys, True/False instead of true/false) which is not valid JSON. This should be return json.dumps(config_dict). If this string is ever parsed downstream it will fail.
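A minimal sketch of the fix, using a dataclass as a hypothetical stand-in for XAIASRConfig (the real class is a Pydantic model and may redact secrets before serializing, which is omitted here). The only change that matters is swapping str() for json.dumps():

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class XAIASRConfigSketch:
    # Hypothetical stand-in for XAIASRConfig; field names are illustrative only.
    api_key: str = ""
    dump: bool = False
    params: dict = field(default_factory=dict)

    def to_json(self) -> str:
        config_dict = asdict(self)
        # str(config_dict) would yield "{'api_key': '', 'dump': False, ...}",
        # which no JSON parser accepts. json.dumps emits double-quoted keys and
        # lowercase true/false/null.
        return json.dumps(config_dict)
```

For example, XAIASRConfigSketch(dump=True).to_json() returns '{"api_key": "", "dump": true, "params": {}}', which round-trips through json.loads.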

2. _message_handler re-fires on_open() on duplicate transcript.created

In recognition.py, when a transcript.created event arrives inside the running _message_handler loop, callback.on_open() is called again. That re-triggers the buffer-flush and timeline reset in the extension. If the server re-sends this event mid-session it will corrupt in-flight audio state. Guard this branch so on_open() is only forwarded once per connection.
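One way to implement the guard is a per-connection flag that is reset on connect and consulted before forwarding on_open(). The sketch below is synchronous for brevity and assumes events arrive as dicts with a "type" key; the class and method names are hypothetical, not the extension's actual API:

```python
class RecognitionSketch:
    """Hypothetical stand-in for XAIASRRecognition's event dispatch."""

    def __init__(self, callback):
        self._callback = callback
        self._open_notified = False

    def on_connect(self) -> None:
        # Reset on every new socket so on_open() fires once per connection.
        self._open_notified = False

    def handle_event(self, event: dict) -> None:
        if event.get("type") == "transcript.created":
            if self._open_notified:
                # Duplicate session-start event: ignore it instead of
                # re-flushing buffered audio and resetting the timeline.
                return
            self._open_notified = True
            self._callback.on_open()
        # Other event types would be dispatched here.
```

A reconnect calls on_connect() first, so the next transcript.created is forwarded again, which preserves the intended buffer flush after a genuine reconnect.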

3. Redundant copy.deepcopy after json.loads in _flush_buffered_audio_frames

json.loads already returns a fresh object tree, so wrapping it in copy.deepcopy is unnecessary overhead on every buffered frame.
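A quick check of why the deepcopy is redundant: each json.loads call builds an independent object tree, so mutating one parse result cannot leak into another. The frame shape below is hypothetical:

```python
import json

# A buffered frame kept in serialized form (hypothetical shape).
serialized = '{"stream_id": 0, "metadata": {"session_id": "abc"}}'

first = json.loads(serialized)
second = json.loads(serialized)

# Mutating one parse result leaves the other untouched, so wrapping
# json.loads(...) in copy.deepcopy(...) only adds per-frame work.
first["metadata"]["session_id"] = "changed"
print(second["metadata"]["session_id"])  # abc
```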

4. _ensure_connection does not verify the socket is still open (TTS)

self._ws could reference a closed connection if stop() or cancel() raised during the finally block. Consider also checking self._ws.state == State.OPEN, mirroring how XAIASRRecognition.is_connected() works.
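A sketch of the suggested check, assuming the client keeps its socket in self._ws as the review describes. The PR compares against the websockets State.OPEN value; a local enum with the same member names stands in here so the example runs without a real connection, and the client class is hypothetical:

```python
import asyncio
import enum


class State(enum.IntEnum):
    # Local stand-in for the connection states a websockets client reports.
    CONNECTING = 0
    OPEN = 1
    CLOSING = 2
    CLOSED = 3


class FakeSocket:
    def __init__(self, state: State):
        self.state = state


class TTSClientSketch:
    """Hypothetical client showing the extra liveness check."""

    def __init__(self):
        self._ws = None
        self.connect_calls = 0

    def is_connected(self) -> bool:
        return self._ws is not None and self._ws.state == State.OPEN

    async def _connect(self) -> None:
        # Placeholder for the real WebSocket handshake.
        self.connect_calls += 1
        self._ws = FakeSocket(State.OPEN)

    async def _ensure_connection(self) -> None:
        # A non-None socket that is CLOSING or CLOSED must also reconnect.
        if not self.is_connected():
            await self._connect()
```

Exposing is_connected() as a public method also lets the extension stop reading the client's private _ws attribute.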

API Key Validation Leaks Test Logic into Production

5. xai_tts_python/config.py validate() accepts "test" prefix

The validation accepts api_key values starting with "test" to bypass format checking in tests, but the error message says the key must start with "xai-". Remove the "test" branch and update tests to use a properly formatted dummy key like "xai-test-key" (which already passes the xai- check).
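A minimal sketch of the suggested validation, with a dataclass as a hypothetical stand-in for XAITTSConfig. It drops the "test" branch and uses the validate_config() name proposed in item 10:

```python
from dataclasses import dataclass


@dataclass
class XAITTSConfigSketch:
    # Hypothetical stand-in for XAITTSConfig; only the API key matters here.
    api_key: str = ""

    def validate_config(self) -> None:
        if not self.api_key:
            raise ValueError("api_key is required")
        if not self.api_key.startswith("xai-"):
            # No "test" escape hatch: tests pass a well-formed dummy key.
            raise ValueError("API key must start with 'xai-'")
```

Tests then construct the config with api_key="xai-test-key", which passes the same check production keys must pass.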

Missing Manifest Properties

6. xai_asr_python/manifest.json omits dump, dump_path, and finalize_timeout_ms

These three properties are present in property.json and documented in the README but absent from api.property.properties in the manifest. Other extensions in this repo declare all settable properties in the manifest so TMAN can discover and validate them. Add: dump (bool), dump_path (string), finalize_timeout_ms (int32).
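To stop the manifest and the config from drifting apart again, a small test could compare them. This sketch assumes the manifest nests declarations under api.property.properties, as the review describes, with each entry carrying a "type". The helper name is hypothetical:

```python
# Properties the review says XAIASRConfig reads, with their expected TEN types.
REQUIRED_PROPERTIES = {
    "dump": "bool",
    "dump_path": "string",
    "finalize_timeout_ms": "int32",
}


def missing_manifest_properties(manifest: dict, required: dict) -> list:
    """Return required names that are undeclared or declared with the wrong type."""
    declared = manifest.get("api", {}).get("property", {}).get("properties", {})
    problems = []
    for name, expected_type in required.items():
        entry = declared.get(name)
        if entry is None or entry.get("type") != expected_type:
            problems.append(name)
    return sorted(problems)
```

Loading the manifest with json.load and asserting the helper returns an empty list would flag the current omission: a manifest that declares only params and dump reports ["dump_path", "finalize_timeout_ms"].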

Fatal-Error String Matching is Fragile

7. ASR _is_fatal_connection_error and on_error

Both use substring scans on free-form error strings, including the literal string "api key" which could match unrelated messages. The TTS extension handles this correctly by catching InvalidStatus and threading a numeric status_code through XAITTSConnectionException. The ASR path should do the same - surface the status code through the recognition callback rather than scanning the message text.
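A sketch of status-code-based classification for the ASR path. The exception class mirrors the role the review attributes to XAITTSConnectionException but is hypothetical, and treating only 401 and 403 as fatal is an assumption to be checked against xAI's documented error codes:

```python
from typing import Optional

# Assumption: statuses treated as non-recoverable (auth failures).
FATAL_HTTP_STATUSES = frozenset({401, 403})


class XAIASRConnectionError(Exception):
    """Hypothetical ASR counterpart of XAITTSConnectionException."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def is_fatal_connection_error(error: Exception) -> bool:
    # Decide on the structured status code, never on free-form message text.
    status = getattr(error, "status_code", None)
    return status in FATAL_HTTP_STATUSES
```

With this approach, a transient error whose message happens to contain "api key" is no longer escalated, and a 401 is fatal regardless of wording.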

Minor / Style

8. Missing license header in xai_asr_python/__init__.py - the TTS counterpart has the Apache-2.0 header; the ASR one does not.

9. Missing trailing newlines: xai_asr_python/requirements.txt and xai_asr_python/tests/bin/start both end without a newline (git diff reports "No newline at end of file").

10. validate() shadows Pydantic's built-in in both configs. Both XAIASRConfig and XAITTSConfig define a validate() method that shadows the Pydantic class-level validator name. Consider renaming to validate_config() or check_required_fields().

11. Duplicated conftest.py - both test directories contain byte-for-byte identical conftest.py. Consider extracting to a shared fixture module.

12. Stale test log string in xai_tts_python/tests/test_basic.py: text="Hello, this is a test of the Deepgram TTS extension." should reference xAI.

Summary

Severity	Count
Bug / Correctness	4
Validation / Security	2
Missing config surface	1
Minor / Style	5

The top priorities before merge are items 1 (invalid JSON from to_json()), 5 (test bypass in production validate()), and 6 (missing manifest properties). The rest are good-to-fix but not blockers.

github-actions · 2026-04-24T10:02:19Z

PR Review: xAI ASR and TTS Extensions.

Issues to fix:
(1) on_open is called twice in recognition.py: both start() and _message_handler call the callback on transcript.created, causing duplicate metrics and buffer flushes if the server re-sends the event.
(2) buf is potentially unbound in xai_asr_python send_audio: if lock_buf raises, buf is undefined in the except handler. Initialize buf = None before the try block.
(3) property_en_hotwords.json is identical to property_en.json, with no hotword implementation. Remove it or implement hotwords.

Issues to address:
(4) XAIASRConfig.validate() shadows the Pydantic class method; rename it to validate_config().
(5) The TTS API key startswith('test') check is too broad.
(6) turn_status values 1 and 2 are undocumented magic numbers.
(7) The isinstance(..., str) check for text_normalization is dead code, since the field is typed bool.
(8) Trailing newlines are missing in requirements.txt and tests/bin/start.
(9) test_basic.py has a copy-paste error: it says Deepgram, not xAI.

Positives: great reconnect logic with backoff and fatal escalation, API key redaction, the _contains_spoken_content guard, comprehensive test coverage (timeout, flush, punctuation, reconnect-after-error, auth failure), and a clean agent.py update for the new event types.

Generated by Claude Code

github-actions · 2026-04-24T14:55:54Z

Code Review: xAI ASR + TTS Extensions

Great overall structure. The two extensions follow existing repo conventions closely and come with thorough test coverage (17 ASR + 23 TTS tests). The reconnection logic, error classification, and sensitive-data redaction are all solid. A few issues worth addressing before merge.

Bugs

1. Potential UnboundLocalError in xai_asr_python/extension.py send_audio

If frame.lock_buf() itself raises (e.g. the frame is already invalid), buf is never bound, and the except block's frame.unlock_buf(buf) will throw a second exception masking the original. Prefer the standard try/finally pattern:

buf = frame.lock_buf()
try:
    audio_data = bytes(buf)
    ...
    return True
except Exception as e:
    self.ten_env.log_error(f"Error sending audio: {e}")
    return False
finally:
    frame.unlock_buf(buf)
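If send_audio must keep returning False instead of raising when lock_buf() itself fails, the buf = None variant suggested in the earlier comment also avoids the UnboundLocalError. The frame class below is a hypothetical stand-in for the TEN audio frame's lock/unlock API, and logging is omitted:

```python
class FakeFrame:
    """Hypothetical stand-in for an audio frame's lock_buf/unlock_buf API."""

    def __init__(self, data: bytes, fail_lock: bool = False):
        self._data = bytearray(data)
        self._fail_lock = fail_lock
        self.unlock_calls = 0

    def lock_buf(self):
        if self._fail_lock:
            raise RuntimeError("frame already invalid")
        return memoryview(self._data)

    def unlock_buf(self, buf) -> None:
        self.unlock_calls += 1


def send_audio_sketch(frame, sink: list) -> bool:
    buf = None  # Bound up front so the finally block never sees an unbound name.
    try:
        buf = frame.lock_buf()
        sink.append(bytes(buf))
        return True
    except Exception:
        return False
    finally:
        if buf is not None:
            # Only unlock a buffer that was actually locked.
            frame.unlock_buf(buf)
```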

Code Quality

2. _connection_successful flag in ReconnectManager is dead code

self._connection_successful is assigned in __init__ and handle_reconnect but never read. Remove it.

3. try/except TypeError for dynamic dispatch is fragile (reconnect_manager.py)

handle_reconnect catches TypeError to detect whether the error handler accepts 1 or 2 arguments. This silently swallows unrelated TypeErrors raised inside the handler. Since the only call-site is XAIASRExtension._handle_reconnect, use a consistent 2-arg signature throughout.

4. Empty on_deinit override in TTS extension (extension.py)

The override only calls super() with no additional logic. Delete it.

5. BUILD.gn in xai_tts_python/manifest.json package includes

BUILD.gn is auto-generated and in .gitignore. AGENTS.md says not to include auto-generated files. xai_asr_python correctly omits it. Remove from xai_tts_python's manifest.

Completeness / Consistency

6. dump, dump_path, and finalize_timeout_ms missing from ASR manifest property schema

xai_asr_python/manifest.json only declares params under api.property.properties. The fields dump, dump_path, and finalize_timeout_ms are used by XAIASRConfig but are absent from the schema, so they are invisible to TMAN Designer and TEN's property system. Compare with xai_tts_python/manifest.json which correctly declares dump and dump_path.

7. property_en_hotwords.json is byte-for-byte identical to property_en.json

Both files in xai_asr_python/tests/configs/ have the same content. The name implies hotwords support which xAI STT does not expose here. Either add a meaningful difference or remove the file.

Minor Nits

8. Misleading validation error message for TTS API key (config.py)

validate() allows keys starting with xai- OR test, but the error message only mentions xai-. Update the message to reflect both accepted prefixes.

9. Copy-paste error in TTS test (test_basic.py)

The basic test sends "Hello, this is a test of the Deepgram TTS extension." — should reference xAI, not Deepgram.

Summary

Severity	Item
Bug	`lock_buf` exception path can produce `UnboundLocalError` in `send_audio`
Code quality	Dead `_connection_successful` flag in `ReconnectManager`
Code quality	`try/except TypeError` for dispatch in `handle_reconnect`
Code quality	Pointless `on_deinit` override in TTS extension
Code quality	`BUILD.gn` in TTS manifest package includes
Completeness	`dump`/`dump_path`/`finalize_timeout_ms` missing from ASR manifest schema
Completeness	`property_en_hotwords.json` identical to `property_en.json`
Nit	Misleading API key error message
Nit	"Deepgram" copy-paste in xAI TTS test

wangyimin-agora · 2026-04-30T08:16:11Z

+    def _calculate_request_event_interval_ms(self) -> int:
+        if self.sent_ts is None:
+            return 0
+        return int((datetime.now() - self.sent_ts).total_seconds() * 1000)


The interval should be calculated as "current time - time the first audio chunk was received", not from the time the first request was sent.
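A sketch of the reviewer's formula, assuming the extension records a separate timestamp when the first audio chunk of a request arrives. The class and the first_audio_ts attribute are hypothetical, and now is passed in explicitly so the arithmetic is testable:

```python
from datetime import datetime
from typing import Optional


class RequestTimingSketch:
    """Hypothetical holder for per-request timestamps in the TTS extension."""

    def __init__(self):
        self.sent_ts: Optional[datetime] = None         # first request sent
        self.first_audio_ts: Optional[datetime] = None  # first audio chunk received

    def on_audio_chunk(self, now: datetime) -> None:
        # Record only the first chunk; later chunks must not move the anchor.
        if self.first_audio_ts is None:
            self.first_audio_ts = now

    def calculate_request_event_interval_ms(self, now: datetime) -> int:
        # Measured from the first received audio, not from sent_ts.
        if self.first_audio_ts is None:
            return 0
        return int((now - self.first_audio_ts).total_seconds() * 1000)
```

If the request is sent at t0, audio first arrives at t0 + 300 ms, and the event fires at t0 + 1300 ms, the interval is 1000 ms rather than the 1300 ms the current code reports.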

wangyimin-agora · 2026-04-30T08:22:11Z

+                code=ModuleErrorCode.NON_FATAL_ERROR,
+                vendor_info=ModuleErrorVendorInfo(vendor=self.vendor()),
+            )
+            await self._finalize_request(TTSAudioEndReason.ERROR, error=error)


_finalize_request should only handle the last request of the turn, so this call should be guarded by:
if t.text_input_end:
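A sketch of the guard on the error path. TextInputSketch models only the end-of-turn flag of the real TTS text input, and the string "ERROR" stands in for TTSAudioEndReason.ERROR; all names here are hypothetical:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TextInputSketch:
    # Hypothetical subset of the TTS text input: text plus end-of-turn flag.
    text: str
    text_input_end: bool


class ErrorPathSketch:
    def __init__(self):
        self.finalized = []

    async def _finalize_request(self, reason: str) -> None:
        self.finalized.append(reason)

    async def handle_request_error(self, t: TextInputSketch, reason: str = "ERROR") -> None:
        # Error reporting (not shown) happens for every chunk, but only the
        # last request of the turn ends the turn's audio.
        if t.text_input_end:
            await self._finalize_request(reason)
```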

wangyimin-agora · 2026-04-30T08:34:27Z

+        if start_ms <= 0:
+            start_ms = int(datetime.now().timestamp() * 1000)
+
+        transcript_result = TTSTextResult(


TTSTextResult is not necessary here. We only use it when the vendor provides word-level timestamps; if this TTS vendor cannot provide them, we can remove it.

wangyimin-agora · 2026-04-30T08:35:43Z

+                message=e.body,
+            ),
+        )
+        await self._finalize_request(TTSAudioEndReason.ERROR, error=error)


_finalize_request should only handle the last request of the turn, so this call should be guarded by:
if t.text_input_end:

github-actions · 2026-04-30T10:09:28Z

Code Review: PR #2146 — Add xAI ASR and TTS Extensions

Overview

This PR adds two well-structured TEN extensions — xai_asr_python (WebSocket STT) and xai_tts_python (WebSocket TTS) — along with three new voice-assistant graph variants and comprehensive test coverage. Overall this is high-quality work following the project's existing patterns. A few issues worth addressing before merge.

Issues

Bug: `TypeError` swallowed in `ReconnectManager.handle_reconnect`

reconnect_manager.py:

try:
    await error_handler(error, vendor_info)
except TypeError:
    await error_handler(error)

Using exception handling to detect function signatures is fragile — if error_handler raises a genuine TypeError internally, the manager silently retries without vendor_info. The callers (XAIASRExtension.send_asr_error) have a stable signature, so this safety valve isn't needed. Call await error_handler(error, vendor_info) directly (or await error_handler(error) when vendor_name is None).
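The difference is easy to demonstrate. Both functions below are illustrative sketches rather than the PR's code; the direct version branches on whether vendor info is available (the review keys this off vendor_name), so a genuine TypeError inside the handler propagates instead of triggering a silent second call:

```python
import asyncio


async def handle_reconnect_fragile(error_handler, error, vendor_info):
    # Pattern under review: TypeError is used to guess the handler's arity.
    try:
        await error_handler(error, vendor_info)
    except TypeError:
        await error_handler(error)


async def handle_reconnect_direct(error_handler, error, vendor_info=None):
    # Suggested fix: choose the call shape explicitly; real errors propagate.
    if vendor_info is None:
        await error_handler(error)
    else:
        await error_handler(error, vendor_info)
```

With a handler that accepts both arguments but hits an internal TypeError, the fragile version calls it a second time without vendor_info and hides the bug, while the direct version raises.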

Bug: `TIMEOUT_CODE = 10105` in `const.py` is defined but never used

xai_asr_python/const.py defines TIMEOUT_CODE = 10105 but nothing in the ASR extension uses it. Should be wired up or removed.

Design smell: `_ensure_client` accesses private internals

xai_tts_python/extension.py:

ws = getattr(self.client, "_ws", None)
if ws is not None and getattr(ws, "state", None) == State.OPEN:
    return

This reaches into XAITTSClient's private state. XAITTSClient should expose an is_connected() -> bool method (as XAIASRRecognition already does) and _ensure_client should call that instead.

Fragile API key prefix validation in TTS config

xai_tts_python/config.py:

if not self.api_key.startswith("xai-"):
    raise ValueError("API key must start with 'xai-'")

This will break if xAI changes their key format, and there is no equivalent check in the ASR extension. Either apply it consistently or drop it — the server will reject invalid keys with a 401 anyway.

Test config: `property_en_hotwords.json` is a copy of `property_en.json`

xai_asr_python/tests/configs/property_en_hotwords.json and property_en.json are byte-for-byte identical. If hotwords aren't implemented yet, remove the duplicate rather than leaving a misleading placeholder.

Minor Issues

Docstring/code mismatch: ReconnectManager docstring says max_attempts=4, but extension.py instantiates it with max_attempts=10. Update the docstring.
Unnecessary override: xai_tts_python/extension.py's on_deinit does nothing except call super() — remove it.
Reconnect cost on cancel: XAITTSClient.cancel() closes the WebSocket and sets _needs_reconnect = True, causing a full TLS + WS handshake on the next utterance after every barge-in. Consider draining the socket instead of closing it, or pre-warming the connection after cancel. Not a blocker, but a known latency trade-off worth documenting.
State reset scattered across request_tts: The ~6 fields reset in multiple code paths (_request_text, _request_metadata, total_audio_bytes, etc.) would benefit from a private _reset_request_state() helper to avoid future divergence.
Duplicate api_key check: start_connection in xai_asr_python/extension.py re-checks for an empty API key even though config.validate() already raised on it during on_init. The guard is harmless but adds noise.
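A sketch of the suggested _reset_request_state() helper. Only the three fields the review names are included; the remaining fields it alludes to are not named, so they are left out, and the surrounding class is hypothetical:

```python
from typing import Optional


class RequestStateSketch:
    """Hypothetical TTS extension fragment; only per-request fields are shown."""

    def __init__(self):
        self._reset_request_state()

    def _reset_request_state(self) -> None:
        # One place that clears per-request state, so every path that starts,
        # cancels, or finishes a request resets exactly the same fields.
        self._request_text: str = ""
        self._request_metadata: Optional[dict] = None
        self.total_audio_bytes: int = 0
```

Each code path in request_tts that currently clears these fields by hand would call the helper instead, so adding a new field means touching only one method.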

What's Good

Reconnection logic is well-designed: bounded retries with exponential backoff, auth failures correctly escalated as fatal, successful connections reset the counter.
Both extensions handle the audio.done/finalize flow correctly, including timeout handling and the _close_expected guard against spurious reconnects.
Config redaction with encrypt() before logging is consistently applied in both extensions.
Excellent test coverage — 40 tests covering: 401 errors, timeout, reconnect-after-error, sequential requests, flush, dump verification, TTFB metrics, config redaction, and language normalization.
The agent.py change that silently drops tts_audio_start/tts_audio_end is a clean way to handle the new data events without polluting the log with warnings.
Property graphs are correct and follow the established pattern.

Summary

The implementation is solid and well-tested. The TypeError-swallowing pattern in ReconnectManager and the private-field access in _ensure_client are the most important items to address. The rest are cleanup/polish.

920218b feat(ai-agents): add xai speech extensions

BenWeekes requested review from halajohn and plutoless as code owners April 24, 2026 08:51

871f777 fix(ai-agents): route tts lifecycle events in xai graphs

88d8ca4 fix(ai-agents): stabilize xai asr guarder flows (#2146)

wangyimin-agora reviewed Apr 30, 2026

50f116c fix(ai-agents): address xai speech review feedback (#2146)

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026: e42e01c merge: bring xai review fixes into dev/ben-graphs (#2146)

6f6e9f5 fix(ai-agents): remove xai tts text result emission (#2146)

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026: df3f0b9 test(ai-agents): skip optional tts subtitle alignment when config missing (#2146)

BenWeekes pushed a commit that referenced this pull request Apr 30, 2026: 39a1109 merge: bring xai tts text-result removal into dev/ben-graphs (#2146)
