test: add tts audio end interval check and change openai tts sample_rate#2155

Open
YiminW wants to merge 1 commit into main from tts/guarder_tts_audio_end_interval

Conversation

@YiminW
Contributor

@YiminW YiminW commented Apr 28, 2026

No description provided.

@github-actions

Code Review

Overview

This PR adds request_event_interval_ms validation to 9 TTS guarder integration test files, fixes a blocking time.sleep in the stress test, and adds configurable sample_rate support to the openai_tts2_python extension.


Bug Fix (Positive)

test_append_input_stress.py: Changing time.sleep(5) to await asyncio.sleep(5) is a real bug fix. Using time.sleep in an async callback blocks the entire event loop for 5 seconds, preventing other coroutines from running. Good catch.
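The difference is easy to demonstrate with a minimal, self-contained sketch (the `ticker`/callback names are illustrative, not from the PR): a coroutine that ticks every 0.1 s is starved for the full sleep duration when a sibling callback uses `time.sleep`, but runs on schedule with `await asyncio.sleep`.

```python
import asyncio
import time

async def ticker(results: list) -> None:
    # Records a timestamp every 0.1 s; starved if the loop is blocked.
    for _ in range(3):
        await asyncio.sleep(0.1)
        results.append(time.monotonic())

async def blocking_callback() -> None:
    time.sleep(0.5)  # blocks the whole event loop

async def cooperative_callback() -> None:
    await asyncio.sleep(0.5)  # yields control to other coroutines

async def main(callback) -> float:
    results: list = []
    start = time.monotonic()
    await asyncio.gather(ticker(results), callback())
    return results[0] - start  # time until the first tick lands

# With time.sleep the first tick cannot land before ~0.5 s;
# with asyncio.sleep it lands after ~0.1 s.
blocked = asyncio.run(main(blocking_callback))
cooperative = asyncio.run(main(cooperative_callback))
```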


Issues: extension.py synthesize_audio_sample_rate()

Type safety

params is dict[str, Any]. If the config JSON has "sample_rate": "24000" (a string), the method returns a string instead of int, which will silently break downstream callers. The return value should be cast:

return int(sample_rate)
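A defensive version of the getter might look like the following sketch (the function signature and default are assumptions; the real method lives on the extension class):

```python
from typing import Any

DEFAULT_SAMPLE_RATE = 24000  # OpenAI TTS PCM output rate

def synthesize_audio_sample_rate(params: dict[str, Any]) -> int:
    # Coerce to int so a JSON string like "24000" never leaks downstream.
    sample_rate = params.get("sample_rate", DEFAULT_SAMPLE_RATE)
    try:
        return int(sample_rate)
    except (TypeError, ValueError):
        return DEFAULT_SAMPLE_RATE
```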

Potential API / sample-rate mismatch

update_params() unconditionally sets "response_format": "pcm", which the OpenAI TTS API always outputs at 24000 Hz. If sample_rate remains in params, it will also be forwarded to the API as an unknown query parameter. A user who sets "sample_rate": 22050 will receive 24000 Hz audio while the extension reports 22050 Hz to the TTS guarder — a silent mismatch. Consider either stripping sample_rate from params in update_params(), or mapping it to the appropriate named response_format value (e.g. "pcm_22050").
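The stripping option could be sketched as below; this is a free-function stand-in for the real `update_params()` method, assuming the params dict is forwarded to the API as-is:

```python
from typing import Any

def update_params(params: dict[str, Any]) -> dict[str, Any]:
    # Drop the extension-local sample_rate key so it is never forwarded
    # to the OpenAI TTS API as an unknown query parameter.
    api_params = {k: v for k, v in params.items() if k != "sample_rate"}
    api_params["response_format"] = "pcm"  # API outputs 24000 Hz PCM
    return api_params
```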

No test coverage for the new behaviour

There is no unit or integration test asserting that a config with "sample_rate": X causes synthesize_audio_sample_rate() to return X.
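Such a test could be as small as the sketch below. The function here is a stand-in; in the real suite it would be imported from the `openai_tts2_python` extension module:

```python
from typing import Any

# Stand-in for the extension method under test.
def synthesize_audio_sample_rate(params: dict[str, Any]) -> int:
    return int(params.get("sample_rate", 24000))

def test_configured_sample_rate_is_returned() -> None:
    assert synthesize_audio_sample_rate({"sample_rate": 22050}) == 22050

def test_default_sample_rate_when_unset() -> None:
    assert synthesize_audio_sample_rate({}) == 24000

test_configured_sample_rate_is_returned()
test_default_sample_rate_when_unset()
```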


Issues: Integration Test Validation Logic

Tolerance may be too tight for wall-clock comparisons

The new check compares received_event_interval_ms (reported by the TTS guarder, derived from audio data) against actual_duration_ms (wall-clock delta between the tts_audio_start and tts_audio_end callbacks). These measure different things:

  • The guarder's interval is pure audio duration.
  • actual_duration_ms includes async dispatch latency and test-harness overhead.

Reusing AUDIO_DURATION_TOLERANCE_MS = 50 (designed for PCM-byte-count comparisons) may be too tight for wall-clock measurements under CI load. Consider a separate constant or documenting why 50 ms is appropriate here.
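A separate-constant version might look like this sketch (the `EVENT_INTERVAL_TOLERANCE_MS` value of 200 ms is a placeholder, not a measured number):

```python
# Separate tolerances: PCM-byte-count comparisons are deterministic,
# while wall-clock intervals absorb scheduler and dispatch jitter.
AUDIO_DURATION_TOLERANCE_MS = 50   # existing, byte-derived durations
EVENT_INTERVAL_TOLERANCE_MS = 200  # hypothetical, wall-clock intervals

def interval_within_tolerance(
    reported_ms: int,
    wall_clock_ms: float,
    tolerance_ms: int = EVENT_INTERVAL_TOLERANCE_MS,
) -> bool:
    # The guarder-reported interval should match the observed wall-clock
    # delta to within the looser wall-clock tolerance.
    return abs(reported_ms - wall_clock_ms) <= tolerance_ms
```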

Silently ignored get_property_int errors

received_event_interval_ms, _ = data.get_property_int("request_event_interval_ms")

The error return value is discarded. If the property is absent the default 0 silently skips validation, which could mask a guarder regression where the field stops being emitted. Consider logging the error when it is non-None.
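A small wrapper could surface the error while preserving the existing skip semantics; `read_event_interval` and the two-tuple return shape mirror the review's description of `get_property_int` and are otherwise assumptions:

```python
import logging

logger = logging.getLogger(__name__)

def read_event_interval(data) -> int:
    # Hypothetical wrapper: log get_property_int failures instead of
    # silently treating them as "interval not emitted".
    value, err = data.get_property_int("request_event_interval_ms")
    if err is not None:
        logger.warning("request_event_interval_ms unreadable: %s", err)
        return 0  # preserves the existing skip-validation behaviour
    return value
```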

Duplicate validation block across 9 files

The 16-line event interval block is copy-pasted identically into every test file. This is consistent with the existing audio-duration pattern, but doubles the maintenance surface. A follow-up to extract both blocks into a shared base-class helper would reduce future divergence risk.
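The suggested extraction could be sketched as a base-class helper; the class name, `ten_env` parameter, and assertion message are illustrative:

```python
AUDIO_DURATION_TOLERANCE_MS = 50

class TTSGuarderTestBase:
    # Hypothetical shared helper replacing the copy-pasted block.
    def _validate_event_interval(
        self, ten_env, received_event_interval_ms: int, actual_duration_ms: float
    ) -> None:
        # Skip when the guarder did not emit an interval.
        if received_event_interval_ms <= 0:
            return
        delta = abs(received_event_interval_ms - actual_duration_ms)
        assert delta <= AUDIO_DURATION_TOLERANCE_MS, (
            f"event interval {received_event_interval_ms} ms deviates from "
            f"observed {actual_duration_ms:.0f} ms by {delta:.0f} ms"
        )
```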


Minor

  • Trailing blank-line removals are fine.
  • Version bump to 0.6.2 is appropriate for the behavioural change.

Summary

| Area | Verdict |
| --- | --- |
| asyncio.sleep fix | Correct and important |
| sample_rate return type | Should cast to int |
| sample_rate vs API format mismatch | May silently report wrong sample rate |
| No test for configurable sample_rate | Should add a unit test |
| Wall-clock vs audio-duration tolerance | Risk of flaky tests; verify 50 ms is appropriate |
| Ignored get_property_int error | Consider logging |
| Validation block duplication | Follow-up refactor suggestion |

@YiminW YiminW force-pushed the tts/guarder_tts_audio_end_interval branch from 7dc8b62 to be4c301 on May 7, 2026 06:45
@github-actions

github-actions Bot commented May 7, 2026

PR Review

Thanks for improving TTS integration-test coverage and adding the standalone proxy — the intent of the tests is clear. A few things to flag before this lands:

PR scope / title mismatch

  • The title says "…and change openai tts sample_rate", but the diff contains no sample_rate change — only test additions and a new proxy server. Either the title or a commit is missing. Please clean up the title (and the PR body is currently empty).
  • The new proxy server (openai_tts2_python/proxy/…, ~420 lines incl. scripts) is unrelated to the test: prefix. It really deserves its own PR; mixing a new runtime component with a test-coverage change makes both harder to audit.

Tests — test_*_input*.py, test_*_interrupt.py, test_basic_audio_setting.py, etc.

  1. Duplication. The ~15-line received_event_interval_ms validation block is copy-pasted verbatim (or near-verbatim) into 8 different test files. Please extract into a helper on a common test base (e.g. _validate_event_interval(ten_env, received_event_interval_ms, actual_duration_ms)) — otherwise any future tweak to tolerance, message, or skip logic has to be made in 8 places and will drift.

  2. Status/error return value discarded. received_event_interval_ms, _ = data.get_property_int("request_event_interval_ms") throws away the error flag. If the property is missing, the value is typically 0 and the > 0 skip works by accident, but this silently treats a "property failed to read" case the same as "not emitted." A short comment or explicit error check would make the intent clearer.

  3. Tolerance. AUDIO_DURATION_TOLERANCE_MS = 50 is quite tight for an interval that includes network + scheduler jitter. Worth watching flake rate on CI; consider a separate EVENT_INTERVAL_TOLERANCE_MS since the two metrics may have different noise floors.

  4. test_append_input_stress.py — the change from time.sleep(5) to await asyncio.sleep(5) is correct and important (blocking the event loop inside on_start would have been a real bug). Nice catch — worth calling out in the PR description.

Proxy server — openai_tts2_python/proxy/proxy_server.py

Security (most important):

  1. Raw Authorization header logged in plaintext — twice. Lines ~535 (Headers: {dict(headers)}) and ~573 (Forwarding headers: {dict(headers)}) both dump the entire headers dict, including the bearer token, at INFO. The later masked_auth block is defeated by these earlier prints. Mask authorization / x-api-key / api-key centrally before any log call, or install a logging filter that redacts them.

  2. Partial token masking leaks secrets. auth_header[:10] + "***MASKED***" + auth_header[-10:] exposes the last 10 characters of the bearer token. Tail characters of an OpenAI key materially narrow an attacker's search space and should never be logged — keep only a short, non-sensitive prefix (e.g. Bearer sk-… + length), never the suffix.

  3. Request body logged as JSON. Only api_key is masked; any other sensitive field (user PII in input, custom auth in metadata, etc.) is logged verbatim. Drop body logging or allow-list only non-sensitive fields.

  4. Defaults to 0.0.0.0 and has no auth. The README says the proxy is meant to run "in the same container" as the TTS client — in that case 127.0.0.1 is the safer default. As written, if OPENAI_API_KEY is set via env var and the port is exposed on any network, anyone reaching it can burn the key: no authentication, no rate limiting, no allow-list.
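Central redaction is straightforward; this sketch (helper name and header set are assumptions) masks sensitive headers before any log call, keeping only the scheme and the length:

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key", "api-key"}

def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    # Hypothetical helper: call this before every log statement so no
    # code path can print a raw token, prefix, or suffix.
    redacted = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            scheme, _, secret = value.partition(" ")
            prefix = scheme if secret else ""
            redacted[name] = f"{prefix} ***redacted*** (len={len(value)})".strip()
        else:
            redacted[name] = value
    return redacted
```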

Code quality:

  1. Manual __aenter__ / __aexit__ on httpx.stream. The hand-rolled context-manager dance (lines ~578–712) is fragile: three separate __aexit__ call sites, each wrapped in try/except Exception: pass. If generate() is never iterated (e.g. FastAPI returns the StreamingResponse but the client disconnects before reading), the stream leaks. Refactor so the async with http_client.stream(...) as response: block encloses the generator — e.g. enter/exit the context manager from inside the generator itself. You'll also be able to drop the pylint: disable=no-member comments.

  2. Bare except Exception: pass swallows every cleanup error silently (lines ~691, ~707). At minimum log at debug level so real leaks don't disappear.

  3. Globals + pylint: disable=global-statement. http_client as a module global works but is awkward. FastAPI's app.state is the idiomatic place (app.state.http_client = ... in lifespan, read via request.app.state.http_client).

  4. dict(request.headers) twice on the same object — minor, just reuse headers.

  5. Very noisy INFO logging (80-char banner per request, per-chunk log for the first 5 chunks and every 10th). Fine for dev; before landing this as a runtime component, demote most of it to DEBUG.

  6. Scripts. start.sh creates a fresh venv/ on every invocation — OK for dev, but a Dockerfile or documented pip install path is usually what lands in production. docker-example.sh backgrounds the proxy with & and then waits on a single PID; if you ever add the "main application" it comments about, you'll want a real process supervisor or at least trap handlers. Also, neither script has the executable bit set in git (git update-index --chmod=+x).

  7. Type hints. _proxy_to_openai, generate, proxy_audio_speech etc. have no return annotations — minor, but most of the codebase uses them.

  8. Module placement. Living under ten_packages/extension/openai_tts2_python/proxy/ implies it ships with the extension, but it's actually a standalone service with its own deps (fastapi, uvicorn). If it's a separate deployable, consider ai_agents/tools/openai_tts_proxy/ or similar — otherwise it gets bundled wherever the extension is packaged.
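The stream-lifecycle refactor suggested in item 1 can be shown with a runnable stand-in. `FakeStream` here substitutes for `httpx.AsyncClient.stream` so the pattern runs without a network; the real code would use `async with http_client.stream("POST", url, ...) as response:` in the same position:

```python
import asyncio

class FakeStream:
    # Minimal async context manager mimicking an httpx streaming response.
    closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        FakeStream.closed = True  # cleanup always runs

    async def aiter_bytes(self):
        for chunk in (b"pcm0", b"pcm1"):
            yield chunk

async def generate():
    # Entering the context manager *inside* the generator ties __aexit__
    # to the generator's own lifetime: it runs on exhaustion and on early
    # close (client disconnect), with no hand-rolled __aexit__ call sites.
    async with FakeStream() as response:
        async for chunk in response.aiter_bytes():
            yield chunk

async def main() -> bytes:
    body = b""
    async for chunk in generate():
        body += chunk
    return body

body = asyncio.run(main())
```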

Convention nits

  • Per AGENTS.md, commit/PR titles use type: lowercase present-tense — the current title is fine on those axes, but splitting by scope (e.g. test(tts-guarder): add event-interval check and feat(openai-tts2): add local proxy server) would make history easier to scan.

Tests for the proxy

  • The proxy itself has no tests. Even a couple of httpx.MockTransport-based unit tests covering (a) the 200 streaming path, (b) non-200 error forwarding, and (c) client-disconnect cleanup would go a long way given the hand-rolled stream lifecycle.

Happy to re-review once the logging/masking and stream-context issues are addressed and the proxy is split out (or at least scoped under its own commit with tests).

