Support Realtime custom voice objects #3473

Open: lionel-oai wants to merge 1 commit into main from fix/realtime-custom-voice

@lionel-oai lionel-oai commented May 20, 2026

Summary

This PR fixes Realtime custom voice handling in the Agents SDK.

Realtime sessions can receive and send structured custom voice objects such as {"id": "voice_..."}, but the SDK previously typed voice settings as strings and validated inbound server events before updating response lifecycle state. If a server event such as response.created or response.done contained a structured voice object that failed validation, the SDK could skip response state updates and leave the response-create sequencer blocked. That could prevent the next response.create from being sent after tool output.
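The failure mode described above can be illustrated with a minimal sketch. Everything here is hypothetical and not the SDK's actual code: `ResponseTracker`, `strict_validate`, and `handle_event` are stand-in names, and the nesting of the voice field under `response.audio.output` is an assumption about the event shape. The point is only that lifecycle state is derived from the raw event, so a validation failure on the voice field cannot leave the tracker stuck.

```python
from typing import Any, Optional


class ResponseTracker:
    """Hypothetical stand-in for the SDK's response lifecycle state."""

    def __init__(self) -> None:
        self.active_response_id: Optional[str] = None

    def observe(self, event: dict[str, Any]) -> None:
        # Read only the fields needed for lifecycle tracking from the raw dict.
        event_type = event.get("type")
        response = event.get("response") or {}
        if event_type == "response.created":
            self.active_response_id = response.get("id")
        elif event_type == "response.done":
            self.active_response_id = None

    @property
    def can_send_response_create(self) -> bool:
        return self.active_response_id is None


def strict_validate(event: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator that, like the old typing, accepts only string voices."""
    voice = event.get("response", {}).get("audio", {}).get("output", {}).get("voice")
    if voice is not None and not isinstance(voice, str):
        raise ValueError("voice must be a string")
    return event


def handle_event(tracker: ResponseTracker, event: dict[str, Any]) -> bool:
    """Update lifecycle state first, then report whether validation succeeded."""
    tracker.observe(event)
    try:
        strict_validate(event)
    except ValueError:
        return False
    return True
```

If validation ran before `tracker.observe`, the `response.done` event in this sketch would raise first and the tracker would never clear its active response, which corresponds to the blocked sequencer described above.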

The change adds typed support for custom voice objects in Realtime session settings, preserves structured voices when building outbound session.update payloads, and adds a validation fallback for inbound server events so custom voice objects do not break response lifecycle tracking.
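As a rough illustration of the typed support and the outbound behavior, the sketch below models a voice setting that is either a built-in voice name or a structured object. `CustomVoice`, `VoiceSetting`, `build_audio_output`, and `build_session_update` are hypothetical names, and the payload nesting is an assumption; the SDK's real types and payload builders may differ.

```python
from typing import Any, TypedDict, Union


class CustomVoice(TypedDict):
    """Hypothetical shape for a structured custom voice, e.g. {"id": "voice_..."}."""

    id: str


# A voice setting is either a built-in voice name or a custom voice object.
VoiceSetting = Union[str, CustomVoice]


def build_audio_output(voice: VoiceSetting) -> dict[str, Any]:
    """Keep structured voices structured instead of coercing them to strings."""
    if isinstance(voice, str):
        return {"voice": voice}
    return {"voice": {"id": voice["id"]}}


def build_session_update(voice: VoiceSetting) -> dict[str, Any]:
    """Assemble an illustrative session.update payload (field nesting is assumed)."""
    return {
        "type": "session.update",
        "session": {"type": "realtime", "audio": {"output": build_audio_output(voice)}},
    }
```

A string voice and a custom voice object go through the same builder, so callers do not need separate code paths for the two cases.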

Tests

  • make format
  • make lint
  • uv run pytest -q tests/realtime/test_openai_realtime.py tests/realtime/test_realtime_model_settings.py
  • uv run pytest -q tests/realtime/test_session.py -k "handoff_session_update_preserves_custom_voice or handoff_tool_handling"
  • uv run mypy src/agents/realtime/config.py src/agents/realtime/openai_realtime.py tests/realtime/test_openai_realtime.py
  • uv run pyright src/agents/realtime/config.py src/agents/realtime/openai_realtime.py tests/realtime/test_openai_realtime.py
  • uv run mypy tests/realtime/test_session.py
  • uv run pyright tests/realtime/test_session.py

Full make tests / make typecheck were not completed locally because optional dependency installation was blocked by a socket-firewall tunnel failure while downloading docstring-parser==0.18.0.

@lionel-oai lionel-oai force-pushed the fix/realtime-custom-voice branch 2 times, most recently from eed10dc to 20e7135 Compare May 20, 2026 18:42
Comment thread src/agents/realtime/openai_realtime.py Outdated
return normalized


def _create_realtime_audio_output(audio_output_args: dict[str, Any]) -> Any:
Member

If we upgrade the openai package to openai>=2.36.0, this workaround is no longer necessary, whereas _normalize_custom_voice_for_server_event_validation is still required even with the latest version.

Can you add quick TODO comments to these internal workarounds explaining why they exist and when they can be removed?

@seratch seratch added this to the 0.17.x milestone May 20, 2026
@lionel-oai lionel-oai force-pushed the fix/realtime-custom-voice branch from 20e7135 to 393c530 Compare May 22, 2026 15:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 393c53087a

if "previous_item_id" in event and event["previous_item_id"] is None:
    event["previous_item_id"] = ""  # TODO (rm) remove
parsed: AllRealtimeServerEvents = self._server_event_type_adapter.validate_python(event)
validation_event = _normalize_custom_voice_for_server_event_validation(event)

P2 Badge Limit voice normalization to events that can contain voice objects

_normalize_custom_voice_for_server_event_validation is applied to every inbound WebSocket event before validation, including high-frequency streaming events like response.output_audio.delta. In long audio turns this adds an extra full recursive walk/allocation per event even when no voice field exists, which can unnecessarily increase CPU/GC pressure and degrade realtime playback latency. Since the workaround is only needed for server events carrying session/response voice settings, scope it to those event types (or fast-path when no voice key is present).
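One way the suggested scoping could look is sketched below. This is not the PR's implementation: `normalize_for_validation`, `_replace_voice_objects`, and `_VOICE_EVENT_TYPES` are hypothetical names, the set of event types is an assumption, and replacing a voice object with its `id` string is only a guess at what the normalization does for validation purposes.

```python
from typing import Any

# Assumed set of server events that can carry session/response voice settings.
_VOICE_EVENT_TYPES = frozenset(
    {"session.created", "session.updated", "response.created", "response.done"}
)


def _replace_voice_objects(value: Any) -> Any:
    """Return a copy in which {"voice": {"id": ...}} becomes {"voice": "<id>"}."""
    if isinstance(value, dict):
        result: dict[str, Any] = {}
        for key, item in value.items():
            if key == "voice" and isinstance(item, dict) and isinstance(item.get("id"), str):
                result[key] = item["id"]
            else:
                result[key] = _replace_voice_objects(item)
        return result
    if isinstance(value, list):
        return [_replace_voice_objects(item) for item in value]
    return value


def normalize_for_validation(event: dict[str, Any]) -> dict[str, Any]:
    """Walk only events that may contain a voice; pass all others through untouched."""
    if event.get("type") not in _VOICE_EVENT_TYPES:
        # Fast path: no recursive walk or allocation for streaming events.
        return event
    return _replace_voice_objects(event)
```

With this shape, high-frequency events such as response.output_audio.delta cost one set lookup, and the original event is never mutated, so the structured voice remains available to the rest of the handler.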
