Describe the bug
I'm running a Node.js voice agent on @livekit/agents@1.4.4. Pipeline: inference.STT (Deepgram nova-3)
→ inference.LLM (openai/gpt-5.4) → inference.TTS (ElevenLabs). Two function_tools registered, defaults
otherwise, preemptiveGeneration.enabled: true.
Occasionally, raw tool-call protocol text leaks into my TTS output and the persisted
transcript. Verbatim sample:
to=functions.saveAnswer 天天中彩票能json
{"question":"Who are the most important people in your
life?","questionId":"3c6bb71f-f7e7-4c75-acb9-d9d5e4a1c563","value":"Wife Patty and their thirteen
children"}цҳауеит
Patty and thirteen kids — well, that's a full house. What kind of work, roles, or responsibilities
have kept you busy over the years?
The structured tool call still fires correctly; the leaked text streams in parallel. This seems similar to #1339.
Relevant log output
N/A
Describe your environment
This is occurring on LiveKit Cloud.
@livekit/agents: ^1.4.1
@livekit/agents-plugin-livekit: ^1.4.1
@livekit/agents-plugin-silero: ^1.4.1
@livekit/noise-cancellation-node: ^0.1.9
@livekit/rtc-node: ^0.13.27
Minimal reproducible example
I don't have reproduction steps because this has been difficult to reproduce, but I've walked the codebase with Claude Code and generated the hypothesis below.
Hypothesis
▎ The hypothesis and source-inspection findings below were generated by walking the SDK with Claude
▎ Code. I haven't independently verified every line citation — flagging in case I've misread
▎ something.
This looks like an OpenAI Harmony response format leak:
- to=functions.saveAnswer is Harmony's tool-call recipient syntax.
- The garbled Unicode bracketing the payload (天天中彩票能json, цҳауеит) matches the byte signature of
Harmony control tokens (<|channel|>commentary, <|constrain|>json, <|message|>, <|end|>) being decoded
as raw text rather than consumed as special tokens.
- Pattern: to= .
Reads as the Inference gateway routing Harmony channel content through delta.content instead of
consuming channel markers and emitting tool calls only via delta.tool_calls.
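For triage, here is a minimal sketch of a check that flags this pattern in persisted transcripts. `findLeakedRecipient` and its regex are hypothetical helpers of mine, not part of @livekit/agents, and the regex only covers the `to=functions.<name>` shape seen in the sample above; other Harmony markers may leak in forms I haven't observed.

```typescript
// Hypothetical helper, not an SDK API.
// Heuristic: a leaked Harmony tool-call recipient looks like `to=functions.<name>`.
const HARMONY_RECIPIENT = /\bto=functions\.([A-Za-z_][A-Za-z0-9_]*)/;

// Returns the tool name if the text contains a leaked recipient, otherwise null.
function findLeakedRecipient(text: string): string | null {
  const match = HARMONY_RECIPIENT.exec(text);
  return match?.[1] ?? null;
}
```

Running this over stored transcript segments should at least give a count of affected turns, even though it can't explain why they leak.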
Findings from source inspection
- No Harmony-aware decoding in @livekit/agents@1.4.4 — grep for harmony, to=functions, <|channel|>,
<|message|> in src/ returned nothing.
- src/inference/llm.ts parseChoice (~L615–627): delta.content falls through to emit a text chunk even
while a tool call is mid-stream (this.toolCallId !== undefined).
- src/voice/generation.ts (~L537–563): delta.toolCalls and delta.content are independent branches;
delta.content is written to textWriter (→ tts_node) and appended to data.generatedText.
- Default ttsTextTransforms (filter_markdown, filter_emoji) don't touch Harmony tokens.
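To illustrate the kind of guard I would have expected, here is a rough sketch that drops content once a tool call has started in the same LLM stream. `StreamDelta` and `ToolCallTextGuard` are simplified, hypothetical names and do not match the SDK's real types. It assumes one guard instance per LLM stream and that the leaked text arrives with or after the first tool-call delta, which I have not confirmed; it would also suppress any legitimate text a model interleaves with a tool call.

```typescript
// Simplified, hypothetical delta shape; the SDK's actual chunk types differ.
interface StreamDelta {
  content?: string;
  toolCalls?: { name: string; args: string }[];
}

// One instance per LLM stream (assumption). Once any tool-call delta is seen,
// all content in that stream is withheld from the TTS/transcript path.
class ToolCallTextGuard {
  private toolCallSeen = false;

  // Returns the text that may be forwarded, or '' if it should be dropped.
  filter(delta: StreamDelta): string {
    if (delta.toolCalls !== undefined && delta.toolCalls.length > 0) {
      this.toolCallSeen = true;
    }
    if (this.toolCallSeen || delta.content === undefined) {
      return '';
    }
    return delta.content;
  }
}
```

This is a blunt workaround rather than a fix: if the gateway is the component emitting Harmony text through delta.content, the proper fix probably belongs there.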
Additional information
LiveKit Cloud URL of an impacted session: https://cloud.livekit.io/projects/p_5vda1ybjh9k/sessions/RM_pfaxxUtAmJiY