
OpenAI Harmony tokens (to=functions.X, channel markers) leaking into delta.content on openai/gpt-5.4 via Inference #1632

@ubaniabalogun

Description


Describe the bug

I'm running a Node.js voice agent on @livekit/agents@1.4.4. Pipeline: inference.STT (Deepgram nova-3)
→ inference.LLM (openai/gpt-5.4) → inference.TTS (ElevenLabs). Two function_tools are registered; everything
else uses defaults except preemptiveGeneration.enabled: true.

Occasionally, raw tool-call protocol text leaks into my TTS output and the persisted
transcript. Verbatim sample:

  to=functions.saveAnswer  天天中彩票能json
  {"question":"Who are the most important people in your
  life?","questionId":"3c6bb71f-f7e7-4c75-acb9-d9d5e4a1c563","value":"Wife Patty and their thirteen
  children"}цҳауеит
  Patty and thirteen kids — well, that's a full house. What kind of work, roles, or responsibilities
  have kept you busy over the years?

The structured tool call still fires correctly; the leaked text streams in parallel. This looks similar to #1339.

Relevant log output

N/A

Describe your environment

This is occurring on LiveKit Cloud.
@livekit/agents: ^1.4.1
@livekit/agents-plugin-livekit: ^1.4.1
@livekit/agents-plugin-silero: ^1.4.1
@livekit/noise-cancellation-node: ^0.1.9
@livekit/rtc-node: ^0.13.27

Minimal reproducible example

I don't have reproduction steps because this has been notoriously difficult to replicate. Instead, I walked the codebase with Claude Code and generated the hypothesis below.

Hypothesis

▎ The hypothesis and source-inspection findings below were generated by walking the SDK with Claude
▎ Code. I haven't independently verified every line citation — flagging in case I've misread
▎ something.

This looks like an OpenAI Harmony response format leak:

  • to=functions.saveAnswer is Harmony's tool-call recipient syntax.
  • The garbled Unicode bracketing the payload (天天中彩票能json, цҳауеит) matches the byte signature of
    Harmony control tokens (<|channel|>commentary, <|constrain|>json, <|message|>, <|end|>) being decoded
    as raw text rather than consumed as special tokens.
  • Pattern: to= .
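To make the hypothesis concrete, here is a minimal sketch of how a Harmony-rendered tool call is laid out when its special tokens survive as literal text. The token spellings (`<|start|>`, `<|channel|>`, `<|constrain|>`, `<|message|>`, and the `<|end|>` / `<|call|>` / `<|return|>` terminators) follow OpenAI's published Harmony format; `parseHarmony`, `parseHeader`, and `HarmonyMessage` are hypothetical names, and the parser assumes complete messages that each begin with `<|start|>`. The point it illustrates: the recipient (`to=functions.saveAnswer`) and the JSON arguments sit in different parts of the message, so a decoder that consumes the markers can emit a structured tool call instead of text.

```typescript
// Hypothetical model of one Harmony message; not an official SDK type.
interface HarmonyMessage {
  role: string;
  channel?: string;
  recipient?: string; // e.g. "functions.saveAnswer"
  constrain?: string; // e.g. "json"
  content: string;
  terminator: string; // "<|end|>", "<|call|>", "<|return|>", or "" if unterminated
}

const TERMINATORS = ["<|end|>", "<|call|>", "<|return|>"];

// Split a header such as "assistant<|channel|>commentary to=functions.X <|constrain|>json"
// into role, channel, recipient and constraint.
function parseHeader(header: string): Omit<HarmonyMessage, "content" | "terminator"> {
  const [rolePart, channelPart = ""] = header.split("<|channel|>");
  const [channelRaw, constrainRaw] = channelPart.split("<|constrain|>");
  const roleTokens = rolePart.trim().split(/\s+/);
  const channelTokens = channelRaw.trim().split(/\s+/).filter(Boolean);
  // The recipient may appear after the role or after the channel name.
  const recipientToken = [...roleTokens.slice(1), ...channelTokens.slice(1)].find((t) =>
    t.startsWith("to="),
  );
  return {
    role: roleTokens[0],
    channel: channelTokens[0],
    recipient: recipientToken?.slice("to=".length),
    constrain: constrainRaw?.trim() || undefined,
  };
}

// Parse complete messages from text in which the special tokens appear
// literally (i.e. were decoded as plain text instead of being consumed).
function parseHarmony(rendered: string): HarmonyMessage[] {
  const messages: HarmonyMessage[] = [];
  let rest = rendered;
  for (;;) {
    const start = rest.indexOf("<|start|>");
    if (start === -1) break;
    rest = rest.slice(start + "<|start|>".length);
    const bodyAt = rest.indexOf("<|message|>");
    if (bodyAt === -1) break; // header never finished
    const header = rest.slice(0, bodyAt);
    rest = rest.slice(bodyAt + "<|message|>".length);
    // The body runs until the earliest terminator token.
    let end = rest.length;
    let terminator = "";
    for (const t of TERMINATORS) {
      const at = rest.indexOf(t);
      if (at !== -1 && at < end) {
        end = at;
        terminator = t;
      }
    }
    messages.push({ ...parseHeader(header), content: rest.slice(0, end), terminator });
    rest = rest.slice(end + terminator.length);
  }
  return messages;
}
```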

The leak reads as though the Inference gateway is routing Harmony channel content through delta.content instead of
consuming the channel markers and emitting tool calls only via delta.tool_calls.
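If the hypothesis holds, the expected behavior is that each parsed Harmony message lands in exactly one delta field: tool calls in `tool_calls`, user-facing text in `content`. The sketch below illustrates that routing rule only; `routeHarmony`, the `ParsedMessage` shape, and the synthetic `call_N` IDs are hypothetical and do not describe the actual Inference gateway. It drops the `analysis` channel, maps `commentary` messages addressed to `functions.*` to tool-call deltas, and passes `final` text (and recipient-less `commentary` preambles) through as content.

```typescript
// Hypothetical input: one already-parsed Harmony message.
interface ParsedMessage {
  channel?: string;
  recipient?: string;
  content: string;
}

// Shapes loosely modeled on Chat Completions streaming deltas.
interface ToolCallDelta {
  index: number;
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChoiceDelta {
  content?: string;
  tool_calls?: ToolCallDelta[];
}

const FUNCTION_PREFIX = "functions.";

// Route each message to a single delta field so that recipient syntax and
// tool arguments never reach `content`.
function routeHarmony(messages: ParsedMessage[]): ChoiceDelta[] {
  const deltas: ChoiceDelta[] = [];
  let toolIndex = 0;
  for (const m of messages) {
    const recipient = m.recipient;
    if (recipient !== undefined && recipient.startsWith(FUNCTION_PREFIX)) {
      // Tool call: the name comes from the recipient, the arguments from the body.
      deltas.push({
        tool_calls: [
          {
            index: toolIndex,
            id: `call_${toolIndex}`, // placeholder ID; a real gateway assigns its own
            type: "function",
            function: { name: recipient.slice(FUNCTION_PREFIX.length), arguments: m.content },
          },
        ],
      });
      toolIndex++;
    } else if (m.channel === "final" || (m.channel === "commentary" && recipient === undefined)) {
      // User-facing text: the final answer or a recipient-less commentary preamble.
      deltas.push({ content: m.content });
    }
    // Everything else (e.g. the "analysis" channel) is dropped.
  }
  return deltas;
}
```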

Findings from source inspection

  • No Harmony-aware decoding in @livekit/agents@1.4.4 — grep for harmony, to=functions, <|channel|>,
    <|message|> in src/ returned nothing.
  • src/inference/llm.ts parseChoice (~L615–627): delta.content falls through to emit a text chunk even
    while a tool call is mid-stream (this.toolCallId !== undefined).
  • src/voice/generation.ts (~L537–563): delta.toolCalls and delta.content are independent branches;
    delta.content is written to textWriter (→ tts_node) and appended to data.generatedText.
  • Default ttsTextTransforms (filter_markdown, filter_emoji) don't touch Harmony tokens.
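Since neither parseChoice nor the default transforms filter this residue, one client-side stopgap is to scrub it from a reply before it is spoken or persisted. The sketch below is a standalone string function; `stripHarmonyToolLeak` and `findJsonObjectEnd` are hypothetical names, not @livekit/agents APIs, and how (or whether) such a function can be plugged into ttsTextTransforms is an assumption not verified here. It also assumes the whole reply is available as one string, so applying it on the streaming path into tts_node would additionally require buffering chunks until a `to=functions.` span closes. The heuristic treats the first balanced JSON object after the marker as the leaked arguments and drops the non-whitespace run glued to its closing brace, which is where the decoded end token appeared in the sample above.

```typescript
// Hypothetical helper, not part of @livekit/agents.
const RECIPIENT_MARKER = "to=functions.";

// Return the index of the "}" that closes the JSON object opening at `open`,
// ignoring braces inside string literals; -1 if it never closes.
function findJsonObjectEnd(text: string, open: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let k = open; k < text.length; k++) {
    const ch = text[k];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return k;
    }
  }
  return -1;
}

// Heuristically remove leaked spans of the form
// "to=functions.<name> <junk>{...json...}<junk>" from a complete reply.
function stripHarmonyToolLeak(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const start = text.indexOf(RECIPIENT_MARKER, i);
    if (start === -1) {
      out += text.slice(i);
      break;
    }
    out += text.slice(i, start);
    const open = text.indexOf("{", start);
    const close = open === -1 ? -1 : findJsonObjectEnd(text, open);
    if (close === -1) break; // unterminated leak: drop the rest rather than speak it
    // Skip the non-whitespace run glued to the closing brace (decoded end token).
    let j = close + 1;
    while (j < text.length && !/\s/.test(text[j])) j++;
    i = j;
  }
  return out.trim();
}
```

Because it deletes text on a pattern match, it can over-strip if legitimate speech is glued directly to a closing brace; it masks the symptom rather than fixing the routing upstream.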

Additional information

LiveKit Cloud URL of an impacted session: https://cloud.livekit.io/projects/p_5vda1ybjh9k/sessions/RM_pfaxxUtAmJiY

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
