accumulate_delta drops tool_call fragments when one chunk has multiple entries at the same index #3203

@hrolfurinn

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When a streaming chunk contains two tool_calls entries with the same index, accumulate_delta stores them as two separate list elements instead of merging them. Subsequent chunks then merge into element [0] only, leaving element [1] orphaned. The reconstructed tool_calls[0].function.arguments is missing whichever fragment landed on [1] and is no longer parseable as JSON. I observed this in practice against vLLM with speculative decoding enabled.

The OpenAPI spec permits this shape. tool_calls is an unconstrained array, and ChatCompletionMessageToolCallChunk only requires index:

ChatCompletionStreamResponseDelta:
  properties:
    tool_calls:
      type: array
      items:
        $ref: "#/components/schemas/ChatCompletionMessageToolCallChunk"

See also #3201, which is the same issue with a narrower scope. (I can move this to a comment there if needed.)

The cause is two early-return branches at the top of the per-key loop in _deltas.py:

for key, delta_value in delta.items():
    if key not in acc:          # L8-10: key absent
        acc[key] = delta_value
        continue

    acc_value = acc[key]
    if acc_value is None:       # L13-15: value is None
        acc[key] = delta_value
        continue

Both assign the delta list wholesale and skip the index merge logic that handles duplicate indexes in steady state. So any duplicate-index entries in the very first chunk slip through unmerged.
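One possible direction for a fix is to collapse duplicate-index entries before either fast path assigns the list. The sketch below is self-contained and does not use the library's code: merge_fragment and merge_entries_by_index are hypothetical helper names, and the merge rule (concatenate strings, recurse into dicts, overwrite "index" and "type") is a simplified assumption, not necessarily what accumulate_delta does for every value type.

```python
import copy
from typing import Any


def merge_fragment(target: dict[str, Any], fragment: dict[str, Any]) -> None:
    """Fold one fragment into target in place (simplified, assumed merge rule)."""
    for key, value in fragment.items():
        if key in ("index", "type") or target.get(key) is None:
            # Identity-like keys and missing/None values are set, not appended.
            target[key] = copy.deepcopy(value)
        elif isinstance(value, str) and isinstance(target[key], str):
            # String fragments (e.g. function.arguments) are concatenated.
            target[key] = target[key] + value
        elif isinstance(value, dict) and isinstance(target[key], dict):
            merge_fragment(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def merge_entries_by_index(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list with at most one entry per "index" value."""
    merged: dict[int, dict[str, Any]] = {}
    for entry in entries:
        index = entry["index"]
        if index in merged:
            merge_fragment(merged[index], entry)
        else:
            merged[index] = copy.deepcopy(entry)
    return list(merged.values())


# The first chunk from the repro in this issue: two entries at index 0.
first_chunk = [
    {"index": 0, "id": "call_abc", "function": {"name": "get_weather"}, "type": "function"},
    {"index": 0, "function": {"arguments": '{"city"'}},
]
merged = merge_entries_by_index(first_chunk)
print(merged)
```

If something like this ran on delta_value before the `acc[key] = delta_value` assignments, the first chunk would seed a single entry at index 0 and later chunks would extend its arguments.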

_assistants.py has identical merge logic with the same bug. _convert_initial_chunk_into_snapshot in _completions.py uses "message": choice.delta.to_dict() to seed the snapshot, which copies the raw delta array without an index merge. So the bug also triggers on the very first chunk of a stream.

In the Node SDK (openai@6.36.0), ChatCompletionStream.ts (lines 316-337) doesn't have the short circuit. It always loops and dispatches by index, and produces the correct result on this input. AssistantStream.accumulateDelta (lines 213-270) has the same fast-path logic and the same bug.
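For comparison, here is a rough Python sketch of the "always loop and dispatch by index" approach described above. It is not a port of ChatCompletionStream.ts; accumulate_tool_calls and merge_fragment are hypothetical names, and the merge rule is a simplified assumption (strings concatenate, dicts recurse, "index" and "type" are overwritten).

```python
import copy
import json
from typing import Any


def merge_fragment(target: dict[str, Any], fragment: dict[str, Any]) -> None:
    """Fold one fragment into target in place (simplified, assumed merge rule)."""
    for key, value in fragment.items():
        if key in ("index", "type") or target.get(key) is None:
            target[key] = copy.deepcopy(value)
        elif isinstance(value, str) and isinstance(target[key], str):
            target[key] = target[key] + value
        elif isinstance(value, dict) and isinstance(target[key], dict):
            merge_fragment(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def accumulate_tool_calls(
    acc: list[dict[str, Any]], delta: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge every delta entry by its "index"; there is no first-chunk fast path."""
    for entry in delta:
        existing = next((e for e in acc if e["index"] == entry["index"]), None)
        if existing is None:
            acc.append(copy.deepcopy(entry))
        else:
            merge_fragment(existing, entry)
    return acc


# Same two chunks as the repro in this issue.
acc: list[dict[str, Any]] = []
accumulate_tool_calls(acc, [
    {"index": 0, "id": "call_abc", "function": {"name": "get_weather"}, "type": "function"},
    {"index": 0, "function": {"arguments": '{"city"'}},
])
accumulate_tool_calls(acc, [
    {"index": 0, "function": {"arguments": ': "London"}'}},
])

parsed = json.loads(acc[0]["function"]["arguments"])
print(parsed)
```

Because every entry goes through the same index lookup, it makes no difference whether the duplicates arrive in the first chunk or a later one.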

To Reproduce

import json
from openai.lib.streaming._deltas import accumulate_delta

acc = {"tool_calls": None}

# Simulates a first chunk with two entries at index=0
# (e.g. tool name + first argument fragment in one SSE event)
accumulate_delta(acc, {"tool_calls": [
    {"index": 0, "id": "call_abc", "function": {"name": "get_weather"}, "type": "function"},
    {"index": 0, "function": {"arguments": '{"city"'}},
]})
accumulate_delta(acc, {"tool_calls": [
    {"index": 0, "function": {"arguments": ': "London"}'}},
]})

args = acc["tool_calls"][0]["function"]["arguments"]
try:
    json.loads(args)
except json.JSONDecodeError as e:
    print("arguments not parseable:", e)
print("orphan entry:", acc["tool_calls"][1])

Expected: arguments == '{"city": "London"}', parses to {"city": "London"}.

Actual: arguments == ': "London"}'. The '{"city"' prefix ended up on acc["tool_calls"][1].

Code snippets

OS

macOS

Python version

Python v3.14.2

Library version

openai v2.35.0

Labels

bug