LiteLLMBackend: mot._thinking not populated for vLLM thinking models #1070

@planetf1

Description

Observed behaviour

When using LiteLLMBackend to route to a vLLM-served thinking model (e.g. Qwen3 with --reasoning-parser qwen3), mot._thinking is never populated — the reasoning trace is silently discarded.

Expected behaviour

mot._thinking should contain the model's reasoning trace, consistent with the behaviour of OpenAIBackend after #1063.

Root cause

vLLM surfaces the reasoning trace under the raw key "reasoning" in the message dict, not "reasoning_content". LiteLLM's normalisation layer (the text_choices["reasoning_content"] = delta.get("reasoning_content") assignment in utils.py) only reads reasoning_content from the wire and never looks at reasoning, so the field is dropped before the response ever reaches Mellea's LiteLLMBackend.processing().

As a result, message.reasoning_content is None and mot._thinking stays "".

This differs from the OpenAIBackend fix in #1063: there the raw openai SDK object preserved the extra field in model_extra, so a fallback probe was possible. With LiteLLM the field is lost earlier in the stack.
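The mismatch can be seen by calling LiteLLM directly against the vLLM server, bypassing Mellea entirely. The sketch below is illustrative only; the model name, port, and prompt are placeholders, and the getattr probe simply reports whatever LiteLLM's normalised message carries:

```python
import litellm

# vLLM started with a reasoning parser, e.g.:
#   vllm serve Qwen/Qwen3-8B --reasoning-parser qwen3
resp = litellm.completion(
    model="openai/Qwen/Qwen3-8B",         # openai/ prefix -> OpenAI-compatible vLLM endpoint
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

msg = resp.choices[0].message
# vLLM put the trace under the raw key "reasoning", but LiteLLM's
# normalisation only copies "reasoning_content", so nothing survives here:
print(getattr(msg, "reasoning_content", None))   # -> None
print(msg.content)                                # final answer is intact
```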

Reproducer

Route a vLLM thinking model through LiteLLM (e.g. using the openai/ provider prefix pointing at a vLLM server with --reasoning-parser qwen3) and observe that mot._thinking is "" after generation completes.
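A pseudocode-style sketch of those steps at the Mellea level is below. The import paths, constructor arguments, and instruct call are assumptions written from memory rather than copied from the current Mellea API, so adjust them to match the real session/backend setup:

```python
# Assumed Mellea API -- treat every name below as a placeholder.
from mellea import MelleaSession
from mellea.backends.litellm import LiteLLMBackend

# vLLM started separately with:
#   vllm serve Qwen/Qwen3-8B --reasoning-parser qwen3
backend = LiteLLMBackend(
    model_id="openai/Qwen/Qwen3-8B",      # openai/ provider prefix pointing at vLLM (assumed arg name)
    base_url="http://localhost:8000/v1",  # assumed arg name
)
m = MelleaSession(backend)

mot = m.instruct("Explain, step by step, why the sky is blue.")
print(repr(mot._thinking))  # observed: "" -- the reasoning trace never arrives
```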
