Observed behaviour
When using `LiteLLMBackend` to route to a vLLM-served thinking model (e.g. Qwen3 with `--reasoning-parser qwen3`), `mot._thinking` is never populated — the reasoning trace is silently discarded.
Expected behaviour
`mot._thinking` should contain the model's reasoning trace, consistent with the behaviour of `OpenAIBackend` after #1063.
Root cause
vLLM surfaces the reasoning trace under the raw key `"reasoning"` in the message dict, not `"reasoning_content"`. LiteLLM's normalisation layer (`utils.py`, `text_choices["reasoning_content"] = delta.get("reasoning_content")`) only reads `reasoning_content` from the wire — it never reads `reasoning`. The field is dropped before the response reaches Mellea's `LiteLLMBackend.processing()`.
As a result, `message.reasoning_content` is `None` and `mot._thinking` stays `""`.
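The mismatch can be illustrated with the message shapes involved. The values below are purely illustrative; the exact wire format depends on the vLLM and LiteLLM versions in use.

```python
# Illustrative only -- field values are made up, shapes follow the description above.

# Message dict as vLLM puts it on the wire for a thinking model:
vllm_message = {
    "role": "assistant",
    "content": "The answer is 408.",
    "reasoning": "First multiply 17 by 24 ...",  # raw key used by vLLM
}

# LiteLLM's normalisation only consults "reasoning_content", so the trace is lost:
normalised_reasoning = vllm_message.get("reasoning_content")  # -> None
```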
This differs from the `OpenAIBackend` fix in #1063: there the raw `openai` SDK object preserved the extra field in `model_extra`, so a fallback probe was possible. With LiteLLM the field is lost earlier in the stack.
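Because the drop happens inside LiteLLM's normalisation rather than in Mellea, a `model_extra`-style probe on the Mellea side has nothing left to read; any fix would presumably have to land in LiteLLM itself. A hypothetical sketch of what such a fallback could look like, reusing the names from the line quoted under "Root cause" (this is not the actual upstream code):

```python
# Hypothetical shape of a fix in LiteLLM's normalisation layer, not actual upstream code.
# `delta` is the provider message/delta dict from the line quoted under "Root cause".

def _extract_reasoning(delta: dict) -> str | None:
    # Prefer LiteLLM's canonical key, but fall back to the raw "reasoning"
    # key that vLLM emits for thinking models.
    return delta.get("reasoning_content") or delta.get("reasoning")

# i.e. roughly: text_choices["reasoning_content"] = _extract_reasoning(delta)
```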
Reproducer
Route a vLLM thinking model through LiteLLM (e.g. using the `openai/` provider prefix pointing at a vLLM server with `--reasoning-parser qwen3`) and observe that `mot._thinking` is `""` after generation completes.
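A minimal sketch of that reproduction one layer down, at the LiteLLM call itself (model name, host, port and prompt are placeholders, and `reasoning_content` is probed defensively since its presence varies by LiteLLM version):

```python
# Sketch of the reproduction, assuming a local vLLM server started along the lines of:
#   vllm serve Qwen/Qwen3-8B --reasoning-parser qwen3
# Model name, host, port and prompt are placeholders.
import litellm

response = litellm.completion(
    model="openai/Qwen/Qwen3-8B",        # openai/ provider prefix routed to vLLM
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)

message = response.choices[0].message
print(message.content)                              # final answer comes through
print(getattr(message, "reasoning_content", None))  # None: the raw "reasoning" key was dropped

# Routing the same call through Mellea's LiteLLMBackend therefore leaves mot._thinking == "".
```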