## Observed behaviour
When using `OpenAIBackend` with a vLLM-served thinking model (Qwen3 with `--reasoning-parser qwen3`), the thinking trace is silently discarded. `mot._thinking` remains `None` even when the model has produced reasoning content.
## Root cause
In `mellea/backends/openai.py`, `processing()` probes for the thinking trace as:

```python
if hasattr(message, "reasoning_content"):
    thinking_chunk = message.reasoning_content
```
However, vLLM 0.20.2 does not expose this as a Python attribute on the openai SDK message object. The thinking trace is present in the raw response dict under the key `"reasoning"`, only visible via `model_dump()`:

```python
resp.choices[0].model_dump()
# {
#   "message": {
#     "content": null,
#     "reasoning": "2 + 2 equals 4.",  # ← here, not 'reasoning_content'
#     ...
#   }
# }
```
`hasattr(message, "reasoning_content")` returns `False`, so `mot._thinking` is never set. The thinking trace is present in `mot._meta["oai_chat_response"]` but never surfaced.
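As a stopgap, the trace can be dug out of the stored raw response by hand. The exact shape of `mot._meta["oai_chat_response"]` is an assumption here; the simulated dict below just mirrors the `model_dump()` output shown above:

```python
# Simulated mot._meta, mirroring the model_dump() shape above;
# the precise structure of the stored response is an assumption.
meta = {
    "oai_chat_response": {
        "choices": [
            {"message": {"content": None, "reasoning": "2 + 2 equals 4."}}
        ]
    }
}

message = meta["oai_chat_response"]["choices"][0]["message"]
thinking = message.get("reasoning")  # None if the server sent no trace
print(thinking)  # 2 + 2 equals 4.
```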
## Expected behaviour
`processing()` should also check the raw response dict for a `"reasoning"` key (vLLM's field name) when `reasoning_content` is absent, and populate `mot._thinking` accordingly.
This is independent of the silent-empty-string issue (#1060) — even when a model produces both text and reasoning content, the thinking trace is currently lost for vLLM-served models.
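A minimal sketch of that fallback logic. The helper name and the `SimpleNamespace` stand-in are illustrative only, not the actual `processing()` code:

```python
from types import SimpleNamespace


def extract_thinking(message, raw_message: dict):
    """Prefer the SDK attribute; fall back to vLLM's raw-dict key."""
    # OpenAI-style field, when the SDK exposes it as an attribute
    thinking = getattr(message, "reasoning_content", None)
    if thinking is None:
        # vLLM 0.20.2 puts the trace under "reasoning" in the raw dict
        thinking = raw_message.get("reasoning")
    return thinking


# Simulated vLLM case: no reasoning_content attribute, key only in the raw dict
msg = SimpleNamespace(content=None)
raw = {"content": None, "reasoning": "2 + 2 equals 4."}
assert extract_thinking(msg, raw) == "2 + 2 equals 4."
```

Checking the attribute first keeps the current behaviour for servers that do expose `reasoning_content`, so the dict lookup only ever runs as a fallback.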
## Environment
- Backend: `OpenAIBackend` (OpenAI-compatible)
- Model: `Qwen/Qwen3-Coder-Next-FP8`
- Inference server: vLLM 0.20.2, flags: `--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser hermes --tensor-parallel-size 2 --quantization fp8 --max-model-len 262144`
- Hardware: 2× GPU (tensor parallel), IBM LSF cluster