improvement: stream reasoning-content in <thinking> tags#593
Conversation
…h_reason The OpenAI-compat streaming response carries chain-of-thought tokens under delta.reasoning_content (R1/o1-style models). Without rendering, the user sees a long pause then the final answer. Wrap the reasoning chunks in <thinking>…</thinking> so the CLI mirrors the assistant's output shape. Also surface finish_reason at the end so 'length' or 'content_filter' truncations are visible.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a5557d39f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| reasoning = msg.get("reasoning_content") or "" | ||
| finish_reason = choices[0].get("finish_reason") | ||
| if reasoning: | ||
| console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]") |
There was a problem hiding this comment.
Disable Rich markup when emitting reasoning_content
Printing reasoning with console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]") causes Rich to parse any [...] sequences inside model output as markup, which can drop or alter text (for example markdown links or bracketed code snippets). In non-streaming mode this corrupts the visible reasoning output and makes it unreliable for debugging; emit the text with markup=False (or write raw to stdout) so the response is preserved verbatim.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1a5557d. Configure here.
| reasoning = msg.get("reasoning_content") or "" | ||
| finish_reason = choices[0].get("finish_reason") | ||
| if reasoning: | ||
| console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]") |
There was a problem hiding this comment.
Rich markup injection corrupts model reasoning output
High Severity
The reasoning content from the model is interpolated directly into a Rich markup string via console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]"). Rich interprets square brackets as markup tags, so any brackets in the reasoning (array indices like [0], citations like [1], math like [a, b]) will be parsed as styling commands, causing text to silently disappear or produce rendering errors. The rest of the codebase uses markup=False when printing dynamic content (e.g., output_data_as_json).
Reviewed by Cursor Bugbot for commit 1a5557d. Configure here.


Stream reasoning-content tokens (R1/o1-style models) inside
<thinking>tags so the CLI doesn't appear to hang, and surfacefinish_reasonat the end solength/content_filtertruncations are visible.Note
Low Risk
Low risk CLI-only output changes; main risk is minor formatting/streaming regressions when handling mixed
reasoning_contentandcontentchunks.Overview
Improves
prime inference chatoutput to surface model reasoning tokens by streamingreasoning_contentinside<thinking>tags (both streaming and non-streaming responses) so the CLI shows progress even before final content.Tracks
finish_reasonduring generation and prints a warning when the response ends due tolengthwithout producing finalcontent, prompting users to increase--max-tokens.Reviewed by Cursor Bugbot for commit 1a5557d. Bugbot is set up for automated code reviews on this repo. Configure here.