Skip to content

improvement: stream reasoning-content in <thinking> tags#593

Open
JannikSt wants to merge 1 commit into
mainfrom
improvement/inference-reasoning-content-streaming
Open

improvement: stream reasoning-content in <thinking> tags#593
JannikSt wants to merge 1 commit into
mainfrom
improvement/inference-reasoning-content-streaming

Conversation

@JannikSt
Copy link
Copy Markdown
Member

@JannikSt JannikSt commented May 3, 2026

Stream reasoning-content tokens (R1/o1-style models) inside <thinking> tags so the CLI doesn't appear to hang, and surface finish_reason at the end so length / content_filter truncations are visible.


Note

Low Risk
Low risk CLI-only output changes; main risk is minor formatting/streaming regressions when handling mixed reasoning_content and content chunks.

Overview
Improves prime inference chat output to surface model reasoning tokens by streaming reasoning_content inside <thinking> tags (both streaming and non-streaming responses) so the CLI shows progress even before final content.

Tracks finish_reason during generation and prints a warning when the response ends due to length without producing final content, prompting users to increase --max-tokens.

Reviewed by Cursor Bugbot for commit 1a5557d. Bugbot is set up for automated code reviews on this repo. Configure here.

…h_reason

The OpenAI-compat streaming response carries chain-of-thought tokens
under delta.reasoning_content (R1/o1-style models). Without rendering,
the user sees a long pause then the final answer. Wrap the reasoning
chunks in <thinking>…</thinking> so the CLI mirrors the assistant's
output shape. Also surface finish_reason at the end so 'length' or
'content_filter' truncations are visible.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a5557d39f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

reasoning = msg.get("reasoning_content") or ""
finish_reason = choices[0].get("finish_reason")
if reasoning:
console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Disable Rich markup when emitting reasoning_content

Printing reasoning with console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]") causes Rich to parse any [...] sequences inside model output as markup, which can drop or alter text (for example markdown links or bracketed code snippets). In non-streaming mode this corrupts the visible reasoning output and makes it unreliable for debugging; emit the text with markup=False (or write raw to stdout) so the response is preserved verbatim.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1a5557d. Configure here.

reasoning = msg.get("reasoning_content") or ""
finish_reason = choices[0].get("finish_reason")
if reasoning:
console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rich markup injection corrupts model reasoning output

High Severity

The reasoning content from the model is interpolated directly into a Rich markup string via console.print(f"[dim]<thinking>\n{reasoning}\n</thinking>[/dim]"). Rich interprets square brackets as markup tags, so any brackets in the reasoning (array indices like [0], citations like [1], math like [a, b]) will be parsed as styling commands, causing text to silently disappear or produce rendering errors. The rest of the codebase uses markup=False when printing dynamic content (e.g., output_data_as_json).

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1a5557d. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant