fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2876
Conversation
…(Sentry TAURI-RUST-4KB) Resolves Sentry issue 5236 (TAURI-RUST-4KB): https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/ DeepSeek's thinking mode returns `reasoning_content` alongside `tool_calls` and requires that reasoning to be replayed on the follow-up request. Our OpenAI-compatible provider dropped it: `ChatResponse`, the assistant history JSON, and the `NativeMessage` wire type had no carrier for `reasoning_content`, so the next request omitted it and DeepSeek returned: 400 Bad Request: The `reasoning_content` in the thinking mode must be passed back to the API. The agent loop (`run_chat_task`) then failed every multi-turn tool call against deepseek-reasoner (31 events since v0.56.0). Fix: round-trip `reasoning_content` for tool-call assistant turns across all four layers — - `ChatResponse.reasoning_content` (captured in `parse_native_response` and `chat_with_tools`, trimmed; empty -> None) - `build_native_assistant_history` writes it into the assistant history JSON (omitted when empty) - `convert_messages_for_native` lifts it back onto the wire message - `NativeMessage.reasoning_content` serializes only when present Because the field is `skip_serializing_if = Option::is_none` and only populated for reasoning models, non-reasoning providers see zero change on the wire. Tests: provider capture (`parse_native_response_captures_reasoning_content`, blank -> None), wire round-trip (`convert_preserves/omits_reasoning_content`), and history-builder round-trip in `parse_tests`.
…Response initializers The new `ChatResponse.reasoning_content` field was added to every `src/` initializer but the `tests/calendar_grounding_e2e.rs` integration test was missed, so the test build failed to compile (error[E0063]: missing field `reasoning_content`). That broke the Rust Core Tests + Quality, Rust Core Coverage, and Linux Rust integration-suite checks on this PR. Set the field to None at both mock-provider initializers; `cargo test --no-run` now compiles all test targets cleanly.
…ontent Resolved conflicts in: - inference/provider/traits.rs: doc comment wording (took main's) - inference/provider/compatible_types.rs: doc comment wording (took main's) - inference/provider/compatible.rs: combined both storage approaches — prefer JSON-content (tool_loop path) or fall back to extra_metadata (session-turn path), so both replay paths work correctly - agent/dispatcher_tests.rs: indentation (took main's) - agent/harness/session/turn_tests.rs: indentation (took main's) - agent/tests.rs: indentation (took main's)
Both the PR and main added parse_native_response_captures_reasoning_content testing different code paths. Rename the second one (non-streaming API response path) to avoid the duplicate symbol compile error.
The two tests added in this PR (parse_native_response_captures_reasoning_content and parse_native_response_blank_reasoning_is_none) expected the field to be normalised: trimmed and empty-after-trim → None. The implementation was cloning the raw value verbatim, so whitespace-padded strings weren't trimmed and whitespace-only strings weren't collapsed to None. Apply the same `.as_deref().map(str::trim).filter(!s.is_empty()).map(str::to_owned)` pattern already used in the chat_with_tools and streaming paths.
…st-4kb-deepseek-reasoning-content
|
Warning Review limit reached
More reviews will be available in 53 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
Comment |
Summary
Rebased and fixed version of #2817 by @CodeGhost21.
What changed from #2817:
parse_native_response_captures_reasoning_contentandparse_native_response_blank_reasoning_is_none). The tests correctly expectedreasoning_contentto be trimmed and whitespace-only values to beNone, but the implementation was cloning the raw untrimmed string. Fixed with the same.as_deref().map(str::trim).filter(!s.is_empty()).map(str::to_owned)pattern used in thechat_with_toolsand streaming paths.upstream/mainto resolve the stale base (includes fix(inference): preserve reasoning_content in multi-turn thinking model conversations #2818, fix(observability): classify list_models 404 as ProviderUserState (Sentry TAURI-RUST-YJ) #2873, fix(cron): accept bare cron-expression string in Schedule deserializer (Sentry CORE-RUST-FY) #2874, etc.).Note on supersede claim: Despite the comment on #2817 claiming it was superseded by #2818, that is not accurate. PR #2818 (already on main) fixes the main session-turn path via
extra_metadatainturn.rs. PR #2817 fixes two separate paths:tool_loop.rs— passesreasoning_contenttobuild_native_assistant_historysubagent_runner/ops.rs— samechat_with_tools— returns actualreasoning_contentinstead ofNoneconvert_messages_for_native— liftsreasoning_contentfrom JSON content (with fallback toextra_metadata)These are complementary to #2818, not redundant with it. On current
main, multi-turn reasoning model tool calls via the tool loop still fail becausebuild_native_assistant_historydoesn't embedreasoning_content.Closes #2817.
Original PR description from @CodeGhost21:
reasoning_contenton the follow-up request.reasoning_contentfor tool-call assistant turns through all four layers of the OpenAI-compatible inference path.skip_serializing_if = Option::is_noneso non-reasoning providers see zero change on the wire.Test plan
cargo test --lib inference::provider::compatible::tests::parse_native_response— 7 passed, 0 failed (includes the 2 previously-failing tests)cargo test --lib "reasoning"(26 tests),cargo test --lib "agent::"(890 tests),cargo test --lib "inference::provider::"(316 tests) — all passcargo fmt --checkclean