fix(agent): replay reasoning_content across native tool-call turns to prevent DeepSeek 400s#2918
Conversation
…pdate related functions
…te serialization logic
📝 WalkthroughWalkthroughAdds an optional reasoning_content: Option to AssistantToolCalls, captures it from provider responses, persists it in turns, serializes it into dispatcher-emitted provider JSON when present, updates token accounting, and adjusts tests/fixtures and pattern matches for compatibility. ChangesReasoning Content Propagation
Sequence DiagramsequenceDiagram
participant Provider
participant Turn
participant History
participant Dispatcher
Provider->>Turn: response with reasoning_content
Turn->>History: construct AssistantToolCalls with reasoning_content
History->>Dispatcher: serialize message for provider
Dispatcher->>Provider: send JSON with reasoning_content field
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/harness/token_budget.rs`:
- Around line 36-38: The AssistantToolCalls arm in token_budget.rs currently
ignores the new reasoning_content field; update the pattern matching for
ConversationMessage::AssistantToolCalls to capture reasoning_content and include
its token count when estimating tokens for that message in
trim_conversation_history_to_budget by calling
model_token_estimator.tokens_for_text(&reasoning_content) (or the project's
equivalent tokens_for_text method) and adding it to the existing text/tool_calls
token sum so large reasoning payloads are counted in the budget.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e24d24cf-19fb-40d4-8b6c-d8892c85a83c
📒 Files selected for processing (14)
src/openhuman/agent/dispatcher.rssrc/openhuman/agent/dispatcher_tests.rssrc/openhuman/agent/harness/session/runtime_tests.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/turn_tests.rssrc/openhuman/agent/harness/token_budget.rssrc/openhuman/agent/tests.rssrc/openhuman/context/manager_tests.rssrc/openhuman/context/microcompact.rssrc/openhuman/context/pipeline.rssrc/openhuman/context/segment_recap_summarizer.rssrc/openhuman/context/summarizer.rssrc/openhuman/context/summarizer_tests.rssrc/openhuman/inference/provider/traits.rs
… AssistantToolCalls
graycyrus
left a comment
There was a problem hiding this comment.
@YellowSnnowmann this is a solid fix for the DeepSeek 400s — the approach is right and the backward compat via serde default is the correct call. CI is still pending so holding off on approving until those pass.
One thing to check in the meantime: in NativeToolDispatcher::to_provider_messages, reasoning_content is included directly in serde_json::json!, which means when it's None, the serialized assistant payload includes "reasoning_content": null rather than omitting the key entirely. For DeepSeek and other thinking-capable providers that's fine since the session had no reasoning to replay. But other providers routing through NativeToolDispatcher will now receive a field they've never seen before. Most JSON APIs tolerate unexpected null fields, but worth verifying against each provider's schema — or conditionally including the key only when reasoning_content is Some.
Fix CI and i'll approve.
…e reasoning_content in output and add test for omission
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/agent/dispatcher_tests.rs (1)
436-452: 🏗️ Heavy liftSplit this test module to stay within the repo’s Rust file-size guideline.
This file is now ~545 lines, and this new test adds more surface to an already oversized module. Please extract related test groups (for example, reasoning-content replay/omission cases) into dedicated sibling test files to keep this module maintainable.
As per coding guidelines, "
**/*.{ts,tsx,rs}: Keep React component files and Rust modules at ≤ ~500 lines; split growing modules into multiple files" and "{src,app/src,app/src-tauri}/**/*.{rs,ts,tsx}: File size: prefer ≤ ~500 lines per source file; split modules when growing to maintain readability and single responsibility".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/dispatcher_tests.rs` around lines 436 - 452, The test module has grown over the file-size guideline; extract the reasoning-content tests into a new sibling test file: create a new tests module (e.g., a dispatcher_reasoning tests file) and move native_dispatcher_omits_reasoning_content_when_absent and any paired replay/omission tests there, keeping the exact test names (native_dispatcher_omits_reasoning_content_when_absent) and references to NativeToolDispatcher and to_provider_messages; ensure the new file imports or re-exports required helpers (assistant_tool_calls, tool_results) and types (NativeToolDispatcher) from the original module (make helpers pub(crate) or add appropriate use crate::... paths) so the tests compile, and update module declarations if needed so the test suite still runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/agent/dispatcher_tests.rs`:
- Around line 436-452: The test module has grown over the file-size guideline;
extract the reasoning-content tests into a new sibling test file: create a new
tests module (e.g., a dispatcher_reasoning tests file) and move
native_dispatcher_omits_reasoning_content_when_absent and any paired
replay/omission tests there, keeping the exact test names
(native_dispatcher_omits_reasoning_content_when_absent) and references to
NativeToolDispatcher and to_provider_messages; ensure the new file imports or
re-exports required helpers (assistant_tool_calls, tool_results) and types
(NativeToolDispatcher) from the original module (make helpers pub(crate) or add
appropriate use crate::... paths) so the tests compile, and update module
declarations if needed so the test suite still runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1f773ed6-c38b-4f11-82c6-339e0e52885c
📒 Files selected for processing (2)
src/openhuman/agent/dispatcher.rssrc/openhuman/agent/dispatcher_tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/agent/dispatcher.rs
graycyrus
left a comment
There was a problem hiding this comment.
@YellowSnnowmann hey! previous changes are addressed — the conditional reasoning_content insertion landed in 5c3079b exactly as requested. code looks good to me, but there are still some CI checks pending (Windows Rust tests, E2E suites, Tauri build). once those are green, i'll come back and approve this.
oxoxDev
left a comment
There was a problem hiding this comment.
LGTM. Clean, correctly-scoped fix for TAURI-RUST-4K9 — reasoning_content is persisted on AssistantToolCalls and replayed so DeepSeek thinking-mode follow-ups stop 400ing.
Verified:
#[serde(default, skip_serializing_if = "Option::is_none")]keeps old transcripts deserializing (None) and never emitsnull— the exact DeepSeek-reject case @graycyrus flagged, fixed via the conditionalpayload["reasoning_content"]insert in commit 5c3079b.turn.rstrims + drops empty reasoning before storing.token_budgetnow counts reasoning tokens, so the budget estimate doesn't under-count.- Good coverage: serde roundtrip, serialize-present, omit-absent, token estimate.
Minor (optional, non-blocking)
reasoning_content is replayed for every past AssistantToolCalls in history, but DeepSeek only needs it on the most recent pre-tool-call turn. Long thinking-mode sessions will carry every turn's reasoning blob forward — it's budget-counted and microcompact/summarizer will trim it, so it's bounded, but if token cost shows up consider replaying only the latest turn's reasoning. Approving as-is.
|
Merged ✅ — thanks @graycyrus for the close review here, especially catching the CI was fully green and the change reviewed sane on a re-read — backward-compatible |
Summary
Problem
Solution
Submission Checklist
If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.
.github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.docs/TEST-COVERAGE-MATRIX.mdreflect this change (or N/A: behaviour-only change)docs/RELEASE-MANUAL-SMOKE.md)Impact
Related
Summary by CodeRabbit
New Features
Testing