Python: Fix reasoning replay when store=False #5250

Queued
eavanvalkenburg wants to merge 5 commits into microsoft:main from eavanvalkenburg:fix_reasoning_content_no_store

Conversation

@eavanvalkenburg
Member

Motivation and Context

Reasoning models can fail when OpenAIChatClient-based clients run with store=False and replay local tool-loop history. In that mode the Responses API cannot resolve response-scoped reasoning items from local history, which breaks scenarios like the Foundry suspend/resume sample.

Description

This updates the shared Responses serialization path to distinguish service-side storage from local storage when replaying prior messages. Reasoning items are now only sent when the request is continuing service-side storage, while local-storage replay keeps the existing function-call replay behavior without reusing service-scoped reasoning items. The change also adds regression coverage for the stateless reasoning/tool-loop path and a sample that reproduces the original Foundry scenario.
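
The gating described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `_chat_client.py`: the helper names `uses_service_side_storage` and `filter_replay_items` are hypothetical, the option keys are the ones named in the review discussion on this PR, and history items are modeled as plain dicts with a `type` field.

```python
from typing import Any, Mapping, Sequence

# Option keys that signal continuation of service-side storage.
# Assumption: these mirror the keys named in the review discussion.
SERVICE_STORAGE_KEYS = ("conversation_id", "previous_response_id", "conversation")


def uses_service_side_storage(options: Mapping[str, Any]) -> bool:
    """Return True when the request continues a conversation held by the service."""
    for key in SERVICE_STORAGE_KEYS:
        value = options.get(key)
        if isinstance(value, str) and value:
            return True
    return False


def filter_replay_items(
    items: Sequence[Mapping[str, Any]], options: Mapping[str, Any]
) -> list[Mapping[str, Any]]:
    """Drop reasoning items when replaying purely local history.

    Function calls and their outputs are always kept, so the tool loop
    can still be replayed without service-side state.
    """
    if uses_service_side_storage(options):
        return list(items)
    return [item for item in items if item.get("type") != "reasoning"]
```

With `{"store": False}` and no continuation key, the reasoning item is dropped and only the function call and its output are replayed. Adding a `conversation_id` keeps the full history intact.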

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

eavanvalkenburg and others added 2 commits April 14, 2026 15:31
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 13:42
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes failures in stateless (store=False) reasoning/tool-loop replay for OpenAIChatClient by preventing replay of service-scoped reasoning items, and adds regression coverage plus a sample reproducer for the Foundry suspend/resume scenario.

Changes:

  • Distinguish “service-side storage continuation” vs “local history replay” when serializing Responses input, omitting reasoning items for stateless replay.
  • Add regression tests covering tool-loop replay and _prepare_options behavior for store=False with/without a conversation/previous-response identifier.
  • Add a Python sample demonstrating suspend/resume using local session/history.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| python/samples/02-agents/conversations/suspend_resume_local_session.py | Adds a suspend/resume sample intended to reproduce the Foundry local-session scenario. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Adds regression tests ensuring reasoning items are omitted for stateless replay, with coverage for conversation-id continuation. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Updates request serialization to omit service-scoped reasoning during local replay while preserving function-call replay. |

@github-actions github-actions bot changed the title from "Fix reasoning replay when store=False" to "Python: Fix reasoning replay when store=False" on Apr 14, 2026
@moonbox3
Contributor

moonbox3 commented Apr 14, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/openai/agent_framework_openai/_chat_client.py | 870 | 123 | 85% | 522–525, 529–530, 536–537, 547–548, 555, 570–576, 597, 605, 628, 746, 845, 904, 906, 908, 910, 976, 990, 1070, 1080, 1085, 1128, 1244, 1425, 1430, 1434–1436, 1440–1441, 1507, 1536, 1542, 1552, 1558, 1563, 1569, 1574–1575, 1636, 1658–1659, 1674–1675, 1693–1694, 1737, 1900, 1938–1939, 1955, 1957, 2036–2044, 2074, 2181, 2216, 2231, 2251–2261, 2274, 2285–2289, 2303, 2317–2328, 2337, 2369–2372, 2380–2381, 2383–2385, 2399–2401, 2411–2412, 2418, 2433 |
| TOTAL | 27547 | 3193 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 5574 | 20 💤 | 0 ❌ | 0 🔥 | 1m 29s ⏱️ |

Contributor

@chetantoshniwal chetantoshniwal left a comment

Automated Code Review

Reviewers: 4 | Confidence: 83%

✗ Correctness

The PR refactors reasoning-item handling to correctly omit response-scoped reasoning items (rs_*) when replaying tool loops without service-side storage. The core logic is sound: it computes request_uses_service_side_storage by checking for conversation_id, previous_response_id, or conversation keys, then threads that flag through message preparation to skip reasoning items in stateless replay. However, the PR accidentally includes three .worktrees/ submodule entries that should not be committed, and the conversation key check only matches strings, which could miss non-string conversation objects from the OpenAI SDK.

✗ Security Reliability

The core logic change—conditionally omitting response-scoped reasoning items when the request does not use service-side storage—is sound and well-tested. The Message.additional_properties attribute is always initialized to a dict, so the removal of the defensive getattr guard is safe. However, the PR accidentally includes three .worktrees/ gitlink entries that are local development artifacts and should not be committed. Additionally, the isinstance(value, str) type guard for the conversation options key may silently miss non-string conversation identifiers (e.g., objects), causing reasoning items to be incorrectly omitted in that path.

✓ Test Coverage

The PR adds three new tests covering the main behavioral change (reasoning items omitted for stateless replay, kept when conversation_id is present, and an integration test for the full tool loop). These tests adequately cover the core happy paths. However, there are gaps: (1) no test for the _attribution override path where replays_local_storage=True causes reasoning items to be omitted even when request_uses_service_side_storage=True, (2) no tests for previous_response_id or conversation as the trigger key (only conversation_id is tested), and (3) the integration test doesn't assert that the fc_id from additional_properties is properly used (vs call_id) in the stateless path, which is the other half of the replays_local_storage logic.
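
One way to close gap (2) is a table-driven check over every trigger key. The sketch below is illustrative only: `keeps_reasoning` is a hypothetical stand-in for the serializer's decision rather than the real `_prepare_options`, and the identifier values are made up.

```python
from typing import Any, Mapping


def keeps_reasoning(options: Mapping[str, Any]) -> bool:
    """Hypothetical stand-in: True when reasoning items would be replayed."""
    keys = ("conversation_id", "previous_response_id", "conversation")
    return any(isinstance(options.get(k), str) and bool(options.get(k)) for k in keys)


# Each case: (label, request options, whether reasoning items should be replayed).
CASES = [
    ("stateless", {"store": False}, False),
    ("conversation_id", {"store": False, "conversation_id": "conv_123"}, True),
    ("previous_response_id", {"store": False, "previous_response_id": "resp_123"}, True),
    ("conversation", {"store": False, "conversation": "conv_123"}, True),
]


def run_cases() -> dict[str, bool]:
    """Return a pass/fail flag per case so a failing key is easy to spot."""
    return {
        label: keeps_reasoning(options) == expected
        for label, options, expected in CASES
    }
```

In the real test module the same table could drive a parametrized test against the actual client instead of the stand-in.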

✓ Design Approach

The PR correctly identifies that response-scoped reasoning item IDs (rs_*) are only valid within a live service-managed context and must be omitted when replaying conversation history stateless (store=False without a prior conversation reference). The request-level flag propagation through _prepare_options → _prepare_messages_for_openai → _prepare_message_for_openai → _prepare_content_for_openai is consistent with how the tool loop works: with store=False, no conversation_id is ever injected into mutable_options, so prepped_messages always contains the full history and the flag correctly gates reasoning item inclusion. One minor issue: the isinstance(value, str) check for the 'conversation' key is overly narrow — while ChatOptions only defines conversation_id (a string), raw caller dicts could in theory include a non-string conversation value (e.g. an OpenAI Conversation object), and the check would silently miss it. Additionally, committing .worktrees/ git-submodule entries is an accidental inclusion that should not be in this PR.
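
The concern about the narrow `isinstance(value, str)` check can be illustrated with a small sketch. Everything here is an assumption made for illustration: `FakeConversation` is a hypothetical stand-in for an SDK conversation object, and `narrow_check` and `broad_check` are not functions from the PR.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeConversation:
    """Hypothetical stand-in for a non-string conversation object."""

    id: str


def narrow_check(options: Mapping[str, Any]) -> bool:
    """Only recognizes a non-empty string under the 'conversation' key."""
    value = options.get("conversation")
    return isinstance(value, str) and bool(value)


def broad_check(options: Mapping[str, Any]) -> bool:
    """Also accepts a mapping or object that carries a non-empty 'id'."""
    value = options.get("conversation")
    if isinstance(value, str):
        return bool(value)
    if isinstance(value, Mapping):
        return bool(value.get("id"))
    return bool(getattr(value, "id", None))
```

With `{"conversation": FakeConversation("conv_1")}`, `narrow_check` returns `False`, so reasoning items would be dropped even though the request continues service-side storage, while `broad_check` returns `True`. Whether the broader check is worth the extra surface depends on which input shapes the client intends to support.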


Automated review by chetantoshniwal's agents

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Apr 14, 2026