Python: Fix reasoning replay when store=False#5250
Python: Fix reasoning replay when store=False#5250eavanvalkenburg wants to merge 5 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes failures in stateless (store=False) reasoning/tool-loop replay for OpenAIChatClient by preventing replay of service-scoped reasoning items, and adds regression coverage plus a sample reproducer for the Foundry suspend/resume scenario.
Changes:
- Distinguish “service-side storage continuation” vs “local history replay” when serializing Responses input, omitting reasoning items for stateless replay.
- Add regression tests covering tool-loop replay and
_prepare_optionsbehavior forstore=Falsewith/without a conversation/previous-response identifier. - Add a Python sample demonstrating suspend/resume using local session/history.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| python/samples/02-agents/conversations/suspend_resume_local_session.py | Adds a suspend/resume sample intended to reproduce the Foundry local-session scenario. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Adds regression tests ensuring reasoning items are omitted for stateless replay, with coverage for conversation-id continuation. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Updates request serialization to omit service-scoped reasoning during local replay while preserving function-call replay. |
python/samples/02-agents/conversations/suspend_resume_local_session.py
Outdated
Show resolved
Hide resolved
python/samples/02-agents/conversations/suspend_resume_local_session.py
Outdated
Show resolved
Hide resolved
python/samples/02-agents/conversations/suspend_resume_local_session.py
Outdated
Show resolved
Hide resolved
python/samples/02-agents/conversations/suspend_resume_local_session.py
Outdated
Show resolved
Hide resolved
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
chetantoshniwal
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 83%
✗ Correctness
The PR refactors reasoning-item handling to correctly omit response-scoped reasoning items (rs_*) when replaying tool loops without service-side storage. The core logic is sound: it computes
request_uses_service_side_storageby checking forconversation_id,previous_response_id, orconversationkeys, then threads that flag through message preparation to skip reasoning items in stateless replay. However, the PR accidentally includes three.worktrees/submodule entries that should not be committed, and theconversationkey check only matches strings, which could miss non-string conversation objects from the OpenAI SDK.
✗ Security Reliability
The core logic change—conditionally omitting response-scoped reasoning items when the request does not use service-side storage—is sound and well-tested. The
Message.additional_propertiesattribute is always initialized to a dict, so the removal of the defensivegetattrguard is safe. However, the PR accidentally includes three.worktrees/gitlink entries that are local development artifacts and should not be committed. Additionally, theisinstance(value, str)type guard for theconversationoptions key may silently miss non-string conversation identifiers (e.g., objects), causing reasoning items to be incorrectly omitted in that path.
✓ Test Coverage
The PR adds three new tests covering the main behavioral change (reasoning items omitted for stateless replay, kept when conversation_id is present, and an integration test for the full tool loop). These tests adequately cover the core happy paths. However, there are gaps: (1) no test for the
_attributionoverride path wherereplays_local_storage=Truecauses reasoning items to be omitted even whenrequest_uses_service_side_storage=True, (2) no tests forprevious_response_idorconversationas the trigger key (onlyconversation_idis tested), and (3) the integration test doesn't assert that thefc_idfromadditional_propertiesis properly used (vscall_id) in the stateless path, which is the other half of thereplays_local_storagelogic.
✓ Design Approach
The PR correctly identifies that response-scoped reasoning item IDs (rs_*) are only valid within a live service-managed context and must be omitted when replaying conversation history stateless (store=False without a prior conversation reference). The request-level flag propagation through _prepare_options → _prepare_messages_for_openai → _prepare_message_for_openai → _prepare_content_for_openai is consistent with how the tool loop works: with store=False, no conversation_id is ever injected into mutable_options, so prepped_messages always contains the full history and the flag correctly gates reasoning item inclusion. One minor issue: the isinstance(value, str) check for the 'conversation' key is overly narrow — while ChatOptions only defines conversation_id (a string), raw caller dicts could in theory include a non-string conversation value (e.g. an OpenAI Conversation object), and the check would silently miss it. Additionally, committing .worktrees/ git-submodule entries is an accidental inclusion that should not be in this PR.
Automated review by chetantoshniwal's agents
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Reasoning models can fail when
OpenAIChatClient-based clients run withstore=Falseand replay local tool-loop history. In that mode the Responses API cannot resolve response-scoped reasoning items from local history, which breaks scenarios like the Foundry suspend/resume sample.Description
This updates the shared Responses serialization path to distinguish service-side storage from local storage when replaying prior messages. Reasoning items are now only sent when the request is continuing service-side storage, while local-storage replay keeps the existing function-call replay behavior without reusing service-scoped reasoning items. The change also adds regression coverage for the stateless reasoning/tool-loop path and a sample that reproduces the original Foundry scenario.
Contribution Checklist