Python: Fix reasoning replay when store=False by eavanvalkenburg · Pull Request #5250 · microsoft/agent-framework

eavanvalkenburg · 2026-04-14T13:42:37Z

Motivation and Context

Reasoning models can fail when OpenAIChatClient-based clients run with store=False and replay local tool-loop history. In that mode the Responses API cannot resolve response-scoped reasoning items from local history, which breaks scenarios like the Foundry suspend/resume sample.

Description

This updates the shared Responses serialization path to distinguish service-side storage from local storage when replaying prior messages. Reasoning items are now only sent when the request is continuing service-side storage, while local-storage replay keeps the existing function-call replay behavior without reusing service-scoped reasoning items. The change also adds regression coverage for the stateless reasoning/tool-loop path and a sample that reproduces the original Foundry scenario.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes failures in stateless (store=False) reasoning/tool-loop replay for OpenAIChatClient by preventing replay of service-scoped reasoning items, and adds regression coverage plus a sample reproducer for the Foundry suspend/resume scenario.

Changes:

Distinguish “service-side storage continuation” vs “local history replay” when serializing Responses input, omitting reasoning items for stateless replay.
Add regression tests covering tool-loop replay and _prepare_options behavior for store=False with/without a conversation/previous-response identifier.
Add a Python sample demonstrating suspend/resume using local session/history.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
python/samples/02-agents/conversations/suspend_resume_local_session.py	Adds a suspend/resume sample intended to reproduce the Foundry local-session scenario.
python/packages/openai/tests/openai/test_openai_chat_client.py	Adds regression tests ensuring reasoning items are omitted for stateless replay, with coverage for conversation-id continuation.
python/packages/openai/agent_framework_openai/_chat_client.py	Updates request serialization to omit service-scoped reasoning during local replay while preserving function-call replay.

python/packages/openai/agent_framework_openai/_chat_client.py

python/samples/02-agents/conversations/suspend_resume_local_session.py

moonbox3 · 2026-04-14T14:00:41Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/openai/agent_framework_openai
_chat_client.py	870	123	85%	522–525, 529–530, 536–537, 547–548, 555, 570–576, 597, 605, 628, 746, 845, 904, 906, 908, 910, 976, 990, 1070, 1080, 1085, 1128, 1244, 1425, 1430, 1434–1436, 1440–1441, 1507, 1536, 1542, 1552, 1558, 1563, 1569, 1574–1575, 1636, 1658–1659, 1674–1675, 1693–1694, 1737, 1900, 1938–1939, 1955, 1957, 2036–2044, 2074, 2181, 2216, 2231, 2251–2261, 2274, 2285–2289, 2303, 2317–2328, 2337, 2369–2372, 2380–2381, 2383–2385, 2399–2401, 2411–2412, 2418, 2433
TOTAL	27547	3193	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5574	20 💤	0 ❌	0 🔥	1m 29s ⏱️

chetantoshniwal

Automated Code Review

Reviewers: 4 | Confidence: 83%

✗ Correctness

The PR refactors reasoning-item handling to correctly omit response-scoped reasoning items (rs_*) when replaying tool loops without service-side storage. The core logic is sound: it computes request_uses_service_side_storage by checking for conversation_id, previous_response_id, or conversation keys, then threads that flag through message preparation to skip reasoning items in stateless replay. However, the PR accidentally includes three .worktrees/ submodule entries that should not be committed, and the conversation key check only matches strings, which could miss non-string conversation objects from the OpenAI SDK.

✗ Security Reliability

The core logic change—conditionally omitting response-scoped reasoning items when the request does not use service-side storage—is sound and well-tested. The Message.additional_properties attribute is always initialized to a dict, so the removal of the defensive getattr guard is safe. However, the PR accidentally includes three .worktrees/ gitlink entries that are local development artifacts and should not be committed. Additionally, the isinstance(value, str) type guard for the conversation options key may silently miss non-string conversation identifiers (e.g., objects), causing reasoning items to be incorrectly omitted in that path.

✓ Test Coverage

The PR adds three new tests covering the main behavioral change (reasoning items omitted for stateless replay, kept when conversation_id is present, and an integration test for the full tool loop). These tests adequately cover the core happy paths. However, there are gaps: (1) no test for the _attribution override path where replays_local_storage=True causes reasoning items to be omitted even when request_uses_service_side_storage=True, (2) no tests for previous_response_id or conversation as the trigger key (only conversation_id is tested), and (3) the integration test doesn't assert that the fc_id from additional_properties is properly used (vs call_id) in the stateless path, which is the other half of the replays_local_storage logic.

✓ Design Approach

The PR correctly identifies that response-scoped reasoning item IDs (rs_*) are only valid within a live service-managed context and must be omitted when replaying conversation history stateless (store=False without a prior conversation reference). The request-level flag propagation through _prepare_options → _prepare_messages_for_openai → _prepare_message_for_openai → _prepare_content_for_openai is consistent with how the tool loop works: with store=False, no conversation_id is ever injected into mutable_options, so prepped_messages always contains the full history and the flag correctly gates reasoning item inclusion. One minor issue: the isinstance(value, str) check for the 'conversation' key is overly narrow — while ChatOptions only defines conversation_id (a string), raw caller dicts could in theory include a non-string conversation value (e.g. an OpenAI Conversation object), and the check would silently miss it. Additionally, committing .worktrees/ git-submodule entries is an accidental inclusion that should not be in this PR.

Automated review by chetantoshniwal's agents

.worktrees/devui_datastar

.worktrees/issue-4675-duplicate-telemetry

.worktrees/issue-4676-a2a-sdk-update

python/packages/openai/tests/openai/test_openai_chat_client.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

eavanvalkenburg and others added 2 commits April 14, 2026 15:31

fix reasoning content when store=False

76ff8c5

Remove accidental worktree entries

df940f7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 14, 2026 13:42

moonbox3 added the python label Apr 14, 2026

Copilot AI reviewed Apr 14, 2026

View reviewed changes

github-actions bot changed the title ~~Fix reasoning replay when store=False~~ Python: Fix reasoning replay when store=False Apr 14, 2026

Copilot started reviewing on behalf of eavanvalkenburg April 14, 2026 13:52 View session

remove local session sample

0e3005e

eavanvalkenburg enabled auto-merge April 14, 2026 14:15

chetantoshniwal reviewed Apr 14, 2026

View reviewed changes

removed left over files

897b477

TaoChenOSU approved these changes Apr 14, 2026

View reviewed changes

Add attribution override regression test

ef32025

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

giles17 approved these changes Apr 14, 2026

View reviewed changes

eavanvalkenburg added this pull request to the merge queue Apr 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix reasoning replay when store=False#5250

Python: Fix reasoning replay when store=False#5250
eavanvalkenburg wants to merge 5 commits intomicrosoft:mainfrom
eavanvalkenburg:fix_reasoning_content_no_store

eavanvalkenburg commented Apr 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 commented Apr 14, 2026 •

edited

Loading

Uh oh!

chetantoshniwal left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

eavanvalkenburg commented Apr 14, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

chetantoshniwal left a comment

Choose a reason for hiding this comment

Automated Code Review

✗ Correctness

✗ Security Reliability

✓ Test Coverage

✓ Design Approach

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

moonbox3 commented Apr 14, 2026 •

edited

Loading