Skip to content

Python: add experimental file history provider#5248

Open
eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
eavanvalkenburg:file_history
Open

Python: add experimental file history provider#5248
eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
eavanvalkenburg:file_history

Conversation

@eavanvalkenburg
Copy link
Copy Markdown
Member

Motivation and Context

This adds a FileHistoryProvider that stores the history as a JSONL file, where each line holds a message.
It also allows you to set a different json (de)serialization function so that you can leverage faster (de)ser.

Adds tests, sample

Description

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings April 14, 2026 12:43
@moonbox3 moonbox3 added documentation Improvements or additions to documentation python labels Apr 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an experimental FileHistoryProvider to persist per-session conversation history as JSON Lines files, plus accompanying docs and tests.

Changes:

  • Introduces FileHistoryProvider (experimental) with pluggable JSON (de)serializers and safe session-id-to-filename handling.
  • Adds unit tests covering storage, loading, unsafe session IDs, custom serializers, and error cases.
  • Updates sample/docs to demonstrate file-backed history usage.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
python/samples/02-agents/conversations/file_history_provider.py New sample demonstrating FileHistoryProvider with tool-calling.
python/samples/02-agents/conversations/README.md Documents new samples and required environment/auth details.
python/packages/core/tests/core/test_sessions.py Adds test suite validating FileHistoryProvider behavior and experimental marking.
python/packages/core/agent_framework/_sessions.py Implements FileHistoryProvider and JSON (de)serializer abstractions.
python/packages/core/agent_framework/_feature_stage.py Registers ExperimentalFeature.FILE_HISTORY.
python/packages/core/agent_framework/init.py Exports FileHistoryProvider in public API.
python/packages/core/AGENTS.md Documents the new provider and clarifies built-in history providers.
.worktrees/issue-4676-a2a-sdk-update Adds a git-worktree marker file (likely unintended).
.worktrees/issue-4675-duplicate-telemetry Adds a git-worktree marker file (likely unintended).
.worktrees/devui_datastar Adds a git-worktree marker file (likely unintended).
Comments suppressed due to low confidence (1)

.worktrees/issue-4675-duplicate-telemetry:1

  • The .worktrees/* files look like local git worktree metadata and are typically not meant to be committed to the repository. Please remove these .worktrees/* entries from the PR and add .worktrees/ to .gitignore (or the repo’s equivalent ignore mechanism) to prevent accidental commits in the future.

@moonbox3
Copy link
Copy Markdown
Contributor

moonbox3 commented Apr 14, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework
   _feature_stage.py1141289%96, 153, 160, 162–164, 171, 173, 178, 185, 204, 232
   _sessions.py3733889%100–102, 104–107, 124–125, 127–131, 208–209, 299, 560–564, 606, 609, 643, 692, 696, 706, 839, 855, 979, 987, 993–994, 1016, 1036, 1046, 1075
TOTAL27645320788% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5579 20 💤 0 ❌ 0 🔥 1m 29s ⏱️

Copy link
Copy Markdown
Contributor

@chetantoshniwal chetantoshniwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 86%

✓ Correctness

The PR adds a FileHistoryProvider class that stores conversation history as JSON Lines files, one per session. The implementation is well-structured with proper path traversal protection, thread-safe writes via per-file locks, async I/O via asyncio.to_thread, and comprehensive tests. No correctness bugs were found. The filename sanitization logic correctly handles unsafe session IDs by base64-encoding them, and the is_relative_to check prevents directory traversal. The concurrent write test properly verifies mutual exclusion. Minor note: the class-level _FILE_WRITE_LOCKS dict grows unboundedly but this is a known acceptable trade-off for simplicity.

✓ Security Reliability

The FileHistoryProvider is well-designed with strong path traversal protection (session ID sanitization + is_relative_to guard) and proper JSON Lines format validation. The main reliability concern is the class-level _FILE_WRITE_LOCKS dictionary that grows unboundedly—one entry per unique resolved file path, never cleaned up—which constitutes a memory leak in long-running services with many sessions. Security-wise, the implementation is solid: unsafe session IDs are base64-encoded, resolved paths are validated against the storage root, and file I/O is properly locked and offloaded to threads. The sample file is reasonable for demo purposes.

✓ Test Coverage

The new FileHistoryProvider has a solid test suite covering the main happy path (store/load round-trip via before_run/after_run), custom serializers, skip-excluded filtering, invalid JSONL error handling, multiline-JSON rejection, encoded filenames for unsafe session IDs, directory creation, and concurrent write locking. However, several error-handling branches and edge-case paths lack coverage: deserialized payload that isn't a mapping, Message.from_dict raising ValueError, dumps returning an invalid type (neither str nor bytes), None session_id defaulting to the "default" file stem, and get_messages on a nonexistent/empty file returning an empty list. These are worth adding but are not blocking since they are defense-in-depth error paths.

✗ Design Approach

The FileHistoryProvider is correctly designed for its stated 'local runs' purpose and handles path traversal, encoding, and serialization well. Two design issues are worth addressing: (1) the class-level _FILE_WRITE_LOCKS registry grows without bound — every unique resolved file path adds a threading.Lock that is never evicted, which leaks memory in long-lived processes with many unique session IDs (UUID-based sessions being the common case); (2) the threading.Lock is acquired inside the closure dispatched to asyncio.to_thread, which means a thread-pool worker is blocked idle while waiting for the lock — the cleaner async pattern acquires an asyncio.Lock at the coroutine level before dispatching to thread so no thread slot is wasted. A minor secondary observation: the path-traversal guard (is_relative_to check in _session_file_path) is unreachable dead code because _is_literal_session_file_stem_safe already rejects every character outside [a-zA-Z0-9._-], so no encoded stem can ever contain a path separator; it is harmless as defense-in-depth but worth documenting.


Automated review by chetantoshniwal's agents

eavanvalkenburg and others added 8 commits April 14, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants