Python: add experimental file history provider#5248
Python: add experimental file history provider#5248eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an experimental FileHistoryProvider to persist per-session conversation history as JSON Lines files, plus accompanying docs and tests.
Changes:
- Introduces
FileHistoryProvider(experimental) with pluggable JSON (de)serializers and safe session-id-to-filename handling. - Adds unit tests covering storage, loading, unsafe session IDs, custom serializers, and error cases.
- Updates sample/docs to demonstrate file-backed history usage.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/conversations/file_history_provider.py | New sample demonstrating FileHistoryProvider with tool-calling. |
| python/samples/02-agents/conversations/README.md | Documents new samples and required environment/auth details. |
| python/packages/core/tests/core/test_sessions.py | Adds test suite validating FileHistoryProvider behavior and experimental marking. |
| python/packages/core/agent_framework/_sessions.py | Implements FileHistoryProvider and JSON (de)serializer abstractions. |
| python/packages/core/agent_framework/_feature_stage.py | Registers ExperimentalFeature.FILE_HISTORY. |
| python/packages/core/agent_framework/init.py | Exports FileHistoryProvider in public API. |
| python/packages/core/AGENTS.md | Documents the new provider and clarifies built-in history providers. |
| .worktrees/issue-4676-a2a-sdk-update | Adds a git-worktree marker file (likely unintended). |
| .worktrees/issue-4675-duplicate-telemetry | Adds a git-worktree marker file (likely unintended). |
| .worktrees/devui_datastar | Adds a git-worktree marker file (likely unintended). |
Comments suppressed due to low confidence (1)
.worktrees/issue-4675-duplicate-telemetry:1
- The
.worktrees/*files look like local git worktree metadata and are typically not meant to be committed to the repository. Please remove these.worktrees/*entries from the PR and add.worktrees/to.gitignore(or the repo’s equivalent ignore mechanism) to prevent accidental commits in the future.
python/samples/02-agents/conversations/file_history_provider.py
Outdated
Show resolved
Hide resolved
python/samples/02-agents/conversations/file_history_provider.py
Outdated
Show resolved
Hide resolved
Python Test Coverage Report •
Python Unit Test Overview
|
|||||||||||||||||||||||||||||||||||
chetantoshniwal
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 86%
✓ Correctness
The PR adds a
FileHistoryProviderclass that stores conversation history as JSON Lines files, one per session. The implementation is well-structured with proper path traversal protection, thread-safe writes via per-file locks, async I/O viaasyncio.to_thread, and comprehensive tests. No correctness bugs were found. The filename sanitization logic correctly handles unsafe session IDs by base64-encoding them, and theis_relative_tocheck prevents directory traversal. The concurrent write test properly verifies mutual exclusion. Minor note: the class-level_FILE_WRITE_LOCKSdict grows unboundedly but this is a known acceptable trade-off for simplicity.
✓ Security Reliability
The FileHistoryProvider is well-designed with strong path traversal protection (session ID sanitization + is_relative_to guard) and proper JSON Lines format validation. The main reliability concern is the class-level
_FILE_WRITE_LOCKSdictionary that grows unboundedly—one entry per unique resolved file path, never cleaned up—which constitutes a memory leak in long-running services with many sessions. Security-wise, the implementation is solid: unsafe session IDs are base64-encoded, resolved paths are validated against the storage root, and file I/O is properly locked and offloaded to threads. The sample file is reasonable for demo purposes.
✓ Test Coverage
The new
FileHistoryProviderhas a solid test suite covering the main happy path (store/load round-trip viabefore_run/after_run), custom serializers, skip-excluded filtering, invalid JSONL error handling, multiline-JSON rejection, encoded filenames for unsafe session IDs, directory creation, and concurrent write locking. However, several error-handling branches and edge-case paths lack coverage: deserialized payload that isn't a mapping,Message.from_dictraisingValueError,dumpsreturning an invalid type (neitherstrnorbytes),Nonesession_id defaulting to the"default"file stem, andget_messageson a nonexistent/empty file returning an empty list. These are worth adding but are not blocking since they are defense-in-depth error paths.
✗ Design Approach
The
FileHistoryProvideris correctly designed for its stated 'local runs' purpose and handles path traversal, encoding, and serialization well. Two design issues are worth addressing: (1) the class-level_FILE_WRITE_LOCKSregistry grows without bound — every unique resolved file path adds athreading.Lockthat is never evicted, which leaks memory in long-lived processes with many unique session IDs (UUID-based sessions being the common case); (2) thethreading.Lockis acquired inside the closure dispatched toasyncio.to_thread, which means a thread-pool worker is blocked idle while waiting for the lock — the cleaner async pattern acquires anasyncio.Lockat the coroutine level before dispatching to thread so no thread slot is wasted. A minor secondary observation: the path-traversal guard (is_relative_tocheck in_session_file_path) is unreachable dead code because_is_literal_session_file_stem_safealready rejects every character outside[a-zA-Z0-9._-], so no encoded stem can ever contain a path separator; it is harmless as defense-in-depth but worth documenting.
Automated review by chetantoshniwal's agents
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
a7aff57 to
603cfa3
Compare
Motivation and Context
This adds a FileHistoryProvider that stores the history as a JSONL file, where each line holds a message.
It also allows you to set a different json (de)serialization function so that you can leverage faster (de)ser.
Adds tests, sample
Description
Contribution Checklist