[python] Add conflict detection in shard update #7630
JingsongLi merged 4 commits into apache:master
leaves12138
left a comment
Nice PR! This is a real-world concurrency issue that would be hard to catch without the E2E test.
Overall the approach looks solid — passing snapshot_id through Plan → ShardTableUpdator → CommitMessage.check_from_snapshot → ConflictDetection._row_id_check_from_snapshot leverages the existing Java conflict detection mechanism correctly.
A few minor comments:
- `FileScanner` lambda return type change: The scanner lambdas now return `(List, Snapshot)` tuples instead of just `List`. The `all_manifests()` lambda also returns `(manifests, snapshot)` now, but it is only consumed in the `FileScanner` ctor, where the new signature is expected. Just want to double-check there is no other caller of `FileScanner` that still expects a plain list.
- Incremental scan edge case: In `incremental_manifest()`, `end_snapshot` is set inside the `for` loop. If `start_id + 1 > end_id` (empty range), `end_snapshot` stays `None`. The caller would get `(manifests, None)`. Is this expected, i.e., should the Plan still carry `snapshot_id = None` when there are no incremental snapshots to scan?
- `Plan` dataclass: Adding `snapshot_id: Optional[int] = None` is backward-compatible. But the `FileScanner.scan()` path always sets it from `snapshot_manager.get_latest_snapshot().id`, so it might be worth confirming there is no code path where `get_latest_snapshot()` could return `None` at plan time.
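To make the second and third points concrete, here is a minimal, self-contained sketch of the edge case. It is not the PR's code: `Snapshot`, `Plan`, `incremental_manifest`, and `build_plan` below are simplified stand-ins written for illustration, and the manifest names are made up. It only shows how a variable assigned inside the loop stays `None` for an empty range, and one way a caller could guard against it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot."""
    id: int


@dataclass
class Plan:
    """Hypothetical stand-in for the scan plan with the new optional field."""
    splits: List[str]
    snapshot_id: Optional[int] = None


def incremental_manifest(start_id: int, end_id: int) -> Tuple[List[str], Optional[Snapshot]]:
    """Collect manifests for snapshots in (start_id, end_id]."""
    manifests: List[str] = []
    end_snapshot: Optional[Snapshot] = None
    for snapshot_id in range(start_id + 1, end_id + 1):
        # Only assigned inside the loop, so an empty range leaves it as None.
        end_snapshot = Snapshot(snapshot_id)
        manifests.append(f"manifest-{snapshot_id}")
    return manifests, end_snapshot


def build_plan(start_id: int, end_id: int) -> Plan:
    """Build a plan, guarding against a missing end snapshot."""
    manifests, end_snapshot = incremental_manifest(start_id, end_id)
    # Guard instead of calling `.id` on a possibly-None snapshot.
    snapshot_id = end_snapshot.id if end_snapshot is not None else None
    return Plan(splits=manifests, snapshot_id=snapshot_id)
```

With this guard, an empty incremental range yields a plan with no splits and `snapshot_id = None`, which a downstream conflict check could treat as "nothing to check".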
The E2E test design (Java write base → Python scan → Java compact → Python commit conflict) is well thought out and clearly demonstrates the race condition.
Once the above questions are addressed, +1 from me! 🚀
Thanks!
All points addressed, thanks for the quick responses!
LGTM, approving now! 🚀
leaves12138
left a comment
All comments addressed. LGTM! 🚀
Purpose
This PR is a follow-up to #7323. PR #7323 introduced conflict detection for Python data evolution updates, but the shard-update path was not covered.
As a result, when shard update and compact run concurrently, the shard update may commit successfully against a stale scan snapshot instead of failing fast. The problem only shows up later during read, with the error:
All files in a field merge split should have the same row count.

This PR extends the same conflict-detection coverage to the shard-update path.

Tests
run_compact_conflict_test in run_mixed_tests.sh
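
The race described under Purpose can be illustrated with a toy model. The sketch below is not Paimon's implementation: `ToyTable`, `Commit`, `CommitConflictError`, and the row-ID overlap rule are assumptions made for illustration. It only shows the general idea of remembering the snapshot a scan was planned against and rejecting a commit if a later snapshot touched the same row-ID range.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

RowIdRange = Tuple[int, int]  # inclusive (first_row_id, last_row_id)


class CommitConflictError(Exception):
    """Raised when a commit was planned against a stale snapshot."""


@dataclass
class Commit:
    kind: str  # e.g. "APPEND", "COMPACT", "UPDATE" (labels are illustrative)
    row_id_range: RowIdRange


def overlaps(a: RowIdRange, b: RowIdRange) -> bool:
    """True if two inclusive row-ID ranges share at least one row ID."""
    return a[0] <= b[1] and b[0] <= a[1]


@dataclass
class ToyTable:
    # Snapshot ID n corresponds to snapshots[n - 1].
    snapshots: List[Commit] = field(default_factory=list)

    def commit(self, kind: str, row_id_range: RowIdRange,
               check_from_snapshot: Optional[int] = None) -> int:
        """Append a snapshot and return its ID.

        If check_from_snapshot is given, fail fast when any snapshot
        committed after it overlaps the rows being written.
        """
        if check_from_snapshot is not None:
            for later in self.snapshots[check_from_snapshot:]:
                if overlaps(later.row_id_range, row_id_range):
                    raise CommitConflictError(
                        f"{later.kind} after snapshot {check_from_snapshot} "
                        f"touched rows {later.row_id_range}"
                    )
        self.snapshots.append(Commit(kind, row_id_range))
        return len(self.snapshots)


def simulate_race(with_check: bool) -> str:
    """Write base, scan, compact concurrently, then commit the shard update."""
    table = ToyTable()
    scan_snapshot_id = table.commit("APPEND", (0, 99))  # base data, then scan
    table.commit("COMPACT", (0, 99))                    # concurrent compaction
    try:
        table.commit("UPDATE", (0, 49),
                     check_from_snapshot=scan_snapshot_id if with_check else None)
    except CommitConflictError:
        return "conflict"
    return "committed"
```

In this toy model, `simulate_race(with_check=False)` commits against the stale scan, which mirrors the silent success described above, while `simulate_race(with_check=True)` fails at commit time instead of at read time.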