Comprehensive consistency test suite for TieredStorage #363

Draft
jan-auer wants to merge 3 commits into main from test/tiered-storage-consistency
jan-auer commented Mar 9, 2026

Formalizes three consistency invariants for TieredStorage and adds a structured test suite that proves they hold under normal operation and documents where they break.

Invariants

  • No OrphanLT: if LT has data, HV must have a tombstone pointing to it
  • No DualData: HV and LT must not both contain non-tombstone data
  • OrphanTombstone is safe: tombstone in HV with nothing in LT must return None on read
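The three invariants can be modelled per key as a small state check. The following is a minimal sketch, not the PR's actual `check_invariants`: `HvEntry`, `Violation`, and `read` are hypothetical names, and classifying "HV data plus LT data" as DualData rather than OrphanLT is an assumption made for illustration.

```rust
/// Hypothetical per-key view of the HV tier (names are illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum HvEntry {
    /// Payload stored inline in HV.
    Data(Vec<u8>),
    /// Marker saying the payload for this key lives in LT.
    Tombstone,
}

#[derive(Debug, PartialEq)]
enum Violation {
    OrphanLt,
    DualData,
}

/// Checks the first two invariants for a single key.
fn check_invariants(hv: Option<&HvEntry>, lt: Option<&[u8]>) -> Result<(), Violation> {
    match (hv, lt) {
        // LT has data but HV has no entry: nothing points at the LT object.
        (None, Some(_)) => Err(Violation::OrphanLt),
        // Both tiers hold non-tombstone data for the same key.
        (Some(HvEntry::Data(_)), Some(_)) => Err(Violation::DualData),
        // Everything else, including a tombstone with an empty LT, is allowed.
        _ => Ok(()),
    }
}

/// Read path for the third invariant: a tombstone with nothing in LT yields None.
fn read(hv: Option<&HvEntry>, lt: Option<&[u8]>) -> Option<Vec<u8>> {
    match hv {
        Some(HvEntry::Data(bytes)) => Some(bytes.clone()),
        Some(HvEntry::Tombstone) => lt.map(|b| b.to_vec()),
        None => None,
    }
}
```

In this model an orphaned tombstone passes `check_invariants` and is only observable through `read` returning `None`, which matches the "safe" wording above.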

Testing strategy

Tests are organized into five categories, all using mocked backends (no real GCS/BigTable):

  1. Happy path (16 tests) — all state transitions including overwrites, boundary cases (0 bytes, exactly 1 MiB), and tombstone redirect behavior
  2. Backend outages (12 tests) — error injection at every step of every operation, verifying check_invariants passes after each failure (state unchanged or safely degraded)
  3. Pod termination (2 tests) — drop futures mid-operation via SyncBackend + timeout, proving OrphanLT occurs on insert kill and OrphanTombstone (safe) on delete kill
  4. Concurrent races (3 tests) — deterministic interleaving via Notify-based sync hooks, proving insert+insert and insert+delete produce invariant violations
  5. Property-based fuzzing (1 proptest, 100 random sequences) — random operation sequences on 3 keys with assert_consistent after every operation
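The backend-outage category can be illustrated with a mock backend that fails on chosen calls. This is a rough synchronous sketch under stated assumptions: `MockBackend`, `Entry`, `insert_large`, and `has_orphan_lt` are hypothetical, and the insert order (LT write, then tombstone write, then best-effort cleanup on failure) is inferred from the failure cases named in this description rather than taken from the real implementation.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Data(Vec<u8>),
    Tombstone,
}

/// Hypothetical in-memory backend that returns an error on selected calls.
#[derive(Default)]
struct MockBackend {
    data: HashMap<String, Entry>,
    calls: usize,
    /// 1-based call numbers on which the backend fails.
    fail_on: Vec<usize>,
}

impl MockBackend {
    /// Counts the call and injects a failure if this call number is listed.
    fn tick(&mut self) -> Result<(), String> {
        self.calls += 1;
        if self.fail_on.contains(&self.calls) {
            Err(format!("injected failure on call {}", self.calls))
        } else {
            Ok(())
        }
    }

    fn put(&mut self, key: &str, value: Entry) -> Result<(), String> {
        self.tick()?;
        self.data.insert(key.to_string(), value);
        Ok(())
    }

    fn delete(&mut self, key: &str) -> Result<(), String> {
        self.tick()?;
        self.data.remove(key);
        Ok(())
    }
}

/// Assumed large-object insert: write LT, write the HV tombstone, clean up LT on failure.
fn insert_large(
    hv: &mut MockBackend,
    lt: &mut MockBackend,
    key: &str,
    payload: Vec<u8>,
) -> Result<(), String> {
    lt.put(key, Entry::Data(payload))?;
    if let Err(err) = hv.put(key, Entry::Tombstone) {
        // Best-effort cleanup; if this also fails, the LT object is orphaned.
        let _ = lt.delete(key);
        return Err(err);
    }
    Ok(())
}

/// True if LT holds data for `key` without a tombstone in HV.
fn has_orphan_lt(hv: &MockBackend, lt: &MockBackend, key: &str) -> bool {
    lt.data.contains_key(key) && hv.data.get(key) != Some(&Entry::Tombstone)
}
```

With only the tombstone write failing, cleanup removes the LT object and the state is unchanged. With the cleanup call failing as well, `has_orphan_lt` reports the violation.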

Known violations

Each of the four known violations is run through check_invariants, and the test asserts that it returns Err:

  • Pod kill between LT write and tombstone write → OrphanLT
  • Concurrent insert+insert on same key → DualData
  • Concurrent insert+delete on same key → OrphanLT
  • Insert tombstone write fails AND cleanup fails → OrphanLT

When a fix lands, check_invariants will return Ok, unwrap_err() will panic, and the failing test must be updated to call assert_consistent instead, which makes fixes self-enforcing.
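The self-enforcing pattern might look roughly like this. The sketch uses a stand-in `check_invariants` with a hypothetical `algorithm_fixed` flag in place of real storage state; only `unwrap_err()` on a `Result` is standard library behaviour.

```rust
#[derive(Debug, PartialEq)]
enum Violation {
    OrphanLt,
}

/// Stand-in for the real checker: reports OrphanLT while the gap is unfixed.
/// The `algorithm_fixed` flag is hypothetical and replaces real storage state.
fn check_invariants(algorithm_fixed: bool) -> Result<(), Violation> {
    if algorithm_fixed {
        Ok(())
    } else {
        Err(Violation::OrphanLt)
    }
}

/// Body of a known-violation test: passes today, panics once the fix lands.
fn known_violation_test(algorithm_fixed: bool) {
    // `unwrap_err` panics if the result is `Ok`, so a fix cannot land silently.
    let violation = check_invariants(algorithm_fixed).unwrap_err();
    assert_eq!(violation, Violation::OrphanLt);
}
```

Once the checker starts returning `Ok`, the test fails loudly and has to be rewritten as a consistency assertion.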

Ref FS-236

Formalizes the TieredStorage consistency invariants and adds structured
tests proving they hold under normal operation and documenting where
they break under failure, pod termination, and concurrency.
linear-code bot commented Mar 9, 2026

jan-auer added 2 commits March 9, 2026 17:40
Extract insert_small/insert_large, make_failing_storage, payload
constants, check_invariants_core, and SyncBackend builder to eliminate
repeated boilerplate across 34 tests. No coverage changes.

Add three chaos fuzz tests that run concurrent operations against
TieredStorage with a ChaosBackend (yield-based interleaving + error
injection) and assert that the known invariant violations occur:

- concurrent_insert_large_insert_small: DualData from racing inserts
- concurrent_insert_delete_from_large_state: OrphanLT + DualData
- concurrent_inserts_with_tombstone_write_errors: DualData from
  tombstone write failure + cleanup failure

Each test documents a gap in the current algorithm. When the algorithm
is hardened, flip the assertions to verify the violations are gone.