Flaky test report: committed-code failures on 2026-05-20 #271

@andrross

Summary

This report covers flaky test failures observed in committed-code CI runs (Timer builds on main and Post Merge Action builds) during the 72-hour window ending 2026-05-20 10:00 UTC. No committed-code test failures occurred in the strict 24-hour window (the most recent was ~27 hours before report time), so the window was expanded to capture the nearest failures.

10 distinct tests failed across 8 builds. None reproduced locally with its original seed, which points to timing- or environment-dependent flakes rather than deterministic failures.

Reproduction Results

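Each test was run locally on the current main checkout using the exact seed from the failing CI build, except TieringStatusIT.testTieringStatus, which is not present in the current checkout and could not be run. None reproduced.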

| # | Test | Reproduced? |
|---|------|-------------|
| 1 | IndexingIT.testIndexingWithSegRep | ❌ No |
| 2 | FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch | ❌ No |
| 3 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | ❌ No |
| 4 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | ❌ No |
| 5 | TieringStatusIT.testTieringStatus | ❌ No (not in current codebase) |
| 6 | SystemIndexRestIT.classMethod | ❌ No |
| 7 | DeleteSnapshotIT.testDeleteShallowCopySnapshot | ❌ No |
| 8 | IngestFromKafkaIT.testRawPayloadMapperIngestion | ❌ No |
| 9 | RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | ❌ No |
| 10 | ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | ❌ No |
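
For anyone retrying a reproduction, the sketch below assembles a Gradle command line from the module, test name, and seed listed under Detailed Findings. It is a rough illustration, not the exact command used for this report: the helper name `repro_command` is hypothetical, the `--tests` filter uses a leading wildcard because the report does not give package names, and `-Dtests.seed` is assumed to be the seed property the build honors.

```python
import shlex


def repro_command(module: str, test: str, seed: str) -> str:
    """Build a Gradle command line that re-runs one test with a fixed seed."""
    class_name, _, method = test.partition(".")
    # "classMethod" marks a class-level failure, not a real test method,
    # so in that case the filter targets the whole class.
    if method and method != "classMethod":
        test_filter = f"*{class_name}.{method}"
    else:
        test_filter = f"*{class_name}"
    args = ["./gradlew", f":{module}", "--tests", test_filter, f"-Dtests.seed={seed}"]
    # Quote each argument so wildcards and '#' survive the shell.
    return " ".join(shlex.quote(a) for a in args)


print(repro_command(
    "server:internalClusterTest",
    "ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability",
    "BDEB9106E7A89536",
))
```

Since these failures appear to be timing-dependent, a single seeded run is weak evidence; repeating the command many times, ideally on hardware similar to the CI runners, gives a better chance of hitting the race.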

Summary Table (sorted by total builds affected)

| Test | Builds Affected | First Failure | Recent Build | Pattern |
|------|-----------------|---------------|--------------|---------|
| IndexingIT.testIndexingWithSegRep | 260 | 2024-03-25 | 77617 | Stable chronic flake (~5-18/month) |
| ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | 160 | 2024-03-26 | 77440 | Stable chronic flake, slight worsening in 2026 |
| RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | 145 | 2024-05-03 | 77490 | Worsening: spiked mid-2025, elevated since |
| ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 119 | 2024-10-03 | 77545 | Worsening: 35 builds in Apr 2026 (was 1-6/month) |
| RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | 101 | 2025-08-04 | 77460 | Improving: peaked Aug-Sep 2025, now low |
| SystemIndexRestIT.classMethod | 89 | 2025-11-03 | 77477 | Worsening: 20 builds in May 2026 so far |
| DeleteSnapshotIT.testDeleteShallowCopySnapshot | 31 | 2024-04-06 | 77472 | Stable low-rate flake (~1-2/month) |
| FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch | 10 | 2026-04-14 | 77602 | New: appeared after Apr 2026 runner change |
| TieringStatusIT.testTieringStatus | 7 | 2026-05-18 | 77477 | New: 7 failures, all in May 2026 |
| IngestFromKafkaIT.testRawPayloadMapperIngestion | 1 | 2026-05-19 | 77460 | New: single occurrence |
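
The "10 tests across 8 builds" figure in the Summary can be cross-checked against the Recent Build column. The snippet below is a small sanity check using only the build numbers from this report; the dictionary name `FAILING_BUILD` and the shortened class-only keys are chosen here for illustration.

```python
from collections import defaultdict

# Most recent failing build per test, copied from the Recent Build column.
FAILING_BUILD = {
    "IndexingIT": 77617,
    "ShardIndexingPressureSettingsIT": 77440,
    "RecoveryWhileUnderLoadIT": 77490,
    "ConcurrentSeqNoVersioningIT": 77545,
    "RemoteStoreReplicationSourceTests": 77460,
    "SystemIndexRestIT": 77477,
    "DeleteSnapshotIT": 77472,
    "FlightOutboundHandlerContextPropagationTests": 77602,
    "TieringStatusIT": 77477,
    "IngestFromKafkaIT": 77460,
}

# Group tests by build to see which builds had more than one failure.
tests_by_build = defaultdict(list)
for test, build in FAILING_BUILD.items():
    tests_by_build[build].append(test)

shared = {b: sorted(t) for b, t in tests_by_build.items() if len(t) > 1}

print(f"{len(FAILING_BUILD)} tests across {len(tests_by_build)} builds")
for build, tests in sorted(shared.items()):
    print(build, tests)
```

Two builds (77460 and 77477) each account for two failing tests, which is why the build count is lower than the test count.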

Detailed Findings

1. IndexingIT.testIndexingWithSegRep

  • Build: 77617 (Timer, main)
  • Seed: 2CFE8BCAF9F2FE0B
  • Module: qa:rolling-upgrade:v3.6.1#upgradedClusterTest
  • First failure: 2024-03-25
  • Total builds affected: 260
  • Pattern: Chronic stable flake. Consistently fails in 5-18 builds per month since March 2024. This is a rolling-upgrade test that waits for searchable docs after segment replication — inherently timing-sensitive.

2. FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch

  • Build: 77602 (Timer, main)
  • Seed: 2A132F5DF21983D8
  • Module: plugins:arrow-flight-rpc:test
  • First failure: 2026-04-14
  • Total builds affected: 10
  • Pattern: New flake that appeared right when CI runners moved to m7a.8xlarge (mid-April 2026). Likely CPU-speed amplification of a latent race in thread context propagation.

3. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability

  • Build: 77545 (Timer, main)
  • Seed: BDEB9106E7A89536
  • Module: server:internalClusterTest
  • First failure: 2024-10-03
  • Total builds affected: 119
  • Pattern: Significantly worsening. Was 1-6 failures/month through early 2026, then jumped to 35 in April 2026 and 19 so far in May. The jump coincides with the m7a.8xlarge runner migration. This is a linearizability test with inherent thread-scheduling sensitivity.

4. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest

  • Build: 77490 (Post Merge Action)
  • Seed: EB7480DD2E1D21AC
  • Module: server:internalClusterTest
  • First failure: 2024-05-03
  • Total builds affected: 145
  • Pattern: Worsening. Had a major spike in mid-2025 (41 failures in June 2025), subsided, then elevated again in 2026 (11-15/month). Segment replication variant is particularly affected.

5. TieringStatusIT.testTieringStatus

  • Build: 77477 (Post Merge Action)
  • Seed: 3AB0B4BC0125C3CF
  • Module: server:internalClusterTest (not found in current checkout — may be sandbox/feature-flagged)
  • First failure: 2026-05-18
  • Total builds affected: 7
  • Pattern: Brand new — all 7 failures in the last 2 days. Likely a recently introduced test or recently merged feature with a race condition.

6. SystemIndexRestIT.classMethod

  • Build: 77477 (Post Merge Action)
  • Seed: 3AB0B4BC0125C3CF
  • Module: qa:smoke-test-http:integTest
  • First failure: 2025-11-03
  • Total builds affected: 89
  • Pattern: Worsening. Started Nov 2025 with a burst of 41 failures, settled to 2-5/month, now climbing again (13 in Apr, 20 in May 2026). The classMethod failure indicates a class-level setup/teardown issue rather than a specific test method.

7. DeleteSnapshotIT.testDeleteShallowCopySnapshot

  • Build: 77472 (Post Merge Action)
  • Seed: 3D45AF61B849E410
  • Module: server:internalClusterTest
  • First failure: 2024-04-06
  • Total builds affected: 31
  • Pattern: Stable low-rate flake. Consistently 1-2 failures per month over 2+ years. Low priority but persistent.

8. IngestFromKafkaIT.testRawPayloadMapperIngestion

  • Build: 77460 (Timer, main)
  • Seed: 48E5F968DFEEDAF0
  • Module: plugins:ingestion-kafka:internalClusterTest
  • First failure: 2026-05-19
  • Total builds affected: 1
  • Pattern: Single occurrence. May be a one-off or a newly introduced flake. Worth monitoring but not actionable yet.

9. RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout

  • Build: 77460 (Timer, main)
  • Seed: 48E5F968DFEEDAF0
  • Module: server:test
  • First failure: 2025-08-04
  • Total builds affected: 101
  • Pattern: Improving. Peaked at 38 failures in Aug 2025 and 32 in Sep 2025, then dropped to 1-2/month. Recent uptick to 6 in May 2026 may correlate with runner change.

10. ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting

  • Build: 77440 (Timer, main)
  • Seed: 26407BF518CD68CE
  • Module: server:internalClusterTest
  • First failure: 2024-03-26
  • Total builds affected: 160
  • Pattern: Chronic stable flake with slight worsening in 2026. Has been failing consistently for over 2 years at 1-15 builds/month. May 2026 already at 15 failures.

Priority Recommendations

  1. ConcurrentSeqNoVersioningIT: Most urgent. Monthly failures rose from at most 6 to 35 in April 2026, roughly 6x the previous monthly maximum, after the runner migration. Needs investigation into whether faster CPUs expose a real linearizability bug or only a test timing issue.
  2. SystemIndexRestIT: Worsening trend; the class-level failure suggests an infrastructure issue in test setup.
  3. FlightOutboundHandlerContextPropagationTests: New test, new flake. Should be easiest to fix since the code is recent.
  4. TieringStatusIT: Brand new; investigate immediately while context is fresh.
  5. RecoveryWhileUnderLoadIT: Chronic and worsening, but complex to fix (segment replication timing).
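
As a rough check on the size of these increases, the arithmetic below compares a recent monthly count with the earlier baseline range, using only the figures quoted in the Detailed Findings. The function name `increase_factor_range` is illustrative. Because the baselines are given as ranges, the result is a range too: the "6x" figure in item 1 corresponds to the conservative end, measured against the highest earlier month.

```python
def increase_factor_range(baseline_low, baseline_high, current):
    """Return (conservative, upper) multipliers of `current` over a baseline range.

    The conservative value divides by the highest baseline month,
    the upper value by the lowest.
    """
    return current / baseline_high, current / baseline_low


# ConcurrentSeqNoVersioningIT: 1-6 builds/month earlier, 35 in April 2026.
seqno = increase_factor_range(1, 6, 35)
# SystemIndexRestIT: 2-5 builds/month earlier, 20 so far in May 2026 (partial month).
sysidx = increase_factor_range(2, 5, 20)

print(f"ConcurrentSeqNoVersioningIT: {seqno[0]:.1f}x to {seqno[1]:.1f}x")
print(f"SystemIndexRestIT: {sysidx[0]:.1f}x to {sysidx[1]:.1f}x")
```

Even on the conservative reading, both tests fail several times more often than before. The May figures cover only part of the month, so they likely understate the final monthly rate.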

Notes

  • Timer builds 77664 and 77653 (within the 24h window) were build/infrastructure failures with no test report — not test flakes.
  • The April 2026 CI runner migration from m5.8xlarge to m7a.8xlarge correlates with increased failure rates for ConcurrentSeqNoVersioningIT and the emergence of FlightOutboundHandlerContextPropagationTests failures.
