Summary
This report covers flaky test failures observed in committed-code CI runs (Timer builds on main and Post Merge Action builds) during the 72-hour window ending 2026-05-20 10:00 UTC. No committed-code test failures occurred in the strict 24-hour window (the most recent was ~27 hours before report time), so the window was expanded to capture the nearest failures.
Ten distinct tests failed across 8 builds. None reproduced locally with their original seeds, which is consistent with timing- or environment-dependent flakes rather than deterministic failures.
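A single passing local run is fairly weak evidence on its own: the seed pins the randomized test inputs, but not thread scheduling or host speed. The sketch below illustrates this with a simple independence model. The per-run failure probabilities are hypothetical placeholders, not values measured from CI.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest n such that a flake failing with probability p_fail per run
    shows up at least once in n independent runs with the given confidence,
    i.e. 1 - (1 - p_fail) ** n >= confidence."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# Hypothetical per-run failure rates, for illustration only.
for p in (0.5, 0.05):
    print(f"p_fail={p}: {runs_needed(p)} runs for 95% confidence")
```

Under this model, a flake that fails in 5% of runs would need dozens of local iterations before a clean result says much, so "did not reproduce once" should be read as "not deterministic" rather than "not real".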
Reproduction Results
Each test was run locally on the current main checkout using the exact seed from the failing CI build. None reproduced. TieringStatusIT.testTieringStatus is not present in the current checkout, so it could not be run.
| # | Test | Reproduced? |
|---|------|-------------|
| 1 | IndexingIT.testIndexingWithSegRep | ❌ No |
| 2 | FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch | ❌ No |
| 3 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | ❌ No |
| 4 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | ❌ No |
| 5 | TieringStatusIT.testTieringStatus | ❌ No (not in current codebase) |
| 6 | SystemIndexRestIT.classMethod | ❌ No |
| 7 | DeleteSnapshotIT.testDeleteShallowCopySnapshot | ❌ No |
| 8 | IngestFromKafkaIT.testRawPayloadMapperIngestion | ❌ No |
| 9 | RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | ❌ No |
| 10 | ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | ❌ No |
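For anyone repeating these reproduction attempts, the helper below sketches how a rerun command could be assembled from a Module and Seed pair in the Detailed Findings. It assumes the Module value is a Gradle task path and that the build accepts the usual `--tests` filter and `-Dtests.seed` property; the leading `*.` wildcard is used because this report does not list package names. The function name is illustrative.

```python
import shlex


def repro_command(task: str, test: str, seed: str) -> str:
    """Build a shell-safe Gradle command that reruns one test with a fixed seed.

    task: Gradle task path as listed under "Module" (assumed, not verified).
    test: "ClassName.methodName" as listed in the report.
    seed: hex seed from the failing CI build.
    """
    args = [
        "./gradlew",
        f":{task}",
        "--tests",
        f"*.{test}",  # wildcard package, since the report omits package names
        f"-Dtests.seed={seed}",
    ]
    return shlex.join(args)


print(
    repro_command(
        "server:internalClusterTest",
        "ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability",
        "BDEB9106E7A89536",
    )
)
```

`shlex.join` quotes the wildcard filter so the shell does not expand it before Gradle sees it.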
Summary Table (sorted by total builds affected)
| Test | Builds Affected | First Failure | Recent Build | Pattern |
|------|-----------------|---------------|--------------|---------|
| IndexingIT.testIndexingWithSegRep | 260 | 2024-03-25 | 77617 | Stable chronic flake (~5-18/month) |
| ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | 160 | 2024-03-26 | 77440 | Stable chronic flake, slight worsening in 2026 |
| RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | 145 | 2024-05-03 | 77490 | Worsening: spiked mid-2025, elevated since |
| ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 119 | 2024-10-03 | 77545 | Worsening: 35 builds in Apr 2026 (was 1-6/month) |
| RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | 101 | 2025-08-04 | 77460 | Improving: peaked Aug-Sep 2025, now low |
| SystemIndexRestIT.classMethod | 89 | 2025-11-03 | 77477 | Worsening: 20 builds in May 2026 so far |
| DeleteSnapshotIT.testDeleteShallowCopySnapshot | 31 | 2024-04-06 | 77472 | Stable low-rate flake (~1-2/month) |
| FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch | 10 | 2026-04-14 | 77602 | New: appeared after Apr 2026 runner change |
| TieringStatusIT.testTieringStatus | 7 | 2026-05-18 | 77477 | New: 7 failures, all in May 2026 |
| IngestFromKafkaIT.testRawPayloadMapperIngestion | 1 | 2026-05-19 | 77460 | New: single occurrence |
Detailed Findings
1. IndexingIT.testIndexingWithSegRep
- Build: 77617 (Timer, main)
- Seed: 2CFE8BCAF9F2FE0B
- Module: qa:rolling-upgrade:v3.6.1#upgradedClusterTest
- First failure: 2024-03-25
- Total builds affected: 260
- Pattern: Chronic stable flake. Consistently fails in 5-18 builds per month since March 2024. This is a rolling-upgrade test that waits for searchable docs after segment replication — inherently timing-sensitive.
2. FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch
- Build: 77602 (Timer, main)
- Seed: 2A132F5DF21983D8
- Module: plugins:arrow-flight-rpc:test
- First failure: 2026-04-14
- Total builds affected: 10
- Pattern: New flake that appeared right when CI runners moved to m7a.8xlarge (mid-April 2026). Likely CPU-speed amplification of a latent race in thread context propagation.
3. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability
- Build: 77545 (Timer, main)
- Seed: BDEB9106E7A89536
- Module: server:internalClusterTest
- First failure: 2024-10-03
- Total builds affected: 119
- Pattern: Significantly worsening. Was 1-6 failures/month through early 2026, then jumped to 35 in April 2026 and 19 so far in May. Strong correlation with the m7a.8xlarge runner migration. This is a linearizability test with inherent thread-scheduling sensitivity.
4. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest
- Build: 77490 (Post Merge Action)
- Seed: EB7480DD2E1D21AC
- Module: server:internalClusterTest
- First failure: 2024-05-03
- Total builds affected: 145
- Pattern: Worsening. Had a major spike in mid-2025 (41 failures in June 2025), subsided, then elevated again in 2026 (11-15/month). Segment replication variant is particularly affected.
5. TieringStatusIT.testTieringStatus
- Build: 77477 (Post Merge Action)
- Seed: 3AB0B4BC0125C3CF
- Module: server:internalClusterTest (not found in current checkout; may be sandbox/feature-flagged)
- First failure: 2026-05-18
- Total builds affected: 7
- Pattern: Brand new — all 7 failures in the last 2 days. Likely a recently introduced test or recently merged feature with a race condition.
6. SystemIndexRestIT.classMethod
- Build: 77477 (Post Merge Action)
- Seed: 3AB0B4BC0125C3CF
- Module: qa:smoke-test-http:integTest
- First failure: 2025-11-03
- Total builds affected: 89
- Pattern: Worsening. Started Nov 2025 with a burst of 41 failures, settled to 2-5/month, now climbing again (13 in Apr, 20 in May 2026). The classMethod failure indicates a class-level setup/teardown issue rather than a specific test method.
7. DeleteSnapshotIT.testDeleteShallowCopySnapshot
- Build: 77472 (Post Merge Action)
- Seed: 3D45AF61B849E410
- Module: server:internalClusterTest
- First failure: 2024-04-06
- Total builds affected: 31
- Pattern: Stable low-rate flake. Consistently 1-2 failures per month over 2+ years. Low priority but persistent.
8. IngestFromKafkaIT.testRawPayloadMapperIngestion
- Build: 77460 (Timer, main)
- Seed: 48E5F968DFEEDAF0
- Module: plugins:ingestion-kafka:internalClusterTest
- First failure: 2026-05-19
- Total builds affected: 1
- Pattern: Single occurrence. May be a one-off or a newly introduced flake. Worth monitoring but not actionable yet.
9. RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout
- Build: 77460 (Timer, main)
- Seed: 48E5F968DFEEDAF0
- Module: server:test
- First failure: 2025-08-04
- Total builds affected: 101
- Pattern: Improving. Peaked at 38 failures in Aug 2025 and 32 in Sep 2025, then dropped to 1-2/month. Recent uptick to 6 in May 2026 may correlate with runner change.
10. ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting
- Build: 77440 (Timer, main)
- Seed: 26407BF518CD68CE
- Module: server:internalClusterTest
- First failure: 2024-03-26
- Total builds affected: 160
- Pattern: Chronic stable flake with slight worsening in 2026. Has been failing consistently for over 2 years at 1-15 builds/month. May 2026 already at 15 failures.
Priority Recommendations
- ConcurrentSeqNoVersioningIT — Most urgent. Failure rate increased 6x after runner migration. Needs investigation into whether faster CPUs expose a real linearizability bug or just a test timing issue.
- SystemIndexRestIT — Worsening trend, class-level failure suggests infrastructure issue in test setup.
- FlightOutboundHandlerContextPropagationTests — New test, new flake. Should be easiest to fix since the code is recent.
- TieringStatusIT — Brand new, investigate immediately while context is fresh.
- RecoveryWhileUnderLoadIT — Chronic and worsening, but complex to fix (segment replication timing).
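The "6x" figure for ConcurrentSeqNoVersioningIT depends on which baseline is used. The short calculation below uses only the counts stated in the findings (1-6 builds/month before the migration, 35 in April 2026); taking the midpoint of the earlier range as a baseline is an assumption made here for comparison.

```python
def fold_increase(after: float, before: float) -> float:
    """Ratio of the new monthly count to a baseline monthly count."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


APRIL_2026_BUILDS = 35   # from the findings
PRIOR_RANGE = (1, 6)     # builds/month through early 2026, from the findings

low, high = PRIOR_RANGE
vs_upper = fold_increase(APRIL_2026_BUILDS, high)               # most conservative baseline
vs_midpoint = fold_increase(APRIL_2026_BUILDS, (low + high) / 2)  # assumed typical month

print(f"vs upper bound of prior range: {vs_upper:.1f}x")
print(f"vs midpoint of prior range:    {vs_midpoint:.1f}x")
```

Against the upper end of the earlier range the increase is a little under 6x; against the midpoint it is larger, so "6x" reads as a conservative estimate.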
Notes
- Timer builds 77664 and 77653 (within the 24h window) were build/infrastructure failures with no test report — not test flakes.
- The April 2026 CI runner migration from m5.8xlarge to m7a.8xlarge correlates with increased failure rates for ConcurrentSeqNoVersioningIT and the emergence of FlightOutboundHandlerContextPropagationTests failures.