Flaky test report: committed-code failures on 2026-05-19 #270

@andrross

Summary

Automated scan of committed-code (Timer and Post Merge Action) gradle-check failures in the past 24 hours (2026-05-18 to 2026-05-19). All tests were run locally with their original CI seeds, and none failed locally. This suggests they are timing- or environment-sensitive flakes rather than deterministic failures.

CI runners are currently m7a.8xlarge (since mid-April 2026). Several tests show increased failure rates coinciding with this runner change.

Summary Table (sorted by total builds affected)

| # | Test | Builds Affected | First Failure | Pattern | Build |
|---|------|-----------------|---------------|---------|-------|
| 1 | RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest | 282 | 2024-04-03 | Chronic, worsening since Apr 2026 | 77490 |
| 2 | WarmIndexSegmentReplicationIT.testReplicationAfterForceMerge | 235 | 2025-03-11 | Chronic, low-rate steady | 77440 |
| 3 | ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting | 202 | 2024-03-26 | Chronic, worsening in May 2026 | 77440 |
| 4 | RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | 163 | 2024-04-17 | Was improving, uptick in May 2026 | 77460 |
| 5 | IngestFromKafkaIT.testRawPayloadMapperIngestion | 131 | 2025-01-12 | Chronic (mostly PR builds) | 77460 |
| 6 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 118 | 2024-10-03 | Chronic, sharp spike Apr 2026 (35 builds) | 77456 |
| 7 | SystemIndexRestIT.classMethod | 94 | 2024-07-03 | Spike Nov 2025, worsening in 2026 | 77477 |
| 8 | DeleteSnapshotIT.testDeleteShallowCopySnapshot | 31 | 2024-04-06 | Low-rate chronic, stable | 77472 |
| 9 | IndexFieldDataServiceTests.testClearField | 12 | 2024-07-30 | Low-rate, slight uptick May 2026 | 77413 |
| 10 | SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted | 9 | 2024-07-15 | Resurfaced Apr 2026 (5 of 9 builds since then) | 77420 |

Detailed Findings

1. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest

  • Build: 77490 (Post Merge Action)
  • Seed: EB7480DD2E1D21AC
  • Error: replica shards haven't caught up with primary expected:<20> but was:<15>
  • Reproduced locally: No
  • First failure: 2024-04-03
  • Total builds affected: 282
  • Pattern: Chronic flake. Major spike in Apr-Jul 2025 (21→38→21 builds/month), then subsided; resumed from Feb 2026 onward (8→11→13→12 builds/month). The rate has stayed elevated since the m7a.8xlarge runner change, although the 2026 increase began in Feb, before that change.

2. WarmIndexSegmentReplicationIT.testReplicationAfterForceMerge

  • Build: 77440 (Timer)
  • Seed: 26407BF518CD68CE
  • Error: Expected: a value equal to or greater than <3L> but: <0L> was less than <3L>
  • Reproduced locally: No
  • First failure: 2025-03-11
  • Total builds affected: 235 (note: class-level count includes other test methods)
  • Pattern: Steady low-rate flake across many months (1-4 builds/month).

3. ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting

  • Build: 77440 (Timer)
  • Seed: 26407BF518CD68CE
  • Error: Suite timeout exceeded (>= 1200000 msec, i.e. 20 minutes); the test timed out
  • Reproduced locally: No
  • First failure: 2024-03-26
  • Total builds affected: 202 (class-level)
  • Pattern: Chronic timeout flake. Steady 1-9 builds/month since Jan 2025, then a sharp spike to 14 in May 2026, likely due to environmental sensitivity to the runner change.

4. RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout

  • Build: 77460 (Timer)
  • Seed: 48E5F968DFEEDAF0
  • Error: Suite timeout exceeded (>= 1200000 msec, i.e. 20 minutes)
  • Reproduced locally: No
  • First failure: 2024-04-17
  • Total builds affected: 163 (class-level)
  • Pattern: Major spike Aug-Oct 2025 (37→32→18 builds/month), then largely resolved. Uptick to 6 builds in May 2026.

5. IngestFromKafkaIT.testRawPayloadMapperIngestion

  • Build: 77460 (Timer)
  • Seed: 48E5F968DFEEDAF0
  • Error: ConditionTimeoutException: Condition was not fulfilled within 1 [second]
  • Reproduced locally: No
  • First failure: 2025-01-12
  • Total builds affected: 131 (class-level, includes other Kafka IT tests)
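  • Pattern: Chronic, mostly in PR builds. Kafka integration tests depend on an embedded Kafka instance, and a condition that must be met within 1 second leaves little headroom on a loaded runner, which suggests environmental sensitivity.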

6. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability

  • Build: 77456 (Post Merge Action)
  • Seed: 893C85F2FC3AC7D5
  • Error: ClusterHealthResponse has timed out
  • Reproduced locally: No
  • First failure: 2024-10-03
  • Total builds affected: 118
  • Pattern: Low-rate chronic (1-6 builds/month) until Apr 2026, when it spiked to 35 builds. The spike coincides with the m7a.8xlarge runner change; one plausible explanation is that the new runners' different timing characteristics amplify a race condition in this linearizability test.

7. SystemIndexRestIT.classMethod

  • Build: 77477 (Post Merge Action)
  • Seed: 3AB0B4BC0125C3CF
  • Error: 1 channels still being tracked in RestCancellableNodeClient while there should be none expected:<0> but was:<1>
  • Reproduced locally: No
  • First failure: 2024-07-03
  • Total builds affected: 94
  • Pattern: Spike of 41 builds in Nov 2025, then subsided. Worsening again in 2026 (13 in Apr, 19 in May so far).

8. DeleteSnapshotIT.testDeleteShallowCopySnapshot

  • Build: 77472 (Post Merge Action)
  • Seed: 3D45AF61B849E410
  • Error: Expected: is <4> but: was <3>
  • Reproduced locally: No
  • First failure: 2024-04-06
  • Total builds affected: 31
  • Pattern: Low-rate stable flake (0-3 builds/month). No significant trend change.

9. IndexFieldDataServiceTests.testClearField

  • Build: 77413 (Post Merge Action)
  • Seed: 25319181894D5792
  • Error: expected:<0> but was:<1>
  • Reproduced locally: No
  • First failure: 2024-07-30
  • Total builds affected: 12
  • Pattern: Very low-rate (0-2 builds/month). Slight uptick to 3 in May 2026.

10. SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted

  • Build: 77420 (Post Merge Action)
  • Seed: 4B76556EC34B0E76
  • Error: Count is 2 hits but 3 was expected. Total shards: 1 Successful shards: 1
  • Reproduced locally: No
  • First failure: 2024-07-15
  • Total builds affected: 9 (for this specific test method)
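  • Pattern: Resurfacing flake. 3 builds in Apr 2026 and 2 in May 2026 account for 5 of the 9 affected builds; the other 4 are spread across the period since the first failure in Jul 2024. The recent cluster may have been triggered by the runner change.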

Reproduction Notes

All tests were run locally with their original CI seeds on the current main branch, and none failed. This is expected for timing-sensitive flakes: the seed fixes the randomized test parameters but not thread scheduling, GC pauses, or network timing.
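To illustrate why replaying a seed reproduces the inputs but not the timing, here is a minimal Java sketch. It assumes, for illustration only, that a test draws its randomized parameters from a `java.util.Random` seeded with the hex CI seed; the class and method names (`SeedReplaySketch`, `drawParameters`) and the parameter ranges are hypothetical, and the real randomized-testing framework derives values from the seed in its own way.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedReplaySketch {

    // Hypothetical stand-in for the randomized parameters a test draws from its
    // seed (here a document count and a replica count). java.util.Random is used
    // only for illustration; the real test framework derives its randomness
    // from the seed differently.
    static int[] drawParameters(String hexSeed) {
        // The seeds in this report are 16 hex digits; parseUnsignedLong accepts
        // values with the high bit set, such as EB7480DD2E1D21AC.
        long seed = Long.parseUnsignedLong(hexSeed, 16);
        Random random = new Random(seed);
        int numDocs = 1 + random.nextInt(100); // 1..100
        int numReplicas = random.nextInt(3);   // 0..2
        return new int[] {numDocs, numReplicas};
    }

    public static void main(String[] args) {
        String ciSeed = "EB7480DD2E1D21AC"; // seed reported for build 77490
        int[] ciRun = drawParameters(ciSeed);
        int[] localRun = drawParameters(ciSeed);

        // Same seed, same parameters: prints true on every run.
        System.out.println(Arrays.equals(ciRun, localRun));

        // Nothing above pins thread scheduling, GC pauses, or network timing,
        // so replaying the seed recreates the inputs but not the interleaving
        // that made the CI run fail.
    }
}
```

Running `main` prints `true`: the seed alone fully determines the drawn parameters, so a local pass with the same seed says nothing about how the CI runner scheduled the threads involved.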

Notable Trends

  1. ConcurrentSeqNoVersioningIT had a dramatic spike from 1-6 builds/month to 35 in April 2026, suggesting that the m7a.8xlarge runner change amplified an existing race condition.
  2. ShardIndexingPressureSettingsIT and RemoteStoreReplicationSourceTests both fail with suite timeouts, suggesting they are sensitive to overall system load or scheduling.
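  3. SystemIndexRestIT fails a channel-leak assertion in RestCancellableNodeClient with rising frequency in 2026 (13 builds in Apr, 19 so far in May).
  4. SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted is a resurfacing flake: first seen in Jul 2024, but 5 of its 9 affected builds have occurred since Apr 2026.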
