
Flaky test report: committed-code failures on 2026-05-16 #267

@andrross

Summary

10 test failures were observed against committed code (Timer and Post Merge Action builds on main) in the 24 hours ending 2026-05-16T10:00Z. All failures are from known-flaky tests with prior failure histories in the metrics cluster. None of the locally runnable failures reproduced with the original CI seed, which is consistent with timing- or concurrency-dependent flakiness.

Failures Observed

| # | Test | Build | Error | Reproduces Locally with CI Seed? |
|---|------|-------|-------|----------------------------------|
| 1 | SegmentReplicationStatsIT.testSegmentReplicationNodeAndIndexStats | 77113 | AssertionError (assertTrue at line 415) | No |
| 2 | EhCacheDiskCacheTests.testComputeIfAbsentConcurrently | 77074 | expected:<1> but was:<2> | No |
| 3 | SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase | 77071 | AssertionError (assertTrue) | No |
| 4 | RemotePrimaryLocalRecoveryIT.testLocalRecoveryRollingRestartAndNodeFailure | 77069 | AssertionError: unexpected during rollingRestart | No |
| 5 | RecoveryWhileUnderLoadIT.testRecoverWithRelocationAndDerivedSource | 77057 | shard has pending operations | No |
| 6 | FullRollingRestartIT.testFullRollingRestart_withNoRecoveryPayloadAndSource | 77056 | replica shards haven't caught up: expected:<20> but was:<17> | No |
| 7 | RemoteStoreKafkaIT.testPeriodicFlush | 77039 | ConditionTimeoutException: not fulfilled within 1 minute | No |
| 8 | IndexingIT.testIndexingWithSegRep | 77039 | expected:<0> but was:<1> | N/A (BWC test) |
| 9 | RemoteStoreReplicationSourceTests.testGetMergedSegmentFilesDownloadTimeout | 77033 | Suite timeout reached | No |
| 10 | RemoteStoreReplicationSourceTests.classMethod | 77033 | Suite-level timeout (same as #9) | N/A |

Historical Flake Rates (sorted by total builds affected)

Data from gradle-check-* indices across all build types (including PR builds). All build counts are unique: each counts the distinct CI runs in which the test failed.

| Test | Total Builds | First Seen | Monthly Trend (Dec 2025 to May 2026; May partial) | Pattern |
|------|--------------|------------|---------------------------------------------------|---------|
| IndexingIT | 711 | 2024-03-25 | 5, 12, 19, 19, 5, 12 | Stable high |
| SearchRestCancellationIT | 440 | 2024-03-26 | 14, 5, 7, 6, 23, 38 | Worsening |
| FullRollingRestartIT | 277 | 2024-10-11 | 1, 0, 35, 25, 24, 16 | Improving |
| RecoveryWhileUnderLoadIT | 272 | 2024-04-03 | 1, 1, 13, 29, 28, 27 | Stable high (since Feb 2026) |
| RemotePrimaryLocalRecoveryIT | 163 | 2024-03-26 | 1, 0, 2, 6, 7, 3 | Stable low |
| RemoteStoreReplicationSourceTests | 161 | 2024-04-17 | 3, 3, 1, 2, 3, 4 | Stable low |
| RemoteStoreKafkaIT | 119 | 2025-03-18 | 5, 1, 8, 3, 7, 5 | Stable |
| EhCacheDiskCacheTests | 79 | 2024-03-28 | 1, 3, 0, 2, 14, 14 | Worsening |
| SegmentReplicationStatsIT | 7 | 2024-12-06 | 0, 0, 0, 0, 0, 1 | Rare |

Key Observations

  1. SearchRestCancellationIT is the most concerning: it went from 5 to 7 failing builds per month (January through March 2026) to 23 in April and 38 in May, and May is only a partial month. The April 2026 CI runner migration from m5.8xlarge to m7a.8xlarge correlates with the uptick, suggesting that the faster hardware amplifies a latent race.

  2. EhCacheDiskCacheTests shows a similar worsening pattern (2→14→14 for March through May, with May again partial), which also correlates with the runner migration. Its concurrent test, testComputeIfAbsentConcurrently, is likely sensitive to thread-scheduling differences on the faster hardware.

  3. RecoveryWhileUnderLoadIT jumped in February 2026 (1→13) and has held at roughly 28/month since March (29, 28, and 27 in a partial May). Because this predates the April runner change, it may have a separate cause.

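  4. FullRollingRestartIT appears to be improving (35→25→24→16 from February to May), which may mean prior fixes are taking effect. Because May is a partial month, though, its final count could still match or exceed April's.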

  5. No seeds reproduced locally: all 8 locally runnable tests passed with their CI seeds. This suggests the failures depend on factors outside seed control (thread scheduling, GC timing, network simulation timing).
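Several May figures in the trend table are month-to-date counts, which makes month-over-month comparisons tricky. A minimal sketch of normalizing them to a full-month pace follows, assuming failures arrive at a roughly constant rate and that the May counts cover May 1 00:00Z up to the report cutoff (2026-05-16T10:00Z). The helper name `full_month_pace` is hypothetical, and linear extrapolation is only a rough guide for bursty flaky-test data.

```python
import calendar
from datetime import datetime, timezone

# Assumption: the May counts cover May 1 00:00Z up to this cutoff.
REPORT_CUTOFF = datetime(2026, 5, 16, 10, 0, tzinfo=timezone.utc)

# May (month-to-date) unique-build counts, copied from the trend table.
MAY_COUNTS = {
    "SearchRestCancellationIT": 38,
    "RecoveryWhileUnderLoadIT": 27,
    "FullRollingRestartIT": 16,
    "EhCacheDiskCacheTests": 14,
}


def full_month_pace(count: int, cutoff: datetime) -> float:
    """Linearly extrapolate a month-to-date count to the whole month.

    Hypothetical helper; assumes failures arrive at a roughly constant rate.
    """
    month_start = cutoff.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    elapsed_days = (cutoff - month_start).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("cutoff must be after the start of its month")
    days_in_month = calendar.monthrange(cutoff.year, cutoff.month)[1]
    return count * days_in_month / elapsed_days


if __name__ == "__main__":
    for test, count in MAY_COUNTS.items():
        pace = full_month_pace(count, REPORT_CUTOFF)
        print(f"{test}: {count} so far, ~{pace:.0f} at full-month pace")
```

Under that assumption, SearchRestCancellationIT's 38 projects to roughly 76 for May, and FullRollingRestartIT's 16 projects to roughly 32, which is above its April count of 24.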

Methodology

  • Committed-code failures identified via a metrics cluster query matching either invoke_type: Timer with git_reference: main, or invoke_type: Post Merge Action
  • Seeds extracted from Jenkins test report API (errorStackTrace SeedInfo or JVM args in stdout)
  • Local reproduction attempted with ./gradlew <module>:<task> --tests "<class>.<method>" -Dtests.seed=<SEED>
  • Historical data aggregated across all gradle-check-* indices using monthly date histograms with cardinality aggregation on build_number
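The committed-code filter and the monthly unique-build aggregation described above could be expressed in OpenSearch query DSL roughly as follows. This is a sketch, not the query actually used: the timestamp field `build_start_time`, the test-class field `test_class`, and the failure filter `test_status: FAILED` are hypothetical names, and the `term` queries assume `invoke_type` and `git_reference` are mapped as keyword fields.

```python
import json

# Hypothetical field names; the real gradle-check-* mapping may differ.
TIMESTAMP_FIELD = "build_start_time"
TEST_CLASS_FIELD = "test_class"
FAILED_FILTER = {"term": {"test_status": "FAILED"}}


def committed_code_failures_query(start: str, end: str) -> dict:
    """Failures on committed code: (Timer AND main) OR Post Merge Action."""
    committed_code = {
        "bool": {
            "should": [
                {"bool": {"filter": [
                    {"term": {"invoke_type": "Timer"}},
                    {"term": {"git_reference": "main"}},
                ]}},
                {"term": {"invoke_type": "Post Merge Action"}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": 100,
        "query": {"bool": {"filter": [
            {"range": {TIMESTAMP_FIELD: {"gte": start, "lt": end}}},
            FAILED_FILTER,
            committed_code,
        ]}},
    }


def monthly_unique_builds_query(test_class: str) -> dict:
    """Per-month count of distinct build_number values in which a test failed."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"term": {TEST_CLASS_FIELD: test_class}},
            FAILED_FILTER,
        ]}},
        "aggs": {
            "per_month": {
                "date_histogram": {"field": TIMESTAMP_FIELD, "calendar_interval": "month"},
                "aggs": {"unique_builds": {"cardinality": {"field": "build_number"}}},
            }
        },
    }


if __name__ == "__main__":
    # Request bodies for a search against the gradle-check-* index pattern.
    window = committed_code_failures_query("2026-05-15T10:00:00Z", "2026-05-16T10:00:00Z")
    print(json.dumps(window, indent=2))
    print(json.dumps(monthly_unique_builds_query("SearchRestCancellationIT"), indent=2))
```

The `cardinality` aggregation is approximate (HyperLogLog++), but at counts this far below its default precision threshold the results should be close to exact.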
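Seed extraction and the local reproduction command can be sketched as below, assuming the usual randomizedtesting formats: a `__randomizedtesting.SeedInfo.seed([MASTER:METHOD]:0)` frame in the stack trace, or a `-Dtests.seed=...` argument in stdout. The function names, the example module and task (`:server`, `internalClusterTest`), and the seed value are illustrative only, not taken from the actual CI runs.

```python
import re
import shlex
from typing import List, Optional

# Frame such as: at __randomizedtesting.SeedInfo.seed([D0C4EE1BE0E2C1F4:A1B2C3D4E5F60718]:0)
SEED_INFO_RE = re.compile(r"SeedInfo\.seed\(\[([0-9A-Fa-f]+)")
# JVM argument such as: -Dtests.seed=D0C4EE1BE0E2C1F4
JVM_ARG_RE = re.compile(r"-Dtests\.seed=([0-9A-Fa-f]+)")


def extract_seed(error_stack_trace: str, stdout: str = "") -> Optional[str]:
    """Return the master seed, preferring the stack trace over stdout."""
    for text, pattern in ((error_stack_trace, SEED_INFO_RE), (stdout, JVM_ARG_RE)):
        match = pattern.search(text or "")
        if match:
            return match.group(1)
    return None


def repro_command(module: str, task: str, test_class: str, method: str, seed: str) -> List[str]:
    """Build the ./gradlew argv for <module>:<task> --tests "<class>.<method>"."""
    return [
        "./gradlew",
        f"{module}:{task}",
        "--tests",
        f"{test_class}.{method}",
        f"-Dtests.seed={seed}",
    ]


if __name__ == "__main__":
    # Illustrative stack-trace frame and module/task; not from the real builds.
    trace = "at __randomizedtesting.SeedInfo.seed([D0C4EE1BE0E2C1F4:A1B2C3D4E5F60718]:0)"
    seed = extract_seed(trace)
    if seed is not None:
        cmd = repro_command(":server", "internalClusterTest", "SegmentReplicationStatsIT",
                            "testSegmentReplicationNodeAndIndexStats", seed)
        print(shlex.join(cmd))
```

Returning an argv list rather than a single string avoids shell-quoting problems when the command is run via `subprocess`; `shlex.join` is used only to print a copy-pasteable form.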
