Flaky test report: committed-code failures on 2026-05-15 #266

@andrross

Summary

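9 distinct test classes failed against committed code (Timer or Post Merge Action builds against main) in the 24 hours ending 2026-05-15T10:00 UTC. Of the 8 that could be run locally, none reproduced with the original seed, which indicates the failures are non-deterministic (flaky). The ninth, DataFormatAwareEngineRecoveryTests, could not be attempted because the class does not exist in the local checkout.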
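
To illustrate why a passing local run with the CI seed points at timing or environment rather than randomized inputs, here is a minimal sketch. It assumes the part of the seed string before the colon is the suite-level seed, and it uses `java.util.Random` as a stand-in for the test framework's seeded randomness; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch only: java.util.Random stands in for the test framework's seeded randomness.
final class SeedReplaySketch {

    // Parse the part before the colon of a seed string such as
    // "EEA8D4EA47CA1FB0:3781762BE365C51A" (assumed to be the suite-level seed).
    static long masterSeed(String seed) {
        String master = seed.split(":")[0];
        return Long.parseUnsignedLong(master, 16);
    }

    // Draw a few pseudo-random values, as a test might when picking shard or node counts.
    static int[] randomChoices(long seed, int count) {
        Random random = new Random(seed);
        int[] choices = new int[count];
        for (int i = 0; i < count; i++) {
            choices[i] = random.nextInt(10);
        }
        return choices;
    }

    public static void main(String[] args) {
        long seed = masterSeed("EEA8D4EA47CA1FB0:3781762BE365C51A");
        int[] ciRun = randomChoices(seed, 5);
        int[] localRun = randomChoices(seed, 5);
        // The same seed yields the same sequence of choices in both runs.
        System.out.println(Arrays.equals(ciRun, localRun));
    }
}
```

Because the seeded choices are identical across runs, a failure that disappears on replay most plausibly depends on something the seed does not control, such as thread scheduling, hardware speed, or external services.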

Summary Table

| # | Test Class | Build | Unique Builds (All-Time) | First Seen | Pattern |
|---|------------|-------|--------------------------|------------|---------|
| 1 | RecoveryWhileUnderLoadIT | 76924 | 267 | 2024-04-03 | Chronic, worsening since Apr 2026 |
| 2 | RemoteSplitIndexIT | 76876 | 217 | 2024-04-11 | Chronic, stable |
| 3 | IngestFromKafkaIT | 76857 | 127 | 2025-01-12 | Chronic, worsening since Mar 2026 |
| 4 | EhCacheDiskCacheTests | 76909 | 78 | 2024-03-28 | Chronic, spike in Apr-May 2026 |
| 5 | RemoteStoreKafkaIT | 76864 | 118 | 2025-03-18 | Chronic, stable |
| 6 | InternalDistributionBwcSetupPluginFuncTest | 76933 | 44 | 2025-10-23 | Chronic, stable |
| 7 | IndexStatsIT | 76850 | 24 | 2026-04-17 | New, worsening |
| 8 | DataFormatAwareEngineRecoveryTests | 76779 | 20 | 2026-05-14 | New (appeared this week), all 14 methods fail together |
| 9 | SimpleSearchIT | 76857 | 13 | 2024-08-27 | Chronic, low-rate |

Detailed Findings

1. RecoveryWhileUnderLoadIT.testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest

  • Build: 76924
  • Seed: EEA8D4EA47CA1FB0:3781762BE365C51A
  • Error: java.lang.IllegalArgumentException: no data nodes with criteria [5ImEE4GbTBeNZA4pPiAseg] found for shard: [test][2]
  • Reproduced locally: No (passed with seed)
  • First seen: 2024-04-03
  • Total unique builds affected: 267
  • Pattern: Chronic flaky test. Low rate from 2024 until mid-2025, then a significant spike starting Jun 2025 (77 builds) and an elevated rate continuing through 2026. Recent months: Feb=13, Mar=29, Apr=28, May=22 (partial month, report window ends May 15). Worsening; likely amplified by the Apr 2026 runner migration to m7a.8xlarge.
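
The monthly counts above are easier to compare once May is normalized for being a partial month. The sketch below converts each count to builds per day. It assumes the May figure covers 15 days, since the report window ends on 2026-05-15; the class and method names are hypothetical.

```java
import java.time.YearMonth;
import java.util.Locale;

final class MonthlyRateSketch {

    // Builds per day, using only the days actually observed in the month.
    static double buildsPerDay(int builds, int observedDays) {
        return (double) builds / observedDays;
    }

    public static void main(String[] args) {
        int[] months = {2, 3, 4, 5};
        int[] builds = {13, 29, 28, 22}; // counts quoted in the report for 2026
        for (int i = 0; i < months.length; i++) {
            YearMonth ym = YearMonth.of(2026, months[i]);
            // Assumption: May data covers 15 days; other months are complete.
            int days = (months[i] == 5) ? 15 : ym.lengthOfMonth();
            System.out.printf(Locale.ROOT, "%s: %.2f builds/day%n", ym, buildsPerDay(builds[i], days));
        }
    }
}
```

Under that assumption the rates come out to roughly 0.46, 0.94, 0.93, and 1.47 builds per day for Feb through May, which is consistent with the "worsening" label even though the raw May count is lower than April's.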

2. RemoteSplitIndexIT.testSplitFromOneToN

  • Build: 76876
  • Seed: 284C7FE1DB5BA44D
  • Error: java.lang.AssertionError
  • Reproduced locally: No (passed with seed)
  • First seen: 2024-04-11
  • Total unique builds affected: 217
  • Pattern: Chronic flaky test that has failed throughout its history. Peaked in Jun 2024 (54 builds), spiked again in Nov 2025 (42 builds), and remains active at 3-7 builds/month in 2026. Stable.

3. IngestFromKafkaIT.testDynamicUpdateKafkaParams

  • Build: 76857
  • Seed: 9B93092845FF16D4:AD4B89DE6C4DC752
  • Error: org.awaitility.core.ConditionTimeoutException: Condition was not fulfilled within 1 minutes.
  • Reproduced locally: No (passed with seed)
  • First seen: 2025-01-12
  • Total unique builds affected: 127
  • Pattern: Chronic flaky test. Consistent 5-12 builds/month through 2025, then worsening to 19 builds in Apr 2026 and 17 in May 2026. Timeout-based failure suggests environmental sensitivity.

4. EhCacheDiskCacheTests.testComputeIfAbsentConcurrently

  • Build: 76909
  • Seed: 5721104209303B72:D69A6776F77F731C
  • Error: java.lang.AssertionError: expected:<1> but was:<2> (also leaked 3 threads from SUITE scope)
  • Reproduced locally: No (passed with seed)
  • First seen: 2024-03-28
  • Total unique builds affected: 78
  • Pattern: Chronic flaky test. Sporadic through 2024-2025 (1-6 builds/month), then significant spike in Apr 2026 (14 builds) and May 2026 (13 builds). The concurrency assertion and thread leak suggest a race condition amplified by faster hardware.
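
As an illustration of the kind of race that could produce `expected:<1> but was:<2>`, the sketch below contrasts a check-then-act cache lookup with the atomic `ConcurrentHashMap.computeIfAbsent`. This is not the EhCacheDiskCache code, which the report does not show. It assumes the failing assertion counts loader invocations, and all names are hypothetical. A `CyclicBarrier` forces the bad interleaving so the outcome is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ComputeIfAbsentRaceSketch {

    // Racy check-then-act: two threads can both miss and both run the loader.
    static String racyComputeIfAbsent(Map<String, String> map, String key, Function<String, String> loader) {
        String value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            map.put(key, value);
        }
        return value;
    }

    // Start two threads, wait for both, and convert interruption to an unchecked exception.
    private static void runBoth(Runnable first, Runnable second) {
        Thread a = new Thread(first);
        Thread b = new Thread(second);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // The barrier inside the loader holds the first thread until the second has also missed.
    static int loadsWithRacyCache() {
        Map<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        CyclicBarrier bothLoading = new CyclicBarrier(2);
        Function<String, String> loader = k -> {
            loads.incrementAndGet();
            try {
                bothLoading.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            return "value";
        };
        runBoth(() -> racyComputeIfAbsent(map, "k", loader), () -> racyComputeIfAbsent(map, "k", loader));
        return loads.get();
    }

    // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
    static int loadsWithAtomicCache() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        Function<String, String> loader = k -> {
            loads.incrementAndGet();
            return "value";
        };
        runBoth(() -> map.computeIfAbsent("k", loader), () -> map.computeIfAbsent("k", loader));
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("racy loads: " + loadsWithRacyCache());     // 2
        System.out.println("atomic loads: " + loadsWithAtomicCache()); // 1
    }
}
```

In a real test the interleaving is not forced, so the double load would appear only occasionally, and more often when faster hardware lets both threads reach the lookup at nearly the same instant.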

5. RemoteStoreKafkaIT.testErrorStrategy

  • Build: 76864
  • Seed: 79A91CD7405643AB:58973B4CD72A8421
  • Error: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image confluentinc/cp-kafka:6.2.1
  • Reproduced locally: No (passed with seed — test uses testcontainers which may behave differently locally)
  • First seen: 2025-03-18
  • Total unique builds affected: 118
  • Pattern: Chronic flaky test. Consistent 4-16 builds/month. The container launch failure is infrastructure-dependent, not code-dependent. Stable rate.

6. InternalDistributionBwcSetupPluginFuncTest

  • Build: 76933
  • Seed: E83B270433C3B517
  • Error: org.gradle.testkit.runner.UnexpectedBuildFailure: Unexpected build execution failure (3 methods failed together)
  • Reproduced locally: No (passed with seed)
  • First seen: 2025-10-23
  • Total unique builds affected: 44
  • Pattern: Chronic flaky test. Peaked in Oct 2025 (15 builds), then stable at 3-7 builds/month. Build infrastructure test — failures likely depend on network/git state.

7. IndexStatsIT.testQueryCache

  • Build: 76850
  • Seed: 93F11E8F83128776:4806499CD0AAB068
  • Error: java.lang.AssertionError: Expected: a value greater than <0L> but: <0L> was equal to <0L>
  • Reproduced locally: No (passed with seed)
  • First seen: 2026-04-17
  • Total unique builds affected: 24
  • Pattern: New flaky test. First appeared Apr 17, 2026 — 4 days after the m7a.8xlarge runner migration. 9 builds in Apr, 15 in May. Worsening. Likely a timing-sensitive assertion about query cache population that races with faster hardware.
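
If the assertion does race with an asynchronous stats update, one common mitigation is to poll for the condition instead of checking it once. The sketch below shows that idea in plain Java. `waitFor` is a hypothetical helper and the `AtomicLong` is a stand-in for the cache statistic, not the IndexStatsIT code. OpenSearch's test framework offers an `assertBusy` helper in the same spirit.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

final class PollingAssertSketch {

    // Polls until the condition holds or the timeout elapses; returns whether it held.
    static boolean waitFor(BooleanSupplier condition, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // short pause between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a cache statistic that is updated asynchronously.
        AtomicLong cacheSize = new AtomicLong(0);
        Thread updater = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            cacheSize.set(42);
        });
        updater.start();
        // A single immediate check may still see 0; polling tolerates the delay.
        boolean populated = waitFor(() -> cacheSize.get() > 0, 5, TimeUnit.SECONDS);
        System.out.println("populated: " + populated);
    }
}
```

Polling only hides the symptom if the statistic is never updated at all, so it is worth confirming the root cause before applying this pattern.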

8. DataFormatAwareEngineRecoveryTests (14 methods)

  • Build: 76779
  • Seed: 16543A62E872DB93:D99BD69022ABBE91
  • Error: java.lang.NullPointerException: Cannot invoke "org.opensearch.index.mapper.MapperService.getIndexSettings()" because the return value of "org.opensearch.index.engine.EngineConfig.getMapperService()" is null
  • Reproduced locally: Could not attempt — test class does not exist in local checkout (appears to be very recently added, first failures 2026-05-14)
  • First seen: 2026-05-14
  • Total unique builds affected: 20
  • Pattern: Brand new. All 340 failure records are from May 2026, across 20 builds. All 14 test methods fail together with the same NPE, suggesting a test setup/infrastructure issue rather than individual test logic bugs.

9. SimpleSearchIT.testIndexOnlyFloatField

  • Build: 76857
  • Seed: 9B93092845FF16D4:F9BDE466332D3A4F
  • Error: java.lang.AssertionError: expected:<1> but was:<0>
  • Reproduced locally: No (passed with seed)
  • First seen: 2024-08-27
  • Total unique builds affected: 13
  • Pattern: Chronic low-rate flaky test. 1-2 builds affected per month sporadically. Stable, low-impact.

Notes

  • All tests that could be run locally passed with the original CI seed, confirming non-deterministic (timing/environment-dependent) failures.
  • The Apr 2026 runner migration from m5.8xlarge to m7a.8xlarge correlates with increased failure rates for RecoveryWhileUnderLoadIT, EhCacheDiskCacheTests, and the new IndexStatsIT failures.
  • DataFormatAwareEngineRecoveryTests is too new to have a local copy — it was likely added to main after the local checkout was last updated.
  • RemoteStoreKafkaIT failures are container-infrastructure-related (testcontainers), not code-related.
