
CNTRLPLANE-3371: Fix AllowedCIDRs e2e test for Route-based KAS#8469

Open
bryan-cox wants to merge 1 commit into openshift:main from bryan-cox:CNTRLPLANE-3371

@bryan-cox bryan-cox commented May 8, 2026

What

Fixes the ValidateKubeAPIServerAllowedCIDRs e2e test so it passes on v2 Azure self-managed clusters, where the KAS uses the Route publishing strategy (via --external-dns-domain).

Why

The test was skipped in v2 CI (--ginkgo.skip="KAS allowed CIDRs") because it always failed. Both v1 and v2 Azure self-managed use Route strategy for KAS, but v1 passes while v2 fails due to a difference in cluster lifecycle timing combined with HTTP/2 connection reuse.

Root cause: HTTP/2 connection reuse

The test reuses a single kubeclient.Clientset across all ServerVersion() poll iterations. Go's HTTP/2 transport multiplexes all requests over a single persistent TCP connection. If the first poll succeeds before Azure NSG rules take effect, all subsequent polls reuse that connection and never observe the expected failure.

Why v1 passes but v2 fails: In v1, the cluster is created fresh inside TestCreateCluster, so the CPO is in its initial reconciliation burst; the router service's LoadBalancerSourceRanges and the corresponding Azure NSG rules are updated before the first ServerVersion() call. In v2, the cluster is pre-created and shared across tests, so the CPO is in steady state with longer reconciliation intervals. The first ServerVersion() call succeeds before the NSG rules catch up, and HTTP/2 holds that connection open for all subsequent polls.

Additional fix: missing downstream service wait

The test waits for AllowedCIDRBlocks to propagate from the HostedCluster to the HostedControlPlane, but does not wait for the CPO to reconcile the downstream LoadBalancer service's LoadBalancerSourceRanges. This race condition exists in both v1 and v2; v1 just happens to win the race because the CPO is in active reconciliation. Adding an explicit wait makes the test correct rather than dependent on timing.

Changes

test/e2e/util/util.go — single file, three changes:

  1. ensureAPIServerAllowedCIDRs signature: *kubeclient.Clientset → *rest.Config, to enable fresh client creation per poll
  2. Fresh kubeclient per poll: Each ServerVersion() iteration creates a new client via kubeclient.NewForConfig(rest.CopyConfig(guestConfig)), preventing HTTP/2 connection reuse.
  3. Strategy-aware service wait: New allowedCIDRsTargetService() helper determines the correct LB service based on APIServer publishing strategy (Route → router, LoadBalancer → platform-specific KAS LB). An Eventually block waits for the service's LoadBalancerSourceRanges to match before checking KAS reachability.
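The strategy-aware selection in change 3 can be sketched as a switch on the publishing strategy. Several pieces of this sketch are illustrative assumptions rather than HyperShift's actual identifiers: the PublishingStrategy type, its constants, the "router" service name, and the targetService helper. The platform-specific KAS LoadBalancer service name is passed in by the caller because it varies by platform.

```go
package main

import "fmt"

// PublishingStrategy is a hypothetical stand-in for the APIServer
// publishing strategy type; the constant values are assumptions.
type PublishingStrategy string

const (
	Route        PublishingStrategy = "Route"
	LoadBalancer PublishingStrategy = "LoadBalancer"
)

// targetService returns the name of the Service whose LoadBalancerSourceRanges
// enforce AllowedCIDRBlocks for the given strategy. kasLBName is the
// platform-specific KAS LoadBalancer service name supplied by the caller.
func targetService(strategy PublishingStrategy, kasLBName string) (string, error) {
	switch strategy {
	case Route:
		// KAS traffic enters through the router, so the router's LB service
		// carries the source ranges ("router" is an assumed name).
		return "router", nil
	case LoadBalancer:
		return kasLBName, nil
	default:
		return "", fmt.Errorf("unsupported APIServer publishing strategy %q", strategy)
	}
}

func main() {
	for _, s := range []PublishingStrategy{Route, LoadBalancer, "NodePort"} {
		name, err := targetService(s, "kube-apiserver-lb") // hypothetical name
		fmt.Println(s, name, err)
	}
}
```

Returning an error for an unrecognized strategy, rather than silently defaulting, makes the test fail loudly instead of waiting on the wrong Service.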

Test Plan

  • go build -tags e2e ./test/e2e/... — compiles
  • go build -tags e2ev2 ./test/e2e/v2/... — compiles
  • go vet -tags e2e ./test/e2e/... — passes
  • Re-run v2 rehearsal on openshift/release#79048 after merge

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved API server CIDR restriction validation to ensure network rules are reconciled correctly across publishing strategies (Route and LoadBalancer).
  • Tests

    • Strengthened network reachability tests by recreating client connections per attempt to more reliably detect newly applied network security restrictions.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 8, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 8, 2026
@openshift-ci-robot

openshift-ci-robot commented May 8, 2026

@bryan-cox: This pull request references CNTRLPLANE-3371 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

In response to this:

What

Fixes the ValidateKubeAPIServerAllowedCIDRs e2e test so it passes on v2 Azure self-managed clusters, where the KAS uses the Route publishing strategy (via --external-dns-domain).

Why

The test was skipped in v2 CI (--ginkgo.skip="KAS allowed CIDRs") because it always failed. Root cause: two issues compound to make the test pass on v1 but fail on v2.

1. Missing downstream service wait

The test waits for AllowedCIDRBlocks to propagate from the HostedCluster to the HostedControlPlane, but does not wait for the CPO to reconcile the downstream LoadBalancer service's LoadBalancerSourceRanges. With Route strategy, the relevant service is the router LB service (not a KAS LB). The CPO reconciliation adds a delay that the test doesn't account for.

2. HTTP/2 connection reuse

The test reuses a single kubeclient.Clientset across all ServerVersion() poll iterations. Go's HTTP/2 transport multiplexes all requests over a single persistent TCP connection. If the first poll succeeds before Azure NSG rules take effect, all subsequent polls reuse that connection and never observe the expected failure.

Changes

test/e2e/util/util.go — single file, three changes:

  1. ensureAPIServerAllowedCIDRs signature: *kubeclient.Clientset → *rest.Config, to enable fresh client creation per poll
  2. Strategy-aware service wait: New allowedCIDRsTargetService() helper determines the correct LB service based on APIServer publishing strategy (Route → router, LoadBalancer → platform-specific KAS LB). An Eventually block waits for the service's LoadBalancerSourceRanges to match before checking KAS reachability.
  3. Fresh kubeclient per poll: Each ServerVersion() iteration creates a new client via kubeclient.NewForConfig(rest.CopyConfig(guestConfig)), preventing HTTP/2 connection reuse.

Test Plan

  • go build -tags e2e ./test/e2e/... — compiles
  • go build -tags e2ev2 ./test/e2e/v2/... — compiles
  • go vet -tags e2e ./test/e2e/... — passes
  • Re-run v2 rehearsal on openshift/release#79048 after merge

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 78a6cdb2-9159-4345-8793-003daec6feca

📥 Commits

Reviewing files that changed from the base of the PR and between d312b79 and 51d7116.

📒 Files selected for processing (1)
  • test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/util/util.go

📝 Walkthrough

The test utility ValidateKubeAPIServerAllowedCIDRs now accepts the guest cluster's rest.Config and delegates to ensureAPIServerAllowedCIDRs which waits for the control-plane to reconcile HostedCluster AllowedCIDRBlocks into the appropriate downstream Service's spec.LoadBalancerSourceRanges (selected by APIServer publishing strategy and cloud specifics). After reconciliation, reachability checks are performed in a polling loop that creates a fresh guest kubeclient per attempt to avoid HTTP/2 connection reuse. A new helper allowedCIDRsTargetService selects the Service to monitor for enforcement.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Harness
    participant CP as Control-Plane Operator
    participant LB as Downstream Service / LoadBalancer
    participant GuestAPI as Guest kube-apiserver

    Test->>CP: Apply AllowedCIDRBlocks to HostedCluster spec
    Note right of CP: Reconciler updates target Service
    CP->>LB: Update spec.LoadBalancerSourceRanges
    loop Polling
        Test->>LB: Get Service.LoadBalancerSourceRanges
        alt ranges match expected
            Note right of Test: proceed to reachability checks
            Test->>GuestAPI: Create fresh kubeclient per attempt
            GuestAPI-->>Test: ServerVersion() (reachable/unreachable)
        else not yet reconciled
            Test-->>Test: wait and retry
        end
    end
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically identifies the main change: fixing the AllowedCIDRs e2e test for Route-based KAS with the associated JIRA reference.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | The modified file (test/e2e/util/util.go) contains only utility helper functions, no Ginkgo test declarations. Test calling sites use static, descriptive test names free from dynamic information.
Test Structure And Quality | ✅ Passed | Proper cleanup via defer, timeouts on all Eventually blocks, meaningful assertion messages, single responsibility design, and consistent codebase patterns (Gomega, UpdateObject, nil-checks).
Microshift Test Compatibility | ✅ Passed | This PR modifies utility functions in test/e2e/util/util.go, not adding new Ginkgo tests. The check only applies when new Ginkgo tests are added. No Ginkgo test definitions found.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No multi-node assumptions found. Test only performs API calls and validates networking policies, both SNO-compatible.
Topology-Aware Scheduling Compatibility | ✅ Passed | This PR only modifies test/e2e/util/util.go (test utility code), not deployment manifests, operator code, or controllers. The topology-aware scheduling check is not applicable to test code.
Ote Binary Stdout Contract | ✅ Passed | No stdout violations detected. Modified test functions contain zero stdout writes, use no problematic logging libraries, and are called only from Ginkgo It() test code blocks.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies utility helper functions only. No new Ginkgo test declarations (It(), Describe(), Context(), When()) are added. Custom check only applies to new tests.


@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels May 8, 2026
@codecov

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.53%. Comparing base (b0a10c5) to head (51d7116).
⚠️ Report is 29 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8469      +/-   ##
==========================================
+ Coverage   37.49%   37.53%   +0.04%     
==========================================
  Files         751      751              
  Lines       91984    92026      +42     
==========================================
+ Hits        34487    34544      +57     
+ Misses      54854    54841      -13     
+ Partials     2643     2641       -2     

see 6 files with indirect coverage changes

Flag | Coverage Δ
cmd-support | 32.76% <ø> (+0.12%) ⬆️
cpo-hostedcontrolplane | 36.77% <ø> (ø)
cpo-other | 37.76% <ø> (+0.03%) ⬆️
hypershift-operator | 47.93% <ø> (ø)
other | 27.77% <ø> (ø)

Flags with carried forward coverage won't be shown.


The ValidateKubeAPIServerAllowedCIDRs test fails on v2 Azure
self-managed clusters because KAS uses Route publishing strategy
(via external-dns-domain), not LoadBalancer.

Two fixes:

1. Wait for the downstream LB service (router or KAS LB) to have its
   LoadBalancerSourceRanges updated by the CPO before asserting KAS
   reachability. The target service is determined by the HC's APIServer
   publishing strategy.

2. Create a fresh kubeclient per poll iteration to prevent HTTP/2
   connection reuse. Go's HTTP/2 transport multiplexes all requests over
   a single persistent TCP connection — if a prior request succeeded
   before Azure NSG rules took effect, subsequent requests bypass the
   restriction on the same connection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryan-cox bryan-cox marked this pull request as ready for review May 8, 2026 19:29
@bryan-cox
Member Author

/pipeline required


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
