CNTRLPLANE-3371: Fix AllowedCIDRs e2e test for Route-based KAS #8469
bryan-cox wants to merge 1 commit into openshift:main
@bryan-cox: This pull request references CNTRLPLANE-3371, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
No actionable comments were generated in the recent review.
Walkthrough
The test utility ValidateKubeAPIServerAllowedCIDRs now accepts the guest cluster's rest.Config and delegates to ensureAPIServerAllowedCIDRs, which waits for the control-plane to reconcile HostedCluster AllowedCIDRBlocks into the appropriate downstream Service's spec.LoadBalancerSourceRanges (selected by APIServer publishing strategy and cloud specifics). After reconciliation, reachability checks are performed in a polling loop that creates a fresh guest kubeclient per attempt to avoid HTTP/2 connection reuse. A new helper, allowedCIDRsTargetService, selects the Service to monitor for enforcement.

Sequence Diagram(s)
sequenceDiagram
    participant Test as Test Harness
    participant CP as Control-Plane Operator
    participant LB as Downstream Service / LoadBalancer
    participant GuestAPI as Guest kube-apiserver
    Test->>CP: Apply AllowedCIDRBlocks to HostedCluster spec
    Note right of CP: Reconciler updates target Service
    CP->>LB: Update spec.LoadBalancerSourceRanges
    loop Polling
        Test->>LB: Get Service.LoadBalancerSourceRanges
        alt ranges match expected
            Note right of Test: proceed to reachability checks
            Test->>GuestAPI: Create fresh kubeclient per attempt
            GuestAPI-->>Test: ServerVersion() (reachable/unreachable)
        else not yet reconciled
            Test-->>Test: wait and retry
        end
    end
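The "wait and retry" branch of the polling loop can be sketched with the standard library alone. This is only an illustration under stated assumptions: waitForSourceRanges, sameCIDRSet, demoPoll, and the getter callback are hypothetical names, the getter stands in for reading the Service from the management cluster, and comparing the ranges as an unordered set is an assumption of the sketch. The real test uses the e2e framework's own polling helpers.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// sameCIDRSet reports whether a and b hold the same CIDR strings,
// ignoring order (an assumption made for this sketch).
func sameCIDRSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// waitForSourceRanges polls get until it returns want or ctx is done.
// get stands in for reading the Service's LoadBalancerSourceRanges.
func waitForSourceRanges(ctx context.Context, interval time.Duration, want []string, get func() ([]string, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if got, err := get(); err == nil && sameCIDRSet(got, want) {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoPoll runs the wait against a fake getter that only returns the
// expected ranges from its readyAfter-th call onward.
func demoPoll(readyAfter, timeoutMillis int) (int, error) {
	want := []string{"10.0.0.0/8", "192.0.2.0/24"}
	calls := 0
	get := func() ([]string, error) {
		calls++
		if calls < readyAfter {
			return nil, nil // not yet reconciled
		}
		return []string{"192.0.2.0/24", "10.0.0.0/8"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	err := waitForSourceRanges(ctx, 5*time.Millisecond, want, get)
	return calls, err
}

func main() {
	calls, err := demoPoll(3, 1000)
	fmt.Println("calls:", calls, "err:", err)
}
```

Only once this wait returns without error would the reachability checks against the guest kube-apiserver begin.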
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: bryan-cox
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #8469      +/-   ##
==========================================
+ Coverage   37.49%   37.53%   +0.04%
==========================================
  Files         751      751
  Lines       91984    92026      +42
==========================================
+ Hits        34487    34544      +57
+ Misses      54854    54841      -13
+ Partials     2643     2641       -2
6 files with indirect coverage changes.
The ValidateKubeAPIServerAllowedCIDRs test fails on v2 Azure self-managed clusters because KAS uses Route publishing strategy (via external-dns-domain), not LoadBalancer.

Two fixes:

1. Wait for the downstream LB service (router or KAS LB) to have its LoadBalancerSourceRanges updated by the CPO before asserting KAS reachability. The target service is determined by the HC's APIServer publishing strategy.

2. Create a fresh kubeclient per poll iteration to prevent HTTP/2 connection reuse. Go's HTTP/2 transport multiplexes all requests over a single persistent TCP connection: if a prior request succeeded before Azure NSG rules took effect, subsequent requests bypass the restriction on the same connection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
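The stale-connection effect behind the second fix can be reproduced locally in a rough simulation. The sketch below rests on several assumptions: it uses a plaintext HTTP/1.1 keep-alive connection against a local test server as a stand-in for HTTP/2 multiplexing, and a listener that drops newly accepted connections as a stand-in for Azure NSG rules. gateListener, get, and probe are hypothetical names, and none of this is the code from the PR.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gateListener drops every connection accepted after blocked is set,
// loosely imitating a rule that only affects new connections.
type gateListener struct {
	net.Listener
	blocked atomic.Bool
}

func (g *gateListener) Accept() (net.Conn, error) {
	for {
		c, err := g.Listener.Accept()
		if err != nil {
			return nil, err
		}
		if g.blocked.Load() {
			c.Close() // refuse the new connection
			continue
		}
		return c, nil
	}
}

// get performs one GET and drains the body so the connection can be reused.
func get(c *http.Client, url string) error {
	resp, err := c.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// probe polls once through a long-lived client and once through a fresh
// client, both after the gate has closed, and returns the two results.
func probe() (reusedErr, freshErr error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	gate := &gateListener{Listener: srv.Listener}
	srv.Listener = gate
	srv.Start()
	defer srv.Close()

	shared := &http.Client{Transport: &http.Transport{}}
	defer shared.CloseIdleConnections()
	if err := get(shared, srv.URL); err != nil { // succeeds before the restriction lands
		panic(err)
	}
	gate.blocked.Store(true) // the restriction takes effect

	reusedErr = get(shared, srv.URL) // rides the already-established connection
	fresh := &http.Client{Transport: &http.Transport{}}
	freshErr = get(fresh, srv.URL) // has to open a new connection
	return reusedErr, freshErr
}

func main() {
	reusedErr, freshErr := probe()
	fmt.Println("shared client error:", reusedErr)
	fmt.Println("fresh client error: ", freshErr)
}
```

In this simulation the long-lived client keeps succeeding after the gate closes, while the fresh client observes the failure, which is why building a new client per poll iteration lets the test see the restriction.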
/pipeline required
What
Fixes the ValidateKubeAPIServerAllowedCIDRs e2e test so it passes on v2 Azure self-managed clusters where KAS uses the Route publishing strategy (via --external-dns-domain).

Why
The test was skipped in v2 CI (--ginkgo.skip="KAS allowed CIDRs") because it always failed. Both v1 and v2 Azure self-managed clusters use the Route strategy for KAS, but v1 passes while v2 fails due to a difference in cluster lifecycle timing combined with HTTP/2 connection reuse.

Root cause: HTTP/2 connection reuse
The test reuses a single kubeclient.Clientset across all ServerVersion() poll iterations. Go's HTTP/2 transport multiplexes all requests over a single persistent TCP connection. If the first poll succeeds before Azure NSG rules take effect, all subsequent polls reuse that connection and never observe the expected failure.

Why v1 passes but v2 fails: In v1, the cluster is created fresh inside TestCreateCluster, so the CPO is in its initial reconciliation burst, and the router service's LoadBalancerSourceRanges and the corresponding Azure NSG rules are updated before the first ServerVersion() call. In v2, the cluster is pre-created and shared across tests, so the CPO is in steady state with longer reconciliation intervals. The first ServerVersion() call succeeds before the NSG rules catch up, and HTTP/2 holds that connection open for all subsequent polls.

Additional fix: missing downstream service wait
The test waits for AllowedCIDRBlocks to propagate from the HostedCluster to the HostedControlPlane, but does not wait for the CPO to reconcile the downstream LoadBalancer service's LoadBalancerSourceRanges. This race condition exists in both v1 and v2; v1 just happens to win the race because the CPO is in active reconciliation. Adding an explicit wait makes the test correct rather than dependent on timing.

Changes
test/e2e/util/util.go (single file, three changes):
1. ensureAPIServerAllowedCIDRs signature: *kubeclient.Clientset → *rest.Config, to enable fresh client creation per poll.
2. Each ServerVersion() iteration creates a new client via kubeclient.NewForConfig(rest.CopyConfig(guestConfig)), preventing HTTP/2 connection reuse.
3. The allowedCIDRsTargetService() helper determines the correct LB service based on the APIServer publishing strategy (Route → router, LoadBalancer → platform-specific KAS LB). An Eventually block waits for the service's LoadBalancerSourceRanges to match before checking KAS reachability.

Test Plan
- go build -tags e2e ./test/e2e/... (compiles)
- go build -tags e2ev2 ./test/e2e/v2/... (compiles)
- go vet -tags e2e ./test/e2e/... (passes)

🤖 Generated with Claude Code
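The service selection described under Changes could look roughly like the sketch below. It is not the allowedCIDRsTargetService() helper from the PR: targetServiceName, PublishingStrategy, and errUnsupported are hypothetical names, and because the platform-specific KAS LB Service name is not given in the description, the caller passes it in rather than the sketch guessing it. Only the Route → router mapping comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// PublishingStrategy mirrors the two strategies named in the description.
type PublishingStrategy string

const (
	Route        PublishingStrategy = "Route"
	LoadBalancer PublishingStrategy = "LoadBalancer"
)

// errUnsupported marks strategies this sketch does not handle.
var errUnsupported = errors.New("unsupported publishing strategy")

// targetServiceName picks the Service whose LoadBalancerSourceRanges
// should be watched. kasLBName is supplied by the caller because the
// platform-specific KAS LB Service name is not stated in the description.
func targetServiceName(strategy PublishingStrategy, kasLBName string) (string, error) {
	switch strategy {
	case Route:
		return "router", nil
	case LoadBalancer:
		return kasLBName, nil
	default:
		return "", fmt.Errorf("%w: %q", errUnsupported, strategy)
	}
}

func main() {
	// "example-kas-lb" is a placeholder, not a real Service name.
	for _, s := range []PublishingStrategy{Route, LoadBalancer, "NodePort"} {
		name, err := targetServiceName(s, "example-kas-lb")
		fmt.Printf("%s -> %q (err: %v)\n", s, name, err)
	}
}
```

Returning an error for any other strategy keeps the wait from silently watching the wrong Service.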