
[ROSAENG-1070] feat: Support backplane elevation for network verifier pod mode#891

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:master from feichashao:ROSAENG-1070 on May 15, 2026
Conversation

@feichashao (Contributor) commented May 14, 2026

Assisted by Claude Code. Reviewed and validated by Human.

What does this PR change?

This PR adds backplane elevation support for running the network verifier in pod mode:

  • It adds a --reason flag, similar to other osdctl commands. If --reason is provided, the verifier runs as backplane-cluster-admin.
  • It adds a corresponding function in pkg/k8s/client.go to get the kubeconfig with impersonation.
  • It updates the corresponding unit tests and help messages.
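The credential selection described in these bullets can be sketched as a small, self-contained example. This is an illustration, not the PR's actual code: restConfig below is a simplified stand-in for client-go's *rest.Config, and the branch mirrors the choice between the elevated helper (k8s.NewRestConfigAsBackplaneClusterAdmin) and the existing non-elevated path.

```go
package main

import "fmt"

// restConfig is an illustrative stand-in for *rest.Config from
// k8s.io/client-go; only the fields relevant to elevation are shown.
type restConfig struct {
	ImpersonateUser  string
	ElevationReasons []string
}

// getRestConfig sketches the branch: a non-empty --reason selects
// elevated backplane credentials (backplane-cluster-admin); otherwise
// the existing non-elevated backplane path is used.
func getRestConfig(clusterID, reason string) restConfig {
	if reason != "" {
		// corresponds to k8s.NewRestConfigAsBackplaneClusterAdmin(clusterID, reason)
		return restConfig{
			ImpersonateUser:  "backplane-cluster-admin",
			ElevationReasons: []string{reason},
		}
	}
	// corresponds to the existing non-elevated REST config path
	return restConfig{}
}

func main() {
	cfg := getRestConfig("2q8stp2ps0ovrf6bue76ea7bniernki3", "ROSAENG-1070")
	fmt.Println(cfg.ImpersonateUser)
}
```

In the real helper, the variadic elevationReasons presumably end up in the backplane configuration so the elevated access is attributable to the supplied ticket.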

Why?

We recently changed the backplane RBAC so that SREP has read-only permissions by default. Users will see the following error when running in pod mode without elevation:

network verifier error: network verifier error: failed to create job osd-network-verifier-job-1778565867: jobs.batch is forbidden: User "system:serviceaccount:openshift-backplane-srep:e97d22975f0604cbb17fe609bde70890" cannot create resource "jobs" in API group "batch" in the namespace "openshift-network-diagnostics"

Validation

After the change in this PR, osdctl can create the job successfully:

osdctl network verify-egress -C xxxxxxxx --reason "ROSAENG-1070"
2026/05/14 10:44:34 Cluster is HCP - forcing pod mode.
2026/05/14 10:44:34 Preparing to run pod-based network verification in namespace openshift-network-diagnostics.
2026/05/14 10:44:38 Pod mode using elevated backplane credentials (backplane-cluster-admin) for cluster: 2q8stp2ps0ovrf6bue76ea7bniernki3
2026/05/14 10:44:38 Pod mode initialized with namespace: openshift-network-diagnostics
2026/05/14 10:44:38 Detected AWS region from OCM: us-east-2
Using egress URL list from https://api.github.com/repos/openshift/osd-network-verifier/contents/pkg/data/egress_lists/aws-hcp.yaml?ref=main at SHA 1cb7c15e951908f44d5ff3b3589ec763cea917c9

Note to reviewers: there is a watch issue that may cause the network verifier to hang while watching for the job to finish. That is a separate problem, and I raised openshift/osd-network-verifier#347 to address it.

Summary by CodeRabbit

  • New Features

    • Added a new --reason flag for egress verification in pod-mode operations.
  • Improvements

    • Enforced validation: pod-mode with --cluster-id now requires --reason unless an explicit kubeconfig is provided.
    • When a reason is supplied, pod-mode will use elevated backplane credentials; otherwise non-elevated credentials are used.
  • Documentation

    • Updated command docs and examples to show --reason usage and pod-mode requirements.
  • Tests

    • Extended tests to cover reason-required and kubeconfig-exempt scenarios.

coderabbitai Bot commented May 14, 2026

Walkthrough

Adds an elevation "reason" to pod-mode egress verification: EgressVerification gains a Reason field and a --reason CLI flag; input validation requires it when --pod-mode is used with --cluster-id and no --kubeconfig; REST config creation conditionally requests elevated backplane credentials when reason is provided.

Changes

Pod-Mode Elevation Reason

Layer / File(s) Summary
Core data model and CLI interface
cmd/network/verification.go
Add Reason string to EgressVerification; introduce --reason CLI flag and update examples/docs to show its usage in pod mode.
Input validation and REST-config selection
cmd/network/verification.go
Validate that pod mode with --cluster-id and no --kubeconfig requires --reason. getRestConfig now branches: if Reason present call k8s.NewRestConfigAsBackplaneClusterAdmin(...) (elevated), otherwise use existing non-elevated backplane REST config path.
Elevated credentials helper
pkg/k8s/client.go
Add NewRestConfigAsBackplaneClusterAdmin(clusterID string, elevationReasons ...string) (*rest.Config, error) which loads backplane config and returns a REST config impersonating backplane-cluster-admin with optional elevation reasons.
Tests for validation and credential priority
cmd/network/verification_pod_mode_test.go, cmd/network/verification_test.go
Extend pod-mode validation tests to require --reason when --cluster-id is used without --kubeconfig; split and refine REST-config tests to assert elevated vs non-elevated backplane behavior.
Documentation updates
docs/README.md, docs/osdctl_network_verify-egress.md
Document new --reason flag and update pod-mode examples and options to show requirement and behavior (elevation required for pod mode with --cluster-id unless an explicit --kubeconfig is provided).

Sequence Diagram

sequenceDiagram
  participant Client
  participant CLI
  participant Val as Validator
  participant K8s as pkg/k8s
  participant Back as BackplaneAuth
  participant Cluster

  Client->>CLI: run osdctl network verify-egress --pod-mode --cluster-id [--reason]
  CLI->>Val: build EgressVerification, validate inputs
  Val-->>CLI: validation OK / error (requires --reason)
  CLI->>K8s: request REST config (cluster-id, elevationReasons?)
  alt Reason provided
    K8s->>Back: load backplane config & request rest config as "backplane-cluster-admin" (with reason)
    Back-->>K8s: elevated rest.Config
  else No reason
    K8s->>Back: load backplane config & request standard backplane REST config
    Back-->>K8s: non-elevated rest.Config
  end
  K8s->>Cluster: use rest.Config to operate against cluster

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 40.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Test Structure And Quality ⚠️ Warning — TestEgressVerification_ValidateInput_PodMode lacks assertion messages: lines 346, 347, and 351 use assert.Error/NoError without messages, violating the requirement for meaningful failure messages. Resolution: add messages to the assertions (e.g. assert.Error(t, err, tt.expectedResult) at line 346). TestEgressVerification_GetRestConfig correctly includes messages.
✅ Passed checks (10 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding backplane elevation support for the network verifier in pod mode, which is the core purpose of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR uses standard Go testing, not Ginkgo. All test names are static descriptive strings with no dynamic values.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All test files use Go's standard testing.T framework (unit tests). The check for MicroShift compatibility of Ginkgo e2e tests is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added in this PR. All test changes are updates to existing unit tests in the osdctl CLI tool repository. SNO compatibility check does not apply to non-e2e unit test code.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only CLI tool code and documentation. No deployment manifests, operators, or controllers are added/modified. No scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed PR additions use logging.Logger (stderr) for logs and fmt.Errorf for error returns. No new stdout writes in process-level code. Pre-existing fmt.Println calls are not part of this PR's scope.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR does not introduce Ginkgo e2e tests. It only contains standard Go unit tests and production code. The custom check is specific to Ginkgo e2e tests and is not applicable to this PR.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


@openshift-ci openshift-ci Bot requested review from iamkirkbater and zmird-r May 14, 2026 01:11
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/network/verification.go (1)

255-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce --reason after HCP auto-switches to pod mode.

validateInput() runs before Line 255 can force pod mode for HCP. That allows Line 677 to be skipped, and execution can proceed with non-elevated backplane creds in pod mode.

💡 Proposed fix
@@
 	if !e.PodMode && platform == cloud.AWSHCP {
 		e.log.Info(ctx, "Cluster is HCP - forcing pod mode.")
 		e.PodMode = true
 	}
+
+	if e.PodMode && e.ClusterId != "" && e.KubeConfig == "" && strings.TrimSpace(e.Reason) == "" {
+		log.Fatalf("pod mode with --cluster-id requires --reason flag for elevation (write operations need backplane-cluster-admin). Example: --reason 'PD-12345' or --reason 'OHSS-67890'")
+	}

Also applies to: 676-679

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/network/verification.go` around lines 255 - 263, After auto-switching to
pod mode for HCP in the block that sets e.PodMode = true when platform ==
cloud.AWSHCP, ensure the presence of the --reason flag (or non-empty e.Reason)
is enforced before proceeding: either re-run the higher-level validation (call
validateInput()) or add an explicit check that e.Reason (the CLI --reason flag)
is set and return a fatal error if it is missing; update the logic near the
e.PodMode assignment so validatePodModeCompatibility() cannot be reached without
confirming --reason, referencing e.PodMode, platform/cloud.AWSHCP,
validateInput(), and validatePodModeCompatibility() to locate where to insert
the check.
🧹 Nitpick comments (1)
cmd/network/verification_test.go (1)

913-934: ⚡ Quick win

Align expectError values with the actual assertion flow.

These new cases set expectError: false, but the test body still expects err != nil for non-default-kubeconfig paths. This makes table semantics misleading.

💡 Proposed fix
 		{
 			name: "priority_2_backplane_credentials_with_elevation",
@@
-			expectError:    false,
+			expectError:    true,
 			expectedResult: "backplane credentials with elevation should be used",
 		},
 		{
 			name: "priority_2_backplane_credentials_without_elevation",
@@
-			expectError:    false,
+			expectError:    true,
 			expectedResult: "backplane credentials without elevation should be used",
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/network/verification_test.go` around lines 913 - 934, The table-driven
tests for the two cases "priority_2_backplane_credentials_with_elevation" and
"priority_2_backplane_credentials_without_elevation" have expectError set to
false while the test body asserts an error for non-default kubeconfig paths;
update the table to reflect the actual assertion flow by setting expectError:
true for these EgressVerification entries (or alternatively change the test body
to assert no error), referencing the test case names and the EgressVerification
struct fields (ClusterId, KubeConfig, log) so the expectation matches the err
check in the test runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/network/verification.go`:
- Around line 729-737: The logs currently announcing credential selection are
emitted before checking the result of k8s.NewRestConfigAsBackplaneClusterAdmin /
k8s.NewRestConfig, which can mislead when restConfig creation fails; update the
block that sets restConfig, err (using e.Reason, e.ClusterId) so that you first
call the appropriate constructor, check err, and only call e.log.Info(ctx, ...)
after err == nil to confirm successful credential creation—ensure any logging on
error uses e.log.Error or similar and includes err details.

---

Outside diff comments:
In `@cmd/network/verification.go`:
- Around line 255-263: After auto-switching to pod mode for HCP in the block
that sets e.PodMode = true when platform == cloud.AWSHCP, ensure the presence of
the --reason flag (or non-empty e.Reason) is enforced before proceeding: either
re-run the higher-level validation (call validateInput()) or add an explicit
check that e.Reason (the CLI --reason flag) is set and return a fatal error if
it is missing; update the logic near the e.PodMode assignment so
validatePodModeCompatibility() cannot be reached without confirming --reason,
referencing e.PodMode, platform/cloud.AWSHCP, validateInput(), and
validatePodModeCompatibility() to locate where to insert the check.

---

Nitpick comments:
In `@cmd/network/verification_test.go`:
- Around line 913-934: The table-driven tests for the two cases
"priority_2_backplane_credentials_with_elevation" and
"priority_2_backplane_credentials_without_elevation" have expectError set to
false while the test body asserts an error for non-default kubeconfig paths;
update the table to reflect the actual assertion flow by setting expectError:
true for these EgressVerification entries (or alternatively change the test body
to assert no error), referencing the test case names and the EgressVerification
struct fields (ClusterId, KubeConfig, log) so the expectation matches the err
check in the test runner.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81518ba9-9394-4dca-9ab7-3bb9783816bd

📥 Commits

Reviewing files that changed from the base of the PR and between 5b99f69 and 462b4c2.

📒 Files selected for processing (4)
  • cmd/network/verification.go
  • cmd/network/verification_pod_mode_test.go
  • cmd/network/verification_test.go
  • pkg/k8s/client.go

Comment thread cmd/network/verification.go
@joshbranham (Contributor) commented:

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2026

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feichashao, joshbranham

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2026
@openshift-merge-bot Bot commented:

/retest-required

Remaining retests: 0 against base HEAD 629b5e5 and 2 for PR HEAD b2ba9c5 in total


openshift-ci Bot commented May 15, 2026

@feichashao: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit b334928 into openshift:master May 15, 2026
7 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
