
OCPQE-31585: Auto-detect HyperShift with Kubernetes management cluster#30931

Open
xiuwang wants to merge 2 commits into openshift:main from
xiuwang:fix-hypershift-aks-clusterversion-detection

Conversation

@xiuwang

@xiuwang xiuwang commented Mar 25, 2026

HyperShift tests fail when running on pure Kubernetes management clusters (AKS) with errors like:

the server doesn't have a resource type "clusterversion"

This occurs because:

  1. HyperShift can run on Kubernetes management clusters (e.g., AKS HCP)
  2. The current compat_otp.NewCLIForKubeOpenShift() function uses exutil.NewCLI() which tries to access
    OpenShift-specific APIs
  3. Pure Kubernetes clusters don't have OpenShift APIs like clusterversion.config.openshift.io

Solution

Add automatic detection of HyperShift environments with Kubernetes management clusters in
test/extended/util/compat_otp/client.go:

Key Changes:

  1. isHyperShiftWithKubernetesManagement() - Detects if:
    • HYPERSHIFT=true environment variable is set
    • The cluster does NOT have clusterversion API (indicating Kubernetes, not OpenShift)

  2. hasClusterVersionAPI() - Checks if the cluster has OpenShift's clusterversion API by querying
    api-resources

  3. NewCLIForKubeOpenShift() enhancement - Now automatically returns NewCLIWithoutNamespace() for
    HyperShift on Kubernetes management clusters
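
The two detection conditions listed above can be sketched as a small pure function. This is a minimal illustration, not the code from the PR: the name useCLIWithoutNamespace and the injected apiProbe type are hypothetical, and the probe is passed in so the decision can be exercised without a cluster.

```go
package main

import (
	"fmt"
	"os"
)

// apiProbe reports whether the clusterversion API is served by the cluster.
// Hypothetical type: it is injected so the decision logic needs no cluster.
type apiProbe func() bool

// useCLIWithoutNamespace mirrors the two conditions from the description:
// HYPERSHIFT must be exactly "true" and the clusterversion API must be absent.
// The probe is only invoked when HYPERSHIFT is "true", so ordinary runs
// never pay the cost of probing.
func useCLIWithoutNamespace(hypershiftEnv string, hasClusterVersion apiProbe) bool {
	if hypershiftEnv != "true" {
		return false
	}
	return !hasClusterVersion()
}

func main() {
	// Stand-in probe for illustration; it always reports "API absent".
	noClusterVersion := func() bool { return false }
	fmt.Println(useCLIWithoutNamespace(os.Getenv("HYPERSHIFT"), noClusterVersion))
}
```

Keeping the environment check first means the probe has no side effects on regular OpenShift runs.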

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 25, 2026
@openshift-ci-robot

openshift-ci-robot commented Mar 25, 2026

@xiuwang: This pull request references OCPQE-31585 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

@openshift-ci openshift-ci bot requested review from ming1013 and rioliu-rh March 25, 2026 07:06
@openshift-ci

openshift-ci bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xiuwang
Once this PR has been reviewed and has the lgtm label, please assign tomasdavidorg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e9e06f55-5e2c-4cb0-ae38-99957ba5197a

📥 Commits

Reviewing files that changed from the base of the PR and between 4692b63 and 2fa2235.

📒 Files selected for processing (1)
  • test/extended/util/compat_otp/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/util/compat_otp/client.go

Walkthrough

Added HyperShift-aware runtime detection in NewCLIForKubeOpenShift: if HYPERSHIFT=="true" and the target cluster lacks the OpenShift clusterversion API, the function returns exutil.NewCLIWithoutNamespace(basename). New helpers probe the environment using oc with a 5s timeout.

Changes

Cohort / File(s): HyperShift CLI Detection (test/extended/util/compat_otp/client.go)
Summary: Modified NewCLIForKubeOpenShift to return exutil.NewCLIWithoutNamespace(basename) when HYPERSHIFT=="true" and clusterversions is absent. Added helpers isHyperShiftWithKubernetesManagement() and hasClusterVersionAPI() which run oc api-resources with a 5s timeout and inspect errors to infer absence of the OpenShift API. Added imports: context, errors, os/exec, time.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/util/compat_otp/client.go`:
- Around line 43-44: Replace the call to
compat_otp.NewCLIWithoutNamespace(basename) with a direct call to
exutil.NewCLIWithoutNamespace(basename) so the Ginkgo BeforeEach(SetupProject)
hook (registered by exutil.NewCLI(basename)) is not registered; this ensures
SetupProject() is not executed—locate the return in
NewCLIWithoutNamespace/compat_otp.NewCLIWithoutNamespace and change it to call
exutil.NewCLIWithoutNamespace(basename) instead of going through
compat_otp.NewCLI.
- Around line 150-160: Use a context timeout and explicit error classification
instead of always returning false on any cmd.CombinedOutput error: replace
exec.Command with exec.CommandContext using a context.WithTimeout (e.g., 5s) and
check for context.DeadlineExceeded to treat timeouts as real failures; detect
exec.ErrNotFound (missing oc binary) from cmd.Start/Run/Error and treat it as a
real failure; when the process exits with a non-zero exit code, inspect the
combined output/stderr for expected "API not found" or "no resources found"
messages and only return false for those known-not-found patterns, otherwise
propagate or log the error (do not silently return false). Reference symbols:
exec.CommandContext, context.WithTimeout, cmd.CombinedOutput, exec.ErrNotFound,
KUBECONFIG.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b303466-2cf2-4b1e-a91b-6d81c4e53dd3

📥 Commits

Reviewing files that changed from the base of the PR and between 394edfe and 4692b63.

📒 Files selected for processing (1)
  • test/extended/util/compat_otp/client.go

Comment on lines +150 to +160
cmd := exec.Command("oc", "api-resources", "--api-group=config.openshift.io", "-o=name")

// Use KUBECONFIG from environment if set
if kubeconfig := os.Getenv("KUBECONFIG"); kubeconfig != "" {
cmd.Env = append(os.Environ(), "KUBECONFIG="+kubeconfig)
}

output, err := cmd.CombinedOutput()
if err != nil {
// If the command fails, assume no clusterversion API
return false

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for timeout-related code in the repository
echo "=== Searching for timeout mechanisms in the codebase ==="
rg -i "timeout|context\." test/ --type go | head -30

Repository: openshift/origin

Length of output: 3897


🏁 Script executed:

# Search for other uses of oc api-resources command
echo "=== Searching for other 'oc api-resources' usage ==="
rg "oc.*api-resources" . --type go

Repository: openshift/origin

Length of output: 1232


🏁 Script executed:

# Check if there are any error type checks in this function or similar functions
echo "=== Checking for error handling patterns in the function ==="
sed -n '140,170p' test/extended/util/compat_otp/client.go -A 20 | rg -E "err|Error|status"

Repository: openshift/origin

Length of output: 2357


🏁 Script executed:

# Also check if there are any error type checks or status code handling elsewhere
rg -A 10 "CombinedOutput" test/extended/util/compat_otp/client.go

Repository: openshift/origin

Length of output: 309


🏁 Script executed:

# Check the function name and beginning to understand its purpose
head -50 test/extended/util/compat_otp/client.go | cat -n

Repository: openshift/origin

Length of output: 2048


🏁 Script executed:

# Search for where HasClusterVersionAPI is called (if that's the function name)
rg "HasClusterVersionAPI|clusterversions" test/ --type go -B 2 -A 2

Repository: openshift/origin

Length of output: 4400


🏁 Script executed:

# Check for EnvIsKubernetesCluster constant definition
rg "EnvIsKubernetesCluster" test/extended/util/compat_otp/client.go

Repository: openshift/origin

Length of output: 109


🏁 Script executed:

# Check the complete isHyperShiftWithKubernetesManagement function
rg -B 5 "isHyperShiftWithKubernetesManagement" test/extended/util/compat_otp/client.go -A 20

Repository: openshift/origin

Length of output: 2403


🏁 Script executed:

# Check if there's any context timeout used elsewhere in this file
rg "context\.|timeout|Context" test/extended/util/compat_otp/client.go

Repository: openshift/origin

Length of output: 147


🏁 Script executed:

# Search for exec.Command usage with timeouts in the codebase
rg "exec\.Command.*context|context.*exec\.Command" test/ --type go -B 2 -A 5 | head -40

Repository: openshift/origin

Length of output: 42


🏁 Script executed:

# Let's also check if there are any exec.CommandContext usages in the codebase to see the pattern
rg "CommandContext|exec\.CommandContext" test/ --type go -B 2 -A 3 | head -50

Repository: openshift/origin

Length of output: 1989


🏁 Script executed:

# Check if there are similar CLI probe patterns that handle errors better
rg "oc.*api-resources|exec\.Command.*oc" test/ --type go -B 3 -A 8 | head -60

Repository: openshift/origin

Length of output: 5372


🏁 Script executed:

# Search for how other parts handle CLI failures to understand the convention
rg "CombinedOutput\|Output\(\)" test/extended/util/compat_otp/ -A 3 -B 1

Repository: openshift/origin

Length of output: 42


🏁 Script executed:

# Check if there's any logging or error inspection elsewhere in the codebase
rg "exec\.exit|os/exec|ExitError" test/ --type go | head -20

Repository: openshift/origin

Length of output: 1280


🏁 Script executed:

# Check a specific example of how ExitError is used in db_image_helpers.go
rg "ExitError" test/extended/util/db_image_helpers.go -B 5 -A 3 | head -40

Repository: openshift/origin

Length of output: 396


🏁 Script executed:

# Get the complete hasClusterVersionAPI function - let's be precise
head -165 test/extended/util/compat_otp/client.go | tail -20

Repository: openshift/origin

Length of output: 843


Distinguish probe failures from expected "API not found" results and add execution timeout.

The error handler at lines 157-160 conflates multiple failure modes (missing oc binary, invalid KUBECONFIG, authentication failures, network timeouts, hanging API server) into a single false return. This masks real environment issues and defeats the detection function's purpose.

Add a context timeout using exec.CommandContext and differentiate between:

  • Expected failure: exit code indicating resource group/API doesn't exist
  • Real failures: missing binary, bad config, auth issues, timeouts

This prevents silent masking of configuration problems when the probe is used to determine environment type at line 43.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/util/compat_otp/client.go` around lines 150 - 160, Use a
context timeout and explicit error classification instead of always returning
false on any cmd.CombinedOutput error: replace exec.Command with
exec.CommandContext using a context.WithTimeout (e.g., 5s) and check for
context.DeadlineExceeded to treat timeouts as real failures; detect
exec.ErrNotFound (missing oc binary) from cmd.Start/Run/Error and treat it as a
real failure; when the process exits with a non-zero exit code, inspect the
combined output/stderr for expected "API not found" or "no resources found"
messages and only return false for those known-not-found patterns, otherwise
propagate or log the error (do not silently return false). Reference symbols:
exec.CommandContext, context.WithTimeout, cmd.CombinedOutput, exec.ErrNotFound,
KUBECONFIG.

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

// In this case, we use NewCLIWithoutNamespace to avoid OpenShift-specific
// initialization (like clusterversion checks) that would fail on pure Kubernetes
if isHyperShiftWithKubernetesManagement() {
return NewCLIWithoutNamespace(basename)

Question: Can you clarify why this change is needed?

The existing exutil.NewCLI() already handles pure Kubernetes clusters automatically:

// From test/extended/util/client.go:355
func (c *CLI) SetupProject() string {
    exist, err := DoesApiResourceExist(c.AdminConfig(), "projects", "project.openshift.io")
    if exist {
        return c.setupProject()  // Uses OpenShift ProjectRequest API
    }
    return c.setupNamespace()   // Uses Kubernetes Namespace API
}

So calling exutil.NewCLI() should work fine on Kubernetes management clusters - it will automatically detect the lack of OpenShift APIs and create a Kubernetes namespace instead.

Additional point: Even if a temporary namespace is created by SetupProject(), tests can still run cluster-scoped oc commands using oc.AsAdmin() which uses admin config without namespace context.

Could you share:

  1. What specific error are you seeing with exutil.NewCLI() on Kubernetes management clusters?
  2. Is the issue during initialization, or later when tests execute?

🧩 Analysis chain

Let me check the codebase to better understand the flow and where the clusterversion error might originate.


🏁 Script executed:

#!/bin/bash
# Look at the WithoutNamespace implementation and NewCLI init
rg -n "WithoutNamespace\|clusterversion\|ClusterVersion" test/extended/util/client.go | head -60

# Check if clusterversion is accessed during CLI setup/BeforeEach outside SetupProject
rg -n "clusterversion" test/extended/util/ -r --include="*.go" | grep -v "_test.go" | head -40

# Find where clusterversion is accessed in compat_otp or related framework init
rg -n "clusterversion" test/extended/util/compat_otp/ -r --include="*.go" | head -20

# Check what WithoutNamespace does and how it differs from regular NewCLI
ast-grep --pattern 'func ($_ *CLI) WithoutNamespace() $_  {
  $$$
}'

# Check if there's any BeforeEach or framework init that accesses clusterversion
rg -n "clusterversion" test/extended/ -r --include="*.go" -l | head -20

Length of output: 3995


🏁 Script executed:

#!/bin/bash
# Look at the WithoutNamespace implementation
grep -n "WithoutNamespace\|setupNamespace\|setupProject\|SetupProject" test/extended/util/client.go | head -40

# Look at what happens during BeforeEach in the CLI lifecycle - specifically any clusterversion access
grep -n "clusterversion\|ClusterVersion" test/extended/util/client.go | head -20

# Look at compat_otp/clusters.go to understand where clusterversion is accessed
cat test/extended/util/compat_otp/clusters.go | head -100

# Check if exutil.NewCLI BeforeEach/AfterEach registers any hooks that access OpenShift-specific APIs
grep -n "BeforeEach\|AfterEach\|JustBeforeEach" test/extended/util/client.go | head -30

# Check what WithoutNamespace actually skips
grep -n -A5 -B2 "func.*WithoutNamespace" test/extended/util/client.go | head -30

Length of output: 6403


@rioliu-rh, great question. Looking at the code more carefully, the reviewer's concern is largely valid. Here's what I found:

The SetupProject() path is already safe:

SetupProject() (client.go line 355) checks for projects API first:

exist, err := DoesApiResourceExist(c.AdminConfig(), "projects", "project.openshift.io")
if exist {
    return c.setupProject()  // OpenShift path — accesses ClusterVersions (lines 404, 423)
}
return c.setupNamespace()    // Kubernetes path — does NOT access ClusterVersions

On a Kubernetes management cluster, there's no projects API, so it routes to setupNamespace(). The reviewer is correct that exutil.NewCLI() should not cause a clusterversion error during setup.

The real problem is likely in test bodies, not CLI initialization:

clusters.go functions like GetClusterVersion(), GetReleaseImage(), etc. call oc get clusterversion directly without any guards. If HyperShift tests call these helpers against the management cluster's kubeconfig, that's where the error surfaces — not during NewCLI() init.

The PR's solution doesn't fully address the root cause:

Returning NewCLIWithoutNamespace(basename) skips the g.BeforeEach(SetupProject) hook, but test bodies can still call GetClusterVersion() and hit the same error. So the fix treats the symptom (CLI initialization) but not the actual failure point.

Additional concern with hasClusterVersionAPI():

The new function calls exec.Command("oc", "api-resources", ...) synchronously at suite initialization time (not in a BeforeEach), which:

  • Creates a subprocess side-effect before any test runs
  • Depends on oc being in $PATH
  • Is redundant with the existing DoesApiResourceExist() utility which uses the already-configured admin REST client

A cleaner approach would be to guard the OpenShift-specific helpers in clusters.go (like GetClusterVersion) with a check like DoesApiResourceExist(...), or add a HYPERSHIFT=true + no-ClusterVersion check inside those helpers, rather than changing the CLI factory function.

Could you share the actual test or stack trace showing where the clusterversion error occurs? That would confirm whether the fix needs to be in NewCLIForKubeOpenShift or in the helper functions.

@rioliu-rh Thanks for the review.
You're correct that SetupProject() can detect and handle Kubernetes clusters. However, the issue here is specific to the HyperShift test architecture, not just about Kubernetes vs OpenShift API differences.

The Problem:

  1. exutil.NewCLI() registers a BeforeEach hook that automatically calls SetupProject() for every test
  2. In HyperShift scenarios on Kubernetes management clusters (like AKS), this causes the test framework to create a namespace on the management cluster before each test
  3. However, HyperShift tests actually need:
    - Management cluster: Only cluster-scoped/admin access (no namespace needed)
    - Guest cluster: Namespace creation happens separately via oc.AsGuestKubeconf() with its own kubeconfig

Please correct me if I missed anything, thanks.

Thanks for the clarification @xiuwang! I have a follow-up question to better understand the requirement:

You mentioned that HyperShift tests need "only cluster-scoped/admin access (no namespace needed)" on the management cluster. However, even if SetupProject() creates a temporary namespace, your tests can still run cluster-scoped commands without any issues:

// These work fine even with a temp namespace created:
oc.AsAdmin().Run("get").Args("nodes").Execute()
oc.AsAdmin().Run("get").Args("pods", "--all-namespaces").Execute()
oc.AsGuestKubeconf().Run("create").Args("namespace", "test").Execute()

The temp namespace on the management cluster doesn't interfere with:

  • ✅ Cluster-scoped operations via oc.AsAdmin()
  • ✅ Guest cluster operations via oc.AsGuestKubeconf()
  • ✅ Any management cluster resources

Could you clarify:

  1. Does SetupProject() actually fail on Kubernetes management clusters? If so, what's the specific error message?
  2. Or does it succeed in creating a namespace, but you're trying to avoid it for other reasons (performance, cleanliness, etc.)?
  3. If it's just about not needing a namespace - can you explain why using oc.AsAdmin() in your tests isn't sufficient?

Understanding the actual problem will help us determine if this approach is the right solution, or if there's a simpler fix.

No, SetupProject() itself doesn't fail on Kubernetes management clusters - it correctly detects the lack of
projects.project.openshift.io API and falls back to creating a regular Kubernetes namespace.

The actual problem: Initialization before SetupProject()
The issue occurs earlier in the initialization chain, specifically in exutil.NewCLI() which is called before any
test runs:

  // From test/extended/util/client.go
  func NewCLI(namespace string, ...opts) *CLI {
      // ...
      ginkgo.BeforeEach(cli.SetupProject)  // ← Registered in BeforeEach
      // ...
  }

When exutil.NewCLI() is instantiated, it registers the BeforeEach hook. However, before this hook runs, the CLI
initialization performs several checks including:

  • Attempting to detect cluster version
  • Checking for OpenShift-specific APIs
  • Initializing admin config with OpenShift assumptions

On a pure Kubernetes cluster (like AKS management cluster), these checks produce error messages in logs like:

{  fail [github.com/openshift/openshift-tests-private/test/extended/ota/osus/osus.go:207]: Unexpected error:
   <*fmt.wrapError | 0xc001764700>: 
   Error running oc --kubeconfig=/tmp/kubeconfig-1206426696 get clusterversion -o jsonpath={..desired.version}:
   StdOut>
   error: the server doesn't have a resource type "clusterversion"
   StdErr>
   error: the server doesn't have a resource type "clusterversion"
   exit status 1

Thanks for the detailed explanation @xiuwang! However, I found the actual root cause by examining the test code.

The Real Problem: Test Code Issue, Not CLI Initialization

Looking at the error location you mentioned (openshift-tests-private/test/extended/ota/osus/osus.go:207), the issue is actually caused by variable initialization placed directly in the Describe container body of the test code:

// osus.go line 204-207
var _ = g.Describe("[sig-updates] OTA osus instance should", func() {
    defer g.GinkgoRecover()
    
    oc := compat_otp.NewCLI("osusinstace", compat_otp.KubeConfigPath())
    
    var major, minor, err = getMajorMinorVersion(oc)  // ← Line 206
    o.Expect(err).NotTo(o.HaveOccurred())             // ← Line 207

And getMajorMinorVersion() calls:

// utils.go:719
func getMajorMinorVersion(oc *exutil.CLI) (string, string, error) {
    clusterVersion, _, err := compat_otp.GetClusterVersion(oc)  // ← This runs "oc get clusterversion"
    // ...
}

The problem: This code sits directly in the Describe container body, so Ginkgo executes it while building the spec tree, NOT during BeforeEach or test execution. So GetClusterVersion() runs as soon as the container body is evaluated, before any test hooks execute.
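
The ordering at issue can be illustrated with a toy registry. This is not Ginkgo itself, only a minimal model of the same pattern; the suite type and its describe, beforeEach, and it methods are hypothetical stand-ins. Statements in the container body run when the container is evaluated, while hooks and specs are merely recorded and run later.

```go
package main

import "fmt"

// suite is a toy stand-in for a BDD test registry.
type suite struct {
	hooks []func()
	specs []func()
	log   []string
}

// describe evaluates the container body immediately, as tree construction does.
func (s *suite) describe(body func()) { body() }

// beforeEach and it only record closures; nothing runs yet.
func (s *suite) beforeEach(f func()) { s.hooks = append(s.hooks, f) }
func (s *suite) it(f func())         { s.specs = append(s.specs, f) }

// run executes every recorded hook before each recorded spec.
func (s *suite) run() {
	for _, spec := range s.specs {
		for _, hook := range s.hooks {
			hook()
		}
		spec()
	}
}

// buildSuite registers one container with a body statement, a hook, and a spec.
func buildSuite() *suite {
	s := &suite{}
	s.describe(func() {
		// Equivalent to calling getMajorMinorVersion(oc) in the container body.
		s.log = append(s.log, "container body")
		s.beforeEach(func() { s.log = append(s.log, "beforeEach") })
		s.it(func() { s.log = append(s.log, "spec") })
	})
	return s
}

func main() {
	s := buildSuite()
	fmt.Println(s.log) // only the container body has run at this point
	s.run()
	fmt.Println(s.log)
}
```

In this model, any cluster call placed in the container body fires before a hook has had the chance to skip or configure anything, which is why moving such calls into a hook or a spec changes when they can fail.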

Why This PR Won't Fix The Problem

Your PR changes NewCLIForKubeOpenShift() to skip namespace creation, but:

  1. The error is from GetClusterVersion() calling oc get clusterversion
  2. This will fail on Kubernetes clusters regardless of whether a namespace was created
  3. The namespace creation is irrelevant to the clusterversion API check

The Correct Solution

Fix the test code in openshift-tests-private repo to check the environment before calling OpenShift-specific APIs:

var _ = g.Describe("[sig-updates] OTA osus instance should", func() {
    defer g.GinkgoRecover()
    
    oc := compat_otp.NewCLI("osusinstace", compat_otp.KubeConfigPath())
    
    // Skip clusterversion checks for pure Kubernetes clusters
    var major, minor string
    if os.Getenv("HYPERSHIFT") != "true" {
        var err error
        major, minor, err = getMajorMinorVersion(oc)
        o.Expect(err).NotTo(o.HaveOccurred())
    }
    
    // ... rest of tests
})

Or better yet, check if the API exists:

var major, minor string
exists, _ := exutil.DoesApiResourceExist(oc.AdminConfig(), "clusterversions", "config.openshift.io")
if exists {
    var err error
    major, minor, err = getMajorMinorVersion(oc)
    o.Expect(err).NotTo(o.HaveOccurred())
}

This approach:

  • ✅ Actually fixes the root cause (test calling OpenShift APIs on Kubernetes clusters)
  • ✅ No changes needed to origin repo
  • ✅ Clean separation of concerns
  • ✅ Works for all HyperShift scenarios

Summary: The issue is not about CLI initialization or namespace creation. It is about test code calling OpenShift-specific APIs in the Describe container body, which runs before any hook or spec, without first checking whether those APIs exist. The fix should be in the test code, not in the CLI factory functions.

1. Use exutil.NewCLIWithoutNamespace directly to avoid BeforeEach hook
   - Changed from compat_otp.NewCLIWithoutNamespace to exutil.NewCLIWithoutNamespace
   - Ensures SetupProject() is not executed in HyperShift Kubernetes scenario

2. Improve hasClusterVersionAPI error handling
   - Added context timeout (5s) to prevent hanging
   - Use exec.CommandContext for better timeout control
   - Explicit error classification:
     * exec.ErrNotFound: oc binary missing
     * context.DeadlineExceeded: command timeout
     * Known patterns: 'API not found', 'no resources found'
   - Only return false for expected errors, not all errors
@openshift-ci

openshift-ci bot commented Mar 25, 2026

@xiuwang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/verify | 2fa2235 | link | true | /test verify

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
