ovn-kubernetes, cluster-network-operator: scope hypershift-kubevirt conformance to sig-kubevirt #79039
qinqon wants to merge 1 commit into openshift:main
/pj-rehearse
@qinqon: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
56af2f1 to c40c744
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: qinqon
…nformance to sig-kubevirt
The e2e-aws-ovn-hypershift-kubevirt (ovn-kubernetes) and
e2e-aws-hypershift-ovn-kubevirt (cluster-network-operator) tests
inherit the default TEST_SUITE 'openshift/conformance/parallel' and
run the entire upstream OpenShift parallel conformance suite (~1500
tests) on top of a HyperShift hosted cluster whose worker nodes are
KubeVirt VMs nested in AWS.

This is both wasteful and fragile:

- The KubeVirt-on-AWS nested topology is sensitive to infra hiccups;
  any transient VM/storage stall fails dozens of unrelated tests
  simultaneously (DNS, oauth, services, sysctl, cpu_partitioning, ...)
  and turns a single infra glitch into a job-wide red signal.
- The job names ('hypershift-kubevirt') advertise kubevirt-specific
  coverage, but the only kubevirt-specific tests in the suite are
  '[sig-kubevirt]'. Everything else is generic conformance also run
  by many other gates.

Match the existing pattern from
openshift/hypershift/openshift-hypershift-release-4.18__periodics.yaml
which already scopes its kubevirt conformance run to:

    TEST_SUITE: openshift/conformance/parallel/minimal
    TEST_INCLUDES: sig-kubevirt

Both jobs use the same 'hypershift-kubevirt-conformance' workflow,
so the same env override applies. Scope:

- ovn-kubernetes: release-4.18, release-4.19, release-4.20 (job
  does not exist on later branches).
- cluster-network-operator: release-4.18 through release-5.1, plus
  master.

Expected impact: significantly faster runs, far fewer false-failure
test cases when the underlying KubeVirt infra hiccups, and a focused
gate signal aligned with the job's name.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
c40c744 to 3669cef
/pj-rehearse
@qinqon: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@qinqon: The following tests failed, say
Summary

Scope the hypershift-kubevirt-conformance AWS workflow runs to the [sig-kubevirt] subset of the conformance suite, instead of running the entire ~1500-test parallel suite, on:

- openshift/ovn-kubernetes (test e2e-aws-ovn-hypershift-kubevirt): release-4.18, release-4.19, release-4.20 (job does not exist on later branches)
- openshift/cluster-network-operator (test e2e-aws-hypershift-ovn-kubevirt): release-4.18 through release-5.1, plus master

Both tests use the same hypershift-kubevirt-conformance step-registry workflow, so the same env override applies.

Why

These jobs currently inherit the default TEST_SUITE: openshift/conformance/parallel and run all of OpenShift's parallel conformance tests on top of a HyperShift hosted cluster whose worker nodes are KubeVirt VMs nested inside an AWS infra cluster.

This is both wasteful and fragile:

- The KubeVirt-on-AWS nested topology is sensitive to infra hiccups; any transient VM/storage stall fails dozens of unrelated tests simultaneously (DNS, oauth, services, sysctl, cpu_partitioning, ...) and turns a single infra glitch into a job-wide red signal.
- The job names (hypershift-kubevirt) advertise kubevirt-specific coverage, but the only kubevirt-specific tests in the suite are [sig-kubevirt]. Everything else is generic conformance that is already run by many other gates.

What

Add to each affected test stanza:
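The two values come from the commit message. The sketch below shows roughly where they would sit in a ci-operator test stanza. The surrounding layout (tests, as, steps, workflow) is an assumption based on the usual ci-operator config shape, not copied from the changed files, and any other keys already in the stanza are omitted.

```yaml
# Illustrative stanza only; the real files contain additional keys
# (for example the cluster profile) that this change leaves untouched.
tests:
- as: e2e-aws-ovn-hypershift-kubevirt   # e2e-aws-hypershift-ovn-kubevirt in cluster-network-operator
  steps:
    env:
      # Run only tests whose names match sig-kubevirt ...
      TEST_INCLUDES: sig-kubevirt
      # ... selected from the minimal parallel suite instead of the default
      # openshift/conformance/parallel.
      TEST_SUITE: openshift/conformance/parallel/minimal
    workflow: hypershift-kubevirt-conformance
```

Because the override lives under the test's own env, it changes only this job and leaves the shared workflow's defaults alone for every other consumer.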
This matches the existing pattern from openshift/hypershift/openshift-hypershift-release-4.18__periodics.yaml, which is the only AWS hypershift-kubevirt-conformance job in the entire repo that already scopes itself correctly.

Files changed (12)

ovn-kubernetes:

- ci-operator/config/openshift/ovn-kubernetes/openshift-ovn-kubernetes-release-4.18.yaml
- ci-operator/config/openshift/ovn-kubernetes/openshift-ovn-kubernetes-release-4.19.yaml
- ci-operator/config/openshift/ovn-kubernetes/openshift-ovn-kubernetes-release-4.20.yaml

cluster-network-operator:

- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-master.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.18.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.19.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.20.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.21.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.22.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.23.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-5.0.yaml
- ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-5.1.yaml

Validation

- make ci-operator-prowgen: clean (no Prow job spec changes; env is read at runtime from the unresolved-config registry)
- make sanitize-prow-jobs: passed
- make ci-operator-checkconfig: passed (Configs reloaded)

Background

Investigating a recent failure of pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn-hypershift-kubevirt showed the root cause was an in-guest kernel hung-task (disk I/O stall on both KubeVirt worker VMs simultaneously) ~80 minutes into the run. Because the full conformance suite was running, the single infra event caused 7 sig-kubevirt and many additional unrelated tests to fail with "no ready, schedulable nodes in the cluster". Scoping the jobs to their intended purpose makes the gate signal both faster and more meaningful.