Skip to content

NE-2529: add new dcm e2e tests#30926

Open
jcmoraisjr wants to merge 1 commit intoopenshift:mainfrom
jcmoraisjr:NE-2529-dcm-e2e-tests
Open

NE-2529: add new dcm e2e tests#30926
jcmoraisjr wants to merge 1 commit intoopenshift:mainfrom
jcmoraisjr:NE-2529-dcm-e2e-tests

Conversation

@jcmoraisjr
Copy link
Member

Add DCM e2e tests focused on user experience.

Here are some requisites and thoughts kept in mind while designing and coding the tests:

  • Document what the test should do and its expectations.
  • Make it as much focused on user experience as possible: create the scenarios and validate the result checking the behavior in the same way the user would observe it.
  • Make it more declarative: create the pre-requisite for the test inside the test itself, instead of relying on pre-configured requisites.
  • Make it simple to follow: try to abstract complexities into higher level calls, making the test itself smaller, but also comprehensive.

https://redhat.atlassian.net/browse/NE-2529

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 24, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 24, 2026

@jcmoraisjr: This pull request references NE-2529 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add DCM e2e tests focused on user experience.

Here are some requisites and thoughts kept in mind while designing and coding the tests:

  • Document what the test should do and its expectations.
  • Make it as much focused on user experience as possible: create the scenarios and validate the result checking the behavior in the same way the user would observe it.
  • Make it more declarative: create the pre-requisite for the test inside the test itself, instead of relying on pre-configured requisites.
  • Make it simple to follow: try to abstract complexities into higher level calls, making the test itself smaller, but also comprehensive.

https://redhat.atlassian.net/browse/NE-2529

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

Walkthrough

The PR introduces router selector label constants and adds an e2e test suite validating HAProxy router behavior with Dynamic Configuration Manager enabled on IngressController. The test suite covers deployment exposure, route management, and scale-out operations with helper functions for pod interaction and test stack management.

Changes

Cohort / File(s) Summary
Router Configuration Constants
test/extended/router/config_manager.go
Introduced routerSelectorKey and routerSelectorValue constants and updated createRoute to construct the selector string dynamically instead of using a hardcoded value.
Router Ingress E2E Tests
test/extended/router/config_manager_ingress.go
Added comprehensive e2e test suite for HAProxy router with DCM enabled, including helper routines for pod interaction (readLocalURL, waitLocalURL), route/deployment/service stack management via routeStackBuilder, and test cases for basic functionality and scale-out operations. Multiple test cases include TODO placeholders for scale-in, route updates, balancing, metrics, and resource usage validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 24, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcmoraisjr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 24, 2026
@openshift-ci openshift-ci bot requested review from Miciah and gcs278 March 24, 2026 16:07
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
test/extended/router/config_manager_ingress.go (3)

172-173: Use sets.New[string] instead of deprecated sets.NewString.

sets.NewString is deprecated. The codebase should use the generic sets.New[string]() for consistency with modern Go practices.

♻️ Proposed fix
-			expectedServers := sets.NewString(currentServers...)
-			actualServers := sets.NewString()
+			expectedServers := sets.New[string](currentServers...)
+			actualServers := sets.New[string]()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 172 - 173,
Replace deprecated sets.NewString usage: initialize expectedServers with
sets.New[string](currentServers...) instead of
sets.NewString(currentServers...), and initialize actualServers with
sets.New[string]() instead of sets.NewString(); update both occurrences where
variables expectedServers and actualServers are declared to use the generic
sets.New[string] constructor.

187-224: TODO tests are well-documented placeholders.

The descriptions clearly outline the test intent and expected behavior. Consider adding g.Skip("not yet implemented") to make it explicit that these are pending tests.

Optional: Add Skip to make pending tests explicit
 		g.It("should handle scale-in operations", func() {
-			// TODO
+			g.Skip("TODO: not yet implemented")
 		})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 187 - 224,
Replace the TODO bodies inside the g.It tests in config_manager_ingress.go with
explicit skips so they are marked pending; for each g.It("should ...") block
(the scale-in, various route updates, balancing after scale-out/in, metrics
after scale events, and steady memory/PID usage tests) call g.Skip("not yet
implemented") as the first statement in the closure so the test runner reports
them as skipped instead of empty/implicit. Ensure each g.It closure contains
only the g.Skip invocation until the test is implemented.

10-11: Duplicate import aliases for the same package.

Both metav1 and v1 are imported from k8s.io/apimachinery/pkg/apis/meta/v1. This creates confusion and inconsistency - the file uses metav1 in some places (line 40) and v1 in others (lines 57, 79, 88, 91).

♻️ Consolidate to a single import alias
 import (
 	"context"
 	"net/http"
 	"time"

 	g "github.com/onsi/ginkgo/v2"
 	o "github.com/onsi/gomega"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/labels"

Then update all usages of v1. to metav1. throughout the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 10 - 11, There
are duplicate import aliases: both metav1 and v1 point to
k8s.io/apimachinery/pkg/apis/meta/v1; remove the redundant v1 alias in the
import block so only metav1 is imported, and update every usage of v1.* in this
file (e.g., any v1.ObjectMeta, v1.ListOptions, etc.) to metav1.* so all
references consistently use metav1.
test/extended/router/config_manager.go (1)

842-844: Missing timeout in waitDeployment could cause tests to hang indefinitely.

The oc wait command has no --timeout flag specified, so it defaults to 30s which may be acceptable, but explicit timeout would be clearer and safer for test reliability.

Optional: Add explicit timeout
 func waitDeployment(oc *exutil.CLI, resourceName string, replicas int) error {
-	return oc.AsAdmin().Run("wait").Args("--for", "jsonpath={.status.readyReplicas}="+strconv.Itoa(replicas), "deployment/"+resourceName).Execute()
+	return oc.AsAdmin().Run("wait").Args("--for", "jsonpath={.status.readyReplicas}="+strconv.Itoa(replicas), "--timeout=180s", "deployment/"+resourceName).Execute()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager.go` around lines 842 - 844, The
waitDeployment function calls oc.AsAdmin().Run("wait") without an explicit
timeout which can lead to hangs; update waitDeployment to add a --timeout
argument to the oc wait invocation (e.g., "--timeout=2m" or another
test-appropriate duration) when building Args for deployment/"+resourceName so
the command uses an explicit timeout rather than relying on defaults.
test/extended/router/execpod.go (1)

28-46: Parsing error from strconv.Atoi is not fully handled.

On line 45, if strconv.Atoi fails, the error is returned but the function also returns code (which is 0) and output[:idx]. However, err from line 45 shadows the original nil error value, and this uninitialized code value could confuse callers.

Additionally, the RunHostCmd function (from context snippet 1) does not specify a container, which could target the wrong container in multi-container pods like the router pod. Consider whether container specification is needed for reliability.

Suggested improvement for error handling
 	code, err = strconv.Atoi(output[idx+1:])
+	if err != nil {
+		return 0, "", fmt.Errorf("failed to parse response code %q: %w", output[idx+1:], err)
+	}
 	return code, output[:idx], nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/execpod.go` around lines 28 - 46, The strconv.Atoi call
in readURL may fail but its error is returned while still returning a possibly
misleading code and partial output; change the parsing to use a separate
variable (e.g., parsedCode, parseErr := strconv.Atoi(output[idx+1:])) and if
parseErr != nil return 0, "", parseErr, otherwise return parsedCode,
output[:idx], nil; while here also ensure the host command targets the intended
container in multi-container pods by switching the e2eoutput.RunHostCmd call to
the container-aware variant or pass the pod container name (reference
e2eoutput.RunHostCmd and the execPod/pod fields) so the curl runs in the correct
container.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/extended/router/config_manager_ingress.go`:
- Around line 172-173: Replace deprecated sets.NewString usage: initialize
expectedServers with sets.New[string](currentServers...) instead of
sets.NewString(currentServers...), and initialize actualServers with
sets.New[string]() instead of sets.NewString(); update both occurrences where
variables expectedServers and actualServers are declared to use the generic
sets.New[string] constructor.
- Around line 187-224: Replace the TODO bodies inside the g.It tests in
config_manager_ingress.go with explicit skips so they are marked pending; for
each g.It("should ...") block (the scale-in, various route updates, balancing
after scale-out/in, metrics after scale events, and steady memory/PID usage
tests) call g.Skip("not yet implemented") as the first statement in the closure
so the test runner reports them as skipped instead of empty/implicit. Ensure
each g.It closure contains only the g.Skip invocation until the test is
implemented.
- Around line 10-11: There are duplicate import aliases: both metav1 and v1
point to k8s.io/apimachinery/pkg/apis/meta/v1; remove the redundant v1 alias in
the import block so only metav1 is imported, and update every usage of v1.* in
this file (e.g., any v1.ObjectMeta, v1.ListOptions, etc.) to metav1.* so all
references consistently use metav1.

In `@test/extended/router/config_manager.go`:
- Around line 842-844: The waitDeployment function calls
oc.AsAdmin().Run("wait") without an explicit timeout which can lead to hangs;
update waitDeployment to add a --timeout argument to the oc wait invocation
(e.g., "--timeout=2m" or another test-appropriate duration) when building Args
for deployment/"+resourceName so the command uses an explicit timeout rather
than relying on defaults.

In `@test/extended/router/execpod.go`:
- Around line 28-46: The strconv.Atoi call in readURL may fail but its error is
returned while still returning a possibly misleading code and partial output;
change the parsing to use a separate variable (e.g., parsedCode, parseErr :=
strconv.Atoi(output[idx+1:])) and if parseErr != nil return 0, "", parseErr,
otherwise return parsedCode, output[:idx], nil; while here also ensure the host
command targets the intended container in multi-container pods by switching the
e2eoutput.RunHostCmd call to the container-aware variant or pass the pod
container name (reference e2eoutput.RunHostCmd and the execPod/pod fields) so
the curl runs in the correct container.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0498002-2ebf-493e-9f96-813021b0c975

📥 Commits

Reviewing files that changed from the base of the PR and between 15f465a and 95b624a.

📒 Files selected for processing (3)
  • test/extended/router/config_manager.go
  • test/extended/router/config_manager_ingress.go
  • test/extended/router/execpod.go

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 24, 2026

@jcmoraisjr: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-fips 95b624a link true /test e2e-aws-ovn-fips

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Add DCM e2e tests focused on user experience.

Here are some requisites and thoughts kept in mind while designing and
coding the tests:

* Document what the test should do and its expectations.
* Make it as much focused on user experience as possible: create the
  scenarios and validate the result checking the behavior in the same
  way the user would observe it.
* Make it more declarative: create the pre-requisite for the test inside
  the test itself, instead of relying on pre-configured requisites.
* Make it simple to follow: try to abstract complexities into higher
  level calls, making the test itself smaller, but also comprehensive.

https://redhat.atlassian.net/browse/NE-2529
@jcmoraisjr jcmoraisjr force-pushed the NE-2529-dcm-e2e-tests branch from 95b624a to ebdabb6 Compare March 25, 2026 13:56
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/router/config_manager_ingress.go`:
- Around line 74-76: The Route admission is not scoped to the test namespace
because NamespaceSelector was commented out; restore namespace scoping by
re-enabling NamespaceSelector (use v1.SetAsLabelSelector(labels.Set{"name":
workingNamespace}) or an equivalent label selector that matches the test's
namespace) in the same struct where RouteSelector is set (look for the
RouteSelector field and the commented NamespaceSelector in
config_manager_ingress.go), or alternatively generate unique route
hosts/selectors per test (ensure routerSelectorKey/routerSelectorValue remain
unique per test) so routes from other namespaces cannot be admitted.
- Around line 415-417: The test currently assumes an EndpointSlice will be
created implicitly (endpointsAPICreatesEndpointSlice = true) while
createDetachedService does not create or wait for one, causing scaleInEndpoints
to depend on asynchronous mirroring and flake; update the test so that either
createDetachedService explicitly creates and waits for the EndpointSlice after
creating the Endpoints (so scaleInEndpoints always finds it), or set
endpointsAPICreatesEndpointSlice = false and change the scaleInEndpoints
call-path to handle the absence of an EndpointSlice (e.g., by creating/waiting
for one or operating against Endpoints directly); reference
createDetachedService, scaleInEndpoints, and endpointsAPICreatesEndpointSlice
when making the change.
- Around line 167-173: The test is polling too aggressively (time.Millisecond)
inside the Eventually block that calls readLocalURL against routerPod; change
the polling interval to a less aggressive value (for example 100-500ms) or
introduce a named constant (e.g., dcmIngressPollInterval) and use that instead
of time.Millisecond to avoid hammering the API server; update the call where
Eventually(...).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond) to
use the new interval constant or a larger duration like 200*time.Millisecond.
- Around line 251-279: The test cases for the g.It blocks titled "should handle
various route updates", "should maintain proper balancing after scale-out and
scale-in operations", "should report accurate metrics after scale-out and
scale-in operations", and "should maintain steady memory and PID usage after
scale-out and scale-in operations" are empty and produce false-green results;
either implement meaningful assertions and test logic inside each It block
(exercise the router behavior, e.g., perform scale-in/scale-out operations, send
requests, verify balancing, metrics and memory/PID stability) or explicitly mark
each case pending/skipped (e.g., use Skip/Pending helpers) so they do not pass
silently—update the bodies for those specific g.It declarations to contain real
assertions or a deliberate skip.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d2393801-4a6f-464f-9fda-bb7c3e5f8e5e

📥 Commits

Reviewing files that changed from the base of the PR and between 95b624a and ebdabb6.

📒 Files selected for processing (2)
  • test/extended/router/config_manager.go
  • test/extended/router/config_manager_ingress.go
✅ Files skipped from review due to trivial changes (1)
  • test/extended/router/config_manager.go

Comment on lines +74 to +76
// TODO `NamespaceSelector` below makes our routes not to be found, need to debug; `RouteSelector` seems to be enough.
// NamespaceSelector: v1.SetAsLabelSelector(labels.Set{"name": workingNamespace}),
RouteSelector: metav1.SetAsLabelSelector(labels.Set{routerSelectorKey: routerSelectorValue}),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Scope this router to the test instance.

With NamespaceSelector disabled and a shared RouteSelector, this controller can admit matching routes from other namespaces/tests. Because the hosts below are fixed, parallel runs can bleed into each other and make the assertions non-deterministic. Please restore namespace scoping or make the selector/host unique per test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 74 - 76, The
Route admission is not scoped to the test namespace because NamespaceSelector
was commented out; restore namespace scoping by re-enabling NamespaceSelector
(use v1.SetAsLabelSelector(labels.Set{"name": workingNamespace}) or an
equivalent label selector that matches the test's namespace) in the same struct
where RouteSelector is set (look for the RouteSelector field and the commented
NamespaceSelector in config_manager_ingress.go), or alternatively generate
unique route hosts/selectors per test (ensure
routerSelectorKey/routerSelectorValue remain unique per test) so routes from
other namespaces cannot be admitted.

Comment on lines +167 to +173
o.Eventually(func(g o.Gomega) {
code, output, err := readLocalURL(routerPod, hostname, "/")
g.Expect(err).NotTo(o.HaveOccurred())
g.Expect(code).To(o.Equal(http.StatusOK))
actualServers.Insert(output)
g.Expect(expectedServers.List()).To(o.Equal(actualServers.List()))
}).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond).Should(o.Succeed())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid 1ms polling around RunHostCmd.

Each retry here does a pod exec plus curl. Polling every millisecond can hammer the API server/router pod and make CI flakier than the feature under test.

Suggested change
-				}).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond).Should(o.Succeed())
+				}).WithTimeout(dcmIngressTimeout).WithPolling(200 * time.Millisecond).Should(o.Succeed())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
o.Eventually(func(g o.Gomega) {
code, output, err := readLocalURL(routerPod, hostname, "/")
g.Expect(err).NotTo(o.HaveOccurred())
g.Expect(code).To(o.Equal(http.StatusOK))
actualServers.Insert(output)
g.Expect(expectedServers.List()).To(o.Equal(actualServers.List()))
}).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond).Should(o.Succeed())
o.Eventually(func(g o.Gomega) {
code, output, err := readLocalURL(routerPod, hostname, "/")
g.Expect(err).NotTo(o.HaveOccurred())
g.Expect(code).To(o.Equal(http.StatusOK))
actualServers.Insert(output)
g.Expect(expectedServers.List()).To(o.Equal(actualServers.List()))
}).WithTimeout(dcmIngressTimeout).WithPolling(200 * time.Millisecond).Should(o.Succeed())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 167 - 173, The
test is polling too aggressively (time.Millisecond) inside the Eventually block
that calls readLocalURL against routerPod; change the polling interval to a less
aggressive value (for example 100-500ms) or introduce a named constant (e.g.,
dcmIngressPollInterval) and use that instead of time.Millisecond to avoid
hammering the API server; update the call where
Eventually(...).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond) to
use the new interval constant or a larger duration like 200*time.Millisecond.

Comment on lines +251 to +279
g.It("should handle various route updates", func() {
// TODO
})

// Ensure that the router maintains proper balancing for scale-out. This can be achieved by selecting a lb
// algorithm having a predictable behavior, like RoundRobin. It should distribute requests as expected, despite
// of scale-in/out operations happening at the same time. This is one of the issues mentioned in
// https://github.com/openshift/enhancements/blob/master/enhancements/ingress/dynamic-config-manager.md#user-stories
// that DCM should improve.
g.It("should maintain proper balancing after scale-out and scale-in operations", func() {
// TODO
})

// Ensure that the router reports accurate metrics after a scale-in or scale-out event. This can use a long-lived
// connection that is transferring data when the scale-in/out event happens and verify that data transferred after
// the event continue to be reported in the bytes-in metric. This is described in more detail in the EP -
// https://github.com/openshift/enhancements/blob/master/enhancements/ingress/dynamic-config-manager.md
g.It("should report accurate metrics after scale-out and scale-in operations", func() {
// TODO
})

// Ensure that the router pod maintains ~steady memory usage and PID usage after scaling-out/in. The idea here is
// that fork-and-reload would cause a significant memory and PID usage spike due to the forked process, while the
// old ones continue running due to long lived connections. This can be done either 1) checking the consequence:
// memory usage remains steady after scale-in/out operations, while maintaining persistent connections during one
// scale operation and the next; or 2) checking the cause: HAProxy still reports the same PID after all the scale
// operations.
g.It("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {
// TODO
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These specs currently pass without testing anything.

Leaving empty It bodies in the suite creates false-green coverage for scenarios the PR description says are validated. Please either implement them now or mark them pending/skip until they have real assertions.

Safe stopgap
-		g.It("should handle various route updates", func() {
-			// TODO
-		})
+		g.PIt("should handle various route updates", func() {})
@@
-		g.It("should maintain proper balancing after scale-out and scale-in operations", func() {
-			// TODO
-		})
+		g.PIt("should maintain proper balancing after scale-out and scale-in operations", func() {})
@@
-		g.It("should report accurate metrics after scale-out and scale-in operations", func() {
-			// TODO
-		})
+		g.PIt("should report accurate metrics after scale-out and scale-in operations", func() {})
@@
-		g.It("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {
-			// TODO
-		})
+		g.PIt("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 251 - 279, The
test cases for the g.It blocks titled "should handle various route updates",
"should maintain proper balancing after scale-out and scale-in operations",
"should report accurate metrics after scale-out and scale-in operations", and
"should maintain steady memory and PID usage after scale-out and scale-in
operations" are empty and produce false-green results; either implement
meaningful assertions and test logic inside each It block (exercise the router
behavior, e.g., perform scale-in/scale-out operations, send requests, verify
balancing, metrics and memory/PID stability) or explicitly mark each case
pending/skipped (e.g., use Skip/Pending helpers) so they do not pass
silently—update the bodies for those specific g.It declarations to contain real
assertions or a deliberate skip.

Comment on lines +415 to +417
// I'm observing that creating the Endpoints resource creates a shadow EndpointSlice,
// flag this to false in case this changes.
endpointsAPICreatesEndpointSlice := true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't rely on implicit EndpointSlice creation here.

createDetachedService returns without creating or waiting for an EndpointSlice, but scaleInEndpoints later requires one unconditionally. That makes the scale-in path depend on asynchronous background mirroring the test never establishes, which is a built-in flake.

Also applies to: 465-468

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/router/config_manager_ingress.go` around lines 415 - 417, The
test currently assumes an EndpointSlice will be created implicitly
(endpointsAPICreatesEndpointSlice = true) while createDetachedService does not
create or wait for one, causing scaleInEndpoints to depend on asynchronous
mirroring and flake; update the test so that either createDetachedService
explicitly creates and waits for the EndpointSlice after creating the Endpoints
(so scaleInEndpoints always finds it), or set endpointsAPICreatesEndpointSlice =
false and change the scaleInEndpoints call-path to handle the absence of an
EndpointSlice (e.g., by creating/waiting for one or operating against Endpoints
directly); reference createDetachedService, scaleInEndpoints, and
endpointsAPICreatesEndpointSlice when making the change.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gcs278
Copy link
Contributor

gcs278 commented Mar 25, 2026

/assign @bentito
/assign @gcs278

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants