Conversation
Pipeline controller notification. For optional jobs, comment. This repository is configured in: automatic mode
@jcmoraisjr: This pull request references NE-2529 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough

The PR introduces router selector label constants and adds an e2e test suite validating HAProxy router behavior with Dynamic Configuration Manager enabled on IngressController. The test suite covers deployment exposure, route management, and scale-out operations, with helper functions for pod interaction and test stack management.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jcmoraisjr. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
🧹 Nitpick comments (5)
test/extended/router/config_manager_ingress.go (3)
172-173: Use `sets.New[string]` instead of deprecated `sets.NewString`.

`sets.NewString` is deprecated; the codebase should use the generic `sets.New[string]()` for consistency with modern Go practices.

♻️ Proposed fix

```diff
-	expectedServers := sets.NewString(currentServers...)
-	actualServers := sets.NewString()
+	expectedServers := sets.New[string](currentServers...)
+	actualServers := sets.New[string]()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/config_manager_ingress.go` around lines 172 - 173, Replace deprecated sets.NewString usage: initialize expectedServers with sets.New[string](currentServers...) instead of sets.NewString(currentServers...), and initialize actualServers with sets.New[string]() instead of sets.NewString(); update both occurrences where variables expectedServers and actualServers are declared to use the generic sets.New[string] constructor.
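For readers unfamiliar with the generic API, the constructor is essentially a type-parameterized, map-backed set. Here is a hedged, stdlib-only sketch of the pattern; the `Set`, `New`, `Insert`, and `SortedList` names are illustrative stand-ins, not the `k8s.io/apimachinery/pkg/util/sets` implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// Set is a minimal map-backed generic set, illustrating the pattern behind
// the generic sets.New[string] constructor.
type Set[T comparable] map[T]struct{}

// New builds a set from the given items, mirroring sets.New[string](items...).
func New[T comparable](items ...T) Set[T] {
	s := Set[T]{}
	for _, item := range items {
		s[item] = struct{}{}
	}
	return s
}

// Insert adds one item to the set; duplicates are collapsed.
func (s Set[T]) Insert(item T) {
	s[item] = struct{}{}
}

// SortedList returns the members of a string set in sorted order, similar to
// the List() helper used in the assertion from the review comment above.
func SortedList(s Set[string]) []string {
	out := make([]string, 0, len(s))
	for item := range s {
		out = append(out, item)
	}
	sort.Strings(out)
	return out
}

func main() {
	expected := New("pod-a", "pod-b", "pod-c")
	actual := New[string]()
	for _, server := range []string{"pod-c", "pod-a", "pod-b", "pod-a"} {
		actual.Insert(server)
	}
	fmt.Println(SortedList(expected)) // [pod-a pod-b pod-c]
	fmt.Println(SortedList(actual))   // [pod-a pod-b pod-c]
}
```

Comparing sorted member lists, as the test does, makes the equality check order-independent and duplicate-tolerant.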
187-224: TODO tests are well-documented placeholders.

The descriptions clearly outline the test intent and expected behavior. Consider adding `g.Skip("not yet implemented")` to make it explicit that these are pending tests.

Optional: Add Skip to make pending tests explicit

```diff
 g.It("should handle scale-in operations", func() {
-	// TODO
+	g.Skip("TODO: not yet implemented")
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/config_manager_ingress.go` around lines 187 - 224, Replace the TODO bodies inside the g.It tests in config_manager_ingress.go with explicit skips so they are marked pending; for each g.It("should ...") block (the scale-in, various route updates, balancing after scale-out/in, metrics after scale events, and steady memory/PID usage tests) call g.Skip("not yet implemented") as the first statement in the closure so the test runner reports them as skipped instead of empty/implicit. Ensure each g.It closure contains only the g.Skip invocation until the test is implemented.
10-11: Duplicate import aliases for the same package.

Both `metav1` and `v1` are imported from `k8s.io/apimachinery/pkg/apis/meta/v1`. This creates confusion and inconsistency: the file uses `metav1` in some places (line 40) and `v1` in others (lines 57, 79, 88, 91).

♻️ Consolidate to a single import alias

```diff
 import (
 	"context"
 	"net/http"
 	"time"

 	g "github.com/onsi/ginkgo/v2"
 	o "github.com/onsi/gomega"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/labels"
```

Then update all usages of `v1.` to `metav1.` throughout the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/config_manager_ingress.go` around lines 10 - 11, There are duplicate import aliases: both metav1 and v1 point to k8s.io/apimachinery/pkg/apis/meta/v1; remove the redundant v1 alias in the import block so only metav1 is imported, and update every usage of v1.* in this file (e.g., any v1.ObjectMeta, v1.ListOptions, etc.) to metav1.* so all references consistently use metav1.

test/extended/router/config_manager.go (1)
842-844: Missing timeout in `waitDeployment` could cause tests to hang indefinitely.

The `oc wait` command has no `--timeout` flag specified, so it defaults to 30s, which may be acceptable, but an explicit timeout would be clearer and safer for test reliability.

Optional: Add explicit timeout

```diff
 func waitDeployment(oc *exutil.CLI, resourceName string, replicas int) error {
-	return oc.AsAdmin().Run("wait").Args("--for", "jsonpath={.status.readyReplicas}="+strconv.Itoa(replicas), "deployment/"+resourceName).Execute()
+	return oc.AsAdmin().Run("wait").Args("--for", "jsonpath={.status.readyReplicas}="+strconv.Itoa(replicas), "--timeout=180s", "deployment/"+resourceName).Execute()
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/config_manager.go` around lines 842 - 844, The waitDeployment function calls oc.AsAdmin().Run("wait") without an explicit timeout which can lead to hangs; update waitDeployment to add a --timeout argument to the oc wait invocation (e.g., "--timeout=2m" or another test-appropriate duration) when building Args for deployment/"+resourceName so the command uses an explicit timeout rather than relying on defaults.

test/extended/router/execpod.go (1)
28-46: Parsing error from `strconv.Atoi` is not fully handled.

On line 45, if `strconv.Atoi` fails, the error is returned but the function also returns `code` (which is 0) and `output[:idx]`. However, `err` from line 45 shadows the original nil error value, and this uninitialized `code` value could confuse callers.

Additionally, the `RunHostCmd` function (from context snippet 1) does not specify a container, which could target the wrong container in multi-container pods like the router pod. Consider whether container specification is needed for reliability.

Suggested improvement for error handling

```diff
 code, err = strconv.Atoi(output[idx+1:])
+if err != nil {
+	return 0, "", fmt.Errorf("failed to parse response code %q: %w", output[idx+1:], err)
+}
 return code, output[:idx], nil
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/execpod.go` around lines 28 - 46, The strconv.Atoi call in readURL may fail but its error is returned while still returning a possibly misleading code and partial output; change the parsing to use a separate variable (e.g., parsedCode, parseErr := strconv.Atoi(output[idx+1:])) and if parseErr != nil return 0, "", parseErr, otherwise return parsedCode, output[:idx], nil; while here also ensure the host command targets the intended container in multi-container pods by switching the e2eoutput.RunHostCmd call to the container-aware variant or pass the pod container name (reference e2eoutput.RunHostCmd and the execPod/pod fields) so the curl runs in the correct container.
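The parse-then-fail-loudly pattern recommended above can be shown in isolation. This is a stdlib-only sketch; `parseResponse` and the "body, newline, status code" output shape are illustrative assumptions standing in for the curl output that `readURL` splits, not the actual helper from execpod.go:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseResponse splits command output of the form "<body>\n<status code>" and
// returns the code and body. When the trailing code does not parse, it returns
// zero values plus a wrapped error, instead of a misleading code-0 success.
func parseResponse(output string) (int, string, error) {
	idx := strings.LastIndex(output, "\n")
	if idx < 0 {
		return 0, "", fmt.Errorf("missing status code in output %q", output)
	}
	code, err := strconv.Atoi(output[idx+1:])
	if err != nil {
		return 0, "", fmt.Errorf("failed to parse response code %q: %w", output[idx+1:], err)
	}
	return code, output[:idx], nil
}

func main() {
	code, body, err := parseResponse("server-1\n200")
	fmt.Println(code, body, err) // 200 server-1 <nil>

	_, _, err = parseResponse("garbled output")
	fmt.Println(err != nil) // true
}
```

Returning zero values alongside a non-nil error keeps callers from accidentally acting on a half-populated result.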
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/extended/router/config_manager_ingress.go`:
- Around line 172-173: Replace deprecated sets.NewString usage: initialize
expectedServers with sets.New[string](currentServers...) instead of
sets.NewString(currentServers...), and initialize actualServers with
sets.New[string]() instead of sets.NewString(); update both occurrences where
variables expectedServers and actualServers are declared to use the generic
sets.New[string] constructor.
- Around line 187-224: Replace the TODO bodies inside the g.It tests in
config_manager_ingress.go with explicit skips so they are marked pending; for
each g.It("should ...") block (the scale-in, various route updates, balancing
after scale-out/in, metrics after scale events, and steady memory/PID usage
tests) call g.Skip("not yet implemented") as the first statement in the closure
so the test runner reports them as skipped instead of empty/implicit. Ensure
each g.It closure contains only the g.Skip invocation until the test is
implemented.
- Around line 10-11: There are duplicate import aliases: both metav1 and v1
point to k8s.io/apimachinery/pkg/apis/meta/v1; remove the redundant v1 alias in
the import block so only metav1 is imported, and update every usage of v1.* in
this file (e.g., any v1.ObjectMeta, v1.ListOptions, etc.) to metav1.* so all
references consistently use metav1.
In `@test/extended/router/config_manager.go`:
- Around line 842-844: The waitDeployment function calls
oc.AsAdmin().Run("wait") without an explicit timeout which can lead to hangs;
update waitDeployment to add a --timeout argument to the oc wait invocation
(e.g., "--timeout=2m" or another test-appropriate duration) when building Args
for deployment/"+resourceName so the command uses an explicit timeout rather
than relying on defaults.
In `@test/extended/router/execpod.go`:
- Around line 28-46: The strconv.Atoi call in readURL may fail but its error is
returned while still returning a possibly misleading code and partial output;
change the parsing to use a separate variable (e.g., parsedCode, parseErr :=
strconv.Atoi(output[idx+1:])) and if parseErr != nil return 0, "", parseErr,
otherwise return parsedCode, output[:idx], nil; while here also ensure the host
command targets the intended container in multi-container pods by switching the
e2eoutput.RunHostCmd call to the container-aware variant or pass the pod
container name (reference e2eoutput.RunHostCmd and the execPod/pod fields) so
the curl runs in the correct container.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0498002-2ebf-493e-9f96-813021b0c975
📒 Files selected for processing (3)
- test/extended/router/config_manager.go
- test/extended/router/config_manager_ingress.go
- test/extended/router/execpod.go
Scheduling required tests:
@jcmoraisjr: The following test failed, say
Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Add DCM e2e tests focused on user experience. Here are some requisites and thoughts kept in mind while designing and coding the tests:

* Document what the test should do and its expectations.
* Make it as much focused on user experience as possible: create the scenarios and validate the result checking the behavior in the same way the user would observe it.
* Make it more declarative: create the pre-requisite for the test inside the test itself, instead of relying on pre-configured requisites.
* Make it simple to follow: try to abstract complexities into higher level calls, making the test itself smaller, but also comprehensive.

https://redhat.atlassian.net/browse/NE-2529
95b624a to ebdabb6 (Compare)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/router/config_manager_ingress.go`:
- Around line 74-76: The Route admission is not scoped to the test namespace
because NamespaceSelector was commented out; restore namespace scoping by
re-enabling NamespaceSelector (use v1.SetAsLabelSelector(labels.Set{"name":
workingNamespace}) or an equivalent label selector that matches the test's
namespace) in the same struct where RouteSelector is set (look for the
RouteSelector field and the commented NamespaceSelector in
config_manager_ingress.go), or alternatively generate unique route
hosts/selectors per test (ensure routerSelectorKey/routerSelectorValue remain
unique per test) so routes from other namespaces cannot be admitted.
- Around line 415-417: The test currently assumes an EndpointSlice will be
created implicitly (endpointsAPICreatesEndpointSlice = true) while
createDetachedService does not create or wait for one, causing scaleInEndpoints
to depend on asynchronous mirroring and flake; update the test so that either
createDetachedService explicitly creates and waits for the EndpointSlice after
creating the Endpoints (so scaleInEndpoints always finds it), or set
endpointsAPICreatesEndpointSlice = false and change the scaleInEndpoints
call-path to handle the absence of an EndpointSlice (e.g., by creating/waiting
for one or operating against Endpoints directly); reference
createDetachedService, scaleInEndpoints, and endpointsAPICreatesEndpointSlice
when making the change.
- Around line 167-173: The test is polling too aggressively (time.Millisecond)
inside the Eventually block that calls readLocalURL against routerPod; change
the polling interval to a less aggressive value (for example 100-500ms) or
introduce a named constant (e.g., dcmIngressPollInterval) and use that instead
of time.Millisecond to avoid hammering the API server; update the call where
Eventually(...).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond) to
use the new interval constant or a larger duration like 200*time.Millisecond.
- Around line 251-279: The test cases for the g.It blocks titled "should handle
various route updates", "should maintain proper balancing after scale-out and
scale-in operations", "should report accurate metrics after scale-out and
scale-in operations", and "should maintain steady memory and PID usage after
scale-out and scale-in operations" are empty and produce false-green results;
either implement meaningful assertions and test logic inside each It block
(exercise the router behavior, e.g., perform scale-in/scale-out operations, send
requests, verify balancing, metrics and memory/PID stability) or explicitly mark
each case pending/skipped (e.g., use Skip/Pending helpers) so they do not pass
silently—update the bodies for those specific g.It declarations to contain real
assertions or a deliberate skip.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d2393801-4a6f-464f-9fda-bb7c3e5f8e5e
📒 Files selected for processing (2)
- test/extended/router/config_manager.go
- test/extended/router/config_manager_ingress.go
✅ Files skipped from review due to trivial changes (1)
- test/extended/router/config_manager.go
```go
// TODO `NamespaceSelector` below makes our routes not to be found, need to debug; `RouteSelector` seems to be enough.
// NamespaceSelector: v1.SetAsLabelSelector(labels.Set{"name": workingNamespace}),
RouteSelector: metav1.SetAsLabelSelector(labels.Set{routerSelectorKey: routerSelectorValue}),
```
Scope this router to the test instance.
With NamespaceSelector disabled and a shared RouteSelector, this controller can admit matching routes from other namespaces/tests. Because the hosts below are fixed, parallel runs can bleed into each other and make the assertions non-deterministic. Please restore namespace scoping or make the selector/host unique per test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/router/config_manager_ingress.go` around lines 74 - 76, The
Route admission is not scoped to the test namespace because NamespaceSelector
was commented out; restore namespace scoping by re-enabling NamespaceSelector
(use v1.SetAsLabelSelector(labels.Set{"name": workingNamespace}) or an
equivalent label selector that matches the test's namespace) in the same struct
where RouteSelector is set (look for the RouteSelector field and the commented
NamespaceSelector in config_manager_ingress.go), or alternatively generate
unique route hosts/selectors per test (ensure
routerSelectorKey/routerSelectorValue remain unique per test) so routes from
other namespaces cannot be admitted.
```go
o.Eventually(func(g o.Gomega) {
	code, output, err := readLocalURL(routerPod, hostname, "/")
	g.Expect(err).NotTo(o.HaveOccurred())
	g.Expect(code).To(o.Equal(http.StatusOK))
	actualServers.Insert(output)
	g.Expect(expectedServers.List()).To(o.Equal(actualServers.List()))
}).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond).Should(o.Succeed())
```
Avoid 1ms polling around RunHostCmd.
Each retry here does a pod exec plus curl. Polling every millisecond can hammer the API server/router pod and make CI flakier than the feature under test.
Suggested change

```diff
-}).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond).Should(o.Succeed())
+}).WithTimeout(dcmIngressTimeout).WithPolling(200 * time.Millisecond).Should(o.Succeed())
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
o.Eventually(func(g o.Gomega) {
	code, output, err := readLocalURL(routerPod, hostname, "/")
	g.Expect(err).NotTo(o.HaveOccurred())
	g.Expect(code).To(o.Equal(http.StatusOK))
	actualServers.Insert(output)
	g.Expect(expectedServers.List()).To(o.Equal(actualServers.List()))
}).WithTimeout(dcmIngressTimeout).WithPolling(200 * time.Millisecond).Should(o.Succeed())
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/router/config_manager_ingress.go` around lines 167 - 173, The
test is polling too aggressively (time.Millisecond) inside the Eventually block
that calls readLocalURL against routerPod; change the polling interval to a less
aggressive value (for example 100-500ms) or introduce a named constant (e.g.,
dcmIngressPollInterval) and use that instead of time.Millisecond to avoid
hammering the API server; update the call where
Eventually(...).WithTimeout(dcmIngressTimeout).WithPolling(time.Millisecond) to
use the new interval constant or a larger duration like 200*time.Millisecond.
```go
g.It("should handle various route updates", func() {
	// TODO
})

// Ensure that the router maintains proper balancing for scale-out. This can be achieved by selecting a lb
// algorithm having a predictable behavior, like RoundRobin. It should distribute requests as expected, despite
// of scale-in/out operations happening at the same time. This is one of the issues mentioned in
// https://github.com/openshift/enhancements/blob/master/enhancements/ingress/dynamic-config-manager.md#user-stories
// that DCM should improve.
g.It("should maintain proper balancing after scale-out and scale-in operations", func() {
	// TODO
})

// Ensure that the router reports accurate metrics after a scale-in or scale-out event. This can use a long-lived
// connection that is transferring data when the scale-in/out event happens and verify that data transferred after
// the event continue to be reported in the bytes-in metric. This is described in more detail in the EP -
// https://github.com/openshift/enhancements/blob/master/enhancements/ingress/dynamic-config-manager.md
g.It("should report accurate metrics after scale-out and scale-in operations", func() {
	// TODO
})

// Ensure that the router pod maintains ~steady memory usage and PID usage after scaling-out/in. The idea here is
// that fork-and-reload would cause a significant memory and PID usage spike due to the forked process, while the
// old ones continue running due to long lived connections. This can be done either 1) checking the consequence:
// memory usage remains steady after scale-in/out operations, while maintaining persistent connections during one
// scale operation and the next; or 2) checking the cause: HAProxy still reports the same PID after all the scale
// operations.
g.It("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {
	// TODO
```
These specs currently pass without testing anything.
Leaving empty It bodies in the suite creates false-green coverage for scenarios the PR description says are validated. Please either implement them now or mark them pending/skip until they have real assertions.
Safe stopgap
```diff
- g.It("should handle various route updates", func() {
-   // TODO
- })
+ g.PIt("should handle various route updates", func() {})
@@
- g.It("should maintain proper balancing after scale-out and scale-in operations", func() {
-   // TODO
- })
+ g.PIt("should maintain proper balancing after scale-out and scale-in operations", func() {})
@@
- g.It("should report accurate metrics after scale-out and scale-in operations", func() {
-   // TODO
- })
+ g.PIt("should report accurate metrics after scale-out and scale-in operations", func() {})
@@
- g.It("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {
-   // TODO
- })
+ g.PIt("should maintain steady memory and PID usage after scale-out and scale-in operations", func() {})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/router/config_manager_ingress.go` around lines 251 - 279, The
test cases for the g.It blocks titled "should handle various route updates",
"should maintain proper balancing after scale-out and scale-in operations",
"should report accurate metrics after scale-out and scale-in operations", and
"should maintain steady memory and PID usage after scale-out and scale-in
operations" are empty and produce false-green results; either implement
meaningful assertions and test logic inside each It block (exercise the router
behavior, e.g., perform scale-in/scale-out operations, send requests, verify
balancing, metrics and memory/PID stability) or explicitly mark each case
pending/skipped (e.g., use Skip/Pending helpers) so they do not pass
silently—update the bodies for those specific g.It declarations to contain real
assertions or a deliberate skip.
```go
// I'm observing that creating the Endpoints resource creates a shadow EndpointSlice,
// flag this to false in case this changes.
endpointsAPICreatesEndpointSlice := true
```
Don't rely on implicit EndpointSlice creation here.
createDetachedService returns without creating or waiting for an EndpointSlice, but scaleInEndpoints later requires one unconditionally. That makes the scale-in path depend on asynchronous background mirroring the test never establishes, which is a built-in flake.
Also applies to: 465-468
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/router/config_manager_ingress.go` around lines 415 - 417, The
test currently assumes an EndpointSlice will be created implicitly
(endpointsAPICreatesEndpointSlice = true) while createDetachedService does not
create or wait for one, causing scaleInEndpoints to depend on asynchronous
mirroring and flake; update the test so that either createDetachedService
explicitly creates and waits for the EndpointSlice after creating the Endpoints
(so scaleInEndpoints always finds it), or set endpointsAPICreatesEndpointSlice =
false and change the scaleInEndpoints call-path to handle the absence of an
EndpointSlice (e.g., by creating/waiting for one or operating against Endpoints
directly); reference createDetachedService, scaleInEndpoints, and
endpointsAPICreatesEndpointSlice when making the change.
Scheduling required tests: