
HYPERFLEET-862 - feat: add Tier 2 E2E tests for deletion edge cases#94

Open
kuudori wants to merge 2 commits into openshift-hyperfleet:main from kuudori:HYPERFLEET-862

Conversation

Contributor

@kuudori kuudori commented May 12, 2026

Summary

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

@openshift-ci openshift-ci Bot requested review from rafabene and tirthct May 12, 2026 00:53

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kuudori for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Switch OpenAPI schema acquisition from a remote curl download to a local extractor (hack/extract-schema) invoked by the Makefile, add the extractor entrypoint and update go.mod and docs. Add a negative Ginkgo E2E test that deploys a dedicated "stuck" adapter, simulates adapter unavailability during cluster deletion to assert soft-delete is preserved, then restores the adapter and verifies hard-delete and namespace cleanup. Update test-design docs to add the new Tier2 case and remove certain cascade/early-deletion E2E scenarios.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant TestRunner
  participant HelmChartRepo
  participant AdapterController
  participant APIService
  participant ClusterResource
  participant AdapterDeployment
  participant NamespaceCleaner

  TestRunner->>HelmChartRepo: clone adapter & api charts
  TestRunner->>AdapterController: deploy stuck-adapter release
  TestRunner->>APIService: upgrade required-adapters (include stuck)
  TestRunner->>ClusterResource: create cluster payload
  APIService->>AdapterDeployment: reconcile (Applied=True)
  TestRunner->>AdapterDeployment: scale to 0 (simulate unavailable)
  TestRunner->>APIService: soft-delete cluster
  APIService->>AdapterDeployment: healthy adapters -> Finalized=True
  TestRunner->>ClusterResource: Consistently assert soft-deleted & Reconciled=False
  TestRunner->>AdapterDeployment: scale up stuck adapter
  APIService->>ClusterResource: hard-delete (404)
  NamespaceCleaner->>TestRunner: no namespaces with cluster prefix
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main objective: adding Tier 2 E2E tests for deletion edge cases, specifically a stuck deletion scenario where an adapter cannot finalize. |
| Description check | ✅ Passed | The description references HYPERFLEET-862 and relates to implementing Tier 2 E2E tests for deletion edge cases as stated in the title, though minimal implementation details are provided. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/cluster/cascade_deletion_edge_cases.go`:
- Around line 60-70: Change the Eventually closure around h.Client.GetNodePool
so it does not treat a 404 as immediate success; instead, perform and track at
least one successful (non-404, non-error) read that validates the invariant
(e.g., np.DeletedTime equals originalDeletedTime or np.Generation > 2 and
DeletedTime preserved) before allowing a 404 to be considered success.
Concretely, inside the closure call h.Client.GetNodePool(clusterID, nodepoolID),
if err is a 404 then return a failure unless a prior successful validation flag
is set; on a successful read assert np.DeletedTime / np.Generation as required
and set the flag so subsequent 404s will pass. Apply the same change to both
blocks (the one referencing originalDeletedTime and the other around Generation
> 2) so the test cannot pass purely via an early 404.

In `@e2e/cluster/delete_during_creation.go`:
- Around line 42-57: The test currently only logs when all adapters report
Applied=True but must enforce a precondition that the cluster is not already
Reconciled=True before issuing the DELETE; after calling
h.Client.GetClusterStatuses(ctx, clusterID) (and reusing the existing statuses
and h.HasAdapterCondition logic), add a hard check for the cluster-level
"Reconciled" condition (e.g., via examining statuses or calling the appropriate
status helper) and fail the test (return/error/ginkgo.Fail) if Reconciled==True
so the DELETE is only attempted when the precondition is satisfied; ensure this
check refers to clusterID, GetClusterStatuses, and HasAdapterCondition so it’s
easy to locate.

In `@e2e/cluster/stuck_deletion.go`:
- Around line 112-117: The cleanup currently swallows
RestoreAPIRequiredAdaptersWithRetry errors by only logging them; change it to
fail the spec when restore fails. In the ginkgo.DeferCleanup closure that calls
h.RestoreAPIRequiredAdaptersWithRetry(ctx, apiChartPath, h.Cfg.Namespace,
originalAdapters, 3), replace the ginkgo.GinkgoWriter.Printf call with a
test-failing call (e.g., ginkgo.Fail(...) or ginkgo.GinkgoT().Fatalf(...)) so
any error returned by RestoreAPIRequiredAdaptersWithRetry causes the spec to
fail and prevent leaked mutated state.
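The "don't accept 404 before observing the invariant" fix above can be sketched outside Ginkgo as a plain polling loop. This is a minimal stand-alone illustration, not code from this repo: `pollUntilGone`, `get`, and the simulated reads are hypothetical stand-ins for `Eventually` and `h.Client.GetNodePool`.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("404 not found")

// pollUntilGone retries get until it returns errNotFound, but only accepts
// the 404 as terminal success if at least one earlier successful read
// validated the invariant (here: DeletedTime matches want).
func pollUntilGone(get func() (deletedTime string, err error), want string, maxAttempts int) error {
	observed := false
	for i := 0; i < maxAttempts; i++ {
		dt, err := get()
		if errors.Is(err, errNotFound) {
			if observed {
				return nil // hard-deleted only after the invariant was seen
			}
			return errors.New("resource vanished before invariant was observed")
		}
		if err != nil {
			continue // transient error: retry
		}
		if dt == want {
			observed = true // invariant validated at least once
		}
	}
	return fmt.Errorf("resource never reached 404 within %d attempts", maxAttempts)
}

func main() {
	// Simulated reads: two successful reads preserving DeletedTime, then a 404.
	reads := []struct {
		dt  string
		err error
	}{
		{"t1", nil}, {"t1", nil}, {"", errNotFound},
	}
	i := 0
	get := func() (string, error) {
		r := reads[i]
		i++
		return r.dt, r.err
	}
	fmt.Println(pollUntilGone(get, "t1", 10)) // <nil>
}
```

The same shape maps back onto the suggested Gomega change: the `observed` flag plays the role of `observedInvariant` captured by the `Eventually` closure.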

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f0eac754-09a8-4bc0-8abb-17881c3933bd

📥 Commits

Reviewing files that changed from the base of the PR and between 3f47690 and de1ccef.

📒 Files selected for processing (3)
  • e2e/cluster/cascade_deletion_edge_cases.go
  • e2e/cluster/delete_during_creation.go
  • e2e/cluster/stuck_deletion.go

@coderabbitai Bot left a comment
♻️ Duplicate comments (3)
e2e/cluster/delete_during_creation.go (1)

36-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the edge-case precondition before issuing DELETE.

At Line 36, DELETE is sent before validating the cluster is still not Reconciled=True; Lines 42-57 only log a warning after the fact, so this spec can pass without exercising its core scenario.

Suggested minimal change

```diff
+				ginkgo.By("asserting cluster is not reconciled before DELETE")
+				current, err := h.Client.GetCluster(ctx, clusterID)
+				Expect(err).NotTo(HaveOccurred(), "failed to fetch cluster before DELETE")
+				Expect(h.HasResourceCondition(
+					current.Status.Conditions,
+					client.ConditionTypeReconciled,
+					openapi.ResourceConditionStatusTrue,
+				)).To(BeFalse(), "test precondition failed: cluster already reconciled before DELETE")
+
 				deletedCluster, err := h.Client.DeleteCluster(ctx, clusterID)
 				Expect(err).NotTo(HaveOccurred(), "DELETE request should succeed with 202")
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity. Validate changes against HyperFleet architecture standards from the linked architecture repository."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/cluster/delete_during_creation.go` around lines 36 - 57, Before calling
h.Client.DeleteCluster(ctx, clusterID) ensure the test precondition (cluster not
yet Reconciled/fully Applied) is asserted: call h.Client.GetClusterStatuses(ctx,
clusterID) and use h.HasAdapterCondition(s.Conditions,
client.ConditionTypeApplied, openapi.AdapterConditionStatusTrue) across
statuses.Items to verify at least one adapter is NOT Applied; if all adapters
are already Applied either fail the test (Expect/Fail) or retry/poll until the
condition is met before invoking DeleteCluster so the intended edge-case is
actually exercised.
e2e/cluster/stuck_deletion.go (1)

112-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail the spec if API adapter-config restore fails.

At Line 114, restore errors are logged-only, which can leak mutated global API adapter configuration into later specs.

Suggested minimal change

```diff
 				ginkgo.DeferCleanup(func(ctx context.Context) {
 					ginkgo.By("Restore API required adapters to original config")
-					if err := h.RestoreAPIRequiredAdaptersWithRetry(ctx, apiChartPath, h.Cfg.Namespace, originalAdapters, 3); err != nil {
-						ginkgo.GinkgoWriter.Printf("CRITICAL: %v\n", err)
-					}
+					err := h.RestoreAPIRequiredAdaptersWithRetry(ctx, apiChartPath, h.Cfg.Namespace, originalAdapters, 3)
+					Expect(err).NotTo(HaveOccurred(), "failed to restore API required adapters")
 				})
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity. Validate changes against HyperFleet architecture standards from the linked architecture repository."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/cluster/stuck_deletion.go` around lines 112 - 117, The cleanup currently
swallows RestoreAPIRequiredAdaptersWithRetry errors by logging them
(ginkgo.GinkgoWriter.Printf) which can leak mutated adapter config; change the
DeferCleanup handler to fail the spec when
h.RestoreAPIRequiredAdaptersWithRetry(ctx, apiChartPath, h.Cfg.Namespace,
originalAdapters, 3) returns an error by invoking ginkgo.Fail (or an equivalent
ginkgo assertion) with a clear message including the error instead of only
printing it, so the spec is marked failed and the test run does not continue
with a mutated global state.
e2e/cluster/cascade_deletion_edge_cases.go (1)

60-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t allow 404 to pass before invariant observation.

At Line 63 and Line 150, early 404 exits let both specs succeed without proving the asserted cascade invariants. Require at least one successful non-404 read that validates the invariant before permitting 404 as terminal success.

Suggested pattern (apply to both Eventually blocks)

```diff
+				observedInvariant := false
 				Eventually(func(g Gomega) {
 					np, err := h.Client.GetNodePool(ctx, clusterID, nodepoolID)
 					var httpErr *client.HTTPError
 					if errors.As(err, &httpErr) && httpErr.StatusCode == http.StatusNotFound {
+						g.Expect(observedInvariant).To(BeTrue(),
+							"nodepool hard-deleted before invariant was ever observed")
 						return
 					}
 					g.Expect(err).NotTo(HaveOccurred(), "nodepool should be accessible or already 404")
 					// existing invariant assertions...
+					observedInvariant = true
 				}, h.Cfg.Timeouts.Adapter.Processing, h.Cfg.Polling.Interval).Should(Succeed())
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity. Validate changes against HyperFleet architecture standards from the linked architecture repository."

Also applies to: 147-158

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/cluster/cascade_deletion_edge_cases.go` around lines 60 - 70, The
Eventually lambda currently treats a 404 as immediate success; change it to
require at least one prior successful non-404 read that validates the invariant
before accepting a terminal 404. Concretely, declare a boolean flag (e.g.,
sawValid) captured by the lambda surrounding the call to
h.Client.GetNodePool(clusterID, nodepoolID); when GetNodePool returns non-404
and the assertions on np.DeletedTime/Equal(originalDeletedTime) pass, set
sawValid = true; only treat a 404 (errors.As -> httpErr.StatusCode ==
http.StatusNotFound) as success if sawValid is true, otherwise keep retrying.
Apply the same pattern to the other Eventually block that reads the same
resource (the block around lines 147-158) so both require at least one
validating observation before allowing 404 as terminal success.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@e2e/cluster/cascade_deletion_edge_cases.go`:
- Around line 60-70: The Eventually lambda currently treats a 404 as immediate
success; change it to require at least one prior successful non-404 read that
validates the invariant before accepting a terminal 404. Concretely, declare a
boolean flag (e.g., sawValid) captured by the lambda surrounding the call to
h.Client.GetNodePool(clusterID, nodepoolID); when GetNodePool returns non-404
and the assertions on np.DeletedTime/Equal(originalDeletedTime) pass, set
sawValid = true; only treat a 404 (errors.As -> httpErr.StatusCode ==
http.StatusNotFound) as success if sawValid is true, otherwise keep retrying.
Apply the same pattern to the other Eventually block that reads the same
resource (the block around lines 147-158) so both require at least one
validating observation before allowing 404 as terminal success.

In `@e2e/cluster/delete_during_creation.go`:
- Around line 36-57: Before calling h.Client.DeleteCluster(ctx, clusterID)
ensure the test precondition (cluster not yet Reconciled/fully Applied) is
asserted: call h.Client.GetClusterStatuses(ctx, clusterID) and use
h.HasAdapterCondition(s.Conditions, client.ConditionTypeApplied,
openapi.AdapterConditionStatusTrue) across statuses.Items to verify at least one
adapter is NOT Applied; if all adapters are already Applied either fail the test
(Expect/Fail) or retry/poll until the condition is met before invoking
DeleteCluster so the intended edge-case is actually exercised.

In `@e2e/cluster/stuck_deletion.go`:
- Around line 112-117: The cleanup currently swallows
RestoreAPIRequiredAdaptersWithRetry errors by logging them
(ginkgo.GinkgoWriter.Printf) which can leak mutated adapter config; change the
DeferCleanup handler to fail the spec when
h.RestoreAPIRequiredAdaptersWithRetry(ctx, apiChartPath, h.Cfg.Namespace,
originalAdapters, 3) returns an error by invoking ginkgo.Fail (or an equivalent
ginkgo assertion) with a clear message including the error instead of only
printing it, so the spec is marked failed and the test run does not continue
with a mutated global state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ecec8731-dc05-40e8-b87b-8df734ea4eb0

📥 Commits

Reviewing files that changed from the base of the PR and between de1ccef and 303185a.

📒 Files selected for processing (3)
  • e2e/cluster/cascade_deletion_edge_cases.go
  • e2e/cluster/delete_during_creation.go
  • e2e/cluster/stuck_deletion.go

@rafabene
Contributor

rafabene commented May 12, 2026

Tip

nit — non-blocking suggestion

Category: JIRA

HYPERFLEET-862 lists 7 Tier 2 categories (deletion under load, adapter/API/Sentinel restart during deletion, stuck deletion timeout, deletion metrics, mixed workload). This PR covers deletion edge cases + stuck adapter — which is great, but the restart resilience, load, and metrics scenarios remain. Consider noting in the PR description what's deferred and whether follow-up PRs are planned.

Inline comment thread on e2e/cluster/delete_during_creation.go (Outdated):

```go
var h *helper.Helper
var clusterID string

ginkgo.BeforeEach(func() {
```

Tip

nit — non-blocking suggestion

Category: Inconsistency

Minor inconsistency: this BeforeEach uses func() while the other two new files use func(ctx context.Context). Functionally fine since helper.New() doesn't need ctx, but for consistency with the rest of the PR and codebase:

Suggested change

```diff
-ginkgo.BeforeEach(func() {
+ginkgo.BeforeEach(func(ctx context.Context) {
```

@coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/cluster/stuck_deletion.go`:
- Line 17: The test suite name passed to ginkgo.Describe includes an extra
bracketed token "[negative]" and must be normalized to the required pattern;
update the Describe invocation (the string literal in the ginkgo.Describe call
in stuck_deletion.go) to remove the "[negative]" token so it follows "[Suite:
cluster] Description" format and ensure any scenario type like "negative" is
only represented via test labels, not in the suite name.
- Around line 143-158: Replace the inline Eventually closure that calls
h.Client.GetClusterStatuses and inspects statuses.Items using
h.HasAdapterCondition with a reusable poller + matcher: implement a poller
invocation using h.PollCluster(...) (or create h.PollCluster if missing) and a
custom matcher helper.HaveResourceCondition(adapterName,
client.ConditionTypeApplied, openapi.AdapterConditionStatusTrue) that checks the
same Applied=True condition on the adapter; then call the poller with the
matcher and the same timeouts (h.Cfg.Timeouts.Adapter.Processing,
h.Cfg.Polling.Interval). Do the same refactor for the other two blocks (lines
~170-186 and ~189-197) so tests use h.PollCluster/h.PollNodePool plus
helper.HaveResourceCondition instead of inline Eventually closures and
h.HasAdapterCondition.
- Around line 28-214: The test lacks an explicit ginkgo.AfterEach block to call
h.CleanupTestCluster() as required; add a ginkgo.AfterEach(func(ctx
context.Context) { ... }) that calls h.CleanupTestCluster(ctx, clusterID) with
error handling and warnings via ginkgo.GinkgoWriter.Printf, and ensure clusterID
is accessible by promoting it from the local scope inside the It block to a
surrounding var (e.g., declare var clusterID string above
ginkgo.BeforeEach/ginkgo.It) so the AfterEach can reference it; keep existing
ginkgo.DeferCleanup calls but add this explicit AfterEach for mandated cleanup
semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 4ade577b-2ffb-4b66-939f-e6e26af30d40

📥 Commits

Reviewing files that changed from the base of the PR and between 303185a and 65172e6.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • CLAUDE.md
  • Makefile
  • e2e/cluster/stuck_deletion.go
  • go.mod
  • test-design/README.md
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/update-delete-test-matrix.md

@coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test-design/README.md (1)

76-76: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove duplicate tier assignment for stuck deletion test.

The "Stuck deletion -- adapter unable to finalize" test appears in both the Tier1 examples table (line 76) and the Tier2 examples table (line 101), violating the rule stated on line 15: "Every E2E test case must be assigned exactly one severity tier."

Based on the test specification in delete-cluster.md (Priority: Tier2) and the test characteristics (infrastructure recovery with adapter crash simulation), this should only appear in the Tier2 table on line 101.

📋 Proposed fix

Remove line 76 from the Tier1 examples table:

```diff
 | [Cluster reflects adapter failure in top-level status](testcases/cluster.md#test-title-cluster-can-reflect-adapter-failure-in-top-level-status) | Error handling -- validates failure reporting, not the happy path. Core operations work; this tests that failures are visible |
 | [PATCH to soft-deleted cluster returns 409](testcases/delete-cluster.md#test-title-patch-to-soft-deleted-cluster-returns-409-conflict) | API contract -- validates rejection of invalid mutations. The delete path works; this tests that the API guards against misuse |
 | [Concurrent cluster creations without conflicts](testcases/concurrent-processing.md#test-title-system-can-process-concurrent-cluster-creations-without-resource-conflicts) | Secondary workflow -- concurrent creation is important but not the primary single-cluster path |
 | [ManifestWork apply fails for unregistered consumer](testcases/adapter-with-maestro-transport.md#test-title-manifestwork-apply-fails-when-targeting-unregistered-consumer) | Error handling -- validates graceful failure for a misconfigured adapter, not the normal transport path |
 | [Adapter statuses transition during update reconciliation](testcases/update-cluster.md#test-title-adapter-statuses-transition-during-update-reconciliation) | Secondary workflow -- validates intermediate state transitions, not the final converged state |
-| [Stuck deletion -- adapter unable to finalize](testcases/delete-cluster.md#test-title-stuck-deletion----adapter-unable-to-finalize-prevents-hard-delete) | Operational visibility -- adapter permanently fails to finalize, causing silent resource leak that compounds over time if undetected |
```

As per coding guidelines: "Each E2E test case must be assigned exactly one severity tier."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test-design/README.md` at line 76, Remove the duplicate "Stuck deletion --
adapter unable to finalize" row from the Tier1 examples table so the test is
assigned only to Tier2; locate the row in the Tier1 table matching the exact
title "Stuck deletion -- adapter unable to finalize" (the same string used in
testcases/delete-cluster.md and referenced as Priority: Tier2) and delete that
table line, leaving the Tier2 entry intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/cluster/stuck_deletion.go`:
- Around line 188-197: The test currently only asserts Reconciled=False for the
cluster; update the Consistently check to explicitly assert that healthy
adapters have reached Finalized=True while the stuck adapter remains
non-finalized: after calling h.Client.GetCluster and confirming cl.DeletedTime,
use h.HasResourceCondition with the Finalized condition (e.g.,
client.ConditionTypeFinalized) and openapi.ResourceConditionStatusTrue for each
healthy adapter identifier, and assert the stuck adapter's Finalized condition
is not True (or is False/absent) so the blocker is proven; place these
additional expectations in the same Consistently block alongside the existing
Reconciled assertion.
- Around line 201-207: After scaling the adapter back up with
h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, 1), wait for the adapter
to become ready before asserting the cluster is hard-deleted; replace the
immediate Eventually(h.PollClusterHTTPStatus(...)) check with a readiness wait
(e.g., call an existing helper like h.WaitForDeploymentReady(ctx,
h.Cfg.Namespace, deploymentName) or poll pods until the deployment's pod(s) are
Ready) and only after that invoke Eventually(h.PollClusterHTTPStatus(ctx,
clusterID), h.Cfg.Timeouts.Cluster.Reconciled,
h.Cfg.Polling.Interval).Should(Equal(http.StatusNotFound)); ensure you reference
deploymentName and clusterID when calling the readiness helper.

---

Outside diff comments:
In `@test-design/README.md`:
- Line 76: Remove the duplicate "Stuck deletion -- adapter unable to finalize"
row from the Tier1 examples table so the test is assigned only to Tier2; locate
the row in the Tier1 table matching the exact title "Stuck deletion -- adapter
unable to finalize" (the same string used in testcases/delete-cluster.md and
referenced as Priority: Tier2) and delete that table line, leaving the Tier2
entry intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d0c6e71b-ba3e-4a14-b66f-960118fdff0e

📥 Commits

Reviewing files that changed from the base of the PR and between 65172e6 and 3c7f463.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • CLAUDE.md
  • Makefile
  • e2e/cluster/stuck_deletion.go
  • go.mod
  • hack/extract-schema/main.go
  • test-design/README.md
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/update-delete-test-matrix.md

@coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/helper/adapter.go (1)

150-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

IMAGE_PULL_POLICY override can be silently overridden by opts.SetValues.

The code comments say env policy should override, but opts.SetValues is appended afterward and can replace image.pullPolicy. This weakens config determinism for deployments that rely on the env override.

Suggested fix

```diff
-	// Override image pull policy if set (e.g. IfNotPresent for local kind clusters)
-	if policy := os.Getenv("IMAGE_PULL_POLICY"); policy != "" {
-		helmArgs = append(helmArgs, "--set", fmt.Sprintf("image.pullPolicy=%s", policy))
-	}
-
 	// Add additional --set values if provided
 	for key, value := range opts.SetValues {
 		helmArgs = append(helmArgs, "--set", fmt.Sprintf("%s=%s", key, value))
 	}
+
+	// Override image pull policy last so env var is authoritative when provided
+	if policy := os.Getenv("IMAGE_PULL_POLICY"); policy != "" {
+		helmArgs = append(helmArgs, "--set-string", fmt.Sprintf("image.pullPolicy=%s", policy))
+	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/helper/adapter.go` around lines 150 - 158, The IMAGE_PULL_POLICY env var
override is appended to helmArgs before iterating opts.SetValues, so a
user-provided opts.SetValues entry can silently override image.pullPolicy;
change the logic in the function that builds helmArgs so the env override takes
precedence—either append the "--set image.pullPolicy=..." entry after processing
opts.SetValues or, when iterating opts.SetValues, detect the "image.pullPolicy"
key and skip/replace it if IMAGE_PULL_POLICY is set; reference helmArgs,
opts.SetValues and the IMAGE_PULL_POLICY env var (and the image.pullPolicy key)
when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/helper/adapter.go`:
- Around line 150-158: The IMAGE_PULL_POLICY env var override is appended to
helmArgs before iterating opts.SetValues, so a user-provided opts.SetValues
entry can silently override image.pullPolicy; change the logic in the function
that builds helmArgs so the env override takes precedence—either append the
"--set image.pullPolicy=..." entry after processing opts.SetValues or, when
iterating opts.SetValues, detect the "image.pullPolicy" key and skip/replace it
if IMAGE_PULL_POLICY is set; reference helmArgs, opts.SetValues and the
IMAGE_PULL_POLICY env var (and the image.pullPolicy key) when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: bb2f9432-c650-464c-852e-faf8ff197721

📥 Commits

Reviewing files that changed from the base of the PR and between 3c7f463 and 7a6e177.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • CLAUDE.md
  • Makefile
  • e2e/cluster/stuck_deletion.go
  • go.mod
  • hack/extract-schema/main.go
  • pkg/helper/adapter.go
  • test-design/README.md
  • test-design/testcases/delete-cluster.md
  • test-design/testcases/update-delete-test-matrix.md
  • testdata/adapter-configs/cl-stuck/adapter-config.yaml
  • testdata/adapter-configs/cl-stuck/adapter-task-config.yaml
  • testdata/adapter-configs/cl-stuck/values.yaml
