OADP-7235: OLMv1 lifecycle tests + OLMv0→OLMv1 migration target (#2160)
weshayutin wants to merge 23 commits into
Conversation
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable. Use the following commands to manage reviews, or the checkboxes below for quick actions:
Walkthrough: Adds OLMv1 end-to-end and migration tests under tests/olmv1/.
Changes: OLMv1 tests, Makefile targets, and manifest tweaks.
Sequence Diagram(s):
sequenceDiagram
participant Test as Test Suite
participant API as Kubernetes API
participant CC as ClusterCatalog
participant CE as ClusterExtension Controller
participant OLM as OLM Resolver
Test->>API: Create Namespace & ServiceAccount
Test->>API: Create ClusterRoleBinding (cluster-admin)
Test->>API: Create ClusterCatalog (from image)
Test->>CC: Poll until Serving=True
Test->>API: Create ClusterExtension (reference catalog)
CE->>OLM: Resolve bundle from catalog
OLM-->>CE: Return bundle manifest
CE->>API: Create operator deployment
Test->>API: Poll ClusterExtension.status (wait Installed=True)
API-->>Test: Installed condition reached
Test->>API: Verify controller-manager pods Running
Test->>API: Verify OADP/Velero CRDs exist
Test->>API: Check no Deprecated conditions
alt Upgrade scenario (if upgradeVersion set)
Test->>API: Patch ClusterExtension to new version
CE->>OLM: Resolve upgraded bundle
OLM-->>CE: Return new bundle manifest
CE->>API: Update operator deployment
Test->>API: Poll for new bundle version & Installed=True
Test->>API: Verify controller-manager Running again
end
Test->>API: Delete ClusterExtension
Test->>API: Wait for deletion
Test->>API: Delete ClusterCatalog
Test->>API: Delete ClusterRoleBinding & ServiceAccount
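Several of the diagram's polling steps ("Poll until Serving=True", "wait Installed=True") reduce to scanning a status.conditions list. A minimal sketch using stand-in types rather than the real unstructured ClusterCatalog/ClusterExtension objects:

```go
package main

import "fmt"

// Condition is a stand-in for the metav1.Condition entries found under
// status.conditions on ClusterCatalog and ClusterExtension objects.
type Condition struct {
	Type   string // e.g. "Serving", "Installed"
	Status string // "True", "False", "Unknown"
}

// hasCondition reports whether the given condition type is present with
// Status == "True"; the polling loops in the diagram repeat this check
// until it returns true or a timeout expires.
func hasCondition(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	conds := []Condition{
		{Type: "Serving", Status: "True"},
		{Type: "Progressing", Status: "False"},
	}
	fmt.Println(hasCondition(conds, "Serving"))   // true
	fmt.Println(hasCondition(conds, "Installed")) // false
}
```

The real tests wrap a check like this in gomega.Eventually with a timeout, rather than looping manually.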
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 9 passed | ❌ 3 failed (warnings)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: weshayutin. The full list of commands accepted by this bot can be found here. Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing the approval command in a comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/manager/kustomization.yaml`:
- Around line 7-8: The checked-in kustomization currently pins newName/newTag to
the ephemeral ttl.sh image (ttl.sh/oadp-operator-7e53a850:1h); replace this with
a stable, non-expiring default (e.g., the official oadp-operator image and a
permanent tag or digest) or remove the ttl.sh newName/newTag entries so the
repository manifest does not reference an expiring image; ensure any temporary
ttl.sh usage is moved into CI/deploy workflows that inject the test image
dynamically rather than committing it to kustomization.yaml.
In `@Makefile`:
- Around line 1052-1057: The cleanup target test-olmv1-cleanup unconditionally
deletes ClusterCatalog $(OLMV1_CATALOG); guard that deletion so we only remove a
catalog the tests created by checking the same creation condition or a creation
marker. Modify test-olmv1-cleanup to only run the $(OC_CLI) delete
clustercatalog $(OLMV1_CATALOG) line when OLMV1_CATALOG_IMAGE is set (or when a
persisted marker like OLMV1_CATALOG_CREATED file/env var exists), and update the
Catalog creation step (the rule that creates the catalog) to set that marker
(e.g., touch a file or export a flag) so cleanup can safely detect it before
deleting.
In `@tests/olmv1/olmv1_install_test.go`:
- Around line 107-122: The current gomega.Eventually loop only checks
pod.Status.Phase == corev1.PodRunning which can false-pass; update the check in
the Eventually closure (the block using kubeClient.CoreV1().Pods(...).List and
iterating pods.Items) to verify readiness instead: for each pod, inspect
pod.Status.Conditions for condition.Type == corev1.PodReady with Status ==
corev1.ConditionTrue (or alternatively fetch the Deployment via
kubeClient.AppsV1().Deployments(namespace).Get and assert the Deployment status
has an Available condition == True / status.AvailableReplicas > 0); apply the
same change to the other occurrence mentioned (lines 213-226) so tests assert
PodReady or Deployment Available rather than just PodRunning.
- Around line 167-181: The current code reads the ClusterExtension via
getClusterExtension and then calls
dynamicClient.Resource(clusterExtensionGVR).Update after changing catalogSpec
(using unstructuredNestedMap/unstructuredSetNestedMap), which can race
controller status updates and yield 409s; instead patch only the
spec.source.catalog fields (or wrap the update in retry.RetryOnConflict) rather
than updating the whole object: construct a minimal merge patch containing
spec.source.catalog.version and spec.source.catalog.upgradeConstraintPolicy and
call dynamicClient.Resource(clusterExtensionGVR).Patch with types.MergePatchType
(add import "k8s.io/apimachinery/pkg/types"), or if you prefer keep Update, wrap
the read/modify/write in retry.RetryOnConflict to retry on conflicts. Ensure
references to getClusterExtension, unstructuredNestedMap,
unstructuredSetNestedMap, and dynamicClient.Resource(clusterExtensionGVR) are
updated accordingly.
In `@tests/olmv1/olmv1_suite_test.go`:
- Around line 110-126: The ClusterRoleBinding name in ensureClusterAdminBinding
only uses saName so it can collide across namespaces; change the naming or
reconcile existing bindings: either make bindingName include the namespace
(e.g., bindingName := saName + "-" + ns + "-cluster-admin") so it's unique per
namespace, or when Create returns AlreadyExists call
kubeClient.RbacV1().ClusterRoleBindings().Get to load the existing
ClusterRoleBinding and update its Subjects (add or replace the ServiceAccount
subject for {Name: saName, Namespace: ns}) and then call Update to persist the
corrected subjects; implement one of these approaches inside
ensureClusterAdminBinding.
- Around line 270-295: The cleanupOrphanedCRDs function is destructive on shared
clusters; change it to be gated behind an explicit opt-in (e.g., a test flag or
env var like TEST_DELETE_ORPHAN_CRDS) or a deterministic dedicated-cluster check
before calling dynamicClient.Resource(crdGVR).Delete, and after issuing Delete
for each CRD found by cleanupOrphanedCRDs poll/wait (using
dynamicClient.Resource(crdGVR).Get in a loop with backoff and timeout) until
apierrors.IsNotFound confirms the resource is fully removed before counting it
as deleted; ensure you still handle non-NotFound errors via logging and skip
deletion when the opt-in flag is not set.
- Around line 315-323: When Create on dynamicClient.Resource(clusterCatalogGVR)
returns apierrors.IsAlreadyExists(err), fetch the existing ClusterCatalog (using
dynamicClient.Resource(clusterCatalogGVR).Get with the same name and ctx) and
validate its image field against the requested image variable; if they differ,
fail the test (gomega.Expect/return error) or update/replace the catalog to
match the requested image instead of silently reusing it. Ensure the check
references the existing object's image path (e.g., status/spec field used for
image in ClusterCatalog) and only set createdCatalog = true and log "Created
ClusterCatalog" when you actually created or successfully reconciled the
resource to the desired image.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 77bf8ce7-890a-4973-98c9-d65314a9d12c
📒 Files selected for processing (5)
Makefile, config/manager/kustomization.yaml, tests/olmv1/.gitignore, tests/olmv1/olmv1_install_test.go, tests/olmv1/olmv1_suite_test.go
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
♻️ Duplicate comments (1)
Makefile (1)
1053-1055: ⚠️ Potential issue | 🟠 Major — Make ClusterCatalog deletion opt-in.
Line 1055 still deletes $(OLMV1_CATALOG) unconditionally. The suite itself only deletes a catalog after it knows it created it (tests/olmv1/olmv1_suite_test.go sets createdCatalog only on successful create, and tests/olmv1/olmv1_install_test.go checks that flag in AfterAll). Pointing this cleanup target at an existing/shared catalog will remove a resource the tests did not own.
🧹 Safer cleanup sketch:

+OLMV1_DELETE_CATALOG ?= false
+
 test-olmv1-cleanup: login-required ## Cleanup resources created by OLMv1 tests.
 	$(OC_CLI) delete clusterextension oadp-operator --ignore-not-found=true
-	$(OC_CLI) delete clustercatalog $(OLMV1_CATALOG) --ignore-not-found=true
+	@if [ "$(OLMV1_DELETE_CATALOG)" = "true" ]; then \
+		$(OC_CLI) delete clustercatalog $(OLMV1_CATALOG) --ignore-not-found=true; \
+	fi
 	$(OC_CLI) delete clusterrolebinding $(OLMV1_SERVICE_ACCOUNT)-cluster-admin --ignore-not-found=true
 	$(OC_CLI) delete sa $(OLMV1_SERVICE_ACCOUNT) -n $(OLMV1_NAMESPACE) --ignore-not-found=true

If you want parity with the suite's ownership check, persist a creation marker and key catalog deletion off that instead of a plain name match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 1053 - 1055, The Makefile target test-olmv1-cleanup currently unconditionally deletes the ClusterCatalog $(OLMV1_CATALOG); change it to be opt-in by checking a persisted "created" marker (e.g., touch a file when the suite successfully creates the catalog) before running the delete command in the test-olmv1-cleanup target. Specifically, modify the test-olmv1-cleanup target to only run `$(OC_CLI) delete clustercatalog $(OLMV1_CATALOG)` if the marker file exists, and ensure tests that create the catalog (tests/olmv1/olmv1_suite_test.go and olmv1_install_test.go flow) write/remove that marker so ownership is respected instead of deleting a shared catalog by name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@Makefile`:
- Around line 1053-1055: The Makefile target test-olmv1-cleanup currently
unconditionally deletes the ClusterCatalog $(OLMV1_CATALOG); change it to be
opt-in by checking a persisted "created" marker (e.g., touch a file when the
suite successfully creates the catalog) before running the delete command in the
test-olmv1-cleanup target. Specifically, modify the test-olmv1-cleanup target to
only run `$(OC_CLI) delete clustercatalog $(OLMV1_CATALOG)` if the marker file
exists, and ensure tests that create the catalog
(tests/olmv1/olmv1_suite_test.go and olmv1_install_test.go flow) write/remove
that marker so ownership is respected instead of deleting a shared catalog by
name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b65495fa-dc13-49e1-a8c0-13056be93809
📒 Files selected for processing (1)
Makefile
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
The issue — "Test OLMv0 to OLMv1 upgrade path/migration" — covers the migration work in this PR. It's assigned to you and backlinked to this PR.
This comment has been minimized.
This comment has been minimized.
"Add Makefile target to upgrade OLMv0 install to OLMv1 install" covers the Makefile migration target in this PR. It's assigned to you and backlinked to this PR.
@kaovilai taking this over :)
…anifest generation, and implement migration tests Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
🧹 Nitpick comments (1)
Makefile (1)
1131-1132: 💤 Low value — xargs -r is GNU-specific and may fail on macOS.
The -r flag (don't run if stdin is empty) is a GNU extension not available in BSD xargs. On macOS, this will error unless GNU coreutils is installed.
Possible fix using a shell conditional:

-	-$(OC_CLI) get crd -o name 2>/dev/null | grep -E '\.oadp\.openshift\.io|\.velero\.io' | \
-		xargs -r $(OC_CLI) delete --ignore-not-found=true || true
+	-CRDS=$$($(OC_CLI) get crd -o name 2>/dev/null | grep -E '\.oadp\.openshift\.io|\.velero\.io'); \
+		if [ -n "$$CRDS" ]; then echo "$$CRDS" | xargs $(OC_CLI) delete --ignore-not-found=true; fi || true

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 1131 - 1132, The Makefile line uses GNU-only xargs -r which breaks on macOS; change the command to guard against empty stdin instead of relying on -r: capture the output of "$(OC_CLI) get crd -o name 2>/dev/null | grep -E '\.oadp\.openshift\.io|\.velero\.io'" into a variable or test it, and only pipe to "xargs $(OC_CLI) delete --ignore-not-found=true" when non-empty; update the invocation that currently contains "xargs -r" and the surrounding "$(OC_CLI) get crd -o name" pipeline accordingly so the delete step is skipped on empty input in a portable POSIX-compatible way.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@Makefile`:
- Around line 1131-1132: The Makefile line uses GNU-only xargs -r which breaks
on macOS; change the command to guard against empty stdin instead of relying on
-r: capture the output of "$(OC_CLI) get crd -o name 2>/dev/null | grep -E
'\.oadp\.openshift\.io|\.velero\.io'" into a variable or test it, and only pipe
to "xargs $(OC_CLI) delete --ignore-not-found=true" when non-empty; update the
invocation that currently contains "xargs -r" and the surrounding "$(OC_CLI) get
crd -o name" pipeline accordingly so the delete step is skipped on empty input
in a portable POSIX-compatible way.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9043a314-cd22-4f4c-ab9c-d117aa57e4de
📒 Files selected for processing (5)
Makefile, bundle/manifests/oadp-operator.clusterserviceversion.yaml, config/manifests/bases/oadp-operator.clusterserviceversion.yaml, tests/olmv1/olmv1_migrate_test.go, tests/olmv1/olmv1_suite_test.go
OLMv1 cannot adopt resources created by OLMv0. Add cleanup step that deletes all olm.managed=true labeled resources (ServiceAccounts, Roles, RoleBindings, Deployments, ClusterRoles, ClusterRoleBindings) before ClusterExtension install. Also add CatalogSource image detection to migrate custom FBC catalogs to ClusterCatalog, and update Makefile migration target with matching phases. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
The upgrade-v0-to-olmv1 Makefile target was missing the OLMv0 remnant cleanup step. OLMv1 refuses to install when it finds pre-existing resources labeled olm.managed=true (ServiceAccounts, Roles, ClusterRoles, etc). Add Phase 3b to delete these before ClusterExtension creation. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 1141-1142: The Makefile currently runs $(OC_CLI) get crd ... |
xargs ... delete which removes OADP/Velero CRDs cluster-wide and thus destroys
all CR instances; change this to avoid deleting CRDs: either (A) remove the CRD
deletion entirely and only delete CR instances by listing and deleting
namespaced CRs (use $(OC_CLI) get <cr-kind> -n $(TARGET_NAMESPACE) ... and
delete those), or (B) gate the CRD deletion behind an explicit opt-in variable
(e.g. SKIP_CRD_DELETE or CONFIRM_DELETE_CRDS) and only run the existing
$(OC_CLI) delete command when that variable is set; target the unique symbols
$(OC_CLI), get crd, and the xargs ... delete pipeline to implement the safe
alternative.
In `@tests/olmv1/olmv1_migrate_test.go`:
- Around line 118-129: The cleanup currently swallows errors when listing or
deleting CatalogSources (calls using
dynamicClient.Resource(catalogSourceGVR).Namespace("openshift-marketplace").List/Delete
and the loop that uses isDefaultCatalogSource), causing flaky tests; change the
logic to assert on errors instead of ignoring them: check and fail the test
(using gomega/Expect or ginkgo.Fail) if the List returns an error and for each
Delete capture its error and assert it succeeded (or retry/collect and fail
after loop), and include the resource name in failure messages so failures are
deterministic and debuggable.
- Around line 82-85: The Eventually checks call
dynamicClient.Resource(subscriptionGVR).Namespace(namespace).List(...) and
ignore the returned error, which can cause nil derefs; update the lambda used in
gomega.Eventually (the anonymous func passed to gomega.Eventually at both
occurrences) to handle the error from List — if err != nil return a sentinel
value (e.g., -1) or otherwise surface the error so the Eventually assertion
won’t access list.Items on a nil list; replace the unconditional return
len(list.Items) in the lambda with logic that checks err and only returns
len(list.Items) when err == nil.
- Around line 170-183: The current deletion loops iterate all
ClusterRoles/ClusterRoleBindings matched by olmSelector and may remove non-OADP
resources; modify the logic in the blocks using
kubeClient.RbacV1().ClusterRoles().List / ClusterRoleBindings().List (variables
crs and crbs) to only delete items that are known OADP remnants by applying an
additional safety filter (e.g., check cr.Name or crb.Name for OADP-specific
prefixes like "oadp", "velero", or the OADP operator name, or check for a
specific label/annotation that identifies OADP resources such as "app=oadp" or
an OADP owner annotation) and skip/log all others instead of deleting them
unconditionally.
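The extra safety filter asked for above could be a simple name-prefix check; the prefix list below is illustrative and would need to be pinned to the operator's actual resource names:

```go
package main

import (
	"fmt"
	"strings"
)

// isOADPRemnant applies the additional safety filter the review asks
// for: only names carrying an OADP-specific prefix are eligible for
// deletion; everything else matched by the olm.managed selector is
// skipped. The prefixes here are assumptions for illustration.
func isOADPRemnant(name string) bool {
	for _, prefix := range []string{"oadp", "velero", "openshift-adp"} {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{
		"oadp-operator-role",
		"velero-clusterrolebinding",
		"cluster-monitoring-view", // unrelated; must be skipped
	} {
		fmt.Printf("%s -> delete=%v\n", name, isOADPRemnant(name))
	}
}
```

The deletion loops over crs and crbs would then skip (and log) anything this predicate rejects instead of deleting it unconditionally.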
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ec3a3efd-e2fd-4ef1-9512-a5174ce744ab
📒 Files selected for processing (2)
Makefile, tests/olmv1/olmv1_migrate_test.go
…ion test - Fix CatalogSource cleanup to scan both openshift-marketplace and operator namespace (CI places CatalogSource in operator namespace via optional-operators-subscribe) - Add OLMv0 CSV version capture before migration and verify same version installed after OLMv1 ClusterExtension install - Verify installed bundle came from expected catalog, not community default - Pin ClusterExtension to specific ClusterCatalog via selector.matchLabels - Build fresh operator/bundle/catalog images in upgrade-v0-to-olmv1 to avoid expired ttl.sh images - Add verbose diagnostics for ClusterCatalog failures (image ref, catalogd logs) - Append test-upgrade-v0-to-olmv1 to test-e2e target for Prow presubmit - Add usage comments for local and CI/Prow workflows Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
waitForClusterCatalogServing silently returned false when no conditions were present, making it appear the test was hanging with no output. Add log line so the user sees progress during the 5min wait. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Will update the Makefile to have fewer vars that need overriding.
OLMv1 test targets required manually setting OLMV1_CATALOG_IMAGE which confused developers. Now defaults to ttl.sh like deploy-olm does. - OLMV1_CATALOG_IMAGE defaults to ttl.sh/oadp-operator-catalog-$(GIT_REV) - OLMV1_CHANNEL defaults to $(DEFAULT_CHANNEL) instead of empty - test-olmv1 auto-builds operator+bundle+catalog before running tests - Just run: make test-olmv1 (no manual image vars needed) Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
OLMV1_VERSION was empty by default, requiring manual override. Now defaults to $(VERSION) matching the built bundle version. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Note: Responses generated with Claude.
@weshayutin Fixed both issues:
1. No more empty defaults — all OLMv1 vars now have sensible defaults.
2. Auto-build — make test-olmv1 now builds the operator, bundle, and catalog automatically; no manual image vars needed.
3. Hang logging — waitForClusterCatalogServing now logs progress during the wait.
Print all resolved variable values at the start of test-olmv1 so developers can see what defaults are being used without guessing. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Migrate all 15 oadp-operator variant CI configs from the deprecated index image path to the recommended operator-sdk bundle workflow. Changes per config: - operator.bundles: add `as: oadp-operator-bundle` + `skip_building_index: true` - dependencies: OO_INDEX: ci-index → OO_BUNDLE: oadp-operator-bundle - workflow: optional-operators-ci-aws → optional-operators-ci-operator-sdk-aws - env: replace OO_CHANNEL/OO_PACKAGE/OO_TARGET_NAMESPACES with OO_INSTALL_MODE: OwnNamespace The deprecated index image building will be removed from ci-operator soon. See: https://docs.ci.openshift.org/how-tos/testing-operator-sdk-operators/#building-an-index-deprecated Unblocks: openshift/oadp-operator#2160 Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Skip tests gracefully when OLMv1 CRDs (ClusterExtension, ClusterCatalog) are not present on the cluster, instead of failing with confusing API errors. This handles clusters running OCP < 4.20 or without OLMv1. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Note: Responses generated with Claude.
@weshayutin The error comes from the NewOLMOwnSingleNamespace feature gate not being enabled.
Why: OADP declares OwnNamespace install mode, which OLMv1 only accepts behind that gate.
Feature gate status per OCP version: checked all release branches in openshift/api features.go. Not promoted to Default on any branch (including 5.0, 5.1, master). TechPreview required everywhere.
Valid test clusters, per the OCPSTRAT-2268 design doc, OPRUN-4131, and openshift/api release branches:

# Recommended: 4.21+ nightly with TechPreview
clusterbot launch 4.21.0-0.nightly aws,techpreview

The gate exists from 4.19, but 4.21 is the recommended minimum per the design doc.
Enabling TechPreview on an existing cluster: TechPreview can be enabled post-install (a runtime change; operator-controller restarts with new flags). Warning: this is irreversible and blocks upgrades.

oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
# Wait for nodes to reboot and operator-controller to restart

How it works under the hood: this is a runtime behavior change in operator-controller, not a CRD or schema change.
GA tracking: OPRUN-4131 / OCPSTRAT-1982. Once GA, the gate moves to Default and TechPreview is no longer required.
Code fix: tests now check the cluster's FeatureSet and skip with an actionable message.
Refs: openshift/api features.go · cluster-olm-operator gate mapping · operator-controller OwnNamespace PR
OwnNamespace install mode requires TechPreviewNoUpgrade feature set on OCP 4.21+. Without it, OLMv1 rejects bundles that don't declare AllNamespaces support. Now checks the cluster's FeatureSet and skips with actionable message including clusterbot launch command. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@weshayutin: This pull request references OADP-7235 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Migrate all oadp-operator variant CI configs from deprecated index image path to operator-sdk bundle workflow. Also removes EOL oadp-1.0 variant configs and enables TechPreview for OCP 4.22+ testing. Changes per config: - operator.bundles: add as: oadp-operator-bundle + skip_building_index - base_images: add cli-operator-sdk (not in OCP release payload) - dependencies: OO_INDEX: ci-index -> OO_BUNDLE: oadp-operator-bundle - workflow: optional-operators-ci-aws -> optional-operators-ci-operator-sdk-aws - env: replace OO_CHANNEL/OO_PACKAGE/OO_TARGET_NAMESPACES with OO_INSTALL_MODE: OwnNamespace TechPreviewNoUpgrade (OCP 4.22+ configs only): - Enables NewOLMOwnSingleNamespace gate for OLMv1 OwnNamespace support - Enables VolumeGroupSnapshot-based backup/restore testing Unblocks: openshift/oadp-operator#2160 Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
NewOLMOwnSingleNamespace can be enabled via either TechPreviewNoUpgrade (all TechPreview gates) or CustomNoUpgrade (just this single gate). Both block upgrades. Update skip message, code comments, and fallback detection to document and support both options. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
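The two enablement paths described in that commit reduce to a small predicate; the types below are stand-ins for the real FeatureGate resource fields, not the openshift/api structs:

```go
package main

import "fmt"

// FeatureGateSpec is a stand-in for the relevant fields of the OpenShift
// FeatureGate cluster resource (spec.featureSet and
// spec.customNoUpgrade.enabled).
type FeatureGateSpec struct {
	FeatureSet    string   // "", "TechPreviewNoUpgrade", "CustomNoUpgrade"
	CustomEnabled []string // gates listed when FeatureSet is CustomNoUpgrade
}

// ownSingleNamespaceEnabled mirrors the fallback detection the commit
// describes: TechPreviewNoUpgrade enables every TechPreview gate, while
// CustomNoUpgrade enables only the gates listed explicitly.
func ownSingleNamespaceEnabled(fg FeatureGateSpec) bool {
	switch fg.FeatureSet {
	case "TechPreviewNoUpgrade":
		return true
	case "CustomNoUpgrade":
		for _, g := range fg.CustomEnabled {
			if g == "NewOLMOwnSingleNamespace" {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(ownSingleNamespaceEnabled(FeatureGateSpec{FeatureSet: "TechPreviewNoUpgrade"}))
	fmt.Println(ownSingleNamespaceEnabled(FeatureGateSpec{
		FeatureSet:    "CustomNoUpgrade",
		CustomEnabled: []string{"NewOLMOwnSingleNamespace"},
	}))
	fmt.Println(ownSingleNamespaceEnabled(FeatureGateSpec{}))
}
```

A skip message built on this check can then tell the developer exactly which of the two options to use.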
@weshayutin: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Gate check is working.
_ = deleteClusterExtension(ctx, packageName)

ginkgo.By("Creating the ClusterExtension")
ce := buildClusterExtension(packageName, packageName, namespace, serviceAccountName)
In the fresh install test (olmv1_install_test.go), the ClusterExtension is created without a catalog selector when catalogImage is empty:
ce := buildClusterExtension(packageName, packageName, namespace, serviceAccountName)
But in buildClusterExtension, the catalog selector is only added when catalogImage != "":
if catalogImage != "" {
	catalogSpec["selector"] = map[string]interface{}{
		"matchLabels": map[string]interface{}{
			"olm.operatorframework.io/metadata.name": catalogName,
		},
	}
}

On a cluster with default catalogs (like openshift-community-operators), the community OADP package could be resolved instead of the test build. The PR's own design notes call this out explicitly. But the install test doesn't protect against it the way the migration test does (which uses withCatalogSelector).
This might be intentional since test-olmv1 always builds and pushes a catalog image, so catalogImage is always set. But if someone runs the test manually against a productized catalog without setting catalogImage, they could get unexpected resolution.
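A sketch of the implied fix, always pinning the selector; the label key is the one quoted in the snippet above, and making it unconditional is the reviewer's suggestion, not the merged behavior:

```go
package main

import "fmt"

// buildCatalogSpec always pins the ClusterExtension to the named
// ClusterCatalog via the metadata.name label, instead of only doing so
// when a catalog image was supplied (the gap the comment describes).
func buildCatalogSpec(packageName, catalogName string) map[string]interface{} {
	return map[string]interface{}{
		"packageName": packageName,
		"selector": map[string]interface{}{
			"matchLabels": map[string]interface{}{
				"olm.operatorframework.io/metadata.name": catalogName,
			},
		},
	}
}

func main() {
	spec := buildCatalogSpec("oadp-operator", "oadp-test-catalog")
	fmt.Println(spec["selector"] != nil) // true
}
```

With the selector always set, resolution can never silently fall through to a community catalog that happens to carry the same package name.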
- supported: true
  type: OwnNamespace
- supported: false
- supported: true
Enabling SingleNamespace is needed for OLMv1 compatibility, makes sense. One thing to be aware of: this also changes OLMv0 behavior. Previously OLM would reject an OperatorGroup that targets a namespace different from the install namespace. With this change, that configuration is now allowed.
OADP assumes the operator pod runs in the same namespace it watches (Velero deployment, secrets, SCC management all target WATCH_NAMESPACE). If someone creates an OperatorGroup with a divergent targetNamespaces, things would likely break.
Low risk since nobody does that accidentally, and the OLMv1 flow always sets watchNamespace equal to the install namespace. But worth documenting in the CSV or release notes that SingleNamespace is supported only when the target namespace matches the install namespace.
Heads up: found a bug in the … This needs to merge before …

Note: Responses generated with Claude
|
Please approve #2204; then we can retest this PR after the release repo has a replacement PR per #2204.
|
CI infrastructure PR updated: openshift/release#79327 (replaces openshift/release#79152). Now uses the FBC catalog image via …
Overview
Enable OADP users to install and migrate to OLMv1 (`ClusterExtension`-based) management. This PR covers:

- OLMv1 lifecycle tests (`tests/olmv1/`)
- `make test-olmv1` / `make test-olmv1-cleanup` Makefile targets
- `make generate-olmv1-manifest` — generates OLMv1 install manifest per the OCPSTRAT-2268 adoption template
- `make upgrade-v0-to-olmv1` Makefile target for migrating existing OLMv0 installs
- `make test-upgrade-v0-to-olmv1` — Ginkgo migration test with version verification
- `installModes`: enable `SingleNamespace=true` (OADP-4051)
- `test-e2e`

Fixes #2194
Fixes #2193
Prerequisites (per OCPSTRAT-2268 verification)
Before testing, ensure your cluster meets these requirements:
- `NewOLMOwnSingleNamespace` feature gate enabled. Two options (both block upgrades):
  - `clusterbot launch 4.21.0-0.nightly aws,techpreview`
  - `oc patch featuregate cluster --type=json -p '[{"op":"add","path":"/spec/featureSet","value":"CustomNoUpgrade"},{"op":"add","path":"/spec/customNoUpgrade","value":{"enabled":["NewOLMOwnSingleNamespace"]}}]'`

  Without the gate, installs fail with `unsupported bundle: bundle does not support AllNamespaces install mode`.
- `operator-controller` and `catalogd` must be present: `oc get pods -n openshift-operator-controller` and `oc get pods -n openshift-catalogd`

Verification steps (from design doc)
- `NewOLMOwnSingleNamespace` enabled (TechPreviewNoUpgrade or CustomNoUpgrade)

Related Issues & Epics

- `upgrade-v0-to-olmv1` Makefile target

Progress
Fresh Install — `test-olmv1` (OADP-7235)

- `OLMV1_*` Makefile variables with sensible defaults (ttl.sh images, `DEFAULT_CHANNEL`, `VERSION`)
- `make test-olmv1` target — auto-builds operator+bundle+catalog, runs the Ginkgo suite
- `make test-olmv1-cleanup` target — deletes ClusterExtension, ClusterCatalog, SA, CRB
- Creates the `ClusterExtension` (waits for `Installed=True`, fail-fast on `InvalidConfiguration`/`Failed`)
- Verifies controller-manager pods are `Running`
- Verifies deprecation conditions (`Deprecated`, `PackageDeprecated`, `ChannelDeprecated`, `BundleDeprecated` all `False`)
- Upgrade path via `upgradeConstraintPolicy: SelfCertified` (skipped when `OLMV1_UPGRADE_VERSION` unset)
- `cleanupOrphanedCRDs` — removes CRDs left by OLMv0, waits for async deletion to complete
- `ensureClusterCatalog` / `waitForClusterCatalogServing` with image validation and progress logging
- Productized catalog run (`OLMV1_PACKAGE=redhat-oadp-operator`) — PASSED using `redhat-operator-index:v4.21`

OLMv1 Manifest — `generate-olmv1-manifest` (OCPSTRAT-2268)

- `make generate-olmv1-manifest` — generates `oadp-olmv1-manifest.yaml` per the official adoption template
- SA named `…-installer`, CRB `…-installer-binding`
- `OLMV1_CHANNEL` / `OLMV1_VERSION` appended when set
- `OLMV1_PIN_CATALOG` — adds `selector.matchLabels` to pin the ClusterExtension to a specific ClusterCatalog
- Generated manifest added to `.gitignore`

OLMv0 → OLMv1 Migration — `upgrade-v0-to-olmv1` (#2194)

- `make upgrade-v0-to-olmv1` target with 6 phases (build, remove OLMv0, CRDs, remnants, ClusterCatalog, ClusterExtension)
- Deletions use `xargs -r` (run only when there is input to delete)

Migration Test — `test-upgrade-v0-to-olmv1` (#2193)

- Runs `make test-e2e` with exit code preservation

CSV / Bundle (OADP-4051)

- `SingleNamespace: true` in `installModes`

Design Notes
Why `spec.config.inline.watchNamespace` (not an annotation)

OADP's CSV declares the `OwnNamespace` install mode. OLMv1 requires `spec.config.inline.watchNamespace` set to the install namespace — without it the install fails with `InvalidConfiguration`. The metadata annotation `olm.operatorframework.io/watch-namespace` is not read by operator-controller; only the spec field matters.

Why delete CRDs before migrating from OLMv0
OLMv1 takes ownership of CRDs it creates. CRDs already present on the cluster (owned by OLMv0 or manually created) cannot be adopted — the install will proceed but CRD lifecycle management is broken.
`cleanupOrphanedCRDs` handles this by deleting `*.oadp.openshift.io` and `*.velero.io` CRDs before install, then waiting for async deletion to complete.

Why pin ClusterExtension to a specific ClusterCatalog

Without `catalog.selector.matchLabels`, OLMv1 resolves from all available ClusterCatalogs. On clusters with default catalogs (e.g., `openshift-community-operators`), the community OADP package (v0.5.6) may be selected instead of the custom build.

Platform requirement
OCP 4.21+ with the `NewOLMOwnSingleNamespace` feature gate enabled via TechPreviewNoUpgrade or CustomNoUpgrade (both block upgrades). Without this gate, OLMv1 rejects bundles that don't declare `AllNamespaces` support. GA tracking: OPRUN-4131 / OCPSTRAT-1982.

Usage
How to Test (Reviewer Guide)
Prerequisites: OCP 4.21+ cluster with `NewOLMOwnSingleNamespace` enabled (TechPreviewNoUpgrade or CustomNoUpgrade). Standalone (not Hypershift). Logged in via `oc login`.

1. Fresh OLMv1 install
```shell
make test-olmv1
# Verify: all Ginkgo specs pass, controller-manager pod Running, CRDs created
make test-olmv1-cleanup
```

2. OLMv0 → OLMv1 migration
```shell
make deploy-olm
make upgrade-v0-to-olmv1
# Verify: ClusterExtension Installed=True, controller-manager Running
oc get clusterextension oadp-operator
```

3. Migration test with assertions
```shell
make deploy-olm
make test-upgrade-v0-to-olmv1
# Verify: all specs pass, version matches, correct catalog used
```

Common issues:
- `unsupported bundle: bundle does not support AllNamespaces install mode` → enable `NewOLMOwnSingleNamespace` via TechPreviewNoUpgrade or CustomNoUpgrade
- `ttl.sh` images expire after `TTL_DURATION` (default 1h) — `make test-olmv1` auto-builds fresh images

Files Changed
- `Makefile` — `test-e2e`
- `tests/olmv1/olmv1_suite_test.go`
- `tests/olmv1/olmv1_install_test.go`
- `tests/olmv1/olmv1_migrate_test.go`
- `tests/olmv1/.gitignore`
- `.gitignore` — `oadp-olmv1-manifest.yaml`
- `.golangci.yaml` — exclude `tests/olmv1/` from linting
- `bundle/manifests/oadp-operator.clusterserviceversion.yaml` — `SingleNamespace` installMode
- `config/manifests/bases/oadp-operator.clusterserviceversion.yaml` — `SingleNamespace` installMode