-
Notifications
You must be signed in to change notification settings - Fork 102
dataplane: Make MachineConfig CRD dependency optional #1757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dataplane: Make MachineConfig CRD dependency optional #1757
Conversation
|
approach here seems reasonable to me. will let others weigh in |
|
/retest |
abays
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
Deferring to @slagle for final approval |
slagle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, no problems with the change.
I have an inline comment for potential improvement (assuming I'm reading it correctly).
internal/dataplane/inventory.go
Outdated
| helper.GetLogger().Info("Disconnected environment detected but MachineConfig not available. "+ | ||
| "Registry configuration will not be propagated to dataplane nodes. "+ | ||
| "You may need to configure registries.conf manually.", | ||
| "error", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably ok for now, but I think we should file an issue to:
if isDisconnected and <<MachineConfig CRD doesn't exist>>; then
error and fail reconcile
Because I think it's pretty likely that this logged error goes completely unnoticed. Previously we failed hard if there was an error, now we just log and continue.
Should isNoMatchError be used to check if the error was actually a MachineConfig CRD not found error?
1ece03f to
3aef989
Compare
3aef989 to
0d4b629
Compare
| if util.IsNoMatchError(err) { | ||
| helper.GetLogger().Info("Disconnected environment detected but MachineConfig CRD not available. "+ | ||
| "Registry configuration will not be propagated to dataplane nodes. "+ | ||
| "You may need to configure registries.conf manually using ansibleVars "+ | ||
| "(edpm_podman_disconnected_ocp and edpm_podman_registries_conf).", | ||
| "error", err.Error()) | ||
| } else { | ||
| // CRD exists but resource not found, or other errors (network issues, | ||
| // permissions, etc.) - return the error. If MCO is installed but the | ||
| // registry MachineConfig doesn't exist, this indicates a misconfiguration. | ||
| return "", fmt.Errorf("failed to get MachineConfig registry configuration: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slagle iiuc this is what you meant, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, looks good
Problem:
When deploying the openstack-operator on clusters where the OpenShift
Machine Config Operator (MCO) is not installed, the
openstackdataplanenodeset controller fails to start with the error:
ERROR setup problem running manager {"error": "failed to wait for
openstackdataplanenodeset caches to sync: timed out waiting for cache
to be synced for Kind *v1.MachineConfig"}
This occurs because the controller unconditionally sets up a watch for
MachineConfig resources in SetupWithManager(). When the MachineConfig
CRD doesn't exist (e.g., on non-OpenShift Kubernetes clusters or
clusters without MCO), the informer cache sync times out and the
controller fails to start.
Similarly, the disconnected environment detection code
(IsDisconnectedOCP) would fail if ImageContentSourcePolicy or
ImageDigestMirrorSet CRDs don't exist.
Solution:
Implement conditional/dynamic CRD watching for MachineConfig:
1. Remove the MachineConfig watch from SetupWithManager() so the
controller can start without the CRD being present
2. Add ensureMachineConfigWatch() function that:
- Checks if the MachineConfig CRD exists by querying
apiextensions.k8s.io/v1/CustomResourceDefinition
- Dynamically adds the watch using Controller.Watch() if the CRD
exists
- Tracks watched resources to avoid duplicate watch registration
- Logs an informational message if the CRD is not available
3. Call ensureMachineConfigWatch() at the start of each reconciliation
to attempt setting up the watch when the CRD becomes available
4. Update IsDisconnectedOCP() to handle missing ICSP/IDMS CRDs
gracefully instead of returning an error
5. Update inventory generation error handling to distinguish between:
- IsNoMatchError (CRD not installed): Log warning and continue.
This is expected on non-OpenShift clusters or clusters without MCO.
- IsNotFound (CRD exists but resource missing): Return error.
If MCO is installed and a disconnected environment is detected,
the registry MachineConfig should exist. Missing resource indicates
misconfiguration.
This allows the operator to:
- Start successfully even without the MachineConfig CRD
- Work on non-OpenShift Kubernetes clusters
- Gracefully degrade disconnected environment support
- Automatically enable MachineConfig watching if the CRD becomes
available later
- Report actual misconfigurations when MCO is present but registry
MachineConfig is missing
The MachineConfig watch is used for disconnected/mirrored environments
to detect changes to the 99-master-generated-registries MachineConfig
and propagate registry configuration to dataplane nodes. This feature
is now optional rather than required.
Manual registry configuration without MCO:
If the MachineConfig CRD is not available but you need to configure
container registries on dataplane nodes, you can set the ansible
variables directly in the OpenStackDataPlaneNodeSet spec:
spec:
nodeTemplate:
ansible:
ansibleVars:
edpm_podman_disconnected_ocp: true
edpm_podman_registries_conf: |
unqualified-search-registries = ["registry.access.redhat.com"]
[[registry]]
prefix = ""
location = "quay.io/openstack-k8s-operators"
[[registry.mirror]]
location = "my-registry.example.com/openstack-k8s-operators"
Fixes: cache sync timeout when MachineConfig CRD is not installed
Related: OSPRH-24026
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
Add comprehensive unit tests for the disconnected environment detection and registry configuration retrieval functions in the util package. Tests cover: - IsNoMatchError: Detects both 'no matches for kind' (real API server) and 'no kind is registered' (fake client) errors - IsDisconnectedOCP: Tests with ICSP, IDMS, both, empty lists, and missing CRDs (graceful degradation) - GetMCRegistryConf: Tests successful retrieval, missing MachineConfig, missing CRD, invalid ignition format, missing base64 prefix, and invalid base64 content - Full scenario tests for disconnected environments and non-OpenShift cluster graceful degradation The IsNoMatchError function is updated to also detect 'no kind is registered' errors which occur in functional tests where the fake client behaves differently than a real Kubernetes API server. The MachineConfigNotFound test verifies that IsNotFound errors are NOT treated as IsNoMatchError - when the CRD exists but the resource doesn't, this should be an error (not a warning), indicating misconfiguration.
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/42ded77318af4a1db77f49124441f538 ❌ openstack-k8s-operators-content-provider FAILURE in 10m 00s |
0d4b629 to
0d0d158
Compare
|
/retest |
| if util.IsNoMatchError(err) { | ||
| helper.GetLogger().Info("Disconnected environment detected but MachineConfig CRD not available. "+ | ||
| "Registry configuration will not be propagated to dataplane nodes. "+ | ||
| "You may need to configure registries.conf manually using ansibleVars "+ | ||
| "(edpm_podman_disconnected_ocp and edpm_podman_registries_conf).", | ||
| "error", err.Error()) | ||
| } else { | ||
| // CRD exists but resource not found, or other errors (network issues, | ||
| // permissions, etc.) - return the error. If MCO is installed but the | ||
| // registry MachineConfig doesn't exist, this indicates a misconfiguration. | ||
| return "", fmt.Errorf("failed to get MachineConfig registry configuration: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, looks good
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: slagle, stuggi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c92ad83
into
openstack-k8s-operators:main
Problem:
When deploying the openstack-operator on clusters where the OpenShift Machine Config Operator (MCO) is not installed, the openstackdataplanenodeset controller fails to start with the error:
ERROR setup problem running manager {"error": "failed to wait for
openstackdataplanenodeset caches to sync: timed out waiting for cache
to be synced for Kind *v1.MachineConfig"}
This occurs because the controller unconditionally sets up a watch for MachineConfig resources in SetupWithManager(). When the MachineConfig CRD doesn't exist (e.g., on non-OpenShift Kubernetes clusters or clusters without MCO), the informer cache sync times out and the controller fails to start.
Similarly, the disconnected environment detection code (IsDisconnectedOCP) would fail if ImageContentSourcePolicy or ImageDigestMirrorSet CRDs don't exist.
Solution:
Implement conditional/dynamic CRD watching for MachineConfig:
Remove the MachineConfig watch from SetupWithManager() so the controller can start without the CRD being present
Add ensureMachineConfigWatch() function that:
Call ensureMachineConfigWatch() at the start of each reconciliation to attempt setting up the watch when the CRD becomes available
Update IsDisconnectedOCP() to handle missing ICSP/IDMS CRDs gracefully instead of returning an error
Update inventory generation to log a warning instead of failing when MachineConfig is not available in a disconnected environment
This allows the operator to:
The MachineConfig watch is used for disconnected/mirrored environments to detect changes to the 99-master-generated-registries MachineConfig and propagate registry configuration to dataplane nodes. This feature is now optional rather than required.
Manual registry configuration without MCO:
If the MachineConfig CRD is not available but you need to configure container registries on dataplane nodes, you can set the ansible variables directly in the OpenStackDataPlaneNodeSet spec:
Fixes: cache sync timeout when MachineConfig CRD is not installed
Related: OSPRH-24026