OCPBUGS-77839: Fix node degrades due to file and OS update failures#5744
Conversation
|
Skipping CI for Draft Pull Request. |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-2of2 |
|
@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1f3b6c80-18a3-11f1-84c1-ae92b5479e4b-0 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@isabella-janssen: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e0175f40-18a3-11f1-8d17-3e721c3f0d52-0 |
|
@isabella-janssen: This pull request references Jira Issue OCPBUGS-77839, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@isabella-janssen: This pull request references Jira Issue OCPBUGS-77839, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-with-prs periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 openshift/origin#30841 |
|
@isabella-janssen: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info. |
|
/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 openshift/origin#30841 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f4504140-18ba-11f1-8b71-b09bebb82348-0 |
|
/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 openshift/origin#30841 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7cc698d0-18c5-11f1-8a6e-27135da56f0c-0 |
|
/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 openshift/origin#30841 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5cb34500-18d1-11f1-8d75-08a9c4281b29-0 |
|
/verified by payloads See #5744 (comment) |
|
@isabella-janssen: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
| // When ImageModeStatusReporting is enabled, update the `MachineConfigNodeUpdateFiles` condition to report the experienced error | ||
| if imageModeStatusReportingEnabled { | ||
| err = upgrademonitor.GenerateAndApplyMachineConfigNodes( | ||
| mcnErr := upgrademonitor.GenerateAndApplyMachineConfigNodes( |
There was a problem hiding this comment.
This introduces some changes I'm not sure we want. So, do we really want to ignore errors of GenerateAndApplyMachineConfigNodes? If so, ok, this is fine, but, if we want to fail the call to updateFiles and properly report to the caller without loosing the original error I'd say it's better to:
if mcnErr != nil {
err = errors.Join(err, fmt.Errorf("eError making MCN for failed file apply: %v", err))
}There was a problem hiding this comment.
Unless I'm misunderstanding your concern here, the change here should better match the existing MCN failure case. Here's an existing similar situation:
machine-config-operator/pkg/daemon/update.go
Lines 994 to 1013 in 9050f6f
The example is logging the MCN apply failure (not treating it as a fatal error) then returning only the reconcilableError that was triggering the need for the MCN update. This new change should match that pattern.
There was a problem hiding this comment.
ah, ok, so we area already just printing those errors, ok, I'm fine in that case.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: isabella-janssen, pablintino The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test unit |
|
@isabella-janssen: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
d5dfdcf
into
openshift:main
|
@isabella-janssen: Jira Issue OCPBUGS-77839: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged:
All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-77839 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in accepted release 4.22.0-0.nightly-2026-03-10-100251 |
Closes: OCPBUGS-77839
- What I did
This updates the
errvariable assignment so it is not inaccurately reset on file and OS update failures.- How to verify it
Manually:
In a tech preview cluster, MCPs should properly degrade when an upgrade fails on a file or OS update.
With payloads:
The
Should properly report MCN conditions on node degradetest should pass when running the MCO's tech preview disruptive suite.- Description for the changelog
OCPBUGS-77839: Fix node degrades due to file and OS update failures