OCPBUGS-85237: Manually uncordoned nodes are not automatically re-cordoned#6028
djoshy wants to merge 1 commit into
@djoshy: This pull request references Jira Issue OCPBUGS-85237, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
Walkthrough: The drain controller now detects and handles the case where a node is externally uncordoned.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: djoshy
@djoshy: This pull request references Jira Issue OCPBUGS-85237, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
/cherry-pick release-4.22
@djoshy: all tests passed!
- What I did
  - Updated the `handleNodeEvent` filter to queue nodes whose cordon state has changed, not just nodes whose drain annotations have changed.
  - `syncNode` now ensures that the node is still cordoned at the end of a drain, so that we do not apply the completion annotation for a node that was externally uncordoned.

- How to verify it
The bug description has reproduction steps, but very simply: apply an MC change to a pool, and uncordon the node via `oc` while it is being drained.
The MCC won't immediately cordon the node, because it is currently being processed in the drain queue. Once the drain completes, it will check whether the node was uncordoned for some reason and, if so, skip applying the completion annotation.
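The post-drain check described here could look roughly like the sketch below. It is not the PR's code: `Node` is a hypothetical stand-in for the Kubernetes node object, this `syncNode` is a simplified model that takes the drain step as a function, and the two annotation keys and the `drain-example` value are assumptions made for illustration.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for the Kubernetes Node object.
type Node struct {
	Name          string
	Unschedulable bool // true while the node is cordoned
	Annotations   map[string]string
}

// Assumed annotation keys, for illustration only.
const (
	desiredDrainKey     = "machineconfiguration.openshift.io/desiredDrain"
	lastAppliedDrainKey = "machineconfiguration.openshift.io/lastAppliedDrain"
)

// syncNode is a simplified model of one pass through the drain queue.
// It reports whether the completion annotation was applied.
func syncNode(n *Node, drain func(*Node) error) (bool, error) {
	desired := n.Annotations[desiredDrainKey]
	if n.Annotations[lastAppliedDrainKey] == desired {
		return true, nil // nothing left to do
	}

	// Cordon first; this also re-cordons a node that was uncordoned earlier.
	n.Unschedulable = true

	if err := drain(n); err != nil {
		return false, err // failed drain: the caller queues the node again
	}

	// The node may have been uncordoned externally while the drain ran.
	// In that case, skip the completion annotation so a later sync retries.
	if !n.Unschedulable {
		return false, nil
	}

	n.Annotations[lastAppliedDrainKey] = desired
	return true, nil
}

func main() {
	n := &Node{
		Name:        "worker-0",
		Annotations: map[string]string{desiredDrainKey: "drain-example"},
	}

	// First pass: something uncordons the node in the middle of the drain.
	done, _ := syncNode(n, func(n *Node) error { n.Unschedulable = false; return nil })
	fmt.Println("completed:", done, "cordoned:", n.Unschedulable)

	// Second pass: the node is cordoned again and the drain completes.
	done, _ = syncNode(n, func(*Node) error { return nil })
	fmt.Println("completed:", done, "cordoned:", n.Unschedulable)
}
```

In this model the first pass leaves the completion annotation unset, and the second pass both re-cordons the node and records completion.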
OTOH, if the drain fails due to a timeout or a PDB, we revert to our old fix path, where the node is automatically queued for another drain event.
In both cases, the node will be cordoned again if it is not already.
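The re-cordon depends on the node being queued again when its cordon state changes, which is what the `handleNodeEvent` filter change under "What I did" is for. A minimal sketch of such an update filter follows. The `Node` type, the `shouldEnqueue` helper, and the annotation keys are hypothetical names chosen for this example, not the PR's actual code.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for the Kubernetes Node object.
type Node struct {
	Unschedulable bool // true while the node is cordoned
	Annotations   map[string]string
}

// Assumed annotation keys, for illustration only.
const (
	desiredDrainKey     = "machineconfiguration.openshift.io/desiredDrain"
	lastAppliedDrainKey = "machineconfiguration.openshift.io/lastAppliedDrain"
)

// shouldEnqueue reports whether an update event should put the node on the
// drain queue: either the cordon state changed or a drain annotation changed.
func shouldEnqueue(oldNode, newNode *Node) bool {
	if oldNode.Unschedulable != newNode.Unschedulable {
		return true // cordon state changed, e.g. an external uncordon
	}
	for _, key := range []string{desiredDrainKey, lastAppliedDrainKey} {
		if oldNode.Annotations[key] != newNode.Annotations[key] {
			return true // a drain annotation changed
		}
	}
	return false
}

func main() {
	cordoned := &Node{Unschedulable: true, Annotations: map[string]string{desiredDrainKey: "drain-example"}}
	uncordoned := &Node{Unschedulable: false, Annotations: map[string]string{desiredDrainKey: "drain-example"}}

	fmt.Println("uncordon event queued:", shouldEnqueue(cordoned, uncordoned))
	fmt.Println("no-op update queued:", shouldEnqueue(cordoned, cordoned))
}
```

Comparing the old and new objects keeps unrelated node updates, such as status heartbeats, off the queue while still catching a manual uncordon.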