
Add helm chart #163

Open

honghainguyen777 wants to merge 15 commits into kubernetes-sigs:main from honghainguyen777:add-helm-chart

Conversation

@honghainguyen777 commented Mar 12, 2026

Description

Add a Helm chart for deploying the node-readiness-controller.

This chart installs the node-readiness-controller along with the required
Kubernetes resources, including:

  • Deployment for the controller
  • RBAC resources
  • Metrics and webhook services
  • ValidatingWebhookConfiguration
  • NodeReadinessRule CRD
  • Optional NodeReadinessRule resources configurable via values.yaml

The chart follows conventions used by existing charts in https://github.com/kubernetes-sigs
(e.g. descheduler) and supports customization via Helm values.

Relationship to #128

This PR overlaps with #128 (feat: provision helm chart). I missed that existing PR when opening this one. Depending on maintainer preference, this PR can either supersede #128 or be reconciled with it so we land a single Helm chart implementation.

Related Issue

None

Type of Change

/kind feature

Testing

The chart was tested locally using Helm:

```shell
helm lint charts/nrr-controller
helm template charts/nrr-controller
helm install nrr charts/nrr-controller --namespace node-readiness-controller --create-namespace
```

Verified:

  • Controller deployment starts successfully
  • CRD is installed
  • Optional NodeReadinessRule resources render correctly when defined in values.yaml
  • Chart renders cleanly with helm template

Checklist

  • make test passes
  • make lint passes
  • make lint-chart passes
  • make build-helm passes
  • make ct-helm passes

Does this PR introduce a user-facing change?

Adds a Helm chart for deploying the node-readiness-controller, including
CRD installation and optional NodeReadinessRule resources configurable
via Helm values.

Doc #(issue)

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 12, 2026
linux-foundation-easycla Bot commented Mar 12, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Mar 12, 2026
@k8s-ci-robot (Contributor):

Welcome @honghainguyen777!

It looks like this is your first PR to kubernetes-sigs/node-readiness-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/node-readiness-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 12, 2026
@k8s-ci-robot (Contributor):

Hi @honghainguyen777. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify Bot commented Mar 12, 2026

Deploy Preview for node-readiness-controller ready!

🔨 Latest commit: 68e3085
🔍 Latest deploy log: https://app.netlify.com/projects/node-readiness-controller/deploys/6a01ef37d2f49b000851d66b
😎 Deploy Preview: https://deploy-preview-163--node-readiness-controller.netlify.app

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 12, 2026
@sahitya-chandra (Contributor) left a comment:

Thanks for working on this chart. I tested the PR locally from the fetched review-pr-163 branch.

Commands/results:

  • helm lint charts/nrr-controller passed
  • helm template test charts/nrr-controller --namespace node-readiness-system rendered successfully
  • helm template test charts/nrr-controller --namespace node-readiness-system --set webhook.enabled=true --set validatingWebhook.enabled=true --set certManager.enabled=true rendered successfully
  • helm package ./charts/nrr-controller --dependency-update --destination /tmp/nrr-chart-package passed
  • Installed into a 3-node kind cluster using the PR’s make kind-multi-node
  • helm install nrr-test charts/nrr-controller --namespace node-readiness-system --create-namespace --wait --timeout 2m succeeded
  • Controller pod became Ready, the CRD was installed, and a sample NodeReadinessRule created through chart values reconciled successfully across all 3 kind nodes

I also installed the same Helm unittest plugin version used by the workflow and ran:

```shell
helm unittest charts/nrr-controller --strict
```

That currently fails with 4 failing tests (screenshots attached): the rendered Deployment contains annotations by default, and the plugin reports the failing assertions as `expect ... to be an array`. These assertions likely need to use `equal` or target the parent arrays instead.

Cc @ajaysundark

@honghainguyen777 (Author):

Hi @sahitya-chandra, thanks a lot for the detailed testing and feedback!

I’d actually love to get more contributors involved with the chart so we can maintain it well going forward, so a follow-up fix from you would be more than welcome :)

kaisoz commented May 8, 2026

/assign @ajaysundark

for the final review

kaisoz commented May 8, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2026
@sahitya-chandra (Contributor):

pushing the unittest fix in a bit

@sahitya-chandra (Contributor):

/retest


sahitya-chandra commented May 8, 2026

Oh, I forgot that I don't have permissions to push directly to this branch, so I pushed to my fork instead. @honghainguyen777 can cherry-pick f9b0a6a, or I can raise a PR once this one gets merged.

Passed locally (screenshot attached).

Comment thread hack/verify-chart.sh
@@ -0,0 +1 @@

```shell
${CONTAINER_ENGINE:-docker} run -it --rm --network host --workdir=/data --volume ~/.kube/config:/root/.kube/config:ro --volume $(pwd):/data quay.io/helmpack/chart-testing:v3.7.0 /bin/bash -c "git config --global --add safe.directory /data; ct install --config=.github/ci/ct.yaml --helm-extra-set-args=\"--set=kind=Deployment\""
```
Contributor:
That Prow failure is because the current file starts directly with:

```shell
${CONTAINER_ENGINE:-docker} run ...
```

It needs a shell header like the repo expects:

```shell
#!/usr/bin/env bash

# Copyright The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0
```
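Putting the header and the existing one-liner together, the file would start roughly like this (a sketch; the exact license boilerplate and shell options should match the other scripts in hack/):

```shell
#!/usr/bin/env bash

# Copyright The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

set -o errexit
set -o nounset
set -o pipefail

# Run chart-testing inside a container against the current working tree.
${CONTAINER_ENGINE:-docker} run -it --rm --network host \
  --workdir=/data \
  --volume ~/.kube/config:/root/.kube/config:ro \
  --volume "$(pwd)":/data \
  quay.io/helmpack/chart-testing:v3.7.0 \
  /bin/bash -c "git config --global --add safe.directory /data; ct install --config=.github/ci/ct.yaml --helm-extra-set-args=\"--set=kind=Deployment\""
```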

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: honghainguyen777
Once this PR has been reviewed and has the lgtm label, please ask for approval from ajaysundark. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@honghainguyen777 (Author):

@sahitya-chandra I have cherry-picked your commit. Thank you very much! <3

@honghainguyen777 (Author):

Tests passed (screenshot attached).

@sahitya-chandra could you take a second look before @ajaysundark does the final review?

@sahitya-chandra (Contributor):

Re-reviewed; no blocking issues from my side, looks good.

kaisoz commented May 8, 2026

/lgtm

since @sahitya-chandra, who is more familiar with the project than me, has reviewed it twice

@@ -0,0 +1,28 @@

```yaml
{{- if and .Values.webhook.enabled .Values.validatingWebhook.enabled }}
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
```
Contributor:

When I tried this chart, the webhook service was unreachable by the API server. Did you get a chance to test with the webhook?

Author:

Good catch. The issue was that the ValidatingWebhookConfiguration did not set clientConfig.service.port, so the API server defaulted to 443 while the chart service default was 8443. I was too focused on following the install-full.yaml.

I fixed this by aligning the chart default with upstream Kustomize (443 -> 9443) and explicitly wiring .Values.webhook.service.port into the ValidatingWebhookConfiguration.
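A sketch of that wiring (the Service name helper and values paths here are illustrative assumptions, not necessarily the chart's actual names):

```yaml
clientConfig:
  service:
    name: {{ include "nrr-controller.fullname" . }}-webhook-service
    namespace: {{ .Release.Namespace }}
    # Without an explicit port the API server defaults to 443; wiring
    # the chart value in keeps the webhook config and the Service aligned.
    port: {{ .Values.webhook.service.port }}
```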

Comment thread charts/nrr-controller/values.yaml Outdated

```yaml
affinity: {}

# NOTE: Before enabling these rules, the CRD `nodereadinessrules.readiness.node.x-k8s.io` must already be installed in the cluster
```
Contributor:

I think this will have install time race with controller / webhook. how do we ensure cert-manager has issued certificates or webhook is serving?

Author:

Agreed. Creating NodeReadinessRule resources in the same install that enables the validating webhook can race with cert-manager issuing certificates and the webhook becoming ready.

I updated the chart comments/README to call this out. The safer flow is to install the controller/webhook first, wait until it is ready, and then apply nodeReadinessRules in a follow-up helm upgrade.
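The safer flow above can be sketched as (release name, namespace, and values file are illustrative):

```shell
# Step 1: install controller + webhook without any NodeReadinessRule resources
helm install nrr charts/nrr-controller \
  --namespace node-readiness-system --create-namespace \
  --wait --timeout 2m

# Step 2: once the webhook is serving, add the rules in a follow-up upgrade
helm upgrade nrr charts/nrr-controller \
  --namespace node-readiness-system \
  --values rules-values.yaml
```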

Comment on lines +13 to +15

```yaml
enforcementMode: {{ $rule.enforcementMode | quote }}
nodeSelector:
  {{- toYaml $rule.nodeSelector | nindent 4 }}
```
Contributor:

This is missing some fields, like dryRun etc.

Author:

Fixed. The template now renders the full rule spec except for name, so fields such as dryRun are preserved instead of being individually enumerated. I added helm-unittest coverage for this.
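One way to render the full spec without enumerating fields is to pass the whole rule through minus the chart-only name key, e.g. with Sprig's `omit` (a sketch; the apiVersion and values layout are assumptions, not necessarily what the PR's template does):

```yaml
{{- range .Values.nodeReadinessRules }}
---
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: {{ .name }}
spec:
  # Everything except `name` is copied through verbatim, so fields
  # such as dryRun survive without being listed one by one.
  {{- omit . "name" | toYaml | nindent 2 }}
{{- end }}
```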

```yaml
taint:
  {{- toYaml $rule.taint | nindent 4 }}
enforcementMode: {{ $rule.enforcementMode | quote }}
nodeSelector:
```
Contributor:

what if $rule.nodeSelector was omitted in values?

Author:

Fixed. If nodeSelector is omitted from a rule in values, the template now defaults it to {} so the generated resource satisfies the CRD-required field. I added test coverage for omitted and provided nodeSelector.
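The defaulting can be sketched with Sprig's `default`, assuming the same `$rule` iteration variable as in the quoted snippet:

```yaml
nodeSelector:
  {{- $rule.nodeSelector | default dict | toYaml | nindent 4 }}
```

With nodeSelector omitted in values, this renders `nodeSelector: {}`, which satisfies the CRD-required field.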

Comment thread hack/kind_config.yaml Outdated
@@ -0,0 +1,18 @@
kind: Cluster
Contributor:

is there a reason why existing config/testing/kind/*.configs cannot be leveraged?

Author:

Good point. I removed the new hack/kind_config.yaml and updated kind-multi-node to reuse the existing config/testing/kind/kind-3node-config.yaml.

## TL;DR:

```shell
helm repo add node-readiness-controller https://kubernetes-sigs.github.io/node-readiness-controller/
```
Contributor:

Would this work for helm upgrade? How do future schema changes reach existing installs?

Author:

Documented this in the chart README. Helm installs CRDs from the chart crds/ directory during initial install, but does not upgrade or delete those CRDs during helm upgrade or helm uninstall.

For future schema changes, users need to apply the updated CRD before upgrading to a chart version that depends on it. Moving CRDs into templates/ solves the problem, but the CRD lifecycle becomes more dangerous. Alternatively, we can add a pre-install/pre-upgrade hook Job that runs kubectl apply for CRDs. For this PR I kept Helm’s standard crds/ behavior and documented that schema-changing upgrades require applying the updated CRD first.
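Under that documented behavior, a schema-changing upgrade would look roughly like this (chart path and release name are illustrative):

```shell
# Helm does not touch crds/ on upgrade, so apply the updated CRD first
kubectl apply --server-side -f charts/nrr-controller/crds/

# then upgrade the release to the chart version that depends on the new schema
helm upgrade nrr charts/nrr-controller --namespace node-readiness-system
```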

Comment thread hack/verify-chart.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
Contributor:

Can we also add a hack/verify-chart-drift.sh and wire it into both hack/verify-all.sh and .github/workflows/helm.yaml? It doesn't have to be an exhaustive check; maybe a simple diff check to match controller-gen output to the chart?

Author:

Added hack/verify-chart-drift.sh and wired it into both hack/verify-all.sh and .github/workflows/helm.yaml. The check runs make manifests and diffs the controller-gen CRD against the chart CRD.
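A minimal version of such a drift check could look like this (the CRD file paths are assumptions based on this thread, not verified against the repo):

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Regenerate manifests with controller-gen, then fail if the chart's
# copy of the CRD has drifted from the generated output.
make manifests

generated="config/crd/bases/readiness.node.x-k8s.io_nodereadinessrules.yaml"
chart="charts/nrr-controller/crds/nodereadinessrules.yaml"

if ! diff -u "${generated}" "${chart}"; then
  echo "Chart CRD is out of sync with controller-gen output." >&2
  echo "Regenerate with 'make manifests' and copy the CRD into the chart." >&2
  exit 1
fi
```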

```yaml
admissionReviewVersions:
  {{- toYaml .Values.validatingWebhook.admissionReviewVersions | nindent 6 }}
clientConfig:
  service:
```
Contributor:

@honghainguyen777 Your values.yaml sets this default to 8443, but this service config doesn't have wiring for the 'port' field. This seems to be why the webhook was failing.

Can we fix this and also add tests to include coverage for the webhook service?

Author:

Fixed and added the tests

Comment thread charts/nrr-controller/values.yaml Outdated
Comment on lines +100 to +101

```yaml
port: 8443
targetPort: 9443
```
Contributor:

Note: this conflicts with current upstream config/webhook/service.yaml that uses 443 -> 9443. This is why we should have guardrails to maintain configuration consistency between helm and kustomize artifacts.

Author:

Fixed. The chart default now matches upstream config/webhook/service.yaml: 443 -> 9443.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2026
@k8s-ci-robot (Contributor):

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2026

Labels

- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `kind/feature`: Categorizes issue or PR as related to a new feature.
- `ok-to-test`: Indicates a non-member PR verified by an org member that is safe to test.
- `size/XXL`: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants