discover: self-contained NIC daemon bootstrap, no Network Operator needed #67
Merged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
l8k discover used to require a pre-installed Network Operator (NO): it deployed a thin NicClusterPolicy with a NicConfigurationOperator section to spin up nic-configuration-daemon pods, then read NicDevice CRs from the NO namespace. That made discovery depend on the full NO Helm install, even though discovery itself needs nothing from OFED, device plugins, networks, or IPAM.
Discover now bootstraps just the NIC configuration daemon (and the 5 nic-configuration-operator CRDs, only if missing) into a private namespace nvidia-k8s-launch-kit, lets the daemon publish NicDevice CRs there, runs the existing build/probe pipeline against that namespace, and tears the namespace down at exit. l8k generate and l8k deploy are unchanged.
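The lifecycle described above (install missing CRDs, create the private namespace, run the pipeline, tear the namespace down at exit) can be sketched roughly as follows. This is a minimal illustration against an in-memory stand-in, not the code in pkg/nicconfigdaemon: `fakeCluster`, `ensure`, `cleanup`, `discover`, and the CRD names `crd-a` / `crd-b` are all hypothetical. The description only mentions namespace teardown, so the sketch leaves installed CRDs in place.

```go
package main

import "fmt"

// privateNamespace is the namespace named in the PR description.
const privateNamespace = "nvidia-k8s-launch-kit"

// fakeCluster is a hypothetical in-memory stand-in for the Kubernetes API.
type fakeCluster struct {
	crds       map[string]bool // cluster-scoped CRDs already installed
	namespaces map[string]bool // namespaces that currently exist
	log        []string        // ordered record of the actions taken
}

// ensure installs only the CRDs that are missing, then creates the namespace.
func ensure(c *fakeCluster, wanted []string) {
	for _, name := range wanted {
		if c.crds[name] {
			continue // already present (e.g. from a coexisting install); leave it alone
		}
		c.crds[name] = true
		c.log = append(c.log, "install crd "+name)
	}
	c.namespaces[privateNamespace] = true
	c.log = append(c.log, "create namespace "+privateNamespace)
}

// cleanup removes the private namespace; CRDs are left untouched in this sketch.
func cleanup(c *fakeCluster) {
	delete(c.namespaces, privateNamespace)
	c.log = append(c.log, "delete namespace "+privateNamespace)
}

// discover bootstraps, runs the pipeline, and defers teardown unless asked to keep it.
func discover(c *fakeCluster, wanted []string, keepNamespace bool) {
	ensure(c, wanted)
	if !keepNamespace {
		defer cleanup(c) // runs when discover returns, even on an early return
	}
	c.log = append(c.log, "run build/probe pipeline in "+privateNamespace)
}

func main() {
	c := &fakeCluster{
		crds:       map[string]bool{"crd-a": true}, // pretend one CRD already exists
		namespaces: map[string]bool{},
	}
	discover(c, []string{"crd-a", "crd-b"}, false)
	for _, line := range c.log {
		fmt.Println(line)
	}
}
```

Tying the teardown to a `defer` registered right after the bootstrap keeps the namespace from leaking if the pipeline returns early, and makes the opt-out a single condition.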
New package pkg/nicconfigdaemon (Ensure + Cleanup + vendored CRDs + daemon template) drives the bootstrap. The renamed SA / ClusterRole / ClusterRoleBinding (k8s-launch-kit-nic-config-daemon) avoid cluster-scoped name collisions with a coexisting NO install. An empty nic-configuration-operator-supported-nic-firmware ConfigMap is seeded so the daemon's startup Get on it succeeds without the upstream Helm chart; discovery doesn't drive firmware upgrades so the data map can stay empty.
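To illustrate why seeding an empty ConfigMap is enough, here is a small sketch in which a startup-style Get fails until the object exists, then succeeds with an empty data map. The store, `objectKey`, and `seedEmpty` are hypothetical stand-ins for the Kubernetes client, and placing the ConfigMap in the private namespace is an assumption based on the description.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mimics the "not found" error a Get returns for a missing object.
var errNotFound = errors.New("not found")

// objectKey identifies a namespaced object (hypothetical helper type).
type objectKey struct{ namespace, name string }

// configMapStore is an in-memory stand-in for ConfigMaps in the cluster.
type configMapStore map[objectKey]map[string]string

// get returns the data map, or errNotFound if the ConfigMap does not exist.
func (s configMapStore) get(k objectKey) (map[string]string, error) {
	data, ok := s[k]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

// seedEmpty creates the ConfigMap with an empty data map unless it already exists.
func (s configMapStore) seedEmpty(k objectKey) {
	if _, ok := s[k]; !ok {
		s[k] = map[string]string{}
	}
}

func main() {
	store := configMapStore{}
	key := objectKey{
		namespace: "nvidia-k8s-launch-kit", // assumed location of the seeded ConfigMap
		name:      "nic-configuration-operator-supported-nic-firmware",
	}

	_, err := store.get(key)
	fmt.Println("before seeding:", err)

	store.seedEmpty(key)
	data, err := store.get(key)
	fmt.Println("after seeding:", len(data), "entries, err =", err)
}
```

Because the seeding is create-if-missing, running it repeatedly never overwrites data that something else has already put in the ConfigMap.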
New flag --keep-namespace skips the teardown defer for debugging. The legacy --network-operator-namespace flag is now a no-op for discover (accepted to avoid breaking scripts).
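The flag behaviour can be sketched with Go's standard `flag` package. The PR does not say which CLI library l8k uses, so this is an assumption for illustration only; `discoverOptions` and `parseDiscoverFlags` are hypothetical names. The legacy flag is still registered so that existing scripts parse cleanly, but its value is never read.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// discoverOptions holds the parsed flags (hypothetical type for this sketch).
type discoverOptions struct {
	keepNamespace     bool
	legacyNONamespace string // accepted for compatibility, intentionally unused
}

// parseDiscoverFlags parses discover's arguments without exiting on error.
func parseDiscoverFlags(args []string) (discoverOptions, error) {
	var opts discoverOptions
	fs := flag.NewFlagSet("discover", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way in this sketch
	fs.BoolVar(&opts.keepNamespace, "keep-namespace", false,
		"skip namespace teardown (for debugging)")
	fs.StringVar(&opts.legacyNONamespace, "network-operator-namespace", "",
		"deprecated: accepted but ignored by discover")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	// An old script invocation that still passes the legacy flag.
	opts, err := parseDiscoverFlags([]string{
		"--network-operator-namespace", "some-namespace",
		"--keep-namespace",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("keep namespace:", opts.keepNamespace)
}
```

Keeping the deprecated flag registered, rather than deleting it, means older invocations fail neither at parse time nor at run time; they simply stop influencing discovery.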
CRDs are vendored under pkg/nicconfigdaemon/assets/crds and sync'd from the go.mod-pinned nic-configuration-operator module via a new manual `make sync-nic-config-crds` target; never wired into the default build so library consumers can compile without code-gen prerequisites.