discover: self-contained NIC daemon bootstrap, no Network Operator needed #67

Merged: almaslennikov merged 1 commit into main from nicconfig-daemon-discover on May 24, 2026
@almaslennikov (Collaborator)

`l8k discover` used to require a pre-installed Network Operator (NO): it deployed a thin `NicClusterPolicy` with a `NicConfigurationOperator` section to spin up `nic-configuration-daemon` pods, then read `NicDevice` CRs from the NO namespace. That made discovery dependent on the full NO Helm install, even though discovery itself needs nothing from OFED, device plugins, networks, or IPAM.

Discover now bootstraps only the NIC configuration daemon (plus the five nic-configuration-operator CRDs, installed only if missing) into a private namespace, `nvidia-k8s-launch-kit`. The daemon publishes `NicDevice` CRs there, the existing build/probe pipeline runs against that namespace, and the namespace is torn down at exit. `l8k generate` and `l8k deploy` are unchanged.

The new package `pkg/nicconfigdaemon` (Ensure + Cleanup + vendored CRDs + daemon template) drives the bootstrap. The renamed SA / ClusterRole / ClusterRoleBinding (`k8s-launch-kit-nic-config-daemon`) avoid cluster-scoped name collisions with a coexisting NO install. An empty `nic-configuration-operator-supported-nic-firmware` ConfigMap is seeded so that the daemon's startup Get on it succeeds without the upstream Helm chart. Discovery doesn't drive firmware upgrades, so the data map can stay empty.

The new `--keep-namespace` flag skips the deferred teardown, which is useful for debugging. The legacy `--network-operator-namespace` flag is now a no-op for discover; it is still accepted to avoid breaking existing scripts.

CRDs are vendored under `pkg/nicconfigdaemon/assets/crds` and synced from the go.mod-pinned nic-configuration-operator module via a new manual `make sync-nic-config-crds` target. The target is never wired into the default build, so library consumers can compile without code-gen prerequisites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@almaslennikov merged commit 91d3952 into main on May 24, 2026
3 checks passed