Validation improvements#66
Merged
Merged
Conversation
Two related additions on the validate side:
(1) pkg/presetmatch/ — new package that wraps presets.LoadPreset +
presets.ValidatePreset into a single MatchGroup / MatchAll call.
Result carries Status (match / deviation / not-found / skipped),
the matched preset name, and the deviation slice. Discovery now
delegates its preset lookup through this package (LoadPreset +
ApplyPreset stay in pkg/presets — only the lookup-vs-spec logic
moved). l8k validate calls MatchAll over the cluster-config.yaml's
groups at validate-time so cluster drift is re-evaluated even when
the user didn't re-run discover.
Preset deviations stay informational — they don't gate the exit
code. They surface as a sub-table in the HTML report's "Node
groups" section and as a "Preset match" block in text output.
(2) PASS/FAIL banner at the top of both text output and the HTML
report. computeOverallVerdict folds every gating check (Helm
release version, component versions, manifest verdict, connectivity
matrix) into an OverallVerdict{Pass, Reasons, Notes}. Pass=false
when any gate fails — same condition that drives the exit code.
Notes carry informational items that don't gate (preset deviations,
in-progress manifests when --wait wasn't used, matrix soft-skip);
those don't downgrade PASS but the operator sees them.
Banner format:
Text: a wide ═-bordered box at the top of the report with
a bulleted list of failure reasons (Reasons) and an
ⓘ-prefixed list of notes (Notes). Green/red text via
the existing color helpers.
HTML: full-width div at the top of <main>, color-coded
green (.verdict-pass) or red (.verdict-fail), with
the same Reasons / Notes lists rendered as <ul>.
exitWithReport (the early-error wrapper) also emits the banner +
report, with a pessimistic FAIL verdict carrying the error message
as the only Reason. Operators on an early failure get the report
file AND the in-terminal banner explaining what went wrong.
Golden fixture rewritten to populate Verdict.Pass=false +
PresetMatches with one StatusDeviation entry so both the banner
and the new preset sub-table are exercised on every regression run.
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Offline-safe pieces only — data model, parsers, planner, unit tests.
The RunMatrix orchestrator wiring + per-test SPDY exec runs land in
a follow-up once the test cluster is back. Anything we commit here
is exercised by the existing test suite without needing kubectl.
Data model:
- PingTestKind enum extended: RDMAPingSameRail / RDMAPingCrossRail
(rping) and RDMABwSameRail / RDMABwCrossRail (ib_write_bw). Kind
predicates (IsICMP, IsRDMAPing, IsRDMABw, IsCrossRail) so the
renderer + orchestrator can branch by family without hard-coding.
- PingTest gained SrcRDMADev / DstRDMADev — populated for RDMA test
kinds; empty for ICMP.
- PingResult gained BandwidthGbps + MsgRateMpps — populated by
parseIbWriteBwOutput. PacketLoss stays at -1 ("n/a") for RDMA tests.
- TestPod gained RDMADevsByRail — map[rail]→<mlx5_X> populated by
DiscoverRDMADevices.
New rdma.go:
- DiscoverRDMADevices(ctx, pod, ifaceByRail) — single shell exec per
pod that reads /sys/class/net/<iface>/device/infiniband/ for each
multus secondary network interface. Returns map[rail]→<rdmaDev>.
- RunRPing — spawns `rping -s -a <ip> -p 9999` as a background
process on the server pod, sleeps the settle window, runs `rping
-c -a <ip> -p 9999 -C N` on the client pod, then kills the
server. rping returns 0 only on a clean handshake — binary
pass/fail is all we surface (no useful numeric output).
- RunIbWriteBw — same lifecycle pattern. Uses `-d <dev> -R -s 65536
--report_gbits` on both sides (matches the example in
network-operator-docs/quick-start/sriov-network-rdma.rst).
parseIbWriteBwOutput reads the single summary row, surfaces peak
Gbps + MsgRate.
- killRDMAServer — best-effort PID + pkill -f cleanup with a
fresh background context so cleanup runs even when the test
context already deadlined.
New planner entry point:
- RDMAPlan(pods, *plan) extends an existing ICMP plan with rping
+ ib_write_bw test variants for the same pod set. Pairs without
RDMA devices on both sides (DiscoverRDMADevices couldn't resolve
one of them) are silently dropped — no `-d <dev>` to pass.
Server/client direction per the user-chosen "both directions like
ping" shape: every ordered pod pair gets its own pairing.
Tests:
- TestParseIbWriteBwOutput — drives the regex against a realistic
multi-line client output (header banner, address block, summary
row, trailing banner), plus malformed / no-summary / sub-Gbps
cases.
- TestRDMAPlan — covers the same-rail count, cross-rail canary,
missing-device-skip, and ICMP-skip-preserved cases. Includes the
subtle asymmetry: when one pod lacks the rail-1 device, the
cross-rail canary still emits for (other-pod.rail-0 →
one-pod.rail-1) but not the reverse.
- TestPingTestKind_Predicates — guards the new IsX() helpers
against future Kind enum changes.
What's still missing (cluster work):
- RunMatrix needs to call DiscoverRDMADevices, RDMAPlan, then
iterate RDMASameRail / RDMABwSameRail / cross variants
sequentially (per "Spawn fresh server per test" decision).
- text_report.go + report.html.tmpl need per-kind matrix grids so
the bandwidth numbers actually surface in the report.
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
The fabric probe was rejecting any port that wasn't ACTIVE (and for IB additionally requiring sm_lid != 0). That broke discovery on freshly provisioned clusters where the switch isn't plugged in yet — `l8k generate` then errored with "no fabric resolved" even on hardware whose configured fabric was obvious from sysfs. Simplified rule: just read `/sys/class/infiniband/<dev>/ports/<port>/link_layer` and normalise. "Ethernet" → Ethernet, "InfiniBand" → InfiniBand, anything else (or unreadable file) → no verdict. The /sys-then-/host/sys fallback path from the earlier commit (cf1835b) is preserved. Dropped: - state/phys_state reads (we no longer care about runtime link state) - sm_lid read + smLidIsNonZero helper (IB subnet manager presence was only used to corroborate an ACTIVE+IB state) - The "first-path clean read but unconfirmed → don't retry" short-circuit (no clean-but-unconfirmed case exists any more — either link_layer is recognised or it isn't) - strconv import (only smLidIsNonZero used it) Tradeoff: an operator who reflashes a card to a different fabric without re-running discover would carry the old link_layer in cluster-config.yaml until the next discover. Acceptable in exchange for the "discover before the cluster is wired up" win — that scenario was the dominant source of "no fabric resolved" errors per recent user reports. Tests rewritten for the new semantics: previously-asserted "DOWN port → no verdict" now asserts "DOWN port still resolves by link_layer". The Ethernet / InfiniBand / empty / unrecognised / whitespace-trimming cases continue to hold. Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
The post-apply RV gate (commit c998fa2) was hanging the deploy state machine on SriovNetworkNodePolicy: SriovNetworkNodePolicy/ethernet-sriov-rail-0 in network-operator [3/6]: waiting for controller to observe new spec (3m11s) even though `kubectl get sriovnetworknodestate -n network-operator` showed both nodes at `Succeeded`. Root cause: the SR-IOV operator treats SriovNetworkNodePolicy as **config-only** — it never writes status back onto the policy CR. The operator's reconcile output lives on per-node SriovNetworkNodeState objects, which the validator already reads. The policy's resourceVersion stays pinned at the apply-time value forever, and the gate sits there waiting for a write that will never land. Same trap applies to NicInterfaceNameTemplate / NicConfigurationTemplate — their validators read companion NicDevice CRs, not the template itself. Fix: gate only the Kinds whose validators read .status from the CR itself. New crstate.NeedsObservationGate(gvk) returns true for: - mellanox.com/v1alpha1 NicClusterPolicy + NicNodePolicy (read .status.state on the policy) - mellanox.com/v1alpha1 HostDeviceNetwork / IPoIBNetwork / MacvlanNetwork (same status-string validator) - spectrumx.nvidia.com/v1alpha2 SpectrumXRailPoolConfig (reads .status.syncStatus on the CR) Returns false for everything else (SR-IOV node policies, nic-configuration-operator templates, existence-only Kinds like IPPool / SriovNetwork). Those Kinds' validators carry their own staleness handling — the SR-IOV soft-progress bucketing (commit aa48c78) covers "Succeeded but numVfs not at target yet" without needing an RV gate on the policy itself. applyAndWait and the phase-3 batched-apply loop both consult the helper before setting awaitObservationAfterRV, so the gate is a no-op for Kinds that would deadlock on it. Tests: a 12-case TestNeedsObservationGate covers each gated Kind, each companion-CR Kind, the existence-only Kinds, and the wrong- group / wrong-version negatives. Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Three pieces of the rping + ib_write_bw work that don't need the
cluster to be reachable, finished offline so cluster iteration only
verifies real exec output.
1. RunMatrix orchestrator: after the ICMP stage builds its
PingResults, build the set of (srcPod, dstPod, rail) triples that
PASSED ICMP. Call RDMAPlan; iterate rping then ib_write_bw
sequentially (per the "Spawn fresh server per test" lifecycle
decision — concurrent runs would fight over the listener port).
Skip same-rail RDMA tests on pairs that failed ICMP; cross-rail
canaries always run (they probe a different path than ICMP).
buildTestPod now also returns the rail→multus-interface mapping
so DiscoverRDMADevices can populate TestPod.RDMADevsByRail —
each pod gets a single in-pod shell exec reading
/sys/class/net/<iface>/device/infiniband/ per multus iface.
2. Text renderer (RenderMatrixText): bucket results by (rail × kind
family) rather than just by rail. One grid per (rail, family);
families are ICMP / rping / ib_write_bw. Cells:
- ICMP — "✓ 0% 0.1ms" / "✗ 100%" / "✗ ERR"
- rping — "✓" / "✗"
- ib_write_bw — "✓ 194.4 Gbps" / "✗ ERR"
Cross-rail canary list is also per-family. Section headers
include the family title so the layout is "Rail rail-0 — ICMP
ping:", "Rail rail-0 — RDMA ping (rping):", etc.
3. HTML report template + helpers: same per-family layout. New
crossRailSection type so the template iterates one cross-rail
list per family. cellText template helper now takes a family
arg; cellClass stays family-agnostic (pass/fail/missing).
matrixByRail + matrixCrossRail rewritten to bucket by family
and emit a stable ordering (ICMP → rping → ib_write_bw within
each rail).
Tests:
- TestCellFor extended with rping ✓/✗ and ib_write_bw bandwidth
cases (including the defensive "OK=true, BandwidthGbps=0 → ERR"
branch).
- TestKindFamilyOf locks in the PingTestKind → kindFamily mapping.
- TestRenderMatrixText updated for the new section header format.
- Golden HTML regenerated.
What's still cluster-bound: a real run of `l8k validate
--connectivity` against the user's cluster to verify the
ib_write_bw output regex matches the actual perftest version
deployed (sandbox network policy is currently blocking the
cluster API server; user needs to allow the IP via `sbx policy
allow network -g <ip>` before I can do the verification run).
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
…t-host The old mellanox/rping-test:latest image ships rping + ib_write_bw + ibv_devices, but not iputils — so any ICMP exec inside the pod fails with `exec: "ping": executable file not found in $PATH`. With the ICMP stage now gone from the connectivity matrix this no longer matters for the matrix proper, but the DOCA full-rt-host image is a more representative test fixture: it ships the same RDMA tools plus a full RoCE toolchain (rdma-core, perftest, iputils), so operators who shell into the pods for manual debugging have a richer environment. Spectrum-X profiles (spectrum-x, spectrum-x-ra2.1) additionally need an explicit `command: ["/bin/bash", "-c", "sleep infinity"]` because the DOCA image's default entrypoint exits immediately, unlike mellanox/rping-test's. The other profiles already had this command and were unaffected. Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
version + platform-topology checks
Three coordinated changes to the `l8k validate` UX. Each section
below is independent, but they share enough surface area in
validate.go + the connectivity package that they're bundled into
one commit.
1. RDMA-only matrix (drop ICMP)
The ICMP stage in the matrix existed as a gating signal for the
subsequent RDMA stages — but ICMP working tells you nothing
about whether the RDMA-CM path is wired up correctly (different
netdev → device mapping, GID table state, etc.). Running rping
+ ib_write_bw directly is both more diagnostic and the only
thing operators actually care about for an east-west deployment.
- pkg/networkoperatorplugin/connectivity/ping.go deleted; RunPing
gone, pingSummaryRe / pingRTTRe gone.
- PingResult moved to result.go and shrunk: PacketLoss + RTTAvgMs
dropped, ib_write_bw fields (BandwidthGbps, MsgRateMpps)
stay.
- PingTestKind collapsed from 6 → 4 values (RDMAPingSameRail,
RDMAPingCrossRail, RDMABwSameRail, RDMABwCrossRail). IsICMP /
PingSameRail / PingCrossRail removed.
- Plan() now emits RDMA tests directly (no more RDMAPlan as a
second pass). MatrixPlan exposes only the four RDMA slices.
- RunMatrix runs two stages sequentially per the "spawn fresh
server per test" lifecycle — no more concurrent ICMP fan-out,
no more passedSameRail ICMP gate. Same-rail RDMA tests now
run unconditionally; cross-rail canaries continue to always
run.
- Renderers (text + HTML) collapse to two families (rping →
ib_write_bw) and drop the ICMP grid + the Summary.ExecErrors
count.
- Options.PingCount + Options.MaxConcurrentPings removed;
`--ping-count` flag deleted from validate. `--connectivity-
timeout` help text updated to mention rping + ib_write_bw.
2. Report file + flag renames
- <deployment-files>/verify-report.html → <deployment-files>/
k8s-launch-kit-validation-report.html (override via
--report-path still works).
- `l8k deploy --verify` → `l8k deploy --validate` (deployVerify
→ deployValidate, error string "deploy succeeded but --verify
failed" → "… --validate failed").
- testdata/verify-report.golden.html → testdata/k8s-launch-kit-
validation-report.golden.html.
- template.New("verify-report") → template.New("validation-
report") (internal; doesn't surface anywhere).
- README updated for both renames.
3. Non-blocking version + topology checks; "Platform topology
validation" UX rename
Previously version-mismatch and component-mismatch short-
circuited the validate flow before connectivity ran. A stale
Helm release or hardware drift from the catalog topology
doesn't actually mean the data plane can't be tested — the
cluster is still up, the operator wants to know whether RDMA
works. The flow now runs the matrix even on a mismatch and
only fails the final exit code once everything has run.
- validate.go switch on verdict.HasError || verdict.HasMissing
keeps short-circuit semantics (manifest-broken). VersionOK
and componentMismatch no longer block the matrix; final
exit-code check unifies all gating signals (matrix Failed,
version mismatch, component mismatch, platform-topology
deviation) and exits ExitDeployment on any.
- computeOverallVerdict treats platform-topology deviation as a
fail reason (was: informational note), so PASS/FAIL banner
reflects topology drift.
- User-facing "preset" terminology renamed end-to-end:
Section heading: "Preset match (cluster groups vs. topology
preset catalog)" → "Platform topology validation"
Status labels: MATCH / DEVIATION / NO PRESET / SKIPPED →
MATCH / MISMATCH / NOT CERTIFIED / SKIPPED
Per-row detail: "preset 'X' matches" → "matches certified
topology for X server type"
Banner reason: "The detected platform topology does not
match the certified topology for <server-type> server
type: <field deltas>"
- Internal package names (presetmatch, struct fields like
PresetMatches / PresetDeviation / --for <preset> / `l8k
preset list,update`) intentionally unchanged — those are
code/admin surfaces, not validation output.
Tests + golden file updated; go build ./... and go test ./...
green across all 12 packages.
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
The flag was renamed from --verify to --validate in the prior commit, but "--validate" overloads the `l8k validate` subcommand name and sets the wrong expectation — `l8k deploy --validate` does NOT run the full validate pipeline (Helm release check, manifest state classification, platform-topology check), it only chains the connectivity matrix. --test-connectivity names what the flag actually does and leaves "validate" to mean the subcommand. No behavior change; deployValidate → deployTestConnectivity in the flag var, help text and error string updated accordingly. Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
…cted PF tables
The previous one-table layout tried to cram the diff inline ("1021
certified topology: 1018"), with unexpected/missing rows getting
strikethrough + tag chrome. The result was visually noisy and hard
to scan — operators couldn't tell what the certified topology
actually said for a given PCI without parsing badges.
This commit splits each PF group's rendering into two parallel
sub-tables:
East-west PFs — Actual (3 certified topology expects 4)
PCI Device ID Rail Netdev ...
0000:08:00.0 1021 0 eth_r0 ... ← tinted (deviceID drift)
0000:08:00.1 1023 1 eth_r1 ...
0000:08:00.2 1023 2 eth_r2 ... ← tinted (not in cert. topo)
East-west PFs — Expected by certified topology (3)
PCI Device ID Rail Netdev ...
0000:08:00.0 1023 0 eth_r0 ... ← tinted (drift)
0000:08:00.1 1023 1 eth_r1 ...
0000:08:00.3 1023 3 eth_r3 ... ← tinted (cluster lacks it)
Rows that diverge from their counterpart in the other table are
tinted amber with a faint amber top/bottom border in both tables.
The reader scans down each side and the divergences line up.
Implementation:
- PFInfo: dropped ExpectedDeviceID + Unexpected + Missing fields,
replaced by a single Mismatched bool. NodeGroupInfo gained
ExpectedEastWestPFs + ExpectedNorthSouthPFs slices carrying the
certified topology's PFs.
- presetmatch.Result gained Preset *presets.Topology (and a
Manufacturer string), populated from the loaded preset on a
match. validate.go uses Result.Preset to project preset PFs into
the Expected sub-tables and to flag Mismatched rows in each.
- platformLabel prepends Manufacturer when set, so the banner
reads "Test-Manufacturer-Standard-PC-(Q35-+-ICH9,-2009)-Test-GPU"
rather than "Standard-PC-(Q35-+-ICH9,-2009)/Test-GPU".
- report.html.tmpl renders the four sub-tables (Actual+Expected
for east-west and north-south) and uses a single pf-mismatched
row class for the tint. The pf-tag / pf-missing / pf-unexpected
/ strikethrough / italic styles + the mismatch-inline cell text
are gone — the row tint is the only diff signal.
- The fallback path (no live preset, e.g. validate run against
stored cluster-config deviations) still flags Actual rows whose
PCI appears in any stored deviation, but the Expected tables
stay empty in that case.
Test fixture updated: one drift row + one extra row + one missing
row exercise both tables and the Mismatched flagging on each.
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.