Pr/daniel noland/build system #1357

Draft
daniel-noland wants to merge 34 commits into main from pr/daniel-noland/build-system

@daniel-noland
Collaborator

No description provided.

@daniel-noland force-pushed the pr/daniel-noland/build-system branch 2 times, most recently from 0b036e9 to f2f4c64 on March 19, 2026 04:46
daniel-noland and others added 4 commits March 18, 2026 22:54
Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
@daniel-noland force-pushed the pr/daniel-noland/build-system branch from f2f4c64 to 05bf11c on March 19, 2026 05:05
daniel-noland and others added 23 commits March 18, 2026 23:30
Move --as-needed and --gc-sections from performance-only link flags to common
RUSTFLAGS so they apply to debug builds as well.  Note that FRR builds do not
use RUSTFLAGS, so this change only affects Rust compilation.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Add fuzz as a profile alias for release using rec to enable self-referencing
within profile-map.  This allows `nix build ... --argstr profile fuzz` to
produce the same output as release, providing a named entry point for future
fuzz-specific configuration.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Enable Intel Control-flow Enforcement Technology (CET) shadow stack and
indirect branch tracking via -fcf-protection=full (CFLAGS) and
-Zcf-protection=full (RUSTFLAGS).  These flags were previously commented
out pending testing.

If this causes issues on hardware without CET support, revert this commit.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Rework the llvm overlay to use the rust-overlay for toolchain management
instead of reading from rust-toolchain.toml. Switch from llvmPackages to
llvmPackages' (version-matched to rustc's LLVM), add rustPlatform'-dev for
dev tooling, use final instead of prev where appropriate, and remove the
redundant separateDebugInfo setting.

Also adds the rust-overlay to the overlay registry and removes unused
explicit parameters from the overlay entry point since individual overlays
destructure what they need from inputs.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Remove the build-params default argument from the dpdk package in favor of
using platform.name directly and hardcoding buildtype/lto settings which are
always the same for our use case. Reorder and deduplicate meson flags, remove
the unused -Ddebug=false flag, and fix unnecessary nix string interpolation
in the cross-file argument.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Update both overlays to use llvmPackages' (version-matched LLVM) instead of
the unversioned llvmPackages.

dataplane-dev: Add optimized gdb' package with LTO, static linking, and
minimal features for container-friendly debugging.

dataplane: Pass platform and profile through to dpdk, remove unnecessary
output entries from libmd (man, dev), drop unused ethtool/iproute2 overrides
from rdma-core, fix llvmPackages->llvmPackages' for libunwind, fix
libX11->libx11 case in hwloc, and fix perftest callPackage argument passing.

Also registers the frr overlay in the overlay entry point (forward
declaration; frr.nix is introduced in the next commit).

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Rework the core build machinery in default.nix:
- Add tag parameter for container/version tagging
- Add fuzz to cargo-profile map
- Add frr-pkgs import with FRR overlay
- Add comments explaining libc fully-qualified paths in sysroot
- Add skopeo to devroot for container operations
- Rework devenv from shellHook to structured env attributes
- Add jsonFilter for source filtering
- Simplify cargo-cmd-prefix (unconditional build-std-features)
- Remove sanitizer-conditional RUSTFLAGS block
- Add VERSION env var from tag parameter
- Rename package-builder to workspace-builder
- Rework test-builder to support building all tests at once
- Update crane config (removeReferencesToRustToolchain/VendorDir)
- Add --as-needed,--gc-sections to RUSTFLAGS in invoke

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Switch the linker driver from clang to clang++ so that C++ standard library
and exception handling runtime are linked correctly.  This matters for any
transitive C++ dependencies (e.g. DPDK PMDs, hwloc).

If this causes unexpected linking issues, revert this commit to restore the
plain C driver.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Add docs-builder helper and docs output that runs `cargo doc` through the
nix build system with -D warnings.  Supports building docs for individual
packages or the entire workspace.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Rework the dataplane tar to use busybox (providing a shell and coreutils
in-container), symlinks instead of copies for binaries, and additional
security hardening:
- Add /home and /tmp directories
- Use symlinks to nix store paths instead of copying binaries
- Install busybox for minimal shell access
- Change tar permissions to ugo-sw (no write, no setuid/setgid)
- Add dontPatchShebangs, dontFixup, dontPatchElf
- Include workspace.dataplane, workspace.init, workspace.cli, busybox
  and glibc.libgcc unconditionally in the tar
- Rename attribute from dataplane-tar to dataplane.tar

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Add container image definitions using nixpkgs dockerTools:
- containers.dataplane: production image with busybox, cli, init
- containers.dataplane-debugger: debug image with gdb, rr, libc debug symbols
- containers.frr.dataplane: FRR with dplane-plugin, dplane-rpc, frr-agent
- containers.frr.host: FRR host variant with fakeNss

The FRR containers include fakeRootCommands for /run/frr directory setup
and use tini as the entrypoint.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Call rte_lcore_id() directly instead of the _w() wrapper variant which
has been removed upstream.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Rework build.rs scripts across the workspace to use the nix build environment.
The k8s-intf build.rs now invokes kopium at build time against a nix-provided
CRD file instead of downloading CRDs via ureq. Remove build.rs from cli and
sysfs (no longer needed). Simplify dpdk-sysroot-helper to read DATAPLANE_SYSROOT
from the environment. Update Cargo.toml build-dependencies to match.
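A minimal sketch of what reading DATAPLANE_SYSROOT in a build script could look like. Only the variable name comes from the commit message; the function name `sysroot_from`, the error text, and the `<sysroot>/lib` layout are assumptions for illustration, not the actual dpdk-sysroot-helper code.

```rust
use std::env;
use std::path::PathBuf;

/// Variable name taken from the commit message.
const SYSROOT_VAR: &str = "DATAPLANE_SYSROOT";

/// Hypothetical helper: turn the raw variable value into a sysroot path.
/// Taking the value as a parameter keeps the logic testable without
/// mutating the process environment.
fn sysroot_from(value: Option<String>) -> Result<PathBuf, String> {
    match value {
        Some(v) if !v.trim().is_empty() => Ok(PathBuf::from(v)),
        _ => Err(format!(
            "{} is not set; run the build inside the nix environment",
            SYSROOT_VAR
        )),
    }
}

fn main() {
    // In a build.rs, this asks cargo to rerun the script when the variable changes.
    println!("cargo:rerun-if-env-changed={}", SYSROOT_VAR);
    match sysroot_from(env::var(SYSROOT_VAR).ok()) {
        Ok(root) => {
            // Assumed layout: libraries live under <sysroot>/lib.
            println!(
                "cargo:rustc-link-search=native={}",
                root.join("lib").display()
            );
        }
        Err(msg) => eprintln!("warning: {}", msg),
    }
}
```

Treating an empty value the same as an unset one is a design choice in this sketch; a real helper might prefer to fail the build instead of warning.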

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
Add feature gates to the dataplane and init crates so DPDK and
dpdk-sysroot-helper are optional dependencies. The dataplane crate gets a dpdk
feature (default on) and the corresponding cfg(feature = "dpdk") gate on the
DPDK driver module. The init crate gets a sysroot feature (default on). This
allows building without a DPDK sysroot for development and testing scenarios.
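The gating pattern described above can be sketched as follows. The feature name `dpdk` comes from the commit message; the module name `driver`, the function `active_driver`, the "fallback" variant, and the Cargo.toml lines in the comments are hypothetical and may not match the crate.

```rust
// Assumed Cargo.toml shape (not taken from the PR):
//
//   [features]
//   default = ["dpdk"]
//   dpdk = ["dep:dpdk"]

// Compiled only when the `dpdk` feature is enabled.
#[cfg(feature = "dpdk")]
mod driver {
    pub fn name() -> &'static str {
        "dpdk"
    }
}

// Hypothetical stand-in used when the feature is off, so the crate
// still builds without a DPDK sysroot.
#[cfg(not(feature = "dpdk"))]
mod driver {
    pub fn name() -> &'static str {
        "fallback"
    }
}

/// Reports which driver module was compiled in.
fn active_driver() -> &'static str {
    driver::name()
}

fn main() {
    println!("active driver: {}", active_driver());
}
```

With a layout like this, `cargo build --no-default-features` would select the second module, which is the kind of build the commit message says it wants to allow.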

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>

Co-authored-by: Claude <noreply@anthropic.com>
Update mgmt tests to use the new VM-based test runner: replace the old
fixin::wrap(with_caps(...)) capability-escalation pattern with #[n_vm::in_vm]
annotations, remove unused imports, and disable test_sample_config pending VM
runner integration. Add required dev-dependencies (n-vm and tracing-subscriber
to mgmt; tokio with full features to routing). Deduplicate tokio feature flags
in routing.

Co-authored-by: Manish Vachharajani <manish@githedgehog.com>

Co-authored-by: Claude <noreply@anthropic.com>
daniel-noland and others added 7 commits March 18, 2026 23:41
Co-authored-by: Manish Vachharajani <manish@githedgehog.com>
Co-authored-by: Claude <noreply@anthropic.com>
@daniel-noland force-pushed the pr/daniel-noland/build-system branch from 05bf11c to 55052c4 on March 19, 2026 05:42
Member

@Frostman Frostman left a comment

I didn't really dig into the code changes; I mainly tried it out and looked at the DX. In general, build and test work fine.

  1. `cargo build` doesn't work in nix-shell. The shell should set a default VERSION env var, maybe just (devel) or something similar. The main problem is that this is exactly the version that will be set when you push, so it needs to work. Alternatively, you can wrap cargo with just to inject a properly calculated version (which is IMO the better idea).
  2. `just push` is missing.
  3. The matrix approach is VERY inefficient; usage needs to drop to at most 8 runners in parallel.
  4. We shouldn't be building and pushing all the debug/sanitize artifacts by default in PRs.
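For point 1, the review proposes fixing this in the shell or through a just wrapper. A different, code-side option is sketched below, assuming the crate reads VERSION at compile time (the source does not confirm how VERSION is consumed). The function `resolve_version` is hypothetical, and the "(devel)" placeholder is borrowed from the suggestion above.

```rust
/// Placeholder used when VERSION is absent, borrowed from the review comment.
const DEFAULT_VERSION: &str = "(devel)";

/// Hypothetical helper: prefer the injected version, otherwise fall back.
fn resolve_version(from_env: Option<&'static str>) -> &'static str {
    match from_env {
        Some(v) if !v.is_empty() => v,
        _ => DEFAULT_VERSION,
    }
}

fn main() {
    // option_env! reads VERSION at compile time and yields None when it is
    // unset, whereas env! would fail the build in that case.
    let version = resolve_version(option_env!("VERSION"));
    println!("dataplane version: {}", version);
}
```

A fallback like this keeps a plain `cargo build` working, but it does not produce the properly calculated version that the just-wrapper approach would inject.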

# Build and push the dataplane container
[script]
compile-env *args:
push-container target="dataplane" *args: (build-container target args) && version
Member

We need a `just push` recipe (like in other repos) that pushes all artifacts used in the product (no debug images), and CI should use it when publishing a release or pushing artifacts for the VLAB. The main purpose is to have a standard way across repos to push all artifacts the product needs. Additionally, it should depend on build/build-container so that push always builds and pushes everything needed.

@@ -0,0 +1,116 @@
#!/usr/bin/env bash
Member

(Not a change request, just an opinion.) IMO devs should run Zot separately (and all the time): it's a cache, it's useful to just push images to, and it has some gc/cleanup, etc. VLAB, though, is just a few commands and IMO doesn't deserve a script that will potentially break from time to time due to, e.g., changed flags.

  packages: "write"
  id-token: "write"
env:
  CACHE_REGISTRY: "run.h.hhdev.io:30000"
Member

That env var doesn't make sense anymore, so please delete it or sync the changes merged into master from my PR #1349.

  USER: "runner"
strategy:
  fail-fast: false
  matrix:
Member

@Frostman Frostman Mar 19, 2026

The matrix approach is VERY inefficient and consumes too many runners. Could you please compact it to use fewer than 8 in parallel, or not run most of the jobs by default on PRs? Right now a single workflow uses all the lab runners we have available. The jobs can run for longer, that's not a problem, but the total number used in parallel should be smaller. Additionally, building and publishing debug and sanitizer artifacts all the time is a waste; we shouldn't be doing that at this scale.
