Audience: platform maintainers updating an upstream dependency the platform ships with itself.
Prerequisites: clean working tree on develop; ability to run make in agent/; for ARM-only items, access to a Pi 4 or QEMU-aarch64 for boot smoke.
Runtime: 15–60 min depending on item + smoke depth.
Why this exists: per feedback_no_host_dependencies.md the platform ships the external binaries it depends on instead of apt install-ing them on target nodes. Each pinned item needs a clear, reviewable bump path. This runbook is that path.
| Item | Where pinned | What ships | Bump command |
|---|---|---|---|
| Traefik | agent/Makefile:64 → TRAEFIK_VERSION |
dist/reverse-proxy-traefik-linux-{amd64,arm64} |
make -C agent bump-traefik NEW=vX.Y.Z |
| rpi4-firmware | templates/example-modules/rpi4-firmware/manifest.yaml → build.firmware_ref |
/boot/firmware/* blobs in the rpi4 disk image |
manual edit + smoke (see below) |
| dracut | initramfs/.gitea/workflows/build.yaml:53 → APT_SNAPSHOT |
initramfs builder version (build-host only) | manual edit + reproducibility re-run |
| Kernel | initramfs/build.sh → host's /boot/vmlinuz-* selected via APT_SNAPSHOT pin |
kernel image bundled into the initramfs artifact | manual edit + boot smoke |
| k3s (Phase 3, deferred) | modules/powernode-k3s-binary/manifest.yaml → build.k3s_version |
/usr/local/bin/k3s on cluster nodes |
playbook to be added when the module ships |
For every bump, regardless of item:
- Identify the upstream release. Read the upstream's CHANGELOG / release notes. Confirm:
- No breaking changes that affect the platform's usage (Traefik config flags, kernel ABI, etc.)
- Security advisories addressed (worth bumping for)
- Asset names match the bump tooling's expectations (e.g.
traefik_<ver>_linux_<arch>.tar.gz)
- Verify upstream integrity. Where the upstream publishes signatures (cosign, gpg), validate them before consuming. Bump tooling does basic asset-presence checks but doesn't replace cryptographic verification.
- Edit the pin. One pin per bump. Never bundle a Traefik bump with a kernel bump in the same commit — they have different smoke surfaces.
- Smoke test. Per-item smoke below.
- Commit + open PR. Subject line:
chore(<item>): bump <item> <old> → <new>. Body: changelog summary + smoke evidence + rollback plan. - Tag a release if the bump is platform-critical (security CVE, breaking upstream removal).
make -C extensions/system/agent bump-traefik NEW=v3.3.5The target:
- Verifies
vX.Y.Zexists atgithub.com/traefik/traefik/releases - Verifies both
linux/amd64andlinux/arm64assets are present - Rewrites
TRAEFIK_VERSIONinagent/Makefileviased -i.bakthen removes the backup - Runs
make clean-traefik+make vendor-traefikto download + install - Prints next-step verification commands
Smoke (manual):
file extensions/system/agent/dist/reverse-proxy-traefik-linux-amd64
# Expect: ELF 64-bit LSB executable, x86-64, statically linked
extensions/system/agent/dist/reverse-proxy-traefik-linux-amd64 version
# Expect: the new version string
# End-to-end: deploy reverse-proxy-traefik module to a NodeInstance and
# verify the traefik service starts + serves :80 / :443. See
# extensions/system/modules/reverse-proxy-traefik/manifest.yaml for the
# service definition.Rollback:
make -C extensions/system/agent bump-traefik NEW=<previous-version>
# OR: git revert the bump commit + make clean-traefik && make vendor-traefikFailure modes:
| Symptom | Diagnosis | Fix |
|---|---|---|
FATAL: upstream release vX.Y.Z not found |
Wrong tag (typo, missing v prefix, doesn't exist yet) |
Re-check github.com/traefik/traefik/releases |
FATAL: linux/<arch> asset missing |
Upstream changed asset naming | Inspect checksums.txt URL manually; update bump-traefik target if naming convention changed |
vendor-traefik-amd64 fails after bump |
Tarball layout changed (no top-level traefik binary) |
Examine extracted contents; update vendor-traefik-<arch> recipes |
The firmware blobs come from github.com/raspberrypi/firmware at a pinned commit. The repo is huge (5+ GB); the build pulls just the boot files via raw GitHub URLs.
Procedure:
- Pick the new ref:
curl -sSL https://api.github.com/repos/raspberrypi/firmware/tags | jq -r '.[].name' | head
- Edit
extensions/system/templates/example-modules/rpi4-firmware/manifest.yaml:build: firmware_ref: "<new-ref>" # was "1.20240306"
- Rebuild the rpi4 disk image:
cd extensions/system/initramfs POWERNODE_OFFLINE_DEV=1 ./build.sh --arch arm64 --variants kernel-initrd,disk-image-rpi4 - Boot smoke (one of):
- Real Pi 4: flash
build/arm64/disk-image-rpi4/powernode-rpi4.imgto SD card, boot, verify SSH access + agent heartbeat - QEMU aarch64: use the bare-metal physical-device claim smoke (
smoke_test_bare_metal_claim.rb) — see SMOKE_TEST.md §"Pass 8 — Hardware / CI extras"
- Real Pi 4: flash
Why bump?
- New Pi 4 silicon revisions ship requiring newer GPU firmware
- Security fix in the bootloader
- New kernel requires newer GPU bridge
Rollback: revert the manifest commit. Old Pi 4 silicon usually stays compatible with older firmware.
dracut runs only on the build host (CI runner), never on a target node. Pinning happens via APT snapshot timestamp.
Procedure:
- Find the snapshot timestamp at https://snapshot.ubuntu.com/ (or your mirror)
- Edit
extensions/system/initramfs/.gitea/workflows/build.yaml:53:APT_SNAPSHOT: ${{ inputs.apt_snapshot || '<new-YYYYMMDDTHHMMSSZ>' }}
- Trigger the M3 reproducibility gate via
workflow_dispatchwith bothbase_image_digest+apt_snapshotinputs; gate must pass (two builds, byte-identicalbuild-manifest.json)
Rollback: revert the workflow commit.
Kernel selection happens in build_kernel_initrd() (initramfs/build.sh:79-110): prefers KERNEL_VERSION env, then running kernel, then newest kernel with modules + /boot/vmlinuz-*.
Procedure:
- Decide the desired pin:
- For HWE upgrades, pick the new Ubuntu HWE kernel ABI (e.g.
6.11.0-13-generic) - For LTS, pick the matching GA kernel
- For HWE upgrades, pick the new Ubuntu HWE kernel ABI (e.g.
- Edit
extensions/system/initramfs/.gitea/workflows/build.yamlto setKERNEL_VERSIONenv (or changeAPT_SNAPSHOTso the apt index advances) - Re-run the build pipeline
- Boot smoke is mandatory for kernel bumps:
- QEMU boot (cheap, every PR)
- Real-hardware boot (release gate, per
project_smoke_test_state.md)
Why this needs more care than Traefik: a bad kernel bumps bricks every node booting the new image. Treat kernel bumps as separate PRs with explicit release-engineer sign-off.
Rollback: revert the pin commit; the prior kernel image is still in the artifact store (per disk_image_retention_count default of 5).
When the powernode-k3s-binary module ships (planned: package the k3s binary as a versioned NodeModule so the on-node install path doesn't depend on a curl | sh of an upstream artifact), add a k3s section here following the Traefik pattern: pin in module manifest's build.k3s_version, smoke via tutorials/04-k3s-cluster.md, rollback by reverting the module-version promotion.
Until then, k3s on-node is installed via curl | sh (agent/internal/k3sd/shell_applier.go:113–131). The migration recipe will replace that with a manifest.Service entry pointing at the module-bundled binary path.
- The platform's "ship-with-platform" rule this playbook serves: external binaries the platform depends on must ship WITH the platform (Ruby gem, Go binary in
agent/cmd/, or vendored release artifact via this playbook) — never as a host-side install (apt/brew). ../MODULE_MANIFEST_COMPLETE_SCHEMA.md— manifestbuild:block fields../../initramfs/build.sh— kernel selection logic../../agent/Makefile— Traefik + ACME + agent build targets- Related forthcoming work: k3s on-node bundling (covered above under "k3s — deferred Phase 3"), reproducibility gate (M3 byte-identical builds via APT snapshot pinning — already wired into the build workflow).