
BFD and falcon-lab cleanup [Spring Cleaning 1/N] #717

Merged
taspelund merged 17 commits into main from trey/bfd-cleanup on Apr 24, 2026

@taspelund taspelund commented Apr 22, 2026

First round of changes split out from #648.

BFD daemon cleanup

  • Switch BFD threads to ManagedThread + named Builder::spawn; propagate spawn Results up to admin callers.
  • Group peer config into AddPeerRequest; add AddPeerError::PeerExists so mgd returns 409 instead of 500 on duplicates.
  • Convert egress() from a blocking rx.recv() to recv_timeout(1s) with a kill-switch poll, so the thread can no longer block forever on a silent channel and miss shutdown.
  • Extend bfd unit tests with IPv6 peers.
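
The egress() change follows a common pattern for stoppable worker threads. Below is a minimal sketch using std::sync::mpsc, not the daemon's actual code: the function signature, the Vec<u8> packet type, and the 50 ms poll interval are assumptions (the PR uses a 1s timeout). With std's mpsc, recv() only blocks indefinitely while at least one Sender is still alive; once every Sender is dropped it returns an error, so the hazard is a channel that stays connected but quiet.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical egress loop: forward packets from `rx` until the kill switch
/// is set or every sender has been dropped. Returns how many were handled.
fn egress(rx: Receiver<Vec<u8>>, kill: Arc<AtomicBool>, poll: Duration) -> usize {
    let mut handled = 0;
    loop {
        // Checked on every pass, including after a timeout, so a quiet but
        // still-connected channel cannot keep this thread parked forever.
        if kill.load(Ordering::Relaxed) {
            break;
        }
        match rx.recv_timeout(poll) {
            Ok(_pkt) => handled += 1, // a real daemon would transmit here
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    handled
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let kill = Arc::new(AtomicBool::new(false));
    let worker_kill = Arc::clone(&kill);
    let worker = thread::spawn(move || egress(rx, worker_kill, Duration::from_millis(50)));

    tx.send(vec![0u8; 24]).unwrap();
    thread::sleep(Duration::from_millis(100));
    // `tx` is still alive here, so a plain recv() would block indefinitely;
    // the kill switch is what lets the worker exit.
    kill.store(true, Ordering::Relaxed);
    println!("egress handled {} packet(s)", worker.join().unwrap());
    drop(tx);
}
```

The timeout only bounds how long shutdown can lag; a shorter interval reacts faster at the cost of more wakeups on an idle channel.
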

New falcon-lab test: trio-bfd-static-routing

  • Dual-stack BFD + static routing over the existing trio topology (FRR + cEOS peers), with shared bootstrap extracted into boot_trio() / BootedTrio.
  • Five-phase assertion matrix exercises BFD-gated next-hop shutdown and the "all shutdown → all reinstated" fallback. Each phase validates mgd BFD state, peer-side BFD state on live peers, and the dpd target sets for both v4 and v6 prefixes.
  • Failure injection via pkill -STOP bfdd on FRR and docker pause ceos on EOS; both are reversible and preserve daemon state.
  • Plumbing needed for numbered peers to work: per-link link_ipv{4,6}_create + link_ipv6_enabled_set, run ddmd (mg-lower depends on it), bump bestpath_fanout to 2, pre-seed each tfport with an addrconf link-local before static v6, and tolerate "already"-style ipadm errors for --no-cleanup re-runs.
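
The selection rule that the phases exercise can be modeled as follows: drop next-hops whose BFD session is down, unless every next-hop is down, in which case all of them are reinstated. This is a hypothetical sketch of the expected behavior, not mgd's code; BfdState, active_nexthops, and the BTreeMap keyed by next-hop address are illustrative names.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Simplified BFD session state for a static route's next-hop.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BfdState {
    Up,
    Down,
}

/// Hypothetical selection rule: install only next-hops whose BFD session is
/// Up, but if every BFD-tracked next-hop is Down, reinstate all of them.
fn active_nexthops(nexthops: &BTreeMap<IpAddr, BfdState>) -> Vec<IpAddr> {
    let up: Vec<IpAddr> = nexthops
        .iter()
        .filter(|(_, state)| **state == BfdState::Up)
        .map(|(nh, _)| *nh)
        .collect();
    if up.is_empty() {
        // "All shutdown -> all reinstated" fallback.
        nexthops.keys().copied().collect()
    } else {
        up
    }
}

fn main() {
    let a: IpAddr = "192.0.2.1".parse().unwrap();
    let b: IpAddr = "198.51.100.1".parse().unwrap();
    let phases = [
        (BfdState::Up, BfdState::Up),
        (BfdState::Up, BfdState::Down),
        (BfdState::Down, BfdState::Up),
        (BfdState::Down, BfdState::Down),
    ];
    for (sa, sb) in phases {
        let nexthops = BTreeMap::from([(a, sa), (b, sb)]);
        println!("{sa:?}/{sb:?} -> {:?}", active_nexthops(&nexthops));
    }
}
```

Stepping through the four Up/Down combinations shows both paths, each single path, and the all-down fallback; it does not reproduce the test's actual phase order.
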

Trio-unnumbered assertion tightening

  • Check the BGP FSM state of each neighbor, not just the first.
  • Assert specific prefixes and path counts in imported RIB, selected RIB, dpd targets, and each peer's imported BGP routes, replacing count-only assertions.
  • Bump bestpath_fanout to 2 so ECMP is actually exercised.
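
Asserting exact prefixes and path counts catches failures a count-only check misses, such as the right number of routes for the wrong prefixes. A minimal sketch, assuming routes have been collected into a prefix-to-path-count map; check_rib and the string-keyed map are hypothetical, and the real test works with mgd and dpd types.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: compare an observed table (prefix -> number of
/// paths) with the exact expected contents, not just the number of entries.
fn check_rib(
    what: &str,
    observed: &BTreeMap<String, usize>,
    expected: &[(&str, usize)],
) -> Result<(), String> {
    let expected: BTreeMap<String, usize> = expected
        .iter()
        .map(|(prefix, paths)| (prefix.to_string(), *paths))
        .collect();
    if *observed == expected {
        Ok(())
    } else {
        Err(format!("{what}: expected {expected:?}, got {observed:?}"))
    }
}

fn main() {
    // Two peers advertising the same prefixes with bestpath_fanout = 2
    // would be expected to yield two paths per prefix.
    let selected = BTreeMap::from([
        ("198.51.100.0/24".to_string(), 2),
        ("2001:db8:1::/64".to_string(), 2),
    ]);
    let want = [("198.51.100.0/24", 2), ("2001:db8:1::/64", 2)];
    match check_rib("selected RIB", &selected, &want) {
        Ok(()) => println!("selected RIB matches"),
        Err(e) => println!("{e}"),
    }
}
```

With a fanout of 1 the same table would show one path per prefix, which is why the fanout bump is what makes ECMP observable in these assertions.
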

Use mg-common's ManagedThread so we retain JoinHandles in a structured
fashion in case we want to make explicit use of these later.
Also introduce AddPeerRequest to group arguments defining a peer config.
This avoids clippy's too_many_arguments lint and provides some rough
structure for the peer add flow.

After switching from thread::spawn() to a thread Builder (Builder::spawn
returns an io::Result, whereas thread::spawn panics on failure), the
Result should be propagated back out to the callers so they can handle
errors properly.
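
A sketch of the spawn change using only the standard library; mg-common's ManagedThread API is not shown in this PR, so it is not used here. std::thread::Builder::spawn returns io::Result<JoinHandle<T>>, which a caller can convert into an API error instead of crashing. spawn_named and admin_add_peer are hypothetical names.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Spawn a named thread and return the spawn result instead of panicking,
/// which is what thread::spawn does if the OS fails to create a thread.
fn spawn_named<F, T>(name: String, f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new().name(name).spawn(f)
}

/// Hypothetical admin-side caller: a spawn failure becomes an error the
/// API layer can report, rather than a crashed daemon.
fn admin_add_peer(peer: &str) -> Result<JoinHandle<Option<String>>, String> {
    spawn_named(format!("bfd-{peer}"), || {
        // The worker can read its own name; std also uses it in panic messages.
        thread::current().name().map(str::to_owned)
    })
    .map_err(|e| format!("failed to start session thread for {peer}: {e}"))
}

fn main() {
    match admin_add_peer("192.0.2.1") {
        Ok(handle) => println!("worker named {:?}", handle.join().unwrap()),
        Err(e) => eprintln!("{e}"),
    }
}
```
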

ensure() was returning a Result<UdpSocket>, but the caller just dropped
it, so it now returns Result<()>.
Removes redundant presence check for peer during add_peer callpath.
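
One way to fold the duplicate check into the insert itself, and to map a duplicate to 409 Conflict rather than 500, is sketched below. Only the AddPeerRequest and AddPeerError::PeerExists names come from this PR; the request fields, the Daemon type, and status_code() are hypothetical, and mgd's admin layer may map errors differently.

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::net::IpAddr;
use std::time::Duration;

/// Hypothetical shape of a grouped peer config (fields are illustrative).
#[allow(dead_code)] // illustrative fields only
#[derive(Clone, Debug)]
struct AddPeerRequest {
    peer: IpAddr,
    listen: IpAddr,
    required_rx: Duration,
    detection_threshold: u8,
}

#[derive(Debug, PartialEq, Eq)]
enum AddPeerError {
    PeerExists(IpAddr),
}

impl AddPeerError {
    /// HTTP status an admin API might return for this error.
    fn status_code(&self) -> u16 {
        match self {
            AddPeerError::PeerExists(_) => 409, // Conflict, not a generic 500
        }
    }
}

struct Daemon {
    sessions: HashMap<IpAddr, AddPeerRequest>,
}

impl Daemon {
    /// A single map lookup both detects a duplicate and inserts, so there is
    /// no separate presence check to keep in sync with the insert.
    fn add_peer(&mut self, req: AddPeerRequest) -> Result<(), AddPeerError> {
        match self.sessions.entry(req.peer) {
            Entry::Occupied(_) => Err(AddPeerError::PeerExists(req.peer)),
            Entry::Vacant(slot) => {
                slot.insert(req);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut daemon = Daemon { sessions: HashMap::new() };
    let req = AddPeerRequest {
        peer: "192.0.2.1".parse().unwrap(),
        listen: "192.0.2.2".parse().unwrap(),
        required_rx: Duration::from_millis(300),
        detection_threshold: 3,
    };
    println!("first add: {:?}", daemon.add_peer(req.clone()));
    if let Err(e) = daemon.add_peer(req) {
        println!("second add: {e:?} -> HTTP {}", e.status_code());
    }
}
```
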

Updates egress() to use recv_timeout() to avoid a possible deadlock where
recv() blocks indefinitely on a channel that stays silent and the thread
never observes the kill switch (an AtomicBool).
Remove the workaround for Illumos #17853 from falcon-lab.sh
Adds new falcon-lab test that makes use of the pre-existing trio topo.
Generalizes the trio topology creation to reduce boilerplate.
Adds a dual-stack static routing config with BFD enabled.
Adds a test to ensure BFD comes up for v4 and v6, each next-hop can be
disabled independently, and that each path is installed into the RIB as
expected as BFD sessions are brought up/down.
Illumos rejects adding a static IPv6 address via ipadm before the
datalink has an addrconf address, so create the addrconf address first.
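
The ordering, together with the tolerant error handling used for --no-cleanup re-runs, can be sketched as below. The interface name, the address object names (ll, v6), and both helpers are hypothetical; the sketch only builds the ipadm argument lists (the -T addrconf and -T static forms of create-addr) and does not run them. Matching on the substring "already" mirrors the PR's description of tolerating "already"-style errors; the example error text is illustrative and real ipadm wording may differ.

```rust
/// Build an owned argument vector from string slices.
fn args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Hypothetical: ipadm invocations for one port, creating the addrconf
/// (link-local) address before the static IPv6 address.
fn ipadm_v6_cmds(ifname: &str, v6_cidr: &str) -> Vec<Vec<String>> {
    let ll = format!("{ifname}/ll");
    let v6 = format!("{ifname}/v6");
    vec![
        args(&["ipadm", "create-addr", "-T", "addrconf", ll.as_str()]),
        args(&["ipadm", "create-addr", "-T", "static", "-a", v6_cidr, v6.as_str()]),
    ]
}

/// Treat "already exists"-style failures as success so --no-cleanup re-runs
/// do not trip over state left by a previous run; other failures propagate.
fn tolerate_already(succeeded: bool, stderr: &str) -> Result<(), String> {
    if succeeded || stderr.contains("already") {
        Ok(())
    } else {
        Err(stderr.trim().to_string())
    }
}

fn main() {
    for cmd in ipadm_v6_cmds("tfport0", "2001:db8:1::1/64") {
        println!("{}", cmd.join(" "));
    }
    // Illustrative error text for a re-run that finds the address object present.
    println!("{:?}", tolerate_already(false, "ipadm: Address object already exists"));
}
```
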

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Since mgd images in falcon-lab do not ship with the control plane, we
have to manually poke and prod dendrite to install entries into softnpu.
This includes uplink addresses we intend to use for unicast, e.g. BFD
and BGP (although unnumbered BGP works because softnpu has a punt-to-CPU
catch-all for link-local traffic).

Fix the syntax to enable BFD for static routes on EOS. Also fix up the
parsing of EOS "show bfd peers | json" output.

Adds falcon-lab/falcon-lab symlink so there's a standard directory to
run the tests from locally.
Adds a falcon-lab/cargo-bay directory for dumping binaries to be used by
falcon-lab tests.
Adds a falcon-lab/cargo-bay/.gitignore that ignores everything except
for the .gitignore itself, so nothing gets tracked by git inadvertently.

Adds validation of routes in loc_rib ("selected") and ASIC (dpd).
Checks FSM state of both peers, not just the 0th one.
Check for specific prefixes when looking in RIB (on both local/peer).
Bumps bestpath fanout to 2 so we can validate ECMP in loc_rib/dpd.

Cleans up the FRR config a little bit by replacing a bunch of ALLOW-ALL
boilerplate with "no bgp ebgp-requires-policy".

@taspelund taspelund requested a review from rcgoodfellow April 22, 2026 06:41
@taspelund taspelund self-assigned this Apr 22, 2026
@taspelund taspelund added the bfd (Bidirectional Forwarding Detection), mgd (Maghemite daemon), and rust (Pull requests that update rust code) labels Apr 22, 2026
Comment thread falcon-lab/src/test.rs
Ok(())
}

pub async fn run_trio_bfd_static_test(
Collaborator


This is awesome. Really stoked to see this e2e test come together.

@taspelund taspelund merged commit ef72a45 into main Apr 24, 2026
15 checks passed
@taspelund taspelund deleted the trey/bfd-cleanup branch April 24, 2026 04:26
@taspelund taspelund changed the title BFD and falcon-lab cleanup BFD and falcon-lab cleanup [Spring Cleaning 1/N] Apr 25, 2026



2 participants