Skip to content

smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648

Open
ben-malbeclabs wants to merge 3 commits intomainfrom
bc/rfc18-devnet-fixes
Open

smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648
ben-malbeclabs wants to merge 3 commits intomainfrom
bc/rfc18-devnet-fixes

Conversation

@ben-malbeclabs
Copy link
Copy Markdown
Contributor

@ben-malbeclabs ben-malbeclabs commented May 1, 2026

Summary

  • Assign flex-algo node segments atomically during VPNv4 loopback creation — no manual migration step needed for new/reprovisioned devices
  • Stamp the default UNICAST-DEFAULT topology color for tenantless users in the controller's community stamping
  • Rename BackfillTopologyAssignTopologyNodeSegments across the codebase (instruction discriminant 110 unchanged)

Context

Two issues discovered during the RFC-18 devnet deployment:

1. Missing flex-algo node segments on new devices. The activator used to call BackfillTopology after activating a VPNv4 loopback, assigning FlexAlgoNodeSegment entries for each existing topology. The activator was removed (#3607), and this responsibility was never ported. New or reprovisioned devices (e.g., dzd4 via the daily provisioning test) were missing node segments, causing the controller to log flex_algo.enabled=true but VPNv4 loopback has no flex_algo_node_segments.

The fix extends the onchain CreateDeviceInterface instruction to accept optional topology PDA accounts. For VPNv4 loopbacks with onchain allocation, it allocates a FlexAlgoNodeSegment for each topology in the same transaction. The CLI automatically discovers existing topologies and passes them, so doublezero device interface create ... --loopback-type vpn-ipv4 is fully self-contained.

2. Community stamping skipping tenantless users. resolveTenantColors was only called inside the tenant lookup branch. All 985 unicast users on mainnet and 474/535 on testnet have no tenant, so they were silently getting no color. The fix adds an else branch that resolves the default UNICAST-DEFAULT color for tenantless users.

Related RFC: rfcs/rfc-0018-flex-algo.md

Testing Verification

  • cargo test -p doublezero-serviceability -p doublezero_sdk -p doublezero_cli — all passing
  • Existing BackfillTopology / AssignTopologyNodeSegments integration tests still pass (idempotency, non-foundation rejection, SR ID allocation)
  • Verified mainnet has 0 users with tenants, testnet has 40/535 — confirming the community stamping gap

Community stamping only resolved colors for users with a tenant
assigned. Since most users (all of mainnet, most of testnet) have no
tenant, they were silently skipped. Add an else branch that calls
resolveTenantColors with nil includeTopologies, which falls through to
the default UNICAST-DEFAULT color.
When creating a VPNv4 loopback with onchain allocation, the instruction
now accepts optional topology PDA accounts and allocates a
FlexAlgoNodeSegment for each. This replaces the activator's post-
activation backfill that was lost during the activator removal.

The CLI automatically discovers existing topologies and passes them to
the instruction, so `doublezero device interface create ... --loopback-type
vpn-ipv4` assigns all node segments atomically — no manual steps.

Also rename BackfillTopology → AssignTopologyNodeSegments across the
codebase (instruction discriminant 110 unchanged).
@ben-malbeclabs ben-malbeclabs changed the title controller: stamp default topology color for tenantless users smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping May 1, 2026
Copy link
Copy Markdown
Contributor

@elitegreg elitegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

@ben-malbeclabs
Copy link
Copy Markdown
Contributor Author

ben-malbeclabs commented May 2, 2026

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

It would be a decent limit. I can't imagine us having that many.

I think other technologies become more interesting for us without have 10s of Flex-Algo topologies.

Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve?

"Invalid Topology Account Owner"
);
assert!(!topo_account.data_is_empty(), "Topology account is empty");
let topo_segment_idx = allocate_id(segment_routing_ids_ext)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocated node segment ids should be deallocated in the interface/remove instruction.

@elitegreg
Copy link
Copy Markdown
Contributor

This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?

It would be a decent limit. I can't imagine us having that many.

I think other technologies become more interesting for us without have 10s of Flex-Algo topologies.

Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve?

Difficult to achieve? Probably what we would need to do is come up with a more flexible implementation and create new version of the create interface command and retire the old one (this latter case is a big deal, with the activator instructions being the first instructions we are actually retiring). I think I would feel better if we had an instruction for adding a flex algo to an interface, and the having the interface create sdk (client side) call create interface, followed by iteratively adding the flex algo (each would allocate a segment routing id). My preference would be to do it this way now and not bork up create interface in a way that may only be temporary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants