smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648
smartcontract,controller: RFC-18 devnet fixes — node segment assignment and community stamping#3648ben-malbeclabs wants to merge 3 commits intomainfrom
Conversation
Community stamping only resolved colors for users with a tenant assigned. Since most users (all of mainnet, most of testnet) have no tenant, they were silently skipped. Add an else branch that calls resolveTenantColors with nil includeTopologies, which falls through to the default UNICAST-DEFAULT color.
When creating a VPNv4 loopback with onchain allocation, the instruction now accepts optional topology PDA accounts and allocates a FlexAlgoNodeSegment for each. This replaces the activator's post- activation backfill that was lost during the activator removal. The CLI automatically discovers existing topologies and passes them to the instruction, so `doublezero device interface create ... --loopback-type vpn-ipv4` assigns all node segments atomically — no manual steps. Also rename BackfillTopology → AssignTopologyNodeSegments across the codebase (instruction discriminant 110 unchanged).
2b95b4c to
f15af55
Compare
elitegreg
left a comment
There was a problem hiding this comment.
This solution pretty much requires that you never have more than about single digit or so number of topologies. As written today, it could be up to 25, but that assumes there is room in the packet, which is why I'm saying 8 or so. Is this a reasonable limitation? It seems to me like it is not as we may have several topologies per tenant?
It would be a decent limit. I can't imagine us having that many. I think other technologies become more interesting for us without have 10s of Flex-Algo topologies. Worst case, if we did need more than 8, or even more than 25, is that difficult to achieve? |
| "Invalid Topology Account Owner" | ||
| ); | ||
| assert!(!topo_account.data_is_empty(), "Topology account is empty"); | ||
| let topo_segment_idx = allocate_id(segment_routing_ids_ext)?; |
There was a problem hiding this comment.
allocated node segment ids should be deallocated in the interface/remove instruction.
Difficult to achieve? Probably what we would need to do is come up with a more flexible implementation and create new version of the create interface command and retire the old one (this latter case is a big deal, with the activator instructions being the first instructions we are actually retiring). I think I would feel better if we had an instruction for adding a flex algo to an interface, and the having the interface create sdk (client side) call create interface, followed by iteratively adding the flex algo (each would allocate a segment routing id). My preference would be to do it this way now and not bork up create interface in a way that may only be temporary. |
Summary
BackfillTopology→AssignTopologyNodeSegmentsacross the codebase (instruction discriminant 110 unchanged)Context
Two issues discovered during the RFC-18 devnet deployment:
1. Missing flex-algo node segments on new devices. The activator used to call
BackfillTopologyafter activating a VPNv4 loopback, assigningFlexAlgoNodeSegmententries for each existing topology. The activator was removed (#3607), and this responsibility was never ported. New or reprovisioned devices (e.g., dzd4 via the daily provisioning test) were missing node segments, causing the controller to logflex_algo.enabled=true but VPNv4 loopback has no flex_algo_node_segments.The fix extends the onchain
CreateDeviceInterfaceinstruction to accept optional topology PDA accounts. For VPNv4 loopbacks with onchain allocation, it allocates aFlexAlgoNodeSegmentfor each topology in the same transaction. The CLI automatically discovers existing topologies and passes them, sodoublezero device interface create ... --loopback-type vpn-ipv4is fully self-contained.2. Community stamping skipping tenantless users.
resolveTenantColorswas only called inside the tenant lookup branch. All 985 unicast users on mainnet and 474/535 on testnet have no tenant, so they were silently getting no color. The fix adds an else branch that resolves the default UNICAST-DEFAULT color for tenantless users.Related RFC:
rfcs/rfc-0018-flex-algo.mdTesting Verification
cargo test -p doublezero-serviceability -p doublezero_sdk -p doublezero_cli— all passingBackfillTopology/AssignTopologyNodeSegmentsintegration tests still pass (idempotency, non-foundation rejection, SR ID allocation)