Problem
PR #369 (merged 2026-05-29) moved `node_id` resolution for the `LabelPeerSource` path from seictl's sidecar (a `:26657/status` query) into the controller. The controller now writes fully composed `<node_id>@<host>:<port>` strings to `Status.ResolvedPeers` and feeds them to the planner via `sidecar.PeerSourceStatic`. The `EC2TagsPeerSource` path is still split: the controller resolves the host list, but the sidecar still queries each peer's `:26657/status` for `node_id` at config-render time, via `sidecar.PeerSourceDNSEndpoints`.
This is the split shape #368 identified as load-bearing. One controller now carries mixed semantics: Label peers keep their prior entries on a transient failure and EC2Tag peers don't; Label peers survive a mass restart, while EC2Tag peers re-render with the same fragility on every reconcile.
Impact
This affects sei-infra peer discovery: pacific-1 validator nodes peer with sei-infra-managed peers via EC2 tag selectors, and those peer relationships should get the same controller-side resilience that #369 shipped for Labels. While `DNSEndpointsSource` remains live, the EC2Tag side carries the pre-#369 failure modes:
- The sidecar's config-render task queries `:26657/status` on each peer at task-execution time; a peer that is mid-restart drops out of the rendered `persistent_peers`, and the gap persists until the next config-render task fires.
- There is no prior-entry preservation; each render is a fresh resolution against current peer reachability.
- The EC2 tag query (controller) and the DNS query (sidecar) run at different times, so membership can drift between what the controller thinks the peer set is and what gets written to `config.toml`.
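Proposed approach (to refine)
Mirror the #369 pattern for EC2Tags:
- The controller's `reconcilePeers` learns an `EC2TagsPeerSource` branch alongside the existing Label branch. It resolves EC2-tagged instances to `host:port`, then calls the per-peer sidecar gRPC `GetNodeID` for each, with the same per-peer best-effort semantics (preserve the prior entry on a transient sidecar failure; skip a new peer with a structured log).
- Compose `<node_id>@<host>:<port>` into `Status.ResolvedPeers`, in the same wire format as the Label path.
- The planner maps EC2Tags to `sidecar.PeerSourceStatic` (same as Labels post-#369), retiring `sidecar.PeerSourceDNSEndpoints` from the resolver→sidecar contract.
- Once both Label and EC2Tags use the static path, the seictl-side `DNSEndpointsSource` handler is dead code; remove it from the seictl repo as a follow-up.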
Open question for the experts: for EC2-tagged peers outside the K8s cluster (sei-infra-managed), the controller can't dial a per-peer sidecar gRPC the way it does for in-cluster SeiNodes. Resolution path options:
(a) Gate the controller-side `GetNodeID` on "is the peer an in-cluster SeiNode"; fall back to leaving `DNSEndpointsSource` live for out-of-cluster EC2 peers.
(b) Controller queries `:26657/status` directly for out-of-cluster peers.
(c) Require sei-infra peers to also publish `node_id` via a discoverable surface (tag, S3, on-chain).
Worth a coral round before implementation.
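If option (b) is chosen, the controller-side query is small. This sketch assumes Sei's RPC keeps CometBFT's `/status` layout, where the node ID sits at `result.node_info.id`; `fetchNodeID`, `parseNodeID`, the 5-second timeout, and the 1 MiB body cap are illustrative choices, not existing code.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// statusResponse captures only result.node_info.id from the /status reply
// (CometBFT layout; assumed unchanged in Sei).
type statusResponse struct {
	Result struct {
		NodeInfo struct {
			ID string `json:"id"`
		} `json:"node_info"`
	} `json:"result"`
}

// parseNodeID extracts node_info.id from a /status response body.
func parseNodeID(body []byte) (string, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return "", fmt.Errorf("decode /status: %w", err)
	}
	if s.Result.NodeInfo.ID == "" {
		return "", fmt.Errorf("/status: empty node_info.id")
	}
	return s.Result.NodeInfo.ID, nil
}

// fetchNodeID is a hypothetical controller-side helper for option (b):
// GET http://<host>:26657/status with a bounded timeout and body size.
func fetchNodeID(ctx context.Context, client *http.Client, host string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	url := "http://" + net.JoinHostPort(host, "26657") + "/status"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("GET %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: HTTP %d", url, resp.StatusCode)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return "", err
	}
	return parseNodeID(body)
}

func main() {
	// Trimmed /status reply; real replies carry many more fields.
	sample := []byte(`{"jsonrpc":"2.0","id":-1,"result":{"node_info":{"id":"0123456789abcdef0123456789abcdef01234567"}}}`)
	id, err := parseNodeID(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(id + "@" + net.JoinHostPort("10.0.1.5", "26656"))
	// prints 0123456789abcdef0123456789abcdef01234567@10.0.1.5:26656
}
```

The `/status` answer is self-reported by the peer, so this is no more authoritative than the sidecar query today; it only moves the query into the controller, where the same prior-entry preservation as the Label path could wrap it. A wrong ID should fail closed at dial time, since CometBFT's authenticated P2P handshake rejects a peer whose key does not match the `node_id` in the address.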
Out of scope
- Anything that changes the `Spec.Peers` user surface (the `ec2Tags` / `static` / `label` union stays).
- Genesis ceremony peer logic (controller-side genesis assembly is a different code path).
- Drain policy for stale `Status.ResolvedPeers` entries (a separate concern, deferred until production signals warrant it).
Relevant experts
- platform-engineer: `DNSEndpointsSource` retirement on the seictl side; the out-of-cluster identity-resolution question.
- sei-network-specialist: CometBFT `node_id` resolution for out-of-cluster (sei-infra) peers; whether a direct `:26657/status` query is acceptable or whether identity should come from a more authoritative surface.
References
- … `Status.ResolvedPeers`
- `internal/controller/node/peers.go`: `resolveLabelPeers`; the pattern to mirror for EC2Tags
- `internal/planner/planner.go`: Label→Static branch; the EC2Tags branch currently maps to `PeerSourceDNSEndpoints`
- `seictl` repo: `DNSEndpointsSource` handler (to retire once both controller branches use Static)