Bootstrap and replication fixes #259
base: master
Conversation
Reviewed the latest changes (documentation updates and logging level adjustment). No issues found.
Pull request overview
This PR addresses critical P2P replication and routing issues in a Kademlia DHT implementation by introducing bounded replication windows and routing-eligibility gating, and by reducing log noise.
Changes:
- Adds chain-state-based routing eligibility checks to prevent non-active supernodes from participating in the DHT routing table
- Implements bounded replication key scanning (200k keys per cycle) with stable cursor advancement to enable incremental progress on large backlogs (see the sketch after this list)
- Refactors replication key assignment to use one-pass logic instead of per-peer re-scanning, reducing complexity and resource usage
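A minimal sketch of the bounded-window idea, under assumed names: the `KeyWithTimestamp` struct, `keyScanner` interface, and `replicationCursor` type below are illustrative stand-ins, not the repository's actual API.

```go
package replication

import (
	"context"
	"time"
)

// Illustrative stand-in for the store's bounded replication query result.
type KeyWithTimestamp struct {
	Key       string
	CreatedAt time.Time
}

type keyScanner interface {
	// ScanKeys returns at most maxKeys keys created after `from`,
	// ordered by (createdAt, key).
	ScanKeys(ctx context.Context, from time.Time, maxKeys int) ([]KeyWithTimestamp, error)
}

const replicationScanLimit = 200_000 // bounded window per replication cycle

type replicationCursor struct {
	lastSeen time.Time // upper bound of the previously processed window
}

// nextWindow fetches one bounded window and advances the cursor only as far
// as the newest key actually returned, so a large backlog drains incrementally
// across cycles without skipping keys.
func (c *replicationCursor) nextWindow(ctx context.Context, s keyScanner) ([]KeyWithTimestamp, error) {
	keys, err := s.ScanKeys(ctx, c.lastSeen, replicationScanLimit)
	if err != nil {
		return nil, err
	}
	if len(keys) > 0 {
		c.lastSeen = keys[len(keys)-1].CreatedAt
	}
	return keys, nil
}
```

Because the cursor only advances to the `createdAt` of the last key returned, the store-side query has to include every remaining key that shares that boundary timestamp; that is the limit-extension behaviour discussed in the review comments below.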
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| p2p/kademlia/bootstrap.go | Returns active supernode IDs as allowlist from chain state and triggers routing table pruning |
| p2p/kademlia/dht.go | Implements routing eligibility gating infrastructure and filters ineligible nodes from routing operations |
| p2p/kademlia/node_activity.go | Adds eligibility checks to node activity monitoring and ping success handlers |
| p2p/kademlia/replication.go | Implements bounded key scanning, one-pass key assignment, and stable window-based cursor advancement |
| p2p/kademlia/store.go | Updates interface to accept maxKeys parameter for bounded queries |
| p2p/kademlia/store/mem/mem.go | Updates signature to match new interface |
| p2p/kademlia/store/sqlite/replication.go | Implements bounded query with limit extension for same-timestamp keys and context cancellation checks |
| p2p/kademlia/store/sqlite/sqlite.go | Reduces log noise by treating context timeout/cancellation as Info instead of Error |
| p2p/kademlia/redundant_data.go | Updates call to use unlimited keys for cleanup operations |
| docs/p2p-replication-and-routing-changes-2026-01-13.md | Comprehensive documentation of changes |
```go
}
}

if err := ctx.Err(); err != nil {
```
Copilot AI · Jan 15, 2026
The context error check at line 295 occurs after all database operations complete. If the context is cancelled during the database query, the query still runs to completion before this check detects it. Consider checking ctx.Err() before executing queries, or passing ctx into the query execution (if the database driver supports it) to enable early cancellation.
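A hedged sketch of that suggestion, assuming `database/sql`; the `getKeysForReplication` wrapper and the main query text are illustrative, not the store's actual code (only the same-timestamp `extraQuery` is visible in the diff).

```go
package sqlite

import (
	"context"
	"database/sql"
	"time"
)

// getKeysForReplication is an illustrative wrapper: check ctx before running
// the query and hand ctx to the driver so cancellation can abort it mid-flight.
func getKeysForReplication(ctx context.Context, db *sql.DB, from time.Time, maxKeys int) (*sql.Rows, error) {
	// Bail out before touching the database if the caller already gave up.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	const q = `SELECT key, createdAt FROM data WHERE createdAt > ? ORDER BY createdAt ASC, key ASC LIMIT ?`
	// QueryContext propagates ctx to the driver, so a cancelled context can
	// interrupt the in-flight query instead of letting it run to completion.
	return db.QueryContext(ctx, q, from, maxKeys)
}
```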
```go
if len(results) == maxKeys {
	boundAt := results[len(results)-1].CreatedAt
	boundKey := results[len(results)-1].Key

	var extra []domain.KeyWithTimestamp
	extraQuery := `SELECT key, createdAt FROM data WHERE createdAt = ? AND key > ? ORDER BY key ASC`
```
Copilot AI · Jan 15, 2026
The limit extension logic for same-timestamp keys (lines 269-285) is a critical correctness mechanism but lacks test coverage. This logic ensures that when a limit is hit, all keys with the same createdAt as the boundary are included to prevent cursor advancement from skipping keys. Add test cases covering: (1) limit hit with no additional same-timestamp keys, (2) limit hit with multiple additional same-timestamp keys, and (3) proper ordering when keys have identical timestamps.
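A minimal, self-contained sketch of the first two cases; `extendSameTimestamp` is a stand-in that only models the property under test (every remaining key sharing the boundary createdAt must end up in the window), not the sqlite store's real query path.

```go
package sqlite_test

import (
	"testing"
	"time"
)

type keyTS struct {
	Key       string
	CreatedAt time.Time
}

// Stand-in for the store-side extension logic: given a window that hit the
// limit, pull in remaining keys that share the boundary createdAt so cursor
// advancement cannot skip them.
func extendSameTimestamp(window, rest []keyTS) []keyTS {
	if len(window) == 0 {
		return window
	}
	bound := window[len(window)-1].CreatedAt
	for _, k := range rest {
		if k.CreatedAt.Equal(bound) {
			window = append(window, k)
		}
	}
	return window
}

func TestLimitHitWithNoExtraSameTimestampKeys(t *testing.T) {
	ts := time.Date(2026, 1, 13, 0, 0, 0, 0, time.UTC)
	window := []keyTS{{Key: "a", CreatedAt: ts.Add(-time.Hour)}, {Key: "b", CreatedAt: ts}}
	rest := []keyTS{{Key: "c", CreatedAt: ts.Add(time.Hour)}}

	if got := extendSameTimestamp(window, rest); len(got) != 2 {
		t.Fatalf("window should be unchanged, got %d keys", len(got))
	}
}

func TestLimitHitWithMultipleExtraSameTimestampKeys(t *testing.T) {
	ts := time.Date(2026, 1, 13, 0, 0, 0, 0, time.UTC)
	window := []keyTS{{Key: "a", CreatedAt: ts}}
	rest := []keyTS{{Key: "b", CreatedAt: ts}, {Key: "c", CreatedAt: ts}}

	if got := extendSameTimestamp(window, rest); len(got) != 3 {
		t.Fatalf("expected all same-timestamp keys to be included, got %d", len(got))
	}
}
```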
```go
assignedKeys := make(map[string][]string, len(peerStart))
for i := 0; i < len(replicationKeys); i++ {
	decKey, _ := hex.DecodeString(replicationKeys[i].Key)
```
Copilot AI · Jan 15, 2026
The hex.DecodeString error is silently ignored on line 269. If the key cannot be decoded, it will silently produce an empty/incorrect byte slice and potentially cause incorrect routing decisions. Consider logging a warning or skipping keys that fail to decode.
Suggested change:

```diff
-decKey, _ := hex.DecodeString(replicationKeys[i].Key)
+decKey, err := hex.DecodeString(replicationKeys[i].Key)
+if err != nil {
+	continue
+}
```
```go
if id == "" {
	continue
}
if h, err := utils.Blake3Hash([]byte(id)); err == nil && len(h) == 32 {
```
Copilot AI · Jan 15, 2026
The Blake3Hash error is checked but not logged when it fails. If hashing fails for a supernode ID, that node will be silently excluded from the routing allowlist, which could cause unexpected behavior. Consider logging a warning when hash computation fails for debugging purposes.
Suggested change:

```diff
-if h, err := utils.Blake3Hash([]byte(id)); err == nil && len(h) == 32 {
+h, err := utils.Blake3Hash([]byte(id))
+if err != nil {
+	logtrace.Warn(ctx, "Failed to compute Blake3 hash for supernode ID", logtrace.Fields{
+		logtrace.FieldModule: "p2p",
+		logtrace.FieldError:  err.Error(),
+		"supernode":          sn.SupernodeAccount,
+	})
+} else if len(h) == 32 {
```
```go
if os.Getenv("INTEGRATION_TEST") == "true" {
	return
}
```
Copilot AI · Jan 15, 2026
The INTEGRATION_TEST environment variable check is repeated in multiple functions (setRoutingAllowlist, eligibleForRouting, filterEligibleNodes, pruneIneligibleRoutingPeers, addNode). Consider extracting this to a helper function or storing it as a boolean field in the DHT struct during initialization to avoid repeated environment variable lookups and improve maintainability.
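One possible shape for that refactor, sketched with illustrative names (`dhtGate`, `newDHTGate`) rather than the package's real types:

```go
package kademlia

import "os"

// Read the integration-test flag once at construction instead of calling
// os.Getenv in every routing-eligibility helper.
type dhtGate struct {
	integrationTest bool
}

func newDHTGate() *dhtGate {
	return &dhtGate{integrationTest: os.Getenv("INTEGRATION_TEST") == "true"}
}

// gatingEnabled reports whether routing-eligibility gating applies; it is
// skipped entirely under integration tests, matching the current behaviour.
func (g *dhtGate) gatingEnabled() bool {
	return !g.integrationTest
}
```

Caching the flag does mean a change to INTEGRATION_TEST at runtime would no longer be picked up, which is presumably acceptable for a test-only switch.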
```go
logtrace.Warn(ctx, "routing allowlist update skipped (empty)", logtrace.Fields{
	logtrace.FieldModule: "p2p",
})
```
Copilot AI · Jan 15, 2026
The empty allowlist check on line 146 prevents lockout but could mask legitimate scenarios where no active supernodes exist in the chain state. Consider differentiating between 'no allowlist available yet' (bootstrap phase) and 'zero active supernodes returned from chain' (potential chain state issue) with different log levels or messages.
Suggested change:

```diff
-logtrace.Warn(ctx, "routing allowlist update skipped (empty)", logtrace.Fields{
-	logtrace.FieldModule: "p2p",
-})
+// Differentiate between bootstrap (no allowlist applied yet) and a potential
+// chain-state issue (we previously had an allowlist, but now see zero peers).
+if !s.routingAllowReady.Load() {
+	logtrace.Info(ctx, "routing allowlist from chain is empty; assuming bootstrap phase and leaving gating disabled", logtrace.Fields{
+		logtrace.FieldModule: "p2p",
+	})
+} else {
+	logtrace.Warn(ctx, "routing allowlist update skipped: chain returned zero active supernodes; retaining previous allowlist", logtrace.Fields{
+		logtrace.FieldModule: "p2p",
+	})
+}
```
```go
peerStart := make(map[string]time.Time)
for _, info := range repInfo {
	if !info.Active {
		continue
	}
	start := historicStart
	if info.LastReplicatedAt != nil {
		start = *info.LastReplicatedAt
	}
	peerStart[string(info.ID)] = start
}
```
Copilot AI · Jan 15, 2026
The new one-pass key assignment logic (lines 255-280) represents a significant algorithmic change from the previous per-peer scanning approach. This core replication logic lacks adequate test coverage to verify correctness of key assignment across multiple peers with different lastReplicatedAt values, especially edge cases like keys at window boundaries or peers with no prior replication history.
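A hedged sketch of one such test. `assignKeysOnePass` below is a stand-in that only models the window check against `peerStart` (it omits the XOR-distance selection the real code performs after hex-decoding each key), so the test focuses on what the comment highlights: a peer with no prior replication history, a caught-up peer, and a key sitting exactly on the window boundary.

```go
package kademlia_test

import (
	"testing"
	"time"
)

type keyTS struct {
	Key       string
	CreatedAt time.Time
}

// Stand-in for the one-pass assignment: walk the key window once and append
// each key to every peer whose replication window starts at or before the
// key's createdAt.
func assignKeysOnePass(keys []keyTS, peerStart map[string]time.Time) map[string][]string {
	assigned := make(map[string][]string, len(peerStart))
	for _, k := range keys {
		for peerID, start := range peerStart {
			if !k.CreatedAt.Before(start) {
				assigned[peerID] = append(assigned[peerID], k.Key)
			}
		}
	}
	return assigned
}

func TestOnePassAssignmentRespectsPeerWindows(t *testing.T) {
	historicStart := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	recent := time.Date(2026, 1, 13, 0, 0, 0, 0, time.UTC)

	peerStart := map[string]time.Time{
		"peer-new":    historicStart, // no prior replication history
		"peer-caught": recent,        // LastReplicatedAt == recent
	}
	keys := []keyTS{
		{Key: "old", CreatedAt: recent.Add(-time.Hour)},
		{Key: "boundary", CreatedAt: recent}, // exactly on the window boundary
	}

	got := assignKeysOnePass(keys, peerStart)
	if len(got["peer-new"]) != 2 {
		t.Fatalf("peer with no history should receive both keys, got %v", got["peer-new"])
	}
	if len(got["peer-caught"]) != 1 || got["peer-caught"][0] != "boundary" {
		t.Fatalf("caught-up peer should only receive the boundary key, got %v", got["peer-caught"])
	}
}
```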
```go
if !s.eligibleForRouting(node) {
	if info.Active {
		s.removeNode(ctx, node)
		if uerr := s.store.UpdateIsActive(ctx, string(node.ID), false, false); uerr != nil {
```
Copilot AI · Jan 15, 2026
The eligibility check followed by removeNode and UpdateIsActive (lines 56-66 and 139-152) could create a race condition if the routing allowlist is updated between the eligibility check and the database update. Consider adding proper synchronization or documenting that this is acceptable given the bootstrap refresh cadence.
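A minimal sketch of one way to make the check-then-update sequence race-aware, assuming an illustrative `allowlistState` wrapper rather than the DHT's actual fields: take a snapshot plus a version before the eligibility check, and only persist the deactivation if the version is unchanged.

```go
package kademlia

import "sync"

// Illustrative allowlist holder; the real DHT stores this differently.
type allowlistState struct {
	mu      sync.RWMutex
	version uint64
	allowed map[string]struct{}
}

// snapshot returns the current allowlist together with its version.
func (a *allowlistState) snapshot() (map[string]struct{}, uint64) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.allowed, a.version
}

// stillCurrent reports whether the allowlist has been replaced since the
// given version was observed.
func (a *allowlistState) stillCurrent(version uint64) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.version == version
}

// update installs a new allowlist and bumps the version.
func (a *allowlistState) update(allowed map[string]struct{}) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.allowed = allowed
	a.version++
}
```

The caller would capture the version before eligibleForRouting and skip the UpdateIsActive write if stillCurrent returns false; given the bootstrap refresh cadence, simply documenting the race as acceptable is a reasonable alternative, as the comment notes.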