fix(core/txpool): coordinate reset lifecycle and shutdown signaling #28837 by gzliudan · Pull Request #2132 · XinFinOrg/XDPoSChain

gzliudan · 2026-03-04T03:48:05Z

Proposed changes

Improve txpool loop synchronization around background resets.

This change:

- adds an explicit termination channel to signal pool shutdown
- tracks forced-reset intent and a waiter channel inside the reset loop
- ensures reset waiters are notified on completion or on pool termination
- allows an explicit sync request path to trigger an additional reset round when needed

Scope is limited to internal txpool concurrency control in core/txpool/txpool.go, with no protocol or RPC behavior change.

Ref: ethereum#28837

Checklist

Put an ✅ in the boxes once you have confirmed the actions below (or give reasons for not doing so):

This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
Tested the backwards compatibility.
Tested with XDC nodes running this version co-exist with those running the previous version.
Relevant documentation has been updated as part of this PR
N/A

coderabbitai · 2026-03-04T03:48:16Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: caebb303-9132-4825-8169-c8a55407638d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Copilot

Pull request overview

This PR improves internal synchronization within the core/txpool/txpool.go event loop, specifically around background reset lifecycle management and shutdown signaling. It introduces a termination channel (term) to signal when the pool has stopped, a sync channel for simulator/testing use to force synchronous reset completion, and resetForced/resetWaiter state variables to track pending forced resets and notify waiters upon completion or pool termination.

Changes:

- Added term chan struct{} field to signal pool termination (closed via defer in loop())
- Added sync chan chan error field plus resetForced / resetWaiter state to support forced-reset synchronization for simulator use
- Ensured active reset waiters are notified with an error upon pool shutdown
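
To make the interplay of these pieces concrete, here is a minimal sketch of the pattern, not the PR's code. The channel names quit, term and sync and the resetForced / resetWaiter variables follow the description above; toyPool, newToyPool, Close and the fake "reset" goroutine are hypothetical stand-ins, and chain-head-triggered resets are left out entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// toyPool is a hypothetical stand-in for TxPool. Only the channel names
// (quit, term, sync) follow the PR; everything else is invented here.
type toyPool struct {
	quit chan chan error // shutdown request, answered once the loop stops
	term chan struct{}   // closed when the loop has exited
	sync chan chan error // forced-reset requests from Sync()
}

func newToyPool() *toyPool {
	p := &toyPool{
		quit: make(chan chan error),
		term: make(chan struct{}),
		sync: make(chan chan error),
	}
	go p.loop()
	return p
}

func (p *toyPool) loop() {
	defer close(p.term) // registered first, so it runs last

	var (
		resetBusy   = make(chan struct{}, 1) // at most one reset in flight
		resetDone   = make(chan struct{}, 1) // buffered so a late reset never leaks
		resetForced bool                     // a forced reset is still owed
		resetWaiter chan error               // caller waiting on that reset
	)
	// Release a pending waiter if the loop exits before its reset completes.
	defer func() {
		if resetWaiter != nil {
			resetWaiter <- errors.New("pool already terminated")
		}
	}()
	var errc chan error
	for errc == nil {
		if resetForced {
			select {
			case resetBusy <- struct{}{}:
				go func() { resetDone <- struct{}{} }() // stand-in for the real reset
				resetForced = false
			default: // a reset is already running; the forced one stays pending
			}
		}
		select {
		case <-resetDone:
			<-resetBusy
			if resetWaiter != nil && !resetForced {
				resetWaiter <- nil
				resetWaiter = nil
			}
		case errc = <-p.quit:
		case syncc := <-p.sync:
			resetForced, resetWaiter = true, syncc
		}
	}
	errc <- nil
}

// Sync requests a forced reset and waits for it, or fails once the pool is gone.
func (p *toyPool) Sync() error {
	waiter := make(chan error)
	select {
	case p.sync <- waiter:
		return <-waiter
	case <-p.term:
		return errors.New("pool already terminated")
	}
}

// Close asks the loop to stop and waits for its acknowledgement.
func (p *toyPool) Close() error {
	errc := make(chan error)
	p.quit <- errc
	return <-errc
}

func main() {
	p := newToyPool()
	fmt.Println("sync:", p.Sync())             // sync: <nil>
	fmt.Println("close:", p.Close())           // close: <nil>
	fmt.Println("sync after close:", p.Sync()) // sync after close: pool already terminated
}
```

The toy keeps the blocking waiter sends and the single resetWaiter slot, so it shares the hazards raised in the review comments that follow.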


Copilot · 2026-03-04T03:52:14Z

core/txpool/txpool.go

+		term:     make(chan struct{}),
+		sync:     make(chan chan error),


The sync field (chan chan error) is added to the TxPool struct and initialized in New(), and the loop consumes from it via case syncc := <-p.sync. However, there is no exported method on TxPool (such as a Sync() error method) that sends to this channel. The channel is therefore inaccessible to any caller outside the package, making the entire sync/forced-reset mechanism completely unreachable from the intended simulator or testing use cases described in the PR. The companion public method is missing from this change.

Copilot · 2026-03-04T03:52:14Z

core/txpool/txpool.go

+	// Notify the live reset waiter to not block if the txpool is closed.
+	defer func() {
+		if resetWaiter != nil {
+			resetWaiter <- errors.New("pool already terminated")


The shutdown defer (lines 143–148) notifies resetWaiter with a blocking send (resetWaiter <- errors.New("pool already terminated")). If the caller that passed its channel through p.sync is no longer receiving (for example, it was cancelled or timed out), the send blocks forever and the loop() goroutine never finishes shutting down, so the pool cannot terminate cleanly. The send should be non-blocking, for example a select with a default branch.

Suggested change

-	// Notify the live reset waiter to not block if the txpool is closed.
-	defer func() {
-		if resetWaiter != nil {
-			resetWaiter <- errors.New("pool already terminated")
+	// Notify the live reset waiter without blocking if the txpool is closed.
+	defer func() {
+		if resetWaiter != nil {
+			select {
+			case resetWaiter <- errors.New("pool already terminated"):
+			default:
+			}
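
As a small illustration of what the select/default form does, the sketch below wraps it in a helper and shows both outcomes. trySend is a hypothetical name used only here; it is not part of the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// trySend attempts to deliver err on ch without blocking. It reports whether
// a parked receiver or a free buffer slot accepted the value.
func trySend(ch chan<- error, err error) bool {
	select {
	case ch <- err:
		return true
	default:
		return false
	}
}

func main() {
	// Unbuffered channel with nobody receiving: the send is skipped.
	abandoned := make(chan error)
	fmt.Println(trySend(abandoned, errors.New("pool already terminated"))) // false

	// A channel with spare buffer capacity accepts the value immediately.
	buffered := make(chan error, 1)
	fmt.Println(trySend(buffered, nil)) // true
}
```

The first case shows the trade-off: the loop can no longer hang, but on an unbuffered channel the notification is simply dropped when no receiver is parked at that instant.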

Copilot · 2026-03-04T03:52:14Z

core/txpool/txpool.go

+			// the forced op is still pending. In that case, wait another round
+			// of resets.
+			if resetWaiter != nil && !resetForced {
+				resetWaiter <- nil


The notification at resetWaiter <- nil (line 194) is a blocking send on an unbuffered chan error. If a Sync() caller has abandoned its wait (e.g., due to context cancellation or timeout), this send will permanently block the main loop goroutine, preventing it from processing any further chain head events or shutdown signals. The send should be non-blocking (using a select with default) to avoid stalling the event loop.

Suggested change

-				resetWaiter <- nil
+				select {
+				case resetWaiter <- nil:
+					// notification delivered
+				default:
+					// no active listener; avoid blocking the event loop
+				}
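
One possible refinement, offered as an assumption rather than something the PR or the review proposes: if the caller creates its waiter channel with capacity 1, the loop's send can neither block nor be lost, even when the caller has not reached its receive yet. notifyOnce is a hypothetical helper name.

```go
package main

import "fmt"

// notifyOnce delivers err to a waiter channel that is assumed to have been
// created with capacity 1. The first result is parked in the buffer; any
// further attempt is skipped instead of blocking the event loop.
func notifyOnce(waiter chan error, err error) {
	select {
	case waiter <- err:
	default: // the buffer already holds a result
	}
}

func main() {
	waiter := make(chan error, 1)

	// The loop finishes its reset before the caller starts receiving.
	notifyOnce(waiter, nil)

	// The caller arrives late and still finds the result waiting.
	fmt.Println(<-waiter) // <nil>
}
```

With this shape an abandoned waiter costs only one buffered value, which is garbage-collected together with the channel.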

Copilot · 2026-03-04T03:52:14Z

core/txpool/txpool.go


 	subs event.SubscriptionScope // Subscription scope to unsubscribe all on shutdown
 	quit chan chan error         // Quit channel to tear down the head updater
+	term chan struct{}           // Termination channel to detect a closed pool


The term channel is declared in the TxPool struct, initialized in New(), and closed via defer close(p.term) in loop(), but it is never read anywhere in the codebase. There are no callers selecting on p.term to detect pool termination. As written, this field has no observable effect and is dead code. Either the channel should be exposed (e.g., via a Terminated() method that returns the channel for callers to select on), or it should be removed to avoid confusion.
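
A minimal sketch of the accessor this comment suggests. The Terminated name comes from the comment itself; the pool type and the isTerminated helper are hypothetical reductions for illustration.

```go
package main

import "fmt"

// pool is a hypothetical reduction of TxPool to the one field discussed here.
type pool struct {
	term chan struct{} // closed when the pool's loop exits
}

// Terminated exposes term as a receive-only channel, so callers can select
// on it but cannot close it or send on it.
func (p *pool) Terminated() <-chan struct{} {
	return p.term
}

// isTerminated polls the channel without blocking.
func isTerminated(p *pool) bool {
	select {
	case <-p.Terminated():
		return true
	default:
		return false
	}
}

func main() {
	p := &pool{term: make(chan struct{})}
	fmt.Println(isTerminated(p)) // false

	close(p.term) // what `defer close(p.term)` does when loop() returns
	fmt.Println(isTerminated(p)) // true
}
```

Because a closed channel is always ready to receive, any number of callers can observe termination, at any time after the fact.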

Copilot · 2026-03-04T03:52:15Z

core/txpool/txpool.go

+			// that any running reset operation finishes to make block imports
+			// deterministic. On top of that, run a new reset operation to make
+			// transaction insertions deterministic instead of being stuck in a
+			// queue waiting for a reset.


When a new sync request arrives via case syncc := <-p.sync, the code unconditionally assigns resetWaiter = syncc. If a previous resetWaiter was already set (from an earlier sync request that has not yet been fulfilled), it is silently overwritten and the previous caller goroutine that is blocked waiting on its channel will never receive a notification, causing an indefinite goroutine leak/hang.

Before overwriting resetWaiter, the prior waiter should be notified (e.g., with an appropriate error) to prevent the leak.

Suggested change

-			// queue waiting for a reset.
+			// queue waiting for a reset.
+			if resetWaiter != nil {
+				// A previous sync waiter is still pending; notify it to avoid
+				// leaking a goroutine waiting on the old channel.
+				resetWaiter <- errors.New("sync request superseded by a new request")
+				resetWaiter = nil
+			}
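
The suggested hand-over can be sketched as a small helper. Note that the suggestion itself uses a blocking send, the pattern the earlier comments warn about; the variant below uses a non-blocking send and assumes waiter channels of capacity 1. replaceWaiter and errSuperseded are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errSuperseded reuses the message from the suggested change.
var errSuperseded = errors.New("sync request superseded by a new request")

// replaceWaiter returns next as the new current waiter. If an older waiter
// is still pending, it is told that it was superseded. Waiter channels are
// assumed to have capacity 1, so the send never blocks the caller.
func replaceWaiter(current, next chan error) chan error {
	if current != nil {
		select {
		case current <- errSuperseded:
		default:
		}
	}
	return next
}

func main() {
	first := make(chan error, 1)
	second := make(chan error, 1)

	var waiter chan error
	waiter = replaceWaiter(waiter, first)  // no previous waiter, nothing sent
	waiter = replaceWaiter(waiter, second) // first is released with an error

	fmt.Println(<-first)          // sync request superseded by a new request
	fmt.Println(waiter == second) // true
}
```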


Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


Copilot · 2026-03-04T04:41:03Z

core/txpool/txpool.go

+func (p *TxPool) Sync() error {
+	sync := make(chan error)
+	select {
+	case p.sync <- sync:
+		return <-sync
+	case <-p.term:
+		return errors.New("pool already terminated")
+	}
+}


The new Sync() method, resetWaiter handling, and term channel shutdown signaling introduce complex concurrency behavior that lacks any unit test coverage. There are no test files at the core/txpool package level. Given the complexity of the added synchronization logic (e.g., forced reset lifecycle, waiter notification on pool termination), adding test cases to verify correct behavior and prevent regressions would be valuable. For example, tests for: (1) Sync() unblocking after a reset completes, (2) Sync() returning an error when the pool is closed, (3) correct waiter notification on pool shutdown.

Copilot · 2026-03-04T04:41:03Z

core/txpool/txpool.go

+func (p *TxPool) Sync() error {
+	sync := make(chan error)
+	select {
+	case p.sync <- sync:
+		return <-sync
+	case <-p.term:
+		return errors.New("pool already terminated")


The string "pool already terminated" is used in two separate errors.New() calls (line 145 in the defer and line 432 in Sync()), resulting in two distinct error instances. The existing pattern in errors.go defines all package-level errors as exported sentinel variables (e.g., ErrAlreadyKnown, ErrTxPoolOverflow), which allows callers to compare with errors.Is(). A sentinel error such as ErrPoolTerminated would be consistent with this codebase convention and easier to compare programmatically.

Copilot · 2026-03-04T04:41:03Z

core/txpool/txpool.go

+	defer func() {
+		if resetWaiter != nil {
+			resetWaiter <- errors.New("pool already terminated")


This errors.New("pool already terminated") and the identical one in Sync() at line 432 are two separate error instances. Replacing both with a shared sentinel variable (e.g., ErrPoolTerminated defined in errors.go) would follow the existing pattern of package-level error variables in this file and make the error checkable via errors.Is().
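
A short sketch of why a shared sentinel matters. ErrPoolTerminated is the name proposed in the comments above and is not confirmed to exist in the codebase; syncAfterShutdown, lookalike and isTerminatedErr are hypothetical helpers for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPoolTerminated is the sentinel proposed in the review (assumed name).
var ErrPoolTerminated = errors.New("pool already terminated")

// syncAfterShutdown stands in for any call path that wraps the sentinel.
func syncAfterShutdown() error {
	return fmt.Errorf("sync failed: %w", ErrPoolTerminated)
}

// lookalike builds a fresh error with the same text, as the two separate
// errors.New calls in the PR do.
func lookalike() error {
	return errors.New("pool already terminated")
}

// isTerminatedErr reports whether err is, or wraps, the sentinel.
func isTerminatedErr(err error) bool {
	return errors.Is(err, ErrPoolTerminated)
}

func main() {
	fmt.Println(isTerminatedErr(syncAfterShutdown())) // true
	fmt.Println(isTerminatedErr(lookalike()))         // false
}
```

The second result is the point of the comments: two errors.New values with identical text are still different values, so callers can only match them by comparing strings.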

Copilot AI review requested due to automatic review settings March 4, 2026 03:48

Copilot started reviewing on behalf of gzliudan March 4, 2026 03:48

gzliudan requested review from AnilChinchawale, anunay-xin, benjamin202410, liam-lai and wanwiset25 March 4, 2026 03:48

Copilot AI reviewed Mar 4, 2026


gzliudan force-pushed the sim-tx-lockstep branch from 9204a8e to 3c60232 March 4, 2026 04:34

gzliudan requested a review from Copilot March 4, 2026 04:34

Copilot started reviewing on behalf of gzliudan March 4, 2026 04:35

Copilot AI reviewed Mar 4, 2026


gzliudan changed the title ~~fix(core/txpool): coordinate reset lifecycle and shutdown signaling #28837~~ [WIP] fix(core/txpool): coordinate reset lifecycle and shutdown signaling #28837 Mar 4, 2026

gzliudan added the WIP work in process label Mar 10, 2026

gzliudan changed the title ~~[WIP] fix(core/txpool): coordinate reset lifecycle and shutdown signaling #28837~~ fix(core/txpool): coordinate reset lifecycle and shutdown signaling #28837 Mar 10, 2026

gzliudan removed request for AnilChinchawale, anunay-xin, benjamin202410, liam-lai and wanwiset25 March 10, 2026 07:54

gzliudan wants to merge 1 commit into XinFinOrg:dev-upgrade from gzliudan:sim-tx-lockstep
