fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12 #2139

Open
gzliudan wants to merge 1 commit into XinFinOrg:dev-upgrade from gzliudan:fix-verify-order

Conversation

@gzliudan (Collaborator) commented Mar 6, 2026

Proposed changes

fix:

Problem

  • Mixed v1/v2 batches around the switch boundary could produce non-deterministic verification behavior.
  • The adaptor previously relied on version-bucket concurrency and did not expose in-batch headers for GetHeaderByNumber lookups.
  • In practice this could surface as intermittent sync failures and misleading BAD BLOCK attribution near the switch height.

Root cause

  • Verify result ordering and read visibility assumptions were too weak for mixed-boundary batches.
  • Engine helpers may query headers by number during verification, but batch context only shadowed hash-based lookups.

Fix

  • Keep VerifyHeaders emission strictly aligned with input order.
  • Introduce verifyChainReader and shadow both hash and number lookups with in-batch headers:
    • GetHeader(hash, number)
    • GetHeaderByHash(hash)
    • GetHeaderByNumber(number)
  • This preserves deterministic per-header verification and avoids stale canonical reads during boundary processing.
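
The shadowing described in the Fix list can be sketched roughly as follows. This is a minimal illustration, not the PR's actual verifyChainReader: the Hash, Header, ChainReader, canonicalChain and batchReader types are simplified stand-ins invented for the example, and the sketch exposes the whole batch to every lookup, which the real code may restrict further.

```go
package main

import "fmt"

// Hash and Header are simplified, hypothetical stand-ins for the real types.
type Hash string

type Header struct {
	Number uint64
	Hash   Hash
}

// ChainReader is a trimmed-down, assumed view of the three lookups named above.
type ChainReader interface {
	GetHeader(hash Hash, number uint64) *Header
	GetHeaderByHash(hash Hash) *Header
	GetHeaderByNumber(number uint64) *Header
}

// index stores headers by hash and by number.
type index struct {
	byHash   map[Hash]*Header
	byNumber map[uint64]*Header
}

func newIndex(headers []*Header) index {
	idx := index{byHash: map[Hash]*Header{}, byNumber: map[uint64]*Header{}}
	for _, h := range headers {
		idx.byHash[h.Hash] = h
		idx.byNumber[h.Number] = h
	}
	return idx
}

// canonicalChain is a toy backing store playing the role of the database.
type canonicalChain struct{ index }

func (c canonicalChain) GetHeader(hash Hash, number uint64) *Header {
	if h := c.byHash[hash]; h != nil && h.Number == number {
		return h
	}
	return nil
}
func (c canonicalChain) GetHeaderByHash(hash Hash) *Header       { return c.byHash[hash] }
func (c canonicalChain) GetHeaderByNumber(number uint64) *Header { return c.byNumber[number] }

// batchReader answers from the in-batch headers first and only falls back to
// the underlying reader when the batch has no match.
type batchReader struct {
	base  ChainReader
	batch index
}

func (r batchReader) GetHeader(hash Hash, number uint64) *Header {
	if h := r.batch.byHash[hash]; h != nil && h.Number == number {
		return h
	}
	return r.base.GetHeader(hash, number)
}

func (r batchReader) GetHeaderByHash(hash Hash) *Header {
	if h := r.batch.byHash[hash]; h != nil {
		return h
	}
	return r.base.GetHeaderByHash(hash)
}

func (r batchReader) GetHeaderByNumber(number uint64) *Header {
	if h := r.batch.byNumber[number]; h != nil {
		return h
	}
	return r.base.GetHeaderByNumber(number)
}

func main() {
	// The store holds a stale header at height 5; the batch carries the new one.
	base := canonicalChain{newIndex([]*Header{{Number: 4, Hash: "a4"}, {Number: 5, Hash: "stale5"}})}
	reader := batchReader{base: base, batch: newIndex([]*Header{{Number: 5, Hash: "b5"}})}

	fmt.Println(reader.GetHeaderByNumber(5).Hash) // b5: served from the batch
	fmt.Println(reader.GetHeaderByNumber(4).Hash) // a4: falls through to the store
}
```

Without the by-number shadow, the lookup at height 5 would return the stale stored header, which is the kind of read the Fix list aims to avoid.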

Tests

  • Add TestAdaptorVerifyHeadersKeepsInputOrderAcrossConsensusSwitch.
  • Add TestVerifyChainReaderReturnsBatchHeaderByNumber.

Types of changes

What types of changes does your code introduce to XDC network?
Put an x in the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Put an x in the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Put an x in the boxes once you have confirmed the actions below (or provide reasons for not doing so):

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

Copilot AI review requested due to automatic review settings March 6, 2026 11:08
coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65228868-6116-4ada-aa96-520f71190570

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Copilot AI left a comment

Pull request overview

This PR fixes a result-ordering bug in XDPoS.VerifyHeaders where splitting headers into v1/v2 buckets and running two concurrent goroutines caused nondeterministic result-to-header mapping around the v1→v2 switch boundary. This led to misleading "BAD BLOCK" log messages (e.g., a v1-style error attributed to a v2-height header) and sync failures (issue #2138).
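
The failure mode in the overview can be reproduced with a toy version of the bucketed pattern. This is only a sketch under assumptions: verifyBucketed, collect and the string results are invented for illustration, and the rule that headers at or below the switch height belong to v1 is assumed rather than taken from the repository.

```go
package main

import "fmt"

// Header is a simplified, hypothetical stand-in for the real header type.
type Header struct{ Number uint64 }

// verifyBucketed mimics the pattern described above: headers are split into
// v1 and v2 buckets and each bucket is processed by its own goroutine, with
// both goroutines sending into one shared results channel.
// Assumption: headers at or below switchBlock are v1, the rest are v2.
func verifyBucketed(headers []*Header, switchBlock uint64) <-chan string {
	var v1, v2 []*Header
	for _, h := range headers {
		if h.Number <= switchBlock {
			v1 = append(v1, h)
		} else {
			v2 = append(v2, h)
		}
	}

	results := make(chan string, len(headers))
	run := func(version string, bucket []*Header) {
		for _, h := range bucket {
			results <- fmt.Sprintf("%s:%d", version, h.Number)
		}
	}
	go run("v1", v1)
	go run("v2", v2)
	return results
}

// collect reads exactly n results in arrival order.
func collect(results <-chan string, n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, <-results)
	}
	return out
}

func main() {
	headers := []*Header{{Number: 9}, {Number: 10}, {Number: 11}, {Number: 12}}
	// Arrival order depends on goroutine scheduling, so result i is not
	// guaranteed to belong to headers[i].
	fmt.Println(collect(verifyBucketed(headers, 10), len(headers)))
}
```

A caller that pairs the i-th result with the i-th input header can therefore attach a v1 error to a v2-height header, which matches the misleading attribution described in the overview.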

Changes:

  • consensus/XDPoS/XDPoS.go: Replaced the two-bucket/two-goroutine VerifyHeaders implementation with a single goroutine that iterates headers in input order, dispatching each to the appropriate engine version.
  • consensus/tests/engine_v2_tests/adaptor_test.go: Added TestAdaptorVerifyHeadersKeepsInputOrderAcrossConsensusSwitch to assert that results arrive in the same order as the input slice across the consensus switch.
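
The single-goroutine dispatch summarized in the Changes list might look roughly like the sketch below. It is not the code from consensus/XDPoS/XDPoS.go: verifyInOrder, verifyFn and tagged are hypothetical names, the switch rule (numbers above switchBlock go to v2) is an assumption, and the tagged errors exist only so the demo can show which engine handled each header.

```go
package main

import "fmt"

// Header is a simplified, hypothetical stand-in for the real header type.
type Header struct{ Number uint64 }

// verifyFn stands in for one engine version's per-header verification.
type verifyFn func(h *Header) error

// tagged returns a verifier whose error names the engine version and height,
// purely so the output shows which engine handled which header.
func tagged(version string) verifyFn {
	return func(h *Header) error { return fmt.Errorf("%s:%d", version, h.Number) }
}

// verifyInOrder walks the batch in input order on a single goroutine and
// dispatches each header to the engine for its height, so the i-th result
// always belongs to headers[i].
// Assumption: numbers above switchBlock are verified by v2, the rest by v1.
func verifyInOrder(headers []*Header, switchBlock uint64, v1, v2 verifyFn) (chan<- struct{}, <-chan error) {
	abort := make(chan struct{})
	results := make(chan error, len(headers))
	go func() {
		for _, h := range headers {
			verify := v1
			if h.Number > switchBlock {
				verify = v2
			}
			select {
			case <-abort:
				return
			case results <- verify(h):
			}
		}
	}()
	return abort, results
}

func main() {
	headers := []*Header{{Number: 9}, {Number: 10}, {Number: 11}, {Number: 12}}
	abort, results := verifyInOrder(headers, 10, tagged("v1"), tagged("v2"))
	defer close(abort)

	for _, h := range headers {
		fmt.Printf("header %d -> %v\n", h.Number, <-results)
	}
}
```

The trade-off is that verification across the two versions is no longer concurrent; in exchange, callers can keep pairing results with headers by position.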

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
consensus/XDPoS/XDPoS.go | Rewrites VerifyHeaders to a single sequential goroutine preserving input order
consensus/tests/engine_v2_tests/adaptor_test.go | New regression test for result ordering across the v1/v2 switch

@gzliudan force-pushed the fix-verify-order branch 4 times, most recently from 16837f0 to 41a76ec, March 7, 2026 13:43
@gzliudan changed the title from "fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12", Mar 9, 2026
@gzliudan changed the title from "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12", Mar 9, 2026
@gzliudan changed the title from "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" to "fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12", Mar 9, 2026
@gzliudan force-pushed the fix-verify-order branch 2 times, most recently from a14f15c to eea3308, March 10, 2026 07:01
…XinFinOrg#2138 XFN-12

Validation
- go test ./consensus/XDPoS/...

5 participants