@codelipenghui codelipenghui commented Jan 15, 2026

Motivation

Fix the OffloadReadHandleClosedException returned to clients when they reset a cursor. The broker logs the failure as follows:

2026-01-15T15:37:52,897+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] ERROR org.apache.pulsar.broker.admin.v2.PersistentTopics - [xxx][persistent://xxx-partition-0] Failed to reset cursor on subscription yyy to time 1768490563341
org.apache.pulsar.broker.service.BrokerServiceException: org.apache.bookkeeper.mledger.ManagedLedgerException$OffloadReadHandleClosedException: Offload read handle already closed
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription$6.findEntryFailed(PersistentSubscription.java:854)
	at org.apache.pulsar.broker.service.persistent.PersistentMessageFinder.findEntryFailed(PersistentMessageFinder.java:172)
	at org.apache.bookkeeper.mledger.impl.OpFindNewest.readEntryFailed(OpFindNewest.java:206)
	at org.apache.bookkeeper.mledger.impl.cache.RangeEntryCacheImpl$1.readEntriesFailed(RangeEntryCacheImpl.java:247)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead.readEntriesFailed(PendingReadsManager.java:298)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead.lambda$attach$0(PendingReadsManager.java:242)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:128)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:105)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException$OffloadReadHandleClosedException: Offload read handle already closed
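In the trace above, the root OffloadReadHandleClosedException reaches the admin layer wrapped in a BrokerServiceException, and in CompletableFuture-based code it may additionally be wrapped in a CompletionException. Deciding whether a failure is "handle closed" therefore means looking through the cause chain, not just at the top-level exception. The sketch below is illustrative only: `CauseChain`, `hasCause`, and the two nested exception classes are hypothetical stand-ins, not the actual Pulsar or BookKeeper types.

```java
import java.util.concurrent.CompletionException;

public class CauseChain {
    // Hypothetical stand-ins for the real Pulsar/BookKeeper exception types.
    static class OffloadReadHandleClosedException extends Exception {
        OffloadReadHandleClosedException(String msg) { super(msg); }
    }

    static class BrokerServiceException extends Exception {
        BrokerServiceException(Throwable cause) { super(cause); }
    }

    /** Returns true if an instance of {@code type} appears anywhere in the cause chain of {@code t}. */
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        // Bounded walk so a pathological cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable root = new OffloadReadHandleClosedException("Offload read handle already closed");
        // Mirror the wrapping seen in the log: async wrapper -> broker exception -> root cause.
        Throwable wrapped = new CompletionException(new BrokerServiceException(root));
        System.out.println(hasCause(wrapped, OffloadReadHandleClosedException.class)); // true
        System.out.println(hasCause(new IllegalStateException("other"),
                OffloadReadHandleClosedException.class)); // false
    }
}
```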

Modifications

When a cursor is reset to a timestamp that falls within offloaded data, the broker reads entries from tiered storage. If the offload read handle has already been closed (e.g., because the tiered storage handle was evicted or cleaned up), the read fails with OffloadReadHandleClosedException. The original code did not retry the read with a fresh handle, so the reset operation failed.

Fix: Add retry logic in RangeEntryCacheImpl.readFromStorage() to reopen the read handle and retry once when OffloadReadHandleClosedException is encountered.
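The retry-once behavior described above can be illustrated with a small, self-contained sketch. It does not reproduce RangeEntryCacheImpl.readFromStorage(): `Handle`, `HandleClosedException`, `readWithRetry`, and the String payload are hypothetical stand-ins (not the BookKeeper ReadHandle or ManagedLedgerException APIs). It assumes handles are opened asynchronously through a CompletableFuture-returning supplier and requires Java 12+ for `exceptionallyCompose`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryOnceSketch {
    // Hypothetical stand-in for ManagedLedgerException.OffloadReadHandleClosedException.
    static class HandleClosedException extends RuntimeException {
        HandleClosedException() { super("Offload read handle already closed"); }
    }

    // Hypothetical minimal read-handle abstraction; not the BookKeeper ReadHandle API.
    interface Handle {
        CompletableFuture<String> read(long first, long last);
    }

    // Async stages usually deliver failures wrapped in CompletionException.
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    /**
     * Reads [first, last] using a freshly opened handle. If that read fails because the
     * handle was closed, opens a new handle and retries exactly once. Any other failure,
     * or a failure of the retry itself, is propagated to the caller.
     */
    static CompletableFuture<String> readWithRetry(Supplier<CompletableFuture<Handle>> openHandle,
                                                   long first, long last) {
        return openHandle.get()
                .thenCompose(h -> h.read(first, last))
                .exceptionallyCompose(err -> {
                    if (unwrap(err) instanceof HandleClosedException) {
                        // Single retry with a new handle; no further retries are attempted.
                        return openHandle.get().thenCompose(h -> h.read(first, last));
                    }
                    return CompletableFuture.failedFuture(err);
                });
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        // The first handle behaves as if it was closed underneath us; later ones succeed.
        Supplier<CompletableFuture<Handle>> opener = () -> {
            int n = opens.incrementAndGet();
            Handle h = (first, last) -> n == 1
                    ? CompletableFuture.<String>failedFuture(new HandleClosedException())
                    : CompletableFuture.completedFuture("entries " + first + ".." + last);
            return CompletableFuture.completedFuture(h);
        };
        System.out.println(readWithRetry(opener, 0, 9).join()); // entries 0..9
        System.out.println("handles opened: " + opens.get());  // handles opened: 2
    }
}
```

Bounding the retry to a single attempt keeps a persistently failing tiered-storage read from looping: a second closed-handle failure, or any unrelated exception, is surfaced to the caller instead of being retried again.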

Note: The data consumption path already handles retries for the "Read handle closed" exception.

Verifying this change

  • Added unit test testReadFromStorageRetriesWhenHandleClosed in RangeEntryCacheImplTest.java
  • Run: mvn -pl managed-ledger -Dtest=RangeEntryCacheImplTest test

Documentation

  • doc-not-needed

@github-actions

@codelipenghui Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@codelipenghui codelipenghui changed the title from "Fix reset cursor retry and fencing" to "[fix][broker] fix reset cursor retry and fencing" Jan 15, 2026
@codelipenghui codelipenghui self-assigned this Jan 16, 2026
@codelipenghui codelipenghui added this to the 4.2.0 milestone Jan 16, 2026
@github-actions github-actions bot added the doc-not-needed label and removed the doc-label-missing label Jan 16, 2026
@codelipenghui codelipenghui added the type/bug, area/broker, and doc-label-missing labels and removed the doc-not-needed and doc-label-missing labels Jan 16, 2026
@github-actions github-actions bot added the doc-not-needed label Jan 16, 2026

@BewareMyPower BewareMyPower left a comment

We might need to open a separate PR for the reset cursor change.

@codelipenghui codelipenghui changed the title from "[fix][broker] fix reset cursor retry and fencing" to "[fix][ml] Retry offload reads when OffloadReadHandleClosedException is encountered" Jan 16, 2026

@BewareMyPower BewareMyPower left a comment

LGTM, left a small style suggestion

@codecov-commenter

Codecov Report

❌ Patch coverage is 76.47059% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.60%. Comparing base (99fdca8) to head (325b5f8).
⚠️ Report is 1 commit behind head on master.

Files with missing lines | Patch % | Lines
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 0.00% | 2 Missing ⚠️
...keeper/mledger/impl/cache/RangeEntryCacheImpl.java | 86.66% | 0 Missing and 2 partials ⚠️
Additional details and impacted files

@@              Coverage Diff              @@
##             master   #25148       +/-   ##
=============================================
+ Coverage     37.41%   72.60%   +35.19%     
- Complexity    13148    33854    +20706     
=============================================
  Files          1899     1956       +57     
  Lines        150564   154739     +4175     
  Branches      17156    17644      +488     
=============================================
+ Hits          56334   112349    +56015     
+ Misses        86500    33420    -53080     
- Partials       7730     8970     +1240     
Flag | Coverage Δ
inttests | 25.72% <23.52%> (+0.07%) ⬆️
systests | 22.46% <23.52%> (+<0.01%) ⬆️
unittests | 73.54% <76.47%> (+39.50%) ⬆️

Files with missing lines | Coverage Δ
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 81.35% <0.00%> (+30.49%) ⬆️
...keeper/mledger/impl/cache/RangeEntryCacheImpl.java | 72.08% <86.66%> (+18.97%) ⬆️

... and 1419 files with indirect coverage changes

@Technoboy- Technoboy- merged commit 16bcec3 into apache:master Jan 16, 2026
241 of 249 checks passed
lhotari pushed a commit that referenced this pull request Jan 16, 2026
lhotari pushed a commit that referenced this pull request Jan 16, 2026
@codelipenghui codelipenghui deleted the fix-reset-cursor-handle-retry branch January 16, 2026 15:43