[fix] Fix entry loss due to incorrect lock of LedgerHandle #4701
base: master
```java
}
}
handle = LedgerDescriptor.create(masterKey, ledgerId, ledgerStorage);
ledgers.putIfAbsent(ledgerId, handle);
```
Looks like we can avoid the synchronization by handling the result of `putIfAbsent`? I understand the main issue is that two `LedgerDescriptor` instances are created here and then used by different threads. But `putIfAbsent` keeps only one of them in the map, so we can use its result to ensure the same handle is returned?
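A minimal sketch of the reviewer's suggestion, assuming a simplified model: `PutIfAbsentHandleFactory` and its nested `Descriptor` are hypothetical stand-ins for `HandleFactoryImpl` and `LedgerDescriptor` (no master key, ledger storage, or fenced-and-deleted check). The idea is to return whatever `putIfAbsent` reports as the existing mapping instead of the locally created instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-in for HandleFactoryImpl; not BookKeeper code.
final class PutIfAbsentHandleFactory {

    // Hypothetical stand-in for LedgerDescriptor; callers lock on this object.
    static final class Descriptor {
        final long ledgerId;

        Descriptor(long ledgerId) {
            this.ledgerId = ledgerId;
        }
    }

    private final ConcurrentMap<Long, Descriptor> ledgers = new ConcurrentHashMap<>();

    Descriptor getHandle(long ledgerId) {
        Descriptor handle = ledgers.get(ledgerId);
        if (handle == null) {
            // Several threads may get here for the same ledger and each
            // create a candidate; only one candidate ends up in the map.
            Descriptor candidate = new Descriptor(ledgerId);
            // putIfAbsent returns the previously mapped value, or null if
            // this thread's candidate was installed.
            Descriptor existing = ledgers.putIfAbsent(ledgerId, candidate);
            // Return the winner so every caller locks the same instance.
            handle = (existing != null) ? existing : candidate;
        }
        return handle;
    }
}
```

A losing thread may still allocate a throwaway candidate, but it never uses it. `ConcurrentHashMap.computeIfAbsent` would also yield one instance per key; the `putIfAbsent` form simply stays closest to the existing code.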
Improved, please review again
Pull request overview
This PR fixes a race condition in `HandleFactoryImpl.getHandle()` that could lead to entry loss when multiple threads simultaneously create a `LedgerDescriptor` for the same ledger. The issue occurs when the write thread and the recovery thread both create a new descriptor concurrently: each thread then synchronizes on a different instance, which breaks the lock that prevents races with fencing.
Changes:
- Implemented a double-checked locking pattern in `getHandle()` to ensure only one `LedgerDescriptor` is created per ledger ID
- Synchronized on the `ledgers` map during descriptor creation to prevent race conditions
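The double-checked locking described above can be sketched as follows. This is a simplified model, not the PR's code: `DoubleCheckedHandleFactory`, its nested `Descriptor`, and the `creations` counter are hypothetical, and the sketch assumes every writer of `ledgers` holds the same lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for HandleFactoryImpl; not BookKeeper code.
final class DoubleCheckedHandleFactory {

    // Hypothetical stand-in for LedgerDescriptor.
    static final class Descriptor {
        final long ledgerId;

        Descriptor(long ledgerId) {
            this.ledgerId = ledgerId;
        }
    }

    private final ConcurrentMap<Long, Descriptor> ledgers = new ConcurrentHashMap<>();
    // Counts descriptor creations so "only one per ledger" can be checked.
    private final AtomicInteger creations = new AtomicInteger();

    Descriptor getHandle(long ledgerId) {
        // First check, without the lock: the common case once the ledger is cached.
        Descriptor handle = ledgers.get(ledgerId);
        if (handle != null) {
            return handle;
        }
        synchronized (ledgers) {
            // Second check, under the lock: another thread may have created
            // the descriptor while this one was waiting for the monitor.
            handle = ledgers.get(ledgerId);
            if (handle == null) {
                handle = new Descriptor(ledgerId);
                creations.incrementAndGet();
                // Plain put() suffices only if every writer of `ledgers`
                // also holds this lock (an assumption of this sketch).
                ledgers.put(ledgerId, handle);
            }
            return handle;
        }
    }

    int creations() {
        return creations.get();
    }
}
```

The unlocked first read is safe because `ConcurrentHashMap` guarantees that a value placed in the map is safely published to threads that later read it, unlike classic double-checked locking on a plain, non-`volatile` field.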
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/HandleFactoryImpl.java
```java
    throw BookieException.create(BookieException.Code.LedgerFencedAndDeletedException);
}
LedgerDescriptor handle = LedgerDescriptor.create(masterKey, ledgerId, ledgerStorage);
ledgers.putIfAbsent(ledgerId, handle);
```
Copilot (AI), Jan 15, 2026
The `putIfAbsent` on line 63 is unnecessary. Since we're already inside a synchronized block with a double-check, the key is guaranteed not to exist. Use `put()` instead of `putIfAbsent()` to avoid the redundant check.
```diff
- ledgers.putIfAbsent(ledgerId, handle);
+ ledgers.put(ledgerId, handle);
```
Motivation
Background
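`HandleFactoryImpl` hands out the `LedgerDescriptor` for each ledger and maintains the ledger state `fenced`. Adding an entry runs on `BookieWriteThreadPool`, while opening a ledger with recovery runs on `BookieHighPriorityThread`. To avoid the following multi-threading race, BK uses a lock [1]:

| (`BookieWriteThreadPool`) Add entry | (`BookieHighPriorityThread`) Opening ledger with recovery |
| --- | --- |
| Get the `LedgerDescriptor` from `HandleFactoryImpl` | Get the `LedgerDescriptor` from `HandleFactoryImpl` |
| Check that the ledger is non-fenced, then add the entry | Fence the ledger |

Both sides run their step inside `synchronized(LedgerDescriptor.this)`, so they exclude each other only if both threads hold the same `LedgerDescriptor` instance.

Issue: there is a race condition that breaks the lock above.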
https://github.com/apache/bookkeeper/blob/release-4.17.2/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/HandleFactoryImpl.java#L54-L68
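A minimal sketch of how the linked `getHandle()` breaks the lock, assuming a simplified model: `RacyHandleFactory`, `lookup`, `createAndPublish`, and `replayRace` are hypothetical names, and `getHandle()` is split in two only so the interleaving can be replayed deterministically on one thread. When both threads miss the cache, each keeps its own descriptor, so `synchronized(LedgerDescriptor.this)` ends up locking two different monitors.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the pre-fix getHandle(); not BookKeeper code.
final class RacyHandleFactory {

    // Hypothetical stand-in for LedgerDescriptor.
    static final class Descriptor {
        final long ledgerId;

        Descriptor(long ledgerId) {
            this.ledgerId = ledgerId;
        }
    }

    private final ConcurrentMap<Long, Descriptor> ledgers = new ConcurrentHashMap<>();

    // Same shape as the linked code: check, create, putIfAbsent, then
    // return the locally created instance.
    Descriptor getHandle(long ledgerId) {
        Descriptor handle = lookup(ledgerId);
        if (handle == null) {
            handle = createAndPublish(ledgerId);
        }
        return handle;
    }

    // First half of getHandle(): the cache check.
    Descriptor lookup(long ledgerId) {
        return ledgers.get(ledgerId);
    }

    // Second half of getHandle(). The bug: putIfAbsent's result is ignored.
    Descriptor createAndPublish(long ledgerId) {
        Descriptor handle = new Descriptor(ledgerId);
        ledgers.putIfAbsent(ledgerId, handle);
        return handle;
    }

    // Replays the interleaving in which both threads miss the cache before
    // either publishes. Returns {write handle, recovery handle, cached handle}.
    static Descriptor[] replayRace(long ledgerId) {
        RacyHandleFactory factory = new RacyHandleFactory();
        Descriptor writeSeen = factory.lookup(ledgerId);    // write thread: miss
        Descriptor recoverySeen = factory.lookup(ledgerId); // recovery thread: miss
        if (writeSeen != null || recoverySeen != null) {
            throw new IllegalStateException("expected both lookups to miss");
        }
        Descriptor forRecovery = factory.createAndPublish(ledgerId); // wins putIfAbsent
        Descriptor forWrite = factory.createAndPublish(ledgerId);    // loses, keeps its own
        return new Descriptor[] {forWrite, forRecovery, factory.lookup(ledgerId)};
    }
}
```

The fix is to make both threads end up with the cached instance, either by honoring `putIfAbsent`'s result or by creating the descriptor under a lock.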
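| (`BookieWriteThreadPool`) Add entry | (`BookieHighPriorityThread`) Opening ledger with recovery |
| --- | --- |
| Get the `LedgerDescriptor` from `HandleFactoryImpl` | Get the `LedgerDescriptor` from `HandleFactoryImpl` |
| `false` | `false` |

When both threads create a `LedgerDescriptor` at the same time, each one synchronizes on its own instance, so the lock no longer serializes adding the entry with fencing the ledger.

Changes

Fix the race so that `HandleFactoryImpl.getHandle()` returns the same `LedgerDescriptor` instance to every caller for a given ledger.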