[server]: remove orphan files and directories when tablet starts up #3388
Open
gyang94 wants to merge 1 commit into
Pull request overview
Fixes orphan replica directory leaks on TabletServer restart during table/partition deletion (issue #3387). Adds two cleanup paths corresponding to the two scenarios in the issue: a startup-time empty parent/table-dir sweep in LogManager's SchemaNotExistException handler (table-deletion case) and a new sweepOrphanTabletDirs invoked from ReplicaManager.stopReplicas when a NoneReplica receives deleteLocal=true (partition-deletion case).
Changes:
- `ReplicaManager`: new `sweepOrphanTabletDirs` drops the orphan log via `LogManager`, removes the sibling KV tablet (via `KvManager.dropKv` or direct `FileUtils.deleteDirectory`), updates `LocalDiskManager` accounting, and registers the parent dir for empty-dir cleanup.
- `LogManager`: after deleting `log-N/` and `kv-N/` on `SchemaNotExistException`, also remove the now-empty parent dir, and for partitioned tables additionally remove the empty grandparent (table) dir.
- `StopReplicaITCase`: two new IT tests covering table-drop (startup cleanup) and partition-drop (orphan-sweep on stopReplica) while a TabletServer is offline.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java | Handles NoneReplica + deleteLocal=true by sweeping orphan log/KV dirs loaded at startup. |
| fluss-server/src/main/java/org/apache/fluss/server/log/LogManager.java | Cleans up empty partition/table parent directories after residual tablet dirs are deleted. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/StopReplicaITCase.java | New IT tests for the two orphan-cleanup paths during offline drop. |
Comment on lines +546 to 548

    Tuple2<PhysicalTablePath, TableBucket> pathAndBucket =
            FlussPaths.parseTabletDir(tabletDir);
    try {
Comment on lines +1962 to +1990

    logManager.dropLog(tb);

    boolean isKvTable = false;
    if (kvManager.getKv(tb).isPresent()) {
        kvManager.dropKv(tb);
        isKvTable = true;
    } else {
        File kvTabletDir = FlussPaths.kvTabletDir(dataDir, physicalTablePath, tb);
        if (kvTabletDir.exists()) {
            isKvTable = true;
            try {
                FileUtils.deleteDirectory(kvTabletDir);
            } catch (IOException e) {
                throw new KvStorageException(
                        String.format(
                                "Failed to delete orphan KV tablet directory %s", kvTabletDir),
                        e);
            }
        }
    }

    localDiskManager.recordReplicaDelete(dataDir, isKvTable);

    if (tb.getPartitionId() != null) {
        deletedPartitionIds.put(tb.getPartitionId(), tabletParentDir);
        deletedTableIds.put(tb.getTableId(), tabletParentDir.getParent());
    } else {
        deletedTableIds.put(tb.getTableId(), tabletParentDir);
    }
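For readers who want the orphan decision in isolation, here is a minimal in-memory sketch of it. It is not the PR's code: `OrphanSweepSketch`, its `Action` enum, and the string bucket keys are hypothetical, and the two sets merely stand in for `LogManager.currentLogs` and `allReplicas`. It only models when a stop-replica request would be treated as an orphan sweep; the actual log/KV deletion and disk accounting are left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the orphan check; all names here are illustrative assumptions. */
final class OrphanSweepSketch {

    enum Action { STOP_REGISTERED_REPLICA, SWEEP_ORPHAN, NOTHING_TO_DO }

    // Stand-in for LogManager.currentLogs: logs found on disk and loaded at startup.
    private final Set<String> loadedLogs = new HashSet<>();
    // Stand-in for allReplicas: replicas the server has actually registered.
    private final Set<String> registeredReplicas = new HashSet<>();

    void loadLogAtStartup(String bucket) {
        loadedLogs.add(bucket);
    }

    void registerReplica(String bucket) {
        registeredReplicas.add(bucket);
    }

    boolean hasLoadedLog(String bucket) {
        return loadedLogs.contains(bucket);
    }

    /** Decides what a stop-replica request means for the given bucket. */
    Action onStopReplica(String bucket, boolean deleteLocal) {
        if (registeredReplicas.remove(bucket)) {
            // Normal path: the replica is known, so the regular stop logic applies.
            if (deleteLocal) {
                loadedLogs.remove(bucket);
            }
            return Action.STOP_REGISTERED_REPLICA;
        }
        // Unknown replica: a loaded log plus deleteLocal=true means an orphan tablet.
        if (deleteLocal && loadedLogs.remove(bucket)) {
            return Action.SWEEP_ORPHAN;
        }
        return Action.NOTHING_TO_DO;
    }
}
```

In this model a second request for the same bucket is a no-op, which is the behavior one would want if the coordinator retries the stop-replica call.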
Purpose
Linked issue: close #3387
Brief change log
Two changes, one in each cleanup path:
1. `ReplicaManager`: handle `NoneReplica` with `deleteLocal=true`

   When the `NoneReplica` branch receives `deleteLocal=true`, look up the bucket in `LogManager.currentLogs`. If present, the log tablet was loaded at startup but never registered in `allReplicas`, so it is an orphan. Drop the log via `logManager.dropLog()`, delete the sibling KV tablet directory (via `kvManager.dropKv()` if loaded, otherwise direct `FileUtils.deleteDirectory`), update disk usage accounting, and record the parent directory for empty-dir cleanup.

   This handles the partition deletion scenario.

2. `LogManager`: clean up empty parent directories in `SchemaNotExistException` handler

   After the existing handler deletes `log-N/` and `kv-N/`, check whether the parent directory is empty and delete it. For partitioned tables, also check the grandparent (table directory). For non-partitioned tables, the grandparent is the database directory and must NOT be deleted. This is safe under parallel `loadAllLogs` execution: `deleteDirectoryQuietly` tolerates races.

   This handles the table deletion scenario.
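The parent/grandparent rule in the second change can be illustrated with a small standalone sketch. It is not the code from this PR: `EmptyParentCleanup`, `deleteIfEmpty`, and `cleanUpAfterTabletDelete` are hypothetical names, and it uses plain `java.nio.file` calls rather than Fluss's `deleteDirectoryQuietly`. It assumes the layout implied above, where a partitioned tablet's parent is the partition directory and its grandparent is the table directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating the rule; not the PR's implementation. */
final class EmptyParentCleanup {

    private EmptyParentCleanup() {}

    /** Deletes dir only if it is an empty directory. Returns true if it was deleted. */
    static boolean deleteIfEmpty(Path dir) {
        if (dir == null || !Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // still holds other tablets or files
            }
        } catch (IOException e) {
            return false; // e.g. removed concurrently by another loader thread
        }
        try {
            return Files.deleteIfExists(dir);
        } catch (IOException e) {
            // Covers DirectoryNotEmptyException when a concurrent writer won the race.
            return false;
        }
    }

    /**
     * Removes the tablet's parent directory if empty. Only for partitioned tables
     * is the grandparent (table directory) also considered; for non-partitioned
     * tables the grandparent is the database directory and is never touched.
     */
    static void cleanUpAfterTabletDelete(Path tabletParentDir, boolean partitioned) {
        boolean parentDeleted = deleteIfEmpty(tabletParentDir);
        if (partitioned && parentDeleted) {
            deleteIfEmpty(tabletParentDir.getParent());
        }
    }
}
```

Treating every failure as "not deleted" keeps the helper idempotent, so several threads can call it on the same directory without coordinating.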
Tests
API and Format
Documentation