
[server]: remove orphan files and directories when tablet starts up #3388

Open
gyang94 wants to merge 1 commit into apache:main from gyang94:fix-ts-orphan-files


gyang94 (Contributor) commented May 27, 2026

Purpose

Linked issue: close #3387

Brief change log

Two changes, one in each cleanup path:

1. ReplicaManager: handle NoneReplica with deleteLocal=true

When the NoneReplica branch receives deleteLocal=true, look up the bucket in LogManager.currentLogs. If it is present, the log tablet was loaded at startup but never registered in allReplicas, so it is an orphan. In that case: drop the log via logManager.dropLog(); delete the sibling KV tablet directory (via kvManager.dropKv() if the KV tablet is loaded, otherwise directly with FileUtils.deleteDirectory()); update the disk usage accounting; and record the parent directory for empty-dir cleanup.

This handles the partition deletion scenario.
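A minimal, self-contained sketch of the orphan check and the log/KV sweep described above, using only java.nio.file. It assumes that log-N/ and kv-N/ are sibling directories under one tablet parent directory (the directory names come from this PR). OrphanTabletSweepSketch, Bucket, isOrphan, and sweep are hypothetical names, not Fluss APIs, and the sketch replaces the PR's logManager.dropLog() / kvManager.dropKv() calls with plain recursive deletes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: names are illustrative, not Fluss APIs.
// Assumes log-N/ and kv-N/ are siblings under one tablet parent directory.
final class OrphanTabletSweepSketch {

    /** Illustrative stand-in for a table bucket; partitionId is null for non-partitioned tables. */
    record Bucket(long tableId, Long partitionId, int bucket) {}

    /** An orphan was loaded from disk at startup but never registered as a replica. */
    static boolean isOrphan(Bucket tb, Map<Bucket, ?> loadedLogs, Set<Bucket> registeredReplicas) {
        return loadedLogs.containsKey(tb) && !registeredReplicas.contains(tb);
    }

    /**
     * Deletes log-N/ and, if it exists, the sibling kv-N/.
     * Returns true when a KV tablet directory was found (the isKvTable flag).
     */
    static boolean sweep(Path tabletParentDir, int bucket) throws IOException {
        deleteRecursively(tabletParentDir.resolve("log-" + bucket));
        Path kvDir = tabletParentDir.resolve("kv-" + bucket);
        boolean isKvTable = Files.exists(kvDir);
        if (isKvTable) {
            deleteRecursively(kvDir);
        }
        return isKvTable;
    }

    /** Recursive delete; a missing directory is treated as already deleted. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts every descendant before its ancestor.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The boolean returned by sweep plays the same role as isKvTable in the reviewed ReplicaManager code: it tells the disk-usage accounting whether a KV replica was removed along with the log.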

2. LogManager: clean up empty parent directories in SchemaNotExistException handler

After the existing handler deletes log-N/ and kv-N/, check whether the parent directory is empty and, if so, delete it. For partitioned tables, also check the grandparent (the table directory) and delete it if empty. For non-partitioned tables, the grandparent is the database directory and must NOT be deleted. This is safe when loadAllLogs runs in parallel because deleteDirectoryQuietly tolerates races.

This handles the table deletion scenario.
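The empty-directory step can be sketched as follows, assuming the layout described above: the tablet parent directory is the partition directory for partitioned tables and the table directory otherwise. Instead of the PR's deleteDirectoryQuietly, this sketch uses java.nio.file.Files.delete, which refuses to remove a non-empty directory. A sibling bucket that is still present, or a parallel loader that has already removed the directory, therefore just results in a no-op. EmptyParentCleanupSketch and its methods are hypothetical, not part of Fluss.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical sketch: names are illustrative, not Fluss APIs.
final class EmptyParentCleanupSketch {

    /** Removes dir only if it is empty; returns true if this call removed it. */
    static boolean deleteIfEmptyQuietly(Path dir) {
        try {
            // Files.delete refuses to remove a non-empty directory, so the
            // emptiness check and the removal happen in one filesystem call.
            Files.delete(dir);
            return true;
        } catch (DirectoryNotEmptyException | NoSuchFileException e) {
            // Another bucket still lives here, or a parallel loader already
            // removed the directory: both are expected races.
            return false;
        } catch (IOException e) {
            // Quiet by design: a leftover empty directory is harmless.
            return false;
        }
    }

    /**
     * Call after log-N/ and kv-N/ under tabletParentDir have been deleted.
     * Partitioned table: tabletParentDir is the partition dir, its parent the table dir.
     * Non-partitioned table: tabletParentDir is the table dir, its parent the
     * database dir, which is never touched.
     */
    static void cleanUpEmptyParents(Path tabletParentDir, boolean partitioned) {
        deleteIfEmptyQuietly(tabletParentDir);
        if (partitioned) {
            deleteIfEmptyQuietly(tabletParentDir.getParent());
        }
    }
}
```

Because the emptiness check and the delete are the same call, a thread can never remove a directory that another loader has just found to still hold a bucket.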

Tests

API and Format

Documentation

gyang94 changed the title from "fix: remote orphan files and directories when tablet starts up" to "[server]: remove orphan files and directories when tablet starts up" on May 27, 2026
wuchong requested a review from Copilot on May 31, 2026 09:24

Copilot AI left a comment


Pull request overview

Fixes orphan replica directory leaks on TabletServer restart during table/partition deletion (issue #3387). Adds two cleanup paths corresponding to the two scenarios in the issue: a startup-time empty parent/table-dir sweep in LogManager's SchemaNotExistException handler (table-deletion case) and a new sweepOrphanTabletDirs invoked from ReplicaManager.stopReplicas when a NoneReplica receives deleteLocal=true (partition-deletion case).

Changes:

  • ReplicaManager: new sweepOrphanTabletDirs drops the orphan log via LogManager, removes the sibling KV tablet (via KvManager.dropKv or direct FileUtils.deleteDirectory), updates LocalDiskManager accounting, and registers the parent dir for empty-dir cleanup.
  • LogManager: after deleting log-N/ and kv-N/ on SchemaNotExistException, also remove the now-empty parent dir, and for partitioned tables additionally remove the empty grandparent (table) dir.
  • StopReplicaITCase: two new IT tests covering table-drop (startup cleanup) and partition-drop (orphan-sweep on stopReplica) while a TabletServer is offline.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java: Handles NoneReplica + deleteLocal=true by sweeping orphan log/KV dirs loaded at startup.
  • fluss-server/src/main/java/org/apache/fluss/server/log/LogManager.java: Cleans up empty partition/table parent directories after residual tablet dirs are deleted.
  • fluss-server/src/test/java/org/apache/fluss/server/coordinator/StopReplicaITCase.java: New IT tests for the two orphan-cleanup paths during offline drop.


Comment on lines +546 to 548
Tuple2<PhysicalTablePath, TableBucket> pathAndBucket =
        FlussPaths.parseTabletDir(tabletDir);
try {
Comment on lines +1962 to +1990
logManager.dropLog(tb);

boolean isKvTable = false;
if (kvManager.getKv(tb).isPresent()) {
    kvManager.dropKv(tb);
    isKvTable = true;
} else {
    File kvTabletDir = FlussPaths.kvTabletDir(dataDir, physicalTablePath, tb);
    if (kvTabletDir.exists()) {
        isKvTable = true;
        try {
            FileUtils.deleteDirectory(kvTabletDir);
        } catch (IOException e) {
            throw new KvStorageException(
                    String.format(
                            "Failed to delete orphan KV tablet directory %s", kvTabletDir),
                    e);
        }
    }
}

localDiskManager.recordReplicaDelete(dataDir, isKvTable);

if (tb.getPartitionId() != null) {
    deletedPartitionIds.put(tb.getPartitionId(), tabletParentDir);
    deletedTableIds.put(tb.getTableId(), tabletParentDir.getParent());
} else {
    deletedTableIds.put(tb.getTableId(), tabletParentDir);
}
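The tail of the reviewed snippet records each orphan's parent directory so that empty directories can be cleaned up later. The sketch below models that bookkeeping, assuming the recorded values are java.nio.file.Path objects (the snippet does not show the map types). It also adds one possible cleanup order: partition directories before table directories, because a table directory can only become empty after its partition directories have been removed. DeletedDirBookkeepingSketch, recordDeletion, and cleanupOrder are illustrative names only.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: map value types and method names are assumptions.
final class DeletedDirBookkeepingSketch {

    // partitionId -> partition dir, tableId -> table dir; repeated buckets of
    // the same partition or table collapse onto one entry.
    final Map<Long, Path> deletedPartitionIds = new HashMap<>();
    final Map<Long, Path> deletedTableIds = new HashMap<>();

    /** Mirrors the branch at the end of the reviewed snippet. */
    void recordDeletion(long tableId, Long partitionId, Path tabletParentDir) {
        if (partitionId != null) {
            // Partitioned: parent is the partition dir, grandparent the table dir.
            deletedPartitionIds.put(partitionId, tabletParentDir);
            deletedTableIds.put(tableId, tabletParentDir.getParent());
        } else {
            // Non-partitioned: parent is already the table dir.
            deletedTableIds.put(tableId, tabletParentDir);
        }
    }

    /** Partition dirs first, then table dirs; TreeSet only makes each group's order deterministic. */
    List<Path> cleanupOrder() {
        List<Path> order = new ArrayList<>(new TreeSet<>(deletedPartitionIds.values()));
        order.addAll(new TreeSet<>(deletedTableIds.values()));
        return order;
    }
}
```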

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[server] TabletServer leaves orphan replica directories after restart during table/partition deletion

2 participants