[fix](fe) Fix stale timestamp in CatalogRecycleBin erase daemon#63310

Open
heguanhui wants to merge 1 commit into apache:master from heguanhui:fix/recycle-bin-stale-starttime

@heguanhui
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

CatalogRecycleBin.runAfterCatalogReady() captures System.currentTimeMillis() once at the beginning of the method and shares this timestamp across erasePartition(), eraseTable(), and eraseDatabase(). Since each erase operation can take significant time (I/O, log writes, lock acquisition), the later methods use a stale timestamp for expiration checks, causing delayed cleanup of tables and databases.

For example, if erasePartition takes 5 minutes, eraseTable and eraseDatabase will use a timestamp that is 5 minutes old, potentially skipping items that became eligible for cleanup during that period.
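The arithmetic behind this example can be checked with a small standalone sketch. It is not the Doris code: `isExpired`, the 60-second retention period, and all the timestamps below are assumptions chosen only to illustrate how a timestamp captured at the start of the round can miss an item that a fresh reading would catch.

```java
// Hypothetical illustration only; not the CatalogRecycleBin implementation.
final class StaleTimestampSketch {
    // Assumed expiry rule: an item is erasable once it has been in the bin
    // for longer than expireMs.
    static boolean isExpired(long recycleTimeMs, long nowMs, long expireMs) {
        return nowMs - recycleTimeMs > expireMs;
    }

    public static void main(String[] args) {
        long expireMs = 60_000L;                           // assumed retention: 60 s
        long roundStartMs = 1_000_000L;                    // timestamp captured once at round start
        long tableRecycledAtMs = roundStartMs - 30_000L;   // table has been in the bin for 30 s
        long afterPartitionEraseMs = roundStartMs + 5 * 60_000L; // partition erasure took 5 min

        // Shared (stale) timestamp: the table still looks only 30 s old.
        System.out.println("stale check: "
                + isExpired(tableRecycledAtMs, roundStartMs, expireMs));
        // Fresh timestamp: the table is now 5.5 min old and is past the retention period.
        System.out.println("fresh check: "
                + isExpired(tableRecycledAtMs, afterPartitionEraseMs, expireMs));
    }
}
```

Under these assumed numbers the stale check reports the table as not yet expired, while the fresh check reports it as expired.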

Fix

Each erase method now gets its own fresh System.currentTimeMillis() call, ensuring accurate expiration checks.

Before:

long currentTimeMs = System.currentTimeMillis();
erasePartition(currentTimeMs);
eraseTable(currentTimeMs);
eraseDatabase(currentTimeMs);

After:

erasePartition(System.currentTimeMillis());
eraseTable(System.currentTimeMillis());
eraseDatabase(System.currentTimeMillis());

Release note

Fix delayed cleanup of tables and databases in CatalogRecycleBin when partition erasure takes significant time

Check List (For Author)

  • Test: Unit Test
    • Added testEraseUsesFreshCurrentTime in CatalogRecycleBinTest that creates a table with 1000 partitions, force-drops them, and verifies that tables and databases with expired recycle times are properly cleaned up even when partition erasure takes time
  • Behavior changed: No (only fixes timing accuracy, no API or config changes)
  • Does this need documentation: No
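For readers who want to reproduce the scenario without a running FE, the following is a rough sketch of the same idea using an injectable fake clock. This is a swapped-in technique, not the approach of `testEraseUsesFreshCurrentTime`, and `MiniRecycleBin`, its methods, the 10-second retention period, and the 10 ms per-erase cost are all hypothetical names and values invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical miniature recycle bin; structure and names are illustrative only.
final class MiniRecycleBin {
    private final Map<String, Long> partitionRecycleTime = new HashMap<>();
    private final Map<String, Long> tableRecycleTime = new HashMap<>();
    private final long expireMs;
    private final LongSupplier clock;   // injectable time source
    private final Runnable eraseCost;   // simulates the time one erase takes

    MiniRecycleBin(long expireMs, LongSupplier clock, Runnable eraseCost) {
        this.expireMs = expireMs;
        this.clock = clock;
        this.eraseCost = eraseCost;
    }

    void recyclePartition(String name, long recycledAtMs) {
        partitionRecycleTime.put(name, recycledAtMs);
    }

    void recycleTable(String name, long recycledAtMs) {
        tableRecycleTime.put(name, recycledAtMs);
    }

    int partitionCount() {
        return partitionRecycleTime.size();
    }

    int tableCount() {
        return tableRecycleTime.size();
    }

    // Removes every item older than expireMs relative to nowMs,
    // charging the simulated cost for each removal.
    private void erase(Map<String, Long> items, long nowMs) {
        items.entrySet().removeIf(e -> {
            boolean expired = nowMs - e.getValue() > expireMs;
            if (expired) {
                eraseCost.run();
            }
            return expired;
        });
    }

    // One daemon round: either reuse the timestamp captured at the start,
    // or read the clock again before the table phase.
    void runRound(boolean freshTimestamps) {
        long sharedNow = clock.getAsLong();
        erase(partitionRecycleTime, sharedNow);
        erase(tableRecycleTime, freshTimestamps ? clock.getAsLong() : sharedNow);
    }

    // 1000 expired partitions plus one table that expires only while
    // the partitions are being erased (each erase advances the clock by 10 ms).
    static MiniRecycleBin scenario(long[] nowMs) {
        MiniRecycleBin bin = new MiniRecycleBin(10_000L, () -> nowMs[0], () -> nowMs[0] += 10L);
        for (int i = 0; i < 1000; i++) {
            bin.recyclePartition("p" + i, -20_000L);
        }
        bin.recycleTable("t", -5_000L);
        return bin;
    }

    public static void main(String[] args) {
        MiniRecycleBin shared = scenario(new long[] {0L});
        shared.runRound(false);
        System.out.println("shared timestamp, tables left: " + shared.tableCount());

        MiniRecycleBin fresh = scenario(new long[] {0L});
        fresh.runRound(true);
        System.out.println("fresh timestamps, tables left: " + fresh.tableCount());
    }
}
```

Injecting the clock keeps the scenario deterministic: the outcome depends only on the simulated erase cost, not on how long the real erase operations happen to take on the test machine.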

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: In CatalogRecycleBin.runAfterCatalogReady(), a single
currentTimeMs timestamp is captured at the start and shared across
erasePartition, eraseTable, and eraseDatabase. Since each erase
operation may take significant time (I/O, log writes, lock acquisition),
the subsequent erase methods use a stale timestamp for expiry checks,
causing cleanup delay for tables and databases.

### Release note

Fix stale timestamp issue in CatalogRecycleBin that causes delayed
cleanup of tables and databases in the recycle bin.

### Check List (For Author)

- Test: Unit Test
- Behavior changed: No
- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

2 participants