[improvement](cloud) Add enable_recycler config to skip recycler dynamically#63286
[improvement](cloud) Add enable_recycler config to skip recycler dynamically#63286mymeiyi wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Pull request overview
This PR adds a mutable cloud config to dynamically disable recycler scheduling and instance recycling work.
Changes:
- Adds
enable_recyclermutable config with defaulttrue. - Gates recycler instance scanning and recycle callback processing on the new config.
- Adds unit tests for default behavior and skip behavior when disabled.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cloud/src/common/config.h |
Defines the new mutable enable_recycler config. |
cloud/src/recycler/recycler.cpp |
Skips instance scanning and pending recycle processing when the config is disabled. |
cloud/test/recycler_test.cpp |
Adds tests covering default config and disabled recycler behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Review result: request changes. The new config is small and focused, but the implementation does not fully satisfy a dynamic recycler disable switch because in-flight recycle rounds keep running and leasing after the config is turned off. Test coverage also does not currently prove that scanning or recycling was skipped.
Critical checkpoint conclusions:
- Goal/test proof: Partially met. The code skips new scans and pending instances when disabled, but does not stop active recycling; the added tests do not prove the guards are effective.
- Scope: The code change is small and focused.
- Concurrency/lifecycle: The recycler has scanner, worker, lease, and checker threads. The new checks are only at scanner enqueue and worker dequeue boundaries, so an already-running InstanceRecycler remains live and lease_recycle_jobs continues renewing it.
- Config behavior: A mutable config was added and is observed dynamically only at loop boundaries; disabling is not applied promptly to active recycle tasks.
- Compatibility/storage format: No incompatible format or protocol changes found.
- Parallel paths: Snapshot migrator/chain compactor are separate switches and not directly affected.
- Tests: Unit tests were added, but two do not distinguish the new disabled path from existing no-op/failure behavior. No test result was run by me in this review.
- Observability: Logs are present for skipped scans/workers, but active tasks continue without a disable-triggered stop indication.
- Transaction/persistence/data correctness: No direct new persisted metadata path; the concern is that active recycler work can continue deleting objects/KV after operators disable the switch.
- Performance: The disabled scanner path can log every interval; acceptable at default interval, but tests use a zero interval.
- User focus: No additional user-provided review focus was supplied.
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Summary: I found one additional issue beyond the existing review threads. The PR adds a mutable recycler switch, but the scanner can sleep for the full recycle interval after observing enable_recycler=false, so turning the switch back on may not resume recycling for up to the default 3600 seconds. Existing inline threads also already cover the in-flight recycle not stopping promptly and the new tests not proving the disabled paths.
Critical checkpoints:
- Goal/test proof: The goal is to dynamically disable/enable recycler work. The current implementation only gates scanning/new starts and does not fully prove or deliver dynamic behavior; tests for disabled paths are already called out as insufficient, and re-enable latency is not covered.
- Scope/focus: The change is small, but the minimal checks are placed only in scanner/new-worker paths, leaving dynamic lifecycle gaps.
- Concurrency/lifecycle: Recycler scanner, worker, lease, and task-check threads run concurrently. The new flag is read from worker threads while updated externally, and the change lacks a wakeup/abort mechanism for prompt state transitions.
- Configuration: A new mutable config is added, but affected recycler threads do not reliably observe changes promptly.
- Parallel paths: Scanner and new recycle start paths were touched; active recycle/lease paths are still covered by existing review comments as missing dynamic-disable handling.
- Tests: Added tests do not distinguish skipped work from attempted work, per existing threads, and do not cover re-enabling behavior.
- Recycler-specific safety: No direct mark-before-delete or packed-file ordering change was introduced, but incomplete dynamic disable can still allow recycle/deletion to continue in active jobs as already noted.
Focus response: No additional user-provided review focus was specified.
No description provided.