feat: add clean_broken_retention CLI command #1371
Open
Slach wants to merge 11 commits into
Walks top-level of remote `path` and `object_disks_path` and batch-deletes (with retry, via the existing BatchDeleter pipeline) every entry that is not present in the live BackupList and not matched by any --keep=<glob>. Dry-run by default; --commit performs the deletes. Useful for cleaning up orphans left behind by failed retention runs (e.g. the GCS 503 scenario from #1356).
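The selection rule above (not in the live backup list, not matched by any `--keep` glob) can be sketched as follows. This is a minimal illustration, not the code from this PR: `selectOrphans` and its parameters are hypothetical names, and the only real API used is `path.Match` from the Go standard library.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// selectOrphans returns the top-level remote entries that are neither live
// backups nor matched by any keep glob. Hypothetical helper for illustration.
func selectOrphans(entries []string, live map[string]bool, keepGlobs []string) ([]string, error) {
	var orphans []string
	for _, name := range entries {
		if live[name] {
			continue // present in the live backup list: never a delete candidate
		}
		kept := false
		for _, glob := range keepGlobs {
			ok, err := path.Match(glob, name)
			if err != nil {
				return nil, fmt.Errorf("bad --keep glob %q: %w", glob, err)
			}
			if ok {
				kept = true
				break
			}
		}
		if !kept {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans) // stable order for dry-run output
	return orphans, nil
}

func main() {
	entries := []string{"backup_live", "orphan_a", "orphan_b", "keep_me_1"}
	live := map[string]bool{"backup_live": true}
	orphans, err := selectOrphans(entries, live, []string{"keep_*"})
	if err != nil {
		panic(err)
	}
	fmt.Println(orphans) // [orphan_a orphan_b]
}
```

In dry-run mode the returned list would only be printed; with `--commit` it would be handed to the delete pipeline.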
Plant orphans directly under /minio/data/clickhouse for both `path` and `object_disks_path`, then verify:
- dry-run lists them without deleting,
- --commit removes them from both locations,
- --keep=<glob> preserves matching entries,
- the live backup is never touched.
…ckends
Table-driven, with one sub-test per backend:
- S3 (minio), SFTP (sshd), FTP (proftpd), GCS_EMULATOR (fake-gcs-server)
use direct container-FS plant/assert,
- AZBLOB plants via az-cli docker run against azurite,
- real GCS and COS skip themselves unless GCS_TESTS / QA_TENCENT_SECRET_KEY
is set (their plant helpers fail loudly if reached without infra).
For each backend: create a real backup that must survive, plant 3 orphans
in path/object_disks_path, verify dry-run lists them without deleting,
verify --commit + --keep glob deletes only unmatched orphans, then a
second --commit clears the rest.
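A minimal sketch of how such a backend table with self-skipping cases might look. The `backendCase` type and `shouldSkip` helper are hypothetical names, not taken from the PR. Only the backend names and the two environment variables come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// backendCase is a hypothetical table row; the real test table is not shown here.
type backendCase struct {
	name       string
	requireEnv string // empty means the backend always runs
}

var backendCases = []backendCase{
	{name: "S3"},
	{name: "SFTP"},
	{name: "FTP"},
	{name: "GCS_EMULATOR"},
	{name: "AZBLOB"},
	{name: "GCS", requireEnv: "GCS_TESTS"},
	{name: "COS", requireEnv: "QA_TENCENT_SECRET_KEY"},
}

// shouldSkip reports whether a case must skip itself because its credential
// variable is unset. getenv is injected so the rule can be checked in isolation.
func shouldSkip(c backendCase, getenv func(string) string) bool {
	return c.requireEnv != "" && getenv(c.requireEnv) == ""
}

func main() {
	for _, c := range backendCases {
		fmt.Printf("%-12s skip=%v\n", c.name, shouldSkip(c, os.Getenv))
	}
}
```

In a real test each row would become a `t.Run(c.name, ...)` sub-test that calls `t.Skip` when `shouldSkip` returns true.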
…roken_retention
GCS: spins up google/cloud-sdk:slim with --volumes-from clickhouse-backup to access /etc/clickhouse-backup/credentials.json, authenticates the service account, and uses gsutil cp/ls.
COS: spins up amazon/aws-cli:latest with --endpoint-url pointing at the regional COS endpoint and QA_TENCENT_SECRET_ID/KEY as AWS credentials, since COS supports the S3 API. Setup hook renders config.yml from the copied template via envsubst.
Both run only when their credential env var is set.
- Collapse the three container-FS factories (S3/SFTP/FTP/GCS_EMULATOR) into a single containerFSCase helper.
- Extract dockerRunSh and an azList closure to remove repetition in the AZBLOB case; reuse a single gsutil/awsRun wrapper for GCS and COS.
- Drop the redundant storageType field; derive the table name from name.
- Loop the plant+assertExists step over a small table instead of open-coding four calls.
- Pull repeated keep-glob string into a const, drop fmt.Sprintf with no format args.
Signed-off-by: slach <bloodjazman@gmail.com>
…ash in Walk names
The command previously left orphans untouched because:
1. `BackupList(ctx, false, ...)` uses the on-disk metadata cache; on the
second invocation (e.g. dry-run → --commit) the cache returns orphan
directories with Broken="", so my Broken!="" filter let them into the
keep-set and zero orphans were detected. Now passes parseMetadata=true
so every top-level entry is stat'd for metadata.json on each call.
2. `bd.Walk("/", false, …)` emits names with a leading slash on S3 (from
TrimPrefix mismatch); the previous `strings.Contains(name, "/")` filter
rejected them as nested. Switched to `strings.Trim(name, "/")` so leading and
trailing slashes are both stripped before the top-level check.
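The second fix can be illustrated with a small sketch. `topLevelName` is a hypothetical helper, not the function from the PR. It only shows why trimming both slashes before the `strings.Contains` check lets names with a leading slash pass as top-level entries.

```go
package main

import (
	"fmt"
	"strings"
)

// topLevelName strips leading and trailing slashes and reports whether the
// remaining name is a top-level entry (no nested path separator left).
func topLevelName(walked string) (string, bool) {
	name := strings.Trim(walked, "/")
	if name == "" || strings.Contains(name, "/") {
		return "", false
	}
	return name, true
}

func main() {
	for _, walked := range []string{"/orphan_a", "orphan_b/", "/backup/shadow/part", "/"} {
		name, ok := topLevelName(walked)
		fmt.Printf("%q -> %q top-level=%v\n", walked, name, ok)
	}
}
```

Without the trim, `"/orphan_a"` contains a slash and would be rejected as nested, which matches the behaviour described for S3.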
test/integration:
- Split TestCleanBrokenRetention into one top-level test per backend so
each can be run independently via RUN_TESTS=TestCleanBrokenRetention<Backend>.
- S3 case now plants orphans via 'mc cp' instead of raw file writes —
MinIO only sees objects placed through its S3 API; direct files on disk
are invisible to ListObjectsV2 and the cleanup batch finds nothing.
Signed-off-by: slach <bloodjazman@gmail.com>
…lean_broken_retention
…lean_broken_retention
# Conflicts:
# test/testflows/clickhouse_backup/tests/snapshots/cli.py.cli.snapshot
Signed-off-by: slach <bloodjazman@gmail.com>
Summary
New CLI command `clean_broken_retention` that walks the top level of remote `path` and `object_disks_path` and batch-deletes (with retry, via the existing `BatchDeleter` pipeline) every entry that is not present in the live `BackupList` and not matched by any `--keep=<glob>`.
- `--keep` globs (`path.Match` syntax, repeatable)
- `--commit` performs the actual deletes
- `BackupDestination.RemoveBackupRemote` (for `path`) and `cleanBackupObjectDisks` (for `object_disks_path`): both already use batched delete with retry/exponential-backoff
Usage
Test plan
- `--commit` run: verify the orphan is removed from both `path` and `object_disks_path`, batched, with retry on transient 5xx