
hotfix v2.6.44: retry batch delete in cleanBackupObjectDisks (#1356) #1370

Open
Slach wants to merge 7 commits into master from hotfix_2_6_44


Slach (Collaborator) commented May 13, 2026

Summary

Hotfix branch cut from tag v2.6.43 (cherry-pick of master commit 50e2b0f8).

  • Adds exponential-backoff retry around batchDeleter.DeleteKeysFromObjectDiskBackupBatch in cleanBackupObjectDisks, so transient errors (e.g. GCS 503 backendError) during retention no longer leave orphaned objects in object_disks_path. Path-side deletes in BackupDestination.RemoveBackupRemote were already retried; this change brings the object-disk side to parity.
  • Fixes #1356.
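
The retry described above can be sketched roughly as follows. This is a minimal illustration of exponential backoff, not the code from the PR: `retryWithBackoff`, `errTransient`, and the `deleteBatch` closure are hypothetical names, and the attempt count and base delay are arbitrary assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical stand-in for a retryable backend failure
// such as a GCS 503 backendError.
var errTransient = errors.New("transient backend error")

// retryWithBackoff calls op until it succeeds or maxAttempts is reached,
// doubling the delay after every failed attempt.
func retryWithBackoff(maxAttempts int, baseDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		// Sleep only when another attempt will follow.
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Hypothetical batch delete that fails twice before succeeding.
	deleteBatch := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retryWithBackoff(5, 10*time.Millisecond, deleteBatch)
	fmt.Println("calls:", calls, "err:", err)
}
```

A production version would likely also take a context for cancellation and retry only errors known to be transient; both are left out here to keep the sketch short.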

Test plan

  • CI green (unit + integration)
  • Manual: trigger transient 5xx on object storage during retention, confirm retry recovers and object_disks_path/<backupName> is fully cleaned

Slach and others added 3 commits May 14, 2026 08:59
Signed-off-by: slach <bloodjazman@gmail.com>
Replace time.Sleep(1s) after starting clickhouse-backup-fips server with
active polling of both the /backup/actions HTTP endpoint and the
system.backup_actions integration table. Under CI load 1s was not enough
for the server to create the URL-engine table and start listening on
:7172, producing "Table system.backup_actions does not exist" (Code 60).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
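
The change from a fixed sleep to active polling can be illustrated with a small helper. This is a sketch under assumptions, not the test code from the commit: `waitUntil` and the `ready` check are hypothetical, and the real test polls the server and the system.backup_actions table rather than a timer.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntil polls check every interval until it returns true or the
// timeout elapses. The check always runs at least once.
func waitUntil(timeout, interval time.Duration, check func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s", timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	start := time.Now()
	// Hypothetical readiness check: becomes true after roughly 50ms,
	// standing in for "the table exists and the server is listening".
	ready := func() bool { return time.Since(start) > 50*time.Millisecond }
	err := waitUntil(2*time.Second, 10*time.Millisecond, ready)
	fmt.Println("ready after", time.Since(start).Round(10*time.Millisecond), "err:", err)
}
```

Unlike a fixed `time.Sleep`, the helper returns as soon as the condition holds and fails with an explicit error when it never does.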
…t HTTPS server

The HTTP probe curled http://localhost:7172 against an HTTPS server
(secure: true in config-s3-fips.yml with required client cert auth), so
Go's TLS server replied 400 "Client sent an HTTP request to an HTTPS
server" and the 2xx prefix check could never succeed. Also,
CreateIntegrationTables runs before ListenAndServeTLS in api.Restart(),
so table existence alone does not prove the listener is bound.

Swap curl for a pure TCP probe (bash </dev/tcp/localhost/7172). A
successful TCP connect proves the TLS listener is up; combined with
EXISTS TABLE system.backup_actions, both prerequisites for the
subsequent INSERT INTO system.backup_actions are guaranteed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
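
The commit uses bash's /dev/tcp for the probe. The same idea, a plain TCP connect that ignores TLS and HTTP entirely, could look like this in Go. `portOpen` and `startLocalListener` are hypothetical helpers written for this illustration, and the throwaway loopback listener stands in for the server on :7172.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portOpen reports whether a TCP connection to addr succeeds within timeout.
// It sends no TLS or HTTP data; success only proves that a listener is bound.
func portOpen(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	_ = conn.Close()
	return true
}

// startLocalListener opens a throwaway listener on a free loopback port
// and returns its address together with a function that closes it.
func startLocalListener() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { _ = ln.Close() }, nil
}

func main() {
	addr, stop, err := startLocalListener()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("while listening:", portOpen(addr, time.Second))
	stop()
	fmt.Println("after close:", portOpen(addr, time.Second))
}
```

Because the probe never completes a TLS handshake, it works against a server that requires client certificates, which is the case the HTTP probe could not handle.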
Slach closed this May 14, 2026
Slach reopened this May 14, 2026
Slach added 2 commits May 14, 2026 09:19
getObjectAllVersions now returns both Versions and DeleteMarkers as a list of
ObjectIdentifier, so each delete-marker is removed by VersionId.

deleteKeys / deleteKey: when the identifiers list is empty (key already
gone, e.g. on retry), skip instead of issuing a key-only DeleteObject,
which on a versioned bucket would create a fresh empty delete-marker.

Refs: avoid delete-marker accumulation on versioned S3 buckets when
batch delete is retried after a partial failure.
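
The logic in this commit message can be sketched without the S3 SDK. The types and functions below (`objectIdentifier`, `versionListing`, `identifiersFor`, `planDelete`) are simplified, hypothetical stand-ins rather than the PR's actual code. They show only the two decisions: include delete markers alongside versions, and skip the delete when nothing is left.

```go
package main

import "fmt"

// objectIdentifier is a hypothetical stand-in for the SDK's ObjectIdentifier.
type objectIdentifier struct {
	Key       string
	VersionID string
}

// versionListing is a simplified, hypothetical listing result for one key.
type versionListing struct {
	Versions      []string // version IDs of real object versions
	DeleteMarkers []string // version IDs of delete markers
}

// identifiersFor returns one identifier per version and per delete marker,
// so that every entry can be removed by its version ID.
func identifiersFor(key string, l versionListing) []objectIdentifier {
	ids := make([]objectIdentifier, 0, len(l.Versions)+len(l.DeleteMarkers))
	for _, v := range l.Versions {
		ids = append(ids, objectIdentifier{Key: key, VersionID: v})
	}
	for _, m := range l.DeleteMarkers {
		ids = append(ids, objectIdentifier{Key: key, VersionID: m})
	}
	return ids
}

// planDelete returns the identifiers to delete and false when the key is
// already gone. Skipping in that case avoids a key-only delete, which on a
// versioned bucket would create a new delete marker.
func planDelete(key string, l versionListing) ([]objectIdentifier, bool) {
	ids := identifiersFor(key, l)
	if len(ids) == 0 {
		return nil, false
	}
	return ids, true
}

func main() {
	first := versionListing{Versions: []string{"v1", "v2"}, DeleteMarkers: []string{"m1"}}
	ids, ok := planDelete("backup/part.bin", first)
	fmt.Println("first attempt:", len(ids), "identifiers, delete:", ok)

	// On a retry after a successful delete, the listing is empty.
	_, ok = planDelete("backup/part.bin", versionListing{})
	fmt.Println("retry, delete:", ok)
}
```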

Development

Successfully merging this pull request may close these issues.

batching deletion failures need retry