feat(replication-restore): operator-side orchestration for cross-region replication restore (PR 1/2)#6861
Open
RidRisR wants to merge 28 commits into
Conversation
Continuation of the CompactBackup sharded PR. Defines operator-side
orchestration for replication restore: two-phase BR (snapshot + log)
with CompactBackup terminal state as the gate.
Differs from 2026-04-15 v2 design in five localized decisions:
- Phase/Condition names are intent-based: SnapshotRestore / LogRestore
(status.phase), SnapshotRestored / CompactSettled (markers)
- ReplicationStep is string ("snapshot-restore" / "log-restore") not int
- No intermediate ReadyForLogRestore phase; both markers being True is the gate
- statusSubPrefix removed from CRD, hardcoded to "ccr" constant
- Status write ownership: Option B (controller owns all Phase writes;
backup-manager in phase-1 wrapped by ReplicationRestoreStatusUpdater),
mirroring the ShardedCompactStatusUpdater pattern from the CompactBackup
sharded PR. Eliminates the v2 transformation-layer hole where backup-
manager's Running write during phase-1 would overwrite controller's
Phase value.
Includes PR breakdown (types -> operator logic -> backup-manager CLI)
aligned with the "single spec / multi PR" strategy.
The original 3-PR breakdown split operator-side work into a types-only
PR and a logic PR. Types without consumers make for awkward review
("where does this field get used?"), and the logic PR is tightly
coupled to the types. Merge them: PR 1 is now the full operator-side
implementation (types + label + wrapper + handler + informer wiring +
unit tests); PR 2 stays as backup-manager CLI + BR flags, which must
remain independent because it depends on kernel BR readiness and
integration testing.
…e path

Two post-implementation-exploration corrections to the spec:

1. The Restore status updater interface has a single Update(restore, condition, newStatus) method, not the 6-method shape of the Compact status updater I originally mirrored. Rewrote the wrapper code block as a simple no-op updater and spelled out the tradeoffs of dropping TimeStarted/Progress/CommitTs writes during phase-1.
2. UpdateRestoreCondition at pkg/apis/pingcap/v1alpha1/restore.go:73 unconditionally assigns status.Phase = condition.Type, so the controller cannot use it to write SnapshotRestored / CompactSettled markers without overwriting Phase. Documented that markers go through a handler-internal appendRestoreMarker helper which touches only status.Conditions[], preserving the "UpdateRestoreCondition is untouched" guarantee.
10 bite-sized tasks covering the operator-side work:
- CRD types and constants
- label additions
- no-op status wrapper
- handler skeleton with ReplicationStep-keyed dispatch
- helpers (appendRestoreMarker that preserves Phase, CompactBackup lookup with WaitTimeout, cross-CR consistency check)
- the three state-machine cases (initial / snapshot-restore / log-restore)
- restoreManager interception
- Restore controller CompactBackup informer wire-up

Each task follows TDD: failing test -> implementation -> passing test -> commit. PR 2 (backup-manager CLI + BR --replication-storage-phase) is covered by a separate plan because it gates on internal BR readiness and runs against a different test harness.
Adds ReplicationConfig struct (CompactBackupName, WaitTimeout) under RestoreSpec.ReplicationConfig, RestoreStatus.ReplicationStep string, and four new RestoreConditionType constants: SnapshotRestore and LogRestore (phase values) plus SnapshotRestored and CompactSettled (markers). No behavior change in this commit; consumed by subsequent handler and wrapper commits.
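A minimal sketch of the new API surface, to make the field names concrete; the JSON tags, constant identifiers, and the WaitTimeout type are assumptions, not the committed definitions:

```go
// Illustrative stand-ins for the additions to pkg/apis/pingcap/v1alpha1; the
// real package already defines RestoreConditionType, RestoreSpec, RestoreStatus.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type RestoreConditionType string // already present in the real package

// New phase values and markers (constant identifiers are assumptions).
const (
	RestoreSnapshotRestore  RestoreConditionType = "SnapshotRestore"  // status.phase during phase-1
	RestoreLogRestore       RestoreConditionType = "LogRestore"       // status.phase during phase-2
	RestoreSnapshotRestored RestoreConditionType = "SnapshotRestored" // marker: phase-1 BR Job done
	RestoreCompactSettled   RestoreConditionType = "CompactSettled"   // marker: CompactBackup terminal
)

// ReplicationConfig hangs off RestoreSpec.ReplicationConfig (optional).
type ReplicationConfig struct {
	// CompactBackupName names the CompactBackup CR that gates phase-2.
	CompactBackupName string `json:"compactBackupName"`
	// WaitTimeout bounds late-binding, measured from the Restore's
	// CreationTimestamp (pointer-to-Duration is an assumption).
	WaitTimeout *metav1.Duration `json:"waitTimeout,omitempty"`
}

// RestoreStatus additionally gains a plain string step indicator:
//   ReplicationStep string `json:"replicationStep,omitempty"` // "", "snapshot-restore", "log-restore"
```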
Labels are used to identify phase-1 vs phase-2 BR Jobs by selector rather than name pattern, so renaming the Job doesn't break the handler's idempotence checks.
In replication restore phase-1, backup-manager must not write status.phase or conditions — the controller is the sole writer. This wrapper drops all Update calls. Activated from the backup-manager CLI in PR 2 via --replicationPhase=1; phase-2 uses the real updater, matching the wrap-pattern shipped for CompactBackup sharded mode.
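A no-op wrapper in roughly this shape is what the commit describes; the interface and type names below are simplified stand-ins for the real ones in pkg/controller, shown only to illustrate the "drop every Update" behavior:

```go
package restore

import "github.com/pingcap/tidb-operator/pkg/apis/pingcap/v1alpha1"

// restoreConditionUpdaterInterface stands in for the real single-method
// updater interface: Update(restore, condition, newStatus).
type restoreConditionUpdaterInterface interface {
	Update(*v1alpha1.Restore, *v1alpha1.RestoreCondition, *v1alpha1.RestoreUpdateStatus) error
}

// replicationRestoreStatusUpdater swallows every status write so that, during
// phase-1, the controller stays the sole writer of status.phase and conditions.
// Tradeoff noted in the spec correction: TimeStarted / Progress / CommitTs
// writes from backup-manager are dropped too.
type replicationRestoreStatusUpdater struct{}

// NewReplicationRestoreStatusUpdater returns the phase-1 no-op updater;
// selection happens in the backup-manager CLI in PR 2.
func NewReplicationRestoreStatusUpdater() restoreConditionUpdaterInterface {
	return replicationRestoreStatusUpdater{}
}

func (replicationRestoreStatusUpdater) Update(*v1alpha1.Restore, *v1alpha1.RestoreCondition, *v1alpha1.RestoreUpdateStatus) error {
	return nil // intentionally discard the write
}
```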
Skeleton dispatches by Status.ReplicationStep (empty / snapshot-restore / log-restore). Terminal restores are no-ops. Per-step logic arrives in subsequent commits.
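As an excerpt-style sketch (handler struct and standard imports assumed), the dispatch could look like the following; syncInitial and syncSnapshotRestore are named in later commits, syncLogRestore and the terminal check are assumptions:

```go
// Sync is the replication handler entry point, keyed on Status.ReplicationStep.
func (h *replicationHandler) Sync(restore *v1alpha1.Restore) error {
	// Terminal restores are no-ops.
	if restore.Status.Phase == v1alpha1.RestoreComplete || restore.Status.Phase == v1alpha1.RestoreFailed {
		return nil
	}
	switch restore.Status.ReplicationStep {
	case "":
		return h.syncInitial(restore) // validate, enter SnapshotRestore, create phase-1 Job
	case "snapshot-restore":
		return h.syncSnapshotRestore(restore) // observe Job + CompactBackup, gate on markers
	case "log-restore":
		return h.syncLogRestore(restore) // guard Job-level failures; backup-manager owns status
	default:
		return fmt.Errorf("restore %s/%s: unknown replicationStep %q",
			restore.Namespace, restore.Name, restore.Status.ReplicationStep)
	}
}
```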
appendRestoreMarker writes markers to Conditions without touching Phase (preserves SnapshotRestore / LogRestore phase values during marker observation). lookupCompactBackup encapsulates late-binding with WaitTimeout. checkCrossCRConsistency validates br.cluster/namespace and storage location, ignoring secretName differences.
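A sketch of the marker helper, assuming corev1/metav1 imports; the exact signature is not in the PR text:

```go
// appendRestoreMarker mutates only status.Conditions; status.Phase is never
// touched, so the SnapshotRestore / LogRestore phase values survive marker
// observation.
func appendRestoreMarker(restore *v1alpha1.Restore, marker v1alpha1.RestoreConditionType, reason, message string) {
	for i := range restore.Status.Conditions {
		if restore.Status.Conditions[i].Type == marker {
			// Marker already recorded: refresh in place, still without touching Phase.
			restore.Status.Conditions[i].Status = corev1.ConditionTrue
			restore.Status.Conditions[i].Reason = reason
			restore.Status.Conditions[i].Message = message
			return
		}
	}
	restore.Status.Conditions = append(restore.Status.Conditions, v1alpha1.RestoreCondition{
		Type:               marker,
		Status:             corev1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
		Reason:             reason,
		Message:            message,
	})
}
```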
On empty ReplicationStep, the handler validates cross-CR consistency, writes CompactSettled if CompactBackup is already terminal, advances Phase to SnapshotRestore, sets ReplicationStep=snapshot-restore, and creates the phase-1 BR Job with replication-step label. Late-binding and waitTimeout are handled via lookupCompactBackup.
Handler observes phase-1 Job (writes SnapshotRestored marker or fails the Restore on JobFailed) and CompactBackup (writes CompactSettled on terminal state). When both markers are True, transitions Phase to LogRestore, sets ReplicationStep=log-restore, and creates the phase-2 BR Job.
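The two-marker gate reduces to a small condition scan; a sketch (helper names assumed):

```go
// bothMarkersTrue is the gate described above: phase-2 starts only once the
// phase-1 Job result and the CompactBackup terminal state have both been observed.
func bothMarkersTrue(restore *v1alpha1.Restore) bool {
	return isMarkerTrue(restore, "SnapshotRestored") && isMarkerTrue(restore, "CompactSettled")
}

func isMarkerTrue(restore *v1alpha1.Restore, t v1alpha1.RestoreConditionType) bool {
	for _, c := range restore.Status.Conditions {
		if c.Type == t {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```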
In log-restore, backup-manager drives status.phase via the real
updater (no wrap in phase-2). Handler only guards against Job-level
failures that terminate before backup-manager writes a terminal
condition. makeReplicationBRJob builds phase-specific BR Jobs with
the correct labels and passes --replicationPhase={1,2} to
backup-manager (consumed in PR 2).
buildPiTRBaseArgs is intentionally a minimal stub: full PiTR arg
parity (env vars, TLS volumes, secrets) lives in
restoreManager.makeRestoreJobWithMode and integration with that
helper is deferred to a follow-up after Task 9 wires the handler
to the restoreManager.
When Spec.Mode==pitr and Spec.ReplicationConfig!=nil, syncRestoreJob delegates to replicationHandler.Sync. All legacy single-Job machinery below the interception point is skipped; standard PiTR, snapshot, and volume-snapshot paths are unchanged.
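A sketch of the interception point; the handler field name and the RestoreModePiTR constant spelling are assumptions:

```go
func (rm *restoreManager) syncRestoreJob(restore *v1alpha1.Restore) error {
	if restore.Spec.Mode == v1alpha1.RestoreModePiTR && restore.Spec.ReplicationConfig != nil {
		// Replication restore: delegate to the two-phase handler; all legacy
		// single-Job machinery below this point is skipped.
		return rm.replicationHandler.Sync(restore)
	}
	// ... original PiTR / snapshot / volume-snapshot paths, unchanged ...
	return nil
}
```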
Registers a CompactBackup EventHandler on the Restore controller that enqueues Restores whose Spec.ReplicationConfig.CompactBackupName matches the changed CompactBackup. Closes the observation loop so that a late-binding Restore unblocks as soon as its referenced CompactBackup is created, and so that a Restore in SnapshotRestore gates promptly when CompactBackup reaches terminal state.
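The reverse lookup behind that EventHandler amounts to listing Restores in the CompactBackup's namespace and enqueuing the ones that reference it; a sketch, with the controller and enqueue helper names assumed (imports: labels, klog):

```go
func (c *Controller) enqueueRestoresForCompactBackup(cb *v1alpha1.CompactBackup) {
	restores, err := c.restoreLister.Restores(cb.Namespace).List(labels.Everything())
	if err != nil {
		klog.Errorf("failed to list Restores in %s: %v", cb.Namespace, err)
		return
	}
	for _, r := range restores {
		if r.Spec.ReplicationConfig != nil && r.Spec.ReplicationConfig.CompactBackupName == cb.Name {
			c.enqueueRestore(r) // hypothetical existing enqueue helper
		}
	}
}
```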
…nsistency check

- makeReplicationBRJob now uses label.NewRestore().Instance().RestoreJob().Restore() + merges restore.Labels, matching the standard restore Job builder pattern. External tooling that filters Jobs by app.kubernetes.io/managed-by=restore-operator or instance label will now find replication Jobs.
- Added a doc comment in syncSnapshotRestore explaining that cross-CR consistency is validated only once (at syncInitial), per spec section 5.
Codecov Report: ❌ Patch coverage is …. Additional details and impacted files:
@@ Coverage Diff @@
## master #6861 +/- ##
==========================================
+ Coverage 57.19% 57.77% +0.57%
==========================================
Files 259 263 +4
Lines 33233 38602 +5369
==========================================
+ Hits 19008 22302 +3294
- Misses 12291 14199 +1908
- Partials 1934 2101 +167
…ally

Phase and ReplicationStep were written in two separate reconciles, leaving a transient (Phase=LogRestore, Step=snapshot-restore) window observable to external watchers and requiring two catch-up branches plus a bespoke setReplicationStep with its own lister-staleness workaround.

Extend RestoreUpdateStatus with ReplicationStep so enterSnapshotRestore and enterLogRestore can write Phase + condition + Step in a single Update. Delete setReplicationStep and the two protocol-only catch-up branches; the two step handlers now dispatch on Phase with no transient skew to absorb. Align FakeRestoreConditionUpdater with the production path so future tests that rely on RestoreUpdateStatus patches see the same semantics.
Job lookup in the replication handler used a label selector keyed on restore.Name, which could mistake a leftover Job from a deleted same-name predecessor (still GC'ing) for the current Restore's Job, and the lister- List + Create pattern surfaced spurious AlreadyExists errors during fast back-to-back reconciles. makeReplicationBRJob now stamps the owning Restore's UID on each Job; both ensureJobForStep and findJobForStep switch to Get-by-name and verify the UID label. UID mismatch is treated as not-found by findJobForStep and as an explicit "awaiting GC" error by ensureJobForStep, so Phase is never advanced based on a stale predecessor's Job conditions.
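A sketch of the resulting lookup, assuming a deterministic Job-name helper and a restore-UID label key that are not spelled out in the PR text:

```go
// findJobForStep does a Get-by-name plus restore-UID label verification; a UID
// mismatch is treated as not-found, so Phase can never advance on a stale
// predecessor's Job conditions.
func (h *replicationHandler) findJobForStep(restore *v1alpha1.Restore, step string) (*batchv1.Job, error) {
	name := replicationJobName(restore, step) // hypothetical deterministic Job name
	job, err := h.jobLister.Jobs(restore.Namespace).Get(name)
	if apierrors.IsNotFound(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	if job.Labels[labelRestoreUID] != string(restore.UID) { // labelRestoreUID: hypothetical key
		// Leftover Job from a deleted same-name predecessor still being GC'd.
		return nil, nil
	}
	return job, nil
}
```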
The cross-CR consistency check between Restore and CompactBackup compared only bucket/prefix/region for S3 (and bucket/prefix for GCS, container/ prefix for Azblob). Two CRs could share those fields while pointing at different physical backends — e.g. same bucket name on AWS vs an internal Ceph/MinIO cluster, or the same path under different GCS projects — which would silently let phase-2 BR consume compacted metadata produced against a different storage system. Add the missing location-defining fields (S3 Provider/Endpoint/Path, GCS ProjectId/Location/Path, Azblob StorageAccount/Path) to the compare, keep credentials and upload-only attributes (SecretName, SasToken, StorageClass, Acl, SSE, AccessTier) ignored.
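For the S3 case, the widened comparison reduces to matching every location-defining field and skipping credentials; a sketch (helper name assumed, field names follow v1alpha1.S3StorageProvider):

```go
func s3LocationsMatch(a, b *v1alpha1.S3StorageProvider) bool {
	return a.Provider == b.Provider &&
		a.Endpoint == b.Endpoint &&
		a.Bucket == b.Bucket &&
		a.Prefix == b.Prefix &&
		a.Path == b.Path &&
		a.Region == b.Region
	// SecretName, StorageClass, Acl, SSE are deliberately not compared.
}
```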
- compactIsTerminal: warn on unrecognized CompactBackup state and explicitly enumerate known non-terminal states, so a future kernel state surfaces in logs instead of silently stalling the gate.
- lookupCompactBackup: document that WaitTimeout is measured from Restore.CreationTimestamp, not from the first NotFound observation.
- failRestore: document that ReplicationStep is intentionally preserved on terminal failure so triage can tell phase-1 vs phase-2 failures apart at a glance.
- Drop the B<n>_ cascade-branch prefix from test names and the matching section headers; cross-references to spec branches inside test bodies remain as documentation anchors.
Sentinel 0 means "not a replication restore" (CLI flag unset); 1 and 2 are the only other legal values. Range validation lives in the cmd package (Task 2). Also adds the PR 2 implementation plan.
…dation
Sentinel 0 = "not a replication restore" (cobra default when flag is
absent). Reject {-N, >=3} early in runRestore so the operator gets a
clean error instead of silently mishandling the value downstream.
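A sketch of that early check; the flag and helper names are assumptions:

```go
// validateReplicationPhase rejects anything outside {0, 1, 2} before the value
// reaches the restore path.
func validateReplicationPhase(phase int) error {
	if phase < 0 || phase > 2 {
		return fmt.Errorf("invalid --replicationPhase %d: must be 0 (not a replication restore), 1, or 2", phase)
	}
	return nil
}
```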
Symmetric to compact.go's ShardedCompactStatusUpdater wrap, but driven by the CLI flag (not CR.Spec) because the same Restore CR runs through backup-manager twice with different phases. selectRestoreStatusUpdater returns the no-op wrapper for phase 1; phase 0 / 2 pass the real updater through unchanged.
Three BR flags are appended when --replicationPhase > 0: --replication-storage-phase, --replication-status-sub-prefix=ccr, --pitr-concurrency=1024. Constants live in restore.go (BR call details; not exposed via API per spec §6). restoreData itself takes only a one-line edit to call the new replicationBRFlags helper; phase 0 returns nil so append is a no-op for standard PiTR. The 200-line restoreData body is otherwise unchanged.
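A sketch of the helper; flag spellings come from the commit message, the signature is an assumption:

```go
// replicationBRFlags returns the extra BR flags for a replication restore;
// phase 0 returns nil, so appending is a no-op for standard PiTR.
func replicationBRFlags(phase int) []string {
	if phase <= 0 {
		return nil
	}
	return []string{
		fmt.Sprintf("--replication-storage-phase=%d", phase),
		"--replication-status-sub-prefix=ccr",
		"--pitr-concurrency=1024",
	}
}
```

The one-line restoreData edit would then be along the lines of args = append(args, replicationBRFlags(replicationPhase)...).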
Reorder so pm.Enable runs before the replication interception. Both PiTR variants (standard PiTR and replication restore) now wait for TiKV's gc.ratio-threshold = -1 override to propagate via ConfigMap before any BR Job is launched. The override itself is written by tikvMemberManager.applyPiTRConfigOverride in the TiKV reconcile loop, which already covers replication restores because they satisfy Spec.Mode == PiTR. No TiKV-side change is needed. The replication routing test is updated to seed a -1 ConfigMap up front; a new test exercises the gating behavior end-to-end (closed gate => handler not called; open gate => handler called once).
Replace the stub buildPiTRBaseArgs with an injected job builder (production = rm.makeRestoreJobWithMode) plus a small applyReplicationPhase post-processor. Phase-1 and phase-2 Jobs now inherit the full PiTR Job setup: TiDBBackupManagerImage main container, BR-binary init container with br-bin shared volume, TLS volume mounts, password / storage env, owner reference. Replication adds three mutations on top: Job name override, replication-step + restore-UID labels, --replicationPhase arg. NewRestoreManager is two-stepped so it can pass a function reference to rm.makeRestoreJobWithMode into newReplicationHandler. The handler takes a jobBuilderFunc as a constructor param; tests inject a small fake builder via newHandlerForTest.
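A sketch of the post-processor applied to the Job returned by the injected builder; the label keys and naming scheme are hypothetical, and appending the arg to every container is a simplification of targeting the backup-manager container:

```go
// applyReplicationPhase layers the three replication-specific mutations onto
// a fully built PiTR restore Job.
func applyReplicationPhase(job *batchv1.Job, restore *v1alpha1.Restore, phase int, step string) {
	job.Name = fmt.Sprintf("%s-replication-phase%d", restore.Name, phase) // illustrative override
	if job.Labels == nil {
		job.Labels = map[string]string{}
	}
	job.Labels[labelReplicationStep] = step           // hypothetical key
	job.Labels[labelRestoreUID] = string(restore.UID) // hypothetical key
	// Pass the phase through to backup-manager (flag consumed in PR 2).
	for i := range job.Spec.Template.Spec.Containers {
		job.Spec.Template.Spec.Containers[i].Args = append(
			job.Spec.Template.Spec.Containers[i].Args,
			fmt.Sprintf("--replicationPhase=%d", phase),
		)
	}
}
```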
What problem does this PR solve?
Adds operator-side orchestration for replication restore — a cross-region disaster-recovery flow that defers log compaction from upstream to the downstream PITR phase, avoiding double cross-region transfer of log data. The CompactBackup `mode: sharded` feature shipped earlier provides parallel sharded compaction; this PR is the Restore-side companion that gates BR phase-2 (log restore) on CompactBackup terminal state.

This is PR 1 of 2. PR 2 (backup-manager CLI + BR `--replication-storage-phase` wiring) will follow once internal BR is ready.

Spec: docs/superpowers/specs/2026-04-23-replication-restore-design.md
Plan: docs/superpowers/plans/2026-04-23-replication-restore-pr1.md

What is changed and how does it work?
User triggers replication restore by setting `spec.mode: pitr` and a new `spec.replicationConfig` field on a Restore CR (referencing a CompactBackup CR by name). The controller dispatches a dedicated handler (`pkg/backup/restore/replication_handler.go`) that drives a two-phase state machine keyed on a new `status.replicationStep` string field:
- Initial: `status.phase = SnapshotRestore`, `replicationStep = snapshot-restore`, phase-1 BR Job created.
- Snapshot-restore: the controller writes the `SnapshotRestored` and `CompactSettled` markers (without touching Phase) → when both markers are True, it transitions to `LogRestore` and creates the phase-2 Job.
- Log-restore: backup-manager drives `Running`/`Complete`/`Failed` via the standard updater (PR 2 wires the actual `--replicationPhase=2` arg).

Status write ownership uses Option B: the controller is the sole writer of `status.phase` during phase-1; backup-manager status writes are no-op'd via a wrapper (`NewReplicationRestoreStatusUpdater`, activated by PR 2). This eliminates a v2-design hole where backup-manager's `Running` write would clobber controller-set Phase. The pattern mirrors `ShardedCompactStatusUpdater` from the sharded CompactBackup PR.

Markers (`SnapshotRestored` / `CompactSettled`) bypass `UpdateRestoreCondition` (which unconditionally writes `status.Phase = condition.Type`) via a private `appendRestoreMarker` helper. `UpdateRestoreCondition` itself is untouched.

A CompactBackup informer is wired into the Restore controller so that CompactBackup state changes wake referencing Restores (reverse lookup by `spec.replicationConfig.compactBackupName`).

Backward compatibility

Existing PiTR / snapshot / volume-snapshot users are completely unaffected. The interception in `syncRestoreJob` triggers only when `spec.mode == pitr && spec.replicationConfig != nil`; nil falls through to the original code path. New CRD fields are all optional with zero-value-equivalent-old-behavior semantics.

Known follow-ups (out of scope for this PR)

- `buildPiTRBaseArgs` is a minimal stub — the full PiTR arg set (env vars, TLS volumes, secrets, resources) lives in `restoreManager.makeRestoreJobWithMode` and is not replicated yet. PR 2 (or a follow-up) must integrate, either by calling `rm.makeRestoreJobWithMode` and post-processing or by extracting a shared helper.
- Backup-manager CLI flag `--replicationPhase`, BR command flags `--replication-storage-phase=N`, `--replication-status-sub-prefix=ccr`, and `--pitr-concurrency=1024` — PR 2.

Code changes
Tests
Side effects