refactor/total → main#127
Merged
Strict equality with annotation value 'true' — every other value defaults to direct-create-equivalent (no auto-GC). Pure, side-effect-free. (Design §3.2.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
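A minimal sketch of the strict-equality check this commit describes. The annotation key is the `cloudflare.io/auto-created` marker named later in this log; the function signature (a plain annotation map rather than the CR type) is an assumption for illustration.

```go
package main

// autoCreatedAnnotation is the marker key referenced later in this log.
const autoCreatedAnnotation = "cloudflare.io/auto-created"

// isAutoCreated reports whether the annotations mark a CR as operator-created.
// Only the exact string "true" counts; "True", "yes", "1", or a missing key
// all fall back to direct-create behaviour (no auto-GC).
// Pure: it reads the map and has no side effects. A nil map is safe to index.
func isAutoCreated(annotations map[string]string) bool {
	return annotations[autoCreatedAnnotation] == "true"
}
```

Strict equality keeps the destructive path opt-in: any typo or alternate spelling leaves the CR on the safer direct-create lifecycle.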
Two mutually-exclusive predicates serving different reconcile-flow positions:
- needsOwnerTransfer (early): sources exist, owner gone -> promote one.
- isOrphaned (late): no sources, no owner, auto-created -> candidate for self-delete subject to two-tick grace.
(Design §3.2.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 3 original cases never stressed the invariant: under the regression of dropping isOrphaned's AttachedSources==0 clause, none flipped nt && io to true. Add the discriminating state (auto-created + sources + no owners) where correct code gives nt=true/io=false but the regression gives both true. Verified the new case fails under the simulated regression and passes with the real implementation. (Per-task review finding.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
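The discriminating state can be sketched as follows. `tunnelState` is a hypothetical flattened view of what the predicates read, and `isOrphanedRegressed` simulates the regression described above; the real predicates take the CR and its observed sources, which this log does not show.

```go
package main

// tunnelState is a hypothetical, flattened stand-in for the inputs the
// predicates inspect (annotation, attached sources, owner references).
type tunnelState struct {
	AutoCreated     bool
	AttachedSources int
	Owners          int
}

// needsOwnerTransfer (early in reconcile): sources exist but the owner is gone.
// This mirrors the predicate as described at this point in the log.
func needsOwnerTransfer(s tunnelState) bool {
	return s.AttachedSources > 0 && s.Owners == 0
}

// isOrphaned (late in reconcile): auto-created, no sources, no owner.
func isOrphaned(s tunnelState) bool {
	return s.AutoCreated && s.AttachedSources == 0 && s.Owners == 0
}

// isOrphanedRegressed simulates the regression: the AttachedSources == 0
// clause has been dropped, so the two predicates can both be true.
func isOrphanedRegressed(s tunnelState) bool {
	return s.AutoCreated && s.Owners == 0
}
```

For the state auto-created + one source + no owners, the correct pair yields nt=true/io=false, while the regressed `isOrphaned` also returns true and breaks mutual exclusion.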
Lex-smallest live remaining attacher promoted to controller-owner via an optimistic-lock client.MergeFrom Patch. Filters DeletionTimestamp'd candidates; direct typed Get catches NotFound between snapshot and Patch; bounded at 5 attempts; Conflict -> (false,nil) so the next reconcile retries; emits ReasonOwnerTransferred on success. Adaptation vs plan-literal: reuses reconcile.SetControllerOwner (controllerutil.SetControllerReference) for the owner-ref instead of a hand-built apiutil.GVKForObject OwnerReference -- project code-reuse mandate, identical Controller/BlockOwnerDeletion/GVK shape, drops the apiutil dependency. needsOwnerTransfer guarantees zero pre-existing owner refs so the single-owner replace cannot conflict. (Design §4.2.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
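The candidate selection alone (without the Get/Patch loop) might look like the sketch below. The `attacher` type and the sort key (namespace, then name) are assumptions; the commit only states that the lexicographically smallest live attacher is promoted and that candidates with a DeletionTimestamp are filtered out.

```go
package main

import "sort"

// attacher is a hypothetical stand-in for a source object attached to the tunnel.
type attacher struct {
	Namespace string
	Name      string
	Deleting  bool // true when metadata.deletionTimestamp is set
}

// pickPromotionCandidate drops candidates that are being deleted and returns
// the lexicographically smallest survivor. ok is false when nobody is live,
// in which case the caller falls through without promoting anyone.
func pickPromotionCandidate(candidates []attacher) (winner attacher, ok bool) {
	live := make([]attacher, 0, len(candidates))
	for _, c := range candidates {
		if !c.Deleting {
			live = append(live, c)
		}
	}
	if len(live) == 0 {
		return attacher{}, false
	}
	// Assumed ordering: namespace first, then name.
	sort.Slice(live, func(i, j int) bool {
		if live[i].Namespace != live[j].Namespace {
			return live[i].Namespace < live[j].Namespace
		}
		return live[i].Name < live[j].Name
	})
	return live[0], true
}
```

A deterministic pick means concurrent reconciles converge on the same owner instead of racing to promote different sources.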
…cked
The plan's reference code used plain client.MergeFrom(tn), which adds NO
resourceVersion precondition — the apiserver never returns 409 Conflict,
so the IsConflict->(false,nil) retry branch was dead in production and
two concurrent owner-transfers could silently last-write-wins clobber
each other. Design §4.2 (and the plan's own prose) require optimistic
locking. Switch to MergeFromWithOptions(tn, MergeFromWithOptimisticLock{})
so the patch carries tn.ResourceVersion and a stale Patch is rejected
with Conflict, which the existing branch turns into a clean next-reconcile
retry. Plan-internal-contradiction drift resolved in favor of the design
(per-task review finding; same drift-handling pattern as P3).
Also documents that cross-namespace attachers surface as (false,err) via
SetControllerReference and are out of scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
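To illustrate why the precondition matters, here is a small in-memory simulation rather than controller-runtime code. The `store`, `object`, and `patch` names are hypothetical; in the real change the precondition comes from `MergeFromWithOptions(tn, MergeFromWithOptimisticLock{})` carrying `tn.ResourceVersion`.

```go
package main

import "errors"

// errConflict plays the role of the apiserver's 409 Conflict.
var errConflict = errors.New("conflict: resourceVersion is stale")

// object is a hypothetical stand-in for the tunnel CR.
type object struct {
	ResourceVersion int
	Owner           string
}

// store is a hypothetical stand-in for the apiserver.
type store struct {
	current object
}

// patch sets the owner. With expectRV set (the optimistic-lock case) a stale
// snapshot is rejected; with expectRV nil (the plain merge-patch case) the
// write always lands, so the last writer silently wins.
func (s *store) patch(owner string, expectRV *int) error {
	if expectRV != nil && *expectRV != s.current.ResourceVersion {
		return errConflict
	}
	s.current.Owner = owner
	s.current.ResourceVersion++
	return nil
}
```

Two reconcilers that snapshot the same version and both patch with a precondition end with one success and one conflict; the loser retries on its next reconcile with a fresh view.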
needsOwnerTransfer -> TransferOwnershipIfNeeded runs between the credentials-halt branch and the status snapshot (design §4.1 step 5) so all subsequent reconcile work sees a valid OwnerReference. Successful transfer requeues immediately for a fresh view; no-live-candidates falls through to orphan-state management (Task 9). Finalizer is already ensured before this point. (Design §4.1.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-created tunnel CRs with no owner and no attached sources stamp Status.LastOrphanedAt on first observation (persisted via the existing change-detection gate; LastOrphanedAt is unmasked) and self-delete on a later reconcile once the 60s two-tick grace window elapses (Warning TerminalNoSources event + terminal Ready=False, then Delete). Source reattach / owner promotion within the window clears the stamp. Direct-create CRs are skipped entirely (isAutoCreated gate). A pendingRequeueAfter local overrides the default interval during the grace window (explicit local, no defer). (Design §4.1 step 10, §3.1.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
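The stamp / wait / delete / clear decision can be sketched as a pure function. The action names and the signature are hypothetical; the caller is assumed to compute `sinceStamp` as now minus Status.LastOrphanedAt, and the 60s constant is the grace value stated above.

```go
package main

import "time"

// pendingDeletionGrace is the 60s two-tick grace window named in the log.
const pendingDeletionGrace = 60 * time.Second

type orphanAction int

const (
	actionNone    orphanAction = iota // nothing to do
	actionStamp                       // first observation: set LastOrphanedAt
	actionRequeue                     // still inside the grace window
	actionDelete                      // grace elapsed: emit event, then Delete
	actionClear                       // source or owner came back: clear the stamp
)

// nextOrphanAction decides what the orphan block does on one reconcile.
// orphaned is the isOrphaned result; stamped says whether LastOrphanedAt is set;
// sinceStamp is how long ago it was set (ignored when stamped is false).
func nextOrphanAction(orphaned, stamped bool, sinceStamp, grace time.Duration) orphanAction {
	switch {
	case !orphaned && stamped:
		return actionClear
	case !orphaned:
		return actionNone
	case !stamped:
		return actionStamp
	case sinceStamp >= grace:
		return actionDelete
	default:
		return actionRequeue
	}
}
```

Deletion can never happen on the same reconcile that first observes the orphan state, which is what gives a reattaching source time to arrive.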
…hook
Adds CloudflareTunnelReconciler.PendingDeletionGrace (optional; zero => 60s pendingDeletionGrace const) so envtests run with a short two-tick window instead of real 60s sleeps. This is a brief-mandated adaptation of the plan's pendingDeletionGraceForTest mirror; also a legitimate operator knob. Production behaviour unchanged when unset (Task-9 unit tests pass as-is). setupServiceEnv sets a 3s grace (verified harmless to existing callers: none deletes the last source of an auto-created tunnel then asserts persistence).
TestEnvtest_CascadeGC_OwnerTransfer: two annotated Services share a tunnel-name; deleting the owner Service transfers ownership to the remaining Service; the auto-created tunnel persists. (Design §8.2.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
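A sketch of the zero-means-default knob. The struct here carries only the one field from the commit; the `grace` accessor name is hypothetical, and treating negative values like zero is an assumption.

```go
package main

import "time"

// pendingDeletionGrace is the production default named in the commit.
const pendingDeletionGrace = 60 * time.Second

// CloudflareTunnelReconciler is reduced to the single field discussed here.
type CloudflareTunnelReconciler struct {
	// PendingDeletionGrace overrides the grace window; zero keeps the default.
	PendingDeletionGrace time.Duration
}

// grace returns the effective window (hypothetical helper name).
// Assumption: non-positive values fall back to the 60s default.
func (r *CloudflareTunnelReconciler) grace() time.Duration {
	if r.PendingDeletionGrace > 0 {
		return r.PendingDeletionGrace
	}
	return pendingDeletionGrace
}
```

Leaving the field unset keeps production at 60s, while a test harness can set a few seconds and avoid real sleeps.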
Single annotated Service -> auto-created tunnel; Service deleted; after the (short, harness-overridden) two-tick grace the tunnel CR stamps LastOrphanedAt then self-deletes (Warning TerminalNoSources, drain via the finalizer path, CR gone). Reuses Task-10's GC-emulation (envtest has no garbage collector, so the dead owner's dangling ownerReference is stripped to let the production isOrphaned path fire — the stamp/grace/ self-delete are all production code). (Design §8.2.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GC-emulation strip returned as soon as ownerRefs hit zero. If the tunnel reconciler then fired before the ServiceSourceReconciler cleared the deleted Service from the shared cache, observeAttachedSources still saw the source, isOrphaned was false, and the reconciler requeued at defaultTunnelInterval (30m) — so the test timed out in strict isolation (it only passed in the full suite because concurrent managers warmed the scheduler so the service reconciler won the race). Gate the strip's completion on BOTH ownerRefs==0 AND observed AttachedSources==0. Bump a test-only label (cloudflare.io/test-strip-tick) on every iteration to force a real k8s write (real rv change → real watch event) on each tick: k8s 1.30 does not bump resourceVersion for no-op writes, so a plain Update with already-empty ownerRefs would produce no watch event, leaving the tunnel reconciler stuck on its 30m requeue. The label value changes every tick; the tunnel reconciler ignores labels entirely. The repeated real writes keep re-triggering the tunnel reconciler until it observes the drained cache and the isOrphaned path can fire. Test-only; production unchanged; still mutation-proven non-vacuous (neutering self-delete fails the test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two scenarios + a shared GC-emulation helper:
- TwoTickRaceProtection: source deleted, LastOrphanedAt stamped, a new annotated Service reattaches within the grace window -> tunnel is NOT self-deleted, LastOrphanedAt cleared, ownership transfers to the new source.
- DirectCreateNeverGCd: a user-authored CloudflareTunnel (no cloudflare.io/auto-created) adopts then loses its only attaching Service -> never stamps LastOrphanedAt, never self-deletes (isAutoCreated gate skips the orphan path). Adopt path does not backfill the marker.
- Extracted gcEmulateStripDeadOwner shared by the last-source + two-tick-race tests (code-reuse; envtest has no GC).
Renamed the strip-tick label out of the reserved cloudflare.io/* prefix to envtest.local/strip-tick (test-only mechanism, prior-review nit).
(Design §3.1, §8.2.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The require.Never passed because the tunnel reconciler was quiescent (no Service watch; source-deletion only clears the cache, no tunnel-CR write -> no watch event -> next requeue is the 30m defaultTunnelInterval), NOT because the isAutoCreated gate blocked self-delete: mutating isAutoCreated->true left the test green.
Two interacting issues kept the test vacuous:
1. No retriggering: without forcing reconcile events during the Never window, the orphan block was never re-evaluated.
2. Dangling ownerRef: needsOwnerTransfer promotes the first attaching source (the Service) to controller-owner on a subsequent reconcile; envtest has no garbage collector, so after the Service is deleted its ownerRef dangles, and len(OwnerReferences)>0 permanently blocked isOrphaned from evaluating, regardless of retriggering.
Fix: each require.Never tick now strips any dangling ownerReference whose owner is the deleted Service (envtest GC emulation, same pattern as T10 and T11) AND bumps a benign envtest.local/retrigger-tick label so the production orphan block is actively re-evaluated while the source is gone. With the isAutoCreated gate intact the unannotated direct-create tunnel is skipped every tick (no stamp, no self-delete). With the gate mutated to return true the tunnel self-deletes within the 12s window and require.Never trips: the isAutoCreated->true mutation now FAILS the test (verified: require.Never tripped at 5.52s). Duration extended 8s->12s to comfortably cover stamp + 3s grace + drain.
No-backfill assertions preserved; docstring corrected to describe the retrigger+strip mechanism and the harness-fidelity note about needsOwnerTransfer. Test-only; production unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
needsOwnerTransfer was ungated, so a Service annotation-attaching to a
user-authored direct-create CloudflareTunnel triggered owner-transfer,
stamping the Service as Controller+BlockOwnerDeletion owner of the user's
CR — and deleting that Service let Kubernetes GC delete the user's tunnel,
violating design §7 ("direct-create tunnel ... CR persists") and the
"user-authored CRs are never auto-GC'd" invariant (our isAutoCreated
self-delete gate does not stop k8s GC of an operator-attached
controller-ref). Gate needsOwnerTransfer on isAutoCreated, symmetric with
isOrphaned: the entire cascade-GC machinery (owner-transfer rebalancing +
self-delete) is now inert for direct-create CRs; the operator never takes
controller-ownership of a user's tunnel. TransferOwnershipIfNeeded (the
mechanism) stays ungated — needsOwnerTransfer is the single policy gate.
Surfaced by the Task-12 envtest deep review (mutation-proven). New unit +
envtest pin the invariant; rippled Task-6 true-case fixture annotated.
(Design §7, §3.2.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
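The gated predicate can be sketched like this. `tunnelState` is a hypothetical flattened input; the point is only that the isAutoCreated check now fronts the owner-transfer policy, as the commit describes.

```go
package main

// tunnelState is a hypothetical, flattened stand-in for the predicate's inputs.
type tunnelState struct {
	AutoCreated     bool
	AttachedSources int
	Owners          int
}

// needsOwnerTransfer is the single policy gate: it is inert for direct-create
// (user-authored) CRs, so the operator never stamps a controller ownerRef on
// a tunnel the user owns and Kubernetes GC can never cascade-delete it.
func needsOwnerTransfer(s tunnelState) bool {
	if !s.AutoCreated {
		return false
	}
	return s.AttachedSources > 0 && s.Owners == 0
}
```

Keeping the gate in the predicate rather than in the transfer mechanism leaves one place to audit for the "user-authored CRs are never auto-GC'd" invariant.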
Batched Minor findings from the final comprehensive review, zero behavior change:
- drop bare plan-task references from a tunnel_controller.go comment and from cascade-GC envtest godoc (Pattern-13: no plan artifacts in shipped code/comments)
- correct stale post-§7-gate harness comments (direct-create CRs never acquire a controller-ownerRef in the unmutated run; the per-tick strip is mutation-only scaffolding)
- gcEmulateStripDeadOwner: ctx-first parameter order (Go convention)
- DirectCreateNeverGCd: reuse bumpRetriggerTick instead of an inline copy
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These six files accumulated gofmt drift across the prior squash-merges onto refactor/total (trailing blank lines, struct/comment column re-alignment, one import-group reorder). They predate and are unrelated to any in-flight feature branch, but `make lint` (gofmt -l . | (! read)) fails repo-wide on their account — blocking CI for every branch cut from refactor/total. Pure `gofmt -w`; zero behavior change (build, vet, full unit suite green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chore: gofmt the 6 pre-existing unformatted files on refactor/total
P4: tunnel cascade GC + owner transfer (design §12.6)
RegistryPayload (v1 schema), AffixName helper (apex + subdomain + multi-dot collapse), ErrUnrecognizedCodec sentinel. Pure (no client deps); Codec interface + impls land in tasks 2-4. (Design §3.2.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xName
Adds TestRegistryPayload_JSONTags (verifies compact tags + h omitempty; the wire-format contract was previously untested), collapses the redundant len==2 branch in AffixName to the uniform form, and locks ErrUnrecognizedCodec's message. Addresses P5-T1 review findings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bare-JSON encoder/decoder + Codec interface; v1alpha1 default codec. Rejects unknown schema versions and malformed JSON via ErrUnrecognizedCodec wrapping. (Design §4.1.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
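A minimal sketch of a bare-JSON codec with sentinel wrapping. The JSON keys (v, k, ns, n, h) follow the payload shape described elsewhere in this log; the Go field names, the sentinel's message text, and the exact interface signature are assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrUnrecognizedCodec is the sentinel callers branch on. The message text
// here is a placeholder, not the locked message from the real package.
var ErrUnrecognizedCodec = errors.New("unrecognized codec")

// RegistryPayload uses compact JSON tags; h is optional (omitempty).
type RegistryPayload struct {
	Version   int    `json:"v"`
	Kind      string `json:"k"`
	Namespace string `json:"ns"`
	Name      string `json:"n"`
	Hash      string `json:"h,omitempty"`
}

// Codec is an assumed shape for the interface named in the commit.
type Codec interface {
	Encode(RegistryPayload) (string, error)
	Decode(string) (RegistryPayload, error)
}

type plaintextCodec struct{}

// Compile-time guard that plaintextCodec satisfies Codec.
var _ Codec = plaintextCodec{}

func (plaintextCodec) Encode(p RegistryPayload) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// Decode rejects malformed JSON and unknown schema versions, wrapping the
// sentinel so errors.Is(err, ErrUnrecognizedCodec) holds for both.
func (plaintextCodec) Decode(s string) (RegistryPayload, error) {
	var p RegistryPayload
	if err := json.Unmarshal([]byte(s), &p); err != nil {
		return RegistryPayload{}, fmt.Errorf("%w: %v", ErrUnrecognizedCodec, err)
	}
	if p.Version != 1 {
		return RegistryPayload{}, fmt.Errorf("%w: schema version %d", ErrUnrecognizedCodec, p.Version)
	}
	return p, nil
}
```

Wrapping with `%w` lets the reconciler collapse every decode failure into one condition while the log line still carries the specific cause.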
…ecode
Both plaintext Decode-failure tests now assert errors.Is(err, ErrUnrecognizedCodec), the sentinel Task 9/12 branch on for AdoptRefusedNoTXT. Adds a compile-time Codec satisfaction guard for plaintextCodec. Addresses P5-T2 review findings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Encrypted codec, v1:<nonce>:<ciphertext> wire format, fresh GCM nonce per Encode. All decode failures (wrong key, tampering, malformed envelope, bad version) wrap ErrUnrecognizedCodec so callers collapse to a single AdoptRefusedNoTXT condition. (Design §4.2.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
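The envelope format can be sketched with the standard library's AES-GCM. The function names are hypothetical, the use of standard (not URL-safe) base64 is an assumption, and the sentinel's message is a placeholder.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

var ErrUnrecognizedCodec = errors.New("unrecognized codec")

func unrecognized(reason string) error {
	return fmt.Errorf("%w: %s", ErrUnrecognizedCodec, reason)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptEnvelope returns "v1:<base64 nonce>:<base64 ciphertext>" and draws
// a fresh random nonce on every call.
func encryptEnvelope(key, plaintext []byte) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	ct := gcm.Seal(nil, nonce, plaintext, nil)
	enc := base64.StdEncoding
	return "v1:" + enc.EncodeToString(nonce) + ":" + enc.EncodeToString(ct), nil
}

// decryptEnvelope maps every failure (bad version, malformed envelope,
// wrong key, tampering) onto the single sentinel.
func decryptEnvelope(key []byte, envelope string) ([]byte, error) {
	parts := strings.Split(envelope, ":")
	if len(parts) != 3 || parts[0] != "v1" {
		return nil, unrecognized("malformed envelope")
	}
	nonce, err := base64.StdEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, unrecognized("bad nonce encoding")
	}
	ct, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return nil, unrecognized("bad ciphertext encoding")
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, unrecognized("bad key")
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, unrecognized("bad nonce length") // Open would panic otherwise
	}
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return nil, unrecognized("authentication failed")
	}
	return pt, nil
}
```

Because GCM is authenticated, a wrong key and a tampered ciphertext fail the same way, which matches the commit's choice to surface one condition for all of them.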
Pattern-13 cleanup: the Codec interface doc and aesCodec comment named autoDetectingCodec before it exists. Trimmed to describe only the currently-implemented codecs; the read-side dispatcher documents itself when it lands. Addresses P5-T3 review findings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sniffs the v1: prefix to dispatch plaintext vs AES decode. Encode panics (read-only wrapper). v1: input with no key configured surfaces ErrUnrecognizedCodec so the reconciler refuses adoption. Restores the Codec interface doc to list all three impls now that the type exists. (Design §4.3.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
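A sketch of the read-side dispatch. The type and field names are hypothetical, and the decoders are modelled as plain functions over strings to keep the example self-contained; the real wrapper implements the Codec interface.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var ErrUnrecognizedCodec = errors.New("unrecognized codec")

// decodeFunc is a simplified stand-in for a codec's Decode method.
type decodeFunc func(raw string) (string, error)

// autoDetectingDecoder is a hypothetical read-only dispatcher.
type autoDetectingDecoder struct {
	plaintext decodeFunc
	aes       decodeFunc // nil when no key is configured
}

// Decode sniffs the "v1:" prefix: encrypted envelopes go to the AES decoder,
// everything else to the plaintext decoder.
func (d autoDetectingDecoder) Decode(raw string) (string, error) {
	if strings.HasPrefix(raw, "v1:") {
		if d.aes == nil {
			return "", fmt.Errorf("%w: encrypted payload but no key configured", ErrUnrecognizedCodec)
		}
		return d.aes(raw)
	}
	return d.plaintext(raw)
}

// Encode panics: the wrapper only exists for reads, and writers are expected
// to pick a concrete codec explicitly.
func (autoDetectingDecoder) Encode(string) (string, error) {
	panic("autoDetectingDecoder is read-only")
}
```

The prefix check is unambiguous here because a bare-JSON payload starts with `{`, never with `v1:`.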
Spec.Mode: Managed (default) | Observe. omitempty is safe (string with non-empty kubebuilder default). Designed for reuse on other CRDs that grow read-only semantics; shared reconcile.ShouldMutate helper lands in task 7. CRD YAML regenerated to internal/bootstrap/crds (no make manifests target — make generate does deepcopy+crd). (Design §3.3.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The plan placed +kubebuilder:validation:Enum on both the RecordMode type and the Mode field, producing a redundant allOf with two identical enum blocks in the generated CRD. The named type carries the enum; the field inherits it. Keeping the marker only on the type yields a single clean enum. Plan-vs-idiomatic adaptation. Addresses P5-T5 review finding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds TxtRecordID, TxtAffix, ObservedTXT (typed ObservedTXTPayload mirroring the decoded codec form + RawContent fallback). All additive omitempty; v1alpha1-safe. Deepcopy + CRD YAML regenerated to internal/bootstrap/crds. (Design §3.3.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-line predicate: false iff mode == "Observe". String-typed so future CRDs with different enum names reuse it without per-CRD wiring. (Design §3.4.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
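As described, the predicate is small enough to show in full. The `ShouldMutate` name comes from the commits above; the exact signature is an assumption.

```go
package main

// ShouldMutate reports whether a reconciler may write to the external system.
// It is false only for the exact mode "Observe"; the empty string and
// "Managed" both allow mutation. Taking a plain string lets other CRDs with
// their own named enum types reuse it via a string conversion.
func ShouldMutate(mode string) bool {
	return mode != "Observe"
}
```

A call site with a named enum type would pass `string(spec.Mode)`, where `spec.Mode` is whatever typed field the CRD declares.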
AdoptRefusedNoTXT, AdoptRefusedForeign, TxtRegistryKeyUnavailable, Observing. Kept 1:1 across the const block, ZoneReasons(), and the TestZoneReasons_Registered enumerating test in the same commit (per the project's recurring conventions-registry-sync finding). Consumed by the DNSRecord reconciler in P5 tasks 10-14. (Design §3.3.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained sample: applying examples/annotated_httproute.yaml now materializes both the tunnel-enabled Gateway and the attached HTTPRoute in one go. The standalone annotated_gateway.yaml remains as a Gateway-only reference; annotated_httproute_override.yaml depends on this Gateway (cross-referenced in its header). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
config/crd/bases/ is purely a kubebuilder/controller-gen output: make generate recreates it from api/v2alpha1/ Go types and copies the result into chart/templates/crd-*.yaml (the published source). Keeping the intermediate config/crd/bases/ in VCS leaves the door open for drift between Go types and committed CRDs, so untrack it and add config/ to .gitignore. Verified make generate still works post-untrack and the chart-template CRDs still get refreshed.
While here, fix the README's 'Local development' section. The prior 'make install' / 'make deploy' commands were kubebuilder boilerplate; those targets never existed in this Makefile. Replaced with the actual flow:
    make generate # regenerate CRD bundles
    kubectl apply -f config/crd/bases/ # apply locally
    go run ./cmd/manager # run the operator
Plus a chart-from-local-checkout install option for users who want to test chart changes without publishing to OCI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit untracked config/crd/bases/ but make generate still recreated the directory locally on every run. Now make generate writes the intermediate CRD bundles straight to bin/crd-staging/ (already gitignored via bin/), so the config/ directory is never produced.
Changes:
- Makefile: output:crd:artifacts:config=bin/crd-staging (was: config/crd/bases)
- Makefile: chart-template copy loop reads from bin/crd-staging/ (was: config/crd/bases/)
- .gitignore: drop the now-unneeded config/ entry
- README: local-dev kubectl apply path updated to bin/crd-staging/
Verified: make generate → bin/crd-staging/ populated, chart/templates/crd-*.yaml refreshed; no config/ anywhere in the tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the (API token, account ID) credential pair end-to-end:
- The credential precedence: per-CR spec.cloudflare → operator-level
default → ErrAccountIDUnset.
- The token Secret shape (SecretReference: name + namespace + key)
with sensible defaults (namespace = CR's namespace; key = 'token').
- The label-scope requirement: every Secret the operator should be
able to read must carry app.kubernetes.io/part-of=cloudflare-operator.
This is the #1 first-time-setup failure mode; without the label,
Get returns NotFound and resolve fails with ErrSecretNotFound.
- Inline accountID vs accountIDSecretRef (the XValidation rule) +
the 'key defaults to token, not accountID' footgun.
- Credential rotation: the cfgo.Client cache's 30-minute absolute TTL
and how to force immediate adoption via cloudflare.io/reconcile-at.
- Common errors with concrete fixes: ErrSecretNotFound,
ErrSecretKeyMissing, ErrAccountIDUnset, 401/403 from Cloudflare.
- Pointer to multiple-accounts.md (future) for multi-tenant patterns.
Linked from README's Documentation table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
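The precedence chain and the Secret-reference defaults summarized above could be sketched as follows. The function names are hypothetical and the sentinel's message is a placeholder; only the ordering (per-CR, then operator default, then ErrAccountIDUnset) and the defaults (namespace = CR's namespace, key = 'token') come from the commit.

```go
package main

import "errors"

var ErrAccountIDUnset = errors.New("account ID unset")

// resolveAccountID applies the precedence: per-CR value, then the
// operator-level default, then ErrAccountIDUnset.
func resolveAccountID(perCR, operatorDefault string) (string, error) {
	switch {
	case perCR != "":
		return perCR, nil
	case operatorDefault != "":
		return operatorDefault, nil
	default:
		return "", ErrAccountIDUnset
	}
}

// SecretReference mirrors the name + namespace + key shape.
type SecretReference struct {
	Name      string
	Namespace string
	Key       string
}

// withDefaults fills the documented defaults. Note that the key defaults to
// "token" even for an account-ID Secret, which is the footgun called out above.
func (r SecretReference) withDefaults(crNamespace string) SecretReference {
	if r.Namespace == "" {
		r.Namespace = crNamespace
	}
	if r.Key == "" {
		r.Key = "token"
	}
	return r
}
```

An accountIDSecretRef that omits `key` therefore reads the `token` entry, so the key needs to be spelled out when the Secret stores the ID under another name.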
The canonical reference for every cloudflare.io/* annotation the
operator reads or writes. Pairs with examples/annotated_*.yaml — the
samples demonstrate; this doc explains.
Groups annotations by purpose:
- Tunnel attachment (cloudflare.io/tunnel, /tunnel-name,
/gateway-service, /gateway-apex, /hostnames)
- Inherited by emitted DNSRecord (Gateway → Route → operator default
precedence chain; covers zone-ref, zone-ref-namespace, proxied,
ttl, no-tls-verify, origin-server-name, scheme, adopt)
- DNS-only (Service path: dns-record, dns-target, port)
- Force-reconcile (cloudflare.io/reconcile-at, restart-immune)
- Operator-managed (auto-created; don't set manually)
Also documents the ParseTruthy vocabulary (true/yes/enable/enabled +
their negatives, case-insensitive, whitespace-trimmed) and the
'precedence: Route > Gateway > operator default' rule.
Cross-linked from README's Documentation table. References reconciliation.md
+ credentials.md + (future) gateway-api.md + adopting-existing-records.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
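A sketch of the truthy vocabulary and the precedence rule. The signatures are hypothetical; the negative spellings (false/no/disable/disabled) are inferred from "their negatives", and falling through to the next level on an unparseable value is an assumption.

```go
package main

import "strings"

// parseTruthy normalizes case and surrounding whitespace, then matches the
// documented vocabulary. ok is false for anything outside it.
func parseTruthy(raw string) (value, ok bool) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "true", "yes", "enable", "enabled":
		return true, true
	case "false", "no", "disable", "disabled": // assumed negative spellings
		return false, true
	default:
		return false, false
	}
}

// resolveBool applies Route > Gateway > operator default for one annotation key.
// Assumption: a present but unparseable value is skipped, not treated as false.
func resolveBool(route, gateway map[string]string, key string, def bool) bool {
	for _, anns := range []map[string]string{route, gateway} {
		if raw, present := anns[key]; present {
			if v, ok := parseTruthy(raw); ok {
				return v
			}
		}
	}
	return def
}
```

Returning a separate `ok` keeps "explicitly false" distinguishable from "not set", which the precedence chain needs in order to let a Route turn off something its Gateway turned on.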
The headline integration story. Covers:
- What the operator materializes when a Gateway is opted into tunneling:
auto-created CloudflareTunnel CR, cloudflared Deployment, metrics
Service, connector-token Secret, N emitted CloudflareDNSRecord CRs.
- The 'opt-in lives ONLY on Gateway, not Route' invariant (Slice 1
correctness fix); Routes attach via parentRefs and inherit the
tunnel decision.
- Required annotations on the Gateway: cloudflare.io/tunnel,
cloudflare.io/gateway-service (no label fallback), cloudflare.io/zone-ref.
- Optional inheritance annotations and the Route > Gateway > default
precedence chain.
- HTTPRoute vs TLSRoute attachment shapes; the scheme-from-listener
inheritance.
- Per-Route override patterns (grey-cloud one hostname, self-signed
backend cert, apex bypass).
- The full generated-object inventory (Kubernetes side + Cloudflare side).
- Cascade-GC: auto-created vs direct-create lifecycle.
- Common gotchas: GatewayServiceUnspecified, wildcard-only Gateway
requiring gateway-apex, Route attached but no DNS record (3 causes),
records appear but tunnel doesn't serve (3 causes).
Cross-linked from README's Documentation table and from
annotations.md / reconciliation.md / credentials.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Explains the TXT-companion-verified adopt flow: why it exists, what
the three outcomes are, the recommended migration flow, and why the
operator never silently backfills.
Structure:
- Why adopt exists (the take-over-without-outage use case).
- Why the TXT companion exists (operator-wars + accidental-clobber
failure modes that motivate ownership marking).
- The companion's shape: cf-txt.<hostname> sibling DNS name carrying
a JSON ownership payload (namespace + name + UID).
- The three adopt outcomes:
Adopted — companion identifies THIS CR.
AdoptRefusedNoTXT — primary exists, companion doesn't.
AdoptRefusedForeign — companion identifies a different CR.
- The recommended flow: Observe (reconnaissance) → optional manual
companion injection → Adopt:true + Mode:Managed.
- The pragmatic 'just delete and recreate' migration path for
pre-feature records, with the brief-outage trade-off documented.
- Why no silent backfill (distinguishes pre-feature records from
other-operator records; refuses to clobber).
- Status / condition reference for the adopt-refusal reasons.
- When NOT to use adopt (greenfield, observe-first, post-delete
recreate via the S1 self-heal path).
Cross-linked to credentials.md (resolve failures), reconciliation.md
(force-reconcile after fix), annotations.md (cloudflare.io/adopt for
Gateway-API path), and the future txt-registry.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
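The three outcomes reduce to a small decision once the companion lookup has run. This sketch assumes the primary record already exists and models the lookup result as two booleans; the function name and signature are hypothetical, and how "identifies this CR" is computed is left to the caller.

```go
package main

type adoptOutcome string

const (
	outcomeAdopted        adoptOutcome = "Adopted"
	outcomeRefusedNoTXT   adoptOutcome = "AdoptRefusedNoTXT"
	outcomeRefusedForeign adoptOutcome = "AdoptRefusedForeign"
)

// classifyAdopt maps the companion lookup onto an outcome:
//   - no companion found            -> AdoptRefusedNoTXT (never backfilled silently)
//   - companion names a different CR -> AdoptRefusedForeign
//   - companion names this CR        -> Adopted
func classifyAdopt(companionFound, identifiesThisCR bool) adoptOutcome {
	switch {
	case !companionFound:
		return outcomeRefusedNoTXT
	case !identifiesThisCR:
		return outcomeRefusedForeign
	default:
		return outcomeAdopted
	}
}
```

Treating a missing companion as a refusal rather than an invitation to write one is what prevents clobbering a record that another operator manages.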
The operational pair to reconciliation.md. Where reconciliation.md
answers 'when does the next reconcile fire?', this answers 'the
reconcile fired and something is wrong, now what?'
Structured as a diagnostic sequence, Step 0 through Step 6:
Step 0 — confirm operator pods are alive
Step 1 — read Status (Phase + Conditions[Ready] + LastSyncedAt /
LastReconcileToken)
Step 2 — match the Ready=False reason to a fix:
- Credential reasons: CredentialsUnavailable, CredentialsInsufficient
- Adoption reasons: AdoptRefusedNoTXT, AdoptRefusedForeign,
AdoptedExistingRecord
- Tunnel reasons: GatewayServiceUnspecified, DuplicateHostname,
ControllerOffline
- Zone/DNSRecord reasons: DependencyMissing, DriftDetected,
OwnershipCompanionFailed, Ignored
- ZoneConfig/Ruleset: SettingsApplied, SettingsApplyFailed,
PlanTierInsufficient
Step 3 — read operator logs (meta + zone + tunnel sub-operators)
Step 4 — check Events
Step 5 — did the reconcile actually run? (annotation-ack trick)
Step 6 — read the Cloudflare side (dashboard / dig)
Plus a quick-reference cheat sheet of the canonical kubectl
invocations and a 'when to involve a maintainer' template.
Cross-linked from README's Documentation table. References every
other docs/* page; designed to be the entry point when a user has a
problem.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the standalone CloudflareTunnel CR end-to-end — the
CRD-first counterpart to gateway-api.md's annotation-driven view:
- Two creation paths: direct-create (user lifecycle) vs auto-created
(operator lifecycle via cloudflare.io/tunnel: true on a source).
- The 52-char naming budget and why it exists (cloudflared-<name>
must fit DNS-1123 63-char label limit; rename is unsupported per
the CEL immutability rule).
- What the operator materializes: tunnel + remote config on the
Cloudflare side; Deployment + metrics Service + connector-token
Secret on the Kubernetes side (all owner-referenced to the CR).
- The connector spec surface: replicas (1-25, no HPA), protocol
(auto/http2/quic), logLevel, gracePeriodSeconds, resources, and
scheduling pass-throughs; plus the cloudflared image-override
precedence chain (per-CR > chart-default > compile-time pin).
- Routing defaults: fallback (httpStatus vs url XOR) and the
originRequest passthrough (with a heads-up on the limited
TunnelOriginRequest CRD shape — only noTLSVerify +
originServerName today).
- The Status surface field-by-field, including the Slice 2 H
ObservedDataplane{Deployment,Service}Hash optimization fields
and the G ObservedIngress used for skip-when-snap-matches.
- Cascade-GC rules: auto-created tunnels self-delete when last
source detaches; direct-create tunnels never auto-GC.
- Common gotchas: 'Ready=True but no traffic flows',
ConnectorReady=False, naming conflicts, OOM tuning.
Cross-linked from README's Documentation table. References every
other docs/* page and the example file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the four forward-references that existed from
adopting-existing-records.md and credentials.md. Covers the TXT
ownership-marker mechanism end-to-end:
- Companion naming: cf-txt.<hostname> as a fresh leftmost DNS
label (post-S1 affix scheme). Wildcards rewrite to _wildcard
sentinel. Legacy cf-txt-<host> form is read-recognised but
never written; pruned opportunistically.
- RegistryPayload JSON shape (v=1, k=kind, ns=namespace,
n=name, h=optional content hash). Identifier is
(kind, namespace, name) — NOT UID — so deleting and recreating
a CR with the same NS/name lets the new CR adopt cleanly.
- AES-256-GCM wire format: 'v1:<base64-nonce>:<base64-ciphertext>'.
Random 12-byte GCM nonce per encode. Enabled via
CloudflareCredentialRef.TxtRegistryKeySecretRef pointing at a
32-byte AES key Secret.
- The encrypt-vs-plaintext trade-offs: AES hides owner identity
from public DNS readers; doesn't help against API-token-holders
(who see primaries directly) or Kubernetes-API-readers (who see
CRs directly).
- Rolling between modes: free because the read side autodetects;
no rotation primitive today, work around via plaintext→fresh AES
re-cycle.
- Inspecting companions: dig from outside, Status.txtRecordID
from inside, operator log grep.
- The engineering migration procedure for adopting pre-companion
records without an outage (vs. the simpler delete-and-recreate
flow in adopting-existing-records.md).
- Common gotchas: OwnershipCompanionFailed root causes, missing
companions, AES decrypt failures after key rotation.
Cross-linked from README's Documentation table; closes the
'(future)' placeholders in adopting-existing-records.md and
credentials.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chart/README.md and docs/crd-reference.md are now generated by
make generate, not hand-maintained. Two new tools added to the
make tools target:
- github.com/norwoodj/helm-docs renders chart/README.md from
chart/values.yaml + chart/README.md.gotmpl. Field-level
descriptions live in values.yaml using the helm-docs '# --'
convention.
- github.com/elastic/crd-ref-docs renders docs/crd-reference.md
from the api/v2alpha1 Go types. Templates forked under
hack/crd-ref-docs-templates/ so CRDs (types with a GVK)
promote to H2 with a horizontal-rule separator above each, and
non-Kinds collect at the bottom under a single '### Sub-types'
section. The fork is two files (gv_details.tpl + type.tpl);
gv_list.tpl + type_members.tpl are upstream-identical so a
later upstream pull can refresh just those two.
chart/values.yaml is restructured to use the helm-docs '# --'
field-level annotation convention. Every documented field now has
its description directly above; helm-docs picks them up into the
generated values table. Behavior unchanged — only the comment
shape changed.
hack/crd-ref-docs-config.yaml filters out the *List wrapper types
(CloudflareDNSRecordList, etc.) which are never user-instantiated.
The regenerated output (chart/README.md, docs/crd-reference.md)
lands in a follow-up commit so bisecting works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Result of running 'make generate' against the auto-gen tooling
landed in the prior commit. chart/README.md goes from 683 lines
of hand-maintained chronological-slice content to a 47-line pure
values reference; docs/crd-reference.md is a new 886-line field-by-field
reference for all 5 CRDs and their 27 sub-types.
README's Documentation table updated:
- chart/README.md: replaced the now-stale 'chronological behavior-change
notes' description with 'auto-generated from chart/values.yaml'.
- docs/crd-reference.md: new row (auto-generated from api/v2alpha1).
Hand-edits to these files will be overwritten by the next
make generate. The chronological behavior-change content that
previously lived in chart/README.md is preserved in the
docs/<discrete-domain>.md pages shipped earlier (gateway-api.md,
tunnels.md, txt-registry.md, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the 7 local-only docs subdirs (design/, plans/, prompt/, follow/, issues/, review/, summary/) under a single docs/appendix/ parent and collapses the .gitignore from 7 individual rules to one (docs/appendix/). Committable in-depth docs continue to live as flat files at docs/<discrete-domain>.md per the established convention. Also refreshes the .dockerignore explanatory comment to reflect the post-config-removal state: drops the now-stale 'config/' reference and mentions examples/ + .private-journal/ which are also auto-excluded by the **-then-allowlist pattern. The allowlist semantics are unchanged (Docker build still includes only cmd/ + api/ + internal/ + go.mod/go.sum). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore CHANGELOG.md from main so release automation continues to own it, and align chart/Chart.yaml with main's metadata (version 0.18.3, kubeVersion, keywords, sources, maintainers) while keeping the refactor's common 5.0.1 bump. Description rewritten to reflect the meta-operator + bundled controllers shape and Gateway API integration. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Track future Cloudflare surfaces — likely CRD shapes and open design questions for each — without committing to scope. Includes a top-of-page TOC and an explicit "what this page is not" disclaimer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cross-checked R2, Email Service, and Workers CRD sketches against the cloudflare-go v6 SDK + public API reference and tightened the likely-CRD lists to match real endpoint shapes (R2 lifecycle/CORS sub-resources, Workers version→deployment model, Email Routing's zone-scoped rules vs account-scoped addresses). Added Containers as a fourth track with an explicit "SDK surface unverified" caveat since cloudflare-go didn't surface it in the docs query — design work must re-confirm the paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cloudflare's docs now explicitly recommend Workers Static Assets over Pages for new static sites, SPAs, and full-stack apps. Rework the Workers section accordingly: call out the three modes (pure static / SPA / full-stack), extend Worker.spec with an assets sub-spec, and replace the "Pages as a separate track" open question with the now-relevant artifact delivery problem (scripts are one file, asset bundles are directory trees — OCI artifact is the obvious fit). Adds a residual Pages-legacy question for managing existing Pages projects. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- test/envtest: CRDDirectoryPaths references the old config/crd/bases tree that 7cfad92 removed. Point it at bin/crd-staging instead, matching the Makefile's controller-gen output path. Unblocks every push since 7cfad92.
- txt_canonical: silence gosec G115 false positive on byte(val). The preceding `if val > 255 { return raw }` bounds-checks val into the byte range; gosec can't see the guard.
- tunnel_controller_test + tlsroute_source_controller_test: apply gofmt (colon-alignment on an ObjectMeta literal and an indented `drainLoop:` label that should sit at column 1).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
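To illustrate the txt_canonical item above, here is a minimal sketch of a range guard followed by a narrowing conversion. `decodeOctet` is a hypothetical helper, not the project's code, and the `#nosec` annotation is only one common way to silence G115; the commit does not say which mechanism it used.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeOctet is a hypothetical helper (not the project's txt_canonical code).
// It parses a decimal string and returns the single byte it encodes, or the
// raw input unchanged when the value is not a number or does not fit a byte.
func decodeOctet(raw string) string {
	val, err := strconv.ParseUint(raw, 10, 32)
	if err != nil {
		return raw
	}
	if val > 255 {
		return raw // out of byte range: leave the input untouched
	}
	// val is in [0, 255] here, so the conversion cannot overflow. gosec's
	// G115 check does not track the guard above, hence the annotation.
	return string([]byte{byte(val)}) // #nosec G115 -- bounds-checked above
}

func main() {
	fmt.Printf("%q %q\n", decodeOctet("65"), decodeOctet("300"))
}
```

Keeping the guard immediately before the conversion makes the justification in the annotation easy to audit.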
Split out a manifests target (controller-gen for CRDs only) and make both generate and test depend on it. The envtest harness reads CRDs from bin/crd-staging — without this dep, `make test` in CI ran before any CRD generation and panicked with "no such file or directory". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…subsections
Context7 confirmed cloudflare-go is still on v6 (no v7 indexed) and that Cloudflare Containers ships as a Worker binding (Durable-Object-style class, config declared in the Worker code) rather than as a standalone REST product. There's no client.Containers.* surface in the typed Go SDK. Restructure the roadmap accordingly:
- Remove the standalone Containers section; its substance moves into Worker.spec.containers[] per binding (image, defaultPort, sleepAfter, envVars, entrypoint, egress policy).
- Add a fourth Worker mode (container-backed) alongside pure-static / SPA / full-stack, with a matching paragraph on the [[containers]] upload payload.
- Add two new subsections under Workers, "static assets (the Pages replacement)" and "containers", so the deeper detail on each mode is navigable and not buried in one giant CRD bullet.
- Extend the open questions with container image source, egress ergonomics, and the management-API gap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… D semantics
Bug D (commit 8082d5f, 2026-05-19) intentionally changed the source controllers' chain-CNAME semantics via chainContentFor (apex.go): when the parent Gateway has a concrete listener and no cloudflare.io/gateway-apex annotation, the chain content is now tn.Status.TunnelCNAME, not the listener hostname. The HTTPRoute + TLSRoute envtest assertions were written before Bug D and kept asserting the literal listener hostname ("ext.example.com"), so they timed out at 15s on Eventually after Bug D landed. The mismatch was masked by an unrelated CRD-path break (fixed in 9460b9d) until envtest could run again.
This change is test-only. The controllers are correct.
- A1: update the two failing tests to capture tn.Status.TunnelCNAME from the parent tunnel CR (after the gateway-creation helper waits on it) and assert the emitted chain DNSRecord's Content matches.
- C: add two new tests (HTTPRoute + TLSRoute) that exercise the apex-override branch of chainContentFor: the Gateway carries cloudflare.io/gateway-apex: "apex.example.com" and the chain content is asserted to equal that override. This branch previously had no envtest coverage.
- Refactor: createGatewayForRouteTest + createTLSGatewayForRouteTest now take a gatewayApex string ("" = no annotation); existing call sites pass "".
Verified locally: make test passes; the four target cases pass in 0.7-3.0s each (down from 15.5s timeouts).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
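The two branches exercised by these tests can be sketched as below. This is an illustration under stated assumptions, not the code in apex.go: `chainContent` and its parameters are hypothetical stand-ins for chainContentFor, whose real signature is not shown in this PR, and the hostnames are placeholders.

```go
package main

import "fmt"

// gatewayApexAnnotation is the annotation key named in the commit message.
const gatewayApexAnnotation = "cloudflare.io/gateway-apex"

// chainContent is a hypothetical stand-in for chainContentFor. It returns the
// apex override when the Gateway carries the annotation, and otherwise the
// tunnel CNAME (tn.Status.TunnelCNAME in the commit message).
func chainContent(gatewayAnnotations map[string]string, tunnelCNAME string) string {
	// Reading a missing key (or a nil map) yields "", which falls through.
	if apex := gatewayAnnotations[gatewayApexAnnotation]; apex != "" {
		return apex
	}
	return tunnelCNAME
}

func main() {
	cname := "tunnel.example.net" // placeholder tunnel CNAME
	fmt.Println(chainContent(nil, cname))
	fmt.Println(chainContent(map[string]string{
		gatewayApexAnnotation: "apex.example.com",
	}, cname))
}
```

Neither branch returns the listener hostname, which is why assertions on "ext.example.com" could no longer pass.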
Add a paragraph under the Disclaimer making the authorship pattern explicit, so adopters can weigh that when evaluating fit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a top-level LICENSE file (MIT, Copyright (c) 2026 jacaudi), update the Kubebuilder boilerplate template + every Go file's header, update the README's license section, and add artifacthub.io/license: MIT to the chart annotations. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ExternalSecret + SecretStore wiring that materializes the `cloudflare-credentials` Secret the operator consumes. Provider-agnostic — callers swap the SecretStore.provider stanza for their backend (1Password Connect, Vault, AWS SM, GCP SM, …) without touching the ExternalSecret. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Show that both credential-resolution patterns are valid: per-CR override
via spec.cloudflare (kept on cloudflarednsrecord, cloudflaretunnel, and
the ESO example) and operator-level default via the chart's
credentials.{existingSecret,tokenKey,accountIDKey} (now demonstrated by
cloudflarezone, cloudflareruleset, and cloudflarezoneconfig).
Also adds the ESO + CR example (external_secret_cloudflare_credentials.yaml)
held over from the previous round.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
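A rough sketch of how the two credential-resolution patterns could compose. The `SecretRef` type and `resolveCredentials` function are hypothetical, and the precedence (per-CR reference wins over the operator-level default) is an assumption drawn from the word "override"; the operator's actual resolver is not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretRef is a hypothetical shape loosely mirroring the chart's
// credentials.{existingSecret,tokenKey,accountIDKey} values.
type SecretRef struct {
	SecretName   string
	TokenKey     string
	AccountIDKey string
}

// resolveCredentials prefers the per-CR reference (spec.cloudflare) and falls
// back to the operator-level default. The precedence is an assumption.
func resolveCredentials(perCR, operatorDefault *SecretRef) (SecretRef, error) {
	if perCR != nil && perCR.SecretName != "" {
		return *perCR, nil
	}
	if operatorDefault != nil && operatorDefault.SecretName != "" {
		return *operatorDefault, nil
	}
	return SecretRef{}, errors.New("no Cloudflare credentials configured")
}

func main() {
	def := &SecretRef{SecretName: "cloudflare-credentials", TokenKey: "token", AccountIDKey: "accountID"}
	override := &SecretRef{SecretName: "team-a-credentials", TokenKey: "token", AccountIDKey: "accountID"}

	got, _ := resolveCredentials(nil, def)
	fmt.Println(got.SecretName) // operator-level default
	got, _ = resolveCredentials(override, def)
	fmt.Println(got.SecretName) // per-CR override
}
```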
🎉 This PR is included in version 0.19.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
refactor/total → main
Initial integration of the operator's v2alpha1 API and complete controller stack. This is the long-pending refactor branch's first merge to main; main previously held only project scaffolding.
refactor/total is 291 commits ahead of main, touching 364 files (+44,579 / -38,938).
What's in this PR
CRD surface — v2alpha1
Five CRDs in the cloudflare.io/v2alpha1 API group:
- CloudflareZone
- CloudflareZoneConfig
- CloudflareRuleset
- CloudflareDNSRecord
- CloudflareTunnel
Plus annotation-driven attachment from Gateway API and Services:
- cloudflare.io/tunnel: "true" on a Gateway / Service opts it in.
Architecture
- tunnel sub-operators based on chart values.
- watch their respective object types and emit CloudflareDNSRecord CRs pointing at the tunnel CNAME.
- with a paired cf-txt.<hostname> TXT entry (plaintext JSON, or optional AES-256-GCM via TxtRegistryKeySecretRef). Adoption is refused if the companion doesn't identify the adopting CR.
- cloudflare.io/reconcile-at annotation: restart-immune ack via Status.LastReconcileToken.
- cloudflare.io/zone-ref-namespace.
Reconciliation model
- reconcile.UpdateStatusIfChanged[T] used by all 5 reconcilers, with the universal in-memory-rollback invariant on the no-write branch.
- reconcile.HaltWith for Ready=False short-circuits (sibling to HaltDependency / HaltCredentialsUnavailable).
- *cfgo.Client cache (32-entry, 30-min absolute TTL) keyed by sha256(token || accountID) to reuse HTTP/2 connection pools.
- on CloudflareTunnel (ObservedDataplane{Deployment,Service}Hash).
- app.kubernetes.io/part-of=cloudflare-operator) to avoid a cluster-wide LIST/WATCH on every Secret.
Documentation
- README.md with project blurb, Quickstart, Disclaimer, Features, Documentation table, Acknowledgements.
- docs/ carries 9 in-depth pages (8 hand-authored + 1 auto-generated): architecture-adjacent topics (credentials, gateway-api, tunnels, adopting-existing-records, annotations, reconciliation, troubleshooting, txt-registry) plus the auto-generated crd-reference.md.
- chart/README.md is now a pure values reference, auto-generated.
- examples/ carries 9 ready-to-apply manifests (5 CRD samples + 4 annotation-driven attachment samples).
Test coverage
- _test.go files; go test ./... -race -count=1 is 12/12 packages PASS.
- test/envtest/ exercising the full reconcile loop against a real kube-apiserver. The chart's CI workflow runs them with setup-envtest + Kubernetes 1.30 assets.
- staticcheck QF1008, documented as accepted).
Build + release artifacts
- ghcr.io/jacaudi/cloudflare-operator
- oci://ghcr.io/jacaudi/charts/cloudflare-operator
- test-build.yml workflow on refactor/total. Latest test build: v0.0.0-alpha.refactor-total.e05390f (green).
Verification
Reviewer checklist before merging:
- ci-cd.yml pipeline against refactor/total (smoke-deploy gate; explicit user gate per project memory).
- latest refactor/total chart tag) has been reviewed.
- docs/troubleshooting.md: Phase=Ready, Status.LastReconcileToken ack roundtrip works, ConnectorReady is True on attached tunnels.
Notes
- main. Subsequent PRs land incrementally; the refactor/total branch will continue as the integration branch until v1 cutover.
- Chart.yaml version line should bump to a release-track number (currently 0.1.0).
- LICENSE file at repo root is still a pending follow-up: source headers carry Apache 2.0, but no top-level LICENSE file exists yet.
- values reference is captured by the new make generate tooling (helm-docs + crd-ref-docs). Hand-edits to chart/README.md or docs/crd-reference.md are overwritten on the next run.
Out of scope
- main, a separate release-prep PR will bump version + chart appVersion to 0.1.0 proper and publish a non-alpha tag.
- aesCodec is code-present but the operator hardcodes nil keyref today). Tracked in the local backlog as a separate Design→Plan→Execute track.
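For context on the deferred encryption work, here is a generic AES-256-GCM round trip using only the Go standard library. It is not the project's aesCodec: `sealTXT` and `openTXT` are hypothetical names, and the wire format (base64 of nonce followed by ciphertext) is an assumption chosen because TXT record content must be text.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD; AES-256 requires a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealTXT encrypts plaintext and returns base64(nonce || ciphertext+tag).
func sealTXT(key, plaintext []byte) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Seal appends the ciphertext to its first argument, so the nonce
	// ends up as a prefix of the returned slice.
	sealed := gcm.Seal(nonce, nonce, plaintext, nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// openTXT reverses sealTXT and fails if the payload was tampered with
// or the key is wrong.
func openTXT(key []byte, encoded string) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	if len(raw) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	enc, err := sealTXT(key, []byte(`{"owner":"default/example"}`)) // placeholder payload
	if err != nil {
		panic(err)
	}
	dec, err := openTXT(key, enc)
	fmt.Println(string(dec), err)
}
```

Because GCM is authenticated, a companion record that was modified or encrypted under a different key fails to open instead of decoding to garbage.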