
feat(applicationlayer): WAF v3 render + kube-controllers config plumbing #4821

Draft
electricjesus wants to merge 17 commits into tigera:master from electricjesus:seth/applicationlayer-render-v3

Conversation

@electricjesus
Member

@electricjesus electricjesus commented May 19, 2026

⏸️ Hold review — pending merge of #4690 ("Gatewayapi Namespaced Mode" — Walter Neto). This PR is stacked on radixo:gatewayapi-deployment-enterprise, so the diff currently includes Walter's commits (~15 files / +2219 / -53662). Once #4690 merges to master, this diff collapses to just the WAF v3 work (6 files / +555 / -7).

A parallel master-based variant — same content, rebased onto current master — lives at #4779, available as an alternative merge path if release timing prefers it.


Summary

Operator-side render for WAF v3 (Coraza WASM) admission-webhook integration with calico-kube-controllers. Pairs with tigera/calico-private#11834. Design: tigera/designs#25.

  • pkg/render/applicationlayer/gateway_waf.go: WAFAdmissionWebhookComponents — Deployment + Service + SA + ClusterRole + ClusterRoleBinding + ValidatingWebhookConfiguration. Sibling to existing WAF v1 (sidecar / ModSecurity) render in pkg/render/applicationlayer/applicationlayer.go, which is untouched.
  • pkg/render/kubecontrollers/kube-controllers.go: render WASM_IMAGE / WASM_PULL_SECRET / WASM_CA_CERT env on calico-kube-controllers; extend RBAC (secret replication, EEP create); add ENABLED_CONTROLLERS=applicationlayer plumbing.
  • pkg/common/common.go: GatewayAddonsFeature constant (gates Tigera-built add-ons, not the bare ingress data path).
  • pkg/components/enterprise.go: WAF v3 component metadata.

Stack relationship

Stacked on #4690 to honor the prior agreement that #4690 merges first. Technically standalone (no symbol dependency on namespaced-GW changes), so a parallel variant on master is maintained at #4779 in case release timing favors that path.

Test plan

  • go test ./pkg/render/applicationlayer/... ./pkg/render/kubecontrollers/... -v
  • go build ./...
  • E2E verified on seth-ez-a3b5 against tigera/calico-private#11834 HEAD: SQLi ?id=1' OR 1=1-- → HTTP 403 via Coraza WASM filter; clean GET → HTTP 200

Release Note

```release-note
Add operator render for the WAF v3 (Coraza WASM) admission webhook + plumb
`WASM_IMAGE` / `WASM_PULL_SECRET` / `WASM_CA_CERT` env vars onto
`calico-kube-controllers` for the new `WAFReconciler` (paired with
`tigera/calico-private#11834`). Existing WAF v1 (sidecar / ModSecurity)
render path is untouched.
```

Linked

radixo and others added 8 commits May 7, 2026 17:31
- Swap the checked-in gateway_api_resources.yaml for the embedded gateway-helm.tgz rendered via the helm SDK at startup; K8SGatewayAPICRDs/GatewayAPICRDs now take a runtime.Scheme and return an error (istio_controller updated for the new signature)
- Deploy two envoy-gateway controllers: legacy in tigera-gateway (user-declared classes via Spec.GatewayClasses) and a new one in calico-system with deploy.type=GatewayNamespace; auto-provision the tigera-gateway-class-ns GatewayClass bound to the new controller
- Group the tigera-gateway install behind legacyObjects/legacyTeardownObjects so the eventual deprecation is a single delete
- HasLegacyGateways classifier in the controller: build a className -> controllerName map seeded from Spec.GatewayClasses + existing GatewayClass resources, classify every live Gateway; when no Gateway targets the tigera-gateway controller, the install is torn down; during the teardown-then-redeploy race the legacy render is deferred to avoid a "Namespace is terminating, skipping creation" log flood
- Legacy teardown queues only the Namespace + cluster-scoped objects + the Deployment (for status.RemoveDeployments); in-namespace RBAC/Secrets ride the cascade to avoid the tigera-operator-secrets RoleBinding race
- Move the shared waf-http-filter ClusterRoles out of the legacy bundle so the calico-system-side proxies keep their cluster-scoped perms after tigera-gateway is retired
- Per-namespace Enterprise resources (SA, RoleBindings, pull secret, shared CRB subject) for namespaces hosting a namespaced-class Gateway; reserved namespaces skip shared resource create/delete; Secret goes before RoleBinding on cleanup to avoid 403
- Gate v3 NetworkPolicies on the calico-system Tier; render calico-system.envoy-gateway allow for the controller and certgen
- Update unit tests and Makefile/docs accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
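
The HasLegacyGateways classification described above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the `GatewayClass` and `Gateway` structs are trimmed stand-ins, `legacyController` is a made-up controller name, and letting live GatewayClass resources override declared classes is an assumption.

```go
package main

import "fmt"

// legacyController is a hypothetical controller name standing in for the
// envoy-gateway controller that runs in tigera-gateway.
const legacyController = "example.io/legacy-gateway-controller"

// GatewayClass and Gateway are trimmed stand-ins for the Gateway API types.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Namespace, Name, ClassName string
}

// hasLegacyGateways builds a className -> controllerName map seeded from the
// declared classes (assumed to bind to the legacy controller) and from the
// GatewayClass resources that exist, then checks every live Gateway.
func hasLegacyGateways(declared []string, classes []GatewayClass, gateways []Gateway) bool {
	controllerFor := make(map[string]string, len(declared)+len(classes))
	for _, name := range declared {
		controllerFor[name] = legacyController
	}
	// Assumption: what is actually in the cluster wins over the declaration.
	for _, gc := range classes {
		controllerFor[gc.Name] = gc.ControllerName
	}
	for _, gw := range gateways {
		if controllerFor[gw.ClassName] == legacyController {
			return true
		}
	}
	return false
}

func main() {
	classes := []GatewayClass{{Name: "namespaced-class", ControllerName: "example.io/namespaced-controller"}}
	gateways := []Gateway{{Namespace: "app-ns", Name: "web", ClassName: "namespaced-class"}}
	// No Gateway targets the legacy controller, so the legacy install could be torn down.
	fmt.Println(hasLegacyGateways([]string{"legacy-class"}, classes, gateways))
}
```
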
- Cover the calico-system envoy-gateway controller lifecycle, per-namespace resource provisioning and cleanup, custom EnvoyProxy and EnvoyGateway ConfigMap watches, owning-gateway env vars in l7-log-collector, and the legacy-class teardown path
- Teardown sequencing for tigera-gateway cascading

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lico-system

- Render one envoy-gateway controller in calico-system with deploy.type=GatewayNamespace
- Auto-provision tigera-gateway-class; honour user overrides if redeclared in Spec.GatewayClasses
- Enumerate every operator-owned object from the legacy tigera-gateway install for cleanup (pull Secrets before tigera-operator-secrets); keep the Namespace itself in case users placed their own resources there
- Point GatewayAPI finalizer at the calico-system envoy-gateway Deployment
- Drop dual-controller fixtures and the legacy-undeploy test; consolidate FV tests to the calico-system layout

Upstream envoy-gateway rejects the combination of mergeGateways: true
and GatewayNamespaceMode, so any user-supplied EnvoyProxy with merging
enabled would cause its referenced Gateways to silently stop being
programmed after the switch to GatewayNamespace
(https://gateway.envoyproxy.io/docs/tasks/operations/gateway-namespace-mode/).

In the GatewayAPI reconciler, when a Spec.GatewayClasses[].EnvoyProxyRef
points at an EnvoyProxy with Spec.MergeGateways == true, force the
field to false in our managed copy and log a warning naming the
EnvoyProxy and GatewayClass. The user's source CR is not mutated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
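
A rough sketch of the mergeGateways override described in this commit message. `EnvoyProxy` here is a trimmed, hypothetical stand-in for the Envoy Gateway type, and `managedCopy` is an illustrative name rather than the reconciler's actual function; the point is only that the managed copy gets its own pointer so the user's object stays untouched.

```go
package main

import "fmt"

// EnvoyProxy is a trimmed stand-in for the Envoy Gateway CRD type.
type EnvoyProxy struct {
	Name          string
	MergeGateways *bool
}

// managedCopy returns the copy the operator would manage, plus a flag that
// reports whether merging had to be forced off.
func managedCopy(src EnvoyProxy) (EnvoyProxy, bool) {
	out := src // struct copy; the pointer field still aliases src here
	if src.MergeGateways == nil {
		return out, false
	}
	wasMerging := *src.MergeGateways
	forced := false
	out.MergeGateways = &forced // fresh pointer, so src is never mutated
	return out, wasMerging
}

func main() {
	merging := true
	user := EnvoyProxy{Name: "custom-proxy", MergeGateways: &merging}
	managed, overridden := managedCopy(user)
	if overridden {
		fmt.Printf("warning: EnvoyProxy %q sets mergeGateways; forcing false in the managed copy\n", user.Name)
	}
	// The managed copy is false while the user's source value is still true.
	fmt.Println(*managed.MergeGateways, *user.MergeGateways)
}
```
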
- remove controllerName param (never set by callers)
- inline ReleaseName and GatewayNamespace deploy type
- add DeploymentNamespace constant for the install namespace
- drop now-unused helmGateway type
- parseManifest now errors on kinds it doesn't recognize so a chart
  bump that emits a new kind trips the existing render tests
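
The "error on unrecognized kinds" behaviour can be illustrated with a small sketch. The `Object` and `Bundle` types and the list of handled kinds are assumptions made for the example; the real parseManifest works on the rendered helm chart output and handles its own set of kinds.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical, minimal view of one rendered manifest document.
type Object struct {
	Kind, Name string
}

// Bundle groups objects by the kinds this sketch knows how to handle.
type Bundle struct {
	Deployments, Services, ConfigMaps []Object
}

// ErrUnknownKind marks a kind the parser has no bucket for.
var ErrUnknownKind = errors.New("unrecognized kind")

// sortObjects fails on the first unknown kind instead of silently dropping it,
// so a chart bump that emits a new kind surfaces in tests.
func sortObjects(objs []Object) (Bundle, error) {
	var b Bundle
	for _, o := range objs {
		switch o.Kind {
		case "Deployment":
			b.Deployments = append(b.Deployments, o)
		case "Service":
			b.Services = append(b.Services, o)
		case "ConfigMap":
			b.ConfigMaps = append(b.ConfigMaps, o)
		default:
			return Bundle{}, fmt.Errorf("%w: %s/%s", ErrUnknownKind, o.Kind, o.Name)
		}
	}
	return b, nil
}

func main() {
	_, err := sortObjects([]Object{
		{Kind: "Deployment", Name: "envoy-gateway"},
		{Kind: "PodDisruptionBudget", Name: "envoy-gateway"},
	})
	fmt.Println(err)
}
```
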
Bug introduced in this branch; reverts the render and UT to master's behavior.

- drop the render-side auto-provision of tigera-gateway-class
- flip the UT that asserted the buggy output

Under deploy.type=GatewayNamespace (tigera#4690), envoy-proxy pods land in the
Gateway's own namespace and mount the operator trust bundle at
/etc/pki/tls/certs (added by tigera#4796).  The mount references a ConfigMap
in the proxy pod's own namespace, but tigera#4796 only writes the ConfigMap
into calico-system (the controller's namespace), so the proxy Pod stops
at Init:0/2 with:

  Warning  FailedMount  MountVolume.SetUp failed for volume
                        "tigera-ca-bundle": configmap not found

Mirror the trust bundle into each Gateway namespace alongside the
existing per-namespace propagation of tigera-pull-secret and the
waf-http-filter SA / RoleBindings.  Reuses the existing reserved-NS
guard and follows the same delete-before-RoleBinding ordering as the
pull-secret cleanup.
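
As a rough sketch of the propagation rule above: given the namespaces that host a Gateway, compute where the trust bundle ConfigMap needs a copy, skipping reserved namespaces. The contents of `reservedNamespaces` and the function name are assumptions for illustration; the operator's actual guard list may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// reservedNamespaces is an assumed guard list. calico-system already holds the
// bundle; treating tigera-gateway as reserved is a guess for this sketch.
var reservedNamespaces = map[string]bool{
	"calico-system":  true,
	"tigera-gateway": true,
}

// bundleTargets returns the de-duplicated, sorted namespaces that need their
// own copy of the trust bundle ConfigMap.
func bundleTargets(gatewayNamespaces []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, ns := range gatewayNamespaces {
		if reservedNamespaces[ns] || seen[ns] {
			continue
		}
		seen[ns] = true
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(bundleTargets([]string{"default", "app-ns", "calico-system", "app-ns"}))
}
```
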

Reproduced live on seth-ez-a3b5 2026-05-19 with operator
walter-merge-2026-05-18 (has both tigera#4690 and tigera#4796): fresh Gateway
namespace -> everything else propagates but tigera-ca-bundle does not,
proxy Pod stuck Init:0/2.

Brief:
  tigera/gateway-extensions-controller/docs/planning/briefs/2026-05-19-ca-bundle-propagation-brief.md

Walter-supplied positive test: configure two Gateway namespaces
("default" and "app-ns") with a TrustedBundle, render, assert the
trust bundle ConfigMap (TrustedCertConfigMapName) lands in each
Gateway namespace.

Companion to the per-NS ConfigMap copy added in the previous commit.

Adds the WAF v2/v3 admission webhook render path as forward-looking
infrastructure. Sibling to the existing WAF v1 (sidecar / ModSecurity)
render in pkg/render/applicationlayer/applicationlayer.go, which is
untouched.

- pkg/common/common.go: GatewayAddonsFeature = "ingress-gateway-addons"
  constant. Naming is deliberate: the feature gates Tigera-built ADD-ONS
  (currently the WAF v2/v3 admission webhook), NOT the bare ingress
  gateway data path.
- pkg/render/applicationlayer/gateway_waf.go: WAFAdmissionWebhookComponents
  returns the 6 objects required for the webhook (Deployment + Service +
  ServiceAccount + ClusterRole + ClusterRoleBinding +
  ValidatingWebhookConfiguration).
- pkg/render/applicationlayer/gateway_waf_test.go: per-function test.

Out of scope for this PR (deferred to follow-ups):
- Controller wire-up: invoking the render path requires resolving the
  webhook cert pair + image references, plus a license fetch. That
  wire-up depends on a GatewayWAF (or similar) field being added to
  ApplicationLayerSpec to gate the existing reconcile body.
- KubeControllersConfiguration ApplicationLayer field defaulting:
  depends on a tigera/api version bump after the calico-private side of
  the design lands.
- BFF + frontend module render: belongs to a separate UI epic, not this
  controller-side scope.

(cherry picked from commit 0c6c253)

…ONTROLLERS

The applicationlayer manager registered by tigera/calico-private#11834
runs WAF / GlobalWAF / Plugin / GlobalPlugin / Validation reconcilers
inside the kube-controllers binary on Calico Enterprise variants. End-to-
end smoke against a licensed cluster confirmed the binary is otherwise
ready; what's missing operator-side is RBAC + the ENABLED_CONTROLLERS
entry that turns the controller on.

Add to the Enterprise common rule set:

  * applicationlayer.projectcalico.org WAF{Policy,Plugin,ValidationPolicy}
    + Global variants (resources, /status, /finalizers).
  * gateway.networking.k8s.io gateways/httproutes/tcproutes/tlsroutes/
    grpcroutes (read+update for targetRef validation, /status for
    surfaces).
  * core / events.k8s.io events (create+patch) - controller-runtime
    Recorder emits events on watched objects via either API group
    depending on the kubernetes minor; without this, every reconcile
    that hits Recorder.Eventf is rejected with 'events is forbidden'.
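
To make the events rule concrete, here is a self-contained sketch. `PolicyRule` mirrors the shape of the Kubernetes RBAC rule type but is declared locally so the example needs no dependencies; the `allows` helper is a simplified matcher written for this illustration (it ignores wildcards), and putting both API groups in a single rule is an assumption.

```go
package main

import "fmt"

// PolicyRule is a local stand-in with the same field names as the RBAC type.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// eventRule grants create+patch on events in both the core ("") and
// events.k8s.io groups, since the Recorder may use either.
func eventRule() PolicyRule {
	return PolicyRule{
		APIGroups: []string{"", "events.k8s.io"},
		Resources: []string{"events"},
		Verbs:     []string{"create", "patch"},
	}
}

func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// allows is a simplified exact-match check; real RBAC also honours "*".
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []PolicyRule{eventRule()}
	fmt.Println(allows(rules, "", "events", "create"))
	fmt.Println(allows(rules, "events.k8s.io", "events", "patch"))
	fmt.Println(allows(rules, "", "events", "delete"))
}
```
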

Append 'applicationlayer' to the Enterprise enabledControllers list so
the controller is constructed at startup; runtime activation is still
license-gated via features.IngressGateway in calico-private's lifecycle
loop, so absence of a gateway-addons-bearing license is a no-op.

Verified end-to-end on seth-calient-master: WAFPolicy reconcile completes
with conditions Licensed=True / Accepted=True / Validated=Unknown /
Programmed=False (Programmed gates on WASMImage, separate follow-up).

(cherry picked from commit ebe70a2)

The applicationlayer manager registered by tigera/calico-private#11834
reads WASM_IMAGE from the kube-controllers Deployment env to construct
the OCI reference its WAF reconcilers use to program EnvoyExtensionPolicy
attachments. Without this env, every reconcile stamps the WAFPolicy with
Programmed=False / Reason=WASMUnavailable.

Add ComponentCorazaWASM (image 'coraza-wasm', enterpriseVariant,
master version) to the operator components map and the EnterpriseImages
list so it participates in the standard image-version overrides and
hashrelease bumps. Resolve the OCI reference through the existing
registry / imagePath / imagePrefix path in kubeControllersComponent's
ResolveImages, and emit WASM_IMAGE on the deployment env when the
variant is Enterprise.

The reconciler tolerates an empty WASM_IMAGE by stamping
Programmed=False/WASMUnavailable, so this change is safe to land before
calico-private's manager.go starts reading the env (companion commit on
PR #11834).

(cherry picked from commit 51d7bb5)

…cation + EEP creation

End-to-end smoke surfaced two more permissions the applicationlayer
reconcilers need that the prior RBAC commit (f28b241) missed:

  * secrets + configmaps (cluster-wide, full CRUD): the WAF reconciler
    replicates WASM_PULL_SECRET from controllerNamespace into each
    WAFPolicy's namespace so the rendered EnvoyExtensionPolicy can
    reference it; analogous flow for the WASM_CA_CERT ConfigMap.
    Without this, reconcile fails with 'secrets is forbidden ... cannot
    create resource secrets in API group ""'.

  * gateway.envoyproxy.io/envoyextensionpolicies (cluster-wide, full
    CRUD): the WAF reconciler emits one EnvoyExtensionPolicy per
    targetRef to bind the Coraza wasm filter at the gateway / route.
    Without this, reconcile errors before patching status.

Mirrors the resources tigera/calico-private's WAFReconciler/Generator
already write to; verified end-to-end against seth-calient-master.

(cherry picked from commit 0a2ad96)

… calico-kube-controllers

Companion to commit 38f81fd ("inject WASM_IMAGE env"). The
applicationlayer manager registered by tigera/calico-private#11834 also
reads WASM_PULL_SECRET and WASM_CA_CERT off its process env: the
reconciler replicates the named Secret and ConfigMap from the
kube-controllers namespace into each WAFPolicy's namespace before
referencing them from the rendered EnvoyExtensionPolicy so the wasm
fetcher can pull from a private Tigera registry over its TLS chain.

End-to-end smoke on seth-calient-master confirmed Programmed=True only
after a manually-crafted GCR pull-secret was attached and WASM_PULL_SECRET
was set on the Deployment by hand. With this change, every Enterprise
install picks the right values up automatically.

Sources:
  WASM_PULL_SECRET <- first entry of Installation.ImagePullSecrets
                       (the same secret already attached to the Deployment
                       via PodSpec.ImagePullSecrets, so multi-tenant /
                       BYO-registry installs reuse whatever operator
                       already wires up here)
  WASM_CA_CERT     <- certificatemanagement.TrustedCertConfigMapName
                       (the existing tigera-ca-bundle ConfigMap, already
                       mounted on this Deployment via TrustedBundle)

The supplementary RBAC commit 6c68d19 already grants secrets+configmaps
cluster-wide CRUD; no additional RBAC required. Both env vars stay
conditional on the Enterprise variant; Calico OSS does not deploy the
applicationlayer manager. The reconciler tolerates empty values by
stamping Programmed=False/WASMUnavailable, so this change is safe to land
independent of #11834's merge order.
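
A minimal sketch of the env sourcing described above, under stated assumptions: `EnvVar` is a local stand-in, `wasmEnv` is an illustrative name, the image reference in `main` is made up, and emitting an empty WASM_PULL_SECRET when no pull secret is configured is a guess that leans on the reconciler tolerating empty values.

```go
package main

import "fmt"

// trustedCertConfigMapName stands in for
// certificatemanagement.TrustedCertConfigMapName (the tigera-ca-bundle ConfigMap).
const trustedCertConfigMapName = "tigera-ca-bundle"

// EnvVar is a local stand-in for a container environment variable.
type EnvVar struct {
	Name, Value string
}

// wasmEnv returns the WASM_* variables for Enterprise and nothing otherwise.
func wasmEnv(enterprise bool, wasmImage string, imagePullSecrets []string) []EnvVar {
	if !enterprise {
		return nil
	}
	pullSecret := ""
	if len(imagePullSecrets) > 0 {
		// Same secret that is already attached via PodSpec.ImagePullSecrets.
		pullSecret = imagePullSecrets[0]
	}
	return []EnvVar{
		{Name: "WASM_IMAGE", Value: wasmImage},
		{Name: "WASM_PULL_SECRET", Value: pullSecret},
		{Name: "WASM_CA_CERT", Value: trustedCertConfigMapName},
	}
}

func main() {
	// The registry and tag below are hypothetical.
	env := wasmEnv(true, "registry.example.com/tigera/coraza-wasm:master", []string{"tigera-pull-secret"})
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```
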

(cherry picked from commit d42d74a)

…econcilers

The four prior wiring commits (f28b241 ENABLED_CONTROLLERS, ad421ec
webhook render, 38f81fd WASM_IMAGE, 6c68d19 RBAC for secret
replication + EEP creation) plus the WASM_PULL_SECRET / WASM_CA_CERT
emission in the previous commit changed the Enterprise render shape but
left the existing test fixtures expecting the pre-applicationlayer
deployment.

Fix the four broken Enterprise fixtures and extend the standalone case
with positive assertions on the reconciler's RBAC + env contract:

* ENABLED_CONTROLLERS now includes "applicationlayer" on the Enterprise
  calico-kube-controllers Deployment (two fixtures)
* ClusterRole rule counts move 27 -> 36 (calico) and 25 -> 34 (es) after
  the 9 applicationlayer-related rules added by f28b241 + 6c68d19
* Standalone Enterprise fixture asserts:
    - applicationlayer.projectcalico.org rules for waf{policies,plugins,
      validationpolicies} + global variants (resources, /status, /finalizers)
    - gateway.networking.k8s.io for gateways/httproutes/tcproutes/tlsroutes/
      grpcroutes (read+update + /status)
    - core/events + events.k8s.io/events (create+patch) so controller-runtime
      Recorder.Eventf works on either k8s version
    - core/secrets + configmaps cluster-wide CRUD for cross-namespace
      replication of pull secrets + CA bundles
    - gateway.envoyproxy.io/envoyextensionpolicies cluster-wide CRUD
    - WASM_IMAGE resolved through registry/imagePath/imagePrefix
    - WASM_PULL_SECRET = first Installation.ImagePullSecrets entry,
      asserted alongside PodSpec.ImagePullSecrets to lock the
      single-source-of-truth invariant
    - WASM_CA_CERT = certificatemanagement.TrustedCertConfigMapName

Calico OSS coverage stays exhaustive via the existing ConsistOf on the
custom-config fixture; none of the applicationlayer wiring is conditional
on anything but the Enterprise variant, so OSS env remains unchanged.

(cherry picked from commit 2ec5d82)

…ons.waf.state

Per the 2026-05-12 design walkthrough on PMREQ-384 (recorded in
tigera/designs#25 docs/decisions/2026-05-08.md §1), the WAF v3 (Gateway API
add-on) surface on calico-kube-controllers is opt-in via a new field on
the existing operator GatewayAPI CR:

    apiVersion: operator.tigera.io/v1
    kind: GatewayAPI
    metadata:
      name: default
    spec:
      extensions:
        waf:
          state: Enabled  # default Disabled

Default-off semantics: unset Extensions, unset WAF, unset State, and
explicit Disabled all leave the WAF surface unrendered. Only an explicit
{state: Enabled} flips the gate on.

What the gate covers:

- enabledControllers: "applicationlayer" entry appended only when gate
  is on (otherwise kube-controllers does not start the WAF reconcilers).
- Coraza-WASM image resolution in ResolveImages.
- WASM_IMAGE / WASM_PULL_SECRET / WASM_CA_CERT env vars on the
  calico-kube-controllers Deployment.
- WAF / Gateway-API / EnvoyExtensionPolicy / event / secret-replication
  ClusterRole rules.

API change is operator-only: GatewayAPI lives in the operator's own
api/v1 package, not tigera/api or calico-private. No tigera/api sync
delay. kube-controllers behavior (license-gated WAFReconciler) is
unchanged.

Installation controller now reads the GatewayAPI CR via the existing
"default" name (NotFound = gate off) and watches it so that toggling
the field re-triggers a render.
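
The default-off gate can be sketched as a nil-safe check. The type and field names below are hypothetical, modelled on the YAML shape in this commit message rather than copied from the operator's api/v1 package, and the base controller names in `main` are placeholders. A nil spec stands for the GatewayAPI CR not being found.

```go
package main

import "fmt"

// WAFState and the structs below are assumed names that follow the
// spec.extensions.waf.state path shown in the YAML.
type WAFState string

const (
	WAFEnabled  WAFState = "Enabled"
	WAFDisabled WAFState = "Disabled"
)

type WAF struct{ State *WAFState }
type Extensions struct{ WAF *WAF }
type GatewayAPISpec struct{ Extensions *Extensions }

// wafGateOn is true only for an explicit state: Enabled. A missing CR,
// unset Extensions, unset WAF, unset State, and Disabled all return false.
func wafGateOn(spec *GatewayAPISpec) bool {
	if spec == nil || spec.Extensions == nil || spec.Extensions.WAF == nil || spec.Extensions.WAF.State == nil {
		return false
	}
	return *spec.Extensions.WAF.State == WAFEnabled
}

// enabledControllers appends "applicationlayer" only when the gate is on.
func enabledControllers(base []string, spec *GatewayAPISpec) []string {
	out := append([]string(nil), base...) // copy so the caller's slice is untouched
	if wafGateOn(spec) {
		out = append(out, "applicationlayer")
	}
	return out
}

func main() {
	on := WAFEnabled
	spec := &GatewayAPISpec{Extensions: &Extensions{WAF: &WAF{State: &on}}}
	fmt.Println(enabledControllers([]string{"node", "service"}, spec))
	fmt.Println(enabledControllers([]string{"node", "service"}, nil))
}
```
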


3 participants