This runbook is the operator guide for rebuilding, validating, presenting, and tearing down the CloudOps Platform demo safely.
- Modular Terraform provisioning for AWS VPC + EKS
- Ingress via NGINX behind AWS NLB with TLS (ACM) and HTTPS-only enforcement
- HPA autoscaling under real CPU load (metrics-server gate + evidence)
- Observability: Prometheus scrape + Grafana views for ingress traffic
- Cost discipline: teardown prevents orphaned NLB, NAT gateways, and ENIs
- aws
- kubectl
- helm
- terraform
- jq
- dig
- curl
- AWS credentials configured locally (or via assumed role)
- Access to the EKS cluster defined by EKS_CLUSTER_NAME in AWS_REGION
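A quick preflight can confirm that the tools listed above are on PATH before any script runs. The following is a minimal sketch, not one of the repository's scripts; the helper name `missing_tools` is hypothetical.

```shell
# Print the name of every tool that is NOT found on PATH, one per line.
# missing_tools is a hypothetical helper, not a script from this repo.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example preflight (commented out so the snippet has no side effects):
# missing="$(missing_tools aws kubectl helm terraform jq dig curl)"
# [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
# aws sts get-caller-identity   # confirms credentials resolve to an identity
```

Reporting every missing tool in one pass avoids the fix-one-rerun-fail-again loop that a fail-fast check produces.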
- AWS_REGION (default: ca-central-1)
- EKS_CLUSTER_NAME (default: cloudops-dev-eks)
Used only to create or update the DNS record for the demo:
- ROUTE53_ZONE_ID (accepts ZXXXX or /hostedzone/ZXXXX)
- ROUTE53_RECORD_NAME (example: app.utieyincloud.com)
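Because ROUTE53_ZONE_ID accepts two forms, a consumer has to reduce both to the bare zone ID. How the repository's scripts do this is not shown here; the sketch below is one way to do it with POSIX parameter expansion, and `normalize_zone_id` is a hypothetical name.

```shell
# Strip everything up to the last "/" so both accepted forms
# yield the bare zone ID. normalize_zone_id is a hypothetical name.
normalize_zone_id() {
  echo "${1##*/}"
}

# Example against a real hosted zone (needs AWS credentials):
# aws route53 list-resource-record-sets \
#   --hosted-zone-id "$(normalize_zone_id "$ROUTE53_ZONE_ID")"
```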
- APP_NS (default: apps)
- APP_HOST (default: app.utieyincloud.com)
- APP_INGRESS_NAME (default: hpa-demo)
- APP_POD_SELECTOR (default: app=hpa-demo)
- OBS_WINDOW_SECONDS (default: 120)
- OBS_INTERVAL_SECONDS (default: 5)
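The documented defaults can be expressed with the shell's assign-if-unset expansion. This is a sketch of the pattern under the assumption that the scripts read these variables from the environment; `apply_demo_defaults` is a hypothetical name, and the scripts themselves may set defaults differently.

```shell
# Assign each documented default only when the variable is unset or empty.
# apply_demo_defaults is a hypothetical helper.
apply_demo_defaults() {
  : "${AWS_REGION:=ca-central-1}"
  : "${EKS_CLUSTER_NAME:=cloudops-dev-eks}"
  : "${APP_NS:=apps}"
  : "${APP_HOST:=app.utieyincloud.com}"
  : "${APP_INGRESS_NAME:=hpa-demo}"
  : "${APP_POD_SELECTOR:=app=hpa-demo}"
  : "${OBS_WINDOW_SECONDS:=120}"
  : "${OBS_INTERVAL_SECONDS:=5}"
}

apply_demo_defaults
echo "region=$AWS_REGION cluster=$EKS_CLUSTER_NAME ns=$APP_NS"
```

Values already present in the environment win over the defaults, so a one-off override such as `OBS_WINDOW_SECONDS=300` placed before a command leaves the other settings untouched.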
Terraform + NGINX Ingress + Application
Command:
./scripts/rebuild-demo.sh
What this step does:
- Runs terraform apply in terraform/environments/dev
- Configures kubeconfig for the EKS cluster
- Installs or upgrades ingress-nginx using an AWS NLB with ACM TLS
- Enforces HTTPS-only external access
- Deploys the demo application manifests
- Optionally updates Route53 DNS (if configured)
- Performs proof checks for HTTPS success and HTTP failure
Expected outcomes:
- An AWS NLB hostname is printed
- The application Ingress has an external address
- HTTPS returns HTTP 200
- HTTP access fails or times out (expected)
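The two proof checks can be reproduced by hand with curl. The sketch below is not the script's own implementation: the helper names are hypothetical, the request path `/` is an assumption, and "fails or times out" is modelled as curl's status code 000 (no response received).

```shell
# Return the HTTP status code for a URL; curl reports 000 when no
# response was received (connection refused, timeout, TLS failure).
probe() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" || true
}

# Compare observed codes with the expected outcomes listed above.
# Assumption: "HTTP fails or times out" is modelled as code 000.
proof_verdict() {
  https_code="$1"
  http_code="$2"
  if [ "$https_code" = "200" ] && [ "$http_code" = "000" ]; then
    echo "PASS"
  else
    echo "FAIL (https=$https_code http=$http_code)"
  fi
}

# Example (needs network access and a live demo):
# proof_verdict "$(probe "https://$APP_HOST/")" "$(probe "http://$APP_HOST/")"
```

Keeping the verdict separate from the network call makes the pass/fail rule easy to read and to adjust if a redirect should also count as acceptable.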
Command:
./scripts/validate-env.sh
What this step validates:
- Kubernetes context and node health
- Ingress controller Service and NLB hostname
- Metrics API availability (required for HPA)
- HTTPS access (DNS optional via --resolve)
- HTTP negative test (should not be the primary path)
- HPA behavior observed over time (non-brittle reporting)
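When DNS has not been updated yet, the HTTPS check can still be run by pinning the hostname to an NLB address with curl's --resolve flag. A minimal sketch follows; `NLB_HOST` is a hypothetical variable holding the NLB hostname, the helper names are illustrative, and the validation script may resolve the address differently.

```shell
# Pick the first IPv4 address from `dig +short` output on stdin,
# skipping any CNAME lines.
first_ipv4() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' | head -n 1
}

# Build the value for curl's --resolve flag: HOST:PORT:ADDRESS.
resolve_arg() {
  echo "$1:443:$2"
}

# Example (NLB_HOST is a hypothetical variable holding the NLB hostname):
# ip="$(dig +short "$NLB_HOST" | first_ipv4)"
# curl -s -o /dev/null -w '%{http_code}\n' \
#   --resolve "$(resolve_arg "$APP_HOST" "$ip")" "https://$APP_HOST/"
```

Because the request still carries the real hostname, TLS validation and Ingress host routing behave as they would with live DNS.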
Expected outcomes:
- HTTPS returns HTTP 200
- Metrics API is Available or a warning is reported
- HPA status shows replica counts and scaling decisions
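Observing the HPA over a window rather than asserting a single replica count is what keeps the report non-brittle. The loop below sketches that idea using the OBS_WINDOW_SECONDS and OBS_INTERVAL_SECONDS settings; the function names are hypothetical and the actual script's output format is not reproduced here.

```shell
# Number of samples that fit in a window at a given interval.
sample_count() {
  echo $(( $1 / $2 ))
}

# Print HPA status repeatedly; a failed read is reported, not fatal.
observe_hpa() {
  n="$(sample_count "${OBS_WINDOW_SECONDS:-120}" "${OBS_INTERVAL_SECONDS:-5}")"
  i=0
  while [ "$i" -lt "$n" ]; do
    kubectl -n "${APP_NS:-apps}" get hpa --no-headers \
      || echo "WARN: could not read HPA status" >&2
    sleep "${OBS_INTERVAL_SECONDS:-5}"
    i=$((i + 1))
  done
}

# observe_hpa   # uncomment to run against a live cluster
```

With the defaults this takes 24 samples over two minutes, which is usually enough to see replica counts move under load.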
Command:
./scripts/teardown.sh
What this step does:
- Best-effort Kubernetes cleanup (apps and ingress first)
- Attempts terraform destroy
- If dependency violations occur:
  - Detects VPC ID
  - Removes NLBs, target groups, NAT gateways, ENIs, and other blockers
  - Retries terraform destroy
- Final best-effort cleanup to prevent orphaned AWS resources
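The destroy, clean up, retry flow described above can be sketched as a small wrapper. This is an illustration of the control flow only, not the contents of teardown.sh; `retry_after_cleanup`, `destroy`, and `deep_cleanup` are hypothetical names.

```shell
# Run "$1"; if it fails, run "$2" and then try "$1" once more.
# The wrapper's exit status is that of the final attempt.
retry_after_cleanup() {
  "$1" && return 0
  echo "first attempt failed; running cleanup step: $2" >&2
  "$2" || echo "cleanup reported errors; retrying anyway" >&2
  "$1"
}

# Example wiring (hypothetical function bodies):
# destroy() { terraform -chdir=terraform/environments/dev destroy -auto-approve; }
# deep_cleanup() { echo "remove NLBs, target groups, NAT gateways, ENIs in the VPC"; }
# retry_after_cleanup destroy deep_cleanup
```

Limiting the wrapper to a single retry keeps a persistent failure visible instead of looping on it.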
Expected outcomes:
- Terraform state destroyed cleanly
- No orphaned NLBs, NAT gateways, or ENIs
- AWS account left in a cost-neutral state
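These outcomes can be spot-checked with the AWS CLI after teardown. The sketch below counts load balancers, NAT gateways, and ENIs still attached to one VPC; it assumes the VPC ID is known to the caller, and the function names are hypothetical.

```shell
# Count whitespace-separated IDs in AWS CLI text output ("" -> 0).
count_ids() {
  set -- $1
  echo "$#"
}

# Summarize leftovers in one VPC. The VPC ID is passed in by the caller.
list_vpc_leftovers() {
  vpc="$1"
  region="${AWS_REGION:-ca-central-1}"
  nlbs="$(aws elbv2 describe-load-balancers --region "$region" \
    --query "LoadBalancers[?VpcId=='$vpc'].LoadBalancerArn" --output text)"
  nats="$(aws ec2 describe-nat-gateways --region "$region" \
    --filter "Name=vpc-id,Values=$vpc" "Name=state,Values=available,pending" \
    --query 'NatGateways[].NatGatewayId' --output text)"
  enis="$(aws ec2 describe-network-interfaces --region "$region" \
    --filters "Name=vpc-id,Values=$vpc" \
    --query 'NetworkInterfaces[].NetworkInterfaceId' --output text)"
  echo "load_balancers=$(count_ids "$nlbs") nat_gateways=$(count_ids "$nats") enis=$(count_ids "$enis")"
}

# list_vpc_leftovers "$VPC_ID"   # needs AWS credentials; all zeros is the goal
```

The NAT gateway query filters on state because deleted gateways can remain visible in API responses for a while after removal.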
Cause:
- NLB, NAT gateway, or ENIs still exist
Resolution:
./scripts/teardown.sh
The script performs deep cleanup and retries automatically.
Cause:
- Metrics Server not ready or Metrics API unavailable
Resolution:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl -n apps top pods
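The APIService check above can be reduced to a single report line by reading its Available condition. A minimal sketch, with the hypothetical helper `metrics_report`; the message wording is illustrative and does not reproduce the validation script's output.

```shell
# Map the APIService "Available" condition status to a report line.
metrics_report() {
  case "$1" in
    True) echo "OK: Metrics API is Available" ;;
    *)    echo "WARN: Metrics API not Available (status='${1}')" ;;
  esac
}

# Example against a live cluster:
# status="$(kubectl get apiservice v1beta1.metrics.k8s.io \
#   -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')"
# metrics_report "$status"
```

An empty status (for example when the APIService does not exist) falls through to the warning branch, which matches the warn-rather-than-fail behavior described for validation.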