A production-ready multi-cloud Kubernetes platform supporting AWS EKS and Oracle Cloud Infrastructure (OCI) OKE with GitOps deployment using ArgoCD.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                     Multi-Cloud Kubernetes Platform                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────┐         ┌─────────────────────────┐       │
│   │         AWS EKS         │         │         OCI OKE         │       │
│   │  ┌───────────────────┐  │         │  ┌───────────────────┐  │       │
│   │  │   Control Plane   │  │         │  │   Control Plane   │  │       │
│   │  └───────────────────┘  │         │  └───────────────────┘  │       │
│   │  ┌───────────────────┐  │         │  ┌───────────────────┐  │       │
│   │  │   Worker Nodes    │  │         │  │   Worker Nodes    │  │       │
│   │  │  ┌─────┐ ┌─────┐  │  │         │  │  ┌─────┐ ┌─────┐  │  │       │
│   │  │  │ Pod │ │ Pod │  │  │         │  │  │ Pod │ │ Pod │  │  │       │
│   │  │  └─────┘ └─────┘  │  │         │  │  └─────┘ └─────┘  │  │       │
│   │  └───────────────────┘  │         │  └───────────────────┘  │       │
│   └─────────────────────────┘         └─────────────────────────┘       │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                          GitOps (ArgoCD)                          │  │
│  │      ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │  │
│  │      │     Dev     │     │   Staging   │     │    Prod     │      │  │
│  │      │  Auto-Sync  │     │  Auto-Sync  │     │ Manual-Sync │      │  │
│  │      └─────────────┘     └─────────────┘     └─────────────┘      │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
- Multi-Cloud Support: Deploy to AWS EKS and OCI OKE
- Infrastructure as Code: Terraform modules for reproducible deployments
- GitOps: ArgoCD for declarative application delivery
- Environment Management: Separate configurations for dev, staging, and prod
- Security: Network policies, RBAC, pod security, and secrets encryption
- Full Observability Stack: Prometheus, Grafana, Loki, Jaeger, AlertManager
- Cross-Cloud Dashboards: Compare metrics between AWS and OCI clusters
- CI/CD: GitHub Actions workflows for automation
- Cost Optimization: Spot instance support for non-production environments
```
multi-cloud-k8s-platform/
├── terraform/
│   ├── modules/
│   │   ├── eks/              # AWS EKS module
│   │   ├── oke/              # OCI OKE module
│   │   ├── vpc-aws/          # AWS VPC module
│   │   └── vcn-oci/          # OCI VCN module
│   └── environments/
│       ├── dev/
│       │   ├── aws/
│       │   └── oci/
│       ├── staging/
│       │   ├── aws/
│       │   └── oci/
│       └── prod/
│           ├── aws/
│           └── oci/
├── kubernetes/
│   ├── base/
│   │   ├── namespaces/
│   │   ├── rbac/
│   │   ├── network-policies/
│   │   └── resource-quotas/
│   ├── apps/
│   │   └── app-example/
│   │       ├── base/
│   │       └── overlays/
│   │           ├── dev/
│   │           ├── staging/
│   │           └── prod/
│   ├── argocd/
│   │   ├── base/
│   │   └── overlays/
│   └── monitoring/
│       ├── prometheus/       # Metrics collection
│       ├── grafana/          # Visualization & dashboards
│       ├── loki/             # Log aggregation
│       ├── jaeger/           # Distributed tracing
│       └── alertmanager/     # Alerting
├── scripts/
│   ├── setup.sh
│   ├── destroy.sh
│   ├── install-tools.sh
│   └── validate-manifests.sh
├── .github/
│   └── workflows/
│       ├── ci.yaml
│       ├── terraform-plan.yaml
│       ├── terraform-apply.yaml
│       ├── kubernetes-validate.yaml
│       └── argocd-sync.yaml
└── docs/
```
- Terraform >= 1.5.0
- kubectl >= 1.28
- Kustomize >= 5.0
- Helm >= 3.0
- AWS CLI >= 2.0 (for AWS deployments)
- OCI CLI >= 3.0 (for OCI deployments)
```bash
# Use the provided script to install all tools
./scripts/install-tools.sh
```

```bash
git clone https://github.com/johnpaulnasc/multi-cloud-k8s-platform.git
cd multi-cloud-k8s-platform
```

AWS:

```bash
aws configure
# Or use IAM roles with OIDC for GitHub Actions
```

OCI:

```bash
oci setup config
# Follow the prompts to configure OCI CLI
```

Using the setup script (recommended):

```bash
./scripts/setup.sh
```

Manual deployment:
```bash
# Initialize Terraform
cd terraform/environments/dev/aws
terraform init

# Plan the deployment
terraform plan -out=tfplan

# Apply the changes
terraform apply tfplan

# Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name multi-cloud-k8s-dev
```

```bash
# Apply ArgoCD manifests
kustomize build kubernetes/argocd/overlays/dev | kubectl apply -f -

# Wait for ArgoCD to be ready
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s

# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Port forward to access ArgoCD UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
```

```bash
# Apply the app-of-apps pattern
kubectl apply -f kubernetes/argocd/base/applications/app-of-apps.yaml
```

Dev:

- 2 Availability Zones
- Single NAT Gateway (cost optimization)
- Spot instances for worker nodes
- Minimal resource allocation
- Auto-sync enabled in ArgoCD
Staging:

- 3 Availability Zones
- Single NAT Gateway
- Mix of On-Demand and Spot instances
- Medium resource allocation
- Auto-sync enabled in ArgoCD
Prod:

- 3 Availability Zones
- HA NAT Gateways (one per AZ)
- On-Demand instances only
- Full resource allocation
- Private API endpoint
- Manual sync in ArgoCD
- Full logging and monitoring
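As an illustration, these per-environment differences would typically be captured in each environment's Terraform variables. The sketch below is hypothetical — the variable names are assumptions, not the actual inputs of this repo's modules:

```hcl
# terraform/environments/prod/aws/terraform.tfvars (illustrative sketch;
# variable names are assumptions, not the modules' actual inputs)
cluster_name        = "multi-cloud-k8s-prod"
az_count            = 3           # three Availability Zones
single_nat_gateway  = false       # one NAT Gateway per AZ (HA)
capacity_type       = "ON_DEMAND" # no Spot instances in prod
endpoint_public     = false       # private API endpoint
```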
AWS EKS:

```hcl
module "eks" {
  source = "./modules/eks"

  cluster_name       = "my-cluster"
  kubernetes_version = "1.29"
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids

  node_groups = {
    general = {
      instance_types = ["t3.large"]
      capacity_type  = "ON_DEMAND"
      desired_size   = 3
      max_size       = 10
      min_size       = 2
    }
  }
}
```

OCI OKE:

```hcl
module "oke" {
  source = "./modules/oke"

  compartment_id = var.compartment_id
  cluster_name   = "my-cluster"
  vcn_id         = module.vcn.vcn_id

  node_pools = {
    general = {
      node_shape    = "VM.Standard.E4.Flex"
      is_flex_shape = true
      ocpus         = 4
      memory_in_gbs = 32
      node_count    = 3
    }
  }
}
```

Applications follow the app-of-apps pattern:
```yaml
# kubernetes/argocd/base/applications/app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/johnpaulnasc/multi-cloud-k8s-platform.git
    targetRevision: HEAD
    path: kubernetes/argocd/base/applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

To add a new application:

- Create base manifests in `kubernetes/apps/your-app/base/`
- Create overlays for each environment in `kubernetes/apps/your-app/overlays/`
- Create ArgoCD Application manifests in `kubernetes/argocd/overlays/*/applications/`
- Push changes and ArgoCD will automatically sync (for dev/staging)
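A minimal environment overlay for the second step might look like the following sketch (the patch file name is illustrative, not a file in this repo):

```yaml
# kubernetes/apps/your-app/overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
resources:
  - ../../base
patches:
  - path: replica-patch.yaml   # e.g. raise replica count for prod
```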
| Workflow | Trigger | Description |
|---|---|---|
| CI | Push/PR | Linting, security scans, validation |
| Terraform Plan | PR to terraform/ | Plan infrastructure changes |
| Terraform Apply | Merge to main | Apply infrastructure changes |
| Kubernetes Validate | PR to kubernetes/ | Validate K8s manifests |
| ArgoCD Sync | Manual | Trigger ArgoCD sync |
| Security Scan | Push/PR/Weekly | Trivy, Checkov, TFSec, Kubesec, Gitleaks |
| Release | Tag push | Create GitHub release |
Configure these secrets in your GitHub repository:
AWS:
- `AWS_ROLE_ARN` - IAM role ARN for OIDC authentication
OCI:
- `OCI_CLI_USER` - OCI user OCID
- `OCI_CLI_TENANCY` - OCI tenancy OCID
- `OCI_CLI_FINGERPRINT` - API key fingerprint
- `OCI_CLI_KEY_CONTENT` - API private key content
- `OCI_CLI_REGION` - OCI region
- `OCI_COMPARTMENT_ID` - Compartment OCID
- `OCI_NODE_IMAGE_ID` - Node image OCID
ArgoCD:
- `ARGOCD_SERVER` - ArgoCD server URL
- `ARGOCD_ADMIN_PASSWORD` - ArgoCD admin password
The platform implements defense-in-depth security with multiple layers of protection.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                             Security Layers                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                Admission Control (OPA Gatekeeper)                 │  │
│  │  • Required labels  • Container limits  • Allowed repos  • Probes │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      Pod Security Standards                       │  │
│  │  • Restricted (prod)  • Baseline (staging)  • Privileged (system) │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                         Network Policies                          │  │
│  │  • Default deny  • Namespace isolation  • Egress control          │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      RBAC (Per Environment)                       │  │
│  │  • Dev: developers full access  • Staging: QA + viewers           │  │
│  │  • Prod: SRE only + read-only for devs                            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │           Secrets Management (External Secrets + Vault)           │  │
│  │  • HashiCorp Vault  • AWS Secrets Manager  • OCI Vault            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                  Vulnerability Scanning (Trivy)                   │  │
│  │  • Image scanning  • Config auditing  • Secret detection          │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
The platform enforces Kubernetes Pod Security Standards per namespace:
| Environment | Enforce | Audit | Warn |
|---|---|---|---|
| Production | `restricted` | `restricted` | `restricted` |
| Staging | `baseline` | `restricted` | `restricted` |
| Development | `baseline` | `baseline` | `restricted` |
| System (kube-system, etc.) | `privileged` | `privileged` | `privileged` |
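These levels are applied through the standard `pod-security.kubernetes.io` namespace labels; for the production namespace that corresponds to roughly:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    # Built-in Pod Security admission labels (enforce/audit/warn modes)
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```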
- Default deny: All ingress/egress blocked unless explicitly allowed
- Namespace isolation: Pods can only communicate within their namespace by default
- Controlled egress: DNS, HTTPS, and cloud metadata access allowed
- Cross-namespace rules: Monitoring and ArgoCD namespaces have specific access
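The default-deny posture corresponds to a standard Kubernetes NetworkPolicy along these lines (a sketch of what such policies look like, not a file copied from this repo):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress             # all traffic blocked until allowed by other policies
```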
Deploy network policies:
```bash
kustomize build kubernetes/security/network-policies | kubectl apply -f -
```

Admission control policies enforce security at deployment time:
| Policy | Enforcement | Description |
|---|---|---|
| `block-privileged-containers` | Deny | Prevents privileged containers |
| `block-host-namespace` | Deny | Prevents hostNetwork, hostPID, hostIPC |
| `require-run-as-nonroot` | Deny | Requires `runAsNonRoot: true` |
| `require-readonly-rootfs` | Warn | Recommends read-only root filesystem |
| `block-latest-tag` | Deny | Blocks `:latest` image tag |
| `require-probes` | Warn/Deny | Requires liveness/readiness probes |
| `allowed-repos` | Deny | Restricts container registries |
| `container-limits` | Deny | Requires resource limits |
| `required-labels` | Deny | Enforces standard labels |
Deploy Gatekeeper policies:
```bash
kustomize build kubernetes/security/gatekeeper | kubectl apply -f -
```

Granular RBAC per environment follows the principle of least privilege:
Cluster Roles:
- `platform-admin` - Full cluster access
- `security-auditor` - Read-only access to security resources
- `monitoring-viewer` - View metrics and monitoring data
Namespace Roles:
- `namespace-operator` - Full namespace control
- `namespace-developer` - Deploy and manage workloads
- `namespace-viewer` - Read-only access
Environment Permissions:
| Role | Dev | Staging | Prod |
|---|---|---|---|
| Developers | Full (operator) | Read-only | Read-only |
| QA Team | Viewer | Full (developer) | Read-only |
| SRE Team | Operator | Operator | Full (operator) |
| On-Call | - | Developer | Developer |
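Bindings tie these roles to groups per namespace. For example, granting the QA team the `namespace-developer` role in staging might look like this sketch (the group name is a hypothetical identity-provider group, not one defined by this repo):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: qa-developer
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: namespace-developer     # granted only within this namespace
subjects:
  - kind: Group
    name: qa-team               # hypothetical IdP group
    apiGroup: rbac.authorization.k8s.io
```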
Deploy RBAC:
```bash
# Per environment
kustomize build kubernetes/security/rbac/dev | kubectl apply -f -
kustomize build kubernetes/security/rbac/staging | kubectl apply -f -
kustomize build kubernetes/security/rbac/prod | kubectl apply -f -
```

The platform uses External Secrets Operator integrated with multiple secret backends:
Supported Backends:
- HashiCorp Vault (recommended)
- AWS Secrets Manager
- OCI Vault
Example ExternalSecret:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: prod
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: database-credentials
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/database
        property: password
```

Deploy External Secrets:

```bash
kustomize build kubernetes/security/external-secrets/overlays/prod | kubectl apply -f -
```

Continuous security scanning with Trivy Operator:
- Image vulnerabilities: Scans all container images
- Config auditing: Detects misconfigurations
- Secret detection: Finds exposed secrets in images
View vulnerability reports:
```bash
# List all vulnerability reports
kubectl get vulnerabilityreports -A

# Get detailed report
kubectl describe vulnerabilityreport <name> -n <namespace>

# List config audit reports
kubectl get configauditreports -A
```

The platform includes automated security scanning in GitHub Actions:
| Scanner | Target | Description |
|---|---|---|
| Trivy | IaC | Terraform and Kubernetes misconfigurations |
| Checkov | IaC | Policy-as-code security checks |
| TFSec | Terraform | Terraform-specific security analysis |
| Kubesec | Kubernetes | Kubernetes manifest security scoring |
| Gitleaks | Repository | Secret detection in code |
| TruffleHog | Repository | Verified secret detection |
Security scan results are uploaded to GitHub Security tab (SARIF format).
Run security scan manually:
```bash
# Trigger workflow
gh workflow run security-scan.yaml
```

- Enable Pod Security Standards for all namespaces
- Deploy OPA Gatekeeper with all constraint templates
- Configure network policies for all namespaces
- Set up RBAC per environment
- Integrate External Secrets Operator with Vault
- Enable Trivy vulnerability scanning
- Configure security scanning in CI/CD
- Enable audit logging on clusters
- Configure secrets encryption at rest (KMS/OCI Vault)
- Implement image signing and verification
The platform includes a comprehensive observability stack for metrics, logs, traces, and alerting.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           Observability Stack                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │ Prometheus  │   │    Loki     │   │   Jaeger    │   │ Alertmanager│  │
│  │  (Metrics)  │   │   (Logs)    │   │  (Traces)   │   │  (Alerts)   │  │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  │
│         │                 │                 │                 │         │
│         └─────────────────┴─────────────────┴─────────────────┘         │
│                                    │                                    │
│                            ┌───────▼───────┐                            │
│                            │    Grafana    │                            │
│                            │ (Dashboards)  │                            │
│                            └───────────────┘                            │
│                                                                         │
│  Data Collection:                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌──────────────┐                   │
│  │Node Exporter│   │  Promtail   │   │OTel Collector│                   │
│  │   (Nodes)   │   │   (Logs)    │   │   (Traces)   │                   │
│  └─────────────┘   └─────────────┘   └──────────────┘                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
| Component | Description | Port |
|---|---|---|
| Prometheus | Metrics collection and storage | 9090 |
| Grafana | Visualization and dashboards | 3000 |
| Loki | Log aggregation (like Prometheus for logs) | 3100 |
| Promtail | Log collection agent (DaemonSet) | 9080 |
| Jaeger | Distributed tracing | 16686 |
| OTel Collector | OpenTelemetry trace processing | 4317/4318 |
| Alertmanager | Alert routing and notifications | 9093 |
| kube-state-metrics | Kubernetes object metrics | 8080 |
| node-exporter | Node-level metrics | 9100 |
| Dashboard | Description |
|---|---|
| Kubernetes Cluster Overview | Nodes, pods, CPU, memory, namespaces |
| Multi-Cloud Overview | Cross-cloud comparison (AWS vs OCI) |
| Application Metrics (RED) | Rate, Errors, Duration per service |
| Logs Explorer | Search and analyze logs from Loki |
| Tracing Overview | Distributed traces from Jaeger |
The platform includes pre-configured alerts for:
Infrastructure Alerts:
- `NodeNotReady` - Node is not ready for 5+ minutes
- `NodeHighCPU` - CPU usage above 85%
- `NodeHighMemory` - Memory usage above 85%
- `NodeDiskFull` - Disk usage above 90%
Pod/Deployment Alerts:
- `PodCrashLooping` - Pod restarting frequently
- `PodNotReady` - Pod not ready for 10+ minutes
- `ContainerOOMKilled` - Container killed due to OOM
- `DeploymentReplicasMismatch` - Desired vs available replicas
SLO-Based Alerts:
- `HighErrorRate` - Error rate above 5%
- `HighLatency` - p95 latency above 1 second
- `LowAvailability` - Availability below 99.9%
Multi-Cloud Alerts:
- `CloudProviderDown` - No nodes detected for a cloud provider
- `CrossCloudLatencyHigh` - High latency between clouds
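A rule like `HighErrorRate` would typically be defined as a Prometheus alerting rule along these lines (a sketch; the `http_requests_total` metric and label names assume standard HTTP instrumentation, not metrics this repo necessarily exposes):

```yaml
groups:
  - name: slo-alerts
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses per service over 5 minutes
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            / sum(rate(http_requests_total[5m])) by (service) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for {{ $labels.service }}"
```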
Via ArgoCD (Recommended):
```bash
# The monitoring stack is automatically deployed via app-of-apps
kubectl apply -f kubernetes/argocd/base/applications/app-of-apps.yaml
```

Manual Deployment:

```bash
# Deploy entire stack
kustomize build kubernetes/monitoring | kubectl apply -f -

# Or deploy individual components
kustomize build kubernetes/monitoring/prometheus/base | kubectl apply -f -
kustomize build kubernetes/monitoring/grafana/base | kubectl apply -f -
kustomize build kubernetes/monitoring/loki/base | kubectl apply -f -
kustomize build kubernetes/monitoring/jaeger/base | kubectl apply -f -
kustomize build kubernetes/monitoring/alertmanager/base | kubectl apply -f -
```

```bash
# Grafana (default: admin/admin123!)
kubectl port-forward svc/grafana -n monitoring 3000:3000

# Prometheus
kubectl port-forward svc/prometheus -n monitoring 9090:9090

# Jaeger UI
kubectl port-forward svc/jaeger-query -n monitoring 16686:16686

# Alertmanager
kubectl port-forward svc/alertmanager -n monitoring 9093:9093
```

Update `kubernetes/monitoring/alertmanager/base/alertmanager-config.yaml` with your notification channels:
```yaml
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_KEY'
```

For metrics (Prometheus):
```yaml
# Add annotations to your pods
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
```

For tracing (OpenTelemetry):
```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.monitoring:4317"
  - name: OTEL_SERVICE_NAME
    value: "my-service"
```

For logs (Loki/Promtail):
- Logs are automatically collected from stdout/stderr
- Use structured logging (JSON) for better querying
- Add labels via pod annotations for filtering
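With structured JSON logs, queries in Grafana can filter on parsed fields using LogQL; for example (illustrative query, assuming a `level` field in the JSON payload):

```logql
{namespace="prod"} | json | level="error" | line_format "{{.msg}}"
```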
Terraform state lock:
```bash
terraform force-unlock <lock-id>
```

ArgoCD out of sync:

```bash
argocd app sync <app-name> --force
```

kubectl context issues:

```bash
# AWS
aws eks update-kubeconfig --region <region> --name <cluster-name>

# OCI
oci ce cluster create-kubeconfig --cluster-id <cluster-id> --file ~/.kube/config
```

```bash
# Check cluster status
kubectl get nodes
kubectl get pods -A

# Check ArgoCD applications
argocd app list
argocd app get <app-name>

# View logs
kubectl logs -f deployment/<name> -n <namespace>

# Validate manifests locally
./scripts/validate-manifests.sh
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
John Paul Nascimento
- GitHub: @johnpaulnasc