GitHub - codewithbrandon/secure-cloud-platform: Enterprise Azure Infrastructure with Zero Trust Security

██████╗ ███████╗██╗   ██╗███████╗███████╗ ██████╗
██╔══██╗██╔════╝██║   ██║██╔════╝██╔════╝██╔════╝
██║  ██║█████╗  ██║   ██║███████╗█████╗  ██║
██║  ██║██╔══╝  ╚██╗ ██╔╝╚════██║██╔══╝  ██║
██████╔╝███████╗ ╚████╔╝ ███████║███████╗╚██████╗
╚═════╝ ╚══════╝  ╚═══╝  ╚══════╝╚══════╝ ╚═════╝

    P O L I C Y - E N F O R C E D   D E V S E C O P S

Enterprise Azure security platform — every resource declared, scanned, policy-evaluated, and validated.

What Makes This Different

Most DevSecOps demos add a Trivy scan to a pipeline. This platform enforces security policy at every layer — from the Terraform plan to the live Azure control plane.

Layer	Tool	Gate
Source code	gitleaks + Bandit	Blocks on any secret or HIGH severity
Dependencies	pip-audit	Advisory check against OSV/PyPI
IaC (source)	Checkov + tfsec	Blocks on CRITICAL/HIGH — parallel
IaC (runtime)	OPA / Conftest	Blocks on plan-evaluated policy violations
Container	Trivy	Blocks on CRITICAL/HIGH CVEs in image
Infrastructure	validate_azure.sh	Blocks deploy if live Azure state fails checks
Runtime	Microsoft Sentinel	KQL detections with 5-minute polling
Drift	terraform plan	Daily — unstable if production diverges

Architecture

                               ╔═══════════════╗
                               ║    INTERNET    ║
                               ╚═══════╤═══════╝
                                       │ HTTPS only
                               ╔═══════▼═══════╗
                               ║ APP GATEWAY   ║  ← Sole internet-facing boundary
                               ║ WAF v2        ║    DDoS, SQL inj., TLS termination
                               ╚═══════╤═══════╝
                                       │
╔══════════════════════════════════════╪══════════════════════════════════════╗
║              VIRTUAL NETWORK  10.0.0.0/16                                  ║
║                                      │                                     ║
║  ╔══════════════════════════════════════════════════════════╗               ║
║  ║         AKS CLUSTER  (private API server)                ║               ║
║  ║                                                          ║               ║
║  ║  ╔═══════════════════╗     ╔═══════════════════════════╗ ║               ║
║  ║  ║  secure-app       ║     ║  monitoring               ║ ║               ║
║  ║  ║  ─────────────    ║     ║  ─────────────────────    ║ ║               ║
║  ║  ║  Flask API  x2    ║     ║  Prometheus  Grafana      ║ ║               ║
║  ║  ║  NetworkPolicy    ║     ║  Alertmanager             ║ ║               ║
║  ║  ║  (default-deny)   ║     ║  node-exporter            ║ ║               ║
║  ║  ╚═══════════════════╝     ╚═══════════════════════════╝ ║               ║
║  ╚══════════════════════════════════════════════════════════╝               ║
║                                                                             ║
║  ╔══════════════╗  ╔═══════════════╗  ╔═══════════╗  ╔══════════════════╗  ║
║  ║  JENKINS VM  ║  ║   AZURE SQL   ║  ║ KEY VAULT ║  ║  ACR (Premium)   ║  ║
║  ║  IaC + CI/CD ║  ║  Private EP   ║  ║ MSI-only  ║  ║  Admin disabled  ║  ║
║  ╚══════════════╝  ╚═══════════════╝  ╚═══════════╝  ╚══════════════════╝  ║
║                                                                             ║
║  ╔═══════════════════════════════════════════════════════════════════════╗  ║
║  ║  LOG ANALYTICS WORKSPACE  (90-day retention)                         ║  ║
║  ║  AKS audit → Microsoft Sentinel → KQL rules → Incident → Playbook   ║  ║
║  ╚═══════════════════════════════════════════════════════════════════════╝  ║
╚═════════════════════════════════════════════════════════════════════════════╝

Deployed Resources — 71 Total

Category	Resources
Compute	AKS (private, RBAC, AAD, autoscale 2–5 nodes), Jenkins VM
Networking	VNet, 5 Subnets, 5 NSGs (default-deny), Application Gateway WAF v2
Data	Azure SQL (private endpoint), Key Vault (MSI-only, purge-protect)
Containers	ACR Premium (admin disabled), Flask API (x2 pods, non-root)
Observability	Prometheus, Grafana, Alertmanager, kube-state-metrics, node-exporter
Security	Managed Identities, RBAC Assignments, NetworkPolicies, Sentinel Rules
Governance	Log Analytics (90d retention), Diagnostic Settings (all resources)

CI/CD Pipeline — 19 Stages

 EVERY BRANCH / PR
 ┌─────────────────────────────────────────────────────────────────────────┐
 │                                                                         │
 │  [1]Checkout ──► [2]Secrets Scan ──► [3]Dependency Scan ──► [4]SAST   │
 │                       │                                        │        │
 │                    BLOCKS                                   BLOCKS      │
 │                  on any leak                              on HIGH sev   │
 │                                                                │        │
 │  [5]Unit Tests ◄────────────────────────────────────────────────        │
 │       │                                                                 │
 │       ▼                                                                 │
 │  [6]Terraform Init + fmt-check + validate                               │
 │       │  BLOCKS if unformatted or invalid schema                        │
 │       │                                                                 │
 │       ▼                                                                 │
 │  [7]IaC Security Scan ─────────── parallel ─────────────────────────   │
 │       │         Checkov (NIST/CIS/SOC2)    tfsec (Azure-specific)      │
 │       │         BLOCKS on CRITICAL/HIGH    BLOCKS on CRITICAL/HIGH     │
 │       │                                                                 │
 │  [8]Terraform Plan ──► saves binary plan + JSON                        │
 │       │  (binary reused for apply — no TOCTOU)                         │
 │       │                                                                 │
 │       ▼                                                                 │
 │  [9]OPA / Conftest Policy Check                                         │
 │       │   deny_public_ip.rego ──── no internet-facing resources        │
 │       │   deny_open_nsg.rego  ──── no 0.0.0.0/0 on mgmt ports         │
 │       │   require_tags.rego   ──── 6 mandatory tags enforced           │
 │       │   kubernetes.rego     ──── container security baseline         │
 │       │   BLOCKS on ANY policy violation                                │
 │       │                                                                 │
 │  [10]Build ──► [11]Trivy Scan ──► [12]Push ACR                        │
 │                     │                                                  │
 │                  BLOCKS on                                              │
 │                  CRIT/HIGH CVE                                          │
 │                                                                         │
 │  [13]Validate K8s  [14]Validate Monitoring                             │
 │                                                                         │
 └─────────────────────────────────────────────────────────────────────────┘

 MAIN BRANCH ONLY  ── manual approval gate captures JIRA ticket
 ┌─────────────────────────────────────────────────────────────────────────┐
 │                                                                         │
 │  [15]Terraform Apply (exact plan from stage 8)                         │
 │       │                                                                 │
 │       ▼                                                                 │
 │  [16]Azure Post-Deploy Validation (validate_azure.sh)                  │
 │       │   Queries LIVE Azure control plane — not Terraform state       │
 │       │   Private cluster / RBAC / node IPs / NSGs / ACR / Logs        │
 │       │   BLOCKS AKS deployment on any critical finding                 │
 │       │                                                                 │
 │  [17]Deploy to AKS ──► [18]Deploy Monitoring ──► [19]Smoke Tests      │
 │                                                                         │
 └─────────────────────────────────────────────────────────────────────────┘

 SCHEDULED NIGHTLY (separate job)
 ┌────────────────────────────────────────────────┐
 │  terraform plan -detailed-exitcode             │
 │  exit 2 → UNSTABLE + alert #security-ops       │
 │  "Drift detection: nightly posture check"      │
 └────────────────────────────────────────────────┘

Policy-as-Code

OPA/Conftest evaluates the Terraform plan JSON, not the source code. This means it sees what will actually be deployed, including values that only resolve at plan time (variables, module outputs, provider defaults) and resource combinations that look fine individually but create risk together.

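terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
# --all-namespaces: the policies live in terraform.* packages, not Conftest's default "main"
conftest test tfplan.json --policy policies/ --all-namespaces --output table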
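To make "evaluates the plan, not the source" concrete, here is a minimal Python sketch of how a plan-level check walks `tfplan.json`. It relies on the `resource_changes` array that `terraform show -json` emits; the helper name `planned_resources` and the sample data are illustrative assumptions, not code from this repository.

```python
import json


def planned_resources(plan: dict):
    """Yield (address, type, after) for every resource the plan will create or update."""
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        # Skip no-op, read and pure delete entries: nothing new gets deployed.
        if actions & {"create", "update"}:
            yield rc["address"], rc["type"], change.get("after") or {}


# Tiny inline stand-in for tfplan.json (illustrative values only).
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "azurerm_public_ip.demo", "type": "azurerm_public_ip",
     "change": {"actions": ["create"], "after": {"allocation_method": "Static"}}},
    {"address": "azurerm_resource_group.rg", "type": "azurerm_resource_group",
     "change": {"actions": ["no-op"], "after": {"location": "eastus"}}},
    {"address": "azurerm_subnet.old", "type": "azurerm_subnet",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
""")

if __name__ == "__main__":
    for address, rtype, after in planned_resources(SAMPLE_PLAN):
        print(address, rtype, sorted(after))
```

Only the resource being created is yielded; the no-op and the delete are ignored, which is roughly the view a Rego rule has when it iterates over `input.resource_changes`.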

`policies/deny_public_ip.rego` — MITRE T1190, T1046

DENY  azurerm_public_ip on any non-approved resource
DENY  AKS cluster without private_cluster_enabled = true
DENY  SQL server with public_network_access_enabled = true
DENY  Key Vault with network_acls.default_action = "Allow"
ALLOW Application Gateway  (approved DMZ boundary)
ALLOW Azure Bastion        (approved management entry point)
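The rule set above can be approximated in Python as follows. This is a sketch under assumptions, not a port of the actual Rego: the allowlist of approved public IP addresses is hypothetical (the real policy may identify Application Gateway and Bastion differently), and only the four DENY rules listed are modelled.

```python
# Hypothetical allowlist: plan addresses of public IPs that front approved entry points.
APPROVED_PUBLIC_IPS = {
    "module.appgateway.azurerm_public_ip.this",
    "module.bastion.azurerm_public_ip.this",
}


def deny_public_exposure(resources):
    """Return deny messages for an iterable of (address, type, after) tuples."""
    denies = []
    for address, rtype, after in resources:
        if rtype == "azurerm_public_ip":
            if address not in APPROVED_PUBLIC_IPS:
                denies.append(f"{address}: public IP on a non-approved resource")
        elif rtype == "azurerm_kubernetes_cluster":
            if after.get("private_cluster_enabled") is not True:
                denies.append(f"{address}: private_cluster_enabled must be true")
        elif rtype == "azurerm_mssql_server":
            # A missing attribute is treated as enabled (assumed to be the provider default).
            if after.get("public_network_access_enabled", True):
                denies.append(f"{address}: public network access must be disabled")
        elif rtype == "azurerm_key_vault":
            # Nested blocks show up in plan JSON as a list of objects.
            acls = after.get("network_acls") or []
            if any(acl.get("default_action") == "Allow" for acl in acls):
                denies.append(f"{address}: network_acls.default_action must not be Allow")
    return denies


SAMPLE = [
    ("azurerm_public_ip.jump", "azurerm_public_ip", {}),
    ("module.appgateway.azurerm_public_ip.this", "azurerm_public_ip", {}),
    ("azurerm_kubernetes_cluster.aks", "azurerm_kubernetes_cluster",
     {"private_cluster_enabled": True}),
    ("azurerm_mssql_server.sql", "azurerm_mssql_server",
     {"public_network_access_enabled": True}),
    ("azurerm_key_vault.kv", "azurerm_key_vault",
     {"network_acls": [{"default_action": "Deny"}]}),
]

if __name__ == "__main__":
    for message in deny_public_exposure(SAMPLE):
        print("DENY", message)
```

In this sample the stray public IP and the SQL server are denied, while the Application Gateway IP, the private cluster, and the locked-down Key Vault pass.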

`policies/deny_open_nsg.rego` — MITRE T1021, T1133

DENY  Allow-Inbound from 0.0.0.0/0 → port 22   (SSH)
DENY  Allow-Inbound from 0.0.0.0/0 → port 3389  (RDP)
DENY  Allow-Inbound from 0.0.0.0/0 → port *     (wildcard)
DENY  AKS subnet NSG with any inbound from internet
WARN  Allow-Outbound to 0.0.0.0/0 on all ports  (T1041 exfil risk)
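A rough Python equivalent of the port rules above, for readers who want to see the matching logic spelled out. It assumes rule dictionaries shaped like `azurerm_network_security_rule` attributes and treats `0.0.0.0/0`, `*`, and `Internet` as equivalent sources; the AKS-subnet rule is not modelled, and the function names are illustrative.

```python
INTERNET = {"0.0.0.0/0", "*", "Internet"}
MGMT_PORTS = {22: "SSH", 3389: "RDP"}


def port_in_range(port: int, spec: str) -> bool:
    """True if `port` falls inside an NSG port spec such as "22", "20-30" or "*"."""
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def evaluate_rule(rule: dict):
    """Return (denies, warns) for a single NSG rule dictionary."""
    denies, warns = [], []
    if rule.get("access") != "Allow":
        return denies, warns
    name = rule.get("name", "<unnamed>")
    ranges = list(rule.get("destination_port_ranges") or [])
    if rule.get("destination_port_range"):
        ranges.append(rule["destination_port_range"])
    direction = rule.get("direction")
    if direction == "Inbound" and rule.get("source_address_prefix") in INTERNET:
        if "*" in ranges:
            denies.append(f"{name}: inbound from internet on all ports")
        for port, label in MGMT_PORTS.items():
            if any(port_in_range(port, spec) for spec in ranges):
                denies.append(f"{name}: inbound from internet on port {port} ({label})")
    if (direction == "Outbound"
            and rule.get("destination_address_prefix") in INTERNET
            and "*" in ranges):
        warns.append(f"{name}: outbound to internet on all ports (T1041)")
    return denies, warns


if __name__ == "__main__":
    ssh = {"name": "allow-ssh", "access": "Allow", "direction": "Inbound",
           "source_address_prefix": "0.0.0.0/0", "destination_port_range": "22"}
    print(evaluate_rule(ssh))
```

Note that a range such as `3000-4000` is denied as well, because it contains 3389. A check that only compares literal port strings would miss that case.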

`policies/require_tags.rego` — Governance / Compliance

DENY  Missing: Environment | Team | CostCenter | Owner | ManagedBy | DataClassification
DENY  DataClassification not in {public|internal|confidential|restricted}
DENY  ManagedBy != "Terraform"  (unmanaged = bypasses pipeline scanning)
DENY  CostCenter not matching CC-XXXX format
DENY  Prod databases/vaults with DataClassification = "public"
WARN  Owner tag is not an email address  (can't page during incident)
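The tag rules translate naturally into a small validator. The sketch below is an approximation in Python: it assumes `CC-XXXX` means "CC-" followed by four digits, uses a deliberately loose email pattern, and leaves out the production database/vault rule because that one needs resource type and environment context.

```python
import re

REQUIRED_TAGS = ("Environment", "Team", "CostCenter", "Owner",
                 "ManagedBy", "DataClassification")
CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}
COST_CENTER_RE = re.compile(r"CC-\d{4}")            # assumption: XXXX = four digits
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose shape check only


def evaluate_tags(tags: dict):
    """Return (denies, warns) for one resource's tag map."""
    denies, warns = [], []
    missing = [t for t in REQUIRED_TAGS if not tags.get(t)]
    if missing:
        denies.append("missing tags: " + ", ".join(missing))
    classification = tags.get("DataClassification")
    if classification and classification not in CLASSIFICATIONS:
        denies.append(f"invalid DataClassification: {classification}")
    managed_by = tags.get("ManagedBy")
    if managed_by and managed_by != "Terraform":
        denies.append(f"ManagedBy must be Terraform, got {managed_by}")
    cost_center = tags.get("CostCenter")
    if cost_center and not COST_CENTER_RE.fullmatch(cost_center):
        denies.append(f"CostCenter does not match CC-XXXX: {cost_center}")
    owner = tags.get("Owner")
    if owner and not EMAIL_RE.fullmatch(owner):
        warns.append(f"Owner is not an email address: {owner}")
    return denies, warns


if __name__ == "__main__":
    print(evaluate_tags({"Environment": "prod", "Team": "platform",
                         "CostCenter": "CC-12", "Owner": "platform-team",
                         "ManagedBy": "Terraform",
                         "DataClassification": "internal"}))
```

Keeping denies and warnings in separate lists mirrors the DENY/WARN split above: only the first list should fail a build.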

Infrastructure Drift Detection

A dedicated Jenkins job runs every morning at 06:00:

terraform plan -detailed-exitcode -var-file=terraform.tfvars

# Exit codes:
#   0 = no changes     → green
#   1 = plan error     → pipeline failure (investigate)
#   2 = drift detected → UNSTABLE + Slack alert to #security-ops
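The exit-code handling can be sketched in Python as below. The mapping follows the three codes listed above; the `BuildStatus` names mirror Jenkins result labels, and `run_drift_check` is a hypothetical wrapper that assumes `terraform` is on the PATH and the working directory is already initialised.

```python
import subprocess
from enum import Enum


class BuildStatus(Enum):
    SUCCESS = "SUCCESS"
    UNSTABLE = "UNSTABLE"
    FAILURE = "FAILURE"


def classify_plan_exit(code: int):
    """Map a `terraform plan -detailed-exitcode` exit code to (status, send_alert)."""
    if code == 0:
        return BuildStatus.SUCCESS, False   # no changes
    if code == 2:
        return BuildStatus.UNSTABLE, True   # non-empty plan: drift
    return BuildStatus.FAILURE, False       # 1 (or anything else): plan error


def run_drift_check(var_file: str = "terraform.tfvars"):
    """Run the plan and classify the result (hypothetical wrapper, not called here)."""
    completed = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         f"-var-file={var_file}"],
        check=False,  # non-zero exit codes are expected and handled below
    )
    return classify_plan_exit(completed.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, classify_plan_exit(code))
```

Treating every code other than 0 and 2 as a failure is a conservative choice: an unexpected exit should page someone to investigate instead of silently passing as "no drift".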

Why this matters: A drifted resource is an unreviewed change — equivalent to an unreviewed PR. Common causes: emergency portal fixes, Azure platform updates, compromised accounts modifying resources directly (MITRE T1578).

Drift detection shrinks the detection window from "discovered at next audit" to "discovered by 06:15 AM the next morning."

Post-Deploy Validation

validation/validate_azure.sh queries the live Azure control plane after every apply. Terraform state can diverge from reality — provider bugs, race conditions, and manual edits all create gaps. This script is the authoritative check.

#	Check	Method	MITRE
1	AKS private cluster enabled	`az aks show`	T1046
2	AKS RBAC enabled	`az aks show`	T1078
3	No public IPs on AKS nodes	`az network nic list`	T1190
4	No 0.0.0.0/0 on SSH/RDP in NSGs	`az network nsg list`	T1021
5	ACR admin account disabled	`az acr show`	T1525
6	AcrPull role exists on ACR	`az role assignment list`	T1610
7	AKS diagnostic logs enabled	`az monitor diagnostic-settings list`	T1562

Pipeline behavior: any CRITICAL failure returns exit code 1 — Stage 17 (AKS deploy) does not run.
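As an illustration of the gate logic only (the real script is Bash), the Python sketch below evaluates three of the seven checks against parsed `az ... show` JSON and derives the exit code. The field names `apiServerAccessProfile.enablePrivateCluster`, `enableRbac`, and `adminUserEnabled` are assumptions about the CLI output shape and should be verified against your CLI version.

```python
def check_aks(aks: dict) -> list:
    """Checks 1 and 2: private cluster and RBAC, from `az aks show` JSON."""
    findings = []
    profile = aks.get("apiServerAccessProfile") or {}
    if not profile.get("enablePrivateCluster"):
        findings.append("CRITICAL: AKS private cluster is not enabled")
    if not aks.get("enableRbac"):
        findings.append("CRITICAL: AKS RBAC is not enabled")
    return findings


def check_acr(acr: dict) -> list:
    """Check 5: ACR admin account, from `az acr show` JSON."""
    if acr.get("adminUserEnabled"):
        return ["CRITICAL: ACR admin account is enabled"]
    return []


def exit_code(findings: list) -> int:
    """Return 1 if any finding is CRITICAL, which blocks the AKS deploy stage."""
    return 1 if any(f.startswith("CRITICAL") for f in findings) else 0


if __name__ == "__main__":
    # Illustrative payloads, not real command output.
    aks = {"apiServerAccessProfile": {"enablePrivateCluster": True}, "enableRbac": True}
    acr = {"adminUserEnabled": True}
    findings = check_aks(aks) + check_acr(acr)
    print(findings, "exit", exit_code(findings))
```

Missing fields count as failures here, so a changed output schema fails closed instead of reporting a clean result.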

Defense in Depth

 GOVERNANCE ──────────────────────────────────────────────────────────────
  OPA plan evaluation    Catches misconfig before any Azure API call
  Checkov + tfsec        NIST 800-53, CIS, SOC 2, Azure-specific rules
  Drift detection        Nightly posture check against Terraform state
  Mandatory tagging      Owner + CostCenter + DataClassification enforced

 NETWORK ──────────────────────────────────────────────────────────────────
  Default-deny NSGs      No implicit permit between subnets
  K8s NetworkPolicies    Pod-level default-deny in every namespace
  Private AKS            API server reachable only inside VNet
  App Gateway WAF        Single internet entry point; all else private

 IDENTITY ─────────────────────────────────────────────────────────────────
  Managed Identity       No passwords, keys, or connection strings stored
  OIDC pipeline auth     Short-lived tokens; no ARM_CLIENT_SECRET
  Azure AD + K8s RBAC    Least-privilege at every layer

 CONTAINER ────────────────────────────────────────────────────────────────
  Non-root (UID 65534)   No process runs as root inside containers
  ReadOnly root FS       Filesystem immutable at runtime
  Dropped capabilities   All Linux caps removed; none added back
  PSS restricted         Pod Security Standards enforced at namespace level

 PIPELINE ─────────────────────────────────────────────────────────────────
  gitleaks               Blocks on any secret/credential detected
  Bandit SAST            Blocks on HIGH severity code findings
  pip-audit              Dependency CVE check against OSV advisories
  Checkov + tfsec        IaC misconfig — blocks on CRITICAL/HIGH
  OPA / Conftest         Plan-level policy enforcement
  Trivy                  Container image CVE — blocks on CRITICAL/HIGH

 DETECT & RESPOND ─────────────────────────────────────────────────────────
  Microsoft Sentinel     KQL analytics rules with 5-minute polling
  AKS audit logs         kubectl exec, privileged pods, RBAC escalation
  Prometheus alerts      CPU/memory anomalies, error rate spikes
  Post-deploy checks     Live Azure state validated after every apply
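The CONTAINER layer above boils down to a handful of `securityContext` fields. A minimal Python sketch of that baseline check follows; it inspects container-level settings only (pod-level `securityContext` inheritance is ignored for brevity), and the function name is illustrative.

```python
def container_violations(container: dict) -> list:
    """Return baseline violations for one container entry of a pod spec."""
    name = container.get("name", "<unnamed>")
    sc = container.get("securityContext") or {}
    caps = sc.get("capabilities") or {}
    violations = []
    if sc.get("runAsNonRoot") is not True or sc.get("runAsUser") == 0:
        violations.append(f"{name}: must run as non-root")
    if sc.get("readOnlyRootFilesystem") is not True:
        violations.append(f"{name}: root filesystem must be read-only")
    if "ALL" not in (caps.get("drop") or []):
        violations.append(f"{name}: must drop ALL capabilities")
    if caps.get("add"):
        violations.append(f"{name}: must not add capabilities back")
    return violations


HARDENED = {
    "name": "flask-api",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 65534,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"]},
    },
}

if __name__ == "__main__":
    print(container_violations(HARDENED))
    print(container_violations({"name": "debug"}))
```

A container with no `securityContext` at all fails three checks, because absent settings are treated as unsafe defaults.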

Threat Model

STRIDE Analysis

Category	Attack Scenario	Control	Detection
Spoofing	Stolen kubeconfig used to exec into pod	Private cluster (no external API)	Sentinel T1609 — 5-min alert
Tampering	Portal edit opens NSG to 0.0.0.0/0	NSG OPA policy (plan-time)	Drift detection catches by 06:15 AM
Repudiation	No audit trail for infrastructure changes	Terraform state + Log Analytics (90d)	All API calls logged
Info Disclosure	ACR admin password leaked in CI logs	Admin disabled; MSI-only auth	No password exists to leak
Denial of Service	Cryptomining container spikes CPU	Resource limits + HPA	Prometheus alert in minutes
Elevation of Privilege	cluster-admin binding via compromised SA	OPA deny (pipeline)	Sentinel KQL (runtime)

MITRE ATT&CK Coverage

Technique	Description	Control
T1190	Exploit Public-Facing Application	No public IPs (OPA + post-deploy)
T1046	Network Service Discovery	Private AKS + NSG default-deny
T1021	Remote Services (SSH/RDP)	NSG policy + Azure Bastion only
T1078	Valid Accounts (RBAC escalation)	RBAC enforced + Sentinel KQL
T1525	Implant Internal Image	ACR admin disabled + Trivy scan
T1552	Unsecured Credentials (credentials in files)	gitleaks + OIDC (no stored secrets)
T1562	Impair Defenses (disable/modify cloud logs)	Diagnostic settings validated
T1609	Container Administration Command	Sentinel KQL — exec detection
T1610	Deploy Container	AcrPull role enforced
T1611	Escape to Host	Non-root + dropped caps + PSS
T1578	Modify Cloud Compute Infrastructure	Drift detection — nightly check
T1195	Supply Chain Compromise	pip-audit + pinned providers

Observability

Prometheus (metrics) ──► Grafana (dashboards) ──► Alertmanager (routing)
                                                        │
                              ┌─────────────────────────┼────────────────┐
                         critical                    warning          inhibit
                              │                          │               │
                        PagerDuty               Email (8h repeat)  dedup warn
                        (0s delay,                                  same alert
                         1h repeat)                                 + namespace
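The routing tree in the diagram can be modelled in a few lines of Python. This is a behavioural sketch, not Alertmanager configuration: the receiver names `pagerduty` and `email` are placeholders, and the alert dictionaries only carry the labels needed for the example.

```python
def route(alert: dict):
    """Pick a receiver and timing for an alert based on its severity label."""
    severity = alert["labels"].get("severity")
    if severity == "critical":
        return {"receiver": "pagerduty", "group_wait": "0s", "repeat_interval": "1h"}
    if severity == "warning":
        return {"receiver": "email", "repeat_interval": "8h"}
    return None  # anything else is left to a default route, not modelled here


def is_inhibited(alert: dict, firing: list) -> bool:
    """A warning is muted while a critical with the same alertname and namespace fires."""
    labels = alert["labels"]
    if labels.get("severity") != "warning":
        return False
    key = (labels.get("alertname"), labels.get("namespace"))
    return any(
        other["labels"].get("severity") == "critical"
        and (other["labels"].get("alertname"), other["labels"].get("namespace")) == key
        for other in firing
    )


if __name__ == "__main__":
    crit = {"labels": {"alertname": "HighErrorRate", "namespace": "secure-app",
                       "severity": "critical"}}
    warn = {"labels": {"alertname": "HighErrorRate", "namespace": "secure-app",
                       "severity": "warning"}}
    print(route(crit), is_inhibited(warn, [crit]))
```

The inhibit rule keeps responders from receiving both a page and an email for the same condition in the same namespace.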

Security dashboard panels: request rate · error rate · P95/P99 latency · auth failure rate · pod restarts · CPU/memory utilization · active pods by namespace

Runtime Detection (Microsoft Sentinel)

AKS API Server
    → kube-audit logs
    → Diagnostic Settings
    → Log Analytics (AKSAudit table)
    → Sentinel Analytics Rules (5-min KQL polling)
    → Incident + Entity Mapping
    → Logic App Playbook (auto-response)

Detection	Severity	MITRE	KQL File
`kubectl exec` into running pod	High → Critical	T1609	`kql/aks_pod_exec_detection.kql`
Privileged container deployed	High → Critical	T1611	`kql/privileged_pod_detection.kql`
Dangerous `hostPath` mount	Medium → Critical	T1611	`kql/privileged_pod_detection.kql`
`cluster-admin` binding created	Critical	T1078	`kql/privileged_pod_detection.kql`
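To show what these detections look for, the Python sketch below classifies Kubernetes audit events the way the KQL rules are described. It is not the repository's KQL: it assumes events follow the Kubernetes audit schema (`verb`, `objectRef`, `requestObject`) and that the audit level captures request bodies, and the returned labels are invented for the example.

```python
def classify(event: dict):
    """Return a detection label for one kube-audit event, or None if benign."""
    ref = event.get("objectRef") or {}
    verb = event.get("verb")
    resource = ref.get("resource")
    body = event.get("requestObject") or {}

    # T1609: exec into a running pod shows up as the pods/exec subresource.
    if resource == "pods" and ref.get("subresource") == "exec":
        return "pod_exec"

    # T1078: a new ClusterRoleBinding that grants cluster-admin.
    if resource == "clusterrolebindings" and verb == "create":
        if (body.get("roleRef") or {}).get("name") == "cluster-admin":
            return "cluster_admin_binding"

    # T1611: a pod created with a privileged container.
    if resource == "pods" and verb == "create" and not ref.get("subresource"):
        containers = (body.get("spec") or {}).get("containers") or []
        if any((c.get("securityContext") or {}).get("privileged") for c in containers):
            return "privileged_pod"

    return None


if __name__ == "__main__":
    exec_event = {"verb": "create",
                  "objectRef": {"resource": "pods", "subresource": "exec",
                                "namespace": "secure-app", "name": "flask-api-0"}}
    print(classify(exec_event))
```

The `hostPath` detection from the table would follow the same pattern by inspecting `spec.volumes` in the request body.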

Project Structure

secure-cloud-platform/
│
├── terraform/                     Infrastructure as Code (71 resources)
│   ├── main.tf                    Module orchestration
│   ├── variables.tf               Input validation with constraints
│   ├── outputs.tf                 References for downstream consumers
│   ├── providers.tf               Pinned providers, OIDC auth, remote state
│   └── modules/
│       ├── networking/            VNet · 5 Subnets · 5 NSGs
│       ├── aks/                   Private cluster · RBAC · AAD · Azure CNI
│       ├── acr/                   Premium · private endpoint · admin disabled
│       ├── sql/                   Private endpoint · AAD auth
│       ├── keyvault/              MSI-only · purge-protect · network restricted
│       ├── appgateway/            WAF v2 · TLS termination
│       └── jenkins/               CI/CD VM in management subnet
│
├── policies/                      OPA / Conftest policy bundle
│   ├── deny_public_ip.rego        T1190/T1046 — no public IPs on internals
│   ├── deny_open_nsg.rego         T1021 — no 0.0.0.0/0 on mgmt ports
│   ├── require_tags.rego          Governance — 6 mandatory resource tags
│   └── kubernetes.rego            Container security baseline
│
├── validation/
│   └── validate_azure.sh          Post-deploy Azure CLI checks (7 controls)
│
├── jenkins/
│   └── Jenkinsfile                19-stage pipeline + drift detection template
│
├── app/                           Flask API — metrics source + attack surface sim
├── k8s/                           Application manifests (PSS restricted)
├── monitoring/                    Prometheus · Grafana · Alertmanager stack
├── sentinel/                      KQL detection rules + ARM templates
└── docs/                          Runtime detection guide + SOC runbook

Quick Start

# 1. Prerequisites
az login && az account set --subscription <id>
terraform version   # >= 1.6
conftest version    # >= 0.46

# 2. Infrastructure
cd terraform
cp terraform.tfvars.example terraform.tfvars && vim terraform.tfvars
terraform init
checkov -d . --hard-fail-on CRITICAL,HIGH
tfsec . --minimum-severity HIGH
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
conftest test tfplan.json --policy ../policies \
  --namespace terraform.deny_public_ip \
  --namespace terraform.deny_open_nsg \
  --namespace terraform.require_tags
terraform apply tfplan

# 3. Validate live state (still inside terraform/, so step up one level)
RESOURCE_GROUP=rg-seccloud-prod \
AKS_CLUSTER_NAME=aks-seccloud-prod \
ACR_NAME=acrseccloudprod \
../validation/validate_azure.sh

# 4. Access Grafana
kubectl port-forward svc/grafana 3000:3000 -n monitoring
# → http://localhost:3000

Interview Talking Points

"How do you prevent cloud misconfiguration?"

Three layers. Checkov and tfsec scan the Terraform source code for known-bad patterns. OPA evaluates the actual plan JSON: what will be created, including variable values, module outputs, and provider defaults that only resolve at plan time (attributes still unknown until apply are marked as such in the plan). Then validate_azure.sh queries the live Azure control plane after apply. Each layer catches what the previous one cannot, and the last one matters because Terraform state diverges from live state more often than people expect.

"What is Policy-as-Code and why does it matter at scale?"

Policy-as-Code treats security rules as version-controlled artifacts. Every change to a Rego policy is a PR with a diff, review, and approval, which creates an audit trail of every policy decision. It also scales with headcount: 100 engineers committing simultaneously all get the same policy evaluation, with no reviewer in the loop. Manual security review cannot match that consistency. The OPA policies in this repo block a new engineer from accidentally creating a public-facing database as reliably as they block a senior engineer.

"How do you handle secrets in a CI/CD pipeline?"

No stored pipeline credentials. The pipeline authenticates to Azure via OIDC Workload Identity Federation: the Jenkins agent exchanges a short-lived OIDC token for a scoped Azure access token at runtime. No ARM_CLIENT_SECRET is stored anywhere. Application secrets live in Key Vault, accessed by pods through Managed Identity. gitleaks scans every branch and PR build, so an accidentally committed secret fails the pipeline before it can merge (MITRE T1552).

"How do you detect infrastructure drift?"

A scheduled Jenkins job runs terraform plan -detailed-exitcode every morning. Exit code 2 means the plan is non-empty: live infrastructure no longer matches the committed Terraform configuration. This catches emergency portal changes that were never committed, Azure platform updates that modified properties, and, critically, unauthorized modifications by a compromised identity (MITRE T1578). The window between a change and its detection goes from "next audit" to "by 06:15 AM the next morning."

"Why not just use Azure Policy for enforcement?"

Azure Policy enforces at the Azure Resource Manager layer: deny effects reject a create or update request when it reaches Azure, and audit effects flag non-compliant resources after the fact. OPA enforces at the Terraform plan layer, before any such request is made, so violations surface in the PR instead of at apply time. OPA also supports governance rules that aren't misconfigurations per se: tagging formats, naming conventions, approved regions, cost center validation. The combination gives shift-left enforcement (OPA in CI) plus a runtime backstop (Azure Policy), with a full audit trail for both.

Compliance Coverage

Framework	Controls
PCI-DSS	Network segmentation · encryption at rest/transit · access control · 90-day audit logs · vulnerability scanning
SOC 2	Availability (HPA, multi-node) · access controls · change management (Terraform gates) · security monitoring
HIPAA	Encryption (TDE + TLS 1.2+) · MSI/RBAC access · audit trails · minimum necessary access
NIST CSF	Identify (tagging) · Protect (IaC controls) · Detect (Sentinel) · Respond (runbooks) · Recover (drift detection)

Tech Stack

	Technology	Purpose
☁️	Microsoft Azure	Cloud platform (71 resources)
🏗️	Terraform 1.6+	Infrastructure as Code, remote state, OIDC
⚖️	OPA / Conftest	Policy-as-Code on Terraform plan JSON
🔍	Checkov + tfsec	IaC security scanning (parallel, NIST/CIS/Azure)
🐳	Docker + Kubernetes 1.28	Containerised workloads, private cluster
🔧	Jenkins	19-stage DevSecOps pipeline
🛡️	Trivy + Bandit + gitleaks	Container, SAST, and secrets scanning
📡	Microsoft Sentinel	SIEM/SOAR — KQL runtime detection
📊	Prometheus + Grafana	Metrics, dashboards, alert routing
🔐	Azure Key Vault + Managed Identity	Secrets management, zero stored credentials
🐍	Python 3.11 + Flask	Application layer
📝	Bash + Azure CLI	Post-deploy infrastructure validation

Every commit evaluated. Every resource validated. Every change audited.

Built to demonstrate $130k–$150k DevSecOps engineering depth across governance, automation, enforcement, and cloud security posture.
