Note: This project was built almost 2 years ago when I first started diving deep into cloud infrastructure and DevOps practices. It represents my first production-grade deployment on AWS — from zero infrastructure knowledge to a fully automated, highly available architecture. The lessons learned here directly shaped how I approach infrastructure today.
A full-stack e-commerce application (React + Node.js + MySQL) deployed on production-grade AWS infrastructure built entirely with Infrastructure as Code. The goal was not just to deploy an app, but to build the kind of infrastructure you'd find in a real enterprise environment: multi-AZ, auto-scaling, automated deployments, and cost-optimized.
Key results achieved:
- 99.9% uptime after architecture stabilization
- Deployment time reduced from 35 minutes (manual) → 6 minutes (CI/CD)
- Infrastructure cost reduced by 35% through right-sizing and Spot instances
- 10-15 deployments/week with zero downtime
                     ┌────────────────────────────────────┐
                     │  AWS Cloud (eu-west-1)             │
                     │                                    │
Users ──▶ Route53 ──▶│  CloudFront + S3 (Frontend)        │
                     │            │                       │
                     │            ▼                       │
                     │  ALB (SSL/TLS termination)         │
                     │            │                       │
                     │  ┌─────────▼──────────┐            │
                     │  │ EKS Cluster        │            │
                     │  │ (multi-AZ)         │            │
                     │  │ 4-10 nodes (CA)    │            │
                     │  │ 10-20 pods (HPA)   │            │
                     │  └─────────┬──────────┘            │
                     │            │                       │
                     │     ┌──────▼──────┐                │
                     │     │ RDS MySQL   │                │
                     │     │ (multi-AZ)  │                │
                     │     └─────────────┘                │
                     └────────────────────────────────────┘
Application
- Frontend: React.js + Bootstrap
- Backend: Node.js + Express.js
- Database: MySQL on AWS RDS (multi-AZ)
- Auth: JWT (access + refresh tokens)
Infrastructure (AWS)
- EKS — Kubernetes cluster with auto-scaling (4-10 nodes)
- RDS — MySQL managed database, multi-AZ with automated backups
- CloudFront + S3 — Frontend CDN distribution
- ALB — Application Load Balancer with SSL/TLS termination
- Route53 — DNS management and failover
- VPC — Multi-AZ with public/private subnet isolation
- ACM — SSL certificates management
- IAM — Least privilege access control
- Secrets Manager — Credentials management
DevOps & Automation
- Terraform — 8 modules, 3 environments (dev/stage/prod), S3 backend + DynamoDB state locking
- GitLab CI/CD — 2 pipelines (infrastructure + application)
- Docker — Containerization of frontend and backend
- Kubernetes — Deployments, Services, ConfigMaps, HPA, Cluster Autoscaler
- Helm — ALB Ingress Controller installation
- Prometheus + Grafana — Infrastructure and application monitoring
The infrastructure is split into 8 Terraform modules across 3 environments (excerpt):
terraform/
├── modules/
│ ├── vpc/ # VPC, subnets, IGW, NAT Gateway
│ ├── eks/ # EKS cluster, node groups, OIDC, IAM roles
│ └── rds/ # RDS MySQL instance, subnet groups, security groups
└── environments/
├── dev/ # Development environment
└── stage/ # Staging environment
State management: remote backend on S3, with a DynamoDB table for state locking so that two team members can never modify the same state at the same time.
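The locking behavior can be illustrated with a small in-memory stand-in. This is a sketch, not Terraform's implementation: it assumes the lock is a single record keyed by a lock ID and taken with an "only write if absent" conditional put, which is the pattern the DynamoDB lock table is built on. The class name, method names and state path are hypothetical, and nothing here talks to AWS.

```python
import threading
from typing import Dict, Optional


class InMemoryLockTable:
    """Hypothetical stand-in for a DynamoDB table whose items are keyed by LockID."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}
        # Makes check-then-write atomic, playing the role of a conditional put.
        self._mutex = threading.Lock()

    def acquire(self, lock_id: str, owner: str) -> bool:
        """Create the lock item only if no item with this LockID exists yet."""
        with self._mutex:
            if lock_id in self._items:
                return False  # another run already holds the state lock
            self._items[lock_id] = owner
            return True

    def release(self, lock_id: str, owner: str) -> bool:
        """Delete the lock item, but only for the run that created it."""
        with self._mutex:
            if self._items.get(lock_id) != owner:
                return False
            del self._items[lock_id]
            return True

    def holder(self, lock_id: str) -> Optional[str]:
        with self._mutex:
            return self._items.get(lock_id)


if __name__ == "__main__":
    table = InMemoryLockTable()
    state = "example-tf-state/prod/terraform.tfstate"  # hypothetical bucket/key
    print(table.acquire(state, "alice"))  # True: alice's apply holds the lock
    print(table.acquire(state, "bob"))    # False: bob's apply must wait
    print(table.release(state, "alice"))  # True
    print(table.acquire(state, "bob"))    # True
```

The point of the conditional write is that "check if locked" and "take the lock" happen as one atomic step, so two concurrent applies cannot both see the lock as free.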
Two separate GitLab CI/CD pipelines:
Application pipeline (.gitlab-ci.yml):
Code Push → Build Docker image → Push to DockerHub → Deploy to EKS
Key design decisions:
- GitLab Runner with Docker executor
- Dedicated IAM user (gitlab-ci) with minimal EKS permissions
- Credentials injected via GitLab CI/CD variables, never stored in code
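A minimal sketch of the Build → Push → Deploy flow, assuming the image is tagged with the first eight characters of the commit SHA (the same length as GitLab's CI_COMMIT_SHORT_SHA) and rolled out with kubectl set image. The repository, deployment and container names are hypothetical, and the function only assembles the commands rather than running them; registry and cluster credentials would come from CI/CD variables, as described above, so they never appear in the script.

```python
import os
import shlex
from typing import List


def build_deploy_commands(image_repo: str, commit_sha: str,
                          deployment: str, container: str) -> List[str]:
    """Return the shell commands for one Build -> Push -> Deploy run.

    Tagging with the commit SHA makes every rollout traceable and gives
    `kubectl rollout undo` a distinct previous image to return to.
    """
    if not commit_sha:
        raise ValueError("a commit SHA is required to tag the image")
    image = shlex.quote(f"{image_repo}:{commit_sha[:8]}")
    return [
        f"docker build -t {image} .",
        f"docker push {image}",
        # Triggers a rolling update of the Deployment to the new image.
        f"kubectl set image deployment/{deployment} {container}={image}",
        # Fails the job if the rollout does not complete in time.
        f"kubectl rollout status deployment/{deployment} --timeout=300s",
    ]


if __name__ == "__main__":
    # CI_COMMIT_SHA is a predefined GitLab CI/CD variable; the fallback is for local runs.
    sha = os.environ.get("CI_COMMIT_SHA", "0123456789abcdef")
    for cmd in build_deploy_commands("example/ecommerce-backend", sha, "backend", "backend"):
        print(cmd)
```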
EKS Cluster
├── backend/
│ ├── Deployment (rolling update strategy)
│ ├── Service (LoadBalancer / NodePort)
│ ├── ConfigMap (environment configuration)
│ └── HPA (CPU/Memory autoscaling)
└── frontend/
├── Deployment
├── Service
└── ConfigMap
Cluster Autoscaler (CA) manages node scaling based on pending pods — coordinated with HPA for smooth scale-out under load.
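For reference, the HPA's core rule for a single metric is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the min/max replica bounds. The sketch below applies it with the 10-20 pod bounds from this setup; the 60% CPU target is a hypothetical value, and the real controller adds stabilization windows and special handling for unready pods.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float, target_value: float,
                         min_replicas: int = 10, max_replicas: int = 20,
                         tolerance: float = 0.1) -> int:
    """Simplified single-metric Kubernetes HPA rule (no stabilization window)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: leave replicas alone
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical target: 60% average CPU utilization.
    print(hpa_desired_replicas(10, 90, 60))   # 15: scale out by 50%
    print(hpa_desired_replicas(10, 63, 60))   # 10: within the 10% tolerance
    print(hpa_desired_replicas(10, 200, 60))  # 20: capped at maxReplicas
    print(hpa_desired_replicas(15, 20, 60))   # 10: floored at minReplicas
```

When both CPU and memory are configured, the HPA computes a proposal per metric and uses the largest one, which is why a memory-heavy service can scale out even while CPU looks idle.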
Auto-scaling coordination (HPA + Cluster Autoscaler)
The first major challenge: HPA scaled pods faster than CA could add nodes, leaving new pods stuck in Pending. Fixed by tuning CA buffer nodes and memory thresholds.
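The race comes down to capacity arithmetic: once HPA asks for more pods than the ready nodes can hold, the extra pods sit in Pending until Cluster Autoscaler provisions a node, which can take a few minutes. The sketch below models this with memory requests only (CPU, DaemonSets and scheduling constraints are ignored), uses hypothetical node and pod sizes, and treats "buffer nodes" as spare nodes kept on top of current demand. One common way to implement such headroom is low-priority placeholder pods that real workloads can preempt.

```python
import math


def pods_per_node(node_allocatable_mib: int, pod_request_mib: int) -> int:
    """How many pods fit on one node, looking only at memory requests."""
    return node_allocatable_mib // pod_request_mib


def pending_pods(desired_pods: int, ready_nodes: int, per_node: int) -> int:
    """Pods the scheduler cannot place until Cluster Autoscaler adds nodes."""
    return max(0, desired_pods - ready_nodes * per_node)


def target_nodes(desired_pods: int, per_node: int, buffer_nodes: int = 0,
                 min_nodes: int = 4, max_nodes: int = 10) -> int:
    """Nodes needed for desired_pods plus spare (buffer) nodes, within the 4-10 node range."""
    needed = math.ceil(desired_pods / per_node) + buffer_nodes
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    per_node = pods_per_node(3000, 768)   # hypothetical sizes: 3 pods per node
    print(pending_pods(16, 4, per_node))  # 4 pods wait for new nodes
    print(target_nodes(16, per_node))     # 6
    print(target_nodes(16, per_node, buffer_nodes=1))  # 7
```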
Zero-downtime deployments
Implemented proper readinessProbe and rollingUpdate strategy to ensure no traffic hits pods before they're ready.
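A minimal sketch of a Deployment with those two settings, built as a Python dict and printed as JSON (kubectl apply accepts JSON manifests as well as YAML). The container port 3000, the /health path and the probe timings are hypothetical. Setting maxUnavailable to 0 with maxSurge 1 means Kubernetes starts one new pod, waits for its readiness probe to pass, and only then removes an old one.

```python
import json


def backend_deployment(image: str, replicas: int = 10) -> dict:
    """Build a Deployment manifest (as a dict) tuned for zero-downtime rollouts."""
    labels = {"app": "backend"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "backend", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired count; add one new pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "backend",
                        "image": image,
                        "ports": [{"containerPort": 3000}],
                        # Service endpoints only include the pod once this probe succeeds.
                        "readinessProbe": {
                            "httpGet": {"path": "/health", "port": 3000},
                            "initialDelaySeconds": 5,
                            "periodSeconds": 5,
                            "failureThreshold": 3,
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python this_script.py | kubectl apply -f -
    print(json.dumps(backend_deployment("example/ecommerce-backend:abcdef12"), indent=2))
```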
Frontend HTTPS / CORS
CloudFront forces HTTPS, so the backend had to be exposed through the ALB with an ACM certificate on a custom subdomain to avoid mixed-content errors.
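The two browser rules at play can be sketched as plain functions. The backend itself is Node.js/Express, so this Python version only illustrates the logic: an HTTPS page calling an http:// API is mixed content and gets blocked, and a cross-origin call from the frontend domain to the API subdomain only succeeds if the API echoes an allowed origin. The domain names are hypothetical, and the Authorization header entry assumes the JWT access token is sent as a bearer token.

```python
from typing import Dict, FrozenSet, Optional
from urllib.parse import urlsplit

# Hypothetical frontend origin (served by CloudFront); the real domain is not given.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({"https://shop.example.com"})


def is_mixed_content(page_url: str, api_url: str) -> bool:
    """True when an HTTPS page would call a plain-HTTP API, which browsers block."""
    return urlsplit(page_url).scheme == "https" and urlsplit(api_url).scheme == "http"


def cors_headers(origin: Optional[str],
                 allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """CORS response headers for a request from `origin`; an empty dict means deny."""
    if origin is None or origin not in allowed:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,  # echo the single allowed origin
        "Access-Control-Allow-Headers": "Authorization, Content-Type",  # assumes bearer JWTs
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
        "Vary": "Origin",  # shared caches must not reuse a response across origins
    }


if __name__ == "__main__":
    print(is_mixed_content("https://shop.example.com/", "http://api.shop.example.com/products"))
    print(cors_headers("https://shop.example.com"))
```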
Incident — CI/CD cluster connectivity
Symptom: "couldn't get the server" during staging deployment
Diagnosis: GitLab logs → kubeconfig inspection
Root cause: Incorrect cluster credentials in CI/CD variables
Fix: Regenerated kubeconfig + validated connectivity
MTTR: 8 minutes
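Part of the "validate connectivity" step can be done statically before any kubectl call. The sketch below checks a kubeconfig that has already been parsed into a dict (for example with a YAML library, not shown) for common wiring mistakes: a current-context that does not exist, or a context pointing at a cluster or user entry that is missing. The function name and sample values are hypothetical; a static check cannot tell whether the credentials themselves are valid, so a live call such as kubectl get nodes remains the final test.

```python
from typing import Any, Dict, List
from urllib.parse import urlsplit


def _index_by_name(entries: Any) -> Dict[str, Dict[str, Any]]:
    """Turn a kubeconfig list such as `clusters:` into a name -> entry mapping."""
    return {entry["name"]: entry for entry in entries or []}


def kubeconfig_problems(cfg: Dict[str, Any]) -> List[str]:
    """Static sanity checks on a parsed kubeconfig; an empty list means no issues found."""
    contexts = _index_by_name(cfg.get("contexts"))
    clusters = _index_by_name(cfg.get("clusters"))
    users = _index_by_name(cfg.get("users"))

    current = cfg.get("current-context")
    if current not in contexts:
        return [f"current-context {current!r} is not defined"]

    problems: List[str] = []
    ctx = contexts[current].get("context", {})
    cluster_name, user_name = ctx.get("cluster"), ctx.get("user")

    if cluster_name not in clusters:
        problems.append(f"context {current!r} points to unknown cluster {cluster_name!r}")
    else:
        cluster = clusters[cluster_name].get("cluster", {})
        server = cluster.get("server", "")
        if urlsplit(server).scheme != "https":
            problems.append(f"cluster {cluster_name!r} server {server!r} is not an https URL")
        if "certificate-authority-data" not in cluster and "certificate-authority" not in cluster:
            problems.append(f"cluster {cluster_name!r} has no certificate authority configured")

    if user_name not in users:
        problems.append(f"context {current!r} points to unknown user {user_name!r}")
    return problems
```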
- Right-sized EC2 instances after load testing (m5.large → t3.medium for non-critical nodes)
- Mixed node groups: On-Demand (min capacity) + Spot instances (burst)
- S3 lifecycle policies for logs and backups
- RDS instance scheduling for dev/stage environments
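The saving from scheduling is easy to estimate: a database that only runs during working hours is billed for compute during a fraction of the 168 hours in a week. The sketch below assumes a hypothetical weekday 08:00-20:00 window (the actual schedule is not given) and counts running hours; a scheduler job would then call the RDS stop and start APIs (for example boto3's stop_db_instance and start_db_instance) at the window edges.

```python
from datetime import datetime, timedelta

# Hypothetical schedule: dev/stage databases run on weekdays 08:00-20:00 only.
START_HOUR, STOP_HOUR = 8, 20
WORKDAYS = range(0, 5)  # Monday=0 ... Friday=4


def should_run(ts: datetime) -> bool:
    """True if a non-production RDS instance should be running at time ts."""
    return ts.weekday() in WORKDAYS and START_HOUR <= ts.hour < STOP_HOUR


def weekly_running_hours(week_start: datetime) -> int:
    """Count the hours in the week starting at week_start that the schedule keeps the DB up."""
    return sum(should_run(week_start + timedelta(hours=h)) for h in range(7 * 24))


if __name__ == "__main__":
    monday = datetime(2024, 1, 1)  # 2024-01-01 was a Monday
    hours = weekly_running_hours(monday)
    print(hours, f"{1 - hours / 168:.0%} fewer instance-hours than 24/7")  # 60, 64%
```

Two caveats: storage is still billed while an instance is stopped, and RDS automatically starts a stopped instance after seven days, so the schedule must keep issuing stops rather than relying on a one-off stop.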
This project was my real entry point into cloud infrastructure. Building it end-to-end taught me that:
- IaC is a methodology, not just a tool: versioning, testing, and documentation matter as much as the code itself
- Auto-scaling requires fine-tuning: enabling HPA/CA is step one; the real work is in the metrics and thresholds
- Security starts from the beginning: IAM least privilege, no secrets in code, network isolation
- FinOps is continuous: cost optimization is an ongoing process, not a one-time task
.
├── client/ # React frontend
├── server/ # Node.js backend
├── deployment/ # Kubernetes manifests (dev/stage/prod)
│ ├── dev/
│ ├── stage/
│ └── prod/
├── terraform/ # Infrastructure as Code
│ ├── modules/
│ └── environments/
├── .gitlab-ci.yml # CI/CD pipeline
└── docker-compose.yml # Local development
Built in 2024 — my first production-grade cloud infrastructure project. Current projects involve more advanced MLOps and Platform Engineering patterns.