A full-stack solution for monitoring, analyzing, and optimizing AWS cloud costs. Built with a FastAPI backend and PostgreSQL, deployed on a self-managed Kubernetes cluster provisioned entirely with Infrastructure as Code.
- Architecture
- Features
- Tech Stack
- Repository Structure
- Getting Started
- Local Development
- Infrastructure
- Kubernetes & Helm
- CI/CD Pipeline
- Monitoring
- API Endpoints
- Testing
┌─────────────────────────────────────────────────────────────┐
│ AWS EC2 │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Kubernetes Cluster │ │
│ │ │ │
│ │ ┌──────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │cost-optimizer│ │ monitoring │ │ argocd │ │ │
│ │ │ namespace │ │ namespace │ │ namespace │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ FastAPI app │ │ Victoria │ │ ArgoCD │ │ │
│ │ │ PostgreSQL │ │ Metrics │ │ │ │ │
│ │ │ │ │ Grafana │ │ │ │ │
│ │ │ │ │ Loki │ │ │ │ │
│ │ │ │ │ Promtail │ │ │ │ │
│ │ │ │ │ Alertmgr │ │ │ │ │
│ │ └──────────────┘ └─────────────┘ └─────────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌─────────┐ │
│ │ ECR │ │ S3 │ │ Secrets │ │ IAM │ │
│ │ Docker │ │ reports │ │ Manager │ │ roles │ │
│ │ images │ │ backups │ │ kubeconf │ │ │ │
│ │ │ │ tf state │ │ ssh keys │ │ │ │
│ └──────────┘ └──────────┘ └────────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────┘
The application interacts with the AWS Cost Explorer API by assuming IAM roles in connected customer accounts, fetching cost data without storing long-term credentials.
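A sketch of that flow with boto3 (function names, the session name, and the 30-day window are illustrative, not the app's actual service code):

```python
# Sketch: assume a customer-account IAM role with an External ID, then query
# Cost Explorer using the temporary credentials. Nothing long-lived is stored.
import datetime

def cost_window(days: int) -> tuple[str, str]:
    """Return (start, end) dates in the YYYY-MM-DD format Cost Explorer expects."""
    end = datetime.date.today()
    start = end - datetime.timedelta(days=days)
    return start.isoformat(), end.isoformat()

def fetch_costs(role_arn: str, external_id: str, days: int = 30) -> dict:
    import boto3  # imported lazily so the module loads without boto3 installed

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="cost-optimizer",  # illustrative session name
        ExternalId=external_id,
    )["Credentials"]

    ce = boto3.client(
        "ce",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    start, end = cost_window(days)
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
```

The External ID guards against the confused-deputy problem: the customer's role trust policy only allows the assume call when the caller presents the agreed ID.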
- User Authentication — Secure registration and login with JWT tokens
- Multi-Account AWS Integration — Connect multiple AWS accounts via IAM Role + External ID
- Cost & Usage Analysis — Flexible queries by date range, granularity (daily/monthly), and dimension
- Cost Forecasting — Monthly cost predictions up to 90 days ahead
- Service Cost Breakdown — Cost distribution across AWS services sorted by spend
- Database Migrations — Automated schema management via Alembic run as a Helm post-install hook
- Automated Infrastructure — Full IaC with Terraform and Ansible
- GitOps Deployment — ArgoCD for Kubernetes-native continuous delivery
- Comprehensive Monitoring — Metrics, logs, and alerts out of the box
- Security Scanning — Trivy image scanning + SonarCloud static analysis on every build
- Telegram Notifications — CI/CD pipeline results sent directly to Telegram
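The project issues JWTs via python-jose, but an HS256 access token is simple enough to illustrate with the standard library alone. Claim names (`sub`, `exp`) follow common JWT convention; the secret and TTL are placeholders:

```python
# Illustration of what an HS256 JWT amounts to; the app itself uses python-jose.
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(sub: str, secret: str, ttl_seconds: int = 1800) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": sub, "exp": int(time.time()) + ttl_seconds}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"
```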
| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI, SQLAlchemy (async), Pydantic v2 |
| Database | PostgreSQL, asyncpg, Alembic |
| Auth | JWT, python-jose, passlib/bcrypt |
| Infrastructure | Terraform, AWS (EC2, ECR, S3, IAM, Secrets Manager, DynamoDB) |
| Kubernetes | kubeadm 1.28, Calico CNI, Helm 3 |
| CI/CD | GitHub Actions, ArgoCD, SonarCloud, Trivy |
| Monitoring | VictoriaMetrics, Grafana, Loki, Promtail, Alertmanager, Node Exporter |
| Provisioning | Ansible, amazon.aws collection |
| Container | Docker (multi-stage build) |
.
├── .github/workflows/
│ ├── ansible.yml # Cluster provisioning workflow
│ ├── deploy.yml # Build, test, and deploy workflow
│ └── terraform.yml # Infrastructure lifecycle workflow
├── ansible/
│ ├── inventory/dev/ # AWS EC2 dynamic inventory
│ ├── playbooks/ # site_k8s, site_k3s, master, worker, calico, helm, runner
│ └── roles/ # common, kubernetes, master, worker, helm, github-runner, calico
├── docker/
│ └── Dockerfile # Multi-stage Python build
├── helm/
│ ├── argocd/ # ArgoCD Helm chart (values, values-dev)
│ ├── cost-optimizer/ # Application Helm chart
│ └── monitoring/ # Full monitoring stack Helm chart
├── src/
│ ├── alembic/ # Database migrations
│ ├── app/
│ │ ├── api/v1/ # Routers: auth, users, aws_accounts, costs, health
│ │ ├── core/ # Config, database, security, deps
│ │ ├── models/ # SQLAlchemy models
│ │ ├── schemas/ # Pydantic schemas
│ │ └── services/ # AWS Cost Explorer service
│ ├── dev/ # docker-compose and local .env
│ └── tests/ # unit, integration, smoke
└── terraform/
├── bootstrap/ # S3 + DynamoDB for Terraform state
├── environments/dev/ # Dev environment entrypoint
└── modules/ # vpc, ec2, ecr, iam, rds, s3, alb, security_groups
- AWS account with administrative permissions
- Terraform CLI >= 1.6.0
- Ansible
- Docker
- kubectl and Helm 3
- GitHub PAT with repo scope (for self-hosted runner registration)
Configure these in Settings → Secrets and variables → Actions before running any workflow:
| Secret | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS access key ID |
| AWS_SECRET_ACCESS_KEY | AWS secret access key |
| AWS_REGION | AWS region (e.g. us-east-1) |
| DEV_SECRET_KEY | Application JWT secret key |
| DEV_DB_PASSWORD | PostgreSQL password for dev |
| SONAR_TOKEN | SonarCloud authentication token |
| TOKEN | GitHub token for SonarCloud |
| GH_RUNNER_PAT | GitHub PAT for runner registration |
| K3S_TOKEN | Token for k3s cluster (used by Ansible) |
| TELEGRAM_BOT_TOKEN | Telegram bot token for notifications |
| TELEGRAM_CHAT_ID | Telegram chat ID for notifications |
# Clone the repository
git clone https://github.com/kubezen-stack/cloud-resource-cost.git
cd cloud-resource-cost
# Configure local environment
cp src/dev/.env.example src/dev/.env
# Edit src/dev/.env with your values
# Start the full stack with Docker Compose
cd src/dev
docker-compose up -d
# Or run manually
pip install -r src/requirements.txt
cd src && alembic upgrade head
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

API docs available at http://localhost:8000/api/v1/openapi.json.
Creates S3 bucket and DynamoDB table for remote Terraform state. Run once:
cd terraform/bootstrap
terraform init
terraform apply

This creates cost-optimizer-terraform-state (S3) and cost-optimizer-terraform-locks (DynamoDB).
Via GitHub Actions → Terraform workflow → apply action.
Or locally:
cd terraform/environments/dev
terraform init
terraform plan -var="environment=dev"
terraform apply -var="environment=dev"

Resources created: VPC, subnets, Security Groups, EC2 instances (t3.medium), ECR repository, S3 buckets (reports + backups), IAM roles and policies, Secrets Manager secrets (SSH key, kubeconfig, ArgoCD password).
Via GitHub Actions → Ansible workflow → select playbook.
Supported playbooks:
| Playbook | Description |
|---|---|
| site_k8s.yml | Full kubeadm cluster setup (recommended) |
| site_k3s.yml | Full k3s cluster setup |
| common.yml | Base packages + kernel config |
| master_k8s.yml | Initialize Kubernetes master node |
| calico.yml | Install and configure Calico CNI |
| worker_k8s.yml | Join worker nodes to the cluster |
| helm_group.yml | Install Helm + create ECR pull secret |
| github_runner.yml | Register self-hosted GitHub Actions runner |
kubeadm flow (site_k8s.yml):
common setup → kubernetes install → master init → calico CNI → worker join → helm install → github runner
k3s flow (site_k3s.yml):
common setup → k3s deploy → helm install → github runner
After completion, kubeconfig is automatically saved to AWS Secrets Manager as cost-optimizer-dev-kubeconfig.
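A hedged Python alternative to fetching that secret with the AWS CLI, assuming boto3 and writing to kubectl's default config path (the path handling here is illustrative):

```python
# Sketch: pull the kubeconfig that Ansible stored in Secrets Manager and
# write it where kubectl expects it.
import pathlib

def save_kubeconfig(secret_id: str = "cost-optimizer-dev-kubeconfig") -> pathlib.Path:
    import boto3  # imported lazily so the module loads without boto3 installed

    value = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)["SecretString"]
    path = pathlib.Path.home() / ".kube" / "config"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value)
    path.chmod(0o600)  # kubectl warns on group/world-readable kubeconfigs
    return path
```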
| Namespace | Components |
|---|---|
| cost-optimizer | FastAPI application, PostgreSQL |
| monitoring | VictoriaMetrics, Grafana, Loki, Promtail, Node Exporter, Alertmanager |
| argocd | ArgoCD server |
Application:
helm dependency build ./helm/cost-optimizer --skip-refresh
helm upgrade --install cost-optimizer ./helm/cost-optimizer \
-f ./helm/cost-optimizer/values-dev.yaml \
--set image.repository=<ECR_URL> \
--set image.tag=<TAG> \
--set app.secretKey=<SECRET_KEY> \
--set database.password=<DB_PASSWORD> \
--set postgresql.auth.password=<DB_PASSWORD> \
--namespace cost-optimizer \
--create-namespace \
--timeout 600s --wait

Monitoring:
helm dependency build ./helm/monitoring --skip-refresh
helm upgrade --install monitoring ./helm/monitoring \
-f ./helm/monitoring/values-dev.yaml \
--namespace monitoring --create-namespace \
--timeout 600s --wait

ArgoCD:
helm dependency build ./helm/argocd --skip-refresh
helm upgrade --install argocd ./helm/argocd \
-f ./helm/argocd/values-dev.yaml \
--namespace argocd --create-namespace \
--timeout 300s --wait

| Service | NodePort | URL |
|---|---|---|
| Application | 30080 | http://<EC2_IP>:30080 |
| Grafana | 30300 | http://<EC2_IP>:30300 (admin / admin) |
| ArgoCD | 30808 | http://<EC2_IP>:30808 (admin / see below) |
| VictoriaMetrics | 8428 | port-forward required |
Get ArgoCD admin password:
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Or from AWS Secrets Manager:
aws secretsmanager get-secret-value \
--secret-id "cost-optimizer-dev-argocd-password" \
--query SecretString --output text

Access VictoriaMetrics:
kubectl port-forward svc/monitoring-victoria-metrics-single-server \
8428:8428 -n monitoring --address 0.0.0.0

Triggered automatically on push to main (paths: src/**, helm/**, docker/**) or manually.
sonarcloud ───────────────────────────────────────────────────────────────┐
▼
build-and-push ──► helm-deploy-dev ──────────────► run-tests ────────► notify
│
├──► helm-deploy-monitoring-dev ──────────────────┤
└──► argo-cd-dev ────────────────────────────────┘
| Job | Runner | Description |
|---|---|---|
| sonarcloud | ubuntu-latest | Static code analysis |
| build-and-push | ubuntu-latest | Build Docker image, Trivy scan, push to ECR |
| helm-deploy-dev | self-hosted | Deploy application via Helm |
| helm-deploy-monitoring-dev | self-hosted | Deploy monitoring stack via Helm |
| argo-cd-dev | self-hosted | Deploy ArgoCD via Helm (continue-on-error: true on login) |
| run-tests | self-hosted | Unit, integration, and smoke tests |
| notify | ubuntu-latest | Telegram notification with full results |
Note: The ArgoCD login step uses continue-on-error: true because it is not a critical step; the application is already deployed via Helm in the previous job. The login only verifies that ArgoCD is accessible after deployment.
Triggered on changes to terraform/** or manually.
| Action | Trigger |
|---|---|
| plan | Push / PR to main + manual |
| apply | Manual only |
| destroy | Manual only |
Manual trigger only. Supports both kubeadm (k8s) and k3s cluster flavors.
| Component | Role | Dev | Prod |
|---|---|---|---|
| VictoriaMetrics | Metrics storage | 3d retention | 14d + 20Gi PVC |
| VM Agent | Metrics scraping every 10s | enabled | enabled |
| Grafana | Dashboards | NodePort 30300 | ClusterIP + Ingress |
| Loki | Log aggregation | filesystem | filesystem |
| Promtail | Log shipping from pods | enabled | enabled |
| Node Exporter | Host-level metrics | enabled | enabled |
| Alertmanager | Alert routing | disabled | enabled |
- VictoriaMetrics — http://monitoring-victoria-metrics-single-server:8428 (default)
- Loki — http://monitoring-loki:3100
- Node Exporter Full (gnetId: 1860)
- Kubernetes Cluster (gnetId: 315)
The FastAPI app exposes Prometheus metrics at /metrics via prometheus-fastapi-instrumentator. VictoriaMetrics agent scrapes it at cost-optimizer.cost-optimizer.svc.cluster.local:80.
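A minimal sketch of that wiring, assuming the library's instrument/expose API; the actual setup lives in the app's entrypoint (app.main, per the uvicorn command above):

```python
# Sketch: attach Prometheus metrics to a FastAPI app and serve them at /metrics.
def create_app():
    # Imported lazily so this module loads even without the packages installed.
    from fastapi import FastAPI
    from prometheus_fastapi_instrumentator import Instrumentator

    app = FastAPI()
    # Instrument all request handlers, then mount the /metrics endpoint.
    Instrumentator().instrument(app).expose(app, endpoint="/metrics")
    return app
```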
Base URL: /api/v1
| Method | Path | Description |
|---|---|---|
| POST | /auth/register | Register a new user |
| POST | /auth/login | Login, returns JWT access token |
| Method | Path | Description |
|---|---|---|
| GET | /users/me | Get current user profile |
| PUT | /users/me | Update name, email, or password |
| DELETE | /users/me | Permanently delete account |
| Method | Path | Description |
|---|---|---|
| POST | /aws_accounts/ | Connect an AWS account via IAM Role |
| GET | /aws_accounts/ | List all connected accounts |
| GET | /aws_accounts/{id} | Get details for a specific account |
| Method | Path | Query Params | Description |
|---|---|---|---|
| GET | /costs/{account_id}/costs | start_date, end_date, granularity, group_by | Cost and usage data |
| GET | /costs/{account_id}/forecast | start_date, end_date | 90-day cost forecast |
| GET | /costs/{account_id}/breakdown | start_date, end_date | Cost breakdown by service |
| Method | Path | Description |
|---|---|---|
| GET | /health/health | Health check (verifies DB connection) |
| GET | /health/ready | Readiness probe for Kubernetes |
Tests are in src/tests/ and run automatically in the CI/CD pipeline.
cd src
# Unit tests
pytest tests/unit/ -v
# Integration tests (uses in-memory SQLite — no DB required)
pytest tests/integration/ -v
# Smoke tests (requires a running application)
APP_URL=http://localhost:8000 pytest tests/smoke/ -v
# All tests with JUnit XML output
pytest -v --junitxml=results.xml

| Type | Location | Scope |
|---|---|---|
| Unit | tests/unit/ | Security functions, schema validation, AWS service mocking |
| Integration | tests/integration/ | Full API flow with SQLite in-memory database |
| Smoke | tests/smoke/ | End-to-end tests against a live deployed application |
Test results are uploaded as artifacts in GitHub Actions and reported to SonarCloud for quality gate analysis.
- Health and readiness endpoints
- OpenAPI schema presence and required routes
- Auth registration and login flows
- JWT middleware enforcement
- User profile CRUD
- AWS account creation and listing
- Cost endpoint availability (without real AWS credentials)