cloudon-one/FinOps-Guardian

FinOps Guardian

Enterprise-grade multi-cloud FinOps automation toolkit for cost optimization across GCP and AWS.

| Solution | What it does | Cloud | Notification |
| --- | --- | --- | --- |
| GCP FinOps Guardian | Scans for cost optimization recommendations via Recommender API | GCP | Slack |
| AWS Resource Cleanup | Identifies and removes unused resources across regions | AWS | SES Email |

GCP FinOps Guardian

Serverless solution that periodically checks for GCP recommendations using the Recommender API and delivers Slack alerts about cost savings. Supports both organization-level and project-level scanning.

| Feature | Detail |
| --- | --- |
| Runtime | Python 3.12 on Cloud Functions |
| Memory / Timeout | 512 MB / 300 s |
| Trigger | Cloud Scheduler → Pub/Sub → Cloud Function |
| Notifications | Slack (direct webhook or via Secret Manager) |
| Recommenders | 10 types, individually toggleable |
| IaC | Terraform (Google provider ~> 5.0) |

GCP Architecture


Components: Cloud Scheduler → Pub/Sub → Cloud Function → Recommender API → Slack

| Component | Purpose |
| --- | --- |
| Cloud Function | Runs recommendation checks (Python 3.12, factory pattern) |
| Cloud Scheduler + Pub/Sub | Triggers on configurable cron schedule |
| Service Account + IAM | Org-level or project-level least-privilege access |
| Cloud Storage | Versioned bucket for function code archives |
| Secret Manager (optional) | Secure storage for Slack webhook URL |

Supported Recommenders

All 10 are enabled by default and individually toggleable via environment variables.

| Category | Recommender ID | Detects |
| --- | --- | --- |
| Idle Resources | google.compute.instance.IdleResourceRecommender | Idle VM instances |
| Idle Resources | google.compute.disk.IdleResourceRecommender | Idle persistent disks |
| Idle Resources | google.compute.image.IdleResourceRecommender | Unused custom images |
| Idle Resources | google.compute.address.IdleResourceRecommender | Idle static IP addresses |
| Idle Resources | google.cloudsql.instance.IdleRecommender | Idle Cloud SQL instances |
| Right-Sizing | google.compute.instance.MachineTypeRecommender | VM instance right-sizing |
| Right-Sizing | google.compute.instanceGroupManager.MachineTypeRecommender | MIG machine type optimization |
| Right-Sizing | google.cloudsql.instance.OverprovisionedRecommender | Cloud SQL right-sizing |
| Cost Optimization | google.compute.commitment.UsageCommitmentRecommender | Usage-based CUD recommendations |
| Cost Optimization | google.cloudbilling.commitment.SpendBasedCommitmentRecommender | Spend-based cost savings |

GCP Configuration

Required Variables

| Variable | Description | Default |
| --- | --- | --- |
| GCP_PROJECT | Project where function is deployed | (required) |
| SCAN_SCOPE | "organization" or "project" | "project" |
| ORGANIZATION_ID | GCP Org ID (required if scope = organization) | "" |
| SLACK_HOOK_URL | Slack webhook URL (if not using Secret Manager) | (required) |
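The conditional requirement on ORGANIZATION_ID is easy to get wrong, so here is a minimal validation sketch. The function name `load_scan_config` and the returned dictionary shape are assumptions made for illustration, not the repository's actual code; only the variable names and defaults come from the table above.

```python
import os


def load_scan_config(env=None):
    """Read and validate the scan-scope settings (hypothetical helper)."""
    env = os.environ if env is None else env
    project = env.get("GCP_PROJECT", "")
    scope = env.get("SCAN_SCOPE", "project")      # documented default
    org_id = env.get("ORGANIZATION_ID", "")       # documented default

    if not project:
        raise ValueError("GCP_PROJECT is required")
    if scope not in ("organization", "project"):
        raise ValueError(
            f"SCAN_SCOPE must be 'organization' or 'project', got {scope!r}"
        )
    # ORGANIZATION_ID only matters when scanning the whole organization.
    if scope == "organization" and not org_id:
        raise ValueError(
            "ORGANIZATION_ID is required when SCAN_SCOPE is 'organization'"
        )
    return {"project": project, "scope": scope, "organization_id": org_id}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check testable outside Cloud Functions.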

Optional Variables

| Variable | Description | Default |
| --- | --- | --- |
| MIN_COST_THRESHOLD | Skip recommendations below this USD value | 0 |
| USE_SECRET_MANAGER | Use Secret Manager for webhook | "false" |
| SLACK_WEBHOOK_SECRET_NAME | Secret Manager resource name | "" |
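To illustrate how MIN_COST_THRESHOLD might be applied, the sketch below filters on a Money-style value with `units` and `nanos` fields, which is how the Recommender API reports cost projections (savings appear as negative cost). The helper names and the inclusive comparison are assumptions; the repository may apply the threshold differently.

```python
def money_to_float(money):
    """Convert a Money-style mapping (units + nanos) to a float amount."""
    # units may arrive as a string in JSON responses, so normalise it.
    return int(money.get("units", 0)) + money.get("nanos", 0) / 1_000_000_000


def passes_threshold(cost, min_cost_threshold):
    """Return True if the projected savings meet the threshold.

    Savings are reported as a negative cost, so compare the magnitude.
    Treating the threshold as inclusive is an assumption.
    """
    return abs(money_to_float(cost)) >= min_cost_threshold


# A projected saving of 12.50 with a threshold of 10 is kept.
print(passes_threshold({"units": "-12", "nanos": -500_000_000}, 10))
```

With the default threshold of 0, every recommendation passes.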

Recommender Toggles (all default to true)

| Variable | Resource |
| --- | --- |
| IDLE_VM_RECOMMENDER_ENABLED | VM instances |
| IDLE_DISK_RECOMMENDER_ENABLED | Persistent disks |
| IDLE_IMAGE_RECOMMENDER_ENABLED | Custom images |
| IDLE_IP_RECOMMENDER_ENABLED | Static IPs |
| IDLE_SQL_RECOMMENDER_ENABLED | Cloud SQL instances |
| RIGHTSIZE_VM_RECOMMENDER_ENABLED | VM right-sizing |
| RIGHTSIZE_SQL_RECOMMENDER_ENABLED | Cloud SQL right-sizing |
| MIG_RIGHTSIZE_RECOMMENDER_ENABLED | MIG machine types |
| COMMITMENT_USE_RECOMMENDER_ENABLED | CUD usage |
| BILLING_USE_RECOMMENDER_ENABLED | Billing optimization |
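A rough sketch of how the toggles could select which recommenders run. The pairing of each variable with a recommender ID is inferred from the resource descriptions in this README, and `enabled_recommenders` is a hypothetical helper, so treat both as assumptions rather than the function's real lookup.

```python
import os

# Assumed pairing of toggle variables to recommender IDs (inferred, not confirmed).
RECOMMENDER_TOGGLES = {
    "IDLE_VM_RECOMMENDER_ENABLED": "google.compute.instance.IdleResourceRecommender",
    "IDLE_DISK_RECOMMENDER_ENABLED": "google.compute.disk.IdleResourceRecommender",
    "IDLE_IMAGE_RECOMMENDER_ENABLED": "google.compute.image.IdleResourceRecommender",
    "IDLE_IP_RECOMMENDER_ENABLED": "google.compute.address.IdleResourceRecommender",
    "IDLE_SQL_RECOMMENDER_ENABLED": "google.cloudsql.instance.IdleRecommender",
    "RIGHTSIZE_VM_RECOMMENDER_ENABLED": "google.compute.instance.MachineTypeRecommender",
    "RIGHTSIZE_SQL_RECOMMENDER_ENABLED": "google.cloudsql.instance.OverprovisionedRecommender",
    "MIG_RIGHTSIZE_RECOMMENDER_ENABLED": "google.compute.instanceGroupManager.MachineTypeRecommender",
    "COMMITMENT_USE_RECOMMENDER_ENABLED": "google.compute.commitment.UsageCommitmentRecommender",
    "BILLING_USE_RECOMMENDER_ENABLED": "google.cloudbilling.commitment.SpendBasedCommitmentRecommender",
}


def enabled_recommenders(env=None):
    """Return the recommender IDs whose toggle is unset or set to "true"."""
    env = os.environ if env is None else env
    return [
        recommender_id
        for variable, recommender_id in RECOMMENDER_TOGGLES.items()
        if env.get(variable, "true").strip().lower() == "true"
    ]
```

Because every toggle defaults to true, an empty environment yields all ten IDs; setting a single variable to "false" drops just that recommender.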

IAM Roles (automatically assigned by Terraform)

| Role | Purpose |
| --- | --- |
| roles/cloudasset.viewer | Asset inventory access |
| roles/recommender.computeViewer | Compute recommendations |
| roles/recommender.cloudsqlViewer | Cloud SQL recommendations |
| roles/recommender.cloudAssetInsightsViewer | Asset insights |
| roles/recommender.billingAccountCudViewer | Billing CUD recommendations |
| roles/recommender.ucsViewer | Unattended project recommendations |
| roles/recommender.projectCudViewer | Project-level CUD recommendations |
| roles/storage.objectCreator | Function code storage |

Required APIs (auto-enabled): Cloud Asset, Cloud Build, Cloud Functions, Cloud Scheduler, Recommender, Service Usage, Cloud Resource Manager, Secret Manager, Pub/Sub

GCP Deployment

Project-level scanning:

# terraform.tfvars
gcp_project        = "your-project-id"
gcp_region         = "us-central1"
recommender_bucket = "your-project-recommender"
scan_scope         = "project"
slack_webhook_url  = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
job_schedule       = "0 0 * * *"   # daily at midnight
job_timezone       = "America/New_York"

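For organization-level scanning, set scan_scope to "organization" and add the organization ID:

scan_scope      = "organization"
organization_id = "123456789012"

Then deploy from the gcp-finops directory:

cd gcp-finops && terraform init && terraform plan && terraform apply

Deployed Resources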
  • Service Account with IAM roles (org or project level)
  • Cloud Storage bucket (versioned, lifecycle-managed)
  • Cloud Function (Python 3.12, 512 MB)
  • Pub/Sub topic + Cloud Scheduler job
  • Secret Manager resources (if enabled)
  • All required API enablements

Slack Notification Format

Each message includes: GCP Project ID, recommendation type, recommended action, description, cost savings (with currency), and projection duration.
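As an illustration of the fields listed above, the following sketch builds a payload for a Slack incoming webhook, which accepts a JSON body with a `text` field. The dictionary keys, label wording, and function names are hypothetical; the actual message layout used by the function is not shown in this README.

```python
import json
import urllib.request


def build_slack_payload(rec):
    """Format one recommendation (a plain dict in this sketch) for Slack."""
    text = "\n".join([
        f"*GCP Project ID:* {rec['project_id']}",
        f"*Type:* {rec['type']}",
        f"*Action:* {rec['action']}",
        f"*Description:* {rec['description']}",
        f"*Cost savings:* {rec['savings']:.2f} {rec['currency']}",
        f"*Projection duration:* {rec['duration']}",
    ])
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the HTTP call makes the formatting testable without a real webhook URL.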


AWS Resource Cleanup

Automated multi-region resource cleanup with dry-run safety, tag-based preservation, and comprehensive email reporting.

| Feature | Detail |
| --- | --- |
| Runtime | Python 3.12 on Lambda |
| Timeout | 300 s (configurable) |
| Trigger | EventBridge (configurable CRON) |
| Notifications | SES email reports |
| Resources | 11 resource types across all regions |
| IaC | Terraform (AWS provider ~> 5.0) |

AWS Architecture


Components: EventBridge → Lambda → (EC2, RDS, EKS, ...) → SES Email Report

| Component | Purpose |
| --- | --- |
| Lambda Function | Thread-safe resource cleanup with retry & paginators |
| EventBridge | Scheduled trigger (configurable CRON) |
| CloudWatch Alarms | Error, throttle, duration, DLQ monitoring |
| SQS Dead Letter Queue | Encrypted queue for failed invocations |
| SES | Email reports (configurable region) |
| IAM | Least-privilege, per-service granular permissions |

Supported Resources

| Resource | Action | Protection | Concurrency |
| --- | --- | --- | --- |
| EC2 Instances | Stop | Tag + Spot exclusion | Sequential |
| EC2 Monitoring | Disable detailed monitoring | - | Sequential |
| Elastic IPs | Release unassociated | Tag | Sequential |
| EBS Volumes | Delete unattached | Tag + EKS cluster check | Sequential |
| Classic ELBs | Delete empty | Tag | Sequential |
| RDS Instances & Clusters | Stop (available only) | - | Threaded |
| EKS Node Groups | Scale to zero | - | Threaded |
| Kinesis Streams | Delete (upsolver_* preserved) | Prefix | Threaded |
| MSK Clusters | Delete (ACTIVE only) | State filter | Threaded |
| OpenSearch Domains | Delete (idle only) | State filter | Threaded |
| Resource Tags | Add CreatedOn | - | Threaded |
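The "Threaded" rows can be pictured as independent handlers submitted to a small thread pool. The sketch below is an assumption about the general shape, not the Lambda's code: `run_threaded` and the handler names are hypothetical, and the pool size of 5 mirrors the ThreadPoolExecutor figure in the Feature Comparison table.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # matches the worker count quoted in the Feature Comparison


def run_threaded(handlers):
    """Run each handler in a thread pool; collect results and errors by name."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(fn): name for name, fn in handlers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing service must not stop the rest
                errors[name] = str(exc)
    return results, errors
```

Collecting errors per handler instead of re-raising lets a failed service show up in the report while the remaining cleanups still complete.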

Safety: All operations default to dry-run mode. Set DRY_RUN=false to perform actual changes.
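A minimal sketch of a dry-run guard, assuming the flag is read from the DRY_RUN environment variable and that anything other than an explicit "false" keeps the safe default. The helper names `is_dry_run` and `apply_action` are illustrative only.

```python
import os


def is_dry_run(env=None):
    """Return True unless DRY_RUN is explicitly set to "false"."""
    env = os.environ if env is None else env
    return env.get("DRY_RUN", "true").strip().lower() != "false"


def apply_action(description, action, dry_run):
    """Run the action only when not in dry-run mode; always return a log line."""
    if dry_run:
        return f"[DRY RUN] would {description}"
    action()
    return f"done: {description}"


# In dry-run mode the callable is never invoked.
print(apply_action("stop instance i-0123", lambda: None, is_dry_run({})))
```

Failing closed on unrecognised values means a typo such as DRY_RUN=flase reports what would change instead of changing it.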

AWS Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| DRY_RUN | "true" to report only, "false" to apply changes | "true" |
| CHECK_ALL_REGIONS | Scan all enabled AWS regions | "false" |
| KEEP_TAG_KEY | Tag key for resource preservation | "auto-deletion" |
| KEEP_TAG_VALUE | Tag value for resource preservation | "skip-resource" |
| EMAIL_IDENTITY | SES verified sender email | (required) |
| TO_ADDRESS | Recipient email address | (required) |
| SES_REGION | AWS region where SES identity is verified | "us-east-1" |
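Tag-based preservation can be sketched as a simple match against the KEEP_TAG_KEY and KEEP_TAG_VALUE pair. The example assumes EC2-style tag lists (`[{"Key": ..., "Value": ...}]`); other services return tags in different shapes, and `is_preserved` is a hypothetical helper rather than the Lambda's own function.

```python
def is_preserved(tags, keep_key="auto-deletion", keep_value="skip-resource"):
    """Return True if an EC2-style tag list carries the preservation tag."""
    return any(
        tag.get("Key") == keep_key and tag.get("Value") == keep_value
        for tag in (tags or [])  # untagged resources come back as None or []
    )


tags = [
    {"Key": "Name", "Value": "build-agent"},
    {"Key": "auto-deletion", "Value": "skip-resource"},
]
print(is_preserved(tags))
```

Both the key and the value must match, so a resource tagged auto-deletion with any other value is still eligible for cleanup.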

Default Regions (when CHECK_ALL_REGIONS=false)

us-east-1, us-east-2, us-west-1, us-west-2, eu-north-1, eu-central-1, eu-west-1
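Region selection might look like the sketch below. `regions_to_scan` is a hypothetical helper, and the region lister is injected as a callable so the example runs without AWS credentials; in a real Lambda it would typically wrap boto3's EC2 `describe_regions` call.

```python
# Default list copied from this README.
DEFAULT_REGIONS = [
    "us-east-1", "us-east-2", "us-west-1", "us-west-2",
    "eu-north-1", "eu-central-1", "eu-west-1",
]


def regions_to_scan(check_all_regions, list_enabled_regions=None):
    """Return the regions to scan for the given CHECK_ALL_REGIONS setting.

    list_enabled_regions is an assumed callable returning region names,
    for example a thin wrapper around EC2 describe_regions.
    """
    if not check_all_regions:
        return list(DEFAULT_REGIONS)
    if list_enabled_regions is None:
        raise ValueError("a region lister is required when scanning all regions")
    return sorted(list_enabled_regions())
```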

AWS Deployment

# terraform.tfvars
function_name     = "aws-resource-cleanup"
dry_run           = true
check_all_regions = false
keep_tag_key      = { "auto-deletion" = "skip-resource" }
email_identity    = "your-sender@domain.com"
to_address        = "your-recipient@domain.com"
ses_region        = "us-east-1"
event_cron        = "cron(0 23 * * ? *)"    # 11 PM UTC
sns_topic_arn     = ""                       # optional: SNS ARN for alarm notifications

Then deploy from the aws-finops directory:

cd aws-finops && terraform init && terraform plan && terraform apply

Deployed Resources
  • Lambda function (Python 3.12) with least-privilege IAM
  • EventBridge rule for scheduling
  • Encrypted SQS Dead Letter Queue
  • 6 CloudWatch alarms (with optional SNS)
  • SES email identity verification

Monitoring & Observability

| Alarm | Metric | Threshold |
| --- | --- | --- |
| Errors | AWS/Lambda Errors | > 0 |
| Throttles | AWS/Lambda Throttles | > 0 |
| Duration | AWS/Lambda Duration | > 90% of timeout |
| Concurrent Executions | AWS/Lambda ConcurrentExecutions | > 50 |
| DLQ Delivery Failures | AWS/Lambda DeadLetterErrors | > 0 |
| DLQ Messages | AWS/SQS ApproximateNumberOfMessagesVisible | > 0 |

All alarms support optional SNS notifications via the sns_topic_arn variable.
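The Duration threshold is relative to the configured timeout. Since the AWS/Lambda Duration metric is reported in milliseconds, a quick worked calculation for the default 300 s timeout looks like this (the function name is illustrative, and how the Terraform module computes the value is an assumption):

```python
def duration_alarm_threshold_ms(timeout_seconds, percent=90):
    """Return the Duration alarm threshold in milliseconds."""
    # Integer arithmetic avoids floating-point rounding in the threshold.
    return timeout_seconds * 1000 * percent // 100


print(duration_alarm_threshold_ms(300))  # 270000
```

So with the default timeout the alarm would fire once an invocation runs longer than 270 seconds.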

Email reports include: deleted/stopped resources, skipped resources (dry-run or tagged), failed operations, and resources needing attention.


Feature Comparison

| Capability | GCP FinOps Guardian | AWS Resource Cleanup |
| --- | --- | --- |
| Runtime | Python 3.12 / Cloud Functions | Python 3.12 / Lambda |
| IaC | Terraform (Google ~> 5.0) | Terraform (AWS ~> 5.0) |
| Trigger | Cloud Scheduler + Pub/Sub | EventBridge CRON |
| Notifications | Slack (webhook / Secret Manager) | SES email reports |
| Scope | 10 recommender types | 11 resource cleanup types |
| Concurrency | Sequential per asset | ThreadPoolExecutor (5 workers) |
| Safety | Cost threshold filtering, per-recommender toggles | Dry-run mode, tag-based preservation, state filtering |
| Credentials | Optional Secret Manager | Configurable SES region |
| Monitoring | Cloud Logging + Slack | CloudWatch Alarms + optional SNS + DLQ |
| IAM | Least-privilege org/project roles | Least-privilege per-service policies |

Getting Started

Prerequisites

| Requirement | GCP | AWS |
| --- | --- | --- |
| Account | GCP Organization or Project | AWS Account |
| Terraform | >= 1.3 (Google provider ~> 5.0) | >= 1.3 (AWS provider ~> 5.0) |
| Python | 3.12 | 3.12 |
| Notification | Slack webhook URL | SES verified email identity |

Quick Start

# GCP
cd gcp-finops && terraform init && terraform apply

# AWS
cd aws-finops && terraform init && terraform apply

Contributing

| Area | Guidelines |
| --- | --- |
| GCP | Follow factory pattern for new recommenders, maintain 80%+ test coverage, use structured logging |
| AWS | Test in dry-run mode first, follow modular function patterns, add error handling |
| Both | Update documentation, follow existing code patterns |

Adding a new GCP recommender:

  1. Create class extending Recommender in localpackage/recommender/compute/ or cloudsql/
  2. Register in factory.py
  3. Add Terraform variable in variables.tf + env var in main.tf
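The first two steps above can be sketched as follows. This is a generic illustration of a factory with registration; the real `Recommender` base class and `factory.py` interfaces are not shown in this README, so every name here (`detect`, `register`, `create`, and the example recommender ID) is hypothetical.

```python
from abc import ABC, abstractmethod


class Recommender(ABC):
    """Stand-in for the project's base class (its real interface may differ)."""

    recommender_id = ""

    @abstractmethod
    def detect(self):
        """Return a list of findings."""


_REGISTRY = {}


def register(cls):
    """Step 2: make the class discoverable by its recommender ID."""
    _REGISTRY[cls.recommender_id] = cls
    return cls


def create(recommender_id):
    """Factory entry point: build a recommender from its ID."""
    try:
        return _REGISTRY[recommender_id]()
    except KeyError:
        raise ValueError(f"unknown recommender: {recommender_id}") from None


@register
class ExampleIdleRecommender(Recommender):
    """Step 1: a new recommender extending the base class."""

    recommender_id = "example.hypothetical.IdleRecommender"  # placeholder ID

    def detect(self):
        return []
```

Step 3 would then expose a toggle for the new class through a Terraform variable and a matching environment variable, following the existing entries.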

License

MIT License. See LICENSE for details.

Issues & Support: Open a GitHub issue with environment details and redacted logs.
