This project implements a PPO-based adaptive cloud scheduler that dynamically allocates CPU and memory resources across multiple compute nodes. The environment simulates workload arrivals and cluster-level scheduling decisions.
- Multi-node cloud cluster simulation
- Poisson workload generator
- SLA-aware reward function
- Load balancing penalty
- PPO reinforcement learning agent
- TensorBoard experiment tracking
- Docker reproducibility
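The Poisson workload generator listed above can be sketched as a standard Poisson process, drawing exponentially distributed inter-arrival gaps. This is a minimal illustration, not the repository's actual generator; the function name and parameters are assumptions.

```python
import random

def poisson_arrivals(rate, horizon, seed=0):
    """Yield task arrival times on [0, horizon) as a Poisson process:
    inter-arrival gaps are i.i.d. Exponential(rate) draws.
    Hypothetical helper -- the real simulator may differ."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # next gap ~ Exp(rate)
        if t >= horizon:
            return
        yield t

# With rate=2.0 tasks per time unit we expect roughly 200 arrivals
# over a horizon of 100 time units.
arrivals = list(poisson_arrivals(rate=2.0, horizon=100.0))
```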
The agent is trained with Proximal Policy Optimization (PPO). At each step its observation includes:
- Per-node CPU and memory utilization
- Incoming task CPU and memory demand
- Global queue backlog
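The observation components above can be flattened into a single vector for the policy network. The function below is an illustrative sketch, assuming per-node utilizations are normalized to [0, 1]; the names and layout are assumptions, not the project's actual encoding.

```python
def build_observation(node_cpu, node_mem, task_cpu, task_mem, backlog):
    """Concatenate per-node CPU/memory utilization, the incoming task's
    CPU/memory demand, and the global queue backlog into one flat vector.
    Hypothetical layout -- the real environment may order fields differently."""
    return (
        list(node_cpu)            # per-node CPU utilization, one entry per node
        + list(node_mem)          # per-node memory utilization
        + [task_cpu, task_mem]    # incoming task's resource demand
        + [float(backlog)]        # global queue backlog (raw count)
    )

# Example: a 2-node cluster with a pending task and 7 queued tasks.
obs = build_observation([0.5, 0.2], [0.3, 0.4], 0.1, 0.2, 7)
```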
The reward function trades off:
- Resource utilization (rewarded)
- Task latency (penalized)
- SLA violations (penalized)
- Load imbalance (penalized)
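One way to combine these terms is a weighted sum, with load imbalance measured as the standard deviation of per-node load. This is a hedged sketch; the weight values and function signature are assumptions, not the repository's actual reward.

```python
def reward(utilization, latency, sla_violations, node_loads,
           w_util=1.0, w_lat=0.1, w_sla=5.0, w_bal=0.5):
    """Illustrative SLA-aware reward: reward mean cluster utilization,
    penalize task latency, SLA violations, and load imbalance
    (std-dev of per-node load). Weights are hypothetical."""
    mean_load = sum(node_loads) / len(node_loads)
    imbalance = (sum((l - mean_load) ** 2 for l in node_loads)
                 / len(node_loads)) ** 0.5
    return (w_util * utilization
            - w_lat * latency
            - w_sla * sla_violations
            - w_bal * imbalance)
```

With these defaults, a perfectly balanced cluster incurs no imbalance penalty, so the same utilization and latency always score higher than an unbalanced placement.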
Run training:

```shell
python train_ppo.py
```
Run evaluation:

```shell
python evaluate.py
```

Launch TensorBoard:

```shell
tensorboard --logdir logs
```

Build and run with Docker:

```shell
docker build -t cloud-rl-advanced .
docker run cloud-rl-advanced
```
The trained PPO agent learns scheduling policies that improve utilization while keeping the task backlog low and cluster load balanced.