A baseline predictive alerting system for cloud-service metrics using a sliding-window formulation and logistic regression.
The goal of this project is to predict whether an incident will occur within the next H time steps based on the previous W steps of service metrics.
In this prototype:
- W (window size) is the number of past time steps used as input
- H (horizon) is the number of future time steps over which an incident is predicted
The task is formulated as a binary classification problem:
- `1`: an incident will occur within the next `H` steps
- `0`: no incident will occur within the next `H` steps
To keep the project focused on problem formulation, model design, and evaluation, I used a synthetic multivariate time-series dataset instead of a large real-world dataset.
The generated metrics are:
- `cpu_usage`
- `memory_usage`
- `request_rate`
- `error_rate`
The dataset also includes binary incident labels:
- `incident = 1` means the system is in an incident interval
- `incident = 0` means normal operation
Synthetic incident intervals are injected by increasing several metrics for short periods of time, which simulates abnormal system behavior.
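A minimal sketch of this kind of generator, using only the standard library. The function name `generate_metrics`, the baseline levels, the noise scales, and the size of the incident bumps are all illustrative assumptions; the project's own `data_generation.py` may work differently.

```python
import random


def generate_metrics(num_steps=300, num_incidents=3, incident_len=10, seed=42):
    """Return (rows, incident) for a toy multivariate series.

    rows[t] is [cpu_usage, memory_usage, request_rate, error_rate] at step t;
    incident[t] is 1 inside an injected incident interval, else 0.
    """
    rng = random.Random(seed)
    incident = [0] * num_steps
    # Pick random start points and mark short incident intervals.
    for _ in range(num_incidents):
        start = rng.randrange(0, num_steps - incident_len)
        for t in range(start, start + incident_len):
            incident[t] = 1

    rows = []
    for t in range(num_steps):
        # Normal behaviour: noisy values around fixed levels (assumed values).
        cpu = rng.gauss(40.0, 5.0)
        mem = rng.gauss(50.0, 4.0)
        req = rng.gauss(200.0, 20.0)
        err = abs(rng.gauss(0.01, 0.005))
        if incident[t]:
            # Incident: raise several metrics at once to simulate abnormal load.
            cpu += 30.0
            req += 150.0
            err += 0.05
        rows.append([cpu, mem, req, err])
    return rows, incident
```

Seeding a local `random.Random` instance keeps the series reproducible without touching the global random state.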
The time series is converted into supervised learning examples using sliding windows.
For each sample:
- input `X` contains the previous `W` time steps of all metrics
- target `y` is `1` if at least one incident occurs in the next `H` time steps, and `0` otherwise
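The windowing step can be sketched as below. This is one possible reading of the formulation, not necessarily the code in `dataset.py`: the name `make_windows` is hypothetical, and it assumes the horizon starts at the first step after the window (steps `t` to `t + H - 1`).

```python
def make_windows(rows, incident, window_size=20, horizon=5):
    """Build flattened sliding-window samples.

    X[i] holds the metrics for steps [t - W, t) flattened into one vector;
    y[i] is 1 if any incident label in steps [t, t + H) is 1, else 0.
    """
    X, y = [], []
    for t in range(window_size, len(rows) - horizon + 1):
        window = rows[t - window_size:t]
        # Flatten W x num_metrics values into a single feature vector.
        X.append([value for step in window for value in step])
        y.append(1 if any(incident[t:t + horizon]) else 0)
    return X, y
```

With `W = 20` and four metrics, each sample has 80 features, and a series of `N` steps yields `N - W - H + 1` samples.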
In the current baseline:
- `W = 20`
- `H = 5`
The project pipeline is:
- Generate synthetic cloud-service metrics with incident intervals
- Create sliding-window samples
- Flatten windows into feature vectors for a classical ML model
- Split data into train and test sets
- Scale features with `StandardScaler`
- Train a `LogisticRegression` baseline
- Predict incident probabilities
- Apply a configurable alert threshold
- Evaluate the model using classification metrics
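The modelling steps of this pipeline could look roughly like the sketch below, which uses scikit-learn's `StandardScaler`, `LogisticRegression`, and `make_pipeline`. The function name `train_and_alert`, the chronological split, the 25% test fraction, and `max_iter=1000` are assumptions made for illustration; the project may split and configure the model differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_alert(X, y, threshold=0.7, test_fraction=0.25):
    """Fit scaler + logistic regression; return test labels, probabilities, alerts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    # Chronological split (assumed): earliest samples train, latest samples test.
    split = int(len(X) * (1.0 - test_fraction))
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # The pipeline fits the scaler on the training part only,
    # so no test-set statistics leak into the features.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # With labels {0, 1}, column 1 of predict_proba is P(incident ahead).
    proba = model.predict_proba(X_test)[:, 1]
    alerts = (proba > threshold).astype(int)
    return y_test, proba, alerts
```

The training part must contain both classes, otherwise `LogisticRegression` cannot be fitted. A time-ordered split is worth considering here because overlapping windows from a shuffled split can share time steps between train and test.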
I used Logistic Regression as a simple and interpretable baseline for binary classification.
This choice makes it easy to:
- validate the problem formulation
- establish a baseline before trying more complex models
- inspect the effect of threshold selection on alert behavior
Beyond being easy to interpret, the baseline acts as a sanity check on the task framing.
It helps validate:
- the sliding-window problem formulation
- the incident labeling strategy
- the probability-based alerting setup
Before moving to more complex models, this makes it easier to judge whether the core framing of the task is reasonable.
The model is evaluated using:
- Precision — how often predicted incidents are correct
- Recall — how many real incidents are detected
- F1-score — balance between precision and recall
- Confusion matrix — summary of correct and incorrect predictions
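For reference, these metrics follow directly from the four confusion-matrix counts. The sketch below uses plain Python and hypothetical helper names (`confusion_counts`, `precision_recall_f1`); in practice `sklearn.metrics` provides equivalent functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means incident."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Returning 0.0 for an undefined ratio (for example, no alerts raised at all) is a design choice in this sketch, not something the project specifies.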
In predictive alerting, model errors have operational meaning:
- False positives correspond to unnecessary alerts
- False negatives correspond to missed incidents
This makes threshold selection especially important. A lower threshold may increase recall but also produce more alert noise, while a higher threshold may reduce false alarms at the cost of missing more real incidents.
Because of this, the project evaluates not only raw model predictions, but also how decision thresholds affect alert quality.
The model predicts incident probabilities, and an alert is raised if the probability exceeds a chosen threshold.
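The alert rule and its effect on the two error types can be illustrated with a small sweep. The helper name `sweep_thresholds` and the toy labels and probabilities are made up for this example and are not project output.

```python
def sweep_thresholds(y_true, proba, thresholds=(0.3, 0.5, 0.7)):
    """Return {threshold: (false_positives, false_negatives)}."""
    results = {}
    for thr in thresholds:
        # An alert is raised when the predicted probability exceeds the threshold.
        alerts = [1 if p > thr else 0 for p in proba]
        fp = sum(1 for t, a in zip(y_true, alerts) if t == 0 and a == 1)
        fn = sum(1 for t, a in zip(y_true, alerts) if t == 1 and a == 0)
        results[thr] = (fp, fn)
    return results


# Toy data (illustrative only): three normal windows, three pre-incident windows.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_proba = [0.1, 0.4, 0.6, 0.45, 0.8, 0.9]
print(sweep_thresholds(toy_labels, toy_proba))
# {0.3: (2, 0), 0.5: (1, 1), 0.7: (0, 1)}
```

On this toy data, raising the threshold removes false alerts but starts missing one real incident, which is the trade-off described above.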
I tested multiple thresholds:
| Threshold | Precision | Recall | F1-score |
|---|---|---|---|
| 0.3 | 0.7037 | 0.7917 | 0.7451 |
| 0.5 | 0.8636 | 0.7917 | 0.8261 |
| 0.7 | 0.9500 | 0.7917 | 0.8636 |
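On the current synthetic split, 0.7 gave the highest F1-score (0.8636): raising the threshold from 0.3 to 0.7 increased precision from 0.7037 to 0.9500 while recall stayed at 0.7917, so the gain came entirely from fewer false positives.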
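As a consistency check on the table, the reported values match a confusion matrix with 19 true positives and 5 false negatives at every threshold, and 8, 3, and 1 false positives at thresholds 0.3, 0.5, and 0.7. These counts are back-calculated from the rounded metrics and are an inference, not output from the project; proportionally larger counts would give the same ratios.

```python
def scores(tp, fp, fn):
    """Precision, recall and F1 from raw counts, rounded to 4 decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Inferred counts (assumption): TP = 19 and FN = 5 at all three thresholds.
inferred_fp = {0.3: 8, 0.5: 3, 0.7: 1}
table = {thr: scores(19, fp, 5) for thr, fp in inferred_fp.items()}
for thr, row in table.items():
    print(thr, row)
# 0.3 (0.7037, 0.7917, 0.7451)
# 0.5 (0.8636, 0.7917, 0.8261)
# 0.7 (0.95, 0.7917, 0.8636)
```

Under this reading, the same positives are caught at all three thresholds, and only the number of false alerts changes.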
Current best baseline configuration on the synthetic test split:
- Model: Logistic Regression
- Window size (W): 20
- Prediction horizon (H): 5
- Best threshold: 0.7
- Precision: 0.9500
- Recall: 0.7917
- F1-score: 0.8636
At this threshold, precision is higher than at 0.3 or 0.5 with no loss in recall on the current split.
The project generates plots in the artifacts/ folder:
- metrics_with_incidents.png — service metrics over time with highlighted incident intervals
- predicted_probabilities.png — predicted incident probabilities with a decision threshold
These plots help interpret both the synthetic data and the model behavior.
predictive-cloud-alerting/
├── README.md
├── requirements.txt
├── data/
├── artifacts/
└── src/
    ├── main.py
    ├── data_generation.py
    ├── dataset.py
    ├── model.py
    ├── evaluation.py
    └── visualization.py
Install dependencies:

    python3 -m pip install --user --break-system-packages -r requirements.txt

Run the project:

    python3 src/main.py

The experiment can be configured from the command line:

    python3 src/main.py --num-steps 300 --window-size 20 --horizon 5 --threshold 0.7 --random-seed 42

Key configurable parameters:
- `--num-steps`: number of generated time steps
- `--window-size`: sliding window size `W`
- `--horizon`: prediction horizon `H`
- `--threshold`: alert decision threshold
- `--random-seed`: random seed for reproducible synthetic data
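For illustration, the command-line flags listed above could be declared with `argparse` roughly as follows. The defaults are taken from the example command and are assumptions; the actual defaults and parsing code in `src/main.py` may differ.

```python
import argparse


def build_parser():
    """Declare the experiment flags (defaults are assumed, not confirmed)."""
    parser = argparse.ArgumentParser(description="Predictive alerting baseline (sketch)")
    parser.add_argument("--num-steps", type=int, default=300,
                        help="number of generated time steps")
    parser.add_argument("--window-size", type=int, default=20,
                        help="sliding window size W")
    parser.add_argument("--horizon", type=int, default=5,
                        help="prediction horizon H")
    parser.add_argument("--threshold", type=float, default=0.7,
                        help="alert decision threshold")
    parser.add_argument("--random-seed", type=int, default=42,
                        help="random seed for reproducible synthetic data")
    return parser


# Parse an explicit argument list here so the sketch runs outside a shell.
args = build_parser().parse_args(["--window-size", "30", "--threshold", "0.5"])
print(args.window_size, args.horizon, args.threshold)
```

`argparse` converts dashes in flag names to underscores, so `--window-size` is read back as `args.window_size`.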
This is still a simplified prototype.
Main limitations:
- the dataset is synthetic and does not capture all real production behaviors
- the baseline model uses flattened windows and does not explicitly model temporal structure
- incident generation is rule-based and intentionally simplified
- results may vary depending on synthetic data settings and train/test split
Potential next steps:
- add more realistic seasonality and noise patterns
- include additional metrics such as latency
- compare Logistic Regression with Random Forest or other baselines
- save synthetic data and evaluation results automatically
- tune thresholds based on operational goals
- test the approach on a public real-world time-series dataset
A similar predictive alerting approach could be adapted to real monitored systems such as:
- cloud services
- backend applications
- infrastructure nodes
- VPN gateways
For example, a similar pipeline could be applied to a small self-managed VPN or gateway-like server.
In such a setup, the model could monitor signals such as:
- CPU usage
- memory usage
- active connection count
- reconnect frequency
- failed handshake rate
- packet loss
- latency
This could help raise alerts before severe overload, abnormal traffic spikes, or broader service degradation become critical.

