# streamcal

Streaming probability calibration via multiplicative weights.
## Installation

```
pip install streamcal
```

For development:

```
pip install -e ".[dev]"
```

## Motivation

ML models output probabilities that are often miscalibrated: a predicted 70% doesn't mean that 70% of those cases are positive. Batch calibrators (Platt scaling, isotonic regression) require periodic refits, forcing a tradeoff between refit compute and calibration drift between refits.
MWU maintains per-bucket bias factors with O(#buckets) cost per batch, adapting continuously without offline retraining.
## Algorithm

Maintain bias factors $w_b$, one per probability bucket $b$. After observing label $y \in \{0, 1\}$ for a calibrated prediction $\hat{p}$ falling in bucket $b$, update multiplicatively:

$$w_b \leftarrow w_b \cdot \exp\big(\eta\,(y - \hat{p})\big)$$

where $\eta$ is the learning rate.
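A minimal self-contained sketch of the per-bucket multiplicative-weights update. It assumes the calibrated probability is formed by scaling the raw odds with the bucket's bias factor; that choice and the class below are illustrative, not the package's internals:

```python
import math

class MWUSketch:
    """Toy per-bucket multiplicative-weights calibrator (illustrative only)."""

    def __init__(self, n_buckets=50, eta=0.1):
        self.eta = eta
        self.n_buckets = n_buckets
        self.w = [1.0] * n_buckets  # one bias factor per probability bucket

    def _bucket(self, p):
        # Map a raw probability in [0, 1] to a bucket index.
        return min(int(p * self.n_buckets), self.n_buckets - 1)

    def update(self, p_raw, y):
        b = self._bucket(p_raw)
        # Calibrate by scaling the raw odds with the bucket's bias factor.
        odds = self.w[b] * p_raw / (1.0 - p_raw + 1e-12)
        p_cal = odds / (1.0 + odds)
        # Multiplicative update: raise the factor when positives arrive more
        # often than predicted in this bucket, lower it when they arrive less.
        self.w[b] *= math.exp(self.eta * (y - p_cal))
        return p_cal
```

Each update touches a single bucket, which is where the O(#buckets) memory and O(1)-per-example cost come from.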
## Benchmarks

Semi-synthetic experiments (LightGBM base model, linear drift, B=50 buckets):
| Method | Brier | ECE | CPU ms/batch |
|---|---|---|---|
| MWU | 0.133 | 0.070 | 0.08 |
| Platt | 0.129 | 0.043 | 4.92 |
| Isotonic | 0.128 | 0.043 | 4.36 |
MWU is 61× faster than Platt while achieving comparable Brier scores.
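The linear-drift setting can be mimicked with a toy stream generator. This is illustrative only, not the experiment code, and the actual drift process in the experiments may differ:

```python
import random

def drifting_stream(n=5000, drift=0.2, seed=0):
    """Yield (p_raw, y) pairs where the true positive rate drifts linearly
    away from the raw score over time (toy generator, illustrative only)."""
    rng = random.Random(seed)
    for t in range(n):
        p_raw = rng.random()
        shift = drift * t / n  # miscalibration grows linearly with time
        p_true = min(max(p_raw + shift, 0.0), 1.0)
        yield p_raw, (1 if rng.random() < p_true else 0)
```

A stream like this makes the batch calibrators' weakness visible: anything fit once on early data is stale by the end of the stream.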
## Usage

```python
from streamcal import MWUCalibrator

cal = MWUCalibrator(n_buckets=50, eta=0.1)
for p_raw, y in data_stream:
    p_calibrated = cal.update(p_raw, y)
```

## Calibrators

Streaming (online):
- `MWUCalibrator` - Multiplicative Weights Update
- `OnlineSGD` - Online SGD with additive updates
- `PerBucketEMA` - Per-bucket exponential moving average
Batch (refit on accumulated data):
- `PlattScaling` - Logistic regression on logits
- `IsotonicCalibrator` - Isotonic regression
- `TemperatureScaling` - Temperature scaling
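For intuition about what the batch methods fit: Platt scaling is a two-parameter logistic regression on the logits of the raw scores. A standalone sketch using plain gradient descent (illustrative; not the `PlattScaling` implementation):

```python
import math

def fit_platt(p_raw, y, n_iter=500, lr=0.1):
    """Fit sigmoid(a * logit(p) + b) to labels y by gradient descent."""
    logits = [math.log(p / (1 - p)) for p in p_raw]
    a, b, n = 1.0, 0.0, len(y)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for z, yi in zip(logits, y):
            pred = 1.0 / (1.0 + math.exp(-(a * z + b)))
            grad_a += (pred - yi) * z / n  # d(log loss)/da
            grad_b += (pred - yi) / n      # d(log loss)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def apply_platt(a, b, p):
    """Map a raw probability through the fitted logistic recalibration."""
    z = math.log(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-(a * z + b)))
```

The compute-drift tradeoff in the benchmark table comes from rerunning a fit like this on accumulated data, versus MWU's single cheap update per example.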
## Metrics

```python
from streamcal import brier_score, expected_calibration_error

brier = brier_score(y_true, y_pred)
ece = expected_calibration_error(y_true, y_pred, n_bins=20)
```

## Reproducing experiments

```
pip install -e ".[experiments]"
python experiments/run_experiments.py
python experiments/generate_figures.py
```

See `ms/mwu_calibration.pdf` for theory and full results.
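For reference, expected calibration error bins predictions by confidence and takes a weighted average of the gap between mean confidence and observed accuracy per bin. A standalone sketch (illustrative; the package's binning details may differ):

```python
def ece_sketch(y_true, y_pred, n_bins=20):
    """Expected calibration error: sum over bins of
    (bin weight) * |mean predicted probability - observed positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(y_pred, y_true):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b].append((p, y))
    n = len(y_true)
    total = 0.0
    for bucket in bins:
        if not bucket:
            continue
        conf = sum(p for p, _ in bucket) / len(bucket)  # mean confidence
        acc = sum(y for _, y in bucket) / len(bucket)   # observed positive rate
        total += len(bucket) / n * abs(conf - acc)
    return total
```

A perfectly calibrated predictor has ECE 0; a predictor that says 0.9 when the true rate is 0 has ECE 0.9.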
## Related

This package uses the same MWU/mirror descent algorithm as `onlinerake` (survey weighting), applied to probability calibration instead of sample reweighting.
## License

MIT