# streamcal

Streaming probability calibration via multiplicative weights.
## Installation

```
pip install streamcal
```

For development:

```
pip install -e ".[dev]"
```

## Motivation

ML models output probabilities that are often miscalibrated: a predicted 70% doesn't mean that 70% of those cases are positive. Batch calibrators (Platt scaling, isotonic regression) require periodic refits, forcing a tradeoff between refit compute and calibration drift between refits.
MWU maintains per-bucket bias factors with O(#buckets) cost per batch, adapting continuously without offline retraining.
## Algorithm

Maintain bias factors $w_b$, one per probability bucket $b$. After observing label $y \in \{0, 1\}$ for a calibrated prediction $\hat{p}$ falling in bucket $b$, update multiplicatively:

$$w_b \leftarrow w_b \cdot \exp\big(\eta\,(y - \hat{p})\big)$$

where $\eta$ is the learning rate.
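A minimal self-contained sketch of the per-bucket multiplicative-weights update. It assumes the calibrated probability is formed by scaling the raw odds with the bucket's bias factor; that choice and the class below are illustrative, not the package's internals:

```python
import math

class MWUSketch:
    """Toy per-bucket multiplicative-weights calibrator (illustrative only)."""

    def __init__(self, n_buckets=50, eta=0.1):
        self.eta = eta
        self.n_buckets = n_buckets
        self.w = [1.0] * n_buckets  # one bias factor per probability bucket

    def _bucket(self, p):
        # Map a raw probability in [0, 1] to a bucket index.
        return min(int(p * self.n_buckets), self.n_buckets - 1)

    def update(self, p_raw, y):
        b = self._bucket(p_raw)
        # Calibrate by scaling the raw odds with the bucket's bias factor.
        odds = self.w[b] * p_raw / (1.0 - p_raw + 1e-12)
        p_cal = odds / (1.0 + odds)
        # Multiplicative update: raise the factor when positives arrive more
        # often than predicted in this bucket, lower it when they arrive less.
        self.w[b] *= math.exp(self.eta * (y - p_cal))
        return p_cal
```

Each update touches a single bucket, which is where the O(#buckets) memory and O(1)-per-example cost come from.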
## Benchmarks

Semi-synthetic experiments (LightGBM base model, linear drift, B=50 buckets):
| Method | Brier | ECE | CPU ms/batch |
|---|---|---|---|
| MWU | 0.133 | 0.070 | 0.08 |
| Platt | 0.129 | 0.043 | 4.92 |
| Isotonic | 0.128 | 0.043 | 4.36 |
MWU is 61× faster than Platt while achieving comparable Brier scores.
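The linear-drift setting can be mimicked with a toy stream generator. This is illustrative only, not the experiment code, and the actual drift process in the experiments may differ:

```python
import random

def drifting_stream(n=5000, drift=0.2, seed=0):
    """Yield (p_raw, y) pairs where the true positive rate drifts linearly
    away from the raw score over time (toy generator, illustrative only)."""
    rng = random.Random(seed)
    for t in range(n):
        p_raw = rng.random()
        shift = drift * t / n  # miscalibration grows linearly with time
        p_true = min(max(p_raw + shift, 0.0), 1.0)
        yield p_raw, (1 if rng.random() < p_true else 0)
```

A stream like this makes the batch calibrators' weakness visible: anything fit once on early data is stale by the end of the stream.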
## Usage

```python
from streamcal import MWUCalibrator

cal = MWUCalibrator(n_buckets=50, eta=0.1)
for p_raw, y in data_stream:
    p_calibrated = cal.update(p_raw, y)
```

## Calibrators

Streaming (online):
- `MWUCalibrator` - Multiplicative Weights Update
- `OnlineSGD` - Online SGD with additive updates
- `PerBucketEMA` - Per-bucket exponential moving average
Batch (refit on accumulated data):
- `PlattScaling` - Logistic regression on logits
- `IsotonicCalibrator` - Isotonic regression
- `TemperatureScaling` - Temperature scaling
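For intuition about what the batch methods fit: Platt scaling is a two-parameter logistic regression on the logits of the raw scores. A standalone sketch using plain gradient descent (illustrative; not the `PlattScaling` implementation):

```python
import math

def fit_platt(p_raw, y, n_iter=500, lr=0.1):
    """Fit sigmoid(a * logit(p) + b) to labels y by gradient descent."""
    logits = [math.log(p / (1 - p)) for p in p_raw]
    a, b, n = 1.0, 0.0, len(y)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for z, yi in zip(logits, y):
            pred = 1.0 / (1.0 + math.exp(-(a * z + b)))
            grad_a += (pred - yi) * z / n  # d(log loss)/da
            grad_b += (pred - yi) / n      # d(log loss)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def apply_platt(a, b, p):
    """Map a raw probability through the fitted logistic recalibration."""
    z = math.log(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-(a * z + b)))
```

The compute-drift tradeoff in the benchmark table comes from rerunning a fit like this on accumulated data, versus MWU's single cheap update per example.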
## Metrics

```python
from streamcal import brier_score, expected_calibration_error

brier = brier_score(y_true, y_pred)
ece = expected_calibration_error(y_true, y_pred, n_bins=20)
```

## Reproducing experiments

```
pip install -e ".[experiments]"
python experiments/run_experiments.py
python experiments/generate_figures.py
```

See `ms/mwu_calibration.pdf` for theory and full results.
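For reference, expected calibration error bins predictions by confidence and takes a weighted average of the gap between mean confidence and observed accuracy per bin. A standalone sketch (illustrative; the package's binning details may differ):

```python
def ece_sketch(y_true, y_pred, n_bins=20):
    """Expected calibration error: sum over bins of
    (bin weight) * |mean predicted probability - observed positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(y_pred, y_true):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b].append((p, y))
    n = len(y_true)
    total = 0.0
    for bucket in bins:
        if not bucket:
            continue
        conf = sum(p for p, _ in bucket) / len(bucket)  # mean confidence
        acc = sum(y for _, y in bucket) / len(bucket)   # observed positive rate
        total += len(bucket) / n * abs(conf - acc)
    return total
```

A perfectly calibrated predictor has ECE 0; a predictor that says 0.9 when the true rate is 0 has ECE 0.9.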
## Related

This package uses the same MWU/mirror descent algorithm as `onlinerake` (survey weighting), applied to probability calibration instead of sample reweighting.
## License

MIT