Skip to content

unshrink: post-hoc debiasing of machine learning predictions to reduce shrinkage-induced attenuation bias in downstream causal inference -- without requiring fresh ground-truth data [Python]

License

Notifications You must be signed in to change notification settings

AIandGlobalDevelopmentLab/unshrink

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

 ╔══════════════════════════════════════════════════════════════════╗
 ║  UNSHRINK v0.1.0                         STATUS: DEBIASED        ║
 ╠══════════════════════════════════════════════════════════════════╣
 ║  METHOD: Tweedie (Berkson / score-swap)                          ║
 ║  PSEUDO-OUTCOME:  y_tilde = z - sigma^2 * d/dz( log p_Y_hat(z) ) ║
 ║                                                                  ║
 ║  [INPUT: BIASED]               [OUTPUT: RESTORED]                ║
 ║  Variance: 0.42 (▼ 58%)        Variance: 1.00 (✔ OK)             ║
 ║                                                                  ║
 ║      SpikeAtMean                  TrueDistribution               ║
 ║          │                              │                        ║
 ║       ▄▄██▄▄                       ▂▄▆██▆▄▂                      ║
 ║      ▄██████▄                    ▄██████████▄                    ║
 ║     ██▀    ▀██                  ▄█▀          ▀█▄                 ║
 ║     ─┴────────┴─                ─┴────────────┴─                 ║
 ╚══════════════════════════════════════════════════════════════════╝

Overview of unshrink

unshrink is a Python package for correcting attenuation bias in machine learning predictions used for causal inference. When ML models predict outcomes (e.g., poverty levels from satellite imagery), their predictions systematically "shrink toward the mean"—overestimating low values and underestimating high values. This shrinkage causes treatment effects to appear smaller than they actually are.

The package provides two debiasing methods:

  • TweedieDebiaser: Empirical Bayes correction using Tweedie's formula with KDE-based score estimation
  • LccDebiaser: Linear Calibration Correction via inverse linear regression

Both methods require only a calibration dataset where you have both predictions and ground truth outcomes.

Installation

Install directly from GitHub:

pip install git+https://github.com/AIandGlobalDevelopmentLab/unshrink.git

For development:

git clone https://github.com/AIandGlobalDevelopmentLab/unshrink.git
cd unshrink
pip install -e .[dev]

Quick Start

Using TweedieDebiaser

from unshrink import TweedieDebiaser

# Fit on calibration data (where you have ground truth)
debiaser = TweedieDebiaser()
debiaser.fit(cal_predictions, cal_targets)

# Get debiased predictions for new data
debiased_preds = debiaser.debiased_predictions(test_predictions)

# Or get just the debiased mean
debiased_mean = debiaser.debiased_mean(test_predictions)

Using LccDebiaser

from unshrink import LccDebiaser

# Linear Calibration Correction - simpler, works well with linear shrinkage
debiaser = LccDebiaser()
debiaser.fit(cal_predictions, cal_targets)

debiased_preds = debiaser.debiased_predictions(test_predictions)

Input Requirements

  • cal_predictions: numpy array of shape (n_cal,) — model predictions for units with ground truth
  • cal_targets: numpy array of shape (n_cal,) — true outcomes for the same units
  • test_predictions: numpy array of shape (n_test,) — predictions to debias

Important: Calibration predictions should be out-of-sample (use cross-fitting if needed to avoid target leakage).

When to Use Which Method

Method Best For Pros Cons
TweedieDebiaser Nonlinear shrinkage patterns More flexible; handles complex bias Requires more calibration data; sensitive to density estimation
LccDebiaser Linear shrinkage patterns Simple; robust with small samples Assumes linear relationship between predictions and targets

Rule of thumb: Start with LccDebiaser for simplicity. Use TweedieDebiaser if you have ample calibration data (n > 500) or suspect nonlinear shrinkage.

Estimating Treatment Effects

# Compute debiased Average Treatment Effect (ATE)
ate = debiaser.debiased_ate(
    treated_predictions,
    control_predictions,
    iptw_treated=treated_weights,  # optional inverse probability weights
    iptw_control=control_weights
)

API Reference

TweedieDebiaser

Uses Tweedie's formula with a KDE-based score function to correct prediction bias.

TweedieDebiaser(delta=1e-5)

Parameters:

  • delta (float): Step size for numerical differentiation of the log-density. Default: 1e-5

Methods:

  • fit(cal_predictions, cal_targets) — Calibrate using labeled data. Learns sigma_ (residual std) and kde_ (kernel density estimate).
  • debiased_predictions(predictions) — Returns per-unit debiased predictions as an array.
  • debiased_mean(predictions) — Returns the debiased mean as a scalar.
  • debiased_ate(treated, control, iptw_treated=None, iptw_control=None) — Returns the debiased average treatment effect.

LccDebiaser

Linear Calibration Correction applies an inverse linear transformation learned from calibration data.

LccDebiaser()

Methods:

  • fit(cal_predictions, cal_targets) — Fits a linear regression to learn intercept_ and slope_.
  • debiased_predictions(predictions) — Returns (predictions - intercept) / slope.
  • debiased_mean(predictions) — Returns the debiased mean as a scalar.
  • debiased_ate(treated, control, iptw_treated=None, iptw_control=None) — Returns the debiased average treatment effect.

Citation

Paper: arXiv:2508.01341 | Project Page

@inproceedings{pettersson2025debiasingmachinelearningpredictions,
  title        = {Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: One Map, Many Trials in Satellite-Driven Poverty Analysis},
  author       = {Markus Pettersson and Connor T. Jerzak and Adel Daoud},
  booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26), Special Track on AI for Social Impact},
  year         = {2026},
  url          = {https://arxiv.org/abs/2508.01341}
}

About

unshrink: post-hoc debiasing of machine learning predictions to reduce shrinkage-induced attenuation bias in downstream causal inference -- without requiring fresh ground-truth data [Python]

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •