samalyarov/uplift_modeling_setup
Smart Reach - Uplift Modeling for Marketing Campaign Targeting

A production-style uplift modeling pipeline for a direct marketing campaign. The idea is to learn which customers to contact to maximize net profit - not simply who is most likely to buy, but who is caused to buy by receiving an offer - taking into account communication costs, discount size and profit margins.

Built end-to-end: from raw transactional data through causal feature engineering, multi-model Optuna hyperparameter tuning, and a Prefect-orchestrated scoring pipeline with MLflow experiment tracking.


The Problem

Standard propensity models rank customers by their baseline purchase probability. But spend on the customers most likely to buy is largely wasted - they would have converted without the offer. The relevant question for targeting is therefore:

By how much does this specific customer's purchase probability increase because they received the campaign?

This is the Conditional Average Treatment Effect (CATE), also called the individual uplift. A customer is worth targeting when:

CATE × margin_per_gram > discount_cost + contact_cost

The campaign dataset contains randomized control group assignments, making it possible to estimate CATE directly from observed outcomes.
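The targeting rule above can be sketched directly. The margin and cost numbers below are illustrative placeholders, not the repository's actual campaign economics:

```python
def worth_targeting(cate, margin_per_gram=10.0,
                    discount_cost=3.0, contact_cost=0.5):
    """True if the expected incremental profit covers the offer's costs."""
    return cate * margin_per_gram > discount_cost + contact_cost

# A +0.4 lift in purchase probability clears the bar (4.0 > 3.5);
# a +0.1 lift does not (1.0 < 3.5).
print(worth_targeting(0.4), worth_targeting(0.1))
```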


Approach

Meta-Learners

Three families of CATE estimators are implemented and benchmarked:

| Model | Description |
| --- | --- |
| S-Learner | Single model; the treatment flag is just another feature. CATE = f(X, T=1) − f(X, T=0). Simple baseline. |
| X-Learner | Two-stage learner. Stage 1 fits outcome models on each arm; Stage 2 fits CATE models on imputed treatment effects. Propensity-weighted combination at prediction time. |
| R-Learner | Robinson decomposition. Residualizes both outcome and treatment against their conditional means, then fits the CATE on the residuals. Theoretically efficient. |
| Uplift Tree / RF | Tree models with a KL-divergence splitting criterion that directly optimizes uplift in each node. Note that these have no GPU training option, so training is very slow. |

Each meta-learner is paired with three gradient-boosted tree backends: LightGBM, XGBoost, and CatBoost. Many more learner/backend combinations are possible - feel free to experiment.

Evaluation

Two metrics guide model selection and Optuna tuning:

  • Qini coefficient - area under the Qini curve minus the random baseline. Primary tuning objective.
  • Uplift@K - actual ATE in the top-K fraction ranked by predicted uplift. Used for business-level validation.
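Both metrics can be sketched in a few lines; the following is a simplified illustration of the definitions, not the project's metrics.py implementation:

```python
import numpy as np

def uplift_at_k(y, t, scores, k=0.3):
    """Actual treated-vs-control outcome gap in the top-k fraction by score."""
    top = np.argsort(-scores)[: int(len(y) * k)]
    yt, tt = y[top], t[top]
    return yt[tt == 1].mean() - yt[tt == 0].mean()

def qini_curve(y, t, scores):
    """Cumulative incremental responses, control scaled to the treated count."""
    order = np.argsort(-scores)
    y, t = y[order], t[order]
    n_t, n_c = np.cumsum(t), np.cumsum(1 - t)
    y_t, y_c = np.cumsum(y * t), np.cumsum(y * (1 - t))
    ratio = np.divide(n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    return y_t - y_c * ratio

# Perfect ranking on a toy cohort: the two true responders come first.
y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
t = np.array([1, 1, 0, 0, 1, 1, 0, 0])
scores = np.array([8., 7., 6., 5., 4., 3., 2., 1.])
print(uplift_at_k(y, t, scores, k=0.5))  # 1.0
print(qini_curve(y, t, scores)[-1])      # 2.0
```

The Qini coefficient is then the area between this curve and the straight line from the origin to its endpoint, which is what the random baseline would trace.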

Tech Stack

| Layer | Tool |
| --- | --- |
| Causal ML | causalml, custom S/X/R-Learner wrappers |
| Gradient boosting | LightGBM · XGBoost · CatBoost |
| Feature pipeline | scikit-learn |
| Hyperparameter tuning | Optuna |
| Orchestration | Prefect 3 |
| Experiment tracking | MLflow |
| Data layer | pandas · pyarrow |

Project Structure

```
uplift_modeling_setup/
│
├── configs/
│   ├── campaign.json          # scoring run config (date, extract, selection threshold)
│   └── system.json            # data paths, artifact root, MLflow URI
│
├── artifacts/
│   └── serving_extract_config.json   # feature config used at inference time
│       # (*.pickle files are gitignored; generated by train.py)
│
├── src/
│   ├── datalib/
│   │   ├── __init__.py        # Engine - lightweight pandas data store abstraction
│   │   ├── features.py        # Feature calculators (receipts, recency, loyalty, …)
│   │   └── transforms.py      # sklearn-compatible transformers (FillNa, LocationEncoder)
│   ├── training/
│   │   ├── learners.py        # S/X/R-Learner and UpliftModel wrappers
│   │   └── metrics.py         # Qini coefficient, Uplift@K, Optuna TrialLogger
│   ├── campaign_flow.py       # Prefect flow: load → extract → transform → score → export
│   ├── model_utils.py         # ModelKeeper - model + column list bundle with MLflow support
│   └── utils.py               # I/O helpers
│
├── notebooks/
│   └── train_model.ipynb      # Exploratory training notebook (full model comparison)
│
├── train.py                   # CLI training script (replicates the notebook end-to-end for programmatic usage)
├── run_campaign.py            # CLI scoring script (Prefect flow entry point)
└── requirements.txt
```

Quick Start

1. Install dependencies

```shell
python -m pip install -r requirements.txt
```

2. Train the model

Runs Optuna tuning across all 11 models, selects the best, and saves serving artifacts to artifacts/.

```shell
# Full training run (all 11 models)
python train.py

# Fast iteration - train a specific subset
python train.py --models slearner-lgb xlearner-lgb rlearner-lgb

# CPU-only
python train.py --device cpu

# Custom trial budget
python train.py --n-trials-fast 30 --n-trials-medium 20 --n-trials-slow 15
```

3. Run the campaign scoring pipeline

Loads data, extracts features, scores all 2M customers, and outputs a submission CSV with the targeted customer IDs.

# One-shot run with defaults
python run_campaign.py

# Override the feature cut-off date (integer day-number in this dataset)
python run_campaign.py --date-to 95

# Custom output path
python run_campaign.py -o runs/my_submission.csv

# Register as a long-running Prefect deployment
python run_campaign.py --serve

Output: runs/submission.csv - a single-column CSV of customer_id values.

4. Inspect results (optional)

```shell
# Start the MLflow UI to browse all training and scoring runs
mlflow ui --port 5000
```

Then open http://127.0.0.1:5000.


Feature Engineering

119 features are computed per customer from three raw tables (customers, receipts, campaigns):

| Feature group | Description |
| --- | --- |
| Receipt aggregates | Sum, mean, max, min, std of purchase amount and value over 7 time windows (7 / 15 / 30 / 60 / 90 / 180 / 365 days). Includes transaction count, mean inter-purchase interval, and recency within each window. |
| Global recency | Days since the customer's last purchase before the feature date. |
| Purchase trend | Ratio of short-window spend to long-window spend across 5 window pairs - captures whether the customer is accelerating or decelerating. |
| Demographics | Customer age and encoded city/location. |
| Campaign history | Number of past campaigns the customer appeared in; binary flag for prior treatment group membership. |
| Day-of-week | Share of purchases on each pseudo day-of-week (date mod 7), mode day, and weekend purchase share. |
| City cheque | Customer's average cheque relative to their city's average. |
| Loyalty | Purchase frequency (unique days / lifespan), spend per active day, composite loyalty score. |

After feature extraction, a preprocessing pipeline (FillNaTransformer + LocationEncoder) fills missing values and one-hot encodes location, expanding the set to 125 features.
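A minimal equivalent of this step can be built from stock scikit-learn pieces, with SimpleImputer and OneHotEncoder as stand-ins for the repo's FillNaTransformer and LocationEncoder (toy data, illustrative column names):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Toy frame with a missing numeric value and a categorical location.
df = pd.DataFrame({
    "spend_30d": [120.0, np.nan, 80.0],
    "recency": [3, 11, 27],
    "location": ["moscow", "kazan", "moscow"],
})

pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["spend_30d", "recency"]),
    ("loc", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])
out = pre.fit_transform(df)
print(out.shape)  # (3, 4): two numeric columns + two one-hot locations
```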


Results

These results were obtained on the bundled dataset and primarily demonstrate that the pipeline works end-to-end; the actual numbers will differ on your data.

On the holdout evaluation set (campaign date = 101):

| Metric | Value |
| --- | --- |
| Customers scored | 2,000,000 |
| Customers selected (CATE > 0) | 867,292 (43.4%) |
| Mean CATE of selected customers | +$7.99 |
| Mean CATE overall | −$1.87 |

The negative overall mean CATE confirms that blanket outreach destroys value - targeting only positive-uplift customers is critical for campaign profitability.


Configuration

configs/system.json - environment-level settings:

```json
{
  "database": { "root_path": "data" },
  "artifacts_root_path": "artifacts",
  "mlflow": {
    "tracking_uri": "http://127.0.0.1:5000",
    "experiment_name": "smart-reach-uplift"
  }
}
```

configs/campaign.json - campaign-level settings:

```json
{
  "date_to": "101",
  "extract": "serving_extract_config.json",
  "transform": [...],
  "selection": {
    "score_column": "uplift_score",
    "threshold": 0.0
  }
}
```

Note on dates: The dataset uses integer day-numbers (0, 1, 2, …) rather than calendar dates. The value 101 corresponds to the campaign launch day; features are computed from all history strictly before that day.
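A rough sketch of how the selection block might be applied at the end of scoring - toy scores standing in for the 2M-customer output, not the actual Prefect flow code:

```python
import json
import pandas as pd

cfg = json.loads("""
{"date_to": "101",
 "selection": {"score_column": "uplift_score", "threshold": 0.0}}
""")

# Toy scored frame; in the real flow this is the 2M-row scoring output.
scores = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "uplift_score": [0.8, -0.2, 0.0, 1.5],
})

sel = cfg["selection"]
targeted = scores.loc[scores[sel["score_column"]] > sel["threshold"],
                      "customer_id"]
print(list(targeted))  # [1, 4]
```

The strict `>` comparison matches the "CATE > 0" selection rule: a customer with exactly zero predicted uplift is not contacted.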
