Fix uplift tree p-value NaN from division by zero (#585) #882
jeongyoonlee merged 2 commits into master
Pull request overview
This PR fixes issue #585 where UpliftTreeClassifier produced NaN p-values due to division by zero in the Wald test formula when a tree node had zero treatment or control observations (n_t=0 or n_c=0), or when variance was zero (p=0 or p=1 for all observations in a group). The fix returns p_value=1.0 (maximally non-significant) in these degenerate cases.
Changes:
- Guards the p-value computation against division by zero (`n_t=0` or `n_c=0`) and zero variance, returning `p_value=1.0` as a conservative fallback
- Modifies `.gitignore` to narrow the `.claude/` ignore pattern and add `.worktrees/` entries (appears unrelated to the fix)
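The guard described above can be sketched in plain Python. This is a minimal illustration, not the code in `uplift.pyx`: the helper name `wald_p_value` and its signature are hypothetical, and the two-sided normal approximation is an assumption about how the Wald test is applied.

```python
import math


def wald_p_value(p_t, n_t, p_c, n_c):
    """Two-sided Wald p-value for a difference in proportions.

    Hypothetical helper for illustration only. p_t and p_c are the
    outcome rates in the treatment and control groups; n_t and n_c are
    the group sizes in the node.
    """
    # Degenerate case 1: an empty group would divide by zero below.
    if n_t <= 0 or n_c <= 0:
        return 1.0

    variance = p_t * (1.0 - p_t) / n_t + p_c * (1.0 - p_c) / n_c

    # Degenerate case 2: every observation in each group has the same
    # outcome (p = 0 or p = 1), so the variance estimate is zero.
    if variance <= 0.0:
        return 1.0

    z = abs(p_t - p_c) / math.sqrt(variance)
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))
```

With this sketch, `wald_p_value(0.5, 0, 0.3, 10)` and `wald_p_value(1.0, 5, 0.0, 5)` both return the fallback `1.0` instead of propagating NaN, while non-degenerate inputs go through the usual formula.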
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| causalml/inference/tree/uplift.pyx | Adds guards around the p-value Wald test computation to prevent NaN from division by zero or zero variance |
| .gitignore | Changes .claude/ ignore to .claude/.worktrees/ and adds duplicate .worktrees/ entries |
Addressed the review comment: added |
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated no new comments.
When a tree node has zero treatment or control observations (n_t=0 or n_c=0), the p-value variance formula divides by zero, producing NaN. Also guards against zero variance when all observations in a group have the same outcome (p=0 or p=1). Returns p_value=1.0 (maximally non-significant) in these degenerate cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test verifies predictions don't contain NaN when tree nodes have zero treatment or control observations (min_samples_treatment=0, heavily imbalanced groups).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
144e6b6 to a4d8e79
Summary
- When a tree node has zero treatment or control observations (`n_t=0` or `n_c=0`), the p-value variance formula `p_t*(1-p_t)/n_t + p_c*(1-p_c)/n_c` divides by zero, producing NaN
- Also guards against zero variance (`p=0` or `p=1` for all observations in a group)
- Returns `p_value=1.0` (maximally non-significant) in these degenerate cases, which is the correct statistical interpretation: no evidence of treatment effect with no observations

Test plan
- `pytest tests/test_uplift_trees.py`: 23 passed

🤖 Generated with Claude Code