Open
Description
Summary
Improve top-tail income representation in the enhanced CPS through two complementary approaches:
Phase 1: Include PUF aggregate records (this PR)
The IRS PUF contains 4 aggregate records (MARS=0) that bundle ultra-high-income filers to protect anonymity. The PUF pipeline has been dropping them (puf = puf[puf.MARS != 0]), discarding $140B+ in weighted AGI, mostly in the $10M+ bracket.
Changes:
- Assign demographics to aggregate records (filing status, age, gender) instead of filtering them out
- Inject high-income PUF records (AGI > $1M) directly into the ExtendedCPS dataset, giving the reweighter actual high-income observations
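To illustrate the first change, here is a minimal pandas sketch that contrasts the old filter with assigning demographics in place. The toy data, the column names `age`, `is_male`, `AGI`, and `weight`, the function name `assign_aggregate_demographics_sketch`, and the default values are all assumptions for illustration; the actual assignment rules in this PR may differ.

```python
import pandas as pd

# Toy PUF-like frame. Values are illustrative only, not real PUF data.
puf = pd.DataFrame({
    "MARS": [1, 2, 0, 0],  # 0 marks an aggregate record
    "age": [35.0, 52.0, float("nan"), float("nan")],
    "is_male": [True, False, None, None],
    "AGI": [50_000.0, 120_000.0, 30_000_000.0, 80_000_000.0],
    "weight": [1000.0, 800.0, 300.0, 150.0],
})


def assign_aggregate_demographics_sketch(df, mars=2, age=60.0, is_male=True):
    """Fill demographics on MARS=0 rows instead of dropping them.

    The defaults here are placeholders (an assumption), not the PR's values.
    """
    out = df.copy()
    is_aggregate = out["MARS"] == 0
    out.loc[is_aggregate, "MARS"] = mars
    out.loc[is_aggregate, "age"] = age
    out.loc[is_aggregate, "is_male"] = is_male
    return out


def weighted_agi(df):
    """Sum of AGI times record weight."""
    return float((df["AGI"] * df["weight"]).sum())


filtered = puf[puf.MARS != 0]                      # old behaviour: drop aggregates
kept = assign_aggregate_demographics_sketch(puf)   # new behaviour: keep them

# Weighted AGI that the old filter discarded in this toy example.
recovered = weighted_agi(kept) - weighted_agi(filtered)
print(f"Recovered weighted AGI: ${recovered:,.0f}")
```

In this toy frame the two aggregate rows carry almost all of the weighted AGI, which mirrors why dropping them hurts the top brackets.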
Phase 2: Forbes 400 synthetic records (future)
Add Forbes 400 records with wealth-to-income imputation for the extreme top tail.
Problem
The CPS has catastrophic under-representation at the top of the income distribution:
- $5M-$10M AGI bracket: -98.5% calibration error
- $10M+ AGI bracket: -95.1% calibration error
As a result, tax scoring for millionaires and billionaires is unreliable, and calibration weights are distorted as the reweighter tries to compensate.
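For intuition about what these percentages imply, the sketch below assumes the calibration error is the relative difference between the weighted estimate and the target, (estimate - target) / target. That definition and the function name `relative_error` are assumptions; the input numbers are hypothetical.

```python
def relative_error(estimate, target):
    """Relative calibration error: negative means the estimate falls short."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return (estimate - target) / target


# Hypothetical bracket: the dataset reproduces only 1.5 of a target of 100.
err = relative_error(estimate=1.5, target=100.0)
print(f"{err:.1%}")
```

Under this definition, an error of -98.5% means the dataset captures only about 1.5% of the targeted amount in that bracket.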
Key data findings
The 4 aggregate records contain:
- ~1,233 total weighted filers
- $140.3B weighted AGI ($152.9B in $10M+ bracket alone)
- Massive capital gains ($86.7B), dividends ($20.4B), partnership income ($11.2B)
- Each has XTOT=1 (single filer, not multiple bundled) with weights of 140-465
Approach
- assign_aggregate_demographics() assigns MARS, age, gender to MARS=0 records
- _inject_high_income_puf_records() appends PUF records with AGI > $1M to ExtendedCPS
- The reweighting optimizer adjusts weights to match SOI targets
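The injection step can be sketched roughly as follows, assuming both datasets are available as pandas DataFrames with an `AGI` column. The function name `inject_high_income_records_sketch`, the `source` tag column, and the toy data are hypothetical; the real `_inject_high_income_puf_records()` may work on a different data structure.

```python
import pandas as pd

AGI_THRESHOLD = 1_000_000  # the AGI > $1M cutoff described above


def inject_high_income_records_sketch(cps, puf, threshold=AGI_THRESHOLD):
    """Append PUF rows with AGI strictly above the threshold to the CPS rows."""
    high = puf[puf["AGI"] > threshold].copy()
    high["source"] = "puf_high_income"  # hypothetical tag for later inspection
    base = cps.copy()
    base["source"] = "cps"
    return pd.concat([base, high], ignore_index=True)


# Toy inputs, illustrative values only.
cps = pd.DataFrame({"AGI": [30_000.0, 75_000.0, 400_000.0],
                    "weight": [1500.0, 1200.0, 900.0]})
puf = pd.DataFrame({"AGI": [500_000.0, 1_000_000.0, 2_500_000.0, 40_000_000.0],
                    "weight": [700.0, 400.0, 250.0, 140.0]})

extended = inject_high_income_records_sketch(cps, puf)
print(extended["source"].value_counts())
```

Tagging the appended rows makes it easier to check afterwards how much weight the reweighter assigns to the injected observations.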
Verification needed
- Build ExtendedCPS with aggregate records included
- Compare calibration_log.csv before/after — $5M+ bracket errors should improve
- Run full EnhancedCPS build and verify reweighting convergence
- Score a millionaire tax reform before/after to validate revenue estimates
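The before/after comparison of calibration logs could be sketched as below. The column names `target_name` and `rel_error` are assumptions about the layout of calibration_log.csv, and the "after" values are made up for illustration; only the "before" values come from the errors quoted above.

```python
import csv
import io


def load_errors(csv_text):
    """Map each target name to its relative error (assumed column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["target_name"]: float(row["rel_error"]) for row in reader}


def compare_errors(before, after):
    """Return (name, before, after, improvement) for targets in both logs.

    Improvement is the reduction in absolute error; positive means better.
    """
    rows = []
    for name in sorted(before.keys() & after.keys()):
        improvement = abs(before[name]) - abs(after[name])
        rows.append((name, before[name], after[name], improvement))
    return rows


before_csv = "target_name,rel_error\nagi_5m_10m,-0.985\nagi_10m_plus,-0.951\n"
after_csv = "target_name,rel_error\nagi_5m_10m,-0.40\nagi_10m_plus,-0.30\n"  # made up

comparison = compare_errors(load_errors(before_csv), load_errors(after_csv))
for name, b, a, gain in comparison:
    print(f"{name}: {b:+.1%} -> {a:+.1%} (improvement {gain:.1%})")
```

With real logs, the CSV text would be read from the two calibration_log.csv files instead of inline strings.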