Commit d58eac6

Merge pull request #473 from PolicyEngine/health-insurance-premiums
Add health insurance premiums to local area calibration, a matrix builder function, and Modal model fitting
2 parents c351735 + e4e449d commit d58eac6

18 files changed

Lines changed: 739 additions & 304 deletions

.github/workflows/reusable_test.yaml

Lines changed: 1 addition & 2 deletions
@@ -65,8 +65,7 @@ jobs:
         run: |
           modal run modal_app/data_build.py \
             ${{ inputs.upload_data && '--upload' || '--no-upload' }} \
-            --branch=${{ github.head_ref || github.ref_name }} \
-            ${{ inputs.upload_data && '--no-test-lite' || '--test-lite' }}
+            --branch=${{ github.head_ref || github.ref_name }}
 
       - name: Install package
         run: uv sync --dev

.github/workflows/versioning.yaml

Lines changed: 4 additions & 0 deletions
@@ -23,8 +23,12 @@ jobs:
         uses: actions/setup-python@v5
         with:
           python-version: 3.12
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
       - name: Build changelog
         run: pip install yaml-changelog && make changelog
+      - name: Update lockfile
+        run: uv lock
       - name: Preview changelog update
         run: ".github/get-changelog-diff.sh"
       - name: Update changelog

Makefile

Lines changed: 1 addition & 8 deletions
@@ -1,4 +1,4 @@
-.PHONY: all format test install download upload docker documentation data data-local-area publish-local-area clean build paper clean-paper presentations
+.PHONY: all format test install download upload docker documentation data publish-local-area clean build paper clean-paper presentations
 
 all: data test
 
@@ -71,13 +71,6 @@ data: download
 	python policyengine_us_data/datasets/cps/extended_cps.py
 	python policyengine_us_data/datasets/cps/enhanced_cps.py
 	python policyengine_us_data/datasets/cps/small_enhanced_cps.py
-	mv policyengine_us_data/storage/enhanced_cps_2024.h5 policyengine_us_data/storage/dense_enhanced_cps_2024.h5
-	cp policyengine_us_data/storage/sparse_enhanced_cps_2024.h5 policyengine_us_data/storage/enhanced_cps_2024.h5
-
-data-local-area: data
-	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/cps.py
-	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/puf/puf.py
-	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/extended_cps.py
 	python policyengine_us_data/datasets/cps/local_area_calibration/create_stratified_cps.py 10500
 
 publish-local-area:

changelog_entry.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+- bump: minor
+  changes:
+    added:
+    - Support for health_insurance_premiums_without_medicare_part_b in local area calibration
+    changed:
+    - Removed dense reweighting path from enhanced CPS; only sparse (L0) weights are produced
+    - Eliminated TEST_LITE and LOCAL_AREA_CALIBRATION flags; all datasets generated unconditionally
+    - Merged data-local-area Makefile target into data target
+    removed:
+    - Redundant test_sparse_matrix_builder.py (tests consolidated in test_matrix_national_variation.py)
+    - Redundant build_calibration_matrix.py (functionality in fit_calibration_weights.py)
+    fixed:
+    - Versioning workflow now runs uv lock after version bump to keep uv.lock in sync

docs/local_area_calibration_setup.ipynb

Lines changed: 2 additions & 2 deletions
@@ -459,10 +459,10 @@
     "print(\"Remember, this is a North Carolina target:\\n\")\n",
     "print(targets_df.iloc[row_loc])\n",
     "\n",
-    "print(\"\\nHousehold donated to NC's 2nd district, 2023 SNAP dollars:\")\n",
+    "print(\"\\nNC State target. Household donated to NC's 2nd district, 2023 SNAP dollars:\")\n",
     "print(X_sparse[row_loc, positions['3702']]) # Household donated to NC's 2nd district\n",
     "\n",
-    "print(\"\\nHousehold donated to NC's 2nd district, 2023 SNAP dollars:\")\n",
+    "print(\"\\nSame target, same household, donated to AK's at Large district, 2023 SNAP dollars:\")\n",
     "print(X_sparse[row_loc, positions['201']]) # Household donated to AK's at Large District"
    ]
   },

modal_app/README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Modal App for GPU Weight Fitting
+
+Run calibration weight fitting on Modal's cloud GPUs.
+
+## Prerequisites
+
+- [Modal](https://modal.com/) account and CLI installed (`pip install modal`)
+- `modal token new` to authenticate
+- HuggingFace token stored as Modal secret named `huggingface-token`
+
+## Usage
+
+```bash
+modal run modal_app/remote_calibration_runner.py --branch <branch> --epochs <n> --gpu <type>
+```
+
+### Arguments
+
+| Argument | Default | Description |
+|----------|---------|-------------|
+| `--branch` | `main` | Git branch to clone and run |
+| `--epochs` | `200` | Number of training epochs |
+| `--gpu` | `T4` | GPU type: `T4`, `A10`, `A100-40GB`, `A100-80GB`, `H100` |
+| `--output` | `calibration_weights.npy` | Local path for weights file |
+| `--log-output` | `calibration_log.csv` | Local path for calibration log |
+
+### Example
+
+```bash
+modal run modal_app/remote_calibration_runner.py --branch health-insurance-premiums --epochs 100 --gpu T4
+```
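Editorial sketch (not part of the commit): the invocation documented above could be assembled programmatically, for example from a sweep script. The flag names, defaults, and GPU choices come from the README's argument table; the helper `build_modal_command` and the constant `ALLOWED_GPUS` are hypothetical names introduced only for illustration.

```python
import shlex

# GPU types listed in the README's argument table.
ALLOWED_GPUS = ("T4", "A10", "A100-40GB", "A100-80GB", "H100")


def build_modal_command(
    branch="main",
    epochs=200,
    gpu="T4",
    output="calibration_weights.npy",
    log_output="calibration_log.csv",
):
    """Return the argv list for the documented `modal run` invocation.

    Hypothetical helper: it only mirrors the documented flags and defaults.
    """
    if gpu not in ALLOWED_GPUS:
        raise ValueError(f"Unsupported GPU type {gpu!r}; expected one of {ALLOWED_GPUS}")
    if epochs < 1:
        raise ValueError("epochs must be a positive integer")
    return [
        "modal", "run", "modal_app/remote_calibration_runner.py",
        "--branch", branch,
        "--epochs", str(epochs),
        "--gpu", gpu,
        "--output", output,
        "--log-output", log_output,
    ]


command = build_modal_command(branch="health-insurance-premiums", epochs=100)
print(shlex.join(command))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(command, check=True)` would avoid shell quoting issues; whether the runner accepts every flag in this exact form should be checked against `remote_calibration_runner.py` itself.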
+
+## Output Files
+
+- **calibration_weights.npy** - Fitted household weights
+- **calibration_log.csv** - Per-target performance metrics across epochs (target_name, estimate, target, epoch, error, rel_error, abs_error, rel_abs_error, loss)
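Editorial sketch (not part of the commit): one way to inspect the calibration log is to rank targets by `rel_abs_error` at the last logged epoch. The column names are taken from the list above. The function name, the sample rows, and the assumption that each row holds one target at one epoch are illustrative guesses, not confirmed by the diff.

```python
import csv
import io


def worst_targets_at_final_epoch(log_file, top_n=5):
    """Return up to top_n (target_name, rel_abs_error) pairs from the last epoch.

    Assumes one row per (target, epoch), with the columns named in the README.
    """
    rows = list(csv.DictReader(log_file))
    if not rows:
        return []
    final_epoch = max(float(row["epoch"]) for row in rows)
    final_rows = [row for row in rows if float(row["epoch"]) == final_epoch]
    ranked = sorted(
        final_rows, key=lambda row: float(row["rel_abs_error"]), reverse=True
    )
    return [(row["target_name"], float(row["rel_abs_error"])) for row in ranked[:top_n]]


# In-memory stand-in for calibration_log.csv; all values are made up.
sample_log = io.StringIO(
    "target_name,estimate,target,epoch,error,rel_error,abs_error,rel_abs_error,loss\n"
    "snap_nc,90,100,0,-10,-0.10,10,0.10,0.02\n"
    "premiums_us,150,100,0,50,0.50,50,0.50,0.02\n"
    "snap_nc,98,100,10,-2,-0.02,2,0.02,0.01\n"
    "premiums_us,105,100,10,5,0.05,5,0.05,0.01\n"
)
worst = worst_targets_at_final_epoch(sample_log, top_n=1)
print(worst)
```

For a real run, replace the `io.StringIO` object with `open("calibration_log.csv", newline="")`.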
+
+## Changing Hyperparameters
+
+Hyperparameters are in `policyengine_us_data/datasets/cps/local_area_calibration/fit_calibration_weights.py`:
+
+```python
+BETA = 0.35
+GAMMA = -0.1
+ZETA = 1.1
+INIT_KEEP_PROB = 0.999
+LOG_WEIGHT_JITTER_SD = 0.05
+LOG_ALPHA_JITTER_SD = 0.01
+LAMBDA_L0 = 1e-8
+LAMBDA_L2 = 1e-8
+LEARNING_RATE = 0.15
+```
+
+To change them:
+1. Edit `fit_calibration_weights.py`
+2. Commit and push to your branch
+3. Re-run the Modal command with that branch
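Editorial sketch (not part of the commit): the names `BETA`, `GAMMA`, `ZETA`, and `LAMBDA_L0` resemble the hard-concrete gate parameterization used for L0 regularization (Louizos, Welling, and Kingma). The diff does not show how `fit_calibration_weights.py` uses them, so the code below is an assumption about their roles, written with the standard formulas rather than the project's own implementation. All function names are hypothetical.

```python
import math
import random

# Values copied from the README; their interpretation here is an assumption.
BETA = 0.35    # temperature of the concrete distribution
GAMMA = -0.1   # lower stretch limit (below 0, so gates can reach exactly 0)
ZETA = 1.1     # upper stretch limit (above 1, so gates can reach exactly 1)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha, rng):
    """Draw one stochastic hard-concrete gate in [0, 1] (training-time form)."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


def deterministic_gate(log_alpha):
    """Noise-free gate commonly used at evaluation time."""
    stretched = sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


def prob_gate_nonzero(log_alpha):
    """P(gate != 0); summed over households this is the expected L0 penalty."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


rng = random.Random(0)
for log_alpha in (-10.0, 0.0, 10.0):
    print(
        log_alpha,
        deterministic_gate(log_alpha),
        prob_gate_nonzero(log_alpha),
        sample_gate(log_alpha, rng),
    )
```

Under this reading, a household's effective weight would be its raw weight multiplied by the gate, and `LAMBDA_L0` would scale the summed `prob_gate_nonzero` term in the loss. Confirm against the source file before relying on it.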
+
+## Important Notes
+
+- **Keep your connection open** - Modal needs to stay connected to download results. Don't close your laptop or let it sleep until you see the local "Weights saved to:" and "Calibration log saved to:" messages.
+- Modal clones from GitHub, so local changes must be pushed before they take effect.

modal_app/data_build.py

Lines changed: 4 additions & 45 deletions
@@ -38,7 +38,6 @@ def setup_gcp_credentials():
 def build_datasets(
     upload: bool = False,
     branch: str = "main",
-    test_lite: bool = False,
 ):
     setup_gcp_credentials()
 

@@ -49,8 +48,6 @@ def build_datasets(
     subprocess.run(["uv", "sync", "--locked"], check=True)
 
     env = os.environ.copy()
-    if test_lite:
-        env["TEST_LITE"] = "true"
 
     # Download prerequisites
     subprocess.run(
@@ -79,44 +76,8 @@ def build_datasets(
         print(f"Running {script}...")
         subprocess.run(["uv", "run", "python", script], check=True, env=env)
 
-    os.rename(
-        "policyengine_us_data/storage/enhanced_cps_2024.h5",
-        "policyengine_us_data/storage/dense_enhanced_cps_2024.h5",
-    )
-    subprocess.run(
-        [
-            "cp",
-            "policyengine_us_data/storage/sparse_enhanced_cps_2024.h5",
-            "policyengine_us_data/storage/enhanced_cps_2024.h5",
-        ],
-        check=True,
-    )
-
-    # Build local area calibration datasets (without TEST_LITE - must match full dataset)
-    print("Building local area calibration datasets...")
-    local_area_env = os.environ.copy()
-    local_area_env["LOCAL_AREA_CALIBRATION"] = "true"
-
-    subprocess.run(
-        ["uv", "run", "python", "policyengine_us_data/datasets/cps/cps.py"],
-        check=True,
-        env=local_area_env,
-    )
-    subprocess.run(
-        ["uv", "run", "python", "policyengine_us_data/datasets/puf/puf.py"],
-        check=True,
-        env=local_area_env,
-    )
-    subprocess.run(
-        [
-            "uv",
-            "run",
-            "python",
-            "policyengine_us_data/datasets/cps/extended_cps.py",
-        ],
-        check=True,
-        env=local_area_env,
-    )
+    # Build stratified CPS for local area calibration
+    print("Running create_stratified_cps.py...")
     subprocess.run(
         [
             "uv",
@@ -126,7 +87,7 @@ def build_datasets(
             "10500",
         ],
         check=True,
-        env=local_area_env,
+        env=env,
     )
 
     # Run local area calibration tests
@@ -140,7 +101,7 @@ def build_datasets(
             "-v",
         ],
         check=True,
-        env=local_area_env,
+        env=env,
     )
 
     # Run main test suite
@@ -167,11 +128,9 @@ def build_datasets(
 def main(
     upload: bool = False,
     branch: str = "main",
-    test_lite: bool = False,
 ):
     result = build_datasets.remote(
         upload=upload,
         branch=branch,
-        test_lite=test_lite,
     )
     print(result)
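Editorial sketch (not part of the commit): after this change every build and test step in `build_datasets` receives the same `env` copy instead of a separate `local_area_env`. The pattern can be reduced to the following self-contained form. `run_steps` is a hypothetical helper, and the two placeholder commands stand in for the real `uv run python <script>` calls.

```python
import os
import subprocess
import sys


def run_steps(commands, env=None):
    """Run each command in order with one shared environment.

    check=True raises CalledProcessError on the first failing step, so later
    steps never run against a partially built dataset.
    """
    shared_env = os.environ.copy() if env is None else dict(env)
    completed = []
    for command in commands:
        print(f"Running {' '.join(command)}...")
        subprocess.run(command, check=True, env=shared_env)
        completed.append(command)
    return completed


# Placeholder steps; the real pipeline runs dataset scripts and pytest.
steps = [
    [sys.executable, "-c", "print('build step')"],
    [sys.executable, "-c", "print('test step')"],
]
done = run_steps(steps)
```

Keeping a single environment means no step can silently depend on a flag such as `TEST_LITE` or `LOCAL_AREA_CALIBRATION` that the other steps never see.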
