Blend entity values on would_file draws; fix entity weights #611

Open
baogorek wants to merge 5 commits into main from fix-would-file-blend-and-entity-weights
@baogorek
Collaborator

Summary

  • Matrix builder: For tax_unit targets, precompute a second set of entity values with would_file_taxes_voluntarily=False. In the clone worker, compute the would_file draws first, blend the two branches per tax unit, then apply the target's own takeup draw. This ensures X@w matches sim.calculate for targets affected by non-target "state" variables. Fixes #609 (Matrix builder: blend entity values based on would_file draws).
  • publish_local_area: Remove the incorrect sub-entity weight overrides (tax_unit_weight, spm_unit_weight, family_weight, marital_unit_weight, person_weight) that used a wrong person-count-splitting formula. These are formula variables in policyengine-us that correctly derive from household_weight at runtime. Fixes #610 (Local area H5: remove incorrect sub-entity weight overrides).

Context

Eight of the nine takeup variables are "gate" variables: they sit between eligibility and the benefit, so eligible_amount × draw works. The ninth (would_file_taxes_voluntarily) is a "state" variable: it changes upstream simulation state (is_filer) that other targets depend on. You can't post-multiply a state change; you have to pre-branch it.
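The gate/state distinction can be sketched with a toy example. This is not the matrix builder code: the function name `blend_then_gate`, the list-based layout, and every number below are assumptions made up for illustration. It only shows why a state draw has to select between two precomputed branches before the target's own gate draw is applied.

```python
# Toy per-tax-unit values; all numbers are invented for illustration.
# Each list holds one target's value computed with
# would_file_taxes_voluntarily forced to True or to False.
value_if_files = [1200.0, 800.0, 0.0, 500.0]
value_if_not = [0.0, 800.0, 0.0, 0.0]

would_file_draw = [True, False, True, False]  # "state" draw, resolved first
takeup_draw = [True, True, False, True]       # the target's own "gate" draw


def blend_then_gate(if_files, if_not, would_file, takeup):
    """Select a branch per tax unit, then apply the gate as a 0/1 multiplier."""
    blended = [a if wf else b for a, b, wf in zip(if_files, if_not, would_file)]
    return [v * float(t) for v, t in zip(blended, takeup)]


result = blend_then_gate(
    value_if_files, value_if_not, would_file_draw, takeup_draw
)

# Gate-only treatment: multiplies the all-True branch by the takeup draw and
# ignores the would_file draw entirely.
gate_only = [v * float(t) for v, t in zip(value_if_files, takeup_draw)]

print(result)     # [1200.0, 800.0, 0.0, 0.0]
print(gate_only)  # [1200.0, 800.0, 0.0, 500.0]
```

The two results differ only for the last tax unit, whose value depends on filing and whose would_file draw is False. That is the case a post-multiplied gate cannot represent.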

The entity weight bug caused sim.calculate("aca_ptc").sum() (weighted by tax_unit_weight) to differ from sim.calculate("aca_ptc", map_to="household").sum() (weighted by household_weight) in local area H5 files.
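A small sketch of why the two sums should agree, assuming each tax unit inherits the weight of its household. The data, the helper names, and in particular `split_weight` are hypothetical; `split_weight` is one plausible person-count-splitting rule chosen for illustration, not the formula that was removed.

```python
# Toy data: two households, three tax units. All numbers are invented.
households = {
    "hh1": {"weight": 100.0, "persons": 4},
    "hh2": {"weight": 250.0, "persons": 1},
}
tax_units = [
    {"household": "hh1", "persons": 3, "aca_ptc": 2000.0},
    {"household": "hh1", "persons": 1, "aca_ptc": 500.0},
    {"household": "hh2", "persons": 1, "aca_ptc": 1000.0},
]


def household_level_total():
    """Sum tax-unit values up to households, then weight by household_weight."""
    totals = {hh: 0.0 for hh in households}
    for tu in tax_units:
        totals[tu["household"]] += tu["aca_ptc"]
    return sum(totals[hh] * households[hh]["weight"] for hh in households)


def tax_unit_level_total(weight_rule):
    """Weight each tax unit's value directly with the given weight rule."""
    return sum(tu["aca_ptc"] * weight_rule(tu) for tu in tax_units)


def inherited_weight(tu):
    # Assumption: the tax unit takes its household's weight unchanged.
    return households[tu["household"]]["weight"]


def split_weight(tu):
    # Hypothetical splitting rule: share the household weight by person count.
    hh = households[tu["household"]]
    return hh["weight"] * tu["persons"] / hh["persons"]


print(household_level_total())                 # 500000.0
print(tax_unit_level_total(inherited_weight))  # 500000.0
print(tax_unit_level_total(split_weight))      # 412500.0
```

With inherited weights the tax-unit and household totals coincide. Any rule that splits the household weight across sub-entities undercounts households that contain more than one tax unit.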

Verification

With a 2K household / 10 clone test dataset for South Carolina:

  • X@w for aca_ptc across 7 SC districts: 145.4M
  • sim.calculate("aca_ptc", map_to="household").sum() from SC.h5: 145.4M (exact match)
  • sim.calculate("aca_ptc").sum() (tax_unit level): 145.4M (now matches after weight fix)

Test plan

  • pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py — 42 passed
  • X@w matches sim.calculate on SC test dataset
  • Tax-unit and household level weighted sums agree after weight fix

🤖 Generated with Claude Code

baogorek and others added 2 commits March 16, 2026 13:54
Matrix builder: precompute entity values with would_file=False alongside
the all-True values, then blend per tax unit based on the would_file draw
before applying target takeup draws. This ensures X@w matches sim.calculate
for targets affected by non-target state variables.

Fixes #609

publish_local_area: remove explicit sub-entity weight overrides
(tax_unit_weight, spm_unit_weight, family_weight, marital_unit_weight,
person_weight) that used incorrect person-count splitting. These are
formula variables in policyengine-us that correctly derive from
household_weight at runtime.

Fixes #610

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace block-based RNG salting with (hh_id, clone_idx) salting.
Draws are now tied to the donor household identity and independent
across clones, eliminating the multi-clone-same-block collision
issue (#597). Geographic variation comes through the rate threshold,
not the draw.

Closes #597

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
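One way to picture (hh_id, clone_idx) salting is the sketch below. It is an assumption-laden illustration rather than the repository's implementation: the function names, the SHA-256 seed derivation, and the use of Python's `random` module are all choices made here for a self-contained example.

```python
import hashlib
import random


def takeup_draw(variable, hh_id, clone_idx):
    """Return a uniform draw in [0, 1) determined by (variable, hh_id, clone_idx)."""
    key = f"{variable}:{hh_id}:{clone_idx}".encode()
    # Derive a 64-bit seed from the salt so the draw is reproducible.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed).random()


def takes_up(variable, hh_id, clone_idx, rate):
    """Compare the geography-independent draw against a takeup rate."""
    return takeup_draw(variable, hh_id, clone_idx) < rate


# Same household and clone give the same draw wherever the clone is placed.
d0 = takeup_draw("would_file_taxes_voluntarily", 12345, 0)
d0_again = takeup_draw("would_file_taxes_voluntarily", 12345, 0)

# A different clone of the same donor household gets its own draw.
d1 = takeup_draw("would_file_taxes_voluntarily", 12345, 1)
```

Because the salt contains no block or other geographic identifier, two clones that land in the same block no longer share a draw. Geography enters only through the `rate` argument passed to `takes_up`.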
@baogorek baogorek force-pushed the fix-would-file-blend-and-entity-weights branch from 7578ba2 to 310bb73 Compare March 16, 2026 17:54
baogorek and others added 3 commits March 16, 2026 16:08
County precomputation crashes on LA County (06037) because
aca_ptc → slcsp_rating_area_la_county → three_digit_zip_code
calls zip_code.astype(int) on 'UNKNOWN'. Set zip_code='90001'
for LA County in both precomputation and publish_local_area
so X @ w matches sim.calculate("aca_ptc").sum().

Fixes #612

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The zip_code set for LA County (06037) was being wiped by
delete_arrays which only preserved "county". Also apply the
06037 zip_code fix to the in-process county precomputation
path (not just the parallel worker function).

Fixes #612

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The only county-dependent variable (aca_ptc) does not depend on
would_file_taxes_voluntarily, so the entity_wf_false pass was
computing identical values. Removing it eliminates ~2,977 extra
simulation passes during --county-level builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>