Blend entity values on would_file draws; fix entity weights #611
Open
Matrix builder: precompute entity values with would_file=False alongside the all-True values, then blend per tax unit based on the would_file draw before applying target takeup draws. This ensures X@w matches sim.calculate for targets affected by non-target state variables.
Fixes #609
publish_local_area: remove explicit sub-entity weight overrides (tax_unit_weight, spm_unit_weight, family_weight, marital_unit_weight, person_weight) that used incorrect person-count splitting. These are formula variables in policyengine-us that correctly derive from household_weight at runtime.
Fixes #610
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
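The blend step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the matrix builder's actual code: the function name `blend_on_would_file` and all array names are hypothetical, and the inputs are made-up numbers. It assumes two precomputed per-tax-unit value arrays (one simulated with would_file=True, one with would_file=False), a boolean would_file draw, and a 0/1 takeup draw for the target itself.

```python
import numpy as np


def blend_on_would_file(values_wf_true, values_wf_false, would_file_draw, takeup_draw=None):
    """Select each tax unit's value from the branch matching its would_file draw,
    then apply the target's own takeup draw as a multiplicative gate."""
    values_wf_true = np.asarray(values_wf_true, dtype=float)
    values_wf_false = np.asarray(values_wf_false, dtype=float)
    would_file_draw = np.asarray(would_file_draw, dtype=bool)

    # "State" variable: choose between the two precomputed simulation branches.
    blended = np.where(would_file_draw, values_wf_true, values_wf_false)

    # "Gate" variable: a 0/1 draw that can simply be post-multiplied.
    if takeup_draw is not None:
        blended = blended * np.asarray(takeup_draw, dtype=float)
    return blended


# Hypothetical values for three tax units.
vals_true = np.array([1200.0, 0.0, 800.0])   # computed with would_file=True
vals_false = np.array([0.0, 0.0, 300.0])     # computed with would_file=False
would_file = np.array([True, False, False])  # per-tax-unit would_file draw
takeup = np.array([1, 1, 0])                 # the target's own takeup draw

blended = blend_on_would_file(vals_true, vals_false, would_file, takeup)
```

The third tax unit shows why the order matters: its value comes from the would_file=False branch (300) and is then zeroed by its own takeup draw, which a single multiplication against the all-True value (800) could not reproduce.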
Replace block-based RNG salting with (hh_id, clone_idx) salting. Draws are now tied to the donor household identity and independent across clones, eliminating the multi-clone-same-block collision issue (#597). Geographic variation comes through the rate threshold, not the draw.
Closes #597
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
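One way to picture (hh_id, clone_idx) salting is the sketch below. It is an assumption-laden illustration rather than the repository's implementation: the helper names `seeded_draw` and `takes_up`, the SHA-256 key format, and the example rates are all hypothetical. The point it shows is that the uniform draw depends only on the variable name, donor household, and clone index, while geography enters solely through the rate the draw is compared against.

```python
import hashlib

import numpy as np


def seeded_draw(variable, hh_id, clone_idx):
    """Return a uniform draw in [0, 1) determined by (variable, hh_id, clone_idx)."""
    # Hypothetical salt format; any stable encoding of the three fields would do.
    key = f"{variable}:{hh_id}:{clone_idx}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "little")
    return np.random.default_rng(seed).random()


def takes_up(variable, hh_id, clone_idx, rate):
    """Compare the household's fixed draw against a (possibly geography-specific) rate."""
    return seeded_draw(variable, hh_id, clone_idx) < rate


# The same household and clone always get the same draw, wherever the clone is placed.
draw_a = seeded_draw("takes_up_aca_if_eligible", hh_id=1234, clone_idx=0)
draw_b = seeded_draw("takes_up_aca_if_eligible", hh_id=1234, clone_idx=0)

# A different clone of the same household gets an independent draw.
draw_c = seeded_draw("takes_up_aca_if_eligible", hh_id=1234, clone_idx=1)
```

Because the draw no longer depends on the census block, two clones assigned to the same block cannot collide on an identical draw.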
7578ba2 to 310bb73
County precomputation crashes on LA County (06037) because
aca_ptc → slcsp_rating_area_la_county → three_digit_zip_code
calls zip_code.astype(int) on 'UNKNOWN'. Set zip_code='90001'
for LA County in both precomputation and publish_local_area
so X @ w matches sim.calculate("aca_ptc").sum().
Fixes #612
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
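The failure mode and the workaround can be reproduced in isolation. The sketch below is not the policyengine-us formula: `three_digit_prefix` is a hypothetical stand-in that assumes the three-digit ZIP is obtained by integer-casting the ZIP string and dropping the last two digits, and `with_la_zip` is a hypothetical helper for the override.

```python
import numpy as np

LA_COUNTY_FIPS = "06037"
FALLBACK_ZIP = "90001"  # the ZIP code this commit assigns to LA County


def with_la_zip(county_fips, zip_code):
    """Replace the ZIP code of LA County records with the fallback ZIP."""
    county_fips = np.asarray(county_fips, dtype=str)
    zip_code = np.asarray(zip_code, dtype=str)
    return np.where(county_fips == LA_COUNTY_FIPS, FALLBACK_ZIP, zip_code)


def three_digit_prefix(zip_code):
    """Hypothetical stand-in: integer-cast the ZIP and keep its first three digits."""
    return np.asarray(zip_code, dtype=str).astype(int) // 100


county = np.array(["06037", "06037"])
raw_zip = np.array(["UNKNOWN", "UNKNOWN"])

# Casting the placeholder string to int raises ValueError, which is the crash.
try:
    three_digit_prefix(raw_zip)
    crashed = False
except ValueError:
    crashed = True

# With the override applied first, the cast succeeds.
prefixes = three_digit_prefix(with_la_zip(county, raw_zip))
```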
The zip_code set for LA County (06037) was being wiped by delete_arrays, which only preserved "county". Also apply the 06037 zip_code fix to the in-process county precomputation path (not just the parallel worker function).
Fixes #612
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The only county-dependent variable (aca_ptc) does not depend on would_file_taxes_voluntarily, so the entity_wf_false pass was computing identical values. Removing it eliminates ~2,977 extra simulation passes during --county-level builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Matrix builder: precompute entity values with would_file_taxes_voluntarily=False for tax_unit targets. In the clone worker, compute would_file draws first, blend between the two branches, then apply the target's own takeup draw. This ensures X@w matches sim.calculate for targets affected by non-target "state" variables. Fixes #609 (Matrix builder: blend entity values based on would_file draws).
- publish_local_area: remove explicit sub-entity weight overrides (tax_unit_weight, spm_unit_weight, family_weight, marital_unit_weight, person_weight) that used a wrong person-count-splitting formula. These are formula variables in policyengine-us that correctly derive from household_weight at runtime. Fixes #610 (Local area H5: remove incorrect sub-entity weight overrides).

Context
8 of 9 takeup variables are "gate" variables: they sit between eligibility and the benefit, so eligible_amount × draw works. The 9th (would_file_taxes_voluntarily) is a "state" variable: it changes upstream simulation state (is_filer) that other targets depend on. You can't post-multiply a state change; you have to pre-branch it.

The entity weight bug caused sim.calculate("aca_ptc").sum() (weighted by tax_unit_weight) to differ from sim.calculate("aca_ptc", map_to="household").sum() (weighted by household_weight) in local area H5 files.

Verification
With a 2K household / 10 clone test dataset for South Carolina:
- sim.calculate("aca_ptc", map_to="household").sum() from SC.h5: 145.4M (exact match)
- sim.calculate("aca_ptc").sum() (tax_unit level): 145.4M (now matches after weight fix)

Test plan
- pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py: 42 passed

🤖 Generated with Claude Code
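The entity weight mismatch can be illustrated with a toy example. Everything here is hypothetical: the numbers are invented, and the "split" formula (household weight scaled by each tax unit's share of household members) is only an assumed form of the person-count splitting the PR removes. The sketch shows that giving each tax unit its household's weight makes the tax-unit-level and household-level totals agree, while a split weight does not.

```python
import numpy as np

# Two households; the first contains two tax units, the second contains one.
household_weight = np.array([100.0, 50.0])
tax_unit_household = np.array([0, 0, 1])  # household index of each tax unit
tax_unit_people = np.array([3, 1, 2])     # persons in each tax unit
aca_ptc = np.array([1000.0, 400.0, 600.0])  # hypothetical tax-unit-level values

# Deriving the sub-entity weight from the household: every tax unit
# carries its household's weight.
tu_weight_derived = household_weight[tax_unit_household]

# Assumed form of the bug: split the household weight by person share.
people_per_household = np.bincount(tax_unit_household, weights=tax_unit_people)
tu_weight_split = (
    tu_weight_derived * tax_unit_people / people_per_household[tax_unit_household]
)

# Household-level total: sum tax unit values within each household, then weight.
household_values = np.bincount(tax_unit_household, weights=aca_ptc)
household_total = float((household_values * household_weight).sum())

# Tax-unit-level totals under each weighting.
tax_unit_total_derived = float((aca_ptc * tu_weight_derived).sum())
tax_unit_total_split = float((aca_ptc * tu_weight_split).sum())
```

In this toy case the household-level and derived tax-unit totals are both 170,000, while the split weights give 115,000.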