Skip to content

Comments

Increase several test tolerances to ensure tests pass on more computers#411

Merged
martinholmer merged 2 commits intomasterfrom
test-tolerances
Feb 21, 2026
Merged

Increase several test tolerances to ensure tests pass on more computers#411
martinholmer merged 2 commits intomasterfrom
test-tolerances

Conversation

@martinholmer
Copy link
Collaborator

@martinholmer martinholmer commented Feb 20, 2026

Even after the substantial reweighting improvements in PR #407, the reweighting results are not exactly the same on all computers. The changes in this pull request increase some of the numpy.allclose tolerances so that the tests also pass when the reweighting is done on Apple M4 CPU chips.

@donboyd5 donboyd5 changed the title Increase sevral test tolerances to ensure tests pass on more computers Increase several test tolerances to ensure tests pass on more computers Feb 20, 2026
@donboyd5
Copy link
Collaborator

Not surprisingly, it passes all tests on my gpu. I will not be able to run it on my cpu until tomorrow. However, I neglected to run test on my cpu before doing my final pr. I got the following errors last night. While it's not a problem for me because I will use gpu, it suggests that others with different cpu setups might get slightly different results. It might make sense to expand bounds slightly for these variables. I hope to have a chance to look for ways to tighten up cross-machine differences further but it will be awhile before I can do that.

I can send cpu results tomorrow if you want.

Personally I think it would be fine to merge as or with additional tolerance increases to deal with the cases below.

E           WEIGHT_DIFF:sdev,act,exp= 961.6711634863782 961.7270821801824; **needs expansion**

E           ! YR,KIND,EST= 2023 paytax 1381.9 actual; ok within .01
E           ! YR,KIND,EST= 2023 paytax 1381.8 expected

E           ! YR,KIND,EST= 2023 qbid 52.7 actual; **needs .01**
E           ! YR,KIND,EST= 2023 qbid 52.6 expected

E           DIFF:OTM,totben,act,exp= 23.94 23.95 ok within .01
E           DIFF:OTM,affpct,act,exp= 8.89 8.9 **needs .01**
E           DIFF:ALL,totben,act,exp= 54.85 54.86 needs **.01**

@donboyd5
Copy link
Collaborator

@martinholmer, I ended up getting a chance to run it cpu-only on my machine. I had one test failure, below. It looks like loosening the tolerance on OTM,affpct to .01 would get all to pass on my machine cpu-only. Eventually, if others run it we may find additional small differences. But maybe by then I'll have a PR to get it to solve with more precision in reasonable time, so that looser tolerances won't be necessary.

E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS:
E           DIFF:OTM,affpct,act,exp,atol= 8.89 8.9 1e-08

@martinholmer
Copy link
Collaborator Author

@donboyd5 reported these seemingly contradictory CPU-based test differences an hour apart:

E           WEIGHT_DIFF:sdev,act,exp= 961.6711634863782 961.7270821801824; **needs expansion**

E           ! YR,KIND,EST= 2023 paytax 1381.9 actual; ok within .01
E           ! YR,KIND,EST= 2023 paytax 1381.8 expected

E           ! YR,KIND,EST= 2023 qbid 52.7 actual; **needs .01**
E           ! YR,KIND,EST= 2023 qbid 52.6 expected

E           DIFF:OTM,totben,act,exp= 23.94 23.95 ok within .01
E           DIFF:OTM,affpct,act,exp= 8.89 8.9 **needs .01**
E           DIFF:ALL,totben,act,exp= 54.85 54.86 needs **.01**

and then

E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS:
E           DIFF:OTM,affpct,act,exp,atol= 8.89 8.9 1e-08

@donboyd5, I completely confused. Why the difference in these two sets of results?

@donboyd5
Copy link
Collaborator

donboyd5 commented Feb 21, 2026

@martinholmer, sorry for the confusion. The 2nd one is the correct one.

After updating to commit 0f9dc5d all non-skipped tests pass when running on cpu-only.

@martinholmer martinholmer merged commit a39e6fe into master Feb 21, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants