This issue can be closed with data samples in an internal repo where the current `nvbench_compare.py` behavior is insufficient or unstable.
The ideal would be two distributions from two benchmarking runs where the existing script reports them as different, but which are in fact the "same" because the algorithm itself did not change.
The intention is for this data to provide example distributions for testing new comparison logic that correctly tells us they are indeed the same distribution.