Pull request overview
This PR adds deterministic sorting of concatenated datasets by shot and time to ensure reproducible outputs when processing shots in parallel. The changes address a subtle but important issue where multi-processed workflows with shuffled shot order would produce different dataset ordering, making dataset comparisons fail even when the data content was identical.
Changes:
- Added sorting by shot and time after concatenating datasets in DatasetOutputSetting.concat()
- Enhanced logging to show concatenation timing and sorting performance
- Updated debug logging format to use thousand separators for shot counts
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
yumouwei
left a comment
I've never thought about this, but yes I'd prefer to have shot & time sorted.
ZanderKeith
left a comment
I like having this option and the implementation looks good.
rationale
although for most purposes this should not be an issue, it's better to sort idx-indexed datasets by shot and time.
if the datasets were indexed by physics-based coordinates, eg:
then downstream comparisons would be handled by xarray itself, eg:
but with idx-indexed datasets this fails and requires sorting -- after all, idx is left un-coordinated because it's unphysical.
a local full-scale run on the C-MOD database shows that the new sorting requirement is hardly taxing on the workflow:
as this drastically simplifies downstream comparison of idx-indexed datasets, I'd switch it on by default.
reviewers, please take some time to understand and try this out.
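to make the rationale above concrete, here is a minimal sketch of why row order matters for idx-indexed datasets. it is not the code from this PR: the helper names (make_dataset, sort_by_shot_and_time) and the ip variable are hypothetical, and it assumes shot and time are stored as plain data variables along an un-coordinated idx dimension.

```python
import numpy as np
import xarray as xr


def make_dataset(shots, times, values):
    """Build a toy dataset on an un-coordinated `idx` dimension (hypothetical layout)."""
    return xr.Dataset(
        {
            "shot": ("idx", np.asarray(shots)),
            "time": ("idx", np.asarray(times)),
            "ip": ("idx", np.asarray(values)),  # hypothetical signal name
        }
    )


def sort_by_shot_and_time(ds):
    """Reorder rows along `idx` by shot first, then by time."""
    # np.lexsort uses the LAST key as the primary sort key.
    order = np.lexsort((ds["time"].values, ds["shot"].values))
    return ds.isel(idx=order)


# Same rows, but the shots were concatenated in a different order.
run_a = make_dataset([1, 1, 2, 2], [0.1, 0.2, 0.1, 0.2], [10.0, 11.0, 20.0, 21.0])
run_b = make_dataset([2, 2, 1, 1], [0.1, 0.2, 0.1, 0.2], [20.0, 21.0, 10.0, 11.0])

# Dataset.equals compares values position by position, so order matters.
unsorted_equal = run_a.equals(run_b)
sorted_equal = sort_by_shot_and_time(run_a).equals(sort_by_shot_and_time(run_b))

print(unsorted_equal, sorted_equal)
```

since idx carries no coordinate, xarray has nothing to align on, so the comparison only succeeds once both datasets are put into the same (shot, time) order.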
test
simple test to execute two iterations with multi-processed shuffled shots.
result on dev: garbled!
vs
comparison:
result on this branch: same!
comparison:
True
thoughts?