Pull request overview
This PR enhances reproducibility by dumping the configuration as JSON next to the output files. The changes filter out sensitive data (passwords), remove test-specific configuration when not running tests, and exclude the attributes structure that contains TOML-specific values (nan/inf) not compatible with JSON.
Changes:
- Added `filter_dict` utility function to recursively filter dictionary keys containing a specified substring
- Integrated configuration dumping in the main workflow to save sanitized config as JSON alongside outputs
- Added debug logging for the config dump location
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| disruption_py/workflow.py | Added imports and logic to dump sanitized configuration as JSON to temporary folder, filtering passwords and conditionally removing test data |
| disruption_py/core/utils/misc.py | Added filter_dict utility function to recursively remove dictionary keys containing a specified substring |
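A minimal sketch of what a recursive key filter such as `filter_dict` might look like. The signature and exact behavior are assumptions inferred from the description above, not the code added in this PR.

```python
def filter_dict(data: dict, substring: str) -> dict:
    """Return a copy of `data` without any key containing `substring`.

    Hypothetical re-implementation for illustration; nested dictionaries
    are filtered recursively, all other values are kept as-is.
    """
    filtered = {}
    for key, value in data.items():
        # Drop the whole entry if the key mentions the substring.
        if substring in str(key):
            continue
        # Recurse into nested dictionaries so deep keys are filtered too.
        if isinstance(value, dict):
            value = filter_dict(value, substring)
        filtered[key] = value
    return filtered
```

With this sketch, `filter_dict(config, "password")` would drop entries such as `db_password` at any nesting depth while leaving the input dictionary unmodified.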
yumouwei left a comment
I'm able to generate the config.json file with output_setting='dataset' or 'dataframe'. However, it currently does not work if I specify a file path (e.g. output_setting="<path_to_file>/temp.h5" or another format): only the dataset file is generated in the specified folder, not the config file.
Additionally, it would be ideal to name the config file something like <dataset_file_name>.json or <dataset_file_name>_config.json, so that a user can tell the config files apart when multiple dataset files are stored in the same folder.
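One way the suggested naming could be derived, as a sketch only. `config_path_for` is a hypothetical helper, not part of disruption_py, and it assumes the output setting is a path to a file with an extension.

```python
from pathlib import Path


def config_path_for(output_file: str, suffix: str = "_config.json") -> Path:
    """Build a config path next to the dataset file, reusing its base name.

    Hypothetical helper: "<dir>/temp.h5" becomes "<dir>/temp_config.json".
    """
    output = Path(output_file)
    # Keep the parent folder, swap the file name for "<stem><suffix>".
    return output.with_name(output.stem + suffix)
```

Passing `suffix=".json"` would give the shorter `<dataset_file_name>.json` variant instead.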
this should help for reproducibility. excluded from the dump:
- `tests` structure (if not testing),
- `attributes` structure (as `nan` and `inf` are supported by TOML but not by JSON, and they should be stored as metadata anyways),

in the future we might want to add this back in as xarray metadata, rather than a configuration dump.
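To illustrate the `nan`/`inf` point with the Python standard library: `json.dumps` by default emits the non-standard tokens `NaN` and `Infinity`, and raises `ValueError` when asked for strict output via `allow_nan=False`. The `config` dictionary and the `is_strict_json` helper below are made up for this sketch.

```python
import json
import math

# Made-up values standing in for TOML attributes that use nan/inf.
config = {"threshold": math.nan, "limit": math.inf}

# Default behavior: serializes, but the result is not valid strict JSON.
lenient = json.dumps(config)


def is_strict_json(obj) -> bool:
    """Return True if `obj` serializes without NaN/Infinity tokens."""
    try:
        json.dumps(obj, allow_nan=False)
    except ValueError:
        # Raised for nan, inf and -inf when allow_nan is False.
        return False
    return True
```

Here `lenient` contains `NaN` and `Infinity`, which strict JSON parsers reject, and `is_strict_json(config)` returns `False`.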