Dump configuration next to the output#522

Open
gtrevisan wants to merge 7 commits into dev from glt/config

@gtrevisan
Member

this should help for reproducibility:

  • dump configuration next to the outputs as JSON,
  • drop tests structure (if not testing),
  • drop attributes structure (as nan and inf are supported by TOML but not by JSON, and they should be stored as metadata anyways),
  • add a debug statement.

in the future we might want to add this back in as xarray metadata, rather than a configuration dump.
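As a rough illustration of the list above, a minimal sketch of such a dump might look like this. The helper name `dump_config`, the top-level key names `tests` and `attributes`, and the `testing` flag are assumptions made for the example, not the code in this PR. Python's `json.dumps` writes the non-standard tokens `NaN` and `Infinity` by default, so the sketch passes `allow_nan=False` to fail loudly instead.

```python
import json
from pathlib import Path


def dump_config(config: dict, folder: Path, testing: bool = False) -> Path:
    """Write a JSON copy of the configuration next to the outputs.

    Hypothetical helper: key names and signature are assumptions.
    """
    # always drop the attributes structure (it may hold nan/inf),
    # and drop the tests structure unless we are testing
    drop = {"attributes"} if testing else {"attributes", "tests"}
    clean = {key: value for key, value in config.items() if key not in drop}

    path = folder / "config.json"
    # allow_nan=False raises ValueError on nan/inf rather than
    # silently writing tokens that are not valid JSON
    path.write_text(json.dumps(clean, indent=2, allow_nan=False))
    return path
```

With `allow_nan=False`, any nan or inf left elsewhere in the configuration surfaces as a `ValueError` at dump time rather than as a file that strict JSON parsers reject.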

Contributor

Copilot AI left a comment

Pull request overview

This PR enhances reproducibility by dumping the configuration as JSON next to the output files. The changes filter out sensitive data (passwords), remove test-specific configuration when not running tests, and exclude the attributes structure that contains TOML-specific values (nan/inf) not compatible with JSON.

Changes:

  • Added filter_dict utility function to recursively filter dictionary keys containing a specified substring
  • Integrated configuration dumping in the main workflow to save sanitized config as JSON alongside outputs
  • Added debug logging for the config dump location
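For readers without the diff at hand, a recursive key filter of the kind described could be sketched as follows. Only the name `filter_dict` comes from the summary above; the signature, argument names, and the choice to return a new dictionary rather than modify the input are assumptions, and the actual implementation in `disruption_py/core/utils/misc.py` may differ.

```python
def filter_dict(data: dict, substring: str) -> dict:
    """Return a copy of `data` without keys that contain `substring`.

    Sketch only: the real function's signature is not shown in this thread.
    """
    return {
        # recurse into nested dictionaries, keep other values as they are
        key: filter_dict(value, substring) if isinstance(value, dict) else value
        for key, value in data.items()
        # str() guards against non-string keys
        if substring not in str(key)
    }
```

For example, `filter_dict(config, "password")` would remove both a nested `password` entry and a top-level `password_file` entry, since the match is on substrings of the key rather than on exact names.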

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File: Description
disruption_py/workflow.py: Added imports and logic to dump sanitized configuration as JSON to temporary folder, filtering passwords and conditionally removing test data
disruption_py/core/utils/misc.py: Added filter_dict utility function to recursively remove dictionary keys containing a specified substring

@gtrevisan gtrevisan requested a review from zapatace February 12, 2026 21:52
Contributor

@yumouwei yumouwei left a comment

I'm able to generate the config.json file with output_setting='dataset' or 'dataframe'. However, it currently does not work if I specify a file path (e.g. output_setting="<path_to_file>/temp.h5" or another format); it generates only the dataset file, not the config file, in the specified folder.

Additionally, it would be ideal to name the config file something like <dataset_file_name>.json or <dataset_file_name>_config.json, so that users can tell the config files apart when multiple dataset files are stored in the same folder.
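One way the suggested naming could work, as a sketch only: derive the config path from the dataset path with `pathlib`. The helper name `config_path_for` is hypothetical, and the `_config.json` suffix is just one of the two options mentioned above.

```python
from pathlib import Path


def config_path_for(output_file: str) -> Path:
    """Place the config dump beside the dataset file, named after it.

    Hypothetical helper, not part of this PR.
    """
    out = Path(output_file)
    # temp.h5 -> temp_config.json, in the same folder as the dataset
    return out.with_name(f"{out.stem}_config.json")
```

This keeps each config file paired with its dataset, so several outputs can share a folder without overwriting one another's configuration.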
