Skip to content

Add exp bfe data#98

Open
jthorton wants to merge 10 commits intomainfrom
bfe_refs
Open

Add exp bfe data#98
jthorton wants to merge 10 commits intomainfrom
bfe_refs

Conversation

@jthorton
Copy link
Contributor

@jthorton jthorton commented Feb 6, 2026

Adds a JSON to each folder with the experimental binding free energy data tagged with openff units, this can be loaded with gufe.

I plan on adding some helper functions in another PR to load the data into a DF with units for the analysis scripts.

if system_group not in available_groups:
raise ValueError(f"System group {system_group} not found. Available groups: {available_groups}")
# check the system name
group_data = ref_dg_data[ref_dg_data["system group"] == system_group].reset_index(drop=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we also need to do the name conversions here, similar to the network script?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the name of the output file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good catch I think we would need to do the reverse to make them match. Or would it make sense to go the other way and clean the ligand names to remove the spaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still rely on the names nowadays? My understanding is that this is mostly fixed in openfe now. If so, it might be better to just have the original names from the original publications (with spaces if necessary)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thats how we are matching the reference data to the edges, we could store other identifiers like smiles and inchikey to help mitigate the effects of naming issues though.

Copy link
Collaborator

@jaclark5 jaclark5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'll leave it for @hannahbaumann to approve. The only concern I have is the the names experimental_binding_data.json is really general if there happens to be additional data to add in the future. We could handle that just like the network files where the stem of the filename is the key to a dictionary. These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

After this is merges I can draft a PR with an attribute BenchmarkData.experimental_binding_data.

The data factory is now merged so if you want to add a tag to the yaml "exp_bfe" or something, the CI will pick it up if you put the conditions for the tags in the header of test_benchmark_index.py with ("exp_bfe", ["experimental_binding_data_*.json"]) or whatever glob should be associated with the tag.

@IAlibay
Copy link
Member

IAlibay commented Feb 7, 2026

These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

I'm not sure I understand - why would there be experimental data specificly from Schrodinger? All that Schrodinger did was gather the experimental data from published sources.

if system_group not in available_groups:
raise ValueError(f"System group {system_group} not found. Available groups: {available_groups}")
# check the system name
group_data = ref_dg_data[ref_dg_data["system group"] == system_group].reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still rely on the names nowadays? My understanding is that this is mostly fixed in openfe now. If so, it might be better to just have the original names from the original publications (with spaces if necessary)

"pint_unit_registry": "openff_units"
},
"uncertainty": {
"magnitude": 0.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy having to assert of near zero floats to be "no experimental uncertainty" or would it be better to use -1 like what David Hahn did in the PLB?

Or maybe even better, don't have the dictionary entry if there's not uncertainty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point lets just remove the entry to avoid any confusion in future if we have no data for it.

@jaclark5
Copy link
Collaborator

jaclark5 commented Feb 9, 2026

These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

I'm not sure I understand - why would there be experimental data specificly from Schrodinger? All that Schrodinger did was gather the experimental data from published sources.

@IAlibay Yes that my quick naive suggestion to make a point that something more specific would allows additional data to follow a similar naming scheme.

However @jthorton are the dois or information on where this data came from recorded anywhere? If you don't want to reference the compiled dataset it seems to me that you need to include the direct ref for the experimental data.

@IAlibay
Copy link
Member

IAlibay commented Feb 10, 2026

Adding some kind of metadata to the json files with provenance would be good - ideally we should point to e.g. the original paper and the Ross et al one.

jthorton and others added 4 commits February 11, 2026 14:35
…ata.py

Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
…ata.py

Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
@jthorton
Copy link
Contributor Author

If we are happy with this format, I'll look into adding the DOI metadata for at least the Ross et al paper.

@IAlibay
Copy link
Member

IAlibay commented Feb 12, 2026

@jthorton format looks great to me!

@jthorton jthorton mentioned this pull request Feb 13, 2026
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look, this seems good to me - will defer to the others for final approval though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants