Add exp bfe data by jthorton · Pull Request #98 · OpenFreeEnergy/openfe-benchmarks

jthorton · 2026-02-06T14:48:39Z

Adds a JSON file to each folder with the experimental binding free energy data tagged with OpenFF units; this can be loaded with gufe.

I plan to add some helper functions in another PR to load the data into a DataFrame with units for the analysis scripts.
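
As a rough illustration of what such a helper might do, the sketch below flattens unit-tagged entries into plain rows. Only the `magnitude`, `uncertainty`, and `pint_unit_registry` keys are visible in this PR's diff; the `unit` key, the `dg` field name, the ligand names, the numeric values, and the function name `flatten_exp_data` are all assumptions made for illustration. The real files are meant to be loaded through gufe rather than parsed by hand.

```python
import json

# Hypothetical file content: names and values are invented for this sketch.
EXAMPLE_JSON = """
{
  "lig_a": {
    "dg": {"magnitude": -9.1, "unit": "kilocalorie_per_mole",
           "pint_unit_registry": "openff_units"}
  },
  "lig_b": {
    "dg": {"magnitude": -8.4, "unit": "kilocalorie_per_mole",
           "pint_unit_registry": "openff_units"},
    "uncertainty": {"magnitude": 0.2, "unit": "kilocalorie_per_mole",
                    "pint_unit_registry": "openff_units"}
  }
}
"""


def flatten_exp_data(raw: dict) -> list:
    """Turn {ligand: {field: quantity-dict}} into a list of flat row dicts."""
    rows = []
    for ligand, fields in sorted(raw.items()):
        row = {"ligand": ligand}
        for field, quantity in fields.items():
            # Keep the unit next to each value so it is never silently dropped.
            row[field] = quantity["magnitude"]
            row[f"{field}_unit"] = quantity["unit"]
        rows.append(row)
    return rows


rows = flatten_exp_data(json.loads(EXAMPLE_JSON))
```

A list of row dicts like this could then be passed to `pandas.DataFrame(rows)`; ligands without an uncertainty entry would simply have a missing value in that column.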

openfe_benchmarks/data/data_generation/generate_industry_ref_data.py

hannahbaumann · 2026-02-06T15:08:38Z

openfe_benchmarks/data/data_generation/generate_industry_ref_data.py

+    if system_group not in available_groups:
+        raise ValueError(f"System group {system_group} not found. Available groups: {available_groups}")
+    # check the system name
+    group_data = ref_dg_data[ref_dg_data["system group"] == system_group].reset_index(drop=True)


Would we also need to do the name conversions here, similar to the network script?

Do you mean the name of the output file?

Ah yes, good catch. I think we would need to do the reverse to make them match. Or would it make sense to go the other way and clean the ligand names to remove the spaces?

Do we still rely on the names nowadays? My understanding is that this is mostly fixed in openfe now. If so, it might be better to just have the original names from the original publications (with spaces if necessary)

Yes, that's how we are matching the reference data to the edges. We could store other identifiers like SMILES and InChIKey to help mitigate the effects of naming issues, though.
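
To make the fallback idea concrete, here is a minimal sketch of a lookup that tries the ligand name first and only then an InChIKey. The function name `find_reference`, the `inchikey` and `dg` field names, and the space-to-underscore normalisation are assumptions for illustration, not the behaviour of the script in this PR; the identifier strings are placeholders, not real InChIKeys.

```python
def find_reference(ligand_name, ligand_inchikey, reference):
    """Look up a reference entry by name, falling back to InChIKey."""
    # 1. Exact name match.
    if ligand_name in reference:
        return reference[ligand_name]

    # 2. Name match ignoring the space/underscore difference (assumed convention).
    normalised = {name.replace(" ", "_"): entry for name, entry in reference.items()}
    key = ligand_name.replace(" ", "_")
    if key in normalised:
        return normalised[key]

    # 3. Structure-derived identifier, only if one was supplied.
    if ligand_inchikey is not None:
        for entry in reference.values():
            if entry.get("inchikey") == ligand_inchikey:
                return entry

    raise KeyError(f"No reference entry found for ligand {ligand_name!r}")


# Hypothetical reference data with placeholder identifiers and made-up values.
reference = {
    "lig 1": {"inchikey": "PLACEHOLDER-KEY-1", "dg": -9.0},
    "lig 2": {"inchikey": "PLACEHOLDER-KEY-2", "dg": -8.0},
}

by_name = find_reference("lig_1", None, reference)
by_key = find_reference("renamed_ligand", "PLACEHOLDER-KEY-2", reference)
```

Trying the name first keeps the current matching behaviour, and the identifier fallback only matters when a name has been cleaned or changed between the publication and the network files.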

jaclark5

LGTM, but I'll leave it for @hannahbaumann to approve. My only concern is that the name experimental_binding_data.json is very general if additional data needs to be added in the future. We could handle that just like the network files, where the stem of the filename is the key to a dictionary. These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

After this is merged I can draft a PR with an attribute BenchmarkData.experimental_binding_data.

The data factory is now merged, so if you want to add a tag such as "exp_bfe" to the yaml, the CI will pick it up if you put the conditions for the tag in the header of test_benchmark_index.py, e.g. ("exp_bfe", ["experimental_binding_data_*.json"]), or whatever glob should be associated with the tag.
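
A small sketch of the two ideas in this comment: associating a tag with filename globs, and keying files by their stem. The `("exp_bfe", [...])` tuple is taken from the comment above; how test_benchmark_index.py actually evaluates such conditions is not shown in this PR, so `tags_for` and `by_stem` are hypothetical helpers and the matching rule (a tag applies if any file matches any of its patterns) is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Tag condition as suggested in the comment above.
TAG_CONDITIONS = [("exp_bfe", ["experimental_binding_data_*.json"])]


def tags_for(filenames, conditions=TAG_CONDITIONS):
    """Return the tags for which at least one filename matches a pattern."""
    tags = set()
    for tag, patterns in conditions:
        if any(fnmatchcase(name, pat) for name in filenames for pat in patterns):
            tags.add(tag)
    return tags


def by_stem(filenames):
    """Map each filename's stem (name without suffix) to the filename."""
    return {PurePosixPath(name).stem: name for name in filenames}


files = ["experimental_binding_data_schrodinger.json", "ligands.sdf"]
suffixed_tags = tags_for(files)
unsuffixed_tags = tags_for(["experimental_binding_data.json"])
stems = by_stem(files)
```

Under this matching rule the un-suffixed name experimental_binding_data.json does not match the glob experimental_binding_data_*.json, so the glob would need to be chosen to fit whichever naming scheme is adopted.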

IAlibay · 2026-02-07T00:54:10Z

These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

I'm not sure I understand - why would there be experimental data specifically from Schrodinger? All that Schrodinger did was gather the experimental data from published sources.

IAlibay · 2026-02-07T00:55:17Z

openfe_benchmarks/data/data_generation/generate_industry_ref_data.py

+    if system_group not in available_groups:
+        raise ValueError(f"System group {system_group} not found. Available groups: {available_groups}")
+    # check the system name
+    group_data = ref_dg_data[ref_dg_data["system group"] == system_group].reset_index(drop=True)


Do we still rely on the names nowadays? My understanding is that this is mostly fixed in openfe now. If so, it might be better to just have the original names from the original publications (with spaces if necessary)

IAlibay · 2026-02-07T00:57:13Z

.../data/industry_benchmark_systems/charge_annihilation_set/cdk2/experimental_binding_data.json

+            "pint_unit_registry": "openff_units"
+        },
+        "uncertainty": {
+            "magnitude": 0.0,


Are we happy asserting that near-zero floats mean "no experimental uncertainty", or would it be better to use -1 like David Hahn did in the PLB?

Or maybe even better, omit the dictionary entry if there's no uncertainty?

Good point, let's just remove the entry to avoid any confusion in future if we have no data for it.
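
The agreed convention could look roughly like the sketch below: the `uncertainty` key is written only when a value exists, and readers use `.get` rather than comparing floats against a sentinel. The helper names `quantity`, `make_entry`, and `get_uncertainty`, the `dg` field name, the `unit` key, and the unit string are assumptions for illustration; only `magnitude`, `uncertainty`, and `pint_unit_registry` appear in the diff.

```python
def quantity(magnitude, unit):
    """Build a unit-tagged dict in the style seen in the diff (assumed layout)."""
    return {"magnitude": magnitude, "unit": unit, "pint_unit_registry": "openff_units"}


def make_entry(dg, uncertainty=None, unit="kilocalorie_per_mole"):
    """Create a ligand entry, omitting 'uncertainty' when none was reported."""
    entry = {"dg": quantity(dg, unit)}
    if uncertainty is not None:
        entry["uncertainty"] = quantity(uncertainty, unit)
    return entry


def get_uncertainty(entry):
    """Return the uncertainty magnitude, or None if the entry has none."""
    unc = entry.get("uncertainty")
    return None if unc is None else unc["magnitude"]


# Made-up values for illustration only.
no_unc = make_entry(-9.1)
zero_unc = make_entry(-9.1, uncertainty=0.0)
```

With this layout a genuinely reported uncertainty of 0.0 stays distinguishable from "no data", which a 0.0 or -1 sentinel would not guarantee.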

jaclark5 · 2026-02-09T15:41:42Z

These files could be more specific if they were named something like experimental_binding_data_schrodinger.json.

I'm not sure I understand - why would there be experimental data specifically from Schrodinger? All that Schrodinger did was gather the experimental data from published sources.

@IAlibay Yes, that was my quick, naive suggestion to make the point that something more specific would allow additional data to follow a similar naming scheme.

However, @jthorton, are the DOIs or information on where this data came from recorded anywhere? If you don't want to reference the compiled dataset, it seems to me that you need to include the direct reference for the experimental data.

IAlibay · 2026-02-10T03:44:19Z

Adding some kind of metadata to the json files with provenance would be good - ideally we should point to e.g. the original paper and the Ross et al one.

jthorton · 2026-02-11T16:52:38Z

If we are happy with this format, I'll look into adding the DOI metadata for at least the Ross et al paper.

IAlibay · 2026-02-12T23:28:05Z

@jthorton format looks great to me!

IAlibay

From a quick look, this seems good to me - will defer to the others for final approval though.

add reference data for the bfe datasets

b511129

jthorton requested review from hannahbaumann and jaclark5 February 6, 2026 14:50

hannahbaumann reviewed Feb 6, 2026

openfe_benchmarks/data/data_generation/generate_industry_ref_data.py (outdated)

hannahbaumann reviewed Feb 6, 2026

openfe_benchmarks/data/data_generation/generate_industry_ref_data.py (outdated)

hannahbaumann reviewed Feb 6, 2026

Merge branch 'main' into bfe_refs

459da89

jaclark5 reviewed Feb 6, 2026

IAlibay reviewed Feb 7, 2026

jthorton and others added 4 commits February 11, 2026 14:35

Update openfe_benchmarks/data/data_generation/generate_industry_ref_d…

5ccb672

…ata.py Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>

Update openfe_benchmarks/data/data_generation/generate_industry_ref_d…

a0602bb

…ata.py Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>

Merge branch 'main' into bfe_refs

fd6a083

update script to rename ligands, add smiles and inchikeys to ref data

9818f16

add ref data

edfc3a4

jthorton added 2 commits February 13, 2026 10:29

fix tests

fc8b5ba

update benchmark data to include reference data

480283c

jthorton mentioned this pull request Feb 13, 2026

Add FreeSolv #104

Open

add reference doi to bfe exp data

fe2f90c

jthorton requested review from IAlibay, hannahbaumann and jaclark5 February 13, 2026 14:29

IAlibay approved these changes Feb 13, 2026

4 participants