Description:
Currently, the dynamic splitting code in chebi.py is duplicated in the proteins repo's data class. The two implementations are effectively the same, so the same logic is maintained in two places.
Proposed changes:
- Move the common code to a base class (e.g., DynamicDataset) that encapsulates the shared dynamic splitting logic.
  - Both the ChEBI and protein dataset classes should inherit from this base class.
  - This will centralize changes and make maintenance easier.
- Refactor the dataset hierarchy to be more generic:
  - Certain hyperparameters in XYBaseDataModule that are specific to ChEBI should be pushed down into a ChEBI-specific base class rather than living in a generic base.
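A rough sketch of what the proposed hierarchy could look like, to make the discussion concrete. Only the names DynamicDataset and XYBaseDataModule come from this issue; everything else (ChEBIBase, ProteinDataset, load_ids, dynamic_split, chebi_only_option, and the split fractions) is hypothetical. The seeded shuffle is just a placeholder for whatever the shared splitting logic really does, and XYBaseDataModule is reduced to a stand-in so the snippet runs on its own.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod


class XYBaseDataModule:
    """Stand-in for the generic base: only dataset-agnostic settings live here."""

    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size


class DynamicDataset(XYBaseDataModule, ABC):
    """Shared base that owns the dynamic splitting logic (sketch only)."""

    def __init__(self, train_frac: float = 0.8, val_frac: float = 0.1,
                 seed: int = 0, **kwargs):
        super().__init__(**kwargs)
        self.train_frac = train_frac
        self.val_frac = val_frac
        self.seed = seed

    @abstractmethod
    def load_ids(self) -> list[str]:
        """Return the identifiers of all samples (dataset-specific)."""

    def dynamic_split(self) -> dict[str, list[str]]:
        # Placeholder strategy: seeded shuffle, then cut into three parts.
        # The real shared logic would replace this method body.
        ids = sorted(self.load_ids())
        random.Random(self.seed).shuffle(ids)
        n_train = int(len(ids) * self.train_frac)
        n_val = int(len(ids) * self.val_frac)
        return {
            "train": ids[:n_train],
            "validation": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:],
        }


class ChEBIBase(DynamicDataset):
    """ChEBI-specific base: ChEBI-only hyperparameters are declared here."""

    def __init__(self, chebi_only_option: int = 0, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical name standing in for a ChEBI-specific hyperparameter.
        self.chebi_only_option = chebi_only_option

    def load_ids(self) -> list[str]:
        return [f"CHEBI:{i}" for i in range(10)]  # dummy data


class ProteinDataset(DynamicDataset):
    """Protein dataset: inherits the splitting, knows nothing about ChEBI."""

    def load_ids(self) -> list[str]:
        return [f"P{i:05d}" for i in range(10)]  # dummy data
```

With this layout, ChEBIBase(chebi_only_option=1, batch_size=16).dynamic_split() and ProteinDataset().dynamic_split() run the same splitting code, and a new dataset only has to implement load_ids.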
Outcome:
- Eliminate duplicate code between chebi.py and the proteins repo.
- Improve maintainability by isolating dataset-specific configurations.
- Make it easier to introduce new datasets without rewriting the splitting logic.