We already have some tests for data preprocessing. However, those are integration tests that capture the behaviour of the tool as a whole rather than unit tests for specific functions.
In order to test the different preprocessing functionalities efficiently, we need to add some smaller-scale unit tests. These should not rely on real data, but on sample input values that can be generated from scratch.
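As a minimal sketch of what such a small-scale unit test could look like: the function under test here (`pad_ragged`) is a hypothetical stand-in, not the actual implementation, but it mirrors the kind of ragged-row padding a collator has to perform, and the input is hand-written sample data rather than real data.

```python
def pad_ragged(rows, fill=0):
    # Hypothetical helper (not the real implementation): pad
    # variable-length token lists to the length of the longest row.
    width = max(len(r) for r in rows)
    return [r + [fill] * (width - len(r)) for r in rows]


def test_pads_to_longest_row():
    # Sample input generated from scratch -- no real dataset needed.
    rows = [[1, 2, 3], [4], [5, 6]]
    assert pad_ragged(rows) == [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```

Tests in this style run under plain `pytest` and need no fixtures or on-disk data.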
Here are the classes / functions that should be covered (from the implementation in the protein_prediction branch):
reader.py:
- DataReader:
to_data()
- ChemDataReader:
_read_data()
- DeepChemDataReader:
_read_data()
- SelfiesReader:
_read_data()
- ProteinDataReader:
_read_data()
collate.py:
- DefaultCollator:
__call__()
- RaggedCollator:
__call__(), process_label_rows()
datasets/base.py:
- XYBaseDataModule:
_filter_labels()
- DynamicDataset:
get_test_split(), get_train_val_splits_given_test()
datasets/chebi.py:
- _ChEBIDataExtractor:
_extract_class_hierarchy(), _graph_to_raw_dataset(), _load_dict(), _setup_pruned_test_set()
- ChEBIOverX:
select_classes()
- ChEBIOverXPartial:
extract_class_hierarchy()
term_callback()
datasets/go_uniprot.py:
- _GOUniprotDataExtractor:
_extract_class_hierarchy(), term_callback(), _graph_to_raw_dataset(), _get_swiss_to_go_mapping(), _load_dict()
- _GoUniProtOverX:
select_classes()
datasets/tox21.py:
- Tox21MolNet:
setup_processed(), _load_data_from_file()
- Tox21Challenge:
setup_processed(), _load_data_from_file(), _load_dict()
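For the split-related methods (get_test_split(), get_train_val_splits_given_test()), the properties worth asserting are easy to state without real data: the splits should be disjoint, cover all indices, and be deterministic for a fixed seed. A sketch, using an assumed helper rather than the actual method signatures:

```python
import random


def split_indices(n, test_frac=0.2, seed=0):
    # Hypothetical stand-in for a seeded train/test split:
    # shuffle deterministically, then cut off the test fraction.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * test_frac)
    return idx[cut:], idx[:cut]


def test_split_is_disjoint_and_complete():
    train, test = split_indices(10)
    assert len(train) == 8 and len(test) == 2
    assert set(train) & set(test) == set()
    assert set(train) | set(test) == set(range(10))


def test_split_is_deterministic():
    assert split_indices(10, seed=42) == split_indices(10, seed=42)
```

The same three assertions (disjoint, complete, deterministic) carry over directly to the real methods, whatever their exact signatures turn out to be.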
For some functions, it is necessary to read from / write to files. Instead of real files, I would suggest using mock objects (see e.g. this comment).
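For the file-reading methods, `unittest.mock.mock_open` lets a test feed in file contents as a string, so no fixture files are needed. The reader function below is a hypothetical line-based parser, not the actual `_read_data()` implementation; the point is the mocking pattern:

```python
from unittest import mock


def read_smiles_lines(path):
    # Hypothetical stand-in for a reader's _read_data():
    # one record per non-empty line of the input file.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def test_read_data_with_mocked_file():
    fake_contents = "CCO\nc1ccccc1\n"
    # Patch builtins.open so no file is touched on disk.
    with mock.patch("builtins.open", mock.mock_open(read_data=fake_contents)):
        assert read_smiles_lines("any/path.smi") == ["CCO", "c1ccccc1"]
```

For the writing side, the same patched handle can be inspected via `handle().write.call_args_list` to assert what would have been written.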