Technical issues in Tox21MolNet:
Issue 1: Missing "group" key
I've encountered an issue with the setup_processed method when working with Tox21MolNet and its data (the tox21.csv file). The file does not appear to include a column or key named "group", which causes a KeyError in the line:
groups = np.array([d["group"] for d in data])
Additionally, the _load_data_from_file method does not seem to use any Reader to create or handle a "group" key. As a result, the dictionaries produced by _load_data_from_file contain no "group" key, which leads to the observed error.
The _load_data_from_file method only yields three keys: features, labels, and ident:
yield dict(features=smiles, labels=labels, ident=row["mol_id"])
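One possible workaround is to use the "group" key only when every record actually has it and otherwise fall back to a plain random split. This assumes that tox21.csv simply carries no group information. `split_indices` is a hypothetical, standard-library-only helper written for illustration. It is not part of Tox21MolNet, and whether an ungrouped split is acceptable for this dataset is a decision for the maintainers.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_indices(
    data: Sequence[Dict], test_size: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train, test) index lists.

    Uses the "group" key only when every record has one; otherwise it falls
    back to a plain random split, which is what records from tox21.csv need.
    """
    rng = random.Random(seed)
    if data and all("group" in d for d in data):
        # Group-aware split: every record of a group lands on the same side.
        groups = sorted({d["group"] for d in data})
        rng.shuffle(groups)
        test_groups = set(groups[: round(len(groups) * test_size)])
        test = [i for i, d in enumerate(data) if d["group"] in test_groups]
    else:
        # No group information: shuffle the indices and cut off a test slice.
        indices = list(range(len(data)))
        rng.shuffle(indices)
        test = sorted(indices[: round(len(data) * test_size)])
    test_set = set(test)
    train = [i for i in range(len(data)) if i not in test_set]
    return train, test


# Records shaped like the output of _load_data_from_file (no "group" key).
records = [dict(features="C" * n, labels=[n % 2], ident=f"TOX{n}") for n in range(1, 9)]
train_idx, test_idx = split_indices(records, test_size=0.25)
```

Because the check is explicit, datasets that do provide groups keep their group-aware split, and Tox21 no longer hits the KeyError.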
Issue 2: Generator Issue with train_test_split
Another issue arises from the use of a generator in the _load_data_from_file method. The generator object cannot be directly passed to train_test_split, as it expects a collection (e.g., a list or array). This causes the following error:
TypeError: Singleton array array(<generator object Tox21MolNet._load_data_from_file at 0x000001FD068AB1B0>,
dtype=object) cannot be considered a valid collection.
Solution: To fix this, the generator output should be converted to a list before using it for splitting:
data = list(self._load_data_from_file(os.path.join(self.raw_dir, "tox21.csv")))
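The error message comes from scikit-learn's input validation. It shows the generator wrapped as a single object rather than iterated, so the splitter sees one "sample" and cannot determine a length. The minimal sketch below uses a stand-in generator (`load_records` is a hypothetical name, not the real loader) to illustrate why materializing the output is needed. A generator has no `len()` and is exhausted after one pass, whereas a list supports sizing, indexing, and repeated iteration.

```python
from typing import Dict, Iterator


def load_records() -> Iterator[Dict]:
    """Hypothetical stand-in for _load_data_from_file: yields one dict per CSV row."""
    rows = [("CCO", [0], "TOX1"), ("c1ccccc1", [1], "TOX2"), ("CC(=O)O", [0], "TOX3")]
    for smiles, labels, ident in rows:
        yield dict(features=smiles, labels=labels, ident=ident)


gen = load_records()
try:
    len(gen)  # generators do not implement __len__, so a splitter cannot size them
except TypeError as exc:
    print(f"generator rejected: {exc}")

first_pass = list(gen)
second_pass = list(gen)  # empty: a generator can only be consumed once

# The fix: materialize the records once and reuse the list for splitting.
data = list(load_records())
print(len(data), data[0]["ident"], data[-1]["ident"])
```

Converting to a list costs memory proportional to the dataset, which is unproblematic for a file the size of tox21.csv.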
Tests
- setup_processed() with mock data.
- The yielded dictionaries contain the features, labels, and ident keys, and the features can be converted to a tensor.
- _load_data_from_file() using mock file operations.
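The mock-file test and the key check could look roughly like the unittest sketch below, which patches builtins.open with unittest.mock.mock_open so that no real file is read. `load_tox21_csv`, the two-column `TASKS` subset, and the label parsing are hypothetical stand-ins for the real _load_data_from_file. The tensor-conversion check is left out because it depends on which Reader featurizes the SMILES strings.

```python
import csv
import unittest
from typing import Dict, Iterator
from unittest import mock

TASKS = ["NR-AR", "SR-p53"]  # hypothetical subset of the Tox21 label columns

FAKE_CSV = "NR-AR,SR-p53,mol_id,smiles\n0,1,TOX1,CCO\n,0,TOX2,c1ccccc1\n"


def load_tox21_csv(path: str) -> Iterator[Dict]:
    """Hypothetical loader mirroring the yield in _load_data_from_file."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Empty cells (missing labels) become None; "0"/"1" become bools.
            labels = [None if row[t] == "" else bool(int(row[t])) for t in TASKS]
            yield dict(features=row["smiles"], labels=labels, ident=row["mol_id"])


class TestLoadTox21Csv(unittest.TestCase):
    def _load(self):
        # list() must run inside the patch: the generator only opens the file lazily.
        with mock.patch("builtins.open", mock.mock_open(read_data=FAKE_CSV)):
            return list(load_tox21_csv("tox21.csv"))

    def test_yields_expected_keys(self):
        data = self._load()
        self.assertEqual(len(data), 2)
        for d in data:
            self.assertEqual(set(d), {"features", "labels", "ident"})
            self.assertIsInstance(d["features"], str)

    def test_parses_labels_and_ident(self):
        data = self._load()
        self.assertEqual(data[0]["labels"], [False, True])
        self.assertEqual(data[1]["labels"], [None, False])
        self.assertEqual(data[1]["ident"], "TOX2")

    def test_has_no_group_key(self):
        # Documents Issue 1: nothing in the loader produces "group".
        self.assertTrue(all("group" not in d for d in self._load()))
```

Saved in a module whose name starts with test, the case is picked up by `python -m unittest`.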