Hi everyone,
Our group at the Department of Informatics at King's College London, led by Dr. Sophia Tsoka (@sophiatsoka), has been revisiting this modelling challenge, and we have some questions about changes to the SMILES strings in the Master Chemical List.
Ruby (@yutongLi1997) downloaded the newest version of the master list and compared it with the earlier version I had from when I participated in Round #2 of the competition.
She noticed that the structures listed below differ between the two versions. My guess is that these compounds originally had the wrong SMILES and were revised more recently, but I couldn't locate the changes in the spreadsheet. Can anyone confirm this?
# OSM Codes:
['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']
#old_smiles:
['CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2',
'CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3',
'CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl',
'Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23',
'Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F',
'COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4']
#new_smiles:
['CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O',
'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O',
'CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O',
'ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1',
'FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F',
'COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4']
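For anyone who wants a quick first look at the pairs above without a cheminformatics toolkit, here is a minimal sketch that only compares heavy-atom counts between the old and new strings. The helper `heavy_atom_counts` and its regexes are our own illustration (not part of the master list or any library), and they assume the SMILES use only bracket atoms and the usual organic-subset symbols. This check is deliberately coarse: it cannot tell apart tautomers, positional isomers, or stereoisomers, so a proper comparison would still need canonical SMILES or InChI from a toolkit such as RDKit.

```python
import re
from collections import Counter

# One atom token: a bracket atom, a two-letter halogen, or a one-letter
# organic-subset atom (upper case = aliphatic, lower case = aromatic).
ATOM_TOKEN = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_ELEMENT = re.compile(r"\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element (hypothetical helper, coarse check only)."""
    counts = Counter()
    for match in ATOM_TOKEN.finditer(smiles):
        bracket, plain = match.groups()
        if bracket is not None:
            symbol = BRACKET_ELEMENT.match(bracket).group(1)
        else:
            symbol = plain
        symbol = symbol.capitalize()  # aromatic 'c' and aliphatic 'C' are both carbon
        if symbol != "H":
            counts[symbol] += 1
    return counts


osm_codes = ['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']
old_smiles = [
    'CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2',
    'CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3',
    'CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl',
    'Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23',
    'Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F',
    'COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4',
]
new_smiles = [
    'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O',
    'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O',
    'CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O',
    'ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1',
    'FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F',
    'COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4',
]

for code, old, new in zip(osm_codes, old_smiles, new_smiles):
    same = heavy_atom_counts(old) == heavy_atom_counts(new)
    print(code, "same heavy atoms" if same else "different heavy atoms")
```

On the six pairs listed, the heavy-atom counts match in every case, so whatever changed appears to be in how the structures are written (or in connectivity or stereochemistry) rather than in their elemental composition.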
PS: What we have been up to
- We are working on improving the accuracy of our algorithm (modSAR) and assessing its weaknesses and limitations while modelling OSM data.
- The model I trained in Round 2 did not predict the activity of the external test set well, even though the algorithm had performed well on datasets we had worked on previously. Switching from CDK molecular descriptors to the more widely used RDKit circular fingerprints has already improved the fit and accuracy of the model in general, but we are still validating these results.
- We are also planning to apply Shapley values to help explain activity and to debug model results.
- I uploaded a Jupyter notebook to our repository exploring the earlier version of the dataset. Here is the link if anyone is interested.