
Changes in SMILES code in the Master Chemical List? #31

@jonjoncardoso


Hi everyone,

Our group at the Department of Informatics at King's College London, led by Dr. Sophia Tsoka (@sophiatsoka), has been revisiting this modelling challenge, and we have some questions about changes to the SMILES codes in the Master Chemical List.

Ruby (@yutongLi1997) downloaded the latest version of the Master Chemical List and compared it with the version I had from when I took part in Round 2 of the competition.

She noticed that the structures listed below are slightly different in the new version. My guess is that these compounds originally had incorrect SMILES that were revised more recently, but I couldn't find any record of the changes in the spreadsheet. Can anyone confirm this?

# OSM Codes:
['OSM-S-82', 'OSM-S-88', 'OSM-S-89', 'OSM-S-351', 'OSM-S-546', 'OSM-S-631']

# old_smiles:
 ['CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2',
 'CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3',
 'CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl',
 'Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23',
 'Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F',
 'COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4']

# new_smiles:
['CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O',
 'CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O',
 'CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O',
 'ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1',
 'FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F',
 'COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4']
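A quick way to tell real structural revisions apart from different ways of writing the same compound is to compare RDKit canonical SMILES, once as written and once after tautomer canonicalization. The sketch below assumes a reasonably recent RDKit release that provides `rdMolStandardize.TautomerEnumerator`. The helper names `canonical` and `classify_change` are hypothetical. Reading the strings by hand, OSM-S-351 appears to be a genuine change (2,3-dichlorophenyl in the old SMILES, 2,4-dichlorophenyl in the new one). The other five appear to differ only in tautomeric form: the amides of OSM-S-82, -88 and -89 are written as imidic acids (`C(=NC)O`), and the NH of the indazole (OSM-S-546) and of the 1,2,4-triazole (OSM-S-631) sits on a different ring nitrogen. A toolkit check like this is a safer way to confirm that reading.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Old (Round 2) and new SMILES from the lists above, keyed by OSM code.
PAIRS = {
    "OSM-S-82": ("CNC(=O)COC(=O)c1cc(C)n(c1C)c2ccc(F)cc2",
                 "CC1=CC(=C(C)[N]1C2=CC=C(C=C2)F)C(=O)OCC(=NC)O"),
    "OSM-S-88": ("CNC(=O)CN1CCC(CC1)NCc2cc(C)n(c2C)c3ccc(Cl)cc3",
                 "CC1=CC(=C(C)[N]1C2=CC=C(C=C2)Cl)CNC3CCN(CC3)CC(=NC)O"),
    "OSM-S-89": ("CCNC(=O)[C@@H]1C[C@@H](N)CN1Cc2cc(C)n(c2C)c3ccccc3Cl",
                 "CCN=C([C@@H]1C[C@H](CN1CC2=C(C)[N](C(=C2)C)C3=CC=CC=C3Cl)N)O"),
    "OSM-S-351": ("Clc1cccc(c1Cl)c2nnc3cncc(OCCc4ccccc4)n23",
                  "ClC1=CC(Cl)=C(C2=NN=C3C=NC=C(N32)OCCC4=CC=CC=C4)C=C1"),
    "OSM-S-546": ("Fc1ccc(CCOc2cncc3nnc(c4ccc5c[nH]nc5c4)n23)cc1F",
                  "FC1=CC(CCOC2=CN=CC3=NN=C(C4=CC=C5C(NN=C5)=C4)N32)=CC=C1F"),
    "OSM-S-631": ("COc1ccc(cc1)c2n[nH]c(n2)c3nccn3CCc4ccccc4",
                  "COC(C=C1)=CC=C1C2=NN=C(N2)C3=NC=CN3CCC4=CC=CC=C4"),
}

# One enumerator reused for every molecule. Its "canonical" tautomer is an
# RDKit convention, not a statement about the dominant tautomer in solution.
_ENUMERATOR = rdMolStandardize.TautomerEnumerator()


def canonical(smiles: str, tautomer: bool = False) -> str:
    """Return RDKit canonical SMILES, optionally after tautomer canonicalization."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles}")
    if tautomer:
        mol = _ENUMERATOR.Canonicalize(mol)
    return Chem.MolToSmiles(mol)


def classify_change(old: str, new: str) -> str:
    """Hypothetical helper: label how an old/new SMILES pair differs."""
    if canonical(old) == canonical(new):
        return "same structure"
    if canonical(old, tautomer=True) == canonical(new, tautomer=True):
        return "tautomer only"
    return "different structure"


if __name__ == "__main__":
    for code, (old, new) in PAIRS.items():
        print(f"{code}: {classify_change(old, new)}")
```

Tautomer canonicalization only moves hydrogens and bond orders, so a pair that still differs after it (as OSM-S-351 should) has different heavy-atom connectivity or stereochemistry.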

PS: What we have been up to

  • We are working on improving the accuracy of our algorithm, modSAR, and on assessing its weaknesses and limitations using the OSM data.

  • The model I trained for Round 2 did not predict the activity of the external test set well, even though the algorithm had performed well on datasets we had worked on previously. Switching from CDK molecular descriptors to the more widely used RDKit circular fingerprints has already improved the fit and overall accuracy of the model, but we are still validating these results.

  • We also plan to use Shapley values to help explain activity and to debug the model's results.

  • I uploaded a Jupyter notebook to our repository that explores the earlier version of the dataset. Here is the link if anyone is interested.
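For anyone curious about the switch to RDKit circular fingerprints mentioned above, here is a minimal sketch of turning SMILES into a Morgan fingerprint matrix for a model. Radius 2 and 2048 bits are common defaults assumed here, not necessarily the settings used in our modSAR experiments, and `morgan_matrix` is a hypothetical helper name. This also ties back to the SMILES question. The fingerprint is computed from the molecule as written, so two SMILES for the same structure give identical bits, but a change of tautomer in the master list will generally change the fingerprint.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Assumed settings: radius 2 (roughly ECFP4-like), folded into 2048 bits.
RADIUS = 2
N_BITS = 2048
_GENERATOR = rdFingerprintGenerator.GetMorganGenerator(radius=RADIUS, fpSize=N_BITS)


def morgan_matrix(smiles_list):
    """Hypothetical helper: stack Morgan bit vectors into an (n_molecules, N_BITS) array."""
    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"RDKit could not parse SMILES: {smiles}")
        fp = _GENERATOR.GetFingerprint(mol)  # ExplicitBitVect of length N_BITS
        row = np.zeros((0,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, row)  # resizes row and copies the bits in
        rows.append(row)
    return np.vstack(rows)


if __name__ == "__main__":
    X = morgan_matrix(["CCO", "OCC", "c1ccccc1O"])
    print(X.shape)  # (3, 2048)
```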
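For the Shapley-value plan, it may help to see the definition written out. For a model f with n features, feature i receives phi_i = sum over coalitions S not containing i of |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)]. The sketch below computes this exactly by enumerating coalitions, assuming a simple value function v(S) in which features outside S are replaced by a fixed baseline. `exact_shapley` and `toy_model` are hypothetical and only illustrate the arithmetic. Because the cost grows as 2^n, exact enumeration only works for a handful of features; for thousands of fingerprint bits an approximation, such as those in the `shap` package, would be needed.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def exact_shapley(predict: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for predict at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        # Coalition members keep their value from x; all other features use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: two additive terms plus an interaction between features 0 and 2."""
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [3.5, 1.0, 1.5]: the interaction's 3.0 is split between features 0 and 2
```

The values add up to toy_model(x) - toy_model(baseline) = 6. This efficiency property gives a simple sanity check when the attributions are applied to real activity predictions.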

Metadata


Labels: question (Further information is requested)
