#3525 [Feature] Expose ability to calculate fully expanded large molecule into chemical structures
…, and fixed existing method expandedMonomersToAtoms crashing out due to invalid indexes in the cloned molecule when attaching.
…T-groups. For Saver: do not save Tgroups with invalid ID reference. Obviously we will still keep templates.
# Conflicts: # api/c/indigo/src/indigo_molecule.cpp # core/indigo-core/molecule/src/base_molecule.cpp
AlexanderSavelyev
left a comment
Please add Python tests for the new API.
Also, the method needs to be added to the other language bindings as well (.NET and Java).
1. Need to expand two specific pseudoatoms, OH and NH2, in the Ketcher default monomer library file. 2. Added support for other languages.
post-merge fixes.
added unit test and fixed clang format.
fixed clang format. Actually ran clang this time locally with same method as CI :)
fixed an unrelated accidental line change.
now updated python lint for the new unit test.
forgot to git add the unit test data.
moving unit test folder for consistency.
I believe I have now addressed all requested items, and the tests are passing. When you have a chance, I would be grateful if you could take another look. Thank you in advance for your review.

I would also like to provide one note regarding the new C-level method I added this week in indigo_group_psuedoatoms_expand.h and indigo_abbrevations_expand.cpp. The entries in GROUP_PSEUDOATOM_EXPAND_LABELS correspond to labels used in the default monomers library distributed with recent Ketcher releases. In those upstream monomers.ket library files, atoms appear whose element/type is set to "OH" and "NH2". These are not valid standard atoms for V3000 interchange. If such labels are preserved as atom types, Indigo treats them as pseudoatoms during property calculations such as molecular weight, so they contribute 0.0 mass. In other toolkits, including RDKit, such structures may fail to load altogether.

To address this, I added a method that constructs the full substructure for OH and NH2, respectively, and then performs a substructure merge after creation. I kept this logic in new methods rather than incorporating it directly into the original monomer expansion function because the issue does not arise from Indigo’s self-contained core API alone. Rather, it is needed for compatibility with labels originating from the upstream Ketcher monomer-library definitions. For clarity, this repository does not compile monomers.ket into the library (the file is included only as test data); it ships only the more rudimentary amino-acid abbreviation table. Nevertheless, the additional handling is still necessary here for interoperability with structures produced from the upstream Ketcher monomer library.

The unit test added for this new function is based on a structure that contains a macromolecule requiring expansion. One component in that structure is "PEG-4", which, in monomers.ket, includes "OH" represented as a single pseudoatom. This test case was chosen specifically to verify that the additional expansion step handles such upstream monomer definitions correctly. For that reason, these methods are used within the new API functionality exposed to Python, Java, C++, .NET, and R, since without this full expansion step, reliable structure interchange and accurate property calculation would not be possible.

Best regards,
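To illustrate the mass problem described in the note above, here is a minimal toy sketch in plain Python. It does not use Indigo and is not the code from this PR: `ATOMIC_MASS`, `GROUP_EXPANSIONS`, and both functions are hypothetical stand-ins, and the real change builds and merges a substructure rather than rewriting a list of labels. The sketch only shows why a group label such as "OH" contributes 0.0 when it is treated as a pseudoatom, and how expanding it first restores the expected weight.

```python
# Toy model only: NOT Indigo's implementation. Names and data are illustrative.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Hypothetical stand-in for the idea behind GROUP_PSEUDOATOM_EXPAND_LABELS:
# each group label maps to the real atoms it represents.
GROUP_EXPANSIONS = {"OH": ["O", "H"], "NH2": ["N", "H", "H"]}


def molecular_weight(labels):
    """Sum atomic masses; unknown labels act like pseudoatoms and add 0.0."""
    return sum(ATOMIC_MASS.get(label, 0.0) for label in labels)


def expand_group_pseudoatoms(labels):
    """Return a new list with each known group label replaced by its atoms."""
    expanded = []
    for label in labels:
        expanded.extend(GROUP_EXPANSIONS.get(label, [label]))
    return expanded


fragment = ["C", "C", "OH"]  # "OH" stored as a single pseudoatom label
before = molecular_weight(fragment)
after = molecular_weight(expand_group_pseudoatoms(fragment))
print(f"before expansion: {before:.3f}")
print(f"after expansion:  {after:.3f}")
```

The difference between the two values is exactly the mass of one O and one H, which is the contribution that would otherwise be lost.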
@sapiosciences-dev merged, thanks for the contribution!
Developer:
Yechen Qiao, Core Platform Developer, Sapio Sciences.
Summary of Changes:
Exposes the C++ function indigoExpandedMonomersToAtoms for external calls, and makes it available to Python as expandedMonomersToAtoms.
The purpose of this method: Fully expands all template atoms (monomers) in a macromolecule to regular atoms.
It creates a working copy of the molecule to avoid side effects, marks all template atoms as
expanded, then calls the internal expandedMonomersToAtoms() method to perform the actual
expansion. The result is a new molecule with all monomers fully expanded to their atomic
structures, suitable for molecular weight calculations and compatibility with third-party tools.
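The copy-then-expand behaviour described above can be sketched with a small toy model. This is an assumption-laden illustration in plain Python, not Indigo code: `ToyMolecule`, its fields, and `expand_monomers_to_atoms` are hypothetical names, and the "Gly" template content is illustrative only. It shows the two properties the description relies on: the input molecule is left untouched, and a molecule with no monomers is returned as an unchanged copy instead of failing.

```python
# Toy model only: NOT Indigo's implementation. All names are hypothetical.
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyMolecule:
    # Each entry is either an element symbol or the name of a template (monomer).
    atoms: list = field(default_factory=list)
    # Template name -> list of element symbols it expands to.
    templates: dict = field(default_factory=dict)


def expand_monomers_to_atoms(mol):
    """Return a new molecule with every template atom replaced by its atoms."""
    work = deepcopy(mol)  # working copy, so the caller's molecule is not modified
    # Nothing to expand: return the copy as-is rather than walking an empty set.
    if not any(label in work.templates for label in work.atoms):
        return work
    expanded = []
    for label in work.atoms:
        expanded.extend(work.templates.get(label, [label]))
    work.atoms = expanded
    return work  # template definitions are kept; only the atom list changes


original = ToyMolecule(
    atoms=["C", "Gly"],
    templates={"Gly": ["N", "C", "C", "O"]},  # illustrative heavy-atom content
)
result = expand_monomers_to_atoms(original)
print(result.atoms)
print(original.atoms)
```

Returning a fresh object keeps the call free of side effects, which is the design choice the description gives for working on a copy.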
Internally, expandedMonomersToAtoms previously expanded T-groups but failed to delete the T-group references, which caused JSON loading/saving to fail.
bfsPropagate in mm_expand crashed with a segmentation fault if there were no monomers to expand.