#3525 [Feature] Expose ability to calculate fully expanded large molecule into chemical structures #3525

Merged
AlexanderSavelyev merged 16 commits into epam:master from sapiosciences-dev:yqiao_pr/CHEMBUGS-79
Mar 26, 2026
@sapiosciences-dev sapiosciences-dev commented Mar 2, 2026

Developer:
Yechen Qiao, Core Platform Developer, Sapio Sciences.

Summary of Changes:

  1. Exposes the C++ function indigoExpandedMonomersToAtoms to Python as expandedMonomersToAtoms.
    Purpose: fully expand all template atoms (monomers) in a macromolecule into regular atoms.
    The method creates a working copy of the molecule to avoid side effects, marks all template atoms as
    expanded, then calls the internal expandedMonomersToAtoms() method to perform the actual
    expansion. The result is a new molecule with all monomers fully expanded to their atomic
    structures, suitable for molecular weight calculations and compatibility with third-party tools.

  2. Internally, expandedMonomersToAtoms previously expanded T-groups but failed to delete the T-group references, which caused JSON loading/saving to fail.

  3. bfsPropagate in mm_expand previously crashed with a segmentation fault when there were no monomers to expand.
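
The three points above can be pictured with a toy model in plain Python. This is only a sketch under stated assumptions, not Indigo's code: the `Atom` and `Molecule` classes and the `expanded_monomers_to_atoms` function below are hypothetical stand-ins, and bonds and attachment points are left out. It shows the copy-then-expand flow, an early return when there is nothing to expand, and a result that carries no leftover template references.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Atom:
    label: str                         # element symbol, or monomer name for a template atom
    template_id: Optional[int] = None  # key into Molecule.tgroups; None for a regular atom
    expanded: bool = False


@dataclass
class Molecule:
    atoms: List[Atom] = field(default_factory=list)
    tgroups: Dict[int, List[str]] = field(default_factory=dict)  # template id -> atom symbols


def expanded_monomers_to_atoms(mol: Molecule) -> Molecule:
    """Toy model: return a new molecule with every template atom replaced by its atoms."""
    work = deepcopy(mol)  # working copy, so the caller's molecule is never modified

    # Guard: with no template atoms there is nothing to traverse or expand.
    if not any(a.template_id is not None for a in work.atoms):
        return work

    # Mark every template atom as expanded before doing the actual expansion.
    for atom in work.atoms:
        if atom.template_id is not None:
            atom.expanded = True

    result = Molecule()  # starts with an empty tgroups table
    for atom in work.atoms:
        if atom.expanded:
            # Replace the monomer by plain atoms that carry no template reference.
            result.atoms.extend(Atom(sym) for sym in work.tgroups[atom.template_id])
        else:
            result.atoms.append(Atom(atom.label))
    return result


# Hypothetical input: one glycine-like monomer (heavy atoms only) plus a regular atom.
peptide = Molecule(
    atoms=[Atom("Gly", template_id=1), Atom("Cl")],
    tgroups={1: ["N", "C", "C", "O"]},
)
flat = expanded_monomers_to_atoms(peptide)
print([a.label for a in flat.atoms])  # ['N', 'C', 'C', 'O', 'Cl']
```

Because the result holds no atom with a `template_id`, a serializer in this toy model would never meet a dangling template reference, which is the failure mode described in point 2.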

Generic request

  • PR name follows the pattern #1234 – issue name
  • branch name does not contain '#'
  • base branch (master or release/xx) is correct
  • PR is linked with the issue
  • task status changed to "Code review"
  • code follows product standards
  • regression tests updated

For release/xx branch

  • backmerge to master (or newer release/xx) branch is created

Optional

  • unit-tests written
  • documentation updated

…, and fixed existing method expandedMonomersToAtoms crashing out due to invalid indexes in the cloned molecule when attaching.
…T-groups.

For Saver: do not save Tgroups with invalid ID reference. Obviously we will still keep templates.
…T-groups.

For Saver: do not save Tgroups with invalid ID reference. Obviously we will still keep templates.
# Conflicts:
#	api/c/indigo/src/indigo_molecule.cpp
#	core/indigo-core/molecule/src/base_molecule.cpp
@AlexanderSavelyev AlexanderSavelyev left a comment

Please add Python tests for the new API.
Also, the new API needs to be added to the other language bindings as well (.NET and Java).

1. Need to expand two specific pseudoatoms, OH and NH2, in the Ketcher default monomer library file.
2. Added support for other languages.
post-merge fixes.
added unit test and fixed clang format.
added unit test and fixed clang format.
added unit test and fixed clang format.
fixed clang format. Actually ran clang this time locally with same method as CI :)
fixed accidental line change unrelated.
now updated python lint for the new unit test.
forgot to git add the unit test data.
@sapiosciences-dev sapiosciences-dev changed the title from "[Feature] Expose ability to calculate fully expanded large molecule into chemical structures" to "#3525 [Feature] Expose ability to calculate fully expanded large molecule into chemical structures" Mar 17, 2026
moving unit test folder for consistency.
@sapiosciences-dev

@AlexanderSavelyev

I believe I have now addressed all requested items, and the tests are passing successfully. When you have a chance, I would be grateful if you could take another look. Thank you in advance for your review.

I would also like to provide one note regarding the new C-level method I added this week in indigo_group_psuedoatoms_expand.h and indigo_abbrevations_expand.cpp.

The entries in GROUP_PSEUDOATOM_EXPAND_LABELS correspond to labels used in the default monomers library distributed with recent Ketcher releases. In those upstream monomers.ket library files, atoms appear whose element/type is set to "OH" and "NH2".

These are not valid standard atoms for V3000 interchange. If such labels are preserved as atom types, Indigo will treat them as pseudoatoms during property calculations such as molecular weight, causing them to contribute 0.0 mass. In other toolkits, including RDKit, such structures may fail to load altogether.

To address this, I added an additional method that constructs the full substructure for OH and NH2, respectively, and then performs a substructure merge after creation.
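
To illustrate why the expansion matters for property calculations, here is a minimal pure-Python sketch. It does not use Indigo: the `GROUP_EXPANSIONS` table, the `mass` helper, and `expand_group_pseudoatoms` are hypothetical names chosen for illustration, atoms are reduced to a flat list of labels, and unknown labels are assumed to contribute 0.0 mass as described above.

```python
from typing import List

# Standard atomic weights, rounded to three decimals.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Hypothetical expansion table in the spirit of GROUP_PSEUDOATOM_EXPAND_LABELS:
# each group label maps to the explicit atoms it stands for.
GROUP_EXPANSIONS = {"OH": ["O", "H"], "NH2": ["N", "H", "H"]}


def mass(labels: List[str]) -> float:
    """Sum atomic masses; labels that are not elements (pseudoatoms) count as 0.0."""
    return sum(ATOMIC_MASS.get(label, 0.0) for label in labels)


def expand_group_pseudoatoms(labels: List[str]) -> List[str]:
    """Replace each known group label by its explicit atoms; keep everything else."""
    expanded: List[str] = []
    for label in labels:
        expanded.extend(GROUP_EXPANSIONS.get(label, [label]))
    return expanded


fragment = ["C", "OH"]  # a carbon carrying an "OH" pseudoatom
before = mass(fragment)                            # only the carbon is counted
after = mass(expand_group_pseudoatoms(fragment))   # carbon + oxygen + hydrogen
print(before < after)  # True
```

In this toy model the "OH" label adds nothing to the mass until it is replaced by an explicit O and H, which mirrors the 0.0 contribution described for pseudoatoms.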

I kept this logic in new methods rather than incorporating it directly into the original monomer expansion function because the issue does not arise from Indigo’s self-contained core API alone. Rather, it is needed to handle compatibility with labels originating from the upstream Ketcher monomer-library definitions.

For clarity, this repository does not ship monomers.ket as part of the build (the file appears only in test data); it includes only the more rudimentary amino-acid abbreviation table. Nevertheless, the additional handling is still necessary here for interoperability with structures produced from the upstream Ketcher monomer library.

The unit test added for this new function is based on a structure that contains a macromolecule requiring expansion. One component in that structure is "PEG-4", which, in monomers.ket, includes "OH" represented as a single pseudoatom. This test case was chosen specifically to verify that the additional expansion step handles such upstream monomer definitions correctly.

For that reason, these methods are used within the new API functionality exposed to Python, Java, C++, .NET, and R, since without this full expansion step, reliable structure interchange and accurate property calculation would not be possible.

Best regards,
Yechen Qiao
Sapio Sciences

@sapiosciences-dev sapiosciences-dev self-assigned this Mar 17, 2026
@AlexanderSavelyev AlexanderSavelyev merged commit 22fd4b0 into epam:master Mar 26, 2026
48 checks passed
@AlexanderSavelyev

@sapiosciences-dev merged, thanks for the contribution!
