-
Notifications
You must be signed in to change notification settings - Fork 24
Added implementation of the Kabsch algorithm in ARCSpecies API and as a standalone function in converter. #819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1cb8e1f to
87ab0fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements the Kabsch algorithm for optimal molecular alignment by minimizing root-mean-square deviation (RMSD) between species geometries. The implementation leverages SciPy's Rotation.align_vectors for numerical stability.
Changes:
- Added standalone
kabsch()function inarc/species/converter.pyfor computing RMSD between two XYZ coordinate sets - Integrated Kabsch alignment into
ARCSpeciesAPI via newkabasch()method (note: spelling inconsistency) - Added
translate_to_indices()helper method to support atom mapping and reordering for alignment
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| arc/species/converter.py | Implements standalone kabsch() function with center-of-mass translation and SciPy-based alignment; removes unused TYPE_CHECKING import |
| arc/species/species.py | Adds translate_to_indices() for atom reordering and kabasch() method for species-to-species alignment |
| arc/species/converter_test.py | Adds tests for kabsch() function including translation invariance, rotation invariance, and error handling |
| arc/species/species_test.py | Adds tests for both translate_to_indices() and kabasch() methods with various alignment scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Args: | ||
| other (ARCSpecies): The other species to compare to. | ||
| map_ (list): A list of atom indices mapping atoms from this species to the other species. (i.e., ) |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for the 'map_' parameter is incomplete. The parenthetical comment "(i.e., )" is empty and doesn't provide useful information about what the mapping represents. This should clarify that map_[i] gives the index in 'other' that corresponds to atom i in 'self'.
| map_ (list): A list of atom indices mapping atoms from this species to the other species. (i.e., ) | |
| map_ (list): A list where map_[i] is the index in ``other`` corresponding to atom i in ``self``. |
| import math | ||
| import numpy as np | ||
| import os | ||
| from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TYPE_CHECKING import is no longer used after removing the conditional import block for ARCSpecies. It should be removed from the import statement on line 8 to avoid unused imports.
| from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union | |
| from typing import Dict, Iterable, List, Optional, Tuple, Union |
arc/species/species_test.py
Outdated
| def test_kabasch(self): | ||
| """Test the kabasch() method.""" | ||
| # Test with self (RMSD should be 0) | ||
| rmsd = self.spc1.kabasch(self.spc1, [0, 1, 2, 3, 4, 5]) | ||
| self.assertAlmostEqual(rmsd, 0.0, delta=1e-5) | ||
|
|
||
| # Test with a copy (RMSD should be 0) | ||
| spc1_copy = self.spc1.copy() | ||
| rmsd = self.spc1.kabasch(spc1_copy, [0, 1, 2, 3, 4, 5]) | ||
| self.assertAlmostEqual(rmsd, 0.0, delta=1e-5) | ||
|
|
||
| # Test with shuffled atoms | ||
| # Create a shuffled version of spc1: swap first two C atoms | ||
| # spc1: C, C, O, H, H, H | ||
| # shuffled: C(1), C(0), O(2), H(3), H(4), H(5) | ||
| # xyz of spc1 | ||
| xyz = self.spc1.get_xyz() | ||
| new_coords = (xyz['coords'][1], xyz['coords'][0]) + xyz['coords'][2:] | ||
| new_symbols = (xyz['symbols'][1], xyz['symbols'][0]) + xyz['symbols'][2:] | ||
| new_isotopes = (xyz['isotopes'][1], xyz['isotopes'][0]) + xyz['isotopes'][2:] | ||
| shuffled_xyz = {'symbols': new_symbols, 'isotopes': new_isotopes, 'coords': new_coords} | ||
| spc_shuffled = ARCSpecies(label='shuffled', xyz=shuffled_xyz, smiles='C=C[O]') | ||
|
|
||
| # Map: we want to pull atoms from spc_shuffled to match spc1 | ||
| # spc1[0] is C(0). In spc_shuffled, C(0) is at index 1. | ||
| # spc1[1] is C(1). In spc_shuffled, C(1) is at index 0. | ||
| # Rest are same. | ||
| map_indices = [1, 0, 2, 3, 4, 5] | ||
| rmsd = self.spc1.kabasch(spc_shuffled, map_indices) | ||
| self.assertAlmostEqual(rmsd, 0.0, delta=1e-5) | ||
|
|
||
| # Test exception | ||
| with self.assertRaises(SpeciesError): | ||
| self.spc1.kabasch("not a species", []) |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage for the case where the map_ list has a different length than the number of atoms in the species. The kabasch method does not validate that the length of map_ matches the number of atoms, which could lead to unexpected behavior or errors. Add a test case to verify proper handling when map_ has incorrect length.
| def test_kabsch(self): | ||
| """Test the kabsch function""" | ||
| xyz1 = {'symbols': ('O', 'H', 'H'), 'isotopes': (16, 1, 1), | ||
| 'coords': ((0.0, 0.0, 0.0), | ||
| (0.0, 0.757, 0.586), | ||
| (0.0, -0.757, 0.586))} | ||
| xyz2 = converter.translate_xyz(xyz1, (10.0, 0.0, 0.0)) | ||
| score = converter.kabsch(xyz1, xyz2) | ||
| self.assertAlmostEqual(score, 0.0, places=5) | ||
|
|
||
| r = Rotation.from_quat([0, 0, np.sin(np.pi/4), np.cos(np.pi/4)]) | ||
| r.apply(np.array(xyz1["coords"])) | ||
| self.assertAlmostEqual(converter.kabsch(xyz1, converter.xyz_from_data(coords=r.apply(np.array(xyz1["coords"])), symbols=xyz1["symbols"], isotopes=xyz1["isotopes"])), 0.0, places=5) | ||
| xyz2 = {i:v for i, v in xyz1.items()} | ||
| xyz2['symbols'] = ('O', 'H', 'H', 'H') | ||
| with self.assertRaises(ValueError): | ||
| converter.kabsch(xyz1, xyz2) |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage for edge case where xyz coordinates have mismatched isotopes but same symbols. The kabsch function only validates that symbols match but doesn't check if isotopes match. Consider adding a test to verify the behavior when isotopes differ.
alongd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kfir4444! Please see some comments below
arc/species/converter.py
Outdated
| from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union | ||
|
|
||
| from ase import Atoms | ||
| from scipy.spatial.transform import Rotation as R |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R looks too generic for me. I suggest at least to call it Rotation if not something more explicit like VectorRotation
arc/species/species.py
Outdated
| """ | ||
| if not isinstance(other, ARCSpecies): | ||
| raise SpeciesError(f'Other must be an ARCSpecies instance, got {type(other)}' | ||
| f'If you meant to use the XYZ coordinates directly, use arc.species.converter.kabasch.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no \n or at least . (dot & space) between these two sentences, so we'll get "... got TYPEIf you..."
please fix
arc/species/species.py
Outdated
| raise SpeciesError(f'Other must be an ARCSpecies instance, got {type(other)}' | ||
| f'If you meant to use the XYZ coordinates directly, use arc.species.converter.kabasch.') | ||
|
|
||
| xyz1 = translate_to_center_of_mass(self.get_xyz()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The converter.kabsch() function already translates to the center of mass (safest to do it there), we should avoid translating here as well.
| with self.assertRaises(IndexError): | ||
| converter.sorted_distances_of_atom(xyz_dict, 5) | ||
|
|
||
| def test_kabsch(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a "wildtype" test of a real medium-size molecule?
| xyz_to_zmat) | ||
|
|
||
| if TYPE_CHECKING: | ||
| from arc.species.species import ARCSpecies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we remove it? It's being used here: https://github.com/ReactionMechanismGenerator/ARC/blob/main/arc/species/converter.py#L450
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in parenthesis, so it's not actually being used as an object, but as a string there. Otherwise it would cause the tests to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's the idea. You import TYPE_CHECKING which evaluates to False during runtime. then you can safely import objects without worrying about circular references, and only use them for type hints for static IDE checks, given that they are in " ". Otherwise, the static checks won't really work.
Not sure if this is the best reference, but see:
https://stackoverflow.com/questions/75633665/type-checking-for-mutually-referencing-classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you are coming from, but when the object is in quotation mark, the import doesn't happen. From the link you added:
# Source - https://stackoverflow.com/a
# Posted by Silvio Mayolo
# Retrieved 2026-01-14, License - CC BY-SA 4.0
# example.py
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from a import A
def example(a: A):
...
# a.py
class A:
passNote that in def example(a: A): the A is not in parenthesis. Since in our code we use a string for the type hint, I figured that we don't have to import and type check.
The goal is to be able to match the rmsd of two atom mapped species xyzs. Good for TS, verification, atom mapping, etc.
This is causing cyclic imports and not necessary, as the type hinting for ARCSpecies in this module is in parenthesis either way.
About the algorithm
Kabsch’s algorithm computes the optimal rotation that superimposes two sets of vectors (represented as matrices) by minimizing the root-mean-square deviation (RMSD) between them.
For numerical stability and robustness, this implementation relies on the optimized routine provided by SciPy via scipy.spatial.transform.Rotation.align_vectors
This PR
This PR introduces a Kabsch-based alignment utility in
species/converter.pyand integrates it into theARCSpeciesAPI.The implementation explicitly accounts for key assumptions required for molecular alignment, including S2S mapping.
To support this workflow, a new helper method,
translate_to_indices, was added. This method converts an atom mapping into index-based ordering, enabling consistent reordering and alignment of species geometries prior to applying the Kabsch algorithm.Overall, this addition enables robust, minimal-RMSD alignment of mapped species geometries while preserving chemical correspondence.