Skip to content

Conversation

@kfir4444
Copy link
Collaborator

About the algorithm

Kabsch’s algorithm computes the optimal rotation that superimposes two sets of vectors (represented as matrices) by minimizing the root-mean-square deviation (RMSD) between them.
For numerical stability and robustness, this implementation relies on the optimized routine provided by SciPy via scipy.spatial.transform.Rotation.align_vectors

This PR

This PR introduces a Kabsch-based alignment utility in species/converter.py and integrates it into the ARCSpecies API.
The implementation explicitly accounts for key assumptions required for molecular alignment, including S2S mapping.
To support this workflow, a new helper method, translate_to_indices, was added. This method converts an atom mapping into index-based ordering, enabling consistent reordering and alignment of species geometries prior to applying the Kabsch algorithm.

Overall, this addition enables robust, minimal-RMSD alignment of mapped species geometries while preserving chemical correspondence.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the Kabsch algorithm for optimal molecular alignment by minimizing root-mean-square deviation (RMSD) between species geometries. The implementation leverages SciPy's Rotation.align_vectors for numerical stability.

Changes:

  • Added standalone kabsch() function in arc/species/converter.py for computing RMSD between two XYZ coordinate sets
  • Integrated Kabsch alignment into ARCSpecies API via new kabasch() method (note: spelling inconsistency)
  • Added translate_to_indices() helper method to support atom mapping and reordering for alignment

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 16 comments.

File Description
arc/species/converter.py Implements standalone kabsch() function with center-of-mass translation and SciPy-based alignment; removes unused TYPE_CHECKING import
arc/species/species.py Adds translate_to_indices() for atom reordering and kabasch() method for species-to-species alignment
arc/species/converter_test.py Adds tests for kabsch() function including translation invariance, rotation invariance, and error handling
arc/species/species_test.py Adds tests for both translate_to_indices() and kabasch() methods with various alignment scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Args:
other (ARCSpecies): The other species to compare to.
map_ (list): A list of atom indices mapping atoms from this species to the other species. (i.e., )
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for the 'map_' parameter is incomplete. The parenthetical comment "(i.e., )" is empty and doesn't provide useful information about what the mapping represents. This should clarify that map_[i] gives the index in 'other' that corresponds to atom i in 'self'.

Suggested change
map_ (list): A list of atom indices mapping atoms from this species to the other species. (i.e., )
map_ (list): A list where map_[i] is the index in ``other`` corresponding to atom i in ``self``.

Copilot uses AI. Check for mistakes.
import math
import numpy as np
import os
from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TYPE_CHECKING import is no longer used after removing the conditional import block for ARCSpecies. It should be removed from the import statement on line 8 to avoid unused imports.

Suggested change
from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union
from typing import Dict, Iterable, List, Optional, Tuple, Union

Copilot uses AI. Check for mistakes.
Comment on lines 2822 to 2855
def test_kabasch(self):
"""Test the kabasch() method."""
# Test with self (RMSD should be 0)
rmsd = self.spc1.kabasch(self.spc1, [0, 1, 2, 3, 4, 5])
self.assertAlmostEqual(rmsd, 0.0, delta=1e-5)

# Test with a copy (RMSD should be 0)
spc1_copy = self.spc1.copy()
rmsd = self.spc1.kabasch(spc1_copy, [0, 1, 2, 3, 4, 5])
self.assertAlmostEqual(rmsd, 0.0, delta=1e-5)

# Test with shuffled atoms
# Create a shuffled version of spc1: swap first two C atoms
# spc1: C, C, O, H, H, H
# shuffled: C(1), C(0), O(2), H(3), H(4), H(5)
# xyz of spc1
xyz = self.spc1.get_xyz()
new_coords = (xyz['coords'][1], xyz['coords'][0]) + xyz['coords'][2:]
new_symbols = (xyz['symbols'][1], xyz['symbols'][0]) + xyz['symbols'][2:]
new_isotopes = (xyz['isotopes'][1], xyz['isotopes'][0]) + xyz['isotopes'][2:]
shuffled_xyz = {'symbols': new_symbols, 'isotopes': new_isotopes, 'coords': new_coords}
spc_shuffled = ARCSpecies(label='shuffled', xyz=shuffled_xyz, smiles='C=C[O]')

# Map: we want to pull atoms from spc_shuffled to match spc1
# spc1[0] is C(0). In spc_shuffled, C(0) is at index 1.
# spc1[1] is C(1). In spc_shuffled, C(1) is at index 0.
# Rest are same.
map_indices = [1, 0, 2, 3, 4, 5]
rmsd = self.spc1.kabasch(spc_shuffled, map_indices)
self.assertAlmostEqual(rmsd, 0.0, delta=1e-5)

# Test exception
with self.assertRaises(SpeciesError):
self.spc1.kabasch("not a species", [])
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for the case where the map_ list has a different length than the number of atoms in the species. The kabasch method does not validate that the length of map_ matches the number of atoms, which could lead to unexpected behavior or errors. Add a test case to verify proper handling when map_ has incorrect length.

Copilot uses AI. Check for mistakes.
Comment on lines 4970 to 4985
def test_kabsch(self):
"""Test the kabsch function"""
xyz1 = {'symbols': ('O', 'H', 'H'), 'isotopes': (16, 1, 1),
'coords': ((0.0, 0.0, 0.0),
(0.0, 0.757, 0.586),
(0.0, -0.757, 0.586))}
xyz2 = converter.translate_xyz(xyz1, (10.0, 0.0, 0.0))
score = converter.kabsch(xyz1, xyz2)
self.assertAlmostEqual(score, 0.0, places=5)

r = Rotation.from_quat([0, 0, np.sin(np.pi/4), np.cos(np.pi/4)])
r.apply(np.array(xyz1["coords"]))
self.assertAlmostEqual(converter.kabsch(xyz1, converter.xyz_from_data(coords=r.apply(np.array(xyz1["coords"])), symbols=xyz1["symbols"], isotopes=xyz1["isotopes"])), 0.0, places=5)
xyz2 = {i:v for i, v in xyz1.items()}
xyz2['symbols'] = ('O', 'H', 'H', 'H')
with self.assertRaises(ValueError):
converter.kabsch(xyz1, xyz2)
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for edge case where xyz coordinates have mismatched isotopes but same symbols. The kabsch function only validates that symbols match but doesn't check if isotopes match. Consider adding a test to verify the behavior when isotopes differ.

Copilot uses AI. Check for mistakes.
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kfir4444! Please see some comments below

from typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Tuple, Union

from ase import Atoms
from scipy.spatial.transform import Rotation as R
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R looks too generic for me. I suggest at least to call it Rotation if not something more explicit like VectorRotation

"""
if not isinstance(other, ARCSpecies):
raise SpeciesError(f'Other must be an ARCSpecies instance, got {type(other)}'
f'If you meant to use the XYZ coordinates directly, use arc.species.converter.kabasch.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no \n or at least . (dot & space) between these two sentences, so we'll get "... got TYPEIf you..."
please fix

raise SpeciesError(f'Other must be an ARCSpecies instance, got {type(other)}'
f'If you meant to use the XYZ coordinates directly, use arc.species.converter.kabasch.')

xyz1 = translate_to_center_of_mass(self.get_xyz())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The converter.kabsch() function already translates to the center of mass (safest to do it there), we should avoid translating here as well.

with self.assertRaises(IndexError):
converter.sorted_distances_of_atom(xyz_dict, 5)

def test_kabsch(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a "wildtype" test of a real medium-size molecule?

xyz_to_zmat)

if TYPE_CHECKING:
from arc.species.species import ARCSpecies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in parenthesis, so it's not actually being used as an object, but as a string there. Otherwise it would cause the tests to fail

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that's the idea. You import TYPE_CHECKING which evaluates to False during runtime. then you can safely import objects without worrying about circular references, and only use them for type hints for static IDE checks, given that they are in " ". Otherwise, the static checks won't really work.
Not sure if this is the best reference, but see:
https://stackoverflow.com/questions/75633665/type-checking-for-mutually-referencing-classes

Copy link
Collaborator Author

@kfir4444 kfir4444 Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you are coming from, but when the object is in quotation mark, the import doesn't happen. From the link you added:

# Source - https://stackoverflow.com/a
# Posted by Silvio Mayolo
# Retrieved 2026-01-14, License - CC BY-SA 4.0

# example.py
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from a import A

def example(a: A):
    ...

# a.py
class A:
    pass

Note that in def example(a: A): the A is not in parenthesis. Since in our code we use a string for the type hint, I figured that we don't have to import and type check.

The goal is to be able to match the rmsd of two atom mapped species xyzs. Good for TS, verification, atom mapping, etc.
This is causing cyclic imports and not necessary, as the type hinting for ARCSpecies in this module is in parenthesis either way.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants