Add tcrblosum support to TCRdist #685
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #685 +/- ##
==========================================
- Coverage 19.75% 19.60% -0.16%
==========================================
Files 51 51
Lines 4581 4607 +26
==========================================
- Hits 905 903 -2
- Misses 3676 3704 +28
grst
left a comment
Implementation-wise this looks great!
What's still missing is
- changelog update
- Documentation update of the user-facing (pp.ir_dist) method. Probably best to add a new metric tcrblosum or tcrdist_tcrblosum.
- Reference to the TCRblosum paper in the documentation
- Maybe tutorial update?
| # fmt: off | ||
| tcr_dict_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4, 
('H', 'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, 
('Q', 'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, 
('Y', 'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0} | ||
| blosum62_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4, 
('H', 'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, 
('Q', 'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, 
('Y', 'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0} | ||
| tcrblosum_alpha_distance_matrix = {('A', 'A'): 0, ('A', 'R'): 4, ('A', 'N'): 4, ('A', 'D'): 4, ('A', 'C'): 4, ('A', 'Q'): 4, ('A', 'E'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'L'): 4, ('A', 'K'): 4, ('A', 'M'): 4, ('A', 'F'): 4, ('A', 'P'): 4, ('A', 'S'): 4, ('A', 'T'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('A', 'V'): 4, ('R', 'A'): 4, ('R', 'R'): 0, ('R', 'N'): 4, ('R', 'D'): 4, ('R', 'C'): 3, ('R', 'Q'): 4, ('R', 'E'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'L'): 4, ('R', 'K'): 4, ('R', 'M'): 4, ('R', 'F'): 4, ('R', 'P'): 4, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('R', 'V'): 4, ('N', 'A'): 4, ('N', 'R'): 4, ('N', 'N'): 0, ('N', 'D'): 4, ('N', 'C'): 4, ('N', 'Q'): 4, ('N', 'E'): 4, ('N', 'G'): 4, ('N', 'H'): 4, ('N', 'I'): 4, ('N', 'L'): 4, ('N', 'K'): 3, ('N', 'M'): 4, ('N', 'F'): 4, ('N', 'P'): 4, ('N', 'S'): 4, ('N', 'T'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('N', 'V'): 4, ('D', 'A'): 4, ('D', 'R'): 4, ('D', 'N'): 4, ('D', 'D'): 0, ('D', 'C'): 4, ('D', 'Q'): 4, ('D', 'E'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'L'): 4, ('D', 'K'): 4, ('D', 'M'): 4, ('D', 'F'): 4, ('D', 'P'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('D', 'V'): 4, ('C', 'A'): 4, ('C', 'R'): 3, ('C', 'N'): 4, ('C', 'D'): 4, ('C', 'C'): 0, ('C', 'Q'): 4, ('C', 'E'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'L'): 4, ('C', 'K'): 4, ('C', 'M'): 4, ('C', 'F'): 4, ('C', 'P'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('C', 'V'): 4, ('Q', 'A'): 4, ('Q', 'R'): 4, ('Q', 'N'): 4, ('Q', 'D'): 4, ('Q', 'C'): 4, ('Q', 'Q'): 0, ('Q', 'E'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'L'): 4, ('Q', 'K'): 3, ('Q', 'M'): 4, ('Q', 'F'): 4, ('Q', 'P'): 4, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('Q', 'V'): 4, ('E', 'A'): 4, ('E', 'R'): 4, ('E', 'N'): 4, ('E', 'D'): 4, ('E', 'C'): 4, ('E', 'Q'): 4, ('E', 'E'): 0, ('E', 'G'): 4, ('E', 'H'): 3, ('E', 'I'): 4, ('E', 'L'): 
4, ('E', 'K'): 4, ('E', 'M'): 4, ('E', 'F'): 4, ('E', 'P'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('E', 'V'): 4, ('G', 'A'): 4, ('G', 'R'): 4, ('G', 'N'): 4, ('G', 'D'): 4, ('G', 'C'): 4, ('G', 'Q'): 4, ('G', 'E'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'L'): 4, ('G', 'K'): 4, ('G', 'M'): 4, ('G', 'F'): 4, ('G', 'P'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('G', 'V'): 4, ('H', 'A'): 4, ('H', 'R'): 4, ('H', 'N'): 4, ('H', 'D'): 4, ('H', 'C'): 4, ('H', 'Q'): 4, ('H', 'E'): 3, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'L'): 4, ('H', 'K'): 4, ('H', 'M'): 4, ('H', 'F'): 4, ('H', 'P'): 3, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'W'): 3, ('H', 'Y'): 4, ('H', 'V'): 4, ('I', 'A'): 4, ('I', 'R'): 4, ('I', 'N'): 4, ('I', 'D'): 4, ('I', 'C'): 4, ('I', 'Q'): 4, ('I', 'E'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'L'): 4, ('I', 'K'): 4, ('I', 'M'): 4, ('I', 'F'): 4, ('I', 'P'): 4, ('I', 'S'): 4, ('I', 'T'): 3, ('I', 'W'): 4, ('I', 'Y'): 4, ('I', 'V'): 4, ('L', 'A'): 4, ('L', 'R'): 4, ('L', 'N'): 4, ('L', 'D'): 4, ('L', 'C'): 4, ('L', 'Q'): 4, ('L', 'E'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 4, ('L', 'L'): 0, ('L', 'K'): 4, ('L', 'M'): 4, ('L', 'F'): 3, ('L', 'P'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'W'): 4, ('L', 'Y'): 4, ('L', 'V'): 4, ('K', 'A'): 4, ('K', 'R'): 4, ('K', 'N'): 3, ('K', 'D'): 4, ('K', 'C'): 4, ('K', 'Q'): 3, ('K', 'E'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'L'): 4, ('K', 'K'): 0, ('K', 'M'): 4, ('K', 'F'): 4, ('K', 'P'): 4, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('K', 'V'): 4, ('M', 'A'): 4, ('M', 'R'): 4, ('M', 'N'): 4, ('M', 'D'): 4, ('M', 'C'): 4, ('M', 'Q'): 4, ('M', 'E'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 4, ('M', 'L'): 4, ('M', 'K'): 4, ('M', 'M'): 0, ('M', 'F'): 4, ('M', 'P'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'W'): 4, ('M', 'Y'): 4, ('M', 'V'): 4, ('F', 'A'): 4, ('F', 'R'): 4, ('F', 'N'): 4, ('F', 'D'): 4, 
('F', 'C'): 4, ('F', 'Q'): 4, ('F', 'E'): 4, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'L'): 3, ('F', 'K'): 4, ('F', 'M'): 4, ('F', 'F'): 0, ('F', 'P'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'W'): 4, ('F', 'Y'): 4, ('F', 'V'): 4, ('P', 'A'): 4, ('P', 'R'): 4, ('P', 'N'): 4, ('P', 'D'): 4, ('P', 'C'): 4, ('P', 'Q'): 4, ('P', 'E'): 4, ('P', 'G'): 4, ('P', 'H'): 3, ('P', 'I'): 4, ('P', 'L'): 4, ('P', 'K'): 4, ('P', 'M'): 4, ('P', 'F'): 4, ('P', 'P'): 0, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('P', 'V'): 4, ('S', 'A'): 4, ('S', 'R'): 4, ('S', 'N'): 4, ('S', 'D'): 4, ('S', 'C'): 4, ('S', 'Q'): 4, ('S', 'E'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'L'): 4, ('S', 'K'): 4, ('S', 'M'): 4, ('S', 'F'): 4, ('S', 'P'): 4, ('S', 'S'): 0, ('S', 'T'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('S', 'V'): 4, ('T', 'A'): 4, ('T', 'R'): 4, ('T', 'N'): 4, ('T', 'D'): 4, ('T', 'C'): 4, ('T', 'Q'): 4, ('T', 'E'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 3, ('T', 'L'): 4, ('T', 'K'): 4, ('T', 'M'): 4, ('T', 'F'): 4, ('T', 'P'): 4, ('T', 'S'): 4, ('T', 'T'): 0, ('T', 'W'): 4, ('T', 'Y'): 4, ('T', 'V'): 4, ('W', 'A'): 4, ('W', 'R'): 4, ('W', 'N'): 4, ('W', 'D'): 4, ('W', 'C'): 4, ('W', 'Q'): 4, ('W', 'E'): 4, ('W', 'G'): 4, ('W', 'H'): 3, ('W', 'I'): 4, ('W', 'L'): 4, ('W', 'K'): 4, ('W', 'M'): 4, ('W', 'F'): 4, ('W', 'P'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'W'): 0, ('W', 'Y'): 4, ('W', 'V'): 4, ('Y', 'A'): 4, ('Y', 'R'): 4, ('Y', 'N'): 4, ('Y', 'D'): 4, ('Y', 'C'): 4, ('Y', 'Q'): 4, ('Y', 'E'): 4, ('Y', 'G'): 4, ('Y', 'H'): 4, ('Y', 'I'): 4, ('Y', 'L'): 4, ('Y', 'K'): 4, ('Y', 'M'): 4, ('Y', 'F'): 4, ('Y', 'P'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, ('Y', 'W'): 4, ('Y', 'Y'): 0, ('Y', 'V'): 4, ('V', 'A'): 4, ('V', 'R'): 4, ('V', 'N'): 4, ('V', 'D'): 4, ('V', 'C'): 4, ('V', 'Q'): 4, ('V', 'E'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 4, ('V', 'L'): 4, ('V', 'K'): 4, ('V', 'M'): 4, ('V', 'F'): 4, ('V', 'P'): 4, ('V', 'S'): 4, ('V', 'T'): 4, 
('V', 'W'): 4, ('V', 'Y'): 4, ('V', 'V'): 0} |
Did you make any modifications to the original TCRblosum matrix? e.g. the cap at 4?
First of all, you always have this conversion from the substitution matrix (blosum62, tcrblosum) to a distance matrix.
In the tcrblosum paper, they just mention "Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]".
In the original TCRdist paper they write:
"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".
Now there would be two options:
- also use the constant cap 4 like in the original tcrdist paper
- use a cap that is one unit greater than the most favourable score
However, the most favourable score would be 1 for tcrblosum_alpha and 2 for tcrblosum_beta which would result in a quite low cap (see matrices in notebook). Therefore I just went with the fixed cap of 4 for the transformation formula from substitution matrix to distance matrix of the original tcrdist implementation:
distance(a, a) = 0 and distance(a, b) = min(4, 4 - score)
Unfortunately, I wasn't able to find out how exactly they did it in the tcrblosum paper. Another thing to consider is that the distance values for the alpha and beta chains might be compared later during clonotype clustering. Having two different caps might therefore cause problems.
What do you think would be the best option?
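The fixed-cap rule quoted above, distance(a, a) = 0 and distance(a, b) = min(4, 4 - score), can be sketched as follows. This is a minimal illustration and not the scirpy implementation: the function name `substitution_to_distance` is hypothetical, and the score table is just a handful of BLOSUM62 entries picked for the example.

```python
def substitution_to_distance(scores):
    """Convert substitution scores to TCRdist-style mismatch distances.

    Rule: distance(a, a) = 0 and distance(a, b) = min(4, 4 - score(a, b)).
    `scores` maps (a, b) amino-acid pairs to substitution scores.
    """
    distances = {}
    for (a, b), score in scores.items():
        distances[(a, b)] = 0 if a == b else min(4, 4 - score)
    return distances


# A small subset of BLOSUM62 scores, for illustration only.
blosum62_subset = {
    ("A", "A"): 4,
    ("A", "S"): 1,
    ("D", "E"): 2,
    ("F", "Y"): 3,
    ("A", "C"): 0,
    ("C", "W"): -2,
}

dist = substitution_to_distance(blosum62_subset)
for pair, d in dist.items():
    print(pair, d)
```

The resulting values for these pairs agree with the corresponding entries of blosum62_distance_matrix in the diff above, for example 3 for ('A', 'S') and 1 for ('F', 'Y').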
Dear @apostovskaya,
we are working on integrating your tcrBLOSUM matrix into scirpy, which contains a reimplementation of the TCRdist algorithm.
We are unsure how you intended the matrix to be used with TCRdist. Could you please clarify what you mean by
Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]
in your paper and how you would suggest to set the cap?
"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".
tbh, I never understood this part about TCRdist. It skews the matrix quite a bit, basically assigning a distance of 4 to all mismatches that have a negative score in BLOSUM62 (even if it's just -1). So the only way to get a score
Having a cap of 2 with TCRblosum doesn't make sense to me, because then the strong negative effect of mismatches with C would be completely gone which defeats the purpose of TCRblosum. Using 4 is just as arbitrary.
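To make the effect of the cap concrete, here is a small sketch comparing three variants on the same scores. Everything in it is an assumption for illustration: the helper `mismatch_distance`, its `offset` and `cap` parameters, and the score values, which are hypothetical and not taken from BLOSUM62 or tcrBLOSUM. The uncapped variant is shown only for comparison; it is not claimed to match what the PR does for distance_cap=None.

```python
def mismatch_distance(score, offset=4, cap=4):
    """Distance for a mismatch: offset - score, optionally clipped at `cap`.

    With offset=4 and cap=4 this is min(4, 4 - score), the TCRdist rule.
    With cap=None the value is not clipped (hypothetical variant).
    """
    raw = offset - score
    return raw if cap is None else min(cap, raw)


# Hypothetical substitution scores, from mildly favourable to strongly
# unfavourable. These are NOT real tcrBLOSUM values.
scores = [1, 0, -1, -4]

fixed_cap_4 = [mismatch_distance(s, offset=4, cap=4) for s in scores]
uncapped = [mismatch_distance(s, offset=4, cap=None) for s in scores]
low_cap_2 = [mismatch_distance(s, offset=2, cap=2) for s in scores]

print("cap 4:   ", fixed_cap_4)
print("uncapped:", uncapped)
print("cap 2:   ", low_cap_2)
```

With a cap of 4, scores of 0, -1 and -4 all end up at the same distance, and with a cap of 2 only the score of 1 remains distinguishable from the rest. Only the uncapped variant keeps the strongly negative score apart from the mildly negative one.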
1086548 to 0177b37
I updated the approach now according to what was discussed via email. The distance matrices are now computed from the original substitution matrices (blosum62, tcrblosum_alpha, tcrblosum_beta) rather than stored as precomputed distance tables. In general, the calculation is now done as follows: The

My intention for the user call would be the following rather than creating an additional tcrdist_tcrblosum metric:

tcrdist with blosum62

    ir.pp.ir_dist(
        mdata,
        metric="tcrdist",
        sequence="aa",
        cutoff=15,
        # base_matrix="blosum62",  # set by default
        # distance_cap=4,  # set by default
    )

tcrdist with tcrblosum matrices

    ir.pp.ir_dist(
        mdata,
        metric="tcrdist",
        sequence="aa",
        cutoff=15,
        base_matrix="tcrblosum",
        # distance_cap=None,  # set by default
    )
for more information, see https://pre-commit.ci
…rving the existing defaults for BLOSUM62 and TCRBLOSUM. Cover default, capped, custom capped, and uncapped behavior in sequence_dist tests.
3a61b7a to 6c4c5e6
| distance_matrix: | ||
| distance lookup matrix | ||
| """ | ||
| dm = np.zeros((len(alphabet), len(alphabet)), dtype=np.int32) |
Does this imply that letters in the alphabet, but not in the matrix have a distance of 0 (e.g. X vs. A)?
Yes - this behavior comes from the original pwseqdist base implementation. Maybe this was the pragmatic approach to treat unknown residues as "uninformative" rather than as evidence of dissimilarity. One could see capturing true matches as more important than avoiding false matches.
The question is if we want to change this.
FWIW, there are also variants of BLOSUM62 that include these extended amino acids.
However, probably we don't have something like that for the tcrblosum matrices.
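The behavior discussed in this thread can be reproduced with a small sketch. It is not the pwseqdist or scirpy code: the helper `build_lookup`, the three-letter alphabet, and the tiny distance dictionary are assumptions chosen to show that a lookup table initialised with np.zeros leaves every pair that is missing from the dictionary at distance 0.

```python
import numpy as np


def build_lookup(alphabet, pair_distances):
    """Build a dense distance lookup matrix indexed by alphabet position.

    Pairs that are absent from `pair_distances` keep the initial value 0.
    """
    dm = np.zeros((len(alphabet), len(alphabet)), dtype=np.int32)
    index = {aa: i for i, aa in enumerate(alphabet)}
    for (a, b), d in pair_distances.items():
        dm[index[a], index[b]] = d
    return dm


# "X" is part of the alphabet but has no entry in the distance dictionary.
alphabet = "ACX"
pair_distances = {
    ("A", "A"): 0,
    ("A", "C"): 4,
    ("C", "A"): 4,
    ("C", "C"): 0,
}

dm = build_lookup(alphabet, pair_distances)
print(dm)
print("A vs X:", dm[0, 2])
```

Here A vs. X gets a distance of 0, the same as an exact match, while A vs. C gets 4. If unknown residues should count as mismatches instead, one option would be to initialise the off-diagonal entries with a non-zero default before filling in the known pairs.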
Implementation-wise this looks good, and I also prefer to make the tcrdist implementation generic such that it works with arbitrary matrices rather than introducing another metric. We just need to make sure to advertise it appropriately in the tutorial, otherwise the functionality is a bit hidden.
I made a short tutorial update to mention tcrblosum usage.

So far, the TCRdist metric used a distance matrix derived from the blosum62 substitution matrix. This PR extends TCRdistDistanceCalculator with a new base_matrix="tcrblosum" option alongside the existing default blosum62 behavior. This way, distance matrices based on the tcrblosum substitution matrices (different matrices for alpha and beta chain) are used for the TCRdist metric calculation.
I try to illustrate how I derived the tcrblosum-based distance matrices in this Google Colab notebook.
The usage of the tcrblosum matrices was already discussed in #591.