machine_learning/confusion_matrix.py (152 additions, 0 deletions)
"""
Confusion Matrix implementation for evaluating classification models.

A confusion matrix is a table used to evaluate the performance of a
classification algorithm by comparing predicted labels against actual labels.

Reference: https://en.wikipedia.org/wiki/Confusion_matrix
"""

import numpy as np


def confusion_matrix(actual: list, predicted: list) -> np.ndarray:
"""
Calculate the confusion matrix for binary or multiclass classification.

Args:
actual: List of actual class labels.
predicted: List of predicted class labels.

Returns:
A 2D numpy array representing the confusion matrix.

Examples:
>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 0, 0, 1, 0, 0]
>>> confusion_matrix(actual, predicted)
array([[2, 0],
[2, 2]])

>>> actual = [0, 0, 1, 1, 2, 2]
>>> predicted = [0, 1, 1, 2, 2, 0]
>>> confusion_matrix(actual, predicted)
array([[1, 1, 0],
[0, 1, 1],
[1, 0, 1]])
"""
classes = sorted(set(actual) | set(predicted))
n = len(classes)
class_to_index = {c: i for i, c in enumerate(classes)}

matrix = np.zeros((n, n), dtype=int)
for a, p in zip(actual, predicted):
matrix[class_to_index[a]][class_to_index[p]] += 1
Copilot AI (Mar 2, 2026), commenting on lines +42 to +44:
zip(actual, predicted) will silently drop extra items when the input lists have different lengths, producing an incorrect confusion matrix without any error. Add an explicit length check up-front (and raise ValueError) so mismatched inputs fail fast (similar to other ML metric functions in this repo).

return matrix
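
A minimal sketch of the fail-fast validation the review comment asks for (the helper name `validate_equal_lengths` is hypothetical, not part of the PR):

```python
def validate_equal_lengths(actual: list, predicted: list) -> None:
    """Raise instead of letting zip() silently truncate the longer input."""
    if len(actual) != len(predicted):
        raise ValueError(
            f"length mismatch: {len(actual)} actual labels vs "
            f"{len(predicted)} predicted labels"
        )
```

Calling such a helper at the top of confusion_matrix (and of each metric below) would make mismatched inputs fail immediately rather than yield a silently truncated matrix.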


def precision(actual: list, predicted: list, positive_label: int = 1) -> float:
Copilot AI (Mar 2, 2026), commenting on this line:
The positive_label: int = 1 type hint is overly restrictive: class labels are often strings or other hashable types, and precision/recall/f1_score work as long as positive_label is comparable to items in actual/predicted. Consider loosening the annotation (e.g., a TypeVar/Hashable) to avoid misleading API contracts and type checker errors.
"""
Calculate precision: TP / (TP + FP).

Args:
actual: List of actual class labels.
predicted: List of predicted class labels.
positive_label: The label considered as positive class.

Copilot AI (Mar 2, 2026), commenting on lines +49 to +57:
precision, recall, and f1_score are implemented as binary (or one-vs-rest via positive_label) metrics, but the docstrings don't state this and could be interpreted as multiclass-averaged metrics. Clarify the behavior in the docstrings (and optionally add a doctest example showing one-vs-rest usage for a multiclass label set).
Returns:
Precision score as a float.

Examples:
>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 0, 0, 1, 0, 0]
>>> precision(actual, predicted)
1.0

>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 1, 0, 1, 0, 0]
>>> precision(actual, predicted)
0.6666666666666666
"""
tp = sum(
1
for a, p in zip(actual, predicted)
if a == positive_label and p == positive_label
)
fp = sum(
1
for a, p in zip(actual, predicted)
if a != positive_label and p == positive_label
)
return tp / (tp + fp) if (tp + fp) > 0 else 0.0
Copilot AI (Mar 2, 2026), commenting on lines +72 to +82:
precision() iterates with zip(actual, predicted), so if the two inputs differ in length the computation is silently truncated. Consider validating equal lengths (and raising ValueError) before computing TP/FP so callers can't get an incorrect metric without noticing.
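
Taken together, the review comments above suggest a version of precision with both the up-front length check and a loosened label annotation. A sketch under those suggestions (the name `precision_checked` is hypothetical, not the PR author's code):

```python
from collections.abc import Hashable


def precision_checked(
    actual: list, predicted: list, positive_label: Hashable = 1
) -> float:
    """One-vs-rest precision; positive_label may be any hashable label."""
    if len(actual) != len(predicted):
        # Fail fast rather than letting zip() truncate the longer list.
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive_label)
    fp = sum(
        1
        for a, p in zip(actual, predicted)
        if a != positive_label and p == positive_label
    )
    return tp / (tp + fp) if tp + fp else 0.0
```

With string labels, precision_checked(["cat", "dog", "cat"], ["cat", "cat", "cat"], positive_label="cat") gives 2/3: two true positives against one false positive.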


def recall(actual: list, predicted: list, positive_label: int = 1) -> float:
"""
Calculate recall (sensitivity): TP / (TP + FN).

Args:
actual: List of actual class labels.
predicted: List of predicted class labels.
positive_label: The label considered as positive class.

Returns:
Recall score as a float.

Examples:
>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 0, 0, 1, 0, 0]
>>> recall(actual, predicted)
0.5

>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 1, 1, 1, 0, 1]
>>> recall(actual, predicted)
1.0
"""
tp = sum(
1
for a, p in zip(actual, predicted)
if a == positive_label and p == positive_label
)
fn = sum(
1
for a, p in zip(actual, predicted)
if a == positive_label and p != positive_label
)
return tp / (tp + fn) if (tp + fn) > 0 else 0.0
Copilot AI (Mar 2, 2026), commenting on lines +108 to +118:
recall() has the same silent-truncation issue as precision() due to zip(actual, predicted). Add an explicit length check (raise ValueError) before computing TP/FN.


def f1_score(actual: list, predicted: list, positive_label: int = 1) -> float:
"""
Calculate F1 score: harmonic mean of precision and recall.

Args:
actual: List of actual class labels.
predicted: List of predicted class labels.
positive_label: The label considered as positive class.

Returns:
F1 score as a float.

Examples:
>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 0, 0, 1, 0, 0]
>>> round(f1_score(actual, predicted), 4)
0.6667

>>> actual = [1, 0, 1, 1, 0, 1]
>>> predicted = [1, 0, 1, 1, 0, 1]
>>> f1_score(actual, predicted)
1.0
"""
p = precision(actual, predicted, positive_label)
r = recall(actual, predicted, positive_label)
return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
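
If multiclass-averaged behavior is ever wanted (one review comment notes the current metrics are one-vs-rest only), a macro-averaged F1 can be built on top of the binary metric. A self-contained sketch, with the hypothetical names `_binary_f1` and `macro_f1`:

```python
def _binary_f1(actual: list, predicted: list, positive_label) -> float:
    """One-vs-rest F1 for a single label, mirroring the PR's f1_score."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive_label)
    fp = sum(
        1
        for a, p in zip(actual, predicted)
        if a != positive_label and p == positive_label
    )
    fn = sum(
        1
        for a, p in zip(actual, predicted)
        if a == positive_label and p != positive_label
    )
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


def macro_f1(actual: list, predicted: list) -> float:
    """Unweighted mean of the one-vs-rest F1 over every observed label."""
    labels = sorted(set(actual) | set(predicted))
    return sum(_binary_f1(actual, predicted, c) for c in labels) / len(labels)
```

For the multiclass example from the confusion_matrix doctest (actual = [0, 0, 1, 1, 2, 2], predicted = [0, 1, 1, 2, 2, 0]), each label's one-vs-rest F1 is 0.5, so macro_f1 returns 0.5.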


if __name__ == "__main__":
import doctest

doctest.testmod()