
feat: add confusion matrix with precision, recall, and F1 score #14318

Open
Sagargupta16 wants to merge 2 commits into TheAlgorithms:master from Sagargupta16:add-confusion-matrix

Conversation

@Sagargupta16

Describe your change:

Added classification evaluation metrics to machine_learning/:

  • confusion_matrix: Binary and multiclass support
  • precision: TP / (TP + FP)
  • recall: TP / (TP + FN)
  • f1_score: Harmonic mean of precision and recall

The existing scoring_functions.py only has regression metrics (MAE, MSE, RMSE). Classification metrics were missing.
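The formulas above (TP / (TP + FP), TP / (TP + FN), and the harmonic mean) can be sketched for the binary case as follows. This is a hypothetical illustration of how the metrics relate, not code from the PR; `binary_metrics` is an invented helper name:

```python
def binary_metrics(actual: list[int], predicted: list[int]) -> dict[str, float]:
    """Sketch: precision, recall, and F1 for binary labels where 1 is positive."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```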

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change documentation?
  • An existing implementation is improved

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request is for a pre-existing algorithm, I have linked to the issue.

Sagargupta16 and others added 2 commits March 2, 2026 07:04
Add classification evaluation metrics:
- confusion_matrix: binary and multiclass support
- precision: TP / (TP + FP)
- recall (sensitivity): TP / (TP + FN)
- f1_score: harmonic mean of precision and recall

All functions include doctests.
Copilot AI review requested due to automatic review settings March 2, 2026 01:46
@algorithms-keeper algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Mar 2, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds core classification evaluation utilities to machine_learning/, complementing the existing regression-focused metrics by providing confusion-matrix-based scoring.

Changes:

  • Introduces a confusion_matrix() implementation supporting binary and multiclass labels.
  • Adds binary/one-vs-rest precision(), recall(), and f1_score() metrics (via positive_label).
  • Includes doctest examples and a __main__ doctest runner for the new module.


Comment on lines +42 to +44
    matrix = np.zeros((n, n), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[class_to_index[a]][class_to_index[p]] += 1

Copilot AI Mar 2, 2026


zip(actual, predicted) will silently drop extra items when the input lists have different lengths, producing an incorrect confusion matrix without any error. Add an explicit length check up-front (and raise ValueError) so mismatched inputs fail fast (similar to other ML metric functions in this repo).
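A minimal sketch of the suggested guard. Only the three quoted lines and the names `n`, `class_to_index` come from the diff; the rest of the function body is reconstructed here as an assumption about what the surrounding code looks like:

```python
import numpy as np


def confusion_matrix(actual: list, predicted: list) -> np.ndarray:
    """Sketch: confusion matrix with an explicit fail-fast length check."""
    if len(actual) != len(predicted):
        raise ValueError(
            f"actual and predicted must have the same length: "
            f"{len(actual)} != {len(predicted)}"
        )
    # Map each observed class label to a row/column index.
    classes = sorted(set(actual) | set(predicted))
    class_to_index = {label: i for i, label in enumerate(classes)}
    n = len(classes)
    matrix = np.zeros((n, n), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[class_to_index[a]][class_to_index[p]] += 1
    return matrix
```

With the guard in place, mismatched inputs raise immediately instead of producing a truncated (and wrong) matrix.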

Comment on lines +72 to +82
    tp = sum(
        1
        for a, p in zip(actual, predicted)
        if a == positive_label and p == positive_label
    )
    fp = sum(
        1
        for a, p in zip(actual, predicted)
        if a != positive_label and p == positive_label
    )
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

Copilot AI Mar 2, 2026


precision() iterates with zip(actual, predicted), so if the two inputs differ in length the computation is silently truncated. Consider validating equal lengths (and raising ValueError) before computing TP/FP so callers can’t get an incorrect metric without noticing.

Comment on lines +108 to +118
    tp = sum(
        1
        for a, p in zip(actual, predicted)
        if a == positive_label and p == positive_label
    )
    fn = sum(
        1
        for a, p in zip(actual, predicted)
        if a == positive_label and p != positive_label
    )
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

Copilot AI Mar 2, 2026


recall() has the same silent-truncation issue as precision() due to zip(actual, predicted). Add an explicit length check (raise ValueError) before computing TP/FN.

    return matrix


def precision(actual: list, predicted: list, positive_label: int = 1) -> float:

Copilot AI Mar 2, 2026


The positive_label: int = 1 type hint is overly restrictive: class labels are often strings or other hashable types, and precision/recall/f1_score work as long as positive_label is comparable to items in actual/predicted. Consider loosening the annotation (e.g., a TypeVar/Hashable) to avoid misleading API contracts and type checker errors.
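A sketch of the loosened annotation the comment suggests, using `collections.abc.Hashable`. This is an illustration of the suggestion, not the PR's actual signature; the function body is reconstructed from the quoted diff:

```python
from collections.abc import Hashable


def precision(
    actual: list[Hashable],
    predicted: list[Hashable],
    positive_label: Hashable = 1,
) -> float:
    """Sketch: precision that accepts int, str, or any hashable label type."""
    tp = sum(
        a == positive_label and p == positive_label
        for a, p in zip(actual, predicted)
    )
    fp = sum(
        a != positive_label and p == positive_label
        for a, p in zip(actual, predicted)
    )
    return tp / (tp + fp) if tp + fp else 0.0
```

String labels now type-check cleanly, e.g. `precision(y_true, y_pred, positive_label="spam")`.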

Comment on lines +49 to +57
def precision(actual: list, predicted: list, positive_label: int = 1) -> float:
    """
    Calculate precision: TP / (TP + FP).

    Args:
        actual: List of actual class labels.
        predicted: List of predicted class labels.
        positive_label: The label considered as positive class.

Copilot AI Mar 2, 2026


precision, recall, and f1_score are implemented as binary (or one-vs-rest via positive_label) metrics, but the docstrings don’t state this and could be interpreted as multiclass-averaged metrics. Clarify the behavior in the docstrings (and optionally add a doctest example showing one-vs-rest usage for a multiclass label set).
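One way the suggested doctest could look. This is a hypothetical sketch: the docstring wording and the multiclass example are invented here, and the body is reconstructed from the quoted diff rather than copied from the PR:

```python
def precision(actual: list, predicted: list, positive_label: int = 1) -> float:
    """
    One-vs-rest precision for positive_label: TP / (TP + FP).

    This is a binary metric; for multiclass labels, every label other than
    positive_label is treated as negative.

    >>> actual = [0, 1, 2, 2, 1]
    >>> predicted = [0, 2, 2, 2, 1]
    >>> precision(actual, predicted, positive_label=2)
    0.6666666666666666
    """
    tp = sum(
        a == positive_label and p == positive_label
        for a, p in zip(actual, predicted)
    )
    fp = sum(
        a != positive_label and p == positive_label
        for a, p in zip(actual, predicted)
    )
    return tp / (tp + fp) if tp + fp else 0.0
```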

@algorithms-keeper algorithms-keeper bot added the tests are failing Do not merge until tests pass label Mar 2, 2026
@Sagargupta16
Author

@copilot open a new pull request to apply changes based on the comments in this thread
