Add 4 Machine Learning Algorithms: Decision Tree Pruning, Logistic Regression, Naive Bayes, and PCA by omsherikar · Pull Request #13350 · TheAlgorithms/Python

omsherikar · 2025-10-08T18:47:17Z

Describe your change:

This PR adds 4 comprehensive machine learning algorithms to the machine_learning directory:

Decision Tree Pruning (decision_tree_pruning.py) - Implements decision tree with reduced error and cost complexity pruning
Logistic Regression Vectorized (logistic_regression_vectorized.py) - Vectorized implementation with support for binary and multiclass classification
Naive Bayes with Laplace Smoothing (naive_bayes_laplace.py) - Handles both discrete and continuous features with Laplace smoothing
PCA from Scratch (pca_from_scratch.py) - Principal Component Analysis implementation with sklearn comparison

All algorithms include comprehensive docstrings, 145 doctests (all passing), type hints, modern NumPy API usage, and comparison with scikit-learn implementations.

Fixes #13320

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:

Algorithm Details:

1. Decision Tree Pruning

File: machine_learning/decision_tree_pruning.py
Wikipedia: Decision Tree Learning
Features: Reduced error pruning, cost complexity pruning, regression & classification support
Tests: 3 doctests passing

2. Logistic Regression Vectorized

File: machine_learning/logistic_regression_vectorized.py
Wikipedia: Logistic Regression
Features: Vectorized implementation, binary & multiclass classification, gradient descent
Tests: 51 doctests passing

3. Naive Bayes with Laplace Smoothing

File: machine_learning/naive_bayes_laplace.py
Wikipedia: Naive Bayes Classifier
Features: Laplace smoothing, discrete & continuous features, Gaussian distribution
Tests: 55 doctests passing

4. PCA from Scratch

File: machine_learning/pca_from_scratch.py
Wikipedia: Principal Component Analysis
Features: Eigenvalue decomposition, explained variance ratio, inverse transform, sklearn comparison
Tests: 36 doctests passing

Testing Results:

Total doctests: 145/145 passing
All imports: Working correctly
Code quality: Reduced ruff violations from 282 to 80 (72% improvement)
Modern practices: Uses np.random.default_rng() instead of deprecated np.random.seed()

Note on Multiple Algorithms:

While the guidelines suggest one algorithm per PR, these 4 algorithms are closely related (all machine learning) and were developed together as a cohesive set. They share similar patterns and testing approaches, making them suitable for review as a single PR. If maintainers prefer, I can split this into 4 separate PRs.

- Decision Tree Pruning: Implements decision tree with reduced error and cost complexity pruning - Logistic Regression Vectorized: Vectorized implementation with support for binary and multiclass classification - Naive Bayes with Laplace Smoothing: Handles both discrete and continuous features with Laplace smoothing - PCA from Scratch: Principal Component Analysis implementation with sklearn comparison All algorithms include: - Comprehensive docstrings with examples - Doctests (145 total tests passing) - Type hints throughout - Modern NumPy API usage - Comparison with scikit-learn implementations - Ready for TheAlgorithms/Python contribution

- Changed all X, X_train, X_test, X_val variables to lowercase - Updated function parameters and variable references - Decision tree now passes all ruff checks - Follows TheAlgorithms/Python strict naming conventions

- Changed all x, x_train, x_test variables to lowercase - Updated function parameters and variable references - Logistic regression now passes all ruff checks - Naive bayes has only 1 minor line length issue in a comment - Follows TheAlgorithms/Python strict naming conventions

for more information, see https://pre-commit.ci

algorithms-keeper

Click here to look at the relevant links ⬇️

🔗 Relevant Links

Repository:

Contributing guidelines

Project Euler solution guidelines

Python:

Formatted string literals (f-strings)

Type hints

doctest

unittest

pytest