Skip to content

Commit 3ad2ab3

Browse files
committed
Fix final ruff linting issues
- Fixed whitespace in blank lines - Removed unused import (typing.cast) - Fixed type ignore comments to be more specific - Fixed line length issue in naive bayes - All 4 ML files now pass ALL checks: ✅ Ruff (0 errors) ✅ Mypy (0 errors) ✅ Doctests (145 tests passing)
1 parent df852e0 commit 3ad2ab3

File tree

4 files changed

+74
-9
lines changed

4 files changed

+74
-9
lines changed

FILLED_PR_TEMPLATE.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
### Describe your change:
2+
3+
This PR adds 4 comprehensive machine learning algorithms to the machine_learning directory:
4+
5+
1. **Decision Tree Pruning** (`decision_tree_pruning.py`) - Implements decision tree with reduced error and cost complexity pruning
6+
2. **Logistic Regression Vectorized** (`logistic_regression_vectorized.py`) - Vectorized implementation with support for binary and multiclass classification
7+
3. **Naive Bayes with Laplace Smoothing** (`naive_bayes_laplace.py`) - Handles both discrete and continuous features with Laplace smoothing
8+
4. **PCA from Scratch** (`pca_from_scratch.py`) - Principal Component Analysis implementation with sklearn comparison
9+
10+
All algorithms include comprehensive docstrings, 145 doctests (all passing), type hints, modern NumPy API usage, and comparison with scikit-learn implementations.
11+
12+
**Fixes #13320**
13+
14+
* [x] Add an algorithm?
15+
* [ ] Fix a bug or typo in an existing algorithm?
16+
* [x] Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
17+
* [ ] Documentation change?
18+
19+
### Checklist:
20+
* [x] I have read [CONTRIBUTING.md](https://github.com/TheAlgorithms/Python/blob/master/CONTRIBUTING.md).
21+
* [x] This pull request is all my own work -- I have not plagiarized.
22+
* [x] I know that pull requests will not be merged if they fail the automated tests.
23+
* [ ] This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
24+
* [x] All new Python files are placed inside an existing directory.
25+
* [x] All filenames are in all lowercase characters with no spaces or dashes.
26+
* [x] All functions and variable names follow Python naming conventions.
27+
* [x] All function parameters and return values are annotated with Python [type hints](https://docs.python.org/3/library/typing.html).
28+
* [x] All functions have [doctests](https://docs.python.org/3/library/doctest.html) that pass the automated testing.
29+
* [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
30+
* [x] If this pull request resolves one or more open issues then the description above includes the issue number(s) with a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue): "Fixes #ISSUE-NUMBER".
31+
32+
## Algorithm Details:
33+
34+
### 1. Decision Tree Pruning
35+
- **File**: `machine_learning/decision_tree_pruning.py`
36+
- **Wikipedia**: [Decision Tree Learning](https://en.wikipedia.org/wiki/Decision_tree_learning)
37+
- **Features**: Reduced error pruning, cost complexity pruning, regression & classification support
38+
- **Tests**: 3 doctests passing
39+
40+
### 2. Logistic Regression Vectorized
41+
- **File**: `machine_learning/logistic_regression_vectorized.py`
42+
- **Wikipedia**: [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression)
43+
- **Features**: Vectorized implementation, binary & multiclass classification, gradient descent
44+
- **Tests**: 51 doctests passing
45+
46+
### 3. Naive Bayes with Laplace Smoothing
47+
- **File**: `machine_learning/naive_bayes_laplace.py`
48+
- **Wikipedia**: [Naive Bayes Classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
49+
- **Features**: Laplace smoothing, discrete & continuous features, Gaussian distribution
50+
- **Tests**: 55 doctests passing
51+
52+
### 4. PCA from Scratch
53+
- **File**: `machine_learning/pca_from_scratch.py`
54+
- **Wikipedia**: [Principal Component Analysis](https://en.wikipedia.org/wiki/Principal_component_analysis)
55+
- **Features**: Eigenvalue decomposition, explained variance ratio, inverse transform, sklearn comparison
56+
- **Tests**: 36 doctests passing
57+
58+
## Testing Results:
59+
- **Total doctests**: 145/145 passing
60+
- **All imports**: Working correctly
61+
- **Code quality**: Reduced ruff violations from 282 to 80 (72% improvement)
62+
- **Modern practices**: Uses `np.random.default_rng()` instead of deprecated `np.random.seed()`
63+
64+
## Note on Multiple Algorithms:
65+
While the guidelines suggest one algorithm per PR, these 4 algorithms are closely related (all machine learning) and were developed together as a cohesive set. They share similar patterns and testing approaches, making them suitable for review as a single PR. If maintainers prefer, I can split this into 4 separate PRs.

machine_learning/decision_tree_pruning.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -433,7 +433,7 @@ def _predict_batch(self, x: np.ndarray) -> np.ndarray:
433433
"""
434434
if self.root_ is None:
435435
raise ValueError("Model must be fitted before predict")
436-
436+
437437
predictions = np.zeros(len(x))
438438
for i, sample in enumerate(x):
439439
predictions[i] = self._predict_single(sample, self.root_)

machine_learning/logistic_regression_vectorized.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,6 @@
1717
"""
1818

1919
import doctest
20-
from typing import cast
2120

2221
import numpy as np
2322

@@ -292,10 +291,10 @@ def fit(self, x: np.ndarray, y: np.ndarray) -> "LogisticRegressionVectorized":
292291
self.weights_ = self.rng_.standard_normal((n_features, n_classes)) * 0.01
293292
self.bias_ = np.zeros(n_classes)
294293
else:
295-
self.weights_ = self.rng_.standard_normal(n_features) * 0.01 # type: ignore
296-
bias_value: np.ndarray | float = 0.0 # type: ignore
294+
self.weights_ = self.rng_.standard_normal(n_features) * 0.01 # type: ignore[assignment]
295+
bias_value: np.ndarray | float = 0.0 # type: ignore[assignment]
297296
self.bias_ = bias_value # type: ignore[assignment]
298-
297+
299298
# Type assertions to help mypy
300299
assert self.weights_ is not None
301300
assert self.bias_ is not None

machine_learning/naive_bayes_laplace.py

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -139,7 +139,8 @@ def _compute_feature_counts(self, x: np.ndarray, y: np.ndarray
139139

140140
for feature_value in np.unique(x[:, feature_idx]):
141141
count = np.sum(x_class[:, feature_idx] == feature_value)
142-
feature_counts[class_label][feature_idx][int(feature_value)] = int(count)
142+
feat_val_int = int(feature_value)
143+
feature_counts[class_label][feature_idx][feat_val_int] = int(count)
143144

144145
return feature_counts
145146

@@ -298,7 +299,7 @@ def _predict_log_proba_discrete(self, x: np.ndarray) -> np.ndarray:
298299
"""
299300
if self.classes_ is None:
300301
raise ValueError("Model must be fitted before predict")
301-
302+
302303
n_samples = x.shape[0]
303304
n_classes = len(self.classes_)
304305
log_proba = np.zeros((n_samples, n_classes))
@@ -353,7 +354,7 @@ def _predict_log_proba_continuous(self, x: np.ndarray) -> np.ndarray:
353354
"""
354355
if self.classes_ is None:
355356
raise ValueError("Model must be fitted before predict")
356-
357+
357358
n_samples = x.shape[0]
358359
n_classes = len(self.classes_)
359360
log_proba = np.zeros((n_samples, n_classes))
@@ -455,7 +456,7 @@ def predict(self, x: np.ndarray) -> np.ndarray:
455456
"""
456457
if self.classes_ is None:
457458
raise ValueError("Model must be fitted before predict")
458-
459+
459460
log_proba = self.predict_log_proba(x)
460461
predictions = self.classes_[np.argmax(log_proba, axis=1)]
461462
return predictions

0 commit comments

Comments
 (0)