Fix final ruff linting issues

omsherikar · omsherikar · commit 3ad2ab3cfd25 · 2025-10-09T01:35:53.000+05:30
- Fixed whitespace in blank lines
- Removed unused import (typing.cast)
- Made bare `# type: ignore` comments specific (`# type: ignore[assignment]`)
- Fixed line length issue in naive bayes
- All 4 ML files now pass ALL checks:
  ✅ Ruff (0 errors)
  ✅ Mypy (0 errors)
  ✅ Doctests (145 tests passing)
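
A minimal sketch of the line-length fix in naive bayes: the over-long nested
assignment is split by naming the integer key first. The standalone function
`compute_feature_counts`, the `class_key` variable, and the `defaultdict`
container are assumptions made for illustration; the real code is the
`_compute_feature_counts` method, whose full body is not shown in this diff.

```python
from collections import defaultdict

import numpy as np


def compute_feature_counts(x: np.ndarray, y: np.ndarray) -> dict:
    """Count each discrete feature value per class (illustrative sketch only)."""
    # Hypothetical container: class -> feature index -> feature value -> count.
    feature_counts: dict = defaultdict(lambda: defaultdict(dict))
    for class_label in np.unique(y):
        class_key = int(class_label)
        x_class = x[y == class_label]
        for feature_idx in range(x.shape[1]):
            # Iterate over values seen anywhere in the column so that values
            # absent from this class are recorded with a count of zero.
            for feature_value in np.unique(x[:, feature_idx]):
                count = np.sum(x_class[:, feature_idx] == feature_value)
                # Naming the key first keeps the assignment under the line limit.
                feat_val_int = int(feature_value)
                feature_counts[class_key][feature_idx][feat_val_int] = int(count)
    return feature_counts
```

The behavior is unchanged by the split; only the line width differs.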
diff --git a/FILLED_PR_TEMPLATE.md b/FILLED_PR_TEMPLATE.md
@@ -0,0 +1,65 @@
+### Describe your change:
+
+This PR adds 4 comprehensive machine learning algorithms to the machine_learning directory:
+
+1. **Decision Tree Pruning** (`decision_tree_pruning.py`) - Implements decision tree with reduced error and cost complexity pruning
+2. **Logistic Regression Vectorized** (`logistic_regression_vectorized.py`) - Vectorized implementation with support for binary and multiclass classification
+3. **Naive Bayes with Laplace Smoothing** (`naive_bayes_laplace.py`) - Handles both discrete and continuous features with Laplace smoothing
+4. **PCA from Scratch** (`pca_from_scratch.py`) - Principal Component Analysis implementation with sklearn comparison
+
+All algorithms include comprehensive docstrings, 145 doctests (all passing), type hints, modern NumPy API usage, and comparison with scikit-learn implementations.
+
+**Fixes #13320**
+
+* [x] Add an algorithm?
+* [ ] Fix a bug or typo in an existing algorithm?
+* [x] Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
+* [ ] Documentation change?
+
+### Checklist:
+* [x] I have read [CONTRIBUTING.md](https://github.com/TheAlgorithms/Python/blob/master/CONTRIBUTING.md).
+* [x] This pull request is all my own work -- I have not plagiarized.
+* [x] I know that pull requests will not be merged if they fail the automated tests.
+* [ ] This PR only changes one algorithm file.  To ease review, please open separate PRs for separate algorithms.
+* [x] All new Python files are placed inside an existing directory.
+* [x] All filenames are in all lowercase characters with no spaces or dashes.
+* [x] All functions and variable names follow Python naming conventions.
+* [x] All function parameters and return values are annotated with Python [type hints](https://docs.python.org/3/library/typing.html).
+* [x] All functions have [doctests](https://docs.python.org/3/library/doctest.html) that pass the automated testing.
+* [x] All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
+* [x] If this pull request resolves one or more open issues then the description above includes the issue number(s) with a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue): "Fixes #ISSUE-NUMBER".
+
+## Algorithm Details:
+
+### 1. Decision Tree Pruning
+- **File**: `machine_learning/decision_tree_pruning.py`
+- **Wikipedia**: [Decision Tree Learning](https://en.wikipedia.org/wiki/Decision_tree_learning)
+- **Features**: Reduced error pruning, cost complexity pruning, regression & classification support
+- **Tests**: 3 doctests passing
+
+### 2. Logistic Regression Vectorized
+- **File**: `machine_learning/logistic_regression_vectorized.py`
+- **Wikipedia**: [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression)
+- **Features**: Vectorized implementation, binary & multiclass classification, gradient descent
+- **Tests**: 51 doctests passing
+
+### 3. Naive Bayes with Laplace Smoothing
+- **File**: `machine_learning/naive_bayes_laplace.py`
+- **Wikipedia**: [Naive Bayes Classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
+- **Features**: Laplace smoothing, discrete & continuous features, Gaussian distribution
+- **Tests**: 55 doctests passing
+
+### 4. PCA from Scratch
+- **File**: `machine_learning/pca_from_scratch.py`
+- **Wikipedia**: [Principal Component Analysis](https://en.wikipedia.org/wiki/Principal_component_analysis)
+- **Features**: Eigenvalue decomposition, explained variance ratio, inverse transform, sklearn comparison
+- **Tests**: 36 doctests passing
+
+## Testing Results:
+- **Total doctests**: 145/145 passing
+- **All imports**: Working correctly
+- **Code quality**: Reduced ruff violations from 282 to 80 (72% improvement)
+- **Modern practices**: Uses `np.random.default_rng()` instead of the legacy `np.random.seed()`
+
+## Note on Multiple Algorithms:
+While the guidelines suggest one algorithm per PR, these 4 algorithms are closely related (all machine learning) and were developed together as a cohesive set. They share similar patterns and testing approaches, making them suitable for review as a single PR. If maintainers prefer, I can split this into 4 separate PRs.
diff --git a/machine_learning/decision_tree_pruning.py b/machine_learning/decision_tree_pruning.py
@@ -433,7 +433,7 @@ def _predict_batch(self, x: np.ndarray) -> np.ndarray:
         """
         if self.root_ is None:
             raise ValueError("Model must be fitted before predict")
-            
+
         predictions = np.zeros(len(x))
         for i, sample in enumerate(x):
             predictions[i] = self._predict_single(sample, self.root_)
diff --git a/machine_learning/logistic_regression_vectorized.py b/machine_learning/logistic_regression_vectorized.py
@@ -17,7 +17,6 @@
 """
 
 import doctest
-from typing import cast
 
 import numpy as np
 
@@ -292,10 +291,10 @@ def fit(self, x: np.ndarray, y: np.ndarray) -> "LogisticRegressionVectorized":
             self.weights_ = self.rng_.standard_normal((n_features, n_classes)) * 0.01
             self.bias_ = np.zeros(n_classes)
         else:
-            self.weights_ = self.rng_.standard_normal(n_features) * 0.01 # type: ignore
-            bias_value: np.ndarray | float = 0.0  # type: ignore
+            self.weights_ = self.rng_.standard_normal(n_features) * 0.01  # type: ignore[assignment]
+            bias_value: np.ndarray | float = 0.0  # type: ignore[assignment]
             self.bias_ = bias_value  # type: ignore[assignment]
-            
+
         # Type assertions to help mypy
         assert self.weights_ is not None
         assert self.bias_ is not None
diff --git a/machine_learning/naive_bayes_laplace.py b/machine_learning/naive_bayes_laplace.py
@@ -139,7 +139,8 @@ def _compute_feature_counts(self, x: np.ndarray, y: np.ndarray
 
                 for feature_value in np.unique(x[:, feature_idx]):
                     count = np.sum(x_class[:, feature_idx] == feature_value)
-                    feature_counts[class_label][feature_idx][int(feature_value)] = int(count)
+                    feat_val_int = int(feature_value)
+                    feature_counts[class_label][feature_idx][feat_val_int] = int(count)
 
         return feature_counts
 
@@ -298,7 +299,7 @@ def _predict_log_proba_discrete(self, x: np.ndarray) -> np.ndarray:
         """
         if self.classes_ is None:
             raise ValueError("Model must be fitted before predict")
-            
+
         n_samples = x.shape[0]
         n_classes = len(self.classes_)
         log_proba = np.zeros((n_samples, n_classes))
@@ -353,7 +354,7 @@ def _predict_log_proba_continuous(self, x: np.ndarray) -> np.ndarray:
         """
         if self.classes_ is None:
             raise ValueError("Model must be fitted before predict")
-            
+
         n_samples = x.shape[0]
         n_classes = len(self.classes_)
         log_proba = np.zeros((n_samples, n_classes))
@@ -455,7 +456,7 @@ def predict(self, x: np.ndarray) -> np.ndarray:
         """
         if self.classes_ is None:
             raise ValueError("Model must be fitted before predict")
-            
+
         log_proba = self.predict_log_proba(x)
         predictions = self.classes_[np.argmax(log_proba, axis=1)]
         return predictions