This project applies machine learning techniques to detect fraudulent credit card transactions using a publicly available dataset from Kaggle.
Download the full report (PDF)
- Source: Kaggle (ULB Machine Learning Group)
- Contains anonymized transaction data
- Highly imbalanced dataset with very few fraudulent transactions
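
To make the imbalance concrete, here is a minimal sketch that counts each class and derives inverse-frequency class weights. The label list is synthetic, and the helper names `class_summary` and `balanced_class_weights` are hypothetical. The weighting formula `n_samples / (n_classes * class_count)` is one common heuristic and is not necessarily what this project used.

```python
from collections import Counter


def class_summary(labels):
    """Return per-class counts and the fraction of labels equal to 1 (fraud)."""
    counts = Counter(labels)
    fraud_rate = counts.get(1, 0) / len(labels)
    return counts, fraud_rate


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Synthetic labels (an assumption for illustration): 998 legitimate, 2 fraudulent.
labels = [0] * 998 + [1] * 2

counts, fraud_rate = class_summary(labels)
weights = balanced_class_weights(labels)

print(f"legitimate={counts[0]}, fraud={counts[1]}, fraud rate={fraud_rate:.3%}")
print(f"class weights: {weights}")
```

With weights like these, each rare fraud example counts for far more than a legitimate one during training, which is one way to keep a model from ignoring the minority class.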
- Exploratory Data Analysis (EDA)
- Feature selection and scaling
- Logistic Regression
- Random Forest
Models were evaluated using:
- Confusion Matrix
- Precision
- Recall
- F1-score
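
The metrics above all follow from the four cells of the confusion matrix. The sketch below computes them by hand on a small made-up set of labels and predictions. The data and the function names `confusion_counts` and `precision_recall_f1` are illustrative assumptions, not results or code from this project.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one missed fraud (a false negative) lowers recall, and one false alarm (a false positive) lowers precision.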
- Logistic Regression performed well overall but missed some fraudulent transactions (false negatives), which lowered its recall
- Random Forest outperformed Logistic Regression, particularly in recall and F1-score
- Random Forest was more effective at detecting fraud
- Class imbalance significantly impacts model evaluation
- Accuracy alone is not a reliable metric
- Recall is critical for fraud detection problems
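
A short worked example shows why accuracy misleads on data this skewed. It assumes the class counts commonly cited for this Kaggle dataset (284,807 transactions, 492 of them fraudulent). Check these against your own copy of the data. The "model" here is a hypothetical baseline that labels every transaction as legitimate.

```python
# Assumed dataset size and fraud count (verify against your copy of the data).
N_TOTAL = 284_807
N_FRAUD = 492

# Baseline that predicts "legitimate" for every transaction:
# it never flags fraud, so there are no true or false positives.
tp = 0
fp = 0
fn = N_FRAUD
tn = N_TOTAL - N_FRAUD

accuracy = (tp + tn) / N_TOTAL
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.4f}")  # very high despite catching nothing
print(f"recall={recall:.4f}")      # zero: every fraud case is missed
```

The baseline is right about 99.8% of the time yet detects no fraud at all, which is why recall and F1-score say more about a fraud model than accuracy does.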
- eda.ipynb – Data analysis and modeling
- capstone.png – Model comparison results
- LaTeX report – Final project report
James D. Pinkston
For environment setup and development workflow instructions, see: SETUP.md