Data Science Projects

Welcome to the Data Science Projects repository. This repository contains a collection of data science projects implemented with a range of technologies and models.

Project Overview

Automotive Fuel Economy

Description: This analysis builds a linear regression model to predict the fuel efficiency (measured in miles per gallon, mpg) of vehicles from features such as the number of cylinders, displacement, horsepower, weight, acceleration, model year, and origin.
Learning Type: Supervised Learning
Technologies Used: python, pandas, seaborn, matplotlib, scikit-learn, statsmodels, scipy
Algorithms/Models Used: Linear Regression, Ridge Regression
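
As a rough illustration of the comparison described above, the sketch below fits a plain linear regression and a ridge regression on the same split and reports the test R^2 of each. The data is synthetic, and the chosen features, coefficients, and `alpha=1.0` are assumptions made for the example, not values taken from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the vehicle data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(0)
n = 300
weight = rng.uniform(1600, 5000, n)
horsepower = rng.uniform(50, 230, n)
model_year = rng.integers(70, 83, n)
mpg = (45 - 0.006 * weight - 0.04 * horsepower
       + 0.3 * (model_year - 70) + rng.normal(0, 2, n))

X = np.column_stack([weight, horsepower, model_year])
X_train, X_test, y_train, y_test = train_test_split(
    X, mpg, test_size=0.25, random_state=0
)

# Scale features first so the ridge penalty treats them comparably.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),  # alpha is an assumption
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

On real data, the ridge penalty tends to matter most when predictors are strongly correlated, as displacement, horsepower, and weight often are.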

Bank Customer Classification

Description: This project builds classification models to predict whether a retail bank customer is likely to churn. It explores demographic, behavioural, and financial features using logistic regression and random forest models. Feature engineering, threshold tuning, and model evaluation are used to identify high-risk customers and support retention strategies.
Learning Type: Supervised Learning
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn
Algorithms/Models Used: Logistic Regression, Random Forest, GridSearchCV, ROC Curve Analysis, Precision-Recall Threshold Tuning
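
The threshold tuning step can be sketched as follows: fit a logistic regression, compute the precision-recall curve on a held-out validation split, and pick the probability cutoff that maximizes F1. The synthetic data, the 80/20 class balance, and the choice of F1 as the selection criterion are all assumptions for illustration; the project may use a different criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for churn data (hypothetical, ~20% positives).
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_val)[:, 1]  # predicted probability of the positive class

precision, recall, thresholds = precision_recall_curve(y_val, proba)
# precision and recall have one more entry than thresholds; drop the final point.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = thresholds[np.argmax(f1)]

# Apply the tuned cutoff instead of the default 0.5.
y_pred = (proba >= best_threshold).astype(int)
print(f"best threshold: {best_threshold:.3f}, flagged: {y_pred.sum()} of {len(y_pred)}")
```

A cutoff chosen this way should be confirmed on a separate test set, since it is tuned to the validation data.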

Bank Customer Segmentation

Description: This project applies clustering techniques to bank customer data to uncover distinct behavioural segments and assess their relationship with churn. Using features such as credit score, account balance, number of products, and activity status, the analysis identifies four meaningful customer profiles (e.g., “Wealthy Light Users” and “Rapid Multi-Product Adopters”). The churn rate is then evaluated across these segments to support targeted retention and marketing strategies.
Learning Type: Unsupervised Learning (with supervised follow-up analysis on churn labels)
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn
Algorithms/Models Used: K-Means Clustering, StandardScaler, Silhouette Analysis, Data Visualization (heatmaps, elbow plots)
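
A minimal sketch of how the number of clusters might be chosen: scale the features, fit K-Means for a range of `k`, and record both the inertia (for an elbow plot) and the silhouette score. The blob data and the range 2 to 7 are assumptions; the source states only that four segments were identified.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (hypothetical data).
X, _ = make_blobs(n_samples=600, centers=4, n_features=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

results = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    # inertia_ feeds an elbow plot; the silhouette score measures cluster separation.
    results[k] = (km.inertia_, silhouette_score(X_scaled, km.labels_))

best_k = max(results, key=lambda k: results[k][1])
for k, (inertia, sil) in results.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")
print("k with highest silhouette:", best_k)
```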

Breast Cancer Diagnosis Using k-NN

Description: This project develops a machine learning model that predicts whether a breast tumor is benign or malignant using the k-Nearest Neighbors (k-NN) algorithm. Preprocessing includes feature scaling, dimensionality reduction through PCA, and SMOTE to address class imbalance.
Learning Type: Supervised Learning
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn, imbalanced-learn
Algorithms/Models Used: k-Nearest Neighbors (k-NN) Classification, Principal Component Analysis (PCA), Synthetic Minority Over-sampling Technique (SMOTE)
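
The scaling, PCA, and k-NN steps can be chained in a single pipeline, sketched below. It uses the breast cancer dataset bundled with scikit-learn, which is an assumption, since the source does not name its dataset. The values `n_components=0.95` and `n_neighbors=5` are illustrative. SMOTE is left out so the sketch depends only on scikit-learn; with imbalanced-learn it would normally be added as a pipeline step so that resampling happens only on the training folds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset: the copy shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(
    StandardScaler(),                     # k-NN is distance-based, so scale first
    PCA(n_components=0.95),               # keep components explaining 95% of variance
    KNeighborsClassifier(n_neighbors=5),  # illustrative choice of k
)

# The whole pipeline is refit inside each fold, so scaling and PCA never see test data.
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```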

Increasing Employee Retention

Description: This project aims to perform clustering on employee data to identify distinct groups within the workforce. The analysis includes exploring the relationships between these clusters and employee attrition rates, providing insights into factors contributing to employee turnover. Key features analyzed include gender, job satisfaction, commute distance, performance, and department affiliation. The project also employs Principal Component Analysis (PCA) for dimensionality reduction and visualization of the clusters.
Learning Type: Unsupervised Learning
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn
Algorithms/Models Used: K-Means Clustering, Principal Component Analysis (PCA)
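
One way to relate clusters to attrition, sketched under stated assumptions: cluster the scaled features, project them to two principal components for plotting, and compare the attrition rate per cluster. The column names, the synthetic data, and `n_clusters=3` are hypothetical and not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for employee data (hypothetical columns and values).
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "job_satisfaction": rng.integers(1, 5, n),
    "commute_distance": rng.uniform(1, 30, n),
    "performance": rng.integers(1, 5, n),
})
# Attrition made more likely for long commutes, purely for illustration.
df["attrition"] = rng.random(n) < 0.05 + 0.01 * df["commute_distance"]

features = ["job_satisfaction", "commute_distance", "performance"]
X = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# 2-D projection used only for visualizing the clusters.
coords = PCA(n_components=2).fit_transform(X)
df["pc1"], df["pc2"] = coords[:, 0], coords[:, 1]

# Share of employees who left, per cluster.
attrition_by_cluster = df.groupby("cluster")["attrition"].mean()
print(attrition_by_cluster)
```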

Predicting Car Purchase Decisions

Description: This project explores the factors influencing car purchase decisions. By leveraging data science techniques, particularly decision tree models, the project aims to predict whether a client will purchase a car based on demographic and financial information such as age, gender, and annual salary.
Learning Type: Supervised Learning
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn
Algorithms/Models Used: Decision Tree, GridSearchCV
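
The hyperparameter search can be sketched as a cross-validated grid search over a decision tree. The synthetic purchase rule, the 10% label noise, and the parameter grid are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for client data (hypothetical rule and values).
rng = np.random.default_rng(7)
n = 1000
age = rng.integers(18, 65, n)
gender = rng.integers(0, 2, n)
salary = rng.uniform(15000, 150000, n)
purchased = ((age > 40) | (salary > 90000)).astype(int)
flip = rng.random(n) < 0.1                  # flip ~10% of labels as noise
purchased = np.where(flip, 1 - purchased, purchased)

X = np.column_stack([age, gender, salary])
X_train, X_test, y_train, y_test = train_test_split(
    X, purchased, test_size=0.25, stratify=purchased, random_state=7
)

# Illustrative grid; limiting depth and leaf size controls overfitting.
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=7), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```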

Uncovering Global Socio-Economic Outliers

Description: This project focuses on identifying countries with unique or anomalous socio-economic conditions using various unsupervised anomaly detection algorithms. Through the use of K-Means clustering, DBSCAN, Isolation Forest, and One-Class SVM, the project seeks to uncover global outliers that may indicate countries facing extreme challenges or irregular patterns.
Learning Type: Unsupervised Learning
Technologies Used: python, pandas, numpy, seaborn, matplotlib, scikit-learn
Algorithms/Models Used: K-Means, DBSCAN, Isolation Forest, One-Class SVM
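
Because the detectors listed above use different definitions of an outlier, one common way to combine them is a simple vote. The sketch below flags points with Isolation Forest, One-Class SVM, and DBSCAN, then keeps those flagged by at least two. The synthetic data, every hyperparameter, and the two-vote rule are assumptions; K-Means distance-to-centroid scoring is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for country indicators: 150 typical rows plus 5 shifted ones.
rng = np.random.default_rng(3)
inliers = rng.normal(0, 1, size=(150, 4))
outliers = rng.normal(0, 1, size=(5, 4)) + 8
X = StandardScaler().fit_transform(np.vstack([inliers, outliers]))

# Each detector labels anomalies (or noise points, for DBSCAN) as -1.
flags = {
    "isolation_forest": IsolationForest(contamination=0.05, random_state=3).fit_predict(X) == -1,
    "one_class_svm": OneClassSVM(nu=0.05, gamma="scale").fit_predict(X) == -1,
    "dbscan": DBSCAN(eps=1.5, min_samples=5).fit_predict(X) == -1,
}

votes = np.sum(list(flags.values()), axis=0)  # number of detectors flagging each row
consensus = np.flatnonzero(votes >= 2)        # row indices flagged by at least two
print("rows flagged by at least two detectors:", consensus)
```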

Usage

Each project folder contains specific instructions on how to run the scripts and notebooks. Refer to the README file within each project folder for detailed usage guidelines.

License

This repository is licensed under the MIT License. See the LICENSE file for more details.
