A machine learning-powered clinical decision support system that predicts breast cancer malignancy based on cell nuclei measurements. Built with Python, Scikit-Learn, and Streamlit.
Early diagnosis is one of the most important factors in breast cancer survival. OncoPredict AI aims to bridge raw medical data and actionable clinical insights.
Instead of leaving the machine learning model inside a Jupyter Notebook, I deployed it as an interactive web application. Medical professionals (or other users) can enter specific cytology features, such as radius, texture, and smoothness, and receive an instant, probability-based prediction of whether a tumor is benign or malignant.
The goal was to build a system that is not only accurate but also visually accessible, moving away from complex code interfaces to a clean, professional dashboard.
This project follows a rigorous Data Science lifecycle:
- Data Ingestion: Utilizes the Wisconsin Breast Cancer Diagnostic (WBCD) dataset.
- Preprocessing:
  - Cleaned the data by removing irrelevant IDs and empty columns.
  - Mapped categorical diagnosis (`M`/`B`) to binary targets (1/0).
- Scaling: Implemented a `StandardScaler` to normalize feature values (e.g., ensuring 'Area' doesn't dominate 'Smoothness' just because the numbers are larger).
- Modeling: Trained a Logistic Regression classifier, optimized for binary classification tasks in medical contexts.
- Deployment: Serialized the model and scaler using `pickle` and built a frontend with Streamlit.
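To make the preprocessing, scaling, and serialization steps concrete, here is a minimal plain-Python sketch of what they compute. It is an illustration only, not the project's code: the project uses Scikit-Learn's `StandardScaler`, while this sketch writes the same formula out by hand. The rows, the column names, and the helper functions `preprocess`, `fit_scaler`, and `transform` are all hypothetical.

```python
import pickle
import statistics

# Hypothetical rows in the spirit of the WBCD data; the values are made up.
raw_rows = [
    {"id": 1, "diagnosis": "M", "radius_mean": 18.0, "area_mean": 1000.0},
    {"id": 2, "diagnosis": "B", "radius_mean": 12.0, "area_mean": 450.0},
    {"id": 3, "diagnosis": "B", "radius_mean": 11.0, "area_mean": 380.0},
    {"id": 4, "diagnosis": "M", "radius_mean": 20.0, "area_mean": 1250.0},
]


def preprocess(rows):
    """Drop the ID column and map the diagnosis M/B to 1/0."""
    features, targets = [], []
    for row in rows:
        targets.append(1 if row["diagnosis"] == "M" else 0)
        features.append(
            {k: v for k, v in row.items() if k not in ("id", "diagnosis")}
        )
    return features, targets


def fit_scaler(features):
    """Compute the per-feature mean and population standard deviation."""
    params = {}
    for name in features[0]:
        column = [row[name] for row in features]
        params[name] = (statistics.fmean(column), statistics.pstdev(column))
    return params


def transform(features, params):
    """Apply z = (x - mean) / std to every value."""
    return [
        {name: (value - params[name][0]) / params[name][1]
         for name, value in row.items()}
        for row in features
    ]


features, targets = preprocess(raw_rows)
scaler_params = fit_scaler(features)
scaled = transform(features, scaler_params)

# Round-trip the fitted parameters through pickle, in the same way the
# trained model and scaler can be saved once and reloaded by the app.
restored = pickle.loads(pickle.dumps(scaler_params))
```

After scaling, every feature has mean 0 and standard deviation 1, so a large-valued column such as area no longer outweighs a small-valued one purely because of its units.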
- Dynamic Input System: The app automatically reads the dataset features and generates the appropriate sliders, making the code adaptable to different datasets.
- Real-Time Probability Score: Doesn't just say "Cancer" or "Safe"; it provides a confidence percentage (e.g., "98.5% Confidence").
- Professional UI: Custom CSS implementation for a "Glassmorphism" look, featuring clean typography and specific color coding (Green for Benign, Red for Malignant).
- Stateful Persistence: Uses `pickle` to load trained assets instantly without retraining on every reload.
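The dynamic inputs and the probability score described above could be sketched roughly as follows. This is a hedged illustration in plain Python, not the app's actual code: the function names, the example weights, and the 0.5 decision threshold are assumptions, and the real app gets its probabilities from the trained Scikit-Learn model.

```python
import math


def slider_specs(rows):
    """Derive (min, max, default) per feature so one slider can be built per column."""
    specs = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        specs[name] = (min(column), max(column), sum(column) / len(column))
    return specs


def predict_proba_malignant(scaled_inputs, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the scaled features."""
    z = bias + sum(weights[name] * value for name, value in scaled_inputs.items())
    return 1.0 / (1.0 + math.exp(-z))


def format_verdict(p_malignant):
    """Turn a probability into a label, a confidence percentage, and a display color."""
    if p_malignant >= 0.5:  # assumed threshold
        return "Malignant", round(p_malignant * 100, 1), "red"
    return "Benign", round((1.0 - p_malignant) * 100, 1), "green"


# Hypothetical dataset rows used only to derive slider ranges.
rows = [
    {"radius_mean": 11.0, "texture_mean": 14.0},
    {"radius_mean": 20.0, "texture_mean": 26.0},
]
specs = slider_specs(rows)
# In Streamlit, each entry could feed a call such as
# st.slider(name, min_value=lo, max_value=hi, value=default).

# Made-up weights for illustration; these are not the trained model's coefficients.
weights = {"radius_mean": 2.0, "texture_mean": 1.0}
p = predict_proba_malignant({"radius_mean": 1.5, "texture_mean": 0.5}, weights, bias=0.0)
label, confidence, color = format_verdict(p)
```

Because the slider ranges are derived from the data rather than hard-coded, the same loop works for any dataset with numeric feature columns.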
- Language: Python 3.10+
- Frontend: Streamlit (with Custom CSS)
- ML Libraries: Scikit-Learn, Pandas, NumPy
- Serialization: Pickle
1. Clone the Repository
git clone https://github.com/yourusername/oncopredict-ai.git
cd oncopredict-ai