This project focuses on predicting medical insurance charges using a Linear Regression machine learning model. The model estimates insurance costs based on personal and demographic attributes.
Medical insurance costs depend on multiple factors such as age, BMI, smoking habits, and region. The objective of this project is to build a regression model that accurately predicts insurance charges using these features.
This is a supervised learning regression problem where the target variable (charges) is continuous.
The dataset used in this project is insurance.csv, which contains the following features:
- age : Age of the policyholder
- sex : Gender (male/female)
- bmi : Body Mass Index
- children : Number of dependents
- smoker : Smoking status (yes/no)
- region : Residential region (northeast, northwest, southeast, southwest)
- charges : Medical insurance cost (Target Variable)
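As a rough sketch of the loading step, the snippet below reads the CSV with pandas and checks that the columns listed above are present. The helper name `load_insurance` and the two sample rows are assumptions made for illustration; they are not taken from the notebook or from the real dataset.

```python
import io

import pandas as pd

# Column names as documented in the feature list above.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def load_insurance(source):
    """Read the CSV (path or file-like object) and verify the documented columns.

    Hypothetical helper, not part of the original notebook.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Made-up rows for illustration only; in the project this would be
# load_insurance("insurance.csv").
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,male,25.3,1,no,southeast,4000.0\n"
    "45,female,31.0,2,yes,northwest,30000.0\n"
)
df = load_insurance(sample_csv)
print(df.dtypes)
```

Checking the columns up front makes a renamed or missing field fail early, before it reaches the encoding or training steps.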
The project uses the following tools and libraries:
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Jupyter Notebook
The workflow follows these steps:
- Data Loading
- Data Exploration and Analysis
- Data Preprocessing
- Encoding categorical variables
- Feature selection
- Train-test split
- Model training using Linear Regression
- Model evaluation
- Prediction of insurance charges
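The steps above could be wired together roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes one-hot encoding for the categorical columns, an 80/20 train-test split, and scikit-learn's `Pipeline`. The data comes from `make_synthetic_data`, a hypothetical generator that uses an arbitrary made-up rule, so the printed score says nothing about performance on insurance.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["age", "bmi", "children"]
CATEGORICAL = ["sex", "smoker", "region"]


def make_synthetic_data(n_rows=300, seed=0):
    """Generate made-up rows with the same columns as insurance.csv."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 65, n_rows),
        "sex": rng.choice(["male", "female"], n_rows),
        "bmi": rng.normal(30.0, 6.0, n_rows).round(1),
        "children": rng.integers(0, 6, n_rows),
        "smoker": rng.choice(["yes", "no"], n_rows, p=[0.2, 0.8]),
        "region": rng.choice(
            ["northeast", "northwest", "southeast", "southwest"], n_rows
        ),
    })
    # Arbitrary linear rule plus noise, chosen only so the demo has signal.
    df["charges"] = (
        250 * df["age"]
        + 300 * df["bmi"]
        + 500 * df["children"]
        + 20000 * (df["smoker"] == "yes")
        + rng.normal(0, 1000, n_rows)
    )
    return df


def build_model():
    """One-hot encode the categorical columns, then fit a linear regression."""
    encode = ColumnTransformer(
        [("categorical", OneHotEncoder(drop="first"), CATEGORICAL)],
        remainder="passthrough",  # numeric columns are passed through unchanged
    )
    return Pipeline([("encode", encode), ("regress", LinearRegression())])


df = make_synthetic_data()  # swap in pd.read_csv("insurance.csv") for real data
X = df[NUMERIC + CATEGORICAL]
y = df["charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = build_model().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R² on the held-out rows
print(f"R² on held-out synthetic rows: {r2:.3f}")
```

Keeping the encoder inside the pipeline means the same transformation is applied at training and prediction time, which avoids mismatched columns when a new record is scored.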
The model performance was evaluated using:
- R² Score
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
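Together, these metrics describe how closely the predictions match the actual charges: R² is the proportion of variance in charges explained by the model, MAE is the average absolute error in the same units as charges, and MSE penalizes large errors more heavily by squaring them.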
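To make the three metrics concrete, here is a small sketch that computes them directly from their definitions in plain Python. The numbers in `y_true` and `y_pred` are made up for the arithmetic and are not model results. In the notebook, the equivalent functions from `sklearn.metrics` (`r2_score`, `mean_absolute_error`, `mean_squared_error`) would normally be used instead.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values chosen so the arithmetic is easy to follow.
y_true = [1000.0, 2000.0, 3000.0, 4000.0]
y_pred = [1100.0, 1900.0, 3200.0, 3800.0]

mae = mean_absolute_error(y_true, y_pred)  # (100 + 100 + 200 + 200) / 4 = 150
mse = mean_squared_error(y_true, y_pred)   # (10000 + 10000 + 40000 + 40000) / 4 = 25000
r2 = r2_score(y_true, y_pred)              # 1 - 100000 / 5000000, about 0.98
print(mae, mse, round(r2, 4))
```

Note that MSE is in squared units of charges, so its square root (RMSE) is often easier to compare against MAE.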
- Clone the repository:
  git clone https://github.com/Divyansh1802/Insurance-Charge-Predictor.git
- Navigate to the project folder:
  cd Insurance-Charge-Predictor
- Install required dependencies:
  pip install -r requirements.txt
- Open the Jupyter Notebook and run all cells.
Input:
- Age: 30
- Sex: Male
- BMI: 25.3
- Children: 1
- Smoker: No
- Region: Southeast
Output:
- Predicted Insurance Charges: (Model Generated Value)
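Before a record like the one above can be passed to a linear model, its categorical fields need to be turned into numbers. The sketch below shows one way to do that by hand. The function `encode_input` and the feature order are assumptions for illustration (one-hot encoding with the first category dropped); the notebook may encode the inputs differently.

```python
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_input(raw):
    """Turn one raw input record into a numeric feature vector.

    Assumed layout (hypothetical, not taken from the notebook):
    [age, bmi, children, sex_male, smoker_yes,
     region_northwest, region_southeast, region_southwest]
    """
    # Normalize casing so "Male" and "Southeast" match the dataset's lowercase values.
    sex = raw["sex"].strip().lower()
    smoker = raw["smoker"].strip().lower()
    region = raw["region"].strip().lower()

    if sex not in ("male", "female"):
        raise ValueError(f"unknown sex: {raw['sex']!r}")
    if smoker not in ("yes", "no"):
        raise ValueError(f"unknown smoker value: {raw['smoker']!r}")
    if region not in REGIONS:
        raise ValueError(f"unknown region: {raw['region']!r}")

    # "northeast" is the dropped baseline, so it encodes as all zeros.
    region_flags = [1.0 if region == name else 0.0 for name in REGIONS[1:]]
    return [
        float(raw["age"]),
        float(raw["bmi"]),
        float(raw["children"]),
        1.0 if sex == "male" else 0.0,
        1.0 if smoker == "yes" else 0.0,
    ] + region_flags


example = {
    "age": 30, "sex": "Male", "bmi": 25.3,
    "children": 1, "smoker": "No", "region": "Southeast",
}
features = encode_input(example)
print(features)  # [30.0, 25.3, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

A model trained on features in this same order could then be called with `model.predict([features])`; the predicted charge depends entirely on the fitted coefficients, so no value is shown here.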
Possible future improvements:
- Apply advanced regression models (Random Forest, Gradient Boosting, XGBoost)
- Perform hyperparameter tuning
- Deploy the model using Flask or Streamlit
- Build a user-friendly interface for real-time predictions
Author: Divyansh Upadhyay