This project performs data cleaning, exploration, and visualization on the Netflix titles dataset (Netflix.json).
It uses Python 3, Pandas, Matplotlib, Seaborn, and Plotly within Google Colab to uncover insights such as:
- Year-wise growth of movies and TV shows on Netflix
- Country-wise content distribution
- Genre popularity trends over time
- Movie duration patterns and release-year analysis

Objectives:
- Clean and preprocess the raw JSON dataset
- Handle missing values & duplicates
- Normalize duration, date, and genre information
- Perform Exploratory Data Analysis (EDA)
- Generate interactive visualizations for deeper insights
| Category | Tools / Libraries |
|---|---|
| Language | Python 3 |
| Data Handling | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn, Plotly |
| Environment | Google Colab |
| Version Control | Git + GitHub |
- Loaded the Netflix dataset (`Netflix.json`) into a Pandas DataFrame
- Standardized column names and trimmed whitespace
- Parsed date fields (e.g., `date_added`) into `datetime` objects
- Normalized duration (minutes / seasons)
- Split multi-valued columns (`cast`, `country`, `listed_in`) into lists
- Handled missing values using imputation / placeholder logic
- Removed duplicates based on `show_id` and `title + release_year`
- Exported a cleaned dataset (`netflix_cleaned.csv`) for further analysis
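
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`split_values`, `clean_titles`), the output columns `duration_value` and `duration_unit`, and the tiny sample frame are all assumptions, and the imputation step is left out because its exact rules are not described here.

```python
import pandas as pd

MULTI_VALUE_COLS = ["cast", "country", "listed_in"]


def split_values(cell):
    """Turn 'A, B' into ['A', 'B']; missing cells become an empty list."""
    if not isinstance(cell, str):
        return []
    return [part.strip() for part in cell.split(",") if part.strip()]


def clean_titles(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper covering most of the cleaning steps listed above."""
    df = raw.copy()
    # Standardize column names: trim, lower-case, spaces to underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Trim stray whitespace in the main text columns.
    for col in ["title", "type", "duration"]:
        df[col] = df[col].str.strip()
    # Parse dates; values that cannot be parsed become NaT.
    df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")
    # Normalize "90 min" / "2 Seasons" into a number plus a unit.
    parts = df["duration"].str.extract(r"(\d+)\s*(min|Season)")
    df["duration_value"] = pd.to_numeric(parts[0])
    df["duration_unit"] = parts[1].map({"min": "minutes", "Season": "seasons"})
    # Split comma-separated columns into Python lists.
    for col in MULTI_VALUE_COLS:
        df[col] = df[col].apply(split_values)
    # Drop duplicates by show_id, then by title + release_year.
    df = df.drop_duplicates(subset="show_id")
    df = df.drop_duplicates(subset=["title", "release_year"])
    return df.reset_index(drop=True)


# Made-up rows for illustration only (not taken from the real dataset).
raw = pd.DataFrame({
    " Show_ID ": ["s1", "s2", "s2", "s3"],
    "Title": ["A ", "B", "B", "C"],
    "Type": ["Movie", "TV Show", "TV Show", "Movie"],
    "Date_Added": ["September 25, 2021", "January 1, 2020", "January 1, 2020", None],
    "Release_Year": [2020, 2019, 2019, 2018],
    "Duration": ["90 min", "2 Seasons", "2 Seasons", "101 min"],
    "Country": ["United States, India", None, None, "India"],
    "Listed_In": ["Dramas, Comedies", "TV Dramas", "TV Dramas", "Dramas"],
    "Cast": [None, "X, Y", "X, Y", "Z"],
})
cleaned = clean_titles(raw)
```

In the real workflow the input would presumably come from something like `pd.read_json("Netflix.json")`, and the result would be written out with `cleaned.to_csv("netflix_cleaned.csv", index=False)`. Note that list-valued columns are written to CSV as their string representation.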
Key analyses performed:
- Content type distribution (Movie vs TV Show)
- Rating distribution across types
- Country and genre frequency
- Correlation heatmap between numeric features
- 3D scatter plots using Plotly (Release Year × Duration × Type)
- Genre-over-time heatmap
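
As a rough sketch of how the tables behind two of these analyses might be built, the snippet below computes a rating-by-type table and a genre-by-year table with plain Pandas. The sample rows are invented for illustration, and the variable names are assumptions; it also assumes `listed_in` has already been split into lists.

```python
import pandas as pd

# Made-up rows for illustration only.
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "rating": ["PG-13", "R", "TV-MA", "R"],
    "release_year": [2019, 2020, 2020, 2020],
    "listed_in": [["Dramas"], ["Dramas", "Comedies"], ["TV Dramas"], ["Dramas"]],
})

# Rating distribution across types: one row per type, one column per rating.
rating_by_type = pd.crosstab(titles["type"], titles["rating"])

# Genre over time: explode to one row per (title, genre), then count per year.
exploded = titles.explode("listed_in")
genre_by_year = pd.crosstab(exploded["listed_in"], exploded["release_year"])
```

A table shaped like `genre_by_year` is the kind of input a heatmap function such as `seaborn.heatmap` accepts, which is one way the genre-over-time heatmap could be drawn.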
Below are a few examples of generated plots (replace with your screenshots):
| Visualization | Preview |
|---|---|
| Bar Chart | ![]() |
| Bar Chart | ![]() |
| Histogram | ![]() |
| Boxplot | ![]() |
| Interactive 3D render | ![]() |
| Numeric heatmap | ![]() |
| Grouped bar chart | ![]() |
🖼️ To insert screenshots, save them in a `figures/` folder and reference them with relative paths as shown above.
Click the badge below to launch instantly in Colab:
git clone https://github.com/yourusername/netflix-eda.git
cd netflix-eda
pip install -r requirements.txt
jupyter notebook notebooks/netflix_eda.ipynb
📈 Key Insights
📅 Sharp rise in content added after 2015
🎞️ Movies form ~70% of total titles
🌍 US, India, UK top content producers
🎭 Drama & Comedy most common genres
⏱️ Average movie duration ≈ 100 minutes
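
Headline figures like these could be derived along the following lines. This is only a sketch on invented sample rows, so its numbers do not reproduce the insights above; the column name `duration_value` is an assumed name for the normalized duration.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "date_added": pd.to_datetime(
        ["2014-05-01", "2016-03-10", "2016-07-21", "2017-01-05"]
    ),
    "duration_value": [95, 110, 2, 101],
})

# Titles added per calendar year, in chronological order.
added_per_year = sample["date_added"].dt.year.value_counts().sort_index()

# Share of titles that are movies (mean of a boolean mask).
movie_share = (sample["type"] == "Movie").mean()

# Average duration in minutes, restricted to movies.
is_movie = sample["type"] == "Movie"
avg_movie_minutes = sample.loc[is_movie, "duration_value"].mean()
```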
🧠 Learnings
Hands-on data wrangling and cleaning in Pandas
Feature engineering (date, duration, multi-value columns)
Interactive EDA with Plotly
GitHub project structuring and documentation
🪪 License
This project is licensed under the MIT License.
You are free to use, modify, and share with attribution.





