🎬 CineMatch — Hybrid Movie Recommender System

A hybrid recommendation engine combining Content-Based Filtering and Collaborative Filtering techniques to deliver personalized movie suggestions using the IMDB Movies Dataset.

📌 Table of Contents

Overview
Dataset
Project Structure
Methodology
EDA Highlights
Recommendation Engine
Tech Stack
Getting Started
Results

🧠 Overview

CineMatch is a hybrid movie recommender system built on the IMDB Movies Dataset. It combines:

Content-Based Filtering — recommends movies similar to a given title based on genres, directors, and cast using TF-IDF vectorization and cosine similarity.
User-Based Collaborative Filtering — leverages user rating patterns to surface movies liked by similar users.

The hybrid approach addresses the limitations of each method on its own: content-only systems tend to recommend narrowly similar titles, while collaborative systems suffer from the cold-start problem (new users or movies with few or no ratings).
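User-based collaborative filtering, as summarised above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's notebook code. The Dataset table below lists no per-user ratings, so the `ratings` matrix, the user names, and the helpers `predict_rating` and `recommend_for_user` are all hypothetical. Users are compared by the cosine similarity of their rating vectors, and a missing rating is predicted as the similarity-weighted average of the `k` most similar users who did rate that movie.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x movie rating matrix (0 = not rated). Illustrative values only.
ratings = pd.DataFrame(
    {
        "Avatar": [5, 4, 5, 1],
        "Aliens": [4, 5, 5, 0],
        "Titanic": [0, 1, 2, 5],
        "Star Trek Into Darkness": [0, 5, 4, 1],
    },
    index=["ana", "ben", "cho", "dev"],
)

# User-user cosine similarity; unrated entries are simply treated as 0 in this sketch.
user_sim = pd.DataFrame(
    cosine_similarity(ratings.to_numpy()),
    index=ratings.index,
    columns=ratings.index,
)


def predict_rating(user, movie, k=2):
    """Similarity-weighted average rating from the k most similar users who rated `movie`."""
    mask = (ratings[movie] > 0).to_numpy() & (ratings.index != user)
    raters = ratings.index[mask]
    neighbours = user_sim.loc[user, raters].nlargest(k)
    weights = neighbours.to_numpy()
    if weights.sum() == 0:
        return 0.0  # no usable neighbours (a cold-start case)
    their_ratings = ratings.loc[neighbours.index, movie].to_numpy()
    return float(weights @ their_ratings / weights.sum())


def recommend_for_user(user, n=10):
    """Rank the movies `user` has not rated by their predicted rating."""
    unrated = ratings.columns[(ratings.loc[user] == 0).to_numpy()]
    preds = {movie: predict_rating(user, movie) for movie in unrated}
    return sorted(preds, key=preds.get, reverse=True)[:n]


print(recommend_for_user("ana"))  # ['Star Trek Into Darkness', 'Titanic']
```

ana's closest neighbours (cho, then ben) rated *Star Trek Into Darkness* highly and *Titanic* low, so the former ranks first. Production systems usually also mean-centre each user's ratings before averaging; that refinement is left out here for brevity.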

📂 Dataset

| Property | Detail |
|---|---|
| Source | Kaggle: IMDB Movie Ratings Dataset |
| File | `movie_data.csv` |

Features

| Feature | Description |
|---|---|
| `director_name` | Director of the movie |
| `genres` | Pipe-separated genre tags (e.g. `Action\|Drama`) |
| `actor_1_name` | Primary actor |
| `actor_2_name` | Secondary actor |
| `actor_3_name` | Tertiary actor |
| `movie_title` | Official title |
| `imdb_score` | Rating on a 1–10 scale |
| `num_voted_users` | Number of IMDB user votes |
| `title_year` | Year of release |
| `duration` | Runtime in minutes |
| `language` | Primary language |
| `country` | Country of production |

📁 Project Structure

├── Hybrid-Recommender-System.ipynb   # Main notebook
├── movie_data.csv                    # Dataset
└── README.md

🔬 Methodology

1. Data Cleaning & Preprocessing

Separated numeric and categorical features
Imputed missing numeric values with median
Imputed missing categorical values with "Unknown"
Stripped invisible characters and whitespace from movie titles
Validated data ranges and categorical consistency
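The cleaning steps above might look roughly like the sketch below. It assumes the data is loaded into a pandas DataFrame with the columns from the Dataset table; `clean_movies` is a hypothetical helper, and the specific invisible characters handled (non-breaking space U+00A0, zero-width space U+200B, byte-order mark U+FEFF) are an assumption about what "invisible characters" covers here.

```python
import numpy as np
import pandas as pd


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Median-impute numeric columns, fill categorical gaps with "Unknown", tidy titles."""
    df = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns
    categorical_cols = df.select_dtypes(exclude="number").columns

    # Numeric gaps -> per-column median (less sensitive to outliers than the mean).
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Categorical gaps -> an explicit "Unknown" placeholder.
    df[categorical_cols] = df[categorical_cols].fillna("Unknown")

    # Non-breaking spaces become normal spaces, zero-width characters are removed,
    # then leading/trailing whitespace is stripped.
    df["movie_title"] = (
        df["movie_title"]
        .str.replace("\u00a0", " ", regex=False)
        .str.replace("[\u200b\ufeff]", "", regex=True)
        .str.strip()
    )
    return df


# Tiny made-up frame for illustration.
raw = pd.DataFrame({
    "movie_title": ["Avatar\u00a0", " Titanic", None],
    "imdb_score": [7.9, np.nan, 7.2],
    "director_name": ["James Cameron", None, "Ridley Scott"],
})
clean = clean_movies(raw)

# A range check in the spirit of "validated data ranges": scores lie on the 1-10 scale.
assert clean["imdb_score"].between(1, 10).all()
print(clean)
```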

2. Exploratory Data Analysis

Distribution of IMDB scores
Average score and duration trends over time
Genre popularity and average ratings by genre
Most active actors by movie count
Top 10 most-voted movies
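Because `genres` is pipe-separated, genre popularity and average rating by genre need one row per (movie, genre) pair first. A minimal pandas sketch, using a made-up four-row frame in place of `movie_data.csv`:

```python
import pandas as pd

# Made-up stand-in for the real dataset (illustrative values only).
movies = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama", "Action|Thriller"],
    "imdb_score": [7.0, 8.0, 6.0, 5.0],
})

# Split "Action|Drama" into a list, then explode to one row per (movie, genre).
per_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Movie count and mean IMDB score per genre, most common genre first.
genre_stats = (
    per_genre.groupby("genre")["imdb_score"]
    .agg(count="count", mean_score="mean")
    .sort_values("count", ascending=False)
)
print(genre_stats)
```

A single-character pattern such as `"|"` is treated literally by `Series.str.split`, so the pipe needs no regex escaping.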

3. Feature Engineering

Combined text-based features into a single features column:

imdb['features'] = (
    imdb['genres'] + " " +
    imdb['director_name'] + " " +
    imdb['actor_1_name'] + " " +
    imdb['actor_2_name'] + " " +
    imdb['actor_3_name']
)

4. Content-Based Filtering

Applied TF-IDF Vectorization (max_features=3000, bigrams, min_df=2)
Computed Cosine Similarity matrix across all movies
Retrieved top-N most similar movies for any given title
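The TF-IDF and cosine-similarity steps can be sketched with scikit-learn as below. The four rows are hand-written toy data, not taken from `movie_data.csv`, and reading "bigrams" as `ngram_range=(1, 2)` (unigrams plus bigrams) is an assumption.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hand-written toy rows standing in for imdb['features'] (genres + director + 3 actors).
imdb = pd.DataFrame({
    "movie_title": ["Avatar", "Aliens", "Titanic", "The Notebook"],
    "features": [
        "Action|Adventure|Sci-Fi James Cameron Sam Worthington Zoe Saldana Sigourney Weaver",
        "Action|Sci-Fi|Thriller James Cameron Sigourney Weaver Michael Biehn Paul Reiser",
        "Drama|Romance James Cameron Leonardo DiCaprio Kate Winslet Billy Zane",
        "Drama|Romance Nick Cassavetes Ryan Gosling Rachel McAdams James Garner",
    ],
})

vectorizer = TfidfVectorizer(
    max_features=3000,   # keep at most 3000 terms, ranked by corpus term frequency
    ngram_range=(1, 2),  # assumed: unigrams plus bigrams such as "james cameron"
    min_df=2,            # drop terms that occur in fewer than 2 movies
)
tfidf = vectorizer.fit_transform(imdb["features"])  # sparse matrix (n_movies, n_terms)

# Pairwise cosine similarity: row i compares movie i with every movie.
sim = cosine_similarity(tfidf)
print(pd.DataFrame(sim, index=imdb["movie_title"], columns=imdb["movie_title"]).round(2))
```

The default tokenizer already splits on `|` and `-`, so `Action|Sci-Fi` yields `action`, `sci`, `fi`. With `min_df=2`, names that appear in only one movie (for example Sam Worthington) drop out of the vocabulary, while shared ones such as `james cameron` survive.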

📊 EDA Highlights

🎭 Drama is the most common genre, followed by Comedy and Thriller
⭐ IMDB scores are left-skewed: most movies cluster between 6.0 and 7.5, with a longer tail toward low scores
🎬 Robert De Niro and Nicolas Cage lead in movie appearances
📈 Average movie ratings slightly declined post-2000
🔗 Engagement metrics (votes, reviews) are strongly correlated with each other but only moderately with IMDB score — popularity ≠ quality

🎯 Recommendation Engine

get_recommendations("Avatar")

Output:

Recommended movies similar to Avatar:
1. Aliens
2. Titanic
3. Guardians of the Galaxy
4. Star Trek Into Darkness
...

The function:

Looks up the movie index
Retrieves its cosine similarity scores against all other movies
Returns the top 10 most similar titles (excluding itself)
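Following those three steps, one possible `get_recommendations` is sketched below. The toy `imdb` frame, the `indices` lookup, and the `n` parameter are assumptions for illustration; in the notebook the similarity matrix would be built from the full dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy setup mirroring the TF-IDF step (hand-written rows, not the real dataset).
imdb = pd.DataFrame({
    "movie_title": ["Avatar", "Aliens", "Titanic", "The Notebook"],
    "features": [
        "Action|Adventure|Sci-Fi James Cameron Sam Worthington Zoe Saldana Sigourney Weaver",
        "Action|Sci-Fi|Thriller James Cameron Sigourney Weaver Michael Biehn Paul Reiser",
        "Drama|Romance James Cameron Leonardo DiCaprio Kate Winslet Billy Zane",
        "Drama|Romance Nick Cassavetes Ryan Gosling Rachel McAdams James Garner",
    ],
})
tfidf = TfidfVectorizer(max_features=3000, ngram_range=(1, 2), min_df=2).fit_transform(imdb["features"])
sim = cosine_similarity(tfidf)

# Title -> row position; if a title appears more than once, keep its first row.
indices = pd.Series(range(len(imdb)), index=imdb["movie_title"])
indices = indices[~indices.index.duplicated(keep="first")]


def get_recommendations(title, n=10):
    """Return up to n titles most similar to `title`, excluding the title itself."""
    if title not in indices.index:
        raise KeyError(f"Title not found: {title!r}")
    idx = int(indices[title])                        # 1. look up the movie index
    order = np.argsort(-sim[idx], kind="stable")     # 2. rank all movies by similarity
    order = order[order != idx][:n]                  # 3. drop the movie itself, keep top n
    return imdb["movie_title"].iloc[order].tolist()


print(get_recommendations("Avatar", n=2))  # ['Aliens', 'Titanic']
```

Excluding the query by index, rather than by skipping the first ranked position, stays correct even when another movie ties with it at similarity 1.0.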

🛠 Tech Stack

| Library | Purpose |
|---|---|
| `pandas` | Data loading, cleaning, transformation |
| `scikit-learn` | TF-IDF vectorization, cosine similarity |
| `matplotlib` | Plotting |
| `seaborn` | Statistical visualizations |

✅ Results

| Method | Strength |
|---|---|
| Content-Based | Strong at finding stylistically similar films (same director, cast, genre) |
| Collaborative | Captures user taste patterns beyond surface metadata |
| Hybrid | Balances both: broader coverage, better cold-start handling |
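This README does not say how the two signals are merged. One common choice, assumed here rather than taken from the notebooks, is a weighted blend of min-max-normalised content similarity and collaborative-filtering scores, falling back to content alone when a user has no ratings (the cold-start case). The names `hybrid_scores`, `min_max`, and `alpha`, and all the numbers, are illustrative.

```python
import numpy as np


def min_max(x):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def hybrid_scores(content_scores, cf_scores=None, alpha=0.5):
    """Blend content-based and collaborative scores for the same candidate movies.

    alpha weights the content side. With no CF signal (cold start),
    the normalised content scores are returned on their own.
    """
    content = min_max(content_scores)
    if cf_scores is None:
        return content
    return alpha * content + (1 - alpha) * min_max(cf_scores)


candidates = ["Aliens", "Titanic", "Guardians of the Galaxy"]
content = [0.9, 0.4, 0.7]   # hypothetical cosine similarities to the query movie
cf = [3.0, 4.8, 4.5]        # hypothetical predicted ratings for one user
blend = hybrid_scores(content, cf, alpha=0.5)
ranked = [candidates[i] for i in np.argsort(-blend, kind="stable")]
print(ranked[0])  # Guardians of the Galaxy
```

In this toy run neither signal alone puts *Guardians of the Galaxy* first, but it scores well on both, so the blend ranks it top. In practice `alpha` would be tuned on held-out ratings.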

📬 Contact

Have suggestions or want to collaborate? Feel free to open an issue or reach out!

Built with ❤️ using Python & IMDB data
