This project demonstrates a complete end-to-end data analytics workflow, transforming raw, unstructured employee data into a clean, analysis-ready dataset and building an interactive HR dashboard.
It showcases skills in:
- Data cleaning & preprocessing
- Data transformation using Python
- Data modeling & visualization
- Business insight generation
To simulate a real-world business scenario where messy HR data is cleaned and transformed into meaningful insights that support decision-making.
- Source:
messy_employee_data.xlsx - Sheet Used:
Employee_Data_RAW
The dataset initially contained:
- Inconsistent column names
- Missing values
- Duplicate records
- Incorrect data formats
- Invalid entries (emails, phone numbers, etc.)
- Removed extra spaces
- Converted all column names to lowercase
- Replaced spaces with underscores
- Renamed inconsistent columns
- Dropped completely empty rows
- Removed rows containing placeholder values (
---) - Removed duplicate employee records
- Removed prefixes (
EMP-) - Converted to consistent format
- Trimmed spaces
- Standardized capitalization
- Converted to lowercase
- Removed invalid entries
- Extracted digits only
- Standardized format:
XXX-XXX-XXXX
- Removed symbols (
$,,) - Converted to numeric format
- Converted to standard date format (
YYYY-MM-DD)
- Standardized naming (title case)
- Cleaned city names (e.g., "nyc" β "New York")
- Standardized country names (e.g., "us", "usa" β "USA")
- Ensured values between 0β100
- Handled missing and invalid values
- Defaulted to 0 where necessary
- Valid range enforced (18β75)
-
Standardized categories:
- Male
- Female
- Non-Binary
- Other
- Not Specified
-
Cleaned into:
- Active
- Inactive
- On Leave
- Terminated
- Unknown
- Cleaned dataset saved as:
cleaned_employee_data.xlsx
The cleaned dataset was used to build an interactive HR Analytics Dashboard with the following insights:
- Total Employees
- Average Salary
- Average Performance Score
- Average Bonus Percentage
- Hiring trends over time
- Department-wise employee distribution
- Salary vs performance analysis
- Top job roles by employee count
- Gender distribution
- Geographic employee distribution
- Filters (Department, Gender, Country, Status)
- Drill-down capabilities
- Dynamic visuals responding to user input
- Identification of top-performing departments
- Salary distribution patterns across roles
- Workforce composition by region and gender
- Relationship between salary and performance
- Hiring trends over time
-
Python
- pandas
- re (regex)
-
Excel
-
Power BI
pip install pandas openpyxlpython your_script_name.py- Clean dataset will be generated automatically
This project demonstrates the ability to:
- Handle real-world messy datasets
- Apply data cleaning best practices
- Transform raw data into business insights
- Build professional dashboards
- Add predictive analytics (employee attrition)
- Integrate real-time data sources
- Enhance dashboard storytelling
- Deploy dashboard to cloud platforms
If you're a recruiter or collaborator interested in this project, feel free to connect!
β If you found this project valuable, consider giving it a star!
