Data Analyst with an MSc in Environmental Engineering, specializing in transforming real-world data into actionable business insights using SQL, Python, Excel, and Tableau. Experienced in end-to-end analytics, including data cleaning, exploratory analysis, machine learning, and dashboarding, to support data-driven decision-making across various business functions.
- SQL: Joins, Aggregations, CTEs, Window Functions
- Python: Pandas, NumPy, Matplotlib, Seaborn, Plotly, Scikit-learn
- Data Analysis: Data Cleaning, Exploratory Data Analysis (EDA)
- Visualization: Tableau, Excel (Pivot Tables, VLOOKUP/HLOOKUP), Dashboards
- Machine Learning: Classification, Regression, Feature Engineering, Model Evaluation, SHAP
- Tools: Jupyter Notebook, Git, GitHub
🔗 GitHub Repository | Tableau Dashboard | Presentation
- Tools: Python (Pandas, NumPy), Scikit-learn, CatBoost, Tableau
- Description: Developed a machine learning classification model to predict customer churn using demographic, behavioral, and transactional data.
- Key Insights:
  - High churn risk is driven by low engagement, inactivity, and customer complaints
  - CatBoost achieved the best performance with 95% recall and 0.90 F2-score
  - SHAP analysis revealed key behavioral patterns behind customer disengagement
- Business Value: Enables early churn detection, targeted retention strategies, and more efficient marketing spend.
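The F2-score reported above weights recall more heavily than precision, which suits churn detection, where a missed churner usually costs more than a false alarm. The snippet below is a minimal, dependency-free sketch of that metric, not the project's actual evaluation code; the function names and the labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta score; beta > 1 weights recall more than precision."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels (1 = churned), not taken from the project data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
f2 = fbeta_from_counts(tp, fp, fn, beta=2.0)
f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

In this toy case every churner is caught at the cost of two false alarms, so F2 comes out higher than F1. That is the behavior one would want from a metric used to select a recall-oriented churn model.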
🔗 GitHub Repository | Tableau Dashboard | Presentation
- Tools: Python (Pandas, NumPy), Tableau
- Description: Analyzed NYC Green Taxi trip data from January 2023 to uncover demand patterns, fare drivers, and passenger behavior for operational optimization.
- Key Insights:
  - Peak demand occurs during weekday commuting hours (7–9 AM and 3–6 PM)
  - Manhattan accounts for over 60% of pickups, with strong zone-level concentration
  - Trip distance is the primary fare driver (r ≈ 0.86), while duration has minimal impact
  - Most trips are single-passenger (87%) and predominantly paid via card (65%)
- Business Value: Enables data-driven fleet allocation, targeted pricing strategies, and service optimization based on demand patterns and customer behavior.
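The r value quoted for trip distance and fare is a Pearson correlation coefficient. As a rough illustration of how such a figure is obtained, here is a small standard-library sketch; the `pearson_r` helper and the sample values are hypothetical and are not drawn from the taxi dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical trips: fare is an exact linear function of distance here,
# so the coefficient is 1. Real trip data would give a value below 1.
distance_miles = [1.0, 2.0, 3.0, 4.0, 5.0]
fare_usd = [2 * d + 3 for d in distance_miles]

r = pearson_r(distance_miles, fare_usd)
print(f"r = {r:.2f}")
```

With pandas already in the project's toolset, the same quantity would normally come from `Series.corr`; the manual version is shown only to make the calculation explicit.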
🔗 GitHub Repository | Tableau Dashboard
- Tools: SQL, Python (Pandas, NumPy), Tableau
- Description: Conducted SQL-based analysis of 2018 Olist e-commerce data to evaluate sales trends, customer behavior, delivery efficiency, and customer satisfaction.
- Key Insights:
  - Revenue peaks in April ($965K) and May ($974K), driven by higher average order value
  - Revenue is concentrated among a small segment of high-value customers
  - São Paulo dominates total revenue, while low-volume states show high AOV potential
  - Approximately 4,900 late deliveries contribute to lower customer satisfaction
- Business Value: Supports targeted marketing, high-value customer retention, regional expansion strategies, and delivery performance optimization.
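Monthly revenue and average order value (AOV) are the two quantities behind the April and May peaks noted above. The sketch below shows one way to derive them from order-level rows. It is only an illustration: the field names (`purchase_date`, `order_value`) and the sample orders are assumptions, not the Olist schema or the project's SQL.

```python
from collections import defaultdict


def monthly_revenue_and_aov(orders):
    """Aggregate order-level rows into revenue, order count, and AOV per month."""
    revenue = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        month = order["purchase_date"][:7]  # "YYYY-MM" prefix of an ISO date
        revenue[month] += order["order_value"]
        counts[month] += 1
    return {
        month: {
            "revenue": revenue[month],
            "orders": counts[month],
            "aov": revenue[month] / counts[month],
        }
        for month in sorted(revenue)
    }


# Hypothetical orders, one row per order.
orders = [
    {"order_id": "a1", "purchase_date": "2018-04-03", "order_value": 120.0},
    {"order_id": "a2", "purchase_date": "2018-04-17", "order_value": 80.0},
    {"order_id": "a3", "purchase_date": "2018-05-09", "order_value": 300.0},
]

summary = monthly_revenue_and_aov(orders)
for month, stats in summary.items():
    print(month, stats)
```

Separating order count from AOV makes it possible to tell whether a revenue peak comes from more orders or from larger ones.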
🔗 GitHub Repository | Tableau Dashboard | Presentation
- Tools: Python (Pandas, NumPy), Scikit-learn, XGBoost, Tableau
- Description: Developed and benchmarked regression models to predict apartment prices in Daegu using transaction data from 1978 to 2015, including feature engineering, preprocessing, and hyperparameter tuning.
- Key Insights:
  - Apartment size, location accessibility, and nearby facilities are the strongest drivers of price
  - XGBoost achieved the best performance (MAE: ₩35.1M, MAPE: 17.5%, R²: 0.803), outperforming baseline models
- Business Value: Enables data-driven pricing decisions, reducing mispricing risk by roughly 36–47% and supporting faster, more accurate property sales.
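For readers less familiar with the three regression metrics quoted above, the following sketch computes MAE, MAPE, and R² from their definitions. It uses only the standard library and made-up numbers; it is not the benchmarking code from the project, and the `regression_metrics` name is hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (in percent), and R² for paired actual/predicted values.

    Assumes no actual value is zero (MAPE is undefined otherwise) and that
    the actual values are not all identical (R² is undefined otherwise).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "r2": r2}


# Hypothetical prices in arbitrary units, not the Daegu data.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE is expressed in the currency of the target, MAPE is scale-free, and R² measures variance explained, which is why the three are typically reported together.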
- Tools: SQL, Python (Pandas, NumPy)
- Description: Analyzed HR data to uncover patterns in employee attrition, performance, satisfaction, compensation, and career progression to support data-driven retention strategies.
- Key Insights:
  - Attrition rate is 16.1%, with over 90% of exits coming from employees aged 18–35
  - High performers show the highest attrition rate (~27%), indicating a risk of losing top talent
  - Attrition is concentrated in the Sales and HR departments and among early-career employees, with lower-salary groups contributing over 90% of total exits
- Business Value: Enables targeted retention strategies through early-career support, compensation optimization, and proactive identification of high-risk employee segments.
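The insights above mix two different measures: the attrition rate within a group (exits divided by that group's headcount) and a group's share of all exits. The sketch below computes both so the distinction is explicit. The record layout (`department`, `left`) and the sample rows are hypothetical, not the project's HR schema.

```python
from collections import defaultdict


def attrition_by_group(records, key):
    """Return per-group attrition rate and share of total exits."""
    headcount = defaultdict(int)
    exits = defaultdict(int)
    for record in records:
        headcount[record[key]] += 1
        if record["left"]:
            exits[record[key]] += 1
    total_exits = sum(exits.values())
    return {
        group: {
            # Exits as a fraction of the group's own headcount.
            "attrition_rate": exits[group] / headcount[group],
            # Exits as a fraction of all exits across groups.
            "share_of_exits": exits[group] / total_exits if total_exits else 0.0,
        }
        for group in headcount
    }


# Hypothetical employees: 4 in Sales (2 left), 6 in R&D (1 left).
records = (
    [{"department": "Sales", "left": True}] * 2
    + [{"department": "Sales", "left": False}] * 2
    + [{"department": "R&D", "left": True}] * 1
    + [{"department": "R&D", "left": False}] * 5
)

result = attrition_by_group(records, "department")
print(result)
```

A large group can dominate the share of exits while having a modest attrition rate, so the two figures answer different retention questions.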
- Tools: SQL, Python (Pandas, NumPy)
- Description: Analyzed boutique hotel booking and revenue data to evaluate customer demographics, room utilization, and seasonal demand patterns for operational and pricing optimization.
- Key Insights:
  - Guests aged 56+ account for ~54% of bookings, with domestic travelers (~55%) as the primary market
  - Single Rooms are fully occupied (100%), while premium rooms (Suites, Family) generate higher revenue per booking but remain underutilized
  - Demand peaks in June and declines in September, highlighting clear seasonal revenue opportunities
  - Credit cards dominate payments (~50%), with cash still widely used (~30%)
- Business Value: Supports pricing optimization, improved room utilization, targeted marketing strategies, and data-driven seasonal revenue management.