This project analyzes housing price trends across European countries using the House Price Index (HPI) dataset. It follows a complete data analytics workflow, transforming raw data into meaningful insights through data cleaning, database storage, SQL analysis, and visualization.
The project demonstrates an end-to-end data pipeline:
Raw Data → Data Cleaning (Python) → Database (MySQL) → SQL Analysis → Visualization
-
Housing Market Growth Which European countries experienced the largest increase in housing prices since 2015?
-
Market Trend Over Time How has the average housing price index evolved across Europe from 2015 to 2024?
-
Top Housing Markets Which countries recorded the highest housing price index values in 2024?
-
Consistent Growth: Housing prices increased steadily across most European countries after the 2015 baseline, with Hungary leading with a wide margin.
-
Post-2020 Acceleration: Many countries experienced faster growth in housing prices after 2020 with the average price of housing at 178euros in 2024.
-
Market Differences: Some countries show significantly higher price levels and growth compared to others as at 2024.
The cleaned dataset is stored in a MySQL database: housing_project.
| Column | Type |
|---|---|
| country | VARCHAR(50) |
| year | INT |
| house_price_index | FLOAT |
This structure allows analysis of housing price trends across countries and years using SQL aggregation.
The original dataset was provided in a wide format (years as columns). It was transformed into a long format suitable for analysis.
- Load dataset using Pandas
- Fix encoding issues (
cp1252) - Remove unnecessary columns (e.g., unnamed columns)
- Filter out aggregated regions (e.g., EU totals)
- Reshape data using
pandas.melt()(wide → long) - Clean and convert numeric values
- Export final dataset as
house_prices_clean.csv
The following queries were used to answer the research questions:
SELECT
country,
MIN(house_price_index) AS starting_price,
MAX(house_price_index) AS latest_price
FROM house_prices_clean
GROUP BY country
ORDER BY latest_price DESC;SELECT
year,
AVG(house_price_index) AS average_price
FROM house_prices_clean
GROUP BY year
ORDER BY year;SELECT
country,
house_price_index
FROM house_prices_clean
WHERE year = 2024
ORDER BY house_price_index DESC;Using Python (Matplotlib), the following visualizations were created:
- Housing price growth by country (bar chart)
- Average price trend over time (line chart)
- Highest housing prices in 2024 (ranking chart)
housing-price-analysis/
│
├── data/
│ ├── raw/
│ └── cleaned/
│
├── notebooks/
│ ├── data_cleaning.ipynb
│ └── analysis_visualization.ipynb
│
├── sql/
│ ├── create_tables.sql
│ └── analysis_queries.sql
│
├── visuals/
│
├── presentation/
│
├── docs/
│ └── trello_board.pdf
│
├── requirements.txt
└── README.md
Run:
data_cleaning.ipynb
This generates:
house_prices_clean.csv
Run:
sql/create_tables.sql
Use MySQL Workbench → Table Data Import Wizard Import:
house_prices_clean.csv
Execute:
sql/analysis_queries.sql
Run:
analysis_visualization.ipynb
- Python (Pandas, Matplotlib, Seaborn, )
- MySQL
- Jupyter Notebook
- European House Price Index (HPI) dataset (Eurostat)
This project highlights the steady increase in housing prices across Europe and demonstrates how data analytics tools can be used to explore economic trends. It showcases fundamental skills in data cleaning, SQL analysis, and data visualization.
- Diego Fornero
- Selasey Junior Gbeddy