This repository is used as a sample solution for students who want to build a regression model to predict housing prices. The original dataset used is the well known Ames Housing Prices Dataset from Dean De Cock.

ValentinFutterer/housingprices-ml-sample-solution

housingprices-ml-sample-solution

Overview

This repository provides a sample solution and benchmark for students who need to build a regression model to predict housing prices in the UE "Creating Machine Learning Models in Practice". The code was created as part of an educational TU Wien project. The original dataset is the well-known Ames Housing Prices Dataset from Dean De Cock. The version used in this code was converted from txt to csv and is stored in DBRepo, pre-split into training, validation and testing datasets.

Original dataset and documentation:

- https://jse.amstat.org/v19n3/decock/AmesHousing.txt
- https://jse.amstat.org/v19n3/decock/DataDocumentation.txt

DBRepo dataset splits:

- https://handle.test.datacite.org/10.82556/1jqa-zp46
- https://handle.test.datacite.org/10.82556/7cm6-bt62
- https://handle.test.datacite.org/10.82556/20e7-a615

This project implements a linear regression model to predict housing prices using the Ames Housing dataset. The script performs the following:

  1. Data Retrieval: Downloads datasets via the dbrepo Python package
  2. Preprocessing: Numeric + categorical imputation, one-hot encoding
  3. Training: Trains a LinearRegression model on the training set
  4. Evaluation: Validates and tests the model; computes RMSE
  5. Visualization: Plots absolute error and root squared error
  6. Upload: Saves model and plots, and uploads them to TUWRD via the Invenio API
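Steps 2 to 4 can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in main.py: the tiny data frames are made up (they are not rows from the Ames dataset), the column selection is arbitrary, and the imputation strategies (median for numeric, most frequent for categorical columns) are assumptions, since the list above does not say which ones the script uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-ins for the training and validation splits (NOT real Ames rows).
train = pd.DataFrame({
    "GrLivArea": [1500.0, 2000.0, np.nan, 1200.0, 1800.0, 2400.0],
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", np.nan, "CollgCr", "NAmes"],
    "SalePrice": [150000.0, 220000.0, 170000.0, 120000.0, 200000.0, 240000.0],
})
valid = pd.DataFrame({
    "GrLivArea": [1600.0, np.nan],
    "Neighborhood": ["CollgCr", "Edwards"],
    "SalePrice": [185000.0, 140000.0],
})

numeric_cols = ["GrLivArea"]
categorical_cols = ["Neighborhood"]
features = numeric_cols + categorical_cols

# Preprocessing: impute numeric and categorical columns, then one-hot encode.
# The strategies chosen here are assumptions for illustration.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during training are encoded as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Training: fit preprocessing and LinearRegression together on the training set.
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(train[features], train["SalePrice"])

# Evaluation: per-sample absolute error and overall RMSE on the validation set.
pred = model.predict(valid[features])
abs_error = np.abs(valid["SalePrice"].to_numpy() - pred)
rmse = float(np.sqrt(mean_squared_error(valid["SalePrice"], pred)))
print(f"validation RMSE: {rmse:.2f}")
```

Wrapping the preprocessing and the regressor in one Pipeline means the imputation values and the one-hot categories are learned from the training split only and then reused unchanged for validation and testing.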

Installation

The code is provided both as a Python script, 'main.py', and as a notebook, 'housing_prices_dataset_model.ipynb'. Before running either, you need to fill in some variables. The DBRepo credentials go in line 47: client = RestClient(endpoint="https://test.dbrepo.tuwien.ac.at", username="username", password="password"). The Invenio auth token goes in line 140: auth_token = "Invenio Token". Additionally, if you do not wish to download the data from DBRepo, you can use the local datasets in the data folder by commenting and uncommenting the corresponding lines at line 50.
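If you would rather not write the credentials into the file, one option is to read them from environment variables and pass them to the same variables. The sketch below is only a suggestion and is not part of the repository: the helper load_settings and the variable names DBREPO_USERNAME, DBREPO_PASSWORD and INVENIO_TOKEN are hypothetical.

```python
import os


def load_settings(env=os.environ):
    """Collect the values the script needs, failing early if one is missing.

    The environment variable names below are hypothetical, chosen for this sketch.
    """
    names = {
        "username": "DBREPO_USERNAME",
        "password": "DBREPO_PASSWORD",
        "auth_token": "INVENIO_TOKEN",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Demonstration with a plain dict instead of the real environment.
demo = load_settings({
    "DBREPO_USERNAME": "username",
    "DBREPO_PASSWORD": "password",
    "INVENIO_TOKEN": "Invenio Token",
})
print(sorted(demo))

# In the script, the returned values would replace the hard-coded strings, e.g.:
# settings = load_settings()
# client = RestClient(endpoint="https://test.dbrepo.tuwien.ac.at",
#                     username=settings["username"], password=settings["password"])
# auth_token = settings["auth_token"]
```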

Install all dependencies and start the scripts with:

pip install -r requirements.txt
python main.py
pip install jupyter
jupyter notebook

License

The attached MIT license applies only to the source code itself, not to the produced model ames_housing_prices_model.pkl, the two graphs scatter-absolute_error and scatter_root_squared_error, or the dataset splits in the data folder. These are all derivative works of the original Ames Housing dataset and do not have a license of their own. They can, however, be reused in an educational context; see the terms of the site where the dataset was originally published (https://jse.amstat.org/jse_users.htm). The same restriction therefore applies to the datasets, the graphs and the model.
