Supervised and Unsupervised machine learning for Current Population Surveys

This repository demonstrates supervised and unsupervised machine learning for income prediction and customer segmentation, using the Current Population Surveys (1994 and 1995) from the U.S. Census Bureau. The dependent variable has two income levels: <$50,000 and >$50,000. The $50,000 threshold falls at roughly the 75th percentile of the U.S. population at that time, so the upper level represents the high-income group. There are 40 independent variables in total. A detailed list, with metadata and data inspection information, is provided in datacleaning.csv.

The workflow uses three models: LASSO logistic regression, XGBoost, and K-means clustering. LASSO filters the variables by the strength of their association with the income level, XGBoost predicts the two income levels, and K-means clustering performs the customer segmentation.
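The LASSO filtering step can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation (which is not shown here and presumably relies on a library); it only shows, on made-up toy data, how an L1 penalty drives the coefficient of a weakly associated variable to exactly zero while keeping a strongly associated one. The function names, feature meanings, penalty, and step size below are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, penalty=0.05, step=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is left unpenalized. Hyperparameters are illustrative only.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Toy data (hypothetical): column 0 is strongly tied to the income label,
# column 1 is almost balanced across labels and only weakly associated.
X, y = [], []
for k in range(25):
    level = 0.2 + 0.06 * k
    for weak in (1.0, -1.0):
        X.append([level, weak])   # high-income rows
        y.append(1)
        X.append([-level, weak])  # low-income rows
        y.append(0)
for _ in range(4):                # a few rows give column 1 a slight signal
    X.append([1.0, 1.0])
    y.append(1)

weights, intercept = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("coefficients:", weights)
print("columns kept by the L1 penalty:", selected)
```

Only the columns whose coefficients survive the penalty would be passed on to the downstream model; a stronger penalty keeps fewer columns.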

To run the analysis:

Clone the repository and navigate to its directory

git clone https://github.com/codefortheplanet/Supervised-and-Unsupervised-machine-learning-for-Current-Population-Surveys.git

cd Supervised-and-Unsupervised-machine-learning-for-Current-Population-Surveys

Create a new conda environment with the required dependencies, then activate it

conda env create -f environment.yml

conda activate cps

Run the main Python script

python run.py

The script prints the main evaluation metrics and saves the resulting coefficients and graphs.

