chore: add basic file structure to start the project #457

Open
Devam0908 wants to merge 9 commits into gpsaggese:master from Devam0908:UmdTask382_DATA605_Spring2026_Clickhouse_user_engagement_prediction

Conversation

@Devam0908

Summary

  • Create basic file structure needed for our project.
  • Decide on the dataset and ML hypothesis.

Planned Workflow

  1. Build an ETL pipeline in clickhouse.API.ipynb.
  2. Perform feature engineering and model evaluation in clickhouse.example.ipynb.
  3. Document the results and finalize README.md.

Issue: #382

Assigned to: @gpsaggese, @protocorn, @Devam0908, @panktisheta, @dshah567

Devam0908 and others added 7 commits May 1, 2026 00:01
…b.io into UmdTask382_DATA605_Spring2026_Clickhouse_user_engagement_prediction
Edited clickhouse.API and added a script that ingests the data directly into ClickHouse, because the dataset is large.
The dataset is described as clean on its Kaggle page.

Dataset: https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store
* Make docker run + ingest workflow reproducible

* Added Datasource for Grafana and started with basic analytics

* Edited the notes

* Changed the hourly_funnel Table creation and query_df function

* Change gitignore and build session level table

* Add description for step 2/9 in clickhouse.api.ipynb

* Complete clickhouse.api.ipynb file

* Add all necessary files and clean run

* Automate dashboard
@Devam0908
Author

Summary

Complete end-to-end ClickHouse + ML project to predict whether an e-commerce user session will result in a purchase, with a reproducible Docker workflow and Grafana dashboards.

Issue: #382
Authors: @Devam0908, @panktisheta, @dshah567
Reviewers: @gpsaggese, @protocorn

Project

  • Project: Session Purchase Prediction (ClickHouse-first analytics → ML)
  • Dataset: Kaggle “Ecommerce behavior data from multi category store” (2019-Oct)
  • DB/Stack: ClickHouse (MergeTree + Materialized Views), Jupyter, Grafana (ClickHouse datasource)
  • Models: Logistic Regression + XGBoost (trained from ClickHouse-built training table)
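For reviewers who want a concrete picture of the session-level target, here is a minimal sketch of how a label and a few early-session features could be derived from raw events. It is pure Python for illustration only; the PR builds these tables inside ClickHouse. The feature names (`n_views`, `n_carts`), the sample rows, and the choice to exclude purchase events from the features are assumptions, not the actual schema of the training table.

```python
from collections import defaultdict

# Hypothetical event rows: (user_session, event_time, event_type).
# The field names follow the Kaggle dataset; the values are made up.
EVENTS = [
    ("s1", "2019-10-01 00:00:01", "view"),
    ("s1", "2019-10-01 00:00:20", "cart"),
    ("s1", "2019-10-01 00:01:05", "purchase"),
    ("s2", "2019-10-01 00:02:00", "view"),
    ("s2", "2019-10-01 00:02:30", "view"),
]


def build_training_rows(events, n=5):
    """Return one dict per session: early-event counts plus a purchase label."""
    by_session = defaultdict(list)
    for session, ts, event_type in events:
        by_session[session].append((ts, event_type))

    rows = []
    for session, items in sorted(by_session.items()):
        items.sort()  # ISO-style timestamps sort chronologically as strings
        # Label: did the session contain a purchase at any point?
        purchased = int(any(t == "purchase" for _, t in items))
        # Features: non-purchase events among the first n (assumed leakage guard)
        early = [t for _, t in items[:n] if t != "purchase"]
        rows.append({
            "user_session": session,
            "n_views": early.count("view"),
            "n_carts": early.count("cart"),
            "purchased": purchased,
        })
    return rows
```

Calling `build_training_rows(EVENTS, n=2)` yields one row per session, with features taken only from the first two events and the label taken from the whole session.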

What’s included

  • Reproducible Docker setup (docker-compose.yml) with:
    • ClickHouse (HTTP + native ports exposed)
    • Jupyter notebook container
    • Grafana with ClickHouse datasource + dashboard provisioning
  • Fast ingestion + schema pipeline (docker_ingest.sh)
    • events_raw (CSV ingest) → events_typed (typed/partitioned) via mv_events_raw_to_typed
  • ClickHouse-first feature engineering + labeling
    • Builds session-level labels (session_outcomes) and a stable training table (training_sessions_n5, first N=5 events)
  • Model training + evaluation notebook
    • Trains/evaluates Logistic Regression + XGBoost from training_sessions_n5
    • Writes session-level scores back to ClickHouse (session_purchase_predictions)
  • Grafana dashboards (provisioned in artifacts/)
    • Drop-off / funnel style analysis (“where users drop off before purchase”)
    • Early intent signal (“predict purchase intent from first few actions”)
    • Model comparison (“which model best identifies potential buyers”)
  • Utility module (clickhouse_utils.py) for environment-driven ClickHouse connections and query→DataFrame helpers
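To illustrate what "environment-driven connections and query→DataFrame helpers" could look like, here is a rough sketch. It is not the contents of clickhouse_utils.py: the environment variable names, the defaults, and the use of the `clickhouse-connect` driver are all assumptions made for the example. Only the helper name `query_df` is taken from the commit notes.

```python
import os

# Hypothetical variable names and defaults; the real clickhouse_utils.py may differ.
DEFAULTS = {
    "CLICKHOUSE_HOST": "localhost",
    "CLICKHOUSE_PORT": "8123",  # ClickHouse's default HTTP port
    "CLICKHOUSE_USER": "default",
    "CLICKHOUSE_PASSWORD": "",
    "CLICKHOUSE_DATABASE": "default",
}


def settings_from_env(env=None):
    """Build connection keyword arguments from environment variables."""
    env = os.environ if env is None else env
    values = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "host": values["CLICKHOUSE_HOST"],
        "port": int(values["CLICKHOUSE_PORT"]),
        "username": values["CLICKHOUSE_USER"],
        "password": values["CLICKHOUSE_PASSWORD"],
        "database": values["CLICKHOUSE_DATABASE"],
    }


def query_df(sql, env=None):
    """Run a query and return the result as a pandas DataFrame."""
    # Imported lazily so settings_from_env works without the driver installed.
    import clickhouse_connect  # third-party: pip install clickhouse-connect

    client = clickhouse_connect.get_client(**settings_from_env(env))
    return client.query_df(sql)
```

Reading settings from the environment lets the same notebook code run inside the Jupyter container and on a host machine, with only the variables changing between the two.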

Run plan

  • docker compose up -d
  • ./docker_ingest.sh
  • Run notebooks:
    • notebooks/clickhouse.API.ipynb
    • notebooks/clickhouse.example.ipynb
  • Open Grafana: http://localhost:3000 (admin/admin) and confirm dashboards load
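Before running the notebooks, it can help to confirm that the containers are answering. The sketch below is one way to do that from Python using only the standard library. The ClickHouse URL assumes the default HTTP port 8123 is the one published by docker-compose.yml, which this PR description does not state; the Grafana address comes from the run plan above.

```python
import http.client
import urllib.request

# Assumed endpoints: adjust the ClickHouse port if docker-compose.yml maps it differently.
ENDPOINTS = {
    "clickhouse": "http://localhost:8123/ping",
    "grafana": "http://localhost:3000/api/health",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError/timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def check_stack(endpoints=None):
    """Return a {service name: bool} map for the given endpoints."""
    endpoints = ENDPOINTS if endpoints is None else endpoints
    return {name: is_up(url) for name, url in endpoints.items()}
```

Running `print(check_stack())` after `docker compose up -d` shows which services are reachable; a `False` entry usually means the container is still starting or the port mapping differs from the assumption above.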

@Devam0908 Devam0908 requested a review from protocorn May 8, 2026 12:53
* Ready for Final Review
3 participants