My homework solutions and projects for the Data Engineering Zoomcamp 2026, a free course by DataTalks.Club covering the modern data engineering stack.
| # | Module | Topic | Tech Stack |
|---|---|---|---|
| 1 | Module1 | Docker & SQL | Docker, PostgreSQL, Terraform, GCP |
| 2 | Module2 | Workflow Orchestration | Kestra |
| 3 | Module3 | Data Warehouse | BigQuery, dlt |
| 4 | Module4 | Analytics Engineering | dbt, dimensional modeling |
| 5 | Module5 | Data Platforms | Bruin |
| 6 | Module6 | Batch Processing | Apache Spark (PySpark) |
| 7 | Module7 | Streaming | PyFlink, Redpanda |
| 🧪 | Workshop | dlt Workshop | dlt, DuckDB, REST API |
Each module folder contains a `homework.md` with the questions, my answers, and the code/queries used to derive them.
SoCal NOD Tracker: Foreclosure Early-Warning Pipeline
An end-to-end pipeline tracking Notice of Default (NOD) filings across 6 Southern California counties, a leading indicator of foreclosure activity.
Daily CSVs → Kestra DAG → GCS (data lake) → BigQuery (raw) → dbt (marts) → Looker Studio
Stack: Terraform · Kestra · Google Cloud Storage · BigQuery · dbt · Looker Studio
See the full project README for architecture, reproduction steps, and the dashboard.
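To give a feel for what the final stages of the pipeline above produce, here is a minimal sketch of the kind of aggregation a mart might expose: NOD filings counted per county and month. It is not the project's actual code. The column names (`county`, `recording_date`, `document_type`), the sample rows, and the `monthly_nod_counts` helper are all assumptions made up for illustration, and the real work is done in dbt on BigQuery rather than in Python.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a daily CSV. The column names and rows are
# assumptions for illustration, not the project's actual schema or data.
SAMPLE_CSV = """county,recording_date,document_type
Los Angeles,2026-01-05,NOD
Orange,2026-01-05,NOD
Los Angeles,2026-01-06,NOD
Riverside,2026-01-06,NOTICE OF TRUSTEE SALE
Los Angeles,2026-02-02,NOD
"""


def monthly_nod_counts(csv_text: str) -> Counter:
    """Count NOD filings per (county, YYYY-MM), ignoring other document types."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["document_type"].strip().upper() != "NOD":
            continue
        month = row["recording_date"][:7]  # "2026-01-05" -> "2026-01"
        counts[(row["county"].strip(), month)] += 1
    return counts


if __name__ == "__main__":
    for (county, month), n in sorted(monthly_nod_counts(SAMPLE_CSV).items()):
        print(f"{county:12} {month} {n}")
```

A rising monthly count for a county is the early-warning signal the dashboard is meant to surface.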
| Layer | Tools |
|---|---|
| Containerization | Docker, Docker Compose |
| Orchestration | Kestra |
| Data Lake | Google Cloud Storage |
| Data Warehouse | BigQuery, PostgreSQL, DuckDB |
| Ingestion | dlt |
| Transformation | dbt, Bruin |
| Batch Processing | Apache Spark / PySpark |
| Streaming | Apache Flink (PyFlink), Redpanda |
| Infrastructure as Code | Terraform |
| Visualization | Looker Studio |
.
├── Module1/    # Docker & SQL
├── Module2/    # Workflow Orchestration (Kestra)
├── Module3/    # Data Warehouse (BigQuery, dlt)
├── Module4/    # Analytics Engineering (dbt)
├── Module5/    # Data Platforms (Bruin)
├── Module6/    # Batch Processing (Spark)
├── Module7/    # Streaming (PyFlink, Redpanda)
├── Workshop/   # dlt Workshop
└── project1/   # Capstone: SoCal NOD Tracker
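If you clone the repo and want to confirm that the layout above matches what is on disk, a small check like the following may help. It is only a sketch: the `modules_missing_homework` helper is hypothetical, and it assumes that module folders are the ones matching `Module*` and that each should contain a `homework.md`.

```python
from pathlib import Path


def modules_missing_homework(repo_root: Path) -> list[str]:
    """Return names of Module* folders under repo_root that lack a homework.md."""
    return sorted(
        folder.name
        for folder in repo_root.glob("Module*")
        if folder.is_dir() and not (folder / "homework.md").is_file()
    )


if __name__ == "__main__":
    # Run from the repository root.
    missing = modules_missing_homework(Path("."))
    print("missing homework.md:", missing or "none")
```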
The Data Engineering Zoomcamp is a free, hands-on course covering data engineering fundamentals: containerization, workflow orchestration, data warehousing, analytics engineering, batch processing, and streaming.
Michael (@HighviewOne)
Released under the MIT License.