In this homework, we'll use the dbt project in 04-analytics-engineering/taxi_rides_ny/ to transform NYC taxi data and answer questions by querying the models.
- Set up your dbt project following the setup guide
- Load the Green and Yellow taxi data for 2019-2020 into your warehouse
- Run
dbt buildto create all models and run tests
After a successful build, you should have models like fct_trips, dim_zones, and fct_monthly_zone_revenue in your warehouse.
Given a dbt project with the following structure:
models/
├── staging/
│ ├── stg_green_tripdata.sql
│ └── stg_yellow_tripdata.sql
└── intermediate/
└── int_trips_unioned.sql (depends on stg_green_tripdata & stg_yellow_tripdata)
If you run dbt run --select int_trips_unioned, what models will be built?
stg_green_tripdata,stg_yellow_tripdata, andint_trips_unioned(upstream dependencies)- Any model with upstream and downstream dependencies to
int_trips_unioned int_trips_unionedonlyint_trips_unioned,int_trips, andfct_trips(downstream dependencies)
You've configured a generic test like this in your schema.yml:
columns:
- name: payment_type
data_tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
quote: falseYour model fct_trips has been running successfully for months. A new value 6 now appears in the source data.
What happens when you run dbt test --select fct_trips?
- dbt will skip the test because the model didn't change
- dbt will fail the test, returning a non-zero exit code
- dbt will pass the test with a warning about the new value
- dbt will update the configuration to include the new value
After running your dbt project, query the fct_monthly_zone_revenue model.
What is the count of records in the fct_monthly_zone_revenue model?
- 12,998
- 14,120
- 12,184
- 15,421
Using the fct_monthly_zone_revenue table, find the pickup zone with the highest total revenue (revenue_monthly_total_amount) for Green taxi trips in 2020.
Which zone had the highest revenue?
- East Harlem North
- Morningside Heights
- East Harlem South
- Washington Heights South
Using the fct_monthly_zone_revenue table, what is the total number of trips (total_monthly_trips) for Green taxis in October 2019?
- 500,234
- 350,891
- 384,624
- 421,509
Create a staging model for the For-Hire Vehicle (FHV) trip data for 2019.
- Load the FHV trip data for 2019 into your data warehouse
- Create a staging model
stg_fhv_tripdatawith these requirements:- Filter out records where
dispatching_base_num IS NULL - Rename fields to match your project's naming conventions (e.g.,
PUlocationID→pickup_location_id)
- Filter out records where
What is the count of records in stg_fhv_tripdata?
- 42,084,899
- 43,244,693
- 22,998,722
- 44,112,187
- Form for submitting: https://courses.datatalks.club/de-zoomcamp-2026/homework/hw4
=======
We encourage everyone to share what they learned. This is called "learning in public".
Read more about the benefits here.
🚀 Week 4 of Data Engineering Zoomcamp by @DataTalksClub complete!
Just finished Module 4 - Analytics Engineering with dbt. Learned how to:
✅ Build transformation models with dbt
✅ Create staging, intermediate, and fact tables
✅ Write tests to ensure data quality
✅ Understand lineage and model dependencies
✅ Analyze revenue patterns across NYC zones
Transforming raw data into analytics-ready models - the T in ELT!
Here's my homework solution: <LINK>
Following along with this amazing free course - who else is learning data engineering?
You can sign up here: https://github.com/DataTalksClub/data-engineering-zoomcamp/
📈 Module 4 of Data Engineering Zoomcamp done!
- Analytics Engineering with dbt
- Transformation models & tests
- Data lineage & dependencies
- NYC taxi revenue analysis
My solution: <LINK>
Free course by @DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp/