Automated data pipeline for fetching YouTube trending videos and loading them into a database.
A small pipeline that:
- fetches YouTube's trending videos (uses the YouTube Data API v3)
- writes the results as CSV files in data/
- loads a CSV into PostgreSQL using SQLAlchemy
There are two main scripts in scripts/:
- fetch_trending.py: calls the YouTube API and saves a CSV in data/.
- load_to_db.py: reads a CSV from data/ and appends it to a Postgres table.
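To show what "appends it to a table" means in practice, here is a minimal sketch of the load step. It is not the repo's code: to stay runnable without a Postgres server it swaps in the stdlib sqlite3 module for SQLAlchemy/Postgres and stores every column as TEXT. With pandas and SQLAlchemy the usual equivalent is DataFrame.to_sql(..., if_exists="append"), though the script's exact calls may differ. The function name append_csv_to_table is hypothetical.

```python
import csv
import io
import sqlite3


def append_csv_to_table(conn, csv_file, table="trending_videos"):
    """Append every row of a CSV to `table`, creating the table on first use.

    Stand-in for load_to_db.py: sqlite3 instead of SQLAlchemy/Postgres,
    and all columns stored as TEXT for simplicity.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    if not columns:
        return 0  # empty file: nothing to load
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # First load creates the table; later loads only add rows (append semantics).
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = "video_id,title,view_count\nabc123,Example video,1000\n"
    conn = sqlite3.connect(":memory:")
    append_csv_to_table(conn, io.StringIO(sample))
    append_csv_to_table(conn, io.StringIO(sample))  # loading again appends, it does not replace
    print(conn.execute('SELECT COUNT(*) FROM "trending_videos"').fetchone()[0])
```

Note that because the load appends, running it twice on the same CSV stores the rows twice.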
There is also an Airflow DAG at dags/youtube_trending_dag.py, which can be used to schedule the fetch with Airflow. The DAG is currently scheduled to run every 15 minutes.
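As a rough illustration of what "every 15 minutes" means, the sketch below computes upcoming run times, assuming the DAG uses a cron-style schedule such as "*/15 * * * *" (fires at minutes 0, 15, 30 and 45). The exact schedule setting in the DAG file isn't shown here; a timedelta(minutes=15) schedule would count from the DAG's start_date instead of the clock quarter-hour. The helper next_runs is hypothetical.

```python
from datetime import datetime, timedelta


def next_runs(after, count=4, every_minutes=15):
    """Return the next `count` fire times strictly after `after` on a */N-minute cron grid.

    Assumes `every_minutes` divides 60, so the grid lines up with the hour
    the same way cron's */N minute field does.
    """
    assert 60 % every_minutes == 0, "cron */N only forms an even grid when N divides 60"
    # Round down to the start of the current slot, then step forward one slot at a time.
    slot_start = after.replace(second=0, microsecond=0)
    slot_start -= timedelta(minutes=slot_start.minute % every_minutes)
    return [slot_start + timedelta(minutes=every_minutes * i) for i in range(1, count + 1)]


if __name__ == "__main__":
    for run in next_runs(datetime(2024, 1, 1, 10, 7, 30), count=3):
        print(run.isoformat())
```

At this interval the pipeline runs 96 times a day (24 hours x 4 runs per hour).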
---- How it works in more detail ----
- scripts/fetch_trending.py uses the Google API client (googleapiclient.discovery.build) to call the videos.list endpoint with chart=mostPopular and regionCode=US.
- The script extracts selected fields from the response (video id, title, channel, category id, publication time, and basic stats), converts them to a Pandas DataFrame, and writes a timestamped CSV to data/.
- scripts/load_to_db.py uses pandas.read_csv and sqlalchemy.create_engine to append CSV rows to a trending_videos table in PostgreSQL.
- dags/youtube_trending_dag.py is an Airflow DAG that schedules the fetch and downstream tasks.
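The fetch-and-extract step can be sketched roughly as follows. The videos.list call uses the real google-api-python-client interface (part="snippet,statistics" is what returns the title/channel/category/publish-time and count fields); it is imported lazily so the rest runs offline. The column names, maxResults=50, and the data/trending_YYYYMMDD_HHMMSS.csv filename scheme are assumptions, not necessarily what the script does, and this sketch writes CSV with the stdlib csv module where the real script uses a Pandas DataFrame.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical output columns; the real script's names may differ.
FIELDS = ["video_id", "title", "channel_title", "category_id",
          "published_at", "view_count", "like_count", "comment_count"]


def fetch_trending(api_key, region="US", max_results=50):
    """Call videos.list with chart=mostPopular (needs google-api-python-client and network)."""
    from googleapiclient.discovery import build  # lazy import: extraction below works without it
    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.videos().list(
        part="snippet,statistics",
        chart="mostPopular",
        regionCode=region,
        maxResults=max_results,
    )
    return request.execute()


def _count(stats, key):
    # Counts arrive as strings; some (e.g. likes) can be hidden, so keep None when absent.
    value = stats.get(key)
    return int(value) if value is not None else None


def extract_rows(response):
    """Flatten each item of a videos.list response into one dict per video."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "channel_title": snippet.get("channelTitle"),
            "category_id": snippet.get("categoryId"),
            "published_at": snippet.get("publishedAt"),
            "view_count": _count(stats, "viewCount"),
            "like_count": _count(stats, "likeCount"),
            "comment_count": _count(stats, "commentCount"),
        })
    return rows


def csv_filename(now=None):
    """Hypothetical timestamped name, e.g. data/trending_20240101_120000.csv."""
    now = now or datetime.now(timezone.utc)
    return f"data/trending_{now:%Y%m%d_%H%M%S}.csv"


def write_csv(rows, out):
    """Write rows with a header line; None values become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

In a real run you would call fetch_trending(api_key), pass the result to extract_rows, and open csv_filename() with newline="" before handing the file to write_csv.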
In Airflow, a DAG (Directed Acyclic Graph) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
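To make the "directed acyclic" part concrete, here is a small sketch using Python's stdlib graphlib (3.9+), not Airflow itself. The task names are hypothetical stand-ins for the fetch and load steps; in an Airflow DAG file the same edge is typically written fetch_task >> load_task. A topological sort gives an order in which every task runs after its dependencies, and a cycle makes that impossible, which is why DAGs must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
pipeline = {
    "fetch_trending": set(),
    "load_to_db": {"fetch_trending"},
}


def run_order(graph):
    """Return one valid execution order (dependencies first).

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(pipeline))
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```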
- Clone the repo:
  git clone https://github.com/em1e/YouTubeTrends.git
- Add your YouTube API key:
  - Create a .env file in the repo root (an example, .env.example, is provided):
    API_KEY=YOUR_YOUTUBE_API_KEY
  - Note: fetch_trending.py will attempt to read API_KEY and tries to load .env from the project root.
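Reading the key could look something like the sketch below. This is an assumption, not the script's code: the real script may well use python-dotenv, and load_dotenv_file and get_api_key are hypothetical names. Like python-dotenv's default behaviour, it does not overwrite variables that are already set in the environment.

```python
import os
from pathlib import Path


def load_dotenv_file(path):
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional surrounding quotes.

    Variables already present in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # .env is optional; the key may already be exported
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)


def get_api_key(project_root="."):
    """Load <project_root>/.env if present, then return API_KEY or fail loudly."""
    load_dotenv_file(Path(project_root) / ".env")
    key = os.environ.get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY is not set; add it to .env or export it")
    return key
```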
---- Airflow and Scheduling ----
I'm using Airflow's default Docker Compose file (with small modifications) to run the Airflow webserver. The Python requirements are installed via my Dockerfile from requirements.txt.
docker compose up --build -> build requirements and start all services
docker compose down --volumes --rmi all -> clean up when done
Then you should be able to open the Airflow web server at http://localhost:8080 and enable the youtube_trending_pipeline DAG.
Postgres credentials (from the Docker Compose setup):
db name: airflow
user: airflow
password: airflow
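If you want to point the loader at this database, the credentials above translate into a SQLAlchemy URL roughly like this. The host "postgres" and port 5432 are assumptions: "postgres" is the service name in Airflow's default docker-compose.yaml and only resolves inside the Compose network; from your own machine you would use "localhost", and only if the port is published. The psycopg2 driver and the postgres_url helper are likewise assumptions.

```python
from urllib.parse import quote_plus


def postgres_url(user="airflow", password="airflow", db="airflow",
                 host="postgres", port=5432, driver="psycopg2"):
    """Build a SQLAlchemy-style Postgres URL; defaults mirror the credentials above.

    User and password are URL-escaped so characters like '@' or ':' don't break parsing.
    """
    return (f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{db}")


if __name__ == "__main__":
    print(postgres_url())
```

The result can then be passed to sqlalchemy.create_engine(postgres_url()).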
I learned a lot in a short window: Airflow, DAGs, dbt, and the YouTube API. A LOT of this was completely new to me! Gotta say, overall I enjoyed the debugging and learning process over these past few days C:


