vanHeemstraSystems/learning-azure-databricks

🏎️ learning-azure-databricks

Scuderia Data: Building Your F1 Data Platform on Azure

"Data Engineering is just F1. Raw fuel in, championship insight out."

This repository follows the Scuderia Data metaphor: every Azure Databricks and data platform concept is mapped to a part of a Formula 1 racing team. If you understand F1, you understand Azure Databricks.


πŸ—ΊοΈ The Metaphor Map

| F1 Concept | Azure / Databricks Concept |
| --- | --- |
| 🏭 F1 Factory | Azure Data Platform (the whole thing) |
| 🏎️ Race Car | Azure Databricks |
| ⚙️ V10 Engine | Apache Spark |
| ⛽ Raw Fuel | Raw Data (events, logs, transactions) |
| 🚛 Fuel Logistics | Azure Data Factory (ADF) |
| 🛢️ Fuel Tank | Azure Data Lake Storage Gen2 (ADLS) |
| 🔧 Pit Lane + Fuel Grades | Delta Lake (Bronze / Silver / Gold) |
| 👷 Pit Crew | Data Engineers |
| 🧑‍✈️ Driver | Data Scientists & Analysts |
| 🖥️ Cockpit | Databricks Notebooks |
| 📡 Telemetry System | Azure Monitor + Databricks Observability |
| 🧠 Race Strategist | Unity Catalog (Governance) |
| 💨 Wind Tunnel | AutoML + MLflow (ML Experiments) |
| 📺 Race Broadcast | Power BI Dashboards |
| 🏆 Championship Table | Lakehouse Architecture |

πŸ“ Repository Structure

learning-azure-databricks/
│
├── flows/                          ← End-to-end learning journeys
│   ├── flow-01-factory-tour.md     ← Overview: the whole F1 factory
│   ├── flow-02-fuel-to-finish.md   ← Data: raw → insights pipeline
│   └── flow-03-race-day-ops.md     ← Production: monitoring & governance
│
├── stories/                        ← Focused concept narratives
│   ├── story-01-the-fuel-tank.md   ← ADLS Gen2 deep dive
│   ├── story-02-the-engine.md      ← Apache Spark internals
│   ├── story-03-pit-lane.md        ← Delta Lake operations
│   ├── story-04-race-strategy.md   ← Unity Catalog governance
│   └── story-05-wind-tunnel.md     ← MLflow & AutoML
│
├── tasks/                          ← Hands-on exercises
│   ├── task-01-spin-up-cluster.md
│   ├── task-02-ingest-bronze.md
│   ├── task-03-transform-silver.md
│   ├── task-04-aggregate-gold.md
│   └── task-05-register-model.md
│
├── 100-foundations/                ← Azure + Databricks fundamentals
├── 200-data-ingestion/             ← ADF, Event Hubs, streaming
├── 300-delta-lake/                 ← Delta operations & optimization
├── 400-databricks-core/            ← Clusters, notebooks, jobs, SQL
├── 500-unity-catalog/              ← Governance, lineage, access
├── 600-medallion-architecture/     ← Bronze → Silver → Gold patterns
├── 700-ml-and-mlflow/              ← Feature store, AutoML, model registry
├── 800-synapse-and-powerbi/        ← Serving layer & reporting
├── 900-production-patterns/        ← CI/CD, cost, monitoring
│
└── articles/                       ← Dev.to series: "Scuderia Data"
    ├── episode-01-welcome-to-the-factory.md
    ├── episode-02-the-fuel-tank.md
    ├── episode-03-fuel-logistics.md
    ├── episode-04-the-race-car.md
    ├── episode-05-the-engine.md
    ├── episode-06-pit-lane-bronze.md
    ├── episode-07-silver-refinement.md
    ├── episode-08-gold-aggregation.md
    ├── episode-09-the-cockpit.md
    ├── episode-10-race-strategy-governance.md
    ├── episode-11-telemetry.md
    ├── episode-12-wind-tunnel-ml.md
    ├── episode-13-race-broadcast.md
    └── episode-14-championship-architecture.md
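
The Bronze → Silver → Gold progression behind tasks 02 to 04 and 600-medallion-architecture/ can be previewed before any cluster is running. The sketch below is a rough, plain-Python illustration of the idea only: it does not use Spark or Delta Lake, ordinary lists and dicts stand in for tables, and the record fields (driver, lap, lap_time_s), the driver codes, and the function names are hypothetical, not taken from the repository's exercises.

```python
# Hypothetical raw lap events ("raw fuel"). Field names and values are
# illustrative assumptions, not data from this repository.
RAW_EVENTS = [
    {"driver": "DRV1", "lap": "1", "lap_time_s": "92.4"},
    {"driver": "DRV1", "lap": "2", "lap_time_s": "91.8"},
    {"driver": "DRV2", "lap": "1", "lap_time_s": "91.9"},
    {"driver": "DRV2", "lap": "1", "lap_time_s": "91.9"},  # duplicate record
    {"driver": "DRV2", "lap": "2", "lap_time_s": "n/a"},   # malformed value
]


def to_bronze(raw_events):
    """Bronze: keep every record as received, adding only ingestion metadata."""
    return [dict(event, _source="trackside") for event in raw_events]


def to_silver(bronze_rows):
    """Silver: cast types, drop malformed rows, and de-duplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        try:
            clean = {
                "driver": row["driver"],
                "lap": int(row["lap"]),
                "lap_time_s": float(row["lap_time_s"]),
            }
        except (KeyError, ValueError):
            continue  # malformed rows do not reach Silver
        key = (clean["driver"], clean["lap"])
        if key in seen:
            continue  # skip exact repeats of a driver/lap pair
        seen.add(key)
        silver.append(clean)
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to one row per driver (here, the best lap time)."""
    best = {}
    for row in silver_rows:
        driver = row["driver"]
        best[driver] = min(best.get(driver, row["lap_time_s"]), row["lap_time_s"])
    return best


if __name__ == "__main__":
    bronze = to_bronze(RAW_EVENTS)
    silver = to_silver(bronze)
    gold = to_gold(silver)
    print(len(bronze), len(silver), gold)
```

In a real workspace each stage would more likely be a Spark DataFrame written to a Delta table; the point here is only the division of labour: Bronze keeps everything, Silver cleans, Gold aggregates.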

🚦 How to Use This Repo

  1. Start with flows/: get the big picture of each learning journey
  2. Read the stories/: go deep on individual concepts with the F1 metaphor
  3. Do the tasks/: hands-on exercises to build muscle memory
  4. Browse numbered directories: structured reference material by topic
  5. Follow the articles/: the Dev.to series tells the whole story episodically

📚 Dev.to Series: "Scuderia Data"

Published under: "Infrastructure as Code Adventures", or a new series, "Like F1? Love Data!"

A 14-episode series covering Azure Databricks, from factory floor to championship podium.


🔗 Related Repositories

  • learning-crossplane-schemas: Infrastructure provisioning
  • learning-audit-automation: Security & compliance automation
  • learning-tailscale: Secure networking

Part of the stallone learning ecosystem.
