A production-ready football analytics platform with an AI-powered chatbot, interactive dashboards, and fully automated data pipelines.
The platform is built on a modern data stack designed for high-frequency data ingestion, multi-layered processing, and AI-driven insights.
- Orchestration: Managed by Apache Airflow with custom Python Operators.
- Stealth Scraping: Uses Playwright paired with Xvfb (a virtual display) to bypass advanced anti-bot measures (Cloudflare, TLS fingerprinting) on FBref and Understat.
- Sources: High-fidelity match statistics, shot maps, lineups, and advanced metrics (xG, xA, SCA).
Data is organized in a PostgreSQL data warehouse across three primary tiers:
- Bronze (Raw): Persistent landing zone for raw JSON and HTML responses.
- Silver (Staged): Cleaned, validated, and normalized relational tables.
- Gold (Analytics): Optimized views and metrics for dashboards and RAG.
- ELT Workflow: Uses dbt (data build tool) to transform raw Bronze data into Silver and Gold layers.
- Version Control: SQL-based transformations with testing, documentation, and lineage tracking.
- Modeling: Implements modular SQL for reusable business logic and performance optimization.
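The Bronze, Silver, and Gold flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL and plain Python in place of dbt models, and the table names, view name, and sample records are all hypothetical rather than taken from the project.

```python
import json
import sqlite3

# Hypothetical raw payloads, shaped like scraped match records.
RAW_MATCHES = [
    {"match_id": 1, "team": "Arsenal", "xg": "1.8", "goals": 2},
    {"match_id": 2, "team": "Arsenal", "xg": "0.9", "goals": 0},
    {"match_id": 3, "team": "Chelsea", "xg": None, "goals": 1},  # fails validation
]


def build_warehouse(raw_matches):
    """Load raw payloads into Bronze, then derive Silver and Gold from them."""
    conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL warehouse

    # Bronze: keep every payload untouched, stored as a JSON string.
    conn.execute("CREATE TABLE bronze_matches (payload TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO bronze_matches (payload) VALUES (?)",
        [(json.dumps(match),) for match in raw_matches],
    )

    # Silver: typed, validated rows; records without an xG value are skipped.
    conn.execute(
        "CREATE TABLE silver_matches ("
        "match_id INTEGER PRIMARY KEY, team TEXT NOT NULL, "
        "xg REAL NOT NULL, goals INTEGER NOT NULL)"
    )
    bronze_rows = conn.execute("SELECT payload FROM bronze_matches").fetchall()
    for (payload,) in bronze_rows:
        record = json.loads(payload)
        if record["xg"] is None:
            continue
        conn.execute(
            "INSERT INTO silver_matches VALUES (?, ?, ?, ?)",
            (record["match_id"], record["team"], float(record["xg"]), record["goals"]),
        )

    # Gold: an aggregate view that a dashboard or chatbot could query directly.
    conn.execute(
        "CREATE VIEW gold_team_summary AS "
        "SELECT team, COUNT(*) AS matches, SUM(goals) AS goals, "
        "ROUND(SUM(xg), 2) AS total_xg "
        "FROM silver_matches GROUP BY team"
    )
    return conn
```

Calling `build_warehouse(RAW_MATCHES)` and selecting from `gold_team_summary` returns one summary row per team that survived validation. In the platform itself, the Silver and Gold steps would be SQL models managed by dbt rather than inline Python.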
- Engine: A FastAPI backend powering a Hybrid-Intelligence Chatbot.
- RAG (Retrieval-Augmented Generation): Uses ChromaDB as a vector store to index match reports and tactical insights.
- Local LLM: Orchestrated by Ollama, running models like `qwen2.5` locally for data privacy and low latency.
- Hybrid Search: Combines direct SQL queries (for aggregate stats) with vector search (for semantic/tactical analysis).
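One way to picture the hybrid search idea is a small router that sends aggregate-style questions to SQL and everything else to vector search. The sketch below is an assumption about how such routing could work, not the project's actual logic: the keyword list, the function names `route_question` and `answer`, and the injected `run_sql` and `search_vectors` callables are all hypothetical.

```python
import re

# Hypothetical hint words that suggest an aggregate, SQL-friendly question.
AGGREGATE_HINTS = {"many", "total", "average", "most", "fewest", "top", "count", "sum"}


def route_question(question):
    """Return "sql" for aggregate-style questions, otherwise "vector"."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    return "sql" if tokens & AGGREGATE_HINTS else "vector"


def answer(question, run_sql, search_vectors):
    """Dispatch the question to one of two injected backends.

    run_sql and search_vectors are placeholders for the real PostgreSQL
    and ChromaDB lookups; each takes the question and returns a result.
    """
    route = route_question(question)
    backend = run_sql if route == "sql" else search_vectors
    return {"route": route, "result": backend(question)}
```

A keyword heuristic is the simplest possible router; a production system might instead let the LLM classify the question or query both backends and merge the results.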
- Dashboards: (External) Interactive visualizations for tactical analysis.
- Monitoring: Prometheus and Grafana track system health, scraping success rates, and database performance.
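To make the scraping success rate concrete, here is a minimal sketch of counting scrape outcomes and rendering them in the Prometheus text exposition format. It uses only the standard library for illustration; a real deployment would more likely rely on the `prometheus_client` package. The class name `ScrapeMetrics`, the metric name `scrape_attempts_total`, and the label names are assumptions, not names from the project.

```python
from collections import Counter


class ScrapeMetrics:
    """Counts scrape outcomes per source and renders them as Prometheus text."""

    def __init__(self):
        self._counts = Counter()

    def record(self, source, success):
        """Register one scrape attempt for the given source."""
        outcome = "success" if success else "failure"
        self._counts[(source, outcome)] += 1

    def success_rate(self, source):
        """Fraction of successful attempts for a source (0.0 if none recorded)."""
        ok = self._counts[(source, "success")]
        total = ok + self._counts[(source, "failure")]
        return ok / total if total else 0.0

    def render(self):
        """Return the counters in the Prometheus text exposition format."""
        lines = [
            "# HELP scrape_attempts_total Scrape attempts by source and outcome.",
            "# TYPE scrape_attempts_total counter",
        ]
        for (source, outcome), value in sorted(self._counts.items()):
            lines.append(
                f'scrape_attempts_total{{source="{source}",outcome="{outcome}"}} {value}'
            )
        return "\n".join(lines) + "\n"
```

Prometheus would scrape the rendered text from an HTTP endpoint, and a Grafana panel could then compute the success rate from the two outcome series.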
- Backend: Python 3.11, FastAPI
- Orchestration: Apache Airflow
- Transformation: dbt (data build tool)
- Database: PostgreSQL 16
- AI/ML: Ollama (LLM), ChromaDB (Vector DB), Sentence-Transformers
- Infrastructure: Docker, Docker Compose
- Monitoring: Prometheus, Grafana
- Clone the repository:
  git clone https://github.com/EbEmad/Analytics-Platform.git
  cd Analytics-Platform
- Start the platform:
  docker-compose up -d
- Access the services:
  - Airflow: http://localhost:8080
  - Chatbot API: http://localhost:5000
  - Grafana: http://localhost:3000
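After starting the stack, a quick reachability check against the three service URLs listed above can confirm that the containers came up. The following is a minimal sketch using only the standard library; the helper names `http_probe` and `check_services` are hypothetical, and the check only tests that each port answers HTTP at all, not that the service is healthy.

```python
import urllib.error
import urllib.request

# Service URLs as listed in this README.
SERVICES = {
    "Airflow": "http://localhost:8080",
    "Chatbot API": "http://localhost:5000",
    "Grafana": "http://localhost:3000",
}


def http_probe(url, timeout=3.0):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with a 4xx/5xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


def check_services(services, probe=http_probe):
    """Map each service name to whether its URL is reachable."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services(SERVICES)` returns a dictionary of service names to booleans. The `probe` parameter is injectable so the function can be exercised without a running stack.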
