Data engineer. I break things, read source code, and ship fixes upstream.
MSc Data Analytics. Building pipelines and dev tools on the side. I believe compliance shouldn't mean spreadsheets and AI shouldn't require the cloud. Yorkshire, UK.
sql-sop — SQL linter on PyPI. 23 rules, 78 tests, libCST-based injection scanner, pre-commit + GitHub Action. 195+ monthly downloads. Install: pip install sql-sop
pr-sop — PR governance checker on PyPI. 3 config-driven checks (CHANGELOG drift, version consistency, pre-commit rev pins), 29 tests, pydantic v2 config. Runs as a CLI, pre-commit hook, or GitHub Action. First external consumer: sql-sop itself. Install: pip install pr-sop
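To illustrate the kind of check an injection scanner like the one in sql-sop performs, here is a minimal sketch that flags `execute()` calls whose query is built with string interpolation. It is not sql-sop's code: it uses the standard-library `ast` module instead of libCST to stay dependency-free, and the names `EXECUTE_METHODS` and `find_injection_risks` are hypothetical.

```python
import ast

# Hypothetical names for illustration only; not part of sql-sop.
EXECUTE_METHODS = {"execute", "executemany"}


def _is_dynamic_string(node: ast.expr) -> bool:
    """Return True if the expression looks like a string assembled at runtime."""
    # f-string with at least one interpolated value, e.g. f"... {user_id}"
    if isinstance(node, ast.JoinedStr):
        return any(isinstance(part, ast.FormattedValue) for part in node.values)
    # Any `+` or `%` expression. This over-approximates on purpose:
    # constant-only concatenation is flagged too.
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True
    # "...{}".format(value)
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    ):
        return True
    return False


def find_injection_risks(source: str) -> list[int]:
    """Return line numbers of execute() calls whose first argument is dynamic."""
    risky_lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in EXECUTE_METHODS
            and node.args
            and _is_dynamic_string(node.args[0])
        ):
            risky_lines.append(node.lineno)
    return sorted(risky_lines)
```

A parameterised call such as `cur.execute("... WHERE id = %s", (uid,))` passes, because its first argument is a plain string constant.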
Production Analytics Pipeline — Incremental ETL from fish production ERP. 15K+ rows/day, FastAPI (11 endpoints) + Next.js + Power BI, Prefect orchestration, Docker + OpenTofu. 53 tests.
UK Crime Pipeline — Police UK API → PostgreSQL + BigQuery. 99,675 records, 6 dbt marts, 65 tests, Polars ingestion, SLO monitoring. Streamlit · Looker Studio · Hugging Face
OpsMind — On-prem AI for manufacturing. NL-to-SQL in 5s, LangGraph agent, MCP server architecture, pgvector + ChromaDB RAG, Gemma 3 12B via Ollama. Golden-set eval harness with failure-mode taxonomy. docs
Manufacturing Compliance Dashboard — BRC/HACCP food safety compliance. MCP server exposes 5 compliance tools for LLM agents, NL query interface for auditors, z-score anomaly detection, Four Golden Signals /metrics endpoint. live
SQL Ops Reviewer — GitHub Action that auto-reviews .sql files in PRs using local AI. Catches injection risks, performance anti-patterns, style violations. One YAML file to set up, runs on the CI runner, zero API keys.
MediAsk — Health Q&A platform for factory workers. NHS-verified guidance, Gemini responses, voice input, 18 languages. Flask + PostgreSQL, Dockerised. live
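The z-score anomaly detection mentioned for the Manufacturing Compliance Dashboard can be sketched roughly as follows. This is a generic illustration, not the dashboard's code: the function name `zscore_anomalies`, the default threshold of 3.0, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values whose absolute z-score exceeds the threshold.

    Hypothetical helper: the name, threshold, and population standard
    deviation are illustrative assumptions.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all readings identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

One design caveat: a large outlier inflates the mean and standard deviation it is measured against, so on small samples a robust variant (median and MAD) may be a better fit.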
drt — Triage Collaborator. Shipped multi-sync orchestration (drt run --all, --select tag:, --threads N) with a thread-safe StateManager and 11 parallel-dispatch tests. Plus 5 destination connectors, the official connector tutorial, Docker support, and pre-commit hooks — all merged.
sql-sop — Maintainer. Review and merge community PRs (W011 union-without-all, W012 group-by-ordinal, W005-template adoption), publish to PyPI, maintain governance + security policy, triage issues. First-PR-wins soft-assignment policy in place.
pr-sop — Creator and maintainer. Shipped v0.1.0 (initial three checks), v0.1.1 (fix for third-party rev: pin false positives), and v0.1.2 (fix for CI-merge-commit tag lookup) to PyPI, all within 24 hours. Full governance, security, contributing, and code-of-conduct documents published.
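For readers curious what multi-sync orchestration with a thread-safe state manager involves (the drt entry above), here is a minimal sketch of the general pattern. It is not drt's implementation: the `StateManager` methods, the `run_all` helper, and the status strings are hypothetical, and only the standard library is used.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class StateManager:
    """Hypothetical sketch: stores the last status of each sync behind a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, str] = {}

    def record(self, sync_name: str, status: str) -> None:
        with self._lock:  # serialise writes coming from worker threads
            self._state[sync_name] = status

    def snapshot(self) -> dict[str, str]:
        with self._lock:
            return dict(self._state)  # copy so callers cannot mutate shared state


def run_all(syncs: dict[str, Callable[[], None]], threads: int = 4) -> dict[str, str]:
    """Run every sync callable in a thread pool and return the recorded statuses."""
    state = StateManager()

    def _run(name: str) -> None:
        try:
            syncs[name]()
            state.record(name, "success")
        except Exception as exc:  # one failing sync must not stop the others
            state.record(name, f"failed: {exc}")

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consume the iterator; leaving the with-block also waits for all workers.
        list(pool.map(_run, syncs))
    return state.snapshot()
```

Catching exceptions inside each worker keeps one failed sync from aborting the rest, and returning a copied snapshot avoids handing shared mutable state to the caller.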
Merged contributions into projects I use every day.
- scanapi/scanapi#868 — docs: add missing docstrings to spec_evaluator.py (First Contribution)
- pyOpenSci/python-package-guide#622 — added Turing Way links for CITATION.cff and software citation guidance
- dlt-hub/dlt#3830 — updated source count from 5,000 to 8,000+ in intro docs
- py-pdf/fpdf2#1805 — added Punjabi (pa) tutorial translation
I learn tools by reading their source: reverse-engineer the architecture, find the gap, ship the fix.
drt · pandas · ChromaDB · pgcli · ollama · superset · plotly · fpdf2
Python, SQL, dbt, PostgreSQL, BigQuery, FastAPI, Streamlit, Prefect, LangGraph, Ollama, Docker, Polars, pandas, Pydantic, pytest, GitHub Actions


