Sean Brar

ML systems engineer building evaluation and verification infrastructure for AI systems. Google DeepMind GSoC 2025 alumnus.

My work focuses on the question: when an AI system produces output, how do you know it's correct? I build the harnesses, validation pipelines, and correctness guarantees that answer that question — from statistical evaluation of retrieval systems to schema-constrained LLM output validation to infrastructure-level correctness in LLM orchestration.

Currently pursuing post-baccalaureate CS and Mathematics, preparing for graduate research in ML evaluation and verification.

Selected Projects

Pollux — Async multimodal LLM orchestration library with deterministic content-hash caching, single-flight deduplication, and retry-policy separation for generation vs. side-effect calls. 90% API cost reduction on fan-out workloads. GSoC 2025 with Google DeepMind. Published on PyPI.
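To illustrate the two caching ideas named above, here is a minimal sketch of content-hash caching combined with single-flight deduplication. It is not Pollux's actual API: `SingleFlightCache`, `fake_generate`, and `demo` are hypothetical names, and the sketch assumes an in-memory cache with no eviction, error caching, or cancellation handling.

```python
import asyncio
import hashlib


class SingleFlightCache:
    """Hypothetical sketch: cache by content hash, share in-flight calls."""

    def __init__(self):
        self._results = {}    # content hash -> completed result
        self._in_flight = {}  # content hash -> Task for a call in progress

    @staticmethod
    def key_for(content: bytes) -> str:
        # Deterministic key: identical bytes always map to the same entry.
        return hashlib.sha256(content).hexdigest()

    async def get(self, content: bytes, compute):
        key = self.key_for(content)
        if key in self._results:
            return self._results[key]
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real call.
            task = asyncio.ensure_future(compute(content))
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # Later callers with the same key await the same task.
        result = await task
        self._results[key] = result
        return result


async def demo():
    calls = 0

    async def fake_generate(content: bytes) -> str:
        # Stand-in for a paid API call (assumption for illustration).
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return content.decode().upper()

    cache = SingleFlightCache()
    results = await asyncio.gather(
        *(cache.get(b"same prompt", fake_generate) for _ in range(5))
    )
    return calls, results


calls, results = asyncio.run(demo())
```

In this sketch, five concurrent requests for identical content trigger a single underlying call, which is the mechanism by which fan-out workloads can avoid redundant API spend.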

ContextRAG — RAG evaluation harness computing 7 retrieval metrics with TOST equivalence testing, bootstrap CIs, and Holm-Bonferroni correction. Validated a preregistered null hypothesis across 60+ experiment runs and 3 datasets.
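For readers unfamiliar with the Holm-Bonferroni correction mentioned above, the following is a minimal standard-library sketch of the step-down procedure. It is not taken from ContextRAG; the function name `holm_bonferroni` and the example p-values are illustrative assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-down procedure: sort p-values ascending and compare the k-th
    smallest (0-based rank k) against alpha / (m - k). Stop at the first
    failure; every remaining hypothesis is retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Illustrative p-values (made up for this example, not project results).
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

With these inputs the two smallest p-values (0.005 and 0.01) pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025, so the procedure stops there.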

gh-templates — Schema-constrained LLM extraction pipeline across 3,746 repositories. Pydantic contracts validating structured Gemini output with transient/permanent error taxonomy at 99.97% success rate.

paperweight — arXiv paper discovery and triage CLI with golden-set validation, offline integration testing, and Tenacity retry architecture. Published on PyPI.

Connect

seanbrar.com · LinkedIn · hello@seanbrar.com

Pinned

  1. pollux (Python)

  2. ContextRAG (Python): A RAG evaluation harness for rigorous experimentation.

  3. gh-templates (Python): Data-driven analysis of pull request templates in popular open source repositories.

  4. paperweight (Python): Automated retrieval, filtering, and LLM-powered summarization of arXiv papers based on your research interests.