An End-to-End Evaluation Framework for Entity Resolution Systems
Updated Dec 3, 2023 - Python
Model Context Protocol Benchmark Runner
A metrics library for evaluating vision-language models within the PyTorch ecosystem.
✍️ Collaborate on writing technical content for the Giskard Community
An open-source Streamlit web app to generate beautiful confusion matrices for multi-class machine learning models. Supports numeric and string labels, CSV upload, manual label entry, custom color maps, and displays evaluation metrics like Accuracy, Precision, Recall, and F1-score. Users can download the confusion matrix as an image.
Safety-first legal NLP system with hierarchical long-document processing, deterministic inference, clause extraction, and rule-based risk engine — built for traceability and deployment constraints.
Enterprise-grade machine learning observability platform that detects data drift, concept drift, and performance degradation in production models. Features statistical drift detection (KS test, PSI), real-time alerting, Redis caching, and FastAPI backend.
Data Science Challenge from a Coursera project: Loan Default Prediction
This project contains code and written work for the course CSI5155 at the University of Ottawa (taught by Professor Dr. Herna Viktor).
GitHub Action for SWE-bench Pro evaluation powered by mcpbr
Small, educational project that shows how to build a **minimal RAG pipeline** with a **simple evaluation loop**
Collection of Machine Learning (ML) and Natural Language Processing (NLP) projects showcasing a range of applications, algorithms, and techniques.
End-to-end E-Commerce Recommendation System using implicit feedback, featuring Popularity, Item-Item CF, ALS (Matrix Factorization), and a Hybrid model, with offline evaluation and online serving via FastAPI + Streamlit.
Evaluation of system-level risks in content moderation models using policy-driven metrics, identity-based analysis, and governance-aligned datasets.
A MATLAB-based machine learning project that implements a Naive Bayes spam email classifier using the UCI Spambase dataset. Includes feature selection, model tuning, performance evaluation, and deployment-ready model export.
📊 Generate and visualize confusion matrices for multi-class models with ease, including metrics and custom formats in an open-source Streamlit app.
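Several of the projects listed above report Accuracy, Precision, Recall, and F1-score from a multi-class confusion matrix. As a rough illustration of how those numbers relate to the matrix, here is a minimal pure-Python sketch. It is not taken from any of the listed repositories: the function names `confusion_matrix` and `summarize` are hypothetical, and macro averaging (an unweighted mean over classes) is an assumption, since the descriptions do not say which averaging the apps use.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=None):
    """Build a confusion matrix: rows are true labels, columns are predictions.

    Hypothetical helper for illustration; works with numeric or string labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Sort by string form so mixed numeric/string labels still order.
        labels = sorted(set(y_true) | set(y_pred), key=str)
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def summarize(matrix):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n if n else 0.0,
        "recall": sum(recalls) / n if n else 0.0,
        "f1": sum(f1s) / n if n else 0.0,
    }


if __name__ == "__main__":
    y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
    y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
    labels, matrix = confusion_matrix(y_true, y_pred)
    print(labels)
    for row in matrix:
        print(row)
    print(summarize(matrix))
```

Classes with no predictions or no true samples get a score of 0.0 here rather than raising an error; a real tool might instead warn or exclude such classes, so treat that choice as one option among several.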