Portable, content-addressed reliability evidence for LLM systems. Capture how a model behaves under perturbation; preserve, verify, and diff the evidence across model changes.
python provenance regression-testing ai-safety differential-testing ai-evaluation llm falsification llm-evaluation eu-ai-act reliability-testing model-migration perturbation-testing replayable-evidence evaluation-validity evidence-infrastructure
Updated May 25, 2026 - Python