Code repository for AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
security benchmarking benchmark research ai evaluations hacking artificial-intelligence cybersecurity ctf agents offensive-security ai-agents benchmark-datasets llm cyber-evals
Updated Apr 19, 2026 - Jupyter Notebook