"Battle-test your AI agent's knowledge before you ship. Ship with data, not vibes."
AgentReady is a lightweight, Netlify-deployed harness that stress-tests the knowledge base and agent harness (skills/rules/tools/system prompts) you built around an LLM.
- Upload your agent's knowledge base (docs, runbooks, READMEs, etc.)
- Upload your agent harness (optional) + describe the agent's mission
- Get a readiness score and a prioritized list of blind spots before users find them
- Generates stress-test probes tailored to your mission and domain (incidents, edge cases, adversarial prompts, cross-domain questions)
- Evaluates knowledge + harness coverage by running probes and forcing the model to be explicit about confidence + missing context
- Returns a readiness score + top gaps (what's missing, whether it's a knowledge vs harness issue, why it matters, and how to fix it)
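To make the output concrete, here is a rough picture of what a readiness report could look like. All field names below are illustrative assumptions, not the app's actual schema:

```javascript
// Hypothetical shape of an AgentReady report (illustrative, not the exact schema).
const exampleReport = {
  readinessScore: 72,                                    // 0-100 overall
  categoryScores: { incidents: 80, edgeCases: 65, adversarial: 70 },
  topGaps: [
    {
      probe: "What is the rollback procedure after a failed deploy?",
      type: "knowledge",                                 // "knowledge" vs "harness" issue
      whyItMatters: "The agent will improvise without a documented rollback.",
      suggestedFix: "Add a rollback runbook to the knowledge base.",
    },
  ],
};

// One plausible way the overall score could relate to category scores:
// the rounded mean (an assumption, not necessarily the app's formula).
const scores = Object.values(exampleReport.categoryScores);
const mean = Math.round(scores.reduce((a, b) => a + b, 0) / scores.length);
console.log(mean); // 72
```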
- Netlify Functions (serverless)
- Anthropic Claude API (`@anthropic-ai/sdk`)
- Vanilla JS (no framework)
- Neo-brutalist CSS
```bash
git clone https://github.com/rajnavakoti/agent-ready
cd agent-ready
npm install

# Run Netlify dev server (functions + static site)
ANTHROPIC_API_KEY=your-key npm run dev
```

Then open the local URL printed by Netlify (defaults to http://localhost:8888).
- Create a new Netlify site from this repo
- Set the `ANTHROPIC_API_KEY` environment variable
- Deploy

`netlify.toml` already sets:

```toml
publish = "src"
functions = "netlify/functions"
```
The app is a 3-function pipeline:
- `netlify/functions/generate-probes.mjs`: given mission + knowledge base + harness, generates targeted probes (JSON)
- `netlify/functions/execute-probes.mjs`: runs each probe against the provided context and returns answers + confidence + explicit gaps
- `netlify/functions/analyze-gaps.mjs`: aggregates results into category scores, an overall readiness score, and the top blind spots
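As a sketch of the aggregation step, the kind of logic `analyze-gaps` performs can be written as a pure function: group probe results by category, average confidence into a 0-100 score, and surface the lowest-confidence probes as gaps. This is an illustration of the idea, not the repo's actual code; the input shape (`category`, `confidence`, `probe`) is an assumption:

```javascript
// Illustrative aggregation (not the repo's actual implementation):
// results = [{ category, confidence (0-1), probe }, ...]
function analyzeGaps(results) {
  // Group probe results by category.
  const byCategory = {};
  for (const r of results) {
    (byCategory[r.category] ??= []).push(r);
  }
  // Score each category as the mean confidence, scaled to 0-100.
  const categoryScores = Object.fromEntries(
    Object.entries(byCategory).map(([cat, rs]) => [
      cat,
      Math.round((100 * rs.reduce((s, r) => s + r.confidence, 0)) / rs.length),
    ])
  );
  // Overall readiness: mean of category scores.
  const overall = Math.round(
    Object.values(categoryScores).reduce((a, b) => a + b, 0) /
      Object.keys(categoryScores).length
  );
  // Top gaps: the probes the model was least confident about.
  const topGaps = [...results]
    .sort((a, b) => a.confidence - b.confidence)
    .slice(0, 3);
  return { overall, categoryScores, topGaps };
}
```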
On the frontend, `src/js/app.js` orchestrates the pipeline, calling in order:

- `/.netlify/functions/generate-probes`
- `/.netlify/functions/execute-probes`
- `/.netlify/functions/analyze-gaps`
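The orchestration amounts to chaining three POSTs, feeding each function's output into the next. A minimal sketch (the payload field names and `post` helper are assumptions about `app.js`, not its actual code):

```javascript
// Illustrative pipeline chaining; the endpoint paths match the functions above,
// but payload/response field names are assumptions, not the actual schema.
async function runPipeline(post, { mission, knowledgeBase, harness }) {
  const { probes } = await post("/.netlify/functions/generate-probes", {
    mission, knowledgeBase, harness,
  });
  const { results } = await post("/.netlify/functions/execute-probes", {
    probes, knowledgeBase, harness,
  });
  return post("/.netlify/functions/analyze-gaps", { results, mission });
}

// In the browser, `post` would wrap fetch:
const post = (url, body) =>
  fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  }).then((r) => r.json());
```

Passing `post` in as a parameter keeps the chaining logic testable without a running server.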
AgentReady uses 17+ public datasets across 8 domains to ground scenario simulation, including:
- Customer support
- IT incidents
- Healthcare
- Legal
- Finance / fraud
- E-commerce
- Real estate
- Code / bugs
This project is inspired by DDC-style evaluation methodology: https://arxiv.org/abs/2603.14057
MIT (see LICENSE).