Quality Gate Reference
- Checklist Item(s): TDD-001 (Test Coverage Baseline), TDD-003 (Test Pyramid Balance)
- Current Score: TDD domain 11/100 — Tier 1 blocker
- Impact: High — leverages existing eval investment
- Effort: High
Description
The project has 27 JSON eval files across `evals/` and `skills/*/evals/` that define structured behavioral expectations (prompts, expected tool calls, assertions). These are currently declarative specifications evaluated manually or by AI, not connected to any automated test runner.
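For orientation, a minimal eval entry might look like the following. The key names used here (`evals`, `name`, `prompt`, `expectations`, `type`) are assumptions for illustration; consult the actual files for the real schema:

```json
{
  "evals": [
    {
      "name": "refactor-triggers-on-rename-request",
      "prompt": "Rename this function and update its callers",
      "expectations": [
        { "type": "skill_trigger", "skill": "refactor" },
        { "type": "tool_call", "tool": "edit_file" }
      ]
    }
  ]
}
```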
Implementation Plan
- Write a pytest plugin or parametrized test loader that reads eval JSON files
- For each eval entry, extract the `expectations` or `assertions` arrays
- Map deterministic expectations to pytest assertions:
  - Skill trigger detection → verify the correct skill matches the prompt
  - Expected tool calls → validate against known tool schemas
  - Expected state changes → check output structure
- Use `@pytest.mark.parametrize` with ids from eval `name` fields
- Non-deterministic expectations (LLM output quality) → mark as `pytest.mark.skip` with a reason, or use similarity thresholds
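The plan above can be sketched as a single parametrized test module. The eval schema assumed here (a top-level `evals` list with `name`, `prompt`, and `expectations`/`assertions` entries, each check carrying `type`, `tool`, or `skill` keys) and the `KNOWN_TOOL_SCHEMAS` registry are hypothetical; adapt the key names and the per-check assertions to the project's actual format:

```python
# Sketch of a parametrized eval loader; key names are assumed, not verified.
import json
from pathlib import Path

import pytest

EVAL_GLOBS = ["evals/*.json", "skills/*/evals/*.json"]

# Hypothetical registry of known tool schemas to validate expected calls against.
KNOWN_TOOL_SCHEMAS = {"search", "edit_file", "run_tests"}


def load_eval_cases(root="."):
    """Collect (test_id, entry, checks) triples from every eval JSON file."""
    cases = []
    for pattern in EVAL_GLOBS:
        for path in sorted(Path(root).glob(pattern)):
            data = json.loads(path.read_text())
            for entry in data.get("evals", []):
                # Some files may use "expectations", others "assertions".
                checks = entry.get("expectations") or entry.get("assertions") or []
                test_id = f"{path.stem}::{entry.get('name', 'unnamed')}"
                cases.append((test_id, entry, checks))
    return cases


def is_deterministic(check):
    """Only LLM output-quality checks are treated as non-deterministic here."""
    return check.get("type") != "llm_quality"


_CASES = load_eval_cases()


@pytest.mark.parametrize(
    "entry,checks",
    [(entry, checks) for _, entry, checks in _CASES],
    ids=[test_id for test_id, _, _ in _CASES],
)
def test_eval(entry, checks):
    for check in checks:
        if not is_deterministic(check):
            pytest.skip("non-deterministic expectation; needs similarity scoring")
        kind = check.get("type")
        if kind == "tool_call":
            # Expected tool calls must reference a known tool schema.
            assert check["tool"] in KNOWN_TOOL_SCHEMAS, check
        elif kind == "skill_trigger":
            # Placeholder matcher: real code would run skill trigger detection.
            assert check["skill"] in entry.get("prompt", ""), check
```

Keeping the loader as a plain function (rather than a pytest plugin) means the test ids show up as `file::eval-name` in `pytest -v` output with no extra configuration.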
Benefits
Acceptance Criteria
References
- Promptfoo — eval framework patterns
- DeepEval — pytest-integrated eval framework
- Existing eval files: `evals/test-architect-evals.json`, `skills/refactor/evals/refactor-evals.json`
Generated by Cogitations /cog-discover