
[Cogitations] Bridge 27 eval JSON files to automated pytest assertions #3

@zircote


Quality Gate Reference

  • Checklist Item(s): TDD-001 (Test Coverage Baseline), TDD-003 (Test Pyramid Balance)
  • Current Score: TDD domain 11/100 — Tier 1 blocker
  • Impact: High — leverages existing eval investment
  • Effort: High

Description

The project has 27 JSON eval files across evals/ and skills/*/evals/ that define structured behavioral expectations (prompts, expected tool calls, assertions). These are currently declarative specifications evaluated manually or by AI — not connected to any automated test runner.
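The issue does not reproduce the eval schema, but based on the fields named above (prompts, expected tool calls, assertions), a representative entry might look like this (field names and values are hypothetical):

```json
[
  {
    "name": "refactor-extracts-function",
    "prompt": "Refactor the duplicated parsing logic in parser.py",
    "expectations": [
      {"type": "skill_trigger", "skill": "refactor"},
      {"type": "tool_call", "tool": "edit_file"}
    ]
  }
]
```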

Implementation Plan

  1. Write a pytest plugin or parametrized test loader that reads eval JSON files
  2. For each eval entry, extract the expectations or assertions arrays
  3. Map deterministic expectations to pytest assertions:
    • Skill trigger detection → verify correct skill matches prompt
    • Expected tool calls → validate against known tool schemas
    • Expected state changes → check output structure
  4. Use @pytest.mark.parametrize with ids from eval name fields
  5. Non-deterministic expectations (LLM output quality) → mark as pytest.mark.skip with reason, or use similarity thresholds
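The steps above can be sketched as a single test module. This is a minimal sketch, not the final implementation: it assumes each eval file is a JSON array of cases with `name` and `expectations` keys (the real schema in `evals/` may differ), and uses a `pytest_generate_tests` hook to parametrize with ids taken from the `name` fields (steps 1 and 4):

```python
import json
from pathlib import Path

import pytest

# Globs from the issue: top-level evals/ plus per-skill skills/*/evals/.
EVAL_GLOBS = ("evals/*.json", "skills/*/evals/*.json")


def load_eval_cases(root):
    """Return (case_id, case) pairs for every entry in each eval JSON file.

    Assumes each file holds a JSON array of cases with "name" and
    "expectations" keys -- an assumption, since the real schema may differ.
    """
    pairs = []
    root = Path(root)
    for pattern in EVAL_GLOBS:
        for path in sorted(root.glob(pattern)):
            for case in json.loads(path.read_text()):
                pairs.append((f"{path.stem}::{case['name']}", case))
    return pairs


def pytest_generate_tests(metafunc):
    # Parametrize any test requesting "eval_case", using eval "name"
    # fields as test ids so failures point at the originating eval entry.
    if "eval_case" in metafunc.fixturenames:
        pairs = load_eval_cases(Path(__file__).parent)
        metafunc.parametrize(
            "eval_case",
            [case for _, case in pairs],
            ids=[case_id for case_id, _ in pairs],
        )


def test_eval_case(eval_case):
    # Deterministic expectations become plain assertions; free-form LLM
    # quality checks are skipped rather than producing flaky failures
    # (step 5 of the plan). The "llm_quality" type name is hypothetical.
    for exp in eval_case.get("expectations", []):
        if exp.get("type") == "llm_quality":
            pytest.skip("non-deterministic expectation")
        assert exp["type"] in {"skill_trigger", "tool_call", "state_change"}
```

A dedicated plugin could replace the module-level hook once the schema is pinned down; the hook approach keeps the bridge in a single file with no packaging changes.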

Acceptance Criteria

  • TDD-001 score improves (TDD domain currently 11/100)
  • TDD-003 (test pyramid) improves from 0.05
  • Eval failures detected in CI before merge
  • No regression in other domain scores

References

  • Promptfoo — eval framework patterns
  • DeepEval — pytest-integrated eval framework
  • Existing eval files: evals/test-architect-evals.json, skills/refactor/evals/refactor-evals.json

Generated by Cogitations /cog-discover
