Commit ed6693a
feat(nightwatch): add dual-panel evaluation for drift detection
Add DualPanelResult struct and dual_panel_evaluate function to enable
two independent quality assessments on agent output. Drift is detected
when panel agreement falls below 0.5.
Panel A: Scores based on ReasoningCertificate quality (premises,
claims, edge cases, confidence)
Panel B: Scores based on output structure (sections, evidence markers,
conclusion markers, minimum length)
Includes comprehensive unit tests covering:
- Both panels agree (no drift)
- Panels disagree (drift detected)
- Missing certificate scenario
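
The diff body is not reproduced on this page, so the following is only a minimal sketch of how the two panels and the drift check described above might fit together. It assumes Rust, which the page does not confirm. `DualPanelResult`, `dual_panel_evaluate`, and `ReasoningCertificate` are named in the commit message, but every field, the helpers `score_certificate` and `score_structure`, the equal weights, the marker words, the 80-character minimum, and the agreement formula `1 - |a - b|` are assumptions made for illustration. Only the 0.5 drift threshold comes from the commit message.

```rust
/// Hypothetical stand-in for the certificate type named in the commit message.
#[derive(Debug, Clone)]
pub struct ReasoningCertificate {
    pub premises: Vec<String>,
    pub claims: Vec<String>,
    pub edge_cases: Vec<String>,
    pub confidence: f64, // assumed to lie in [0, 1]
}

/// Field names are assumptions; only the struct name appears in the commit.
#[derive(Debug, Clone, PartialEq)]
pub struct DualPanelResult {
    pub panel_a_score: f64,
    pub panel_b_score: f64,
    pub agreement: f64,
    pub drift_detected: bool,
}

/// Panel A: certificate quality. Each of the four criteria contributes 0.25.
/// Assumption: a missing certificate scores 0.0.
pub fn score_certificate(cert: Option<&ReasoningCertificate>) -> f64 {
    let c = match cert {
        Some(c) => c,
        None => return 0.0,
    };
    let mut score = 0.0;
    if !c.premises.is_empty() {
        score += 0.25;
    }
    if !c.claims.is_empty() {
        score += 0.25;
    }
    if !c.edge_cases.is_empty() {
        score += 0.25;
    }
    score + 0.25 * c.confidence.clamp(0.0, 1.0)
}

/// Panel B: output structure. Returns the fraction of four checks that pass.
/// The concrete markers and the length cutoff are illustrative guesses.
pub fn score_structure(output: &str) -> f64 {
    let lower = output.to_lowercase();
    let checks = [
        output.lines().any(|l| l.starts_with('#')), // sections
        lower.contains("evidence") || lower.contains("because"), // evidence markers
        lower.contains("conclusion") || lower.contains("therefore"), // conclusion markers
        output.len() >= 80, // minimum length in bytes
    ];
    checks.iter().filter(|&&ok| ok).count() as f64 / checks.len() as f64
}

/// Runs both panels and flags drift when their agreement falls below 0.5.
pub fn dual_panel_evaluate(
    output: &str,
    cert: Option<&ReasoningCertificate>,
) -> DualPanelResult {
    let a = score_certificate(cert);
    let b = score_structure(output);
    // Assumed agreement measure: 1 minus the absolute score difference.
    let agreement = 1.0 - (a - b).abs();
    DualPanelResult {
        panel_a_score: a,
        panel_b_score: b,
        agreement,
        drift_detected: agreement < 0.5,
    }
}
```

Under these assumptions, a well-structured output paired with a complete certificate yields high agreement and no drift, while a complete certificate attached to a bare output such as `"ok"` yields drift. A missing certificate makes Panel A score 0.0, so whether drift is flagged then depends entirely on Panel B. The actual commit may treat that case differently.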
Refs #911
parent ab46bbd
commit ed6693a
2 files changed
Lines changed: 386 additions & 2 deletions
Diff hunk (content not captured): original lines 58-59 were replaced by new lines 58-60; context lines 55-57 and 60-62 (61-63 after the change) are unchanged.