fix: return ERROR not CAUGHT when evaluator cannot perform comparison #36
Open
MdSadiqMd wants to merge 1 commit into
Closes #24
Summary
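- `eval_modes.py` returned `CAUGHT` in cases where the evaluator could not perform the comparison, inflating hallucination metrics and triggering heal-loop retries that can never converge
- Those cases now return `ERROR`, aligning with the evaluator's own docstring: "ERROR — the evaluator could not actually evaluate. This is not evidence of tampering and must not be conflated with CAUGHT"

| Condition | Before | After |
| --- | --- | --- |
| | `CAUGHT` | `ERROR` |
| `verification_mode` | `CAUGHT` | `ERROR` |
| `expected_json_schema` is empty for `schema_type` | `CAUGHT` | `ERROR` |
| `SchemaError`: the schema itself is invalid | `CAUGHT` | `ERROR` |
| `range_min` and `range_max` both `None` | `CAUGHT` | `ERROR` |
| | `CAUGHT` | `ERROR` |

Test plan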
- `uv run pytest tests/unit/` -> 116 passed
- `uv run pytest tests/e2e/` -> 15 passed
- `test_field_extraction_error_when_index_out_of_range` now expects `ERROR`
- `test_schema_type_missing_path_is_error` now expects `ERROR`
- Existing cases (`test_evaluate_handoff_caught_on_mismatch`, `test_final_verify_failure_marks_caught`, scenario B tampered claim) still return `CAUGHT`
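
For reviewers, the distinction this change relies on can be sketched roughly as below. This is a minimal illustration and not the actual `eval_modes.py` code: the `Verdict` enum, the `VERIFIED` outcome, the `"range"` mode name, and the functions `evaluate_range`, `evaluate`, and `should_retry` are all hypothetical names chosen for the example. Only `CAUGHT`, `ERROR`, `verification_mode`, `range_min`, and `range_max` come from this PR.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VERIFIED = "VERIFIED"  # hypothetical name for the passing outcome
    CAUGHT = "CAUGHT"      # a comparison ran and the claim did not match
    ERROR = "ERROR"        # the evaluator could not run a comparison at all


def evaluate_range(value: float,
                   range_min: Optional[float],
                   range_max: Optional[float]) -> Verdict:
    """Hypothetical range check, for illustration only."""
    if range_min is None and range_max is None:
        # No bounds configured, so there is nothing to compare against.
        # That is a configuration problem, not evidence of tampering.
        return Verdict.ERROR
    if range_min is not None and value < range_min:
        return Verdict.CAUGHT
    if range_max is not None and value > range_max:
        return Verdict.CAUGHT
    return Verdict.VERIFIED


def evaluate(verification_mode: str,
             value: float,
             range_min: Optional[float] = None,
             range_max: Optional[float] = None) -> Verdict:
    """Dispatch on the mode; an unrecognized mode cannot be evaluated."""
    if verification_mode == "range":  # assumed mode name
        return evaluate_range(value, range_min, range_max)
    return Verdict.ERROR


def should_retry(verdict: Verdict) -> bool:
    """Assumed heal-loop policy: retry only when a real mismatch was found.

    Retrying on ERROR cannot converge, because the producer cannot fix
    a broken or missing evaluator configuration.
    """
    return verdict is Verdict.CAUGHT
```

Under these assumptions, a heal loop that consults `should_retry` stops immediately on `ERROR`, and a metric that counts only `CAUGHT` is no longer inflated by evaluator misconfiguration.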