Summary
When _get_by_json_path (or whatever resolves the claim's json_path against the indexed payload) cannot walk the path, the per-claim verdict comes back as result: "CAUGHT" with the parser error in detail, e.g.:
detail: json_path: 'expected dict at segment "['value'][0]['subject']", got list'
A consumer can't tell that apart from a real value mismatch — both look like result: "CAUGHT" to anyone reading per_claim. That has two downstream costs:
- KPI / metrics: counting "hallucinations caught" inflates the number with false positives every time the LLM picks a path the parser can't handle. Demos and customer dashboards over-report drift.
- Heal-loop classifiers: substitute / reprompt / fail decisions are made on result == "CAUGHT". A verdict that really means "the evaluator could not walk this path" triggers the wrong tier, causes retries that can never converge, and adds noise to traces.
Proposal
Emit result: "ERROR" (or a new "EVALUATOR_ERROR", whichever fits the existing taxonomy) for path-walk failures, with the same detail so debug info is preserved. The two cases that consumers want to disambiguate:
- CAUGHT — path resolved, claimed value disagrees with indexed value (real drift)
- ERROR — path could not be resolved (evaluator limitation, missing field, malformed path, etc.)
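A minimal sketch of the proposed split, not the SDK's actual code. PathWalkError, walk and evaluate_claim are hypothetical names, the path is assumed to be already parsed into a list of keys and indexes, and PASS is assumed to be the verdict for a matching value:

```python
from typing import Any


class PathWalkError(Exception):
    """Hypothetical error: the path could not be resolved against the payload."""


def walk(payload: Any, segments: list) -> Any:
    """Follow pre-parsed keys/indexes; raise PathWalkError on any failure."""
    current = payload
    for seg in segments:
        try:
            current = current[seg]
        except (KeyError, IndexError, TypeError) as exc:
            raise PathWalkError(f"cannot resolve segment {seg!r}: {exc!r}") from exc
    return current


def evaluate_claim(payload: Any, segments: list, claimed: Any) -> dict:
    """Return a per-claim verdict that separates walk failures from real drift."""
    try:
        actual = walk(payload, segments)
    except PathWalkError as exc:
        # Evaluator limitation or bad path: not evidence of a hallucination.
        return {"result": "ERROR", "detail": str(exc)}
    if actual == claimed:
        return {"result": "PASS", "detail": ""}
    # Path resolved and the values disagree: real drift.
    return {"result": "CAUGHT", "detail": f"claimed {claimed!r}, indexed {actual!r}"}
```

The detail string is kept in the ERROR case, so no debug information is lost; only the result label changes.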
Reproduction
Reproduce with any tool whose response is a JSON array at the root, when the LLM emits Python-bracket-key paths (['value'][0]['subject']) or bracketed numeric indexes ([0].subject). The SDK's parser bails before reaching the leaf, and the verdict currently surfaces as CAUGHT instead of ERROR.
Workaround on the consumer side
We're patching this locally in customer-support-sdk-demo (evaluate_node.py — _patch_array_path_verdicts): if the SDK's verdict has an "expected … got list/dict" detail, we re-walk with a more permissive parser, mark verdicts as PASS when our local walk verifies the claim, and as ERROR when it can't. That belongs in the SDK so every consumer doesn't reinvent it.
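For reference, the shape of such a consumer-side patch can be sketched roughly as follows. This is an illustration under assumptions, not the code in evaluate_node.py: parse_path and patch_verdict are hypothetical names, the accepted path syntax is a guess based on the examples in this report, and keeping CAUGHT when the local walk resolves but the values disagree is our assumption.

```python
import re
from typing import Any

# One path segment: ['key'], ["key"], [0], or a bare/dotted key such as .subject
_SEGMENT = re.compile(r"""\[\s*(?:'([^']*)'|"([^"]*)"|(-?\d+))\s*\]|\.?([A-Za-z_]\w*)""")
# Detail text that signals a path-walk failure rather than a value mismatch.
_WALK_FAILURE = re.compile(r"expected .*got (?:list|dict)")


def parse_path(path: str) -> list:
    """Permissively split a path into string keys and integer indexes."""
    path = path.strip()
    if path.startswith("$"):
        path = path[1:]  # tolerate a JSONPath-style root marker
    segments, pos = [], 0
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unparseable path at offset {pos}: {path!r}")
        single, double, index, bare = m.groups()
        if index is not None:
            segments.append(int(index))
        else:
            segments.append(next(g for g in (single, double, bare) if g is not None))
        pos = m.end()
    return segments


def patch_verdict(verdict: dict, payload: Any, claimed: Any, path: str) -> dict:
    """Re-classify a CAUGHT verdict whose detail is really a walk failure."""
    detail = verdict.get("detail", "")
    if verdict.get("result") != "CAUGHT" or not _WALK_FAILURE.search(detail):
        return verdict  # not a walk failure: leave untouched
    try:
        actual = payload
        for seg in parse_path(path):
            actual = actual[seg]
    except (ValueError, KeyError, IndexError, TypeError):
        return {**verdict, "result": "ERROR"}  # local walk also failed
    if actual == claimed:
        return {**verdict, "result": "PASS"}  # claim verified locally
    return verdict  # path resolved, values differ: genuine CAUGHT (assumption)
```

Matching on the detail text is brittle, which is part of why the classification is better done inside the SDK, where the walk failure is known directly.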
Related
Tracks alongside the existing array-indexing limitation in _get_by_json_path (which the consumer-side workaround was originally created to bridge). Resolving this report-classification issue is independent of fixing the underlying parser — even a path the SDK genuinely can't walk would be more honestly classified as ERROR than CAUGHT.