If it's a CLI itself, it cannot be called from within `benchtool init`.
Ideally, but less important.
Instead of:
# Design
"The tested capability, characteristic, or concept is defined":
- "Tested concept, capability, or characteristic not explicitly mentioned."
- "Tested concept explicitly mentioned and need for definition acknowledged, but definition not provided."
- "Tested concept, capability, or characteristic explicitly mentioned but not defined."
- "Tested concept, capability, or characteristic explicitly mentioned and defined."
It would be better to have more structure:
- category_name: Design
  criterion_text: "The tested capability, characteristic, or concept is defined"
  criterion_id: definition
  rubric:
    - na: ""
    - 0: "Tested concept, capability, or characteristic not explicitly mentioned."
    - 5: "Tested concept explicitly mentioned and need for definition acknowledged, but definition not provided."
    - 10: "Tested concept, capability, or characteristic explicitly mentioned but not defined."
    - 15: "Tested concept, capability, or characteristic explicitly mentioned and defined."
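A minimal sketch of what this structure buys on the code side. It assumes the YAML is loaded with a loader such as PyYAML's `yaml.safe_load`, which turns the unquoted `0`/`5`/`10`/`15` keys into integers and leaves `na` as a string. The `Criterion` class and `load_criterion` helper are hypothetical names for illustration, not existing benchtool code.

```python
from dataclasses import dataclass
from typing import Dict, Union

# A rubric level is either the string "na" or an integer score.
Level = Union[str, int]


@dataclass
class Criterion:
    """Hypothetical typed view of one criterion entry from the criteria YAML."""
    category_name: str
    criterion_id: str
    criterion_text: str
    rubric: Dict[Level, str]

    @property
    def max_score(self) -> int:
        """Highest numeric rubric level ("na" is not a score)."""
        return max(level for level in self.rubric if isinstance(level, int))


def load_criterion(raw: dict) -> Criterion:
    """Build a Criterion from the dict a YAML loader would return.

    The rubric is stored as a list of single-key mappings, so merge
    them into one dict keyed by level.
    """
    rubric: Dict[Level, str] = {}
    for item in raw["rubric"]:
        [(level, description)] = item.items()  # ValueError if an item has != 1 key
        rubric[level] = description
    return Criterion(
        category_name=raw["category_name"],
        criterion_id=raw["criterion_id"],
        criterion_text=raw["criterion_text"],
        rubric=rubric,
    )


# The dict yaml.safe_load would produce for the entry above.
RAW = {
    "category_name": "Design",
    "criterion_text": "The tested capability, characteristic, or concept is defined",
    "criterion_id": "definition",
    "rubric": [
        {"na": ""},
        {0: "Tested concept, capability, or characteristic not explicitly mentioned."},
        {5: "Tested concept explicitly mentioned and need for definition acknowledged, but definition not provided."},
        {10: "Tested concept, capability, or characteristic explicitly mentioned but not defined."},
        {15: "Tested concept, capability, or characteristic explicitly mentioned and defined."},
    ],
}

criterion = load_criterion(RAW)
print(criterion.max_score)  # 15
print(criterion.rubric[15])  # look up a level's text directly by score
```

With the rubric keyed by score, lookups and the maximum score fall out of the data instead of being recovered by matching free text.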
And then the per-benchmark entry would look like:
- criterion_id: definition
  response:
  justification:
  score:
  skipped:
  category_name: design
This will also make the code much easier to read.
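As a rough illustration, here is a sketch of how a per-benchmark entry could be resolved against the criteria by `criterion_id`. The `rubric_description` helper, the example field values, and the rule that a skipped entry carries no score are all assumptions made for this example, not part of the proposal.

```python
from typing import Dict, Optional, Union

# A rubric level is either the string "na" or an integer score.
Level = Union[str, int]

# Hypothetical criteria index keyed by criterion_id, as loaded from the criteria YAML.
CRITERIA: Dict[str, dict] = {
    "definition": {
        "category_name": "Design",
        "rubric": {
            "na": "",
            0: "Tested concept, capability, or characteristic not explicitly mentioned.",
            5: "Tested concept explicitly mentioned and need for definition acknowledged, but definition not provided.",
            10: "Tested concept, capability, or characteristic explicitly mentioned but not defined.",
            15: "Tested concept, capability, or characteristic explicitly mentioned and defined.",
        },
    },
}


def rubric_description(entry: dict, criteria: Dict[str, dict]) -> Optional[str]:
    """Return the rubric text matching a per-benchmark entry's score.

    Assumption: skipped entries have no meaningful score, so return None.
    Raises KeyError for an unknown criterion_id (e.g. a misspelled id)
    and ValueError for a score that is not one of the rubric levels.
    """
    criterion = criteria[entry["criterion_id"]]
    if entry.get("skipped"):
        return None
    score: Level = entry["score"]
    if score not in criterion["rubric"]:
        raise ValueError(f"score {score!r} is not a rubric level for {entry['criterion_id']!r}")
    return criterion["rubric"][score]


# Made-up example entry; "..." marks placeholder text.
entry = {
    "criterion_id": "definition",
    "response": "...",
    "justification": "...",
    "score": 10,
    "skipped": False,
    "category_name": "design",
}
print(rubric_description(entry, CRITERIA))
```

Because entries reference criteria by id, a typo in `criterion_id` or an out-of-range score fails loudly instead of silently producing a wrong report.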