Make grading summary deterministic and JSON-safe across report/finalize flows #28

Open
Copilot wants to merge 1 commit into main from copilot/update-containerisation-summary

Copilot AI commented Apr 19, 2026

Previously, the summary field was derived from LLM-generated prose, which made the output non-deterministic and inconsistent with strict JSON-first consumers. This change derives the summary entirely from scoring data and enforces a plain-text, JSON-safe summary shape everywhere it is produced.

  • Deterministic summary contract

    • Added build_deterministic_summary(...) in src/mas/tools/summary_builder.py.
    • Summary is now computed from total_score, total_marks, grade, and per-criterion scores (not from model prose).
    • Output format is stable and single-line, with explicit score breakdown.
  • Replaced LLM-paragraph extraction

    • Removed “first prose paragraph” summary extraction logic from:
      • src/mas/agents/report.py
      • src/mas/agents/finalize.py
    • Both agents now call the shared deterministic builder, ensuring identical summary semantics in sync and finalize paths.
  • Behavioral coverage updates

    • Updated report/finalize tests to assert deterministic summary output instead of markdown paragraph heuristics.
    • Added dedicated unit tests in tests/test_summary_builder.py for:
      • full criterion breakdown output
      • empty-criteria fallback behavior
  • Docs alignment

    • Updated README API response example to reflect deterministic summary format.

summary = build_deterministic_summary(
    scored_criteria=[
        {"name": "Definition of Containerisation", "score": 0, "max_score": 5},
        {"name": "Benefits of Containerisation", "score": 3, "max_score": 5},
        {"name": "Role of Kubernetes", "score": 1, "max_score": 5},
        {"name": "Technical Accuracy and Depth", "score": 2, "max_score": 5},
    ],
    total_score=6,
    total_marks=20,
    grade="F",
)
# "Total score: 6/20 (30.00%), grade F. Breakdown: Definition of Containerisation: 0/5; Benefits of Containerisation: 3/5; Role of Kubernetes: 1/5; Technical Accuracy and Depth: 2/5."
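
For readers without the diff open, the contract described above can be approximated with a minimal sketch. This is not the code in src/mas/tools/summary_builder.py: the name build_summary_sketch is hypothetical, and the empty-criteria fallback, the zero-marks guard, and the whitespace collapsing are assumptions chosen only to match the example output shown above.

```python
import json
from typing import Mapping, Sequence


def build_summary_sketch(
    scored_criteria: Sequence[Mapping[str, object]],
    total_score: float,
    total_marks: float,
    grade: str,
) -> str:
    """Build a single-line, plain-text summary from scoring data only.

    Hypothetical stand-in for the PR's builder; no model prose is consulted.
    """
    # Assumption: avoid division by zero by reporting 0.00% when no marks exist.
    percentage = (total_score / total_marks * 100) if total_marks else 0.0
    head = (
        f"Total score: {total_score}/{total_marks} "
        f"({percentage:.2f}%), grade {grade}."
    )

    # Assumption: with no criteria, fall back to the headline sentence alone.
    if not scored_criteria:
        return head

    parts = []
    for criterion in scored_criteria:
        # Collapse internal whitespace so a stray newline in a criterion
        # name cannot break the single-line guarantee (assumed behaviour).
        name = " ".join(str(criterion["name"]).split())
        parts.append(f"{name}: {criterion['score']}/{criterion['max_score']}")

    return f"{head} Breakdown: {'; '.join(parts)}."


if __name__ == "__main__":
    summary = build_summary_sketch(
        scored_criteria=[
            {"name": "Benefits of Containerisation", "score": 3, "max_score": 5},
            {"name": "Role of Kubernetes", "score": 1, "max_score": 5},
        ],
        total_score=4,
        total_marks=10,
        grade="F",
    )
    # The summary is an ordinary string, so it survives a JSON round trip.
    print(json.dumps({"summary": summary}))
```

Because the output depends only on the numeric inputs and criterion names, repeated calls with the same scoring data yield byte-identical summaries, which is the property the report and finalize tests can assert on.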
