
fix(review): mark judge infra failures inconclusive #240

Merged
jacsamell merged 1 commit into main from codex/judge-infra-inconclusive
May 26, 2026


@leobaldock
Contributor

@leobaldock leobaldock commented May 26, 2026

Summary

  • Separate judge availability failures from code review verdicts.
  • Post a COMMENT review with "INCONCLUSIVE: judge infrastructure unavailable" when the available judges approve but one or more expected judges failed or are missing.
  • Keep REQUEST_CHANGES only when an available judge actually rejects the code.
  • Apply matching semantics to branch review display and panel summaries.

Root cause

Cube treated missing judge decision files as a review rejection. That made infra failures look like code failures and turned partial approval panels into noisy REQUEST_CHANGES reviews.
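
The failure mode described above can be illustrated with a minimal sketch. This is not Cube's code: `old_style_verdict`, the `Decisions` mapping, and the judge names are hypothetical, and `None` stands in for a missing judge decision file.

```python
from typing import Dict, Optional

# Hypothetical stand-in: True = judge approved, False = judge rejected,
# None = the judge's decision file is missing (an infrastructure failure).
Decisions = Dict[str, Optional[bool]]


def old_style_verdict(decisions: Decisions) -> str:
    """Pre-fix behaviour as described in the root cause (illustrative only)."""
    # Anything short of an explicit approval counts against the code,
    # so a missing decision file is indistinguishable from a rejection.
    if all(approved is True for approved in decisions.values()):
        return "APPROVE"
    return "REQUEST_CHANGES"


# Two judges approved; the third never produced a decision file.
panel: Decisions = {"judge_1": True, "judge_2": True, "judge_3": None}
print(old_style_verdict(panel))  # REQUEST_CHANGES, although no judge rejected
```

In this sketch the partial approval panel ends in REQUEST_CHANGES even though no judge rejected the code, which is the noise this PR removes.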

Validation

  • pytest -q tests -> 512 passed
  • pytest -q tests/cli tests/core tests/automation -> 511 passed
  • pytest -q tests/cli/test_auto_approve_gate.py tests/cli/test_panel_summary.py tests/cli/test_peer_review_branch_summary.py -> 20 passed
  • pytest -q tests/core/test_judge_panel_skip_approved.py tests/core/test_session_reset_on_stale.py tests/cli/test_auto_approve_gate.py tests/cli/test_panel_summary.py tests/cli/test_peer_review_branch_summary.py -> 61 passed
  • ruff check on touched files -> no issues
  • mypy on touched runtime files -> no issues
  • python -m compileall -q python/cube -> passed
  • git diff --check -> passed

Overview

Distinguishes infrastructure failures from code review rejections by treating missing/failed judge decisions as availability issues rather than code rejections. When expected judges are unavailable but available judges approve, the system posts an INCONCLUSIVE comment instead of REQUEST_CHANGES.

Key Changes

Judge Infrastructure Handling

  • Added "unavailable" status to JudgeRunStatus type
  • Treats failed, missing, and unavailable states uniformly in status rendering
  • Panel summary now displays these as UNAVAILABLE (FAILED), UNAVAILABLE (MISSING), etc.
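
A rough sketch of how the status type and labels could fit together. Only the "failed", "missing", and "unavailable" values and the UNAVAILABLE (FAILED) / UNAVAILABLE (MISSING) labels come from this PR description; the "completed" member, the `UNAVAILABLE_STATUSES` set, the `status_label` helper, and the bare UNAVAILABLE label are assumptions made for illustration.

```python
from typing import Literal, get_args

# Only "failed", "missing", and "unavailable" come from the PR description;
# "completed" is a placeholder for whatever other states the real type has.
JudgeRunStatus = Literal["completed", "failed", "missing", "unavailable"]

# States that describe judge availability rather than a verdict on the code.
UNAVAILABLE_STATUSES = frozenset({"failed", "missing", "unavailable"})


def status_label(status: JudgeRunStatus) -> str:
    """Render a panel-summary label for a status (hypothetical helper)."""
    if status not in UNAVAILABLE_STATUSES:
        return status.upper()
    if status == "unavailable":
        # Assumption: the bare state carries no extra qualifier.
        return "UNAVAILABLE"
    return f"UNAVAILABLE ({status.upper()})"


for status in get_args(JudgeRunStatus):
    print(f"{status:<12} -> {status_label(status)}")
```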

Review Verdict Logic

  • New helper _available_review_rejected() distinguishes actual code rejections from infrastructure failures
  • Auto-approve gate now returns INCONCLUSIVE (COMMENT) when expected judges are missing but no available judge has rejected
  • REQUEST_CHANGES is only returned when an available judge explicitly rejects the code
  • Missing judge details are included in inconclusive comments
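
The verdict rules listed above can be sketched as follows. The helper name `_available_review_rejected` comes from this PR, but its signature and body here are assumptions, as are `gate_verdict`, the `Decisions` mapping, and the message wording other than the INCONCLUSIVE prefix.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data shape: True = approved, False = rejected,
# None = no usable decision (judge failed, missing, or unavailable).
Decisions = Dict[str, Optional[bool]]


def _available_review_rejected(decisions: Decisions) -> bool:
    """True only if a judge that produced a decision rejected the code.

    The name comes from the PR; this signature and body are assumptions.
    """
    return any(approved is False for approved in decisions.values())


def gate_verdict(expected: List[str], decisions: Decisions) -> Tuple[str, str]:
    """Return (review_event, message) for the gate (illustrative sketch)."""
    # Expected judges with no usable decision are an availability problem.
    missing = [judge for judge in expected if decisions.get(judge) is None]

    # A real rejection always wins, even when other judges are unavailable.
    if _available_review_rejected(decisions):
        message = "Changes requested by an available judge."
        if missing:
            message += f" Warning: unavailable judges: {', '.join(missing)}."
        return "REQUEST_CHANGES", message

    # No rejection, but the panel is incomplete: report infra, not code.
    if missing:
        return (
            "COMMENT",
            "INCONCLUSIVE: judge infrastructure unavailable "
            f"(missing: {', '.join(missing)})",
        )

    return "APPROVE", "All expected judges approved."


expected = ["judge_1", "judge_2", "judge_3"]
print(gate_verdict(expected, {"judge_1": True, "judge_2": True})[0])
print(gate_verdict(expected, {"judge_1": True, "judge_2": False})[0])
```

Checking for a rejection before checking for missing judges is what keeps REQUEST_CHANGES tied to an actual verdict on the code; the availability warning is then attached to that verdict rather than replacing it.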

Branch Review Alignment

  • Refactored branch review decision logic into _branch_review_decision_and_summary() helper
  • Applies consistent inconclusive semantics to branch and PR review paths
  • Missing judge information now rendered consistently across both flows

Panel Summary Display

  • New _is_unavailable_result() helper centralises unavailable state detection
  • Unavailable rows display error/log information when available
  • Recovery commands shown only for unavailable rows that have recovery options
  • Gate status now includes "inconclusive" branch for availability issues

Testing

  • Updated 3 auto-approve gate tests to expect INCONCLUSIVE behaviour
  • Updated 2 panel summary tests for new unavailable rendering
  • Added 2 new branch review tests for inconclusive/rejection scenarios
  • All 512 pytest tests passing; mypy and ruff checks clean


@coderabbitai

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 32cd3dd3-2b82-495f-833d-c2817472040e

📥 Commits

Reviewing files that changed from the base of the PR and between 313928e and 1fa736c.

📒 Files selected for processing (7)
  • python/cube/automation/judge_panel.py
  • python/cube/commands/auto_approve.py
  • python/cube/commands/peer_review.py
  • python/cube/models/types.py
  • tests/cli/test_auto_approve_gate.py
  • tests/cli/test_panel_summary.py
  • tests/cli/test_peer_review_branch_summary.py
📜 Recent review details
🔇 Additional comments (7)
python/cube/models/types.py (1)

57-57: LGTM!

python/cube/automation/judge_panel.py (1)

631-635: LGTM!

Also applies to: 644-646, 665-666, 698-700, 707-709, 720-725

python/cube/commands/auto_approve.py (1)

30-40: LGTM!

Also applies to: 191-223, 228-233

python/cube/commands/peer_review.py (1)

511-547: LGTM!

Also applies to: 799-799, 812-814, 821-825, 965-976, 1001-1004, 1211-1220

tests/cli/test_auto_approve_gate.py (1)

63-94: LGTM!

Also applies to: 118-123, 146-147, 257-283

tests/cli/test_panel_summary.py (1)

140-143: LGTM!

Also applies to: 148-159, 189-192, 197-197, 281-281

tests/cli/test_peer_review_branch_summary.py (1)

1-49: LGTM!


Walkthrough

This PR treats the "failed", "missing", and "unavailable" judge statuses uniformly as unavailable states throughout the judge panel rendering and decision logic. A new "unavailable" value is added to the JudgeRunStatus type. The judge panel now labels failed and missing judges UNAVAILABLE (FAILED) and UNAVAILABLE (MISSING) respectively, and a centralised helper classifies unavailable rows. The gate block documentation is expanded to include an "inconclusive" status. When expected judges are missing, the auto-approve gate and peer review logic now distinguish infrastructure unavailability (missing judges) from actual code verdicts (available judges rejecting): they return an INCONCLUSIVE COMMENT only when no available judge has rejected, and otherwise return REQUEST_CHANGES with an availability warning.

Possibly related PRs

  • aetheronhq/agent-cube#196: Updates peer review logic for missing judge detection and verdict derivation based on expected panel size, with corresponding downstream decision text modifications and helper refactoring.

🚥 Pre-merge checks | ✅ 3 passed

  • Title check: ✅ Passed. The title directly addresses the main objective: distinguishing judge infrastructure failures from code review verdicts and marking them as inconclusive.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.




@leobaldock leobaldock marked this pull request as ready for review May 26, 2026 14:10
@jacsamell jacsamell merged commit f09aaf0 into main May 26, 2026
2 checks passed
2 participants