feat(cli): add refusal detection and anti-refusal prompt hardening #16

Merged
markm39 merged 1 commit into master from feat/anti-refusal
Mar 30, 2026

markm39 commented Mar 30, 2026

Summary

Prevents the LLM from refusing to attempt problems it recognizes as hard (FrontierMath, IMO, other competitions).

Prompt hardening:

  • System prompt: "MANDATORY: You MUST attempt every proof. NEVER refuse."
  • API instructions: "NEVER refuse to attempt a proof"
  • Branch agent prompts (Planner, Prover): explicit anti-refusal language
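
As a rough illustration of the system-prompt bullet, the directive could be prepended to an existing prompt as below. This is only a sketch: the constant name `ANTI_REFUSAL_DIRECTIVE` and the function `harden_system_prompt` are hypothetical and do not come from the PR; only the directive text itself is quoted from the description above.

```rust
/// Directive text quoted from the PR description.
/// The constant name is hypothetical.
const ANTI_REFUSAL_DIRECTIVE: &str =
    "MANDATORY: You MUST attempt every proof. NEVER refuse.";

/// Hypothetical helper: returns `base` with the directive prepended.
/// If the directive is already present, the prompt is returned unchanged,
/// so applying the helper twice does not duplicate the text.
pub fn harden_system_prompt(base: &str) -> String {
    if base.contains(ANTI_REFUSAL_DIRECTIVE) {
        return base.to_string();
    }
    format!("{}\n\n{}", ANTI_REFUSAL_DIRECTIVE, base)
}
```

Making the helper idempotent is a design choice of this sketch, convenient when the same prompt passes through both the main loop and the branch agents.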

Refusal detection + forced retry:

  • New refusal.rs module with two-tier pattern matching:
    • Strong signals (always trigger): "open problem", "FrontierMath", "cannot be proven", etc.
    • Weak signals (trigger only when the response contains no tool calls and no Lean code): "I apologize", "too difficult"
  • When a refusal is detected, injects a system message forcing the model to write code
  • Max 2 retries before accepting the refusal (the injected message escalates from constructive to forceful)
  • Applied at both main agentic loop and branch loop break points
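
The two-tier matching described above might look roughly like the following. This is a minimal sketch, not the contents of refusal.rs: the names `STRONG_PATTERNS`, `WEAK_PATTERNS`, `RefusalSignal`, and `detect_refusal` are assumptions, the pattern lists contain only the phrases quoted in this description (the real lists are longer, per the "etc."), and whether the response contains Lean code is passed in as a flag rather than detected here.

```rust
/// Hypothetical pattern lists, stored lowercase for case-insensitive matching.
const STRONG_PATTERNS: &[&str] = &["open problem", "frontiermath", "cannot be proven"];
const WEAK_PATTERNS: &[&str] = &["i apologize", "too difficult"];

/// Which tier of refusal signal matched.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefusalSignal {
    Strong,
    Weak,
}

/// Returns the refusal tier found in `text`, or `None` for a normal response.
/// Strong patterns always trigger; weak patterns trigger only when the
/// response made no tool calls and contains no Lean code.
pub fn detect_refusal(
    text: &str,
    made_tool_calls: bool,
    has_lean_code: bool,
) -> Option<RefusalSignal> {
    let lower = text.to_lowercase();
    if STRONG_PATTERNS.iter().any(|p| lower.contains(*p)) {
        return Some(RefusalSignal::Strong);
    }
    let did_no_work = !made_tool_calls && !has_lean_code;
    if did_no_work && WEAK_PATTERNS.iter().any(|p| lower.contains(*p)) {
        return Some(RefusalSignal::Weak);
    }
    None
}
```

The weak tier exists because phrases such as "I apologize" also appear in productive responses (for example, when the model corrects an earlier mistake), so they count only when the model did no actual work.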

Test plan

  • cargo clippy --workspace -- -D warnings
  • cargo test --workspace (32 tests pass, including 7 new refusal tests)
  • Unit tests: strong refusal, weak refusal, tools override, lean code override, normal text

When the LLM recognizes hard problems (FrontierMath, IMO, competitions),
it refuses to attempt them. This adds two defenses:

1. System prompt hardening: explicit instructions to never refuse, never
   discuss problem difficulty/sources, always write Lean code.

2. Refusal detection with forced retry: when the model produces no tool
   calls and its text matches refusal patterns ("open problem", "cannot
   solve", "FrontierMath", etc.), inject a system message forcing it to
   attempt the proof. Max 2 retries before accepting the refusal.
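
The forced-retry behavior in item 2 can be sketched as a small loop. Everything named here is hypothetical (`MAX_REFUSAL_RETRIES`, `RefusalAction`, `next_action`, `run_with_retries`), the injected message wording is invented for illustration, and the model is stood in for by a closure that returns the reply text together with a flag saying whether it was classified as a refusal.

```rust
/// Cap taken from the description: at most two forced retries.
const MAX_REFUSAL_RETRIES: u32 = 2;

/// What to do after a refusal has been detected.
#[derive(Debug, PartialEq, Eq)]
pub enum RefusalAction {
    /// Inject this system message and run the model again.
    Retry(String),
    /// Retry budget exhausted: accept the refusal.
    Accept,
}

/// Picks the next action; the message escalates from constructive to forceful.
/// The message wording is a placeholder, not the text used in the PR.
pub fn next_action(retries_so_far: u32) -> RefusalAction {
    match retries_so_far {
        0 => RefusalAction::Retry(
            "Please attempt the proof: write Lean code and call a tool.".to_string(),
        ),
        n if n < MAX_REFUSAL_RETRIES => RefusalAction::Retry(
            "You must respond with Lean code now. Do not discuss difficulty.".to_string(),
        ),
        _ => RefusalAction::Accept,
    }
}

/// Drives `model` until it stops refusing or the retry budget runs out.
/// `model` receives the injected system message (if any) and returns
/// `(reply, is_refusal)`. Returns the final reply and the retries used.
pub fn run_with_retries<F>(mut model: F) -> (String, u32)
where
    F: FnMut(Option<&str>) -> (String, bool),
{
    let mut retries = 0;
    let mut injected: Option<String> = None;
    loop {
        let (reply, refused) = model(injected.as_deref());
        if !refused {
            return (reply, retries);
        }
        match next_action(retries) {
            RefusalAction::Retry(msg) => {
                injected = Some(msg);
                retries += 1;
            }
            RefusalAction::Accept => return (reply, retries),
        }
    }
}
```

With a model that always refuses, this loop calls the model three times (the initial attempt plus two retries) and then returns the refusal unchanged.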

Also fixes additional Rust 1.94 clippy lints (map_or, field assignment).

markm39 merged commit 7ce64fb into master Mar 30, 2026
markm39 deleted the feat/anti-refusal branch March 30, 2026 23:26