Skip to content

kelu-look/lean-proof-debugger

Repository files navigation

Lean Proof Debugger

Understand why Lean theorem proving runs succeed or fail.

A local-first debugging and analysis toolkit for Lean 4 proof attempts — not a new prover, but a post-hoc diagnostic layer on top of existing provers like ReProver, COPRA, and GPT-f.

PyPI version Python CI Tests License: MIT


What it does

Instead of just running a prover and getting a pass/fail, this toolkit helps you answer:

  • Why did this proof attempt fail?
  • Which tactic step went wrong, and why?
  • Were the retrieved premises actually useful?
  • How does one model's proof search differ from another's?
  • How hard is this theorem relative to others in the benchmark?

Core features

Feature Description
Parsers Ingest LeanDojo JSON, ReProver search logs, ProofNet/miniF2F, raw tactic logs
Proof Tree Visualize tactic trees, dead ends, and the successful path
Premise Inspector Per-step retrieval recall, miss highlighting
Failure Classifier Automatic labeling: premise_retrieval_miss, tactic_dead_end, verifier_type_error, timeout, wrong_formalization
Run Comparison Side-by-side diff of multiple proof attempts on the same theorem
Embedding Analysis Proof state similarity, theorem difficulty scores (optional)

Quickstart

Install

git clone https://github.com/kelu-look/lean-proof-debugger.git
cd lean-proof-debugger
pip install -r requirements.txt

Run the demo

python -m demo.run_demo

Launch the web UI

streamlit run app/streamlit_app.py

CLI

# Parse a trace file
lean-debug parse demo/traces/leandojo_sample.json

# Classify failures
lean-debug classify demo/traces/leandojo_sample.json

# Show proof tree
lean-debug tree demo/traces/leandojo_sample.json

# Show premise retrieval
lean-debug premises demo/traces/leandojo_sample.json --k 5

# Compare two runs
lean-debug compare run1.json run2.json

# Full report
lean-debug report demo/traces/leandojo_sample.json --output reports/

Failure labels

Label Trigger
success Proof closed
timeout Metadata or error message indicates timeout
verifier_type_error Lean type-checker error in error_message
premise_retrieval_miss Miss rate > 50% or all steps have empty retrieved premises
tactic_dead_end Tree has dead-end nodes or last tactic made no progress
wrong_formalization unknown constant / sorry / axiom in error messages
unknown No pattern matched

Supported input formats

Format Source Key fields
leandojo LeanDojo steps[].state_before, retrieved_premises
reprover ReProver search_tree[].node_id, parent_id
proofnet ProofNet / miniF2F problem, proof
raw Any tactic log raw_log string

Publishing a release

# Bump version in pyproject.toml and __init__.py, then:
git tag v0.1.0
git push origin v0.1.0
# GitHub Actions builds and publishes to PyPI automatically via trusted publishing.

Configure PyPI trusted publishing once at pypi.org → Your project → Publishing → Add publisher:

  • Owner: kelu-look, Repo: lean-proof-debugger, Workflow: release.yml, Environment: pypi

Project structure

lean_proof_debugger/
  parsers/         # Pluggable parsers + ProofTrace dataclass
  proof_tree.py    # ProofTree: construction, ASCII + Plotly rendering
  premise_analysis.py   # Per-step retrieval recall
  failure_classifier.py # Rule-based failure labeling
  run_comparison.py     # Multi-run alignment and diff
  embedding_analysis.py # Optional: state embeddings, difficulty scores
  report.py             # Markdown/JSON/HTML report generation

app/
  streamlit_app.py      # 5-page web UI
  pages/                # Home, Proof Tree, Premises, Failure, Comparison, Embeddings

demo/
  traces/               # Sample LeanDojo, ReProver, miniF2F traces
  run_demo.py           # End-to-end demo script

tests/                  # pytest suite (no model downloads required)
cli.py                  # lean-debug CLI (Typer)

Optional: embedding analysis

pip install lean-proof-debugger[embeddings]

Enables the Embedding Explorer page and:

from lean_proof_debugger.embedding_analysis import EmbeddingAnalyzer

analyzer = EmbeddingAnalyzer("all-MiniLM-L6-v2")
vecs = analyzer.encode_states(traces)
vecs_2d = analyzer.project_2d(vecs, method="umap")
score = analyzer.theorem_difficulty_score(trace)

Testing

pip install -r requirements-dev.txt
pytest tests/ -v

All tests use deterministic in-memory fixtures — no model downloads, no live Lean process.


Target benchmarks


Design principles

  • Local-first — no API calls, runs fully offline
  • Format-agnostic — pluggable parsers for any prover output
  • Diagnostics over visualization — every output answers a specific failure question
  • No live Lean required — operates entirely on pre-generated traces

License

MIT

About

Debugging and analysis toolkit for Lean theorem proving runs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors