Understand why Lean theorem proving runs succeed or fail.
A local-first debugging and analysis toolkit for Lean 4 proof attempts — not a new prover, but a post-hoc diagnostic layer on top of existing provers like ReProver, COPRA, and GPT-f.
Instead of just running a prover and getting a pass/fail, this toolkit helps you answer:
- Why did this proof attempt fail?
- Which tactic step went wrong, and why?
- Were the retrieved premises actually useful?
- How does one model's proof search differ from another's?
- How hard is this theorem relative to others in the benchmark?
| Feature | Description |
|---|---|
| Parsers | Ingest LeanDojo JSON, ReProver search logs, ProofNet/miniF2F, raw tactic logs |
| Proof Tree | Visualize tactic trees, dead ends, and the successful path |
| Premise Inspector | Per-step retrieval recall, miss highlighting |
| Failure Classifier | Automatic labeling: premise_retrieval_miss, tactic_dead_end, verifier_type_error, timeout, wrong_formalization |
| Run Comparison | Side-by-side diff of multiple proof attempts on the same theorem |
| Embedding Analysis | Proof state similarity, theorem difficulty scores (optional) |
git clone https://github.com/kelu-look/lean-proof-debugger.git
cd lean-proof-debugger
pip install -r requirements.txtpython -m demo.run_demostreamlit run app/streamlit_app.py# Parse a trace file
lean-debug parse demo/traces/leandojo_sample.json
# Classify failures
lean-debug classify demo/traces/leandojo_sample.json
# Show proof tree
lean-debug tree demo/traces/leandojo_sample.json
# Show premise retrieval
lean-debug premises demo/traces/leandojo_sample.json --k 5
# Compare two runs
lean-debug compare run1.json run2.json
# Full report
lean-debug report demo/traces/leandojo_sample.json --output reports/| Label | Trigger |
|---|---|
success |
Proof closed |
timeout |
Metadata or error message indicates timeout |
verifier_type_error |
Lean type-checker error in error_message |
premise_retrieval_miss |
Miss rate > 50% or all steps have empty retrieved premises |
tactic_dead_end |
Tree has dead-end nodes or last tactic made no progress |
wrong_formalization |
unknown constant / sorry / axiom in error messages |
unknown |
No pattern matched |
| Format | Source | Key fields |
|---|---|---|
leandojo |
LeanDojo | steps[].state_before, retrieved_premises |
reprover |
ReProver | search_tree[].node_id, parent_id |
proofnet |
ProofNet / miniF2F | problem, proof |
raw |
Any tactic log | raw_log string |
# Bump version in pyproject.toml and __init__.py, then:
git tag v0.1.0
git push origin v0.1.0
# GitHub Actions builds and publishes to PyPI automatically via trusted publishing.Configure PyPI trusted publishing once at pypi.org → Your project → Publishing → Add publisher:
- Owner:
kelu-look, Repo:lean-proof-debugger, Workflow:release.yml, Environment:pypi
lean_proof_debugger/
parsers/ # Pluggable parsers + ProofTrace dataclass
proof_tree.py # ProofTree: construction, ASCII + Plotly rendering
premise_analysis.py # Per-step retrieval recall
failure_classifier.py # Rule-based failure labeling
run_comparison.py # Multi-run alignment and diff
embedding_analysis.py # Optional: state embeddings, difficulty scores
report.py # Markdown/JSON/HTML report generation
app/
streamlit_app.py # 5-page web UI
pages/ # Home, Proof Tree, Premises, Failure, Comparison, Embeddings
demo/
traces/ # Sample LeanDojo, ReProver, miniF2F traces
run_demo.py # End-to-end demo script
tests/ # pytest suite (no model downloads required)
cli.py # lean-debug CLI (Typer)
pip install lean-proof-debugger[embeddings]Enables the Embedding Explorer page and:
from lean_proof_debugger.embedding_analysis import EmbeddingAnalyzer
analyzer = EmbeddingAnalyzer("all-MiniLM-L6-v2")
vecs = analyzer.encode_states(traces)
vecs_2d = analyzer.project_2d(vecs, method="umap")
score = analyzer.theorem_difficulty_score(trace)pip install -r requirements-dev.txt
pytest tests/ -vAll tests use deterministic in-memory fixtures — no model downloads, no live Lean process.
- Local-first — no API calls, runs fully offline
- Format-agnostic — pluggable parsers for any prover output
- Diagnostics over visualization — every output answers a specific failure question
- No live Lean required — operates entirely on pre-generated traces
MIT