Lean Proof Debugger

Understand why Lean theorem proving runs succeed or fail.

A local-first debugging and analysis toolkit for Lean 4 proof attempts — not a new prover, but a post-hoc diagnostic layer on top of existing provers like ReProver, COPRA, and GPT-f.

What it does

Instead of just running a prover and getting a pass/fail, this toolkit helps you answer:

Why did this proof attempt fail?
Which tactic step went wrong, and why?
Were the retrieved premises actually useful?
How does one model's proof search differ from another's?
How hard is this theorem relative to others in the benchmark?

Core features

Feature	Description
Parsers	Ingest LeanDojo JSON, ReProver search logs, ProofNet/miniF2F, raw tactic logs
Proof Tree	Visualize tactic trees, dead ends, and the successful path
Premise Inspector	Per-step retrieval recall, miss highlighting
Failure Classifier	Automatic labeling: `premise_retrieval_miss`, `tactic_dead_end`, `verifier_type_error`, `timeout`, `wrong_formalization`
Run Comparison	Side-by-side diff of multiple proof attempts on the same theorem
Embedding Analysis	Proof state similarity, theorem difficulty scores (optional)

Quickstart

Install

git clone https://github.com/kelu-look/lean-proof-debugger.git
cd lean-proof-debugger
pip install -r requirements.txt

Run the demo

python -m demo.run_demo

Launch the web UI

streamlit run app/streamlit_app.py

CLI

# Parse a trace file
lean-debug parse demo/traces/leandojo_sample.json

# Classify failures
lean-debug classify demo/traces/leandojo_sample.json

# Show proof tree
lean-debug tree demo/traces/leandojo_sample.json

# Show premise retrieval
lean-debug premises demo/traces/leandojo_sample.json --k 5

# Compare two runs
lean-debug compare run1.json run2.json

# Full report
lean-debug report demo/traces/leandojo_sample.json --output reports/

Failure labels

Label	Trigger
`success`	Proof closed
`timeout`	Metadata or error message indicates timeout
`verifier_type_error`	Lean type-checker error in `error_message`
`premise_retrieval_miss`	Miss rate > 50% or all steps have empty retrieved premises
`tactic_dead_end`	Tree has dead-end nodes or last tactic made no progress
`wrong_formalization`	`unknown constant` / `sorry` / `axiom` in error messages
`unknown`	No pattern matched

Supported input formats

Format	Source	Key fields
`leandojo`	LeanDojo	`steps[].state_before`, `retrieved_premises`
`reprover`	ReProver	`search_tree[].node_id`, `parent_id`
`proofnet`	ProofNet / miniF2F	`problem`, `proof`
`raw`	Any tactic log	`raw_log` string

Publishing a release

# Bump version in pyproject.toml and __init__.py, then:
git tag v0.1.0
git push origin v0.1.0
# GitHub Actions builds and publishes to PyPI automatically via trusted publishing.

Configure PyPI trusted publishing once at pypi.org → Your project → Publishing → Add publisher:

Owner: kelu-look, Repo: lean-proof-debugger, Workflow: release.yml, Environment: pypi

Project structure

lean_proof_debugger/
  parsers/         # Pluggable parsers + ProofTrace dataclass
  proof_tree.py    # ProofTree: construction, ASCII + Plotly rendering
  premise_analysis.py   # Per-step retrieval recall
  failure_classifier.py # Rule-based failure labeling
  run_comparison.py     # Multi-run alignment and diff
  embedding_analysis.py # Optional: state embeddings, difficulty scores
  report.py             # Markdown/JSON/HTML report generation

app/
  streamlit_app.py      # 5-page web UI
  pages/                # Home, Proof Tree, Premises, Failure, Comparison, Embeddings

demo/
  traces/               # Sample LeanDojo, ReProver, miniF2F traces
  run_demo.py           # End-to-end demo script

tests/                  # pytest suite (no model downloads required)
cli.py                  # lean-debug CLI (Typer)

Optional: embedding analysis

pip install lean-proof-debugger[embeddings]

Enables the Embedding Explorer page and:

from lean_proof_debugger.embedding_analysis import EmbeddingAnalyzer

analyzer = EmbeddingAnalyzer("all-MiniLM-L6-v2")
vecs = analyzer.encode_states(traces)
vecs_2d = analyzer.project_2d(vecs, method="umap")
score = analyzer.theorem_difficulty_score(trace)

Testing

pip install -r requirements-dev.txt
pytest tests/ -v

All tests use deterministic in-memory fixtures — no model downloads, no live Lean process.

Target benchmarks

miniF2F
ProofNet
Real Lean 4 / Mathlib4 repos via LeanDojo

Design principles

Local-first — no API calls, runs fully offline
Format-agnostic — pluggable parsers for any prover output
Diagnostics over visualization — every output answers a specific failure question
No live Lean required — operates entirely on pre-generated traces

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
app		app
demo		demo
lean_proof_debugger		lean_proof_debugger
notebooks		notebooks
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
cli.py		cli.py
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lean Proof Debugger

What it does

Core features

Quickstart

Install

Run the demo

Launch the web UI

CLI

Failure labels

Supported input formats

Publishing a release

Project structure

Optional: embedding analysis

Testing

Target benchmarks

Design principles

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lean Proof Debugger

What it does

Core features

Quickstart

Install

Run the demo

Launch the web UI

CLI

Failure labels

Supported input formats

Publishing a release

Project structure

Optional: embedding analysis

Testing

Target benchmarks

Design principles

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages