Skip to content

perf: avoid re-exporting imported doc expanders#854

Open
ejgallego wants to merge 1 commit into
leanprover:mainfrom
ejgallego:perf/manual-patch-test
Open

perf: avoid re-exporting imported doc expanders#854
ejgallego wants to merge 1 commit into
leanprover:mainfrom
ejgallego:perf/manual-patch-test

Conversation

@ejgallego
Copy link
Copy Markdown
Contributor

I was observing some very large .olean files in Verso Blueprint, which did not make a lot of sense. For example, we have a file that imports all the test blueprints; its .olean was going above 10 GiB.

After some investigation, I found the culprit: doc expanders and signature environment extension state were re-exported, which led to exponential blowups in .olean size.

Based on a quick Codex query, there may be some more cases like this in Lean and Verso.

This patch stores doc expander and signature extension state as separate local and complete maps.

We only export local entries while keeping lookup over the complete imported state, so importer modules no longer reserialize every transitive expander entry.

We have added a transitive-import regression covering role, code block, directive, and block-command expanders.

Measurements

For blueprints, the size reduction is dramatic (on the order of 100x or more).

Some more benchmarks thanks to Codex:

Reference manual at 8ae381a3, Lean v4.30.0-rc2, command lake --no-cache build Manual, baseline Verso ba73c230:

Build Elapsed Max RSS
baseline run 1 157.13s 2,518,464 KB
baseline run 2 155.61s 2,525,236 KB
patched 144.25s 2,537,208 KB

Compared reference-manual artifacts:

  • total artifacts: 637,348,589 B -> 599,759,959 B, -37,588,630 B / about -35.8 MiB
  • .olean total: 546,283,272 B -> 508,649,832 B, -37,633,440 B

Synthetic importer test, Manual.PerfImportPair importing Manual.Terms and Manual.Tactics.Reference:

  • Lake-built .olean: 375,456 B -> 17,072 B, -95.5%
  • direct Lean .olean: 375,392 B -> 17,008 B
  • direct compile time stayed flat: 1.66s -> 1.65s

UsersGuide smoke measurement:

  • elapsed time effectively flat: 56.59s -> 56.84s
  • compared .olean artifacts dropped by about 432 KiB

I was observing some very large `.olean` files in Verso Blueprint, which did not make a lot of sense. For example, we have a file that imports all the test blueprints; its `.olean` was going above 10 GiB.

After some investigation, I found the culprit: doc expanders and signature environment extension state were re-exported, which led to exponential blowups in `.olean` size.

Based on a quick Codex query, there may be some more cases like this in Lean and Verso.

This patch stores doc expander and signature extension state as separate local and complete maps.

We only export local entries while keeping lookup over the complete imported state, so importer modules no longer reserialize every transitive expander entry.

We have added a transitive-import regression covering role, code block, directive, and block-command expanders.

## Measurements

For blueprints, the size reduction is dramatic (on the order of 100x or more).

Some more benchmarks thanks to Codex:

Reference manual at `8ae381a3`, Lean `v4.30.0-rc2`, command `lake --no-cache build Manual`, baseline Verso `ba73c230`:

| Build | Elapsed | Max RSS |
| --- | ---: | ---: |
| baseline run 1 | 157.13s | 2,518,464 KB |
| baseline run 2 | 155.61s | 2,525,236 KB |
| patched | 144.25s | 2,537,208 KB |

Compared reference-manual artifacts:
- total artifacts: `637,348,589 B -> 599,759,959 B`, `-37,588,630 B` / about `-35.8 MiB`
- `.olean` total: `546,283,272 B -> 508,649,832 B`, `-37,633,440 B`

Synthetic importer test, `Manual.PerfImportPair` importing `Manual.Terms` and `Manual.Tactics.Reference`:
- Lake-built `.olean`: `375,456 B -> 17,072 B`, `-95.5%`
- direct Lean `.olean`: `375,392 B -> 17,008 B`
- direct compile time stayed flat: `1.66s -> 1.65s`

UsersGuide smoke measurement:
- elapsed time effectively flat: `56.59s -> 56.84s`
- compared `.olean` artifacts dropped by about `432 KiB`
@ejgallego ejgallego marked this pull request as ready for review May 10, 2026 18:45
@github-actions
Copy link
Copy Markdown
Contributor

Preview for this PR is ready! 🎉

@ejgallego
Copy link
Copy Markdown
Contributor Author

More generally, I wonder if it would be possible to improve Lean's upstream API to capture this use case in a more principled way.

@david-christiansen
Copy link
Copy Markdown
Collaborator

Nice catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants