perf: avoid re-exporting imported doc expanders #854
Open
ejgallego wants to merge 1 commit into
I was observing some very large `.olean` files in Verso Blueprint, which did not make a lot of sense. For example, we have a file that imports all the test blueprints; its `.olean` was going above 10 GiB.

After some investigation, I found the culprit: doc expanders and signature environment extension state were re-exported, which led to exponential blowups in `.olean` size.

Based on a quick Codex query, there may be some more cases like this in Lean and Verso.

This patch stores doc expander and signature extension state as separate local and complete maps.
We only export local entries while keeping lookup over the complete imported state, so importer modules no longer reserialize every transitive expander entry.

We have added a transitive-import regression covering role, code block, directive, and block-command expanders.

## Measurements

For blueprints, the size reduction is dramatic (on the order of 100x or more).

Some more benchmarks thanks to Codex:

Reference manual at `8ae381a3`, Lean `v4.30.0-rc2`, command `lake --no-cache build Manual`, baseline Verso `ba73c230`:

| Build | Elapsed | Max RSS |
| --- | ---: | ---: |
| baseline run 1 | 157.13s | 2,518,464 KB |
| baseline run 2 | 155.61s | 2,525,236 KB |
| patched | 144.25s | 2,537,208 KB |

Compared reference-manual artifacts:

- total artifacts: `637,348,589 B -> 599,759,959 B`, `-37,588,630 B` / about `-35.8 MiB`
- `.olean` total: `546,283,272 B -> 508,649,832 B`, `-37,633,440 B`

Synthetic importer test, `Manual.PerfImportPair` importing `Manual.Terms` and `Manual.Tactics.Reference`:

- Lake-built `.olean`: `375,456 B -> 17,072 B`, `-95.5%`
- direct Lean `.olean`: `375,392 B -> 17,008 B`
- direct compile time stayed flat: `1.66s -> 1.65s`

UsersGuide smoke measurement:

- elapsed time effectively flat: `56.59s -> 56.84s`
- compared `.olean` artifacts dropped by about `432 KiB`
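
To illustrate why re-exporting imported state compounds while exporting only local entries does not, here is a toy model in Python. It is a sketch under stated assumptions, not Verso's or Lean's actual code: the names `transitive_imports`, `olean_entry_counts`, the module chain `M0` to `M5`, and the idea that an importer simply concatenates the serialized entries of every transitive import are all hypothetical simplifications.

```python
def transitive_imports(module, graph):
    """All modules reachable from `module` through imports, each listed once."""
    seen, stack = [], list(graph[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            stack.extend(graph[dep])
    return seen


def olean_entry_counts(graph, local, reexport):
    """Map each module to (entries serialized, distinct entries visible for lookup).

    `graph` maps a module to its direct imports and must be listed in
    dependency order; `local` maps a module to the entries it declares.
    """
    exported, counts = {}, {}
    for module in graph:
        # Assumption: importing loads the serialized entries of every transitive import.
        imported = [e for dep in transitive_imports(module, graph) for e in exported[dep]]
        complete = imported + local[module]
        # Old behaviour (modelled): serialize the complete state.
        # Patched behaviour (modelled): serialize local entries only.
        exported[module] = complete if reexport else list(local[module])
        counts[module] = (len(exported[module]), len(set(complete)))
    return counts


# Hypothetical import chain M0 <- M1 <- ... <- M5, one expander per module.
graph = {f"M{i}": ([f"M{i-1}"] if i else []) for i in range(6)}
local = {m: [f"{m}.expander"] for m in graph}

print(olean_entry_counts(graph, local, reexport=True)["M5"])   # (32, 6)
print(olean_entry_counts(graph, local, reexport=False)["M5"])  # (1, 6)
```

In this model the serialized entry count doubles with each link of the chain when the complete state is re-exported, while the local-only variant serializes one entry per module and still sees all six expanders at lookup time.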
Contributor
Preview for this PR is ready! 🎉
Contributor
Author
More generally, I wonder if it would be possible to improve Lean's upstream API to capture this use case in a more principled way.
Collaborator
Nice catch!
david-christiansen
approved these changes
May 13, 2026