Cache hook state to skip redundant pyenv sh-activate calls #523
jakelodwick wants to merge 2 commits into pyenv:master
Track five values between prompts ($PWD, $PYENV_VERSION, .python-version content, $PYENV_ROOT/version content, $VIRTUAL_ENV) using shell builtins. When all match, return immediately without forking. Covers cd, pyenv shell/local/global, and manual venv activate/deactivate. Implemented for bash/zsh/ksh (shared POSIX path) and fish.
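A minimal sketch of the five-value comparison described in this commit message, in POSIX-style shell. The names `_vh_slurp`, `_vh_unchanged` and the `_VH_*` cache variables are hypothetical and are not taken from the PR. Files are read with a `read` loop rather than a command substitution so that no subshell is forked.

```shell
# Hypothetical helper: read a file into _VH_SLURP using only builtins.
# Leaves _VH_SLURP empty when the file is missing or unreadable.
_vh_slurp() {
  _VH_SLURP=""
  [ -r "$1" ] || return 0
  while IFS= read -r _vh_line || [ -n "$_vh_line" ]; do
    _VH_SLURP="${_VH_SLURP}${_vh_line}
"
  done < "$1"
}

# Returns 0 when all five tracked values match their cached copies.
# Otherwise refreshes the cache and returns 1, so the caller can fall
# through to the full `pyenv sh-activate` path.
_vh_unchanged() {
  _vh_slurp "$PWD/.python-version";   _vh_local=$_VH_SLURP
  _vh_slurp "${PYENV_ROOT-}/version"; _vh_global=$_VH_SLURP
  if [ -n "${_VH_PRIMED-}" ] &&
     [ "$_VH_PWD" = "$PWD" ] &&
     [ "$_VH_VERSION" = "${PYENV_VERSION-}" ] &&
     [ "$_VH_LOCAL" = "$_vh_local" ] &&
     [ "$_VH_GLOBAL" = "$_vh_global" ] &&
     [ "$_VH_VENV" = "${VIRTUAL_ENV-}" ]; then
    return 0
  fi
  _VH_PRIMED=1
  _VH_PWD=$PWD
  _VH_VERSION=${PYENV_VERSION-}
  _VH_LOCAL=$_vh_local
  _VH_GLOBAL=$_vh_global
  _VH_VENV=${VIRTUAL_ENV-}
  return 1
}
```

A prompt hook could call `_vh_unchanged && return 0` as its first statement. Note that this sketch only looks at `.python-version` in `$PWD`, not in parent directories.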
b01b088 to
bb9a634
Good for a start!
The "limitations" you've outlined are exactly why we haven't done something like this yet! Their complexity has intimidated all the prospective contributors before you! And a stale cache would be a bug -- so we'd have to revert any fundamentally flawed solution in order to fix it.
I think we actually can check for on-disk changes with absolute minimum overhead -- by caching mtimes of the active .python-version file, and of any directories without a .python-version file leading up to it. Then all the disk activity we'll have to do at a prompt is up to a few stat calls.
- As you can see, this makes it unnecessary to actually read the files each time
- Moreover, if the current version is set by a higher-priority mechanism (shell -> local -> global) -- we don't have to check for changes in the lower-priority mechanisms at all!
stat is not a builtin. It is possible to do with the test -nt builtin -- but then we'd have to create a marker file (which will be per-shell-session) and somehow maintain it. We can create our own builtin if process spawns are really as big of a deal as you make it look...
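To make the priority rule concrete, here is a rough sketch of how the set of paths worth watching could be collected. It assumes the simplified resolution order named in this comment (shell, then the closest `.python-version` walking upward, then the global file); the function name `_vh_watch_paths` and the variable `_VH_WATCH` are hypothetical. The result is accumulated in a variable rather than printed, so collecting it needs no subshell.

```shell
# Hypothetical helper: fill _VH_WATCH (one path per line) with the paths whose
# mtimes decide whether the active version may have changed.
#   $1 = absolute directory to start from (normally "$PWD")
_vh_watch_paths() {
  _VH_WATCH=""
  # A shell-level setting has top priority: nothing on disk can override it.
  if [ -n "${PYENV_VERSION-}" ]; then return 0; fi
  case $1 in /*) _vh_dir=$1 ;; *) return 2 ;; esac
  while :; do
    if [ -f "$_vh_dir/.python-version" ]; then
      # Closest file wins; directories above it and the global file are irrelevant.
      _VH_WATCH="${_VH_WATCH}${_vh_dir}/.python-version
"
      return 0
    fi
    # No file here: watch the directory so a newly created file is noticed.
    _VH_WATCH="${_VH_WATCH}${_vh_dir}
"
    if [ "$_vh_dir" = "/" ]; then break; fi
    _vh_dir=${_vh_dir%/*}                 # strip the last path component
    if [ -z "$_vh_dir" ]; then _vh_dir=/; fi
  done
  # No local file anywhere in the tree: the global file is authoritative.
  _VH_WATCH="${_VH_WATCH}${PYENV_ROOT-}/version
"
}
```

The list only needs recomputing when `$PWD` changes or when one of the watched paths turns out to be stale. Paths containing newlines are not handled by this sketch.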
bin/pyenv-virtualenv-init
Outdated
if test -f "\$PWD/.python-version"
  read -z pvh_local < "\$PWD/.python-version" 2>/dev/null; or true
end
set -l pvh_global ""
if test -f "\$PYENV_ROOT/version"
  read -z pvh_global < "\$PYENV_ROOT/version" 2>/dev/null; or true
end
As you probably guessed, this duplicates pyenv version logic.
If we leave it as a separate logic, we must somehow make sure that it remains in lockstep with the subcommand proper.
The most realistic way I see is to create a shared sourced file in Pyenv with a function that checks "if the active version has changed". It can very well read and write the cache as well. Then we add some tests to Pyenv to make sure that it properly resets the cache whenever we change the active version via any of the real subcommands (in an opaque way so as to not depend on their implementation details).
- This way, whenever anything in the real subcommands changes, we'll automatically verify that the caching logic remains sound!
- Moreover, pyenv version proper can make use of this shared function, too, thus also taking advantage of the caching!!
The most realistic way I see is to create a shared sourced file in Pyenv with a function that checks "if the active version has changed".
To optimize the effort, we can leave this logic here until we have at least some working solution -- and refactor it out afterwards as a separate step.
So you can save yourself from worrying about this for now!
P.S. since this is a highly requested change, you are eligible for a payout (non-taxable) from donated money upon completion if you're interested!
Replace content reads ($(< .python-version), which fork a subshell in bash) with test -nt against a per-session marker file (shell builtin, zero forks). Walk the full directory tree from $PWD to / checking .python-version at each level, matching pyenv's "closest wins" resolution order. Check directory mtimes to detect file creation/deletion. Check $PYENV_ROOT/version only when no local .python-version exists anywhere in the tree. Skip all disk checks when $PYENV_VERSION is set (shell priority).
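The marker comparison can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the helper names `_vh_stale` and `_vh_mark` are hypothetical, and the list of paths to watch is passed in as arguments rather than collected by a directory walk. A missing path counts as stale because `test -nt` alone does not report a deleted file.

```shell
# Hypothetical helper: return 0 (stale) when any watched path is missing or
# newer than the marker file, 1 when nothing has changed.
#   $1 = marker file, remaining arguments = paths to watch
_vh_stale() {
  _vh_marker=$1
  shift
  [ -e "$_vh_marker" ] || return 0          # no marker yet: treat as stale
  for _vh_p in "$@"; do
    [ -e "$_vh_p" ] || return 0             # watched path disappeared
    if [ "$_vh_p" -nt "$_vh_marker" ]; then return 0; fi
  done
  return 1
}

# Hypothetical helper: refresh the marker using only builtins.
# Writing the shell's PID guarantees a data write, which updates the mtime.
_vh_mark() {
  echo "$$" > "$1"
}
```

One caveat of this design: on filesystems or shells that compare mtimes at one-second resolution, a change made in the same second as the marker refresh may go unnoticed.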
Thanks for the thorough review. The detail on mtime caching and the priority chain was exactly what I needed to get this right. Revised commit addresses all three points:
Cache variables simplified from five to three (
Shared sourced file in pyenv proper left for a follow-up, per your suggestion.
set -g _PYENV_VH_PWD "\$PWD"
set -g _PYENV_VH_VERSION "\$PYENV_VERSION"
set -g _PYENV_VH_VENV "\$VIRTUAL_ENV"
set -g _PYENV_VH_MARKER "\$PYENV_ROOT/.pyenv-vh-marker-\$fish_pid"
How are we going to maintain these marker files? We have to delete them at some point, otherwise they'll keep accumulating. I don't have an idea of a reliable way atm, that's why I wrote "somehow".
Is calling stat really that bad? It accepts multiple arguments so a single call would be enough regardless of how many entries we have to check.
Then, it's easy to check for changes by simply comparing outputs:
LOCAL_VERSION_PATHS=<paths to .python-version and dirs if any; can be an array>
SAVED_MTIMES="$(stat -c %Y $LOCAL_VERSION_PATHS)"
<...>
if [[ "$(stat -c %Y $LOCAL_VERSION_PATHS)" != "$SAVED_MTIMES" ]]; then <the cache is stale>; fi
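A runnable variant of the snippet above, as a sketch. `stat -c %Y` is the GNU coreutils form; BSD and macOS `stat` print the mtime with `-f %m`, so the flavor is probed once up front. The helper names `_vh_mtimes`, `_vh_snapshot` and `_vh_changed` are hypothetical.

```shell
# Pick the stat flavor once: GNU coreutils accepts -c, BSD/macOS uses -f.
if stat -c %Y / >/dev/null 2>&1; then
  _vh_mtimes() { stat -c %Y "$@" 2>/dev/null; }
else
  _vh_mtimes() { stat -f %m "$@" 2>/dev/null; }
fi

# Remember the mtimes of all watched paths (one stat call for all of them).
_vh_snapshot() {
  _VH_SAVED_MTIMES=$(_vh_mtimes "$@") || true
}

# Return 0 when the current mtimes differ from the saved snapshot.
# A path that has disappeared prints no line, so the outputs differ as well.
_vh_changed() {
  [ "$(_vh_mtimes "$@")" != "$_VH_SAVED_MTIMES" ]
}
```

Each check costs one command substitution and one `stat` process, regardless of how many paths are passed.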
I'm currently busy with other RL stuff so I may be replying with a delay. Sorry about that.
Problem
_pyenv_virtualenv_hook calls pyenv sh-activate --quiet on every prompt, spawning ~10 subprocesses through the pyenv dispatcher regardless of whether anything has changed. On macOS this adds ~200ms of latency per keystroke+Enter (#259, #490, #338).
Approach
Make the hook detect "nothing changed" and short-circuit before forking, rather than skipping the check entirely (the approach that #456 took, which broke pyenv local).
The hook now caches five values using shell builtins (zero forks):
- $PWD (cd)
- $PYENV_VERSION (pyenv shell changes)
- $PWD/.python-version content (pyenv local changes)
- $PYENV_ROOT/version content (pyenv global changes)
- $VIRTUAL_ENV
When all five match their cached values, the hook returns immediately. When any value changes, the full pyenv sh-activate path runs and the cache is refreshed.
Limitation
.python-version files in parent directories are not tracked. pyenv walks up the directory tree to find .python-version; this cache only reads the file in $PWD. If a parent directory's file changes while the user is in a subdirectory, the next cd triggers a full recheck. Walking up directories from the hook would replicate pyenv's version resolution logic.
Measurement
Test plan
Acknowledgments
This approach builds on prior work:
- $PWD only; this one extends the cache to five keys to cover pyenv local, pyenv global, pyenv shell, and manual venv changes.
- $PWD and $PYENV_VERSION as the key signals.