Performance is currently dominated by the git interface. Pegit queries git repos through git CLI plumbing commands rather than manipulating them directly, because that is the safest and simplest approach. For a single Pegit command invocation in moderately sized repos this can add up to hundreds of milliseconds on Linux, and tens of seconds on Windows Subsystem for Linux.
This makes Pegit command responsiveness less than ideal on Linux (queries ought to feel perceptibly instant), but almost unusable on Windows.
Git commands are called via child_process.execSync, which spawns a new process for each command. Each Pegit command makes roughly treePaths * (4 + treeRefs * 4) calls to git; depending on the repo this can mean anywhere from tens to thousands of calls. Almost all of the time is spent setting up child processes and the shell; command execution time itself is negligible.
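To put numbers on that formula (the function name and the example counts here are illustrative, not from Pegit itself):

```javascript
// Hypothetical estimate of git invocations per Pegit command,
// using the treePaths * (4 + treeRefs * 4) count from above.
const gitCalls = (treePaths, treeRefs) => treePaths * (4 + treeRefs * 4);

// e.g. 10 tree paths, each with 5 refs:
console.log(gitCalls(10, 5)); // 240 child processes for one Pegit command
```

At hundreds of calls, even a modest per-spawn cost multiplies into the delays described above.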
Observations
Dominated by child process overhead
For large numbers of git commands, the overhead of setting up a child process in Node.js is the dominant factor. This is easy to demonstrate by comparing a loop of 1,000 execSync('echo') calls against a single execSync('echo;'.repeat(1000)). Git's own command execution time is similarly negligible.
Synchronous calls not a significant factor
All git commands are called synchronously, even though some portion could be made asynchronous with some added complexity. However, because child process overhead is the dominant factor, this doesn't actually help; in fact, async appears to incur greater overhead in my tests.
Possible Solutions
Nodejs bindings for git
This is the conceptually ideal solution: a solid and reliable library like libgit2 should be used, dispensing with the git CLI altogether. Unfortunately, all of the current npm offerings for nodejs bindings are in various states of disarray. The official nodegit doesn't even install cleanly; frankly, I don't trust any of them enough to depend on right now, even if I could get them working.
Bundle commands
This is the conceptually worst solution. Independent commands could be bundled together into a dynamically generated shell script that returns a JSON response, with a JS interface to separate each command's stdout and invoke a callback for each. This wouldn't cover all commands (some depend on others, with JS logic in between), and it may also make errors hard to handle properly; overall this just feels a bit hacky.
Persistent process
This would be a decent drop-in with minimal refactoring necessary, or none if commands can remain synchronous. If a persistent process with a persistent shell can be created, all commands can be fed to it one at a time (the security implications of a shared shell are not an issue for this use case). The one example of this I can find is stateful-process-command-proxy, although its source is excessively layered (to be kind), and I'd prefer to extract the meat of the implementation rather than depend on it.