Git Interface Performance #1

@ThomasBrierley

Description

Performance is currently dominated by the git interface. Pegit queries git repos through git CLI plumbing commands rather than manipulating them directly, because that is the safest and simplest approach. In moderately sized repos this can add hundreds of ms to a single Pegit command invocation on Linux, and tens of thousands of ms on Windows Subsystem for Linux.

This makes Pegit command responsiveness less than ideal on Linux (queries ought to feel perceptibly instant), and almost unusable on Windows.

Git commands are called via child_process.execSync, which spawns a new process for each command. Each Pegit command issues roughly treePaths * (4 + treeRefs * 4) calls to git; depending on the repo this can mean anywhere from tens to thousands of calls. Almost all of the time is spent setting up child processes and the shell; the git commands themselves execute in negligible time.
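To make the scaling concrete, here is the formula above with hypothetical figures (treePaths and treeRefs vary per repo; 50 and 3 are illustrative, not measured values):

```javascript
// Hypothetical repo shape; both values vary per repo.
const treePaths = 50;
const treeRefs = 3;

// Rough git call count per Pegit invocation, per the formula above.
const gitCalls = treePaths * (4 + treeRefs * 4);
console.log(gitCalls); // → 800
```

At even a few ms of spawn overhead per call, 800 calls comfortably reaches the hundreds-of-ms range described above.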

Observations

Dominated by child process overhead

For large numbers of git commands, the overhead of setting up a child process in nodejs is the dominant factor. This is easy to demonstrate by comparing a loop of 1k execSync('echo') calls against a single execSync running 1k chained echos ('echo && echo && …'). The execution time of the git commands themselves is likewise negligible.

Synchronous calls not significant factor

All git commands are called synchronously, even though some portion could be made asynchronous with some added complexity. Because the dominant factor is child process overhead, this doesn't actually help; in fact async appears to incur greater overhead in my tests.

Possible Solutions

Nodejs bindings for git

This is the conceptually ideal solution: a solid and reliable library like libgit2 should be used, dispensing with the git CLI altogether. Unfortunately all of the current npm offerings for nodejs bindings are in various states of disarray. The official nodegit doesn't even install cleanly; frankly I don't trust any of them enough to depend on right now, even if I could get them working.

Bundle commands

This is the conceptually worst solution. Independent commands could be bundled into a dynamically generated shell script, with a JS interface that separates the stdout of each command and invokes a callback for each. This wouldn't cover all commands (some depend on JS logic running between them), and it may make errors hard to handle properly; overall it just feels a bit hacky.

Persistent process

This would be a decent drop-in with minimal refactoring necessary, or none if commands can remain synchronous. If a persistent process with a persistent shell can be created, all commands can be fed to it one at a time (the security implications of a shared shell are not an issue for this use case). The one example I can find of this is stateful-process-command-proxy, although it has an excessively layered source (to be kind) and I'd prefer to extract the meat of the implementation rather than depend on it.

Metadata

Labels: enhancement (New feature or request)
