Improve performance in bigger projects #19632
RobinMalfait wants to merge 19 commits into `main`.
Conversation
Whenever we call `.scan()`, we will use `.scan_sources()` internally to walk the file tree. When we access `.files` or `.globs`, we also ensure that we have the data available by calling `.scan_sources()`. But now we prevent a double scan in case we already used `.scan()`. Note: whenever we call `.scan()`, we will traverse the file system again.
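A minimal sketch of that caching, with a simplified `Scanner`; `walk` and `extract_candidates` are hypothetical stand-ins for the real internals:

```rust
use std::path::PathBuf;

fn walk(_sources: &[PathBuf]) -> Vec<PathBuf> {
    Vec::new() // stand-in for the real file-tree walk
}

fn extract_candidates(_files: &[PathBuf]) -> Vec<String> {
    Vec::new() // stand-in for candidate extraction
}

struct Scanner {
    sources: Vec<PathBuf>,
    files: Option<Vec<PathBuf>>, // cached walk result
}

impl Scanner {
    /// Walks the file tree only if we don't already have the data.
    fn scan_sources(&mut self) {
        if self.files.is_none() {
            self.files = Some(walk(&self.sources));
        }
    }

    /// `.scan()` always traverses the file system again…
    fn scan(&mut self) -> Vec<String> {
        self.files = None;
        self.scan_sources();
        extract_candidates(self.files.as_deref().unwrap())
    }

    /// …while `.files` reuses the walk a previous `.scan()` already did.
    fn files(&mut self) -> &[PathBuf] {
        self.scan_sources();
        self.files.as_deref().unwrap()
    }
}
```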
Not relevant, and we can re-introduce it if needed.
… in walker filter
If we increase `self.pos` until it overflows a `usize`, then we have bigger problems... `self.pos + 1` should be safe enough.
We're only storing the `input` and the `pos` so that the `Cursor` size is much smaller. The `prev`, `curr` and `next` are now methods that compute the values when needed. We also inline those function calls so there is no additional overhead.
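A minimal sketch of what the slimmed-down `Cursor` could look like (the real Oxide struct may differ in detail):

```rust
// Only `input` and `pos` are stored; the neighboring bytes are computed on
// demand instead of being kept in fields.
pub struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    #[inline(always)]
    pub fn prev(&self) -> u8 {
        if self.pos == 0 {
            0
        } else {
            self.input[self.pos - 1]
        }
    }

    #[inline(always)]
    pub fn curr(&self) -> u8 {
        *self.input.get(self.pos).unwrap_or(&0)
    }

    #[inline(always)]
    pub fn next(&self) -> u8 {
        // `self.pos + 1` is fine: `pos` can never realistically reach
        // `usize::MAX` (see the note above).
        *self.input.get(self.pos + 1).unwrap_or(&0)
    }
}
```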
This way we don't have to call `.to_vec()` in the default case, which is the majority of the files in a typical project.
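Sketched out, the idea looks roughly like this; the signature and the matched extensions are approximations, not the exact `pre_process_input` code:

```rust
// Accepting an owned `Vec<u8>` lets the common case return the buffer as-is.
fn pre_process_input(content: Vec<u8>, extension: &str) -> Vec<u8> {
    match extension {
        // Some formats need rewriting before extraction and still allocate…
        "svelte" | "vue" => transform(&content),
        // …but the default case (~92% of files in the tested codebases) now
        // passes the buffer through without any copy.
        _ => content,
    }
}

fn transform(content: &[u8]) -> Vec<u8> {
    content.to_vec() // placeholder for the real pre-processing
}
```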
We dropped it from the `filter_entry` before because we wanted to introduce `build_parallel`. We had to walk all files anyway, so now we will check the `mtime` before actually extracting candidates from the files.
Accessing the `mtime` of a file has some overhead. When we call `.scan()` (think build mode), we just scan all the files; there is no need to track `mtime`s yet. If we call `.scan()` a second time, then we are in a watcher mode environment, and only at this point do we start tracking `mtime`s. This technically means that we do 1 full scan for the initial scan, and the second scan is yet another full scan, but from that point onwards we use the `mtime` information. The biggest benefit is that the initial call stays fast without overhead, which is perfect for a production build.
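A rough sketch of that deferred tracking (an assumed shape with a scan counter and a path-to-`mtime` map, not the exact Oxide code):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;
use std::time::SystemTime;

#[derive(Default)]
struct MtimeTracker {
    scan_count: usize,
    mtimes: HashMap<PathBuf, SystemTime>,
}

impl MtimeTracker {
    /// Given the walked paths, decide which files actually need re-scanning.
    fn files_to_scan(&mut self, walked: Vec<PathBuf>) -> Vec<PathBuf> {
        self.scan_count += 1;
        if self.scan_count == 1 {
            // Initial scan (e.g. a production build): no mtime reads at all.
            return walked;
        }
        walked
            .into_iter()
            .filter(|path| {
                let Ok(mtime) = fs::metadata(path).and_then(|m| m.modified()) else {
                    return true; // unreadable metadata: re-scan to be safe
                };
                // `insert` returns the previous value, so this keeps files
                // that changed or were never seen. On scan #2 the map is
                // still empty, hence the full re-scan described above.
                self.mtimes.insert(path.clone(), mtime) != Some(mtime)
            })
            .collect()
    }
}
```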
The `walk_parallel` approach is useful and faster, but only in "watch" mode when everything is considered warm. In my testing, the parallel walk has a 20ms-50ms overhead cost when I just run a build instead of running in watch mode. So right now we use a normal walk for the initial scan, and a parallel walk in watch mode. The parallel walk is still faster for large codebases, but unfortunately we don't know that ahead of time...
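A sketch of that walker selection, assuming the `ignore` crate that the `filter_entry`/`build_parallel` naming points at (the real code also filters entries and handles `mtime`s; this only shows the serial-vs-parallel decision):

```rust
use ignore::{WalkBuilder, WalkState};
use std::path::PathBuf;
use std::sync::Mutex;

fn collect_files(root: &str, first_scan: bool) -> Vec<PathBuf> {
    if first_scan {
        // Initial build: the serial walker avoids the parallel walker's
        // ~20-50ms startup overhead.
        WalkBuilder::new(root)
            .build()
            .filter_map(Result::ok)
            .map(|entry| entry.into_path())
            .collect()
    } else {
        // Watch mode: the parallel walker wins on large codebases. We only
        // collect paths here; mtime checks happen afterwards, so the hot
        // callbacks never contend on a shared map.
        let files = Mutex::new(Vec::new());
        WalkBuilder::new(root).build_parallel().run(|| {
            Box::new(|entry| {
                if let Ok(entry) = entry {
                    files.lock().unwrap().push(entry.into_path());
                }
                WalkState::Continue
            })
        });
        files.into_inner().unwrap()
    }
}
```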
This looks very promising! I can't wait to see what comes out of this!
This PR improves the performance of Oxide when scanning large codebases.
The `Oxide` API looks something like this:
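(A hedged reconstruction based on this description; the real `@tailwindcss/oxide` names and signatures may differ.)

```rust
use std::path::PathBuf;

pub struct GlobEntry {
    pub base: String,    // directory the pattern is relative to
    pub pattern: String, // e.g. "**/*.html"
}

pub struct Scanner {
    sources: Vec<GlobEntry>,
}

impl Scanner {
    pub fn new(sources: Vec<GlobEntry>) -> Self {
        Scanner { sources }
    }

    /// Walks the file system and extracts the candidates (potential Tailwind
    /// CSS classes) from the source files.
    pub fn scan(&mut self) -> Vec<String> {
        unimplemented!()
    }

    /// The concrete files matching `sources`; PostCSS, Vite, webpack etc.
    /// watch these for changes.
    pub fn files(&mut self) -> Vec<PathBuf> {
        unimplemented!()
    }

    /// The glob patterns matching `sources`, used for the same purpose.
    pub fn globs(&mut self) -> Vec<GlobEntry> {
        unimplemented!()
    }
}
```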
The `files` and `globs` are used to tell PostCSS, Vite, webpack etc. which files to watch for changes. The `.scan()` operation extracts the candidates from the source files. You can think of these as potential Tailwind CSS classes.

In all these scenarios we have to walk the file system and find files that match the `sources`.

1. Prevent multiple file system walks
The first big win came from the fact that accessing `.files` after a `.scan()` also does an entire walk of the file system (for the given `sources`), which is unnecessary because we just walked the file system.

This is something that's not really an issue in smaller codebases because we have `mtime` tracking. We don't re-scan a file if its `mtime` hasn't changed since the last scan. However, in large codebases with thousands of files, even walking the file system to check `mtime`s can be expensive.

2. Use parallel file system walking
Another big win is to use a parallel file system walker instead of a synchronous one. The big problem here is that the parallel build has 20ms-50ms of overhead which is noticeable on small codebases. We don't really know if you have a small or big codebase ahead of time, so maybe some kind of hint in the future would be useful.
So the solution I settled on right now is to use a synchronous walker for the initial scan, and then switch to a parallel walker for subsequent scans (think dev mode). This gives us the best of both worlds: fast initial scan on small codebases, and fast re-scans on large codebases.
Caveat: if you use the `@tailwindcss/cli` we know exactly which files changed, so we can just re-scan those files directly without walking the file system at all. But in `@tailwindcss/postcss` we don't know which files changed, so we have to walk the file system to check `mtime`s.

While this improvement is nice, it resulted in an annoying issue related to `mtime` tracking. Since the parallel walker processes files in parallel, the `mtime` map was typed as `Arc<Mutex<FxHashMap<PathBuf, SystemTime>>>`, so to avoid locking I decided to only walk the files here and collect their paths. Then later we check the `mtime` to know whether to re-scan them or not.

Initially I just removed the `mtime` tracking altogether. But it did have an impact when actually extracting candidates from those files, so I added it back later.

3. Delaying work
I was still a bit annoyed by the fact that we had to track `mtime` values for every file. This seems like annoying overhead, especially when doing a single build (no dev mode).

So the trick I applied here is to only start tracking `mtime` values after the initial scan. This means that, in dev mode, we would do this:

1. The initial scan doesn't track `mtime` values at all.
2. The second scan re-scans everything and starts recording `mtime` values. This time, we use the parallel walker instead of the synchronous one.
3. From then on, we only re-scan files whose `mtime` has changed.

The trade-off here is that on the second scan we always re-scan all files, even if they haven't changed. Since this typically only happens in dev mode, I think this is an acceptable trade-off, especially if the initial build is therefore faster this way.
4. Small wins
There are also a few small wins in here that I would like to mention but that are less significant:
- Compile the `source` patterns once, instead of in every walker filter call (see the sketch below).
- `pre_process_input` always called `content.to_vec()`, which allocates. Instead we now accept an owned `Vec<u8>` so we don't have to call `.to_vec()` in the default case (in my testing, this is ~92% of the time in the codebases I checked).
- Made the `Cursor` struct smaller, which is used a lot during candidate extraction.
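To illustrate the first item, a sketch assuming the `globset` crate: the patterns are compiled into a single matcher once, up front, so each walker filter call is just a cheap lookup.

```rust
use globset::{Glob, GlobSet, GlobSetBuilder};

fn compile_sources(patterns: &[&str]) -> Result<GlobSet, globset::Error> {
    let mut builder = GlobSetBuilder::new();
    for pattern in patterns {
        builder.add(Glob::new(pattern)?);
    }
    builder.build()
}

fn main() -> Result<(), globset::Error> {
    // Compiled once…
    let sources = compile_sources(&["**/*.html", "**/*.tsx"])?;
    // …then matching inside the walker filter is cheap.
    assert!(sources.is_match("src/index.html"));
    assert!(!sources.is_match("src/main.rs"));
    Ok(())
}
```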
Benchmarks

Now for the fun stuff, the benchmarks!
The code for the benchmarks
tailwindcss.com codebase
In these benchmarks the `PR` one is consistently faster than `main`. It's not by a lot, but that's mainly because the codebase itself isn't that big. It is a codebase with a lot of candidates, though not that many files.

The candidate extraction was already pretty fast, so the wins here mainly come from avoiding re-walking the file system when accessing `.files`, and from delaying `mtime` tracking until after the initial scan.

Single initial build:
It's not a lot, but it's a bit faster. This is due to avoiding tracking the `mtime` values initially and making some small optimizations related to the struct size and allocations.

Single initial build + accessing `.files`:

We don't have to re-walk the entire file system even if we're just dealing with ~462 scanned files.
Watch/dev mode, only scanning:
This now switches to the parallel walker, but since it's not a super big codebase we don't see a huge win here yet.
Watch/dev mode, scanning + accessing `.files`:

Again we avoid re-walking the entire file system when accessing `.files`.

Synthetic 5000 files codebase
Based on the instructions from #19616 I created a codebase with 5000 files. Each file contains a `flex` class and a unique class like `content-['/path/to/file']` to ensure we have a decent amount of unique candidates.

You can test the script yourself by running this:
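(The script boils down to generating files like the ones below; here is a rough Rust equivalent, with illustrative paths and markup.)

```rust
use std::fs;
use std::io::Result;

fn main() -> Result<()> {
    fs::create_dir_all("synthetic/src")?;
    for i in 0..5000 {
        // Every file has the shared `flex` class plus a unique
        // `content-['…']` class, so each file contributes a unique candidate.
        let html = format!(
            "<div class=\"flex content-['/src/file-{i}.html']\"></div>\n"
        );
        fs::write(format!("synthetic/src/file-{i}.html"), html)?;
    }
    Ok(())
}
```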
Single initial build:
As expected not a super big win here because it's a single build. But there is a noticeable improvement.
Single initial build + accessing `.files`:

Now things are getting interesting. Almost a 2x speedup by avoiding re-walking the file system when accessing `.files`.

Watch/dev mode, only scanning:
This is where we see bigger wins because now we're using the parallel walker.
Watch/dev mode, scanning + accessing `.files`:

This is the biggest win of them all because we have all the benefits combined:

- the parallel walker
- `mtime` tracking, so unchanged files are skipped
- no re-walk of the file system when accessing `.files`

Test plan
Fixes: #19616