
Optimisation 2 #1837

Open
hadley wants to merge 7 commits into main from optimisation-2


hadley (Member) commented Mar 20, 2026

Your goal is to make this package faster, using the debrief package to analyze its performance and identify bottlenecks. Begin by running debrief on `roxygenize(load_code = "installed")`. Identify the code paths that take the most time and brainstorm possible optimizations. Write down a plan for each one in optimization-plan.md.

Next, work systematically through each bottleneck, creating a microbenchmark to measure performance before and after your proposed change. You MUST NOT change the behaviour of the code; if a test fails, the experiment is a failure.
Record your results in optimization-results.md. Each experiment should be a commit: if the experiment is successful, check in the code and optimization-results.md; if it does not yield a meaningful improvement, check in only optimization-results.md with the results and your analysis.
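The workflow above pairs every microbenchmark with a behaviour check. A minimal base-R sketch of such a harness might look like the following; `compare_impl()`, `old_fn`, `new_fn` and `inputs` are hypothetical names (the PR does not say which benchmarking tool was used), and the package's own test suite remains the real arbiter of behaviour.

```r
# Hypothetical before/after harness: `old_fn` and `new_fn` are two
# implementations of the same helper, `inputs` a list of sample arguments.
compare_impl <- function(old_fn, new_fn, inputs, reps = 1000L) {
  # Refuse to time anything unless every output is identical().
  for (x in inputs) {
    if (!identical(old_fn(x), new_fn(x))) {
      stop("Behaviour changed for input: ", deparse(x))
    }
  }
  # Average elapsed seconds per call over `reps` passes through `inputs`.
  time_it <- function(fn) {
    elapsed <- system.time(
      for (i in seq_len(reps)) for (x in inputs) fn(x)
    )[["elapsed"]]
    elapsed / (reps * length(inputs))
  }
  c(old = time_it(old_fn), new = time_it(new_fn))
}

# Usage: a grepl() pre-check in front of sub() should not change results.
old_sub <- function(x) sub("a", "b", x, fixed = TRUE)
new_sub <- function(x) if (grepl("a", x, fixed = TRUE)) old_sub(x) else x
compare_impl(old_sub, new_sub, list("abc", "xyz", ""))
```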

hadley and others added 7 commits March 20, 2026 07:41
Add a fast grepl() pre-check to avoid expensive XML parsing for the
99% of tags that don't contain inline R code (`r ...` or ```{...}).
Saves ~12ms per roxygenize() run (11x faster for markdown_evaluate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
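This commit follows a general pattern: a cheap fixed-string test screens out inputs that the slow path could never change. A minimal sketch, assuming the screen looks for the literal strings "`r " and "```{"; `has_inline_code()`, `maybe_evaluate()` and the `evaluate` argument are hypothetical, and roxygen2's actual check may use different patterns.

```r
# Cheap screen: does the text contain inline R code ("`r ") or a fenced
# chunk ("```{")? fixed = TRUE matches literal strings without a regex.
has_inline_code <- function(text) {
  grepl("`r ", text, fixed = TRUE) | grepl("```{", text, fixed = TRUE)
}

# Hypothetical wrapper: `evaluate` stands in for the expensive
# parse-and-evaluate step and is only called when the screen passes.
maybe_evaluate <- function(text, evaluate) {
  if (!any(has_inline_code(text))) {
    return(text)  # fast path: nothing to evaluate, input returned unchanged
  }
  evaluate(text)
}
```

The screen is only safe if it never rejects text that the expensive path would change (no false negatives). False positives merely fall through to the slow path, so they cost time but not correctness.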
Add fast grepl() pre-check in escape_rd_for_md() to skip expensive
regex search and string manipulation when text contains no Rd macros.
Saves ~28ms per roxygenize() run (92x faster per call for plain text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
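The same idea extends to character vectors. Every Rd macro starts with a backslash, so one way to sketch it is to escape only the elements that contain one. `escape_where_needed()` and its `escape` argument are hypothetical; the sketch assumes `escape` is vectorised and works element by element, and the pre-check pattern actually used in `escape_rd_for_md()` may differ.

```r
# Apply a hypothetical `escape` function only to elements that contain a
# backslash; "\\" is a one-character string, matched literally.
escape_where_needed <- function(text, escape) {
  needs <- grepl("\\", text, fixed = TRUE)  # NA input gives FALSE
  if (any(needs)) {
    text[needs] <- escape(text[needs])
  }
  text  # elements without a backslash are returned untouched
}

# Usage with a stand-in escape function:
escape_where_needed(c("plain", "a \\code{x}"), function(x) paste0("<", x, ">"))
```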
Add an environment-based cache for is_s3_generic() that is active only
during roxygenize() runs. With 729 calls and 301 unique names (59%
redundancy), this saves ~3ms per run. Cache is cleared after each run
to avoid cross-environment contamination.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
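A run-scoped cache like the one described can be sketched with an environment that exists only while a run is in progress. This assumes the wrapped predicate depends only on the name for the duration of a single run; `with_generic_cache()`, `cached_is_generic()` and `compute` are hypothetical, not roxygen2 internals.

```r
# Holder for the active cache; NULL means "no run in progress".
cache_env <- new.env(parent = emptyenv())
cache_env$generic <- NULL

# Enable a fresh cache for the duration of `code`, then always discard it.
with_generic_cache <- function(code) {
  cache_env$generic <- new.env(parent = emptyenv())
  on.exit(cache_env$generic <- NULL, add = TRUE)  # runs even on error
  force(code)
}

# Memoise `compute(name)` while a cache is active; otherwise call through.
cached_is_generic <- function(name, compute) {
  cache <- cache_env$generic
  if (is.null(cache)) {
    return(compute(name))
  }
  if (!exists(name, envir = cache, inherits = FALSE)) {
    assign(name, compute(name), envir = cache)
  }
  get(name, envir = cache, inherits = FALSE)
}
```

Clearing the cache in `on.exit()` means it is discarded even if a run fails partway, which is one way to get the "no cross-environment contamination" guarantee the commit mentions.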
The stringr wrapper adds ~12µs overhead per call for a simple
fixed-character split. With 280 calls per run, this saves ~3.4ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
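The commit does not show the replacement, but base R's `strsplit()` with `fixed = TRUE` is the usual substitute for a stringr split. It is not automatically drop-in: per its documentation, `strsplit()` drops a trailing empty field and returns `character(0)` for an empty string, whereas `stringr::str_split()` keeps those empty pieces. A hedged sketch of a wrapper that restores stringr-style output in those two cases; `split_fixed_keep_trailing()` is hypothetical, and its equivalence to stringr should be checked against the actual call site.

```r
# Split on a fixed separator with base R, then restore the empty pieces
# that stringr::str_split() would return but strsplit() omits.
split_fixed_keep_trailing <- function(x, sep) {
  parts <- strsplit(x, sep, fixed = TRUE)
  # strsplit() drops a trailing empty field, e.g. "a,b," -> c("a", "b").
  trailing <- !is.na(x) & endsWith(x, sep)
  parts[trailing] <- lapply(parts[trailing], function(p) c(p, ""))
  # An empty input string should yield a single empty field.
  empty <- !is.na(x) & !nzchar(x)
  parts[empty] <- list("")
  parts
}

split_fixed_keep_trailing(c("a,b", "a,b,", ""), ",")
```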
Add fast grepl() pre-check in get_md_linkrefs() to avoid expensive
str_match_all() regex when text contains no [ character. Saves ~4.5ms
per roxygenize() run (205 of 240 calls skip the regex).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
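The same screen can sit in front of a regex extraction. A minimal sketch using base R's `gregexpr()` and `regmatches()` in place of `str_match_all()`; `extract_bracketed()` and its simple `[text]` pattern are hypothetical (roxygen2's real link-reference pattern is more involved), and the sketch assumes a single input string.

```r
# Hypothetical extractor: return the text inside each [...] in `text`.
extract_bracketed <- function(text) {
  # Fast path: without a literal "[", the regex cannot match.
  if (!grepl("[", text, fixed = TRUE)) {
    return(character())
  }
  m <- regmatches(text, gregexpr("\\[([^\\[\\]]+)\\]", text, perl = TRUE))[[1]]
  substr(m, 2L, nchar(m) - 1L)  # strip the surrounding brackets
}

extract_bracketed("see [foo()] and [bar]")
```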
End-to-end roxygenize() on roxygen2: 936ms -> 839ms (10.4% faster).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcol (Contributor) commented Mar 25, 2026

I've tested this quite eagerly on Luminescence, and unfortunately I have to report a minor slowdown compared to the current main branch. Measuring with system.time(roxygen2::roxygenise()), user time went from 6.6s to 7s (median of 5 runs each) on a Linux laptop with an Intel Ultra 7 and R 4.5.0.
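To reproduce this kind of comparison on each branch, a small helper can repeat the timing and report the median user CPU time. `median_user_time()` is a hypothetical name; it sums `user.self` and `user.child`, which should match the "user" figure that printing `system.time()` reports.

```r
# Run `f` several times and return the median user CPU time in seconds.
median_user_time <- function(f, runs = 5L) {
  times <- vapply(seq_len(runs), function(i) {
    t <- system.time(f())
    # Include child-process CPU time, as print(system.time(...)) does.
    sum(t[["user.self"]], t[["user.child"]], na.rm = TRUE)
  }, numeric(1))
  median(times)
}
```

For example, `median_user_time(function() roxygen2::roxygenise(), runs = 5L)` run from the package root on `main` and on `optimisation-2`. Note that `roxygenise()` rewrites the generated documentation files on each run.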

