Optimisation 2 by hadley · Pull Request #1837 · r-lib/roxygen2

hadley · 2026-03-20T13:13:33Z

Your goal is to make this package faster, using the debrief package to analyze performance and identify bottlenecks. Begin by running debrief on roxygenize(load_code = "installed"). Identify the code paths that take the longest and brainstorm possible optimizations. Write down a plan for each one in optimization-plan.md.

Next, work systematically through each bottleneck, creating a microbenchmark to measure the performance before and after your proposed change. You MUST NOT change the behaviour of the code; if a test fails, the experiment is a failure.
Record your results in optimization-results.md. Each experiment should be a commit: if the experiment is successful check in the code and optimization-results.md; if the experiment does not yield a meaningful improvement, just check in the optimization-results.md with the results and your analysis.
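A minimal sketch of the before/after measurement described above, using only base R. The helper `time_it()` and the two stand-in implementations are hypothetical and are not roxygen2 code; the point is that the equivalence check runs before any timing is recorded.

```r
# Hypothetical helper: total elapsed seconds for `reps` calls of f().
time_it <- function(f, reps = 200) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

# Stand-in inputs and implementations (assumptions for illustration only).
x <- rep(c("plain text", "see [link]"), 500)
before <- function() grepl("\\[", x)              # regex search
after  <- function() grepl("[", x, fixed = TRUE)  # fixed-string search

# Behaviour must be unchanged, otherwise the experiment is a failure.
stopifnot(identical(before(), after()))

timings <- c(before = time_it(before), after = time_it(after))
print(timings)
```

Packages such as bench or microbenchmark give finer-grained timings; the base R version is shown only to keep the sketch dependency-free.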

Add a fast grepl() pre-check to avoid expensive XML parsing for the 99% of tags that don't contain inline R code (`r ...` or ```{...}). Saves ~12ms per roxygenize() run (11x faster for markdown_evaluate). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
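The pre-check pattern can be sketched roughly as follows. This is not the code from the commit: `has_inline_code()`, `slow_evaluate()` and the two fixed strings it searches for are assumptions, and the slow path here is a small regex stand-in for the real XML-based evaluation.

```r
# Cheap guard (assumed form): inline code needs "`r ", chunks need "```{".
has_inline_code <- function(text) {
  grepl("`r ", text, fixed = TRUE) || grepl("```{", text, fixed = TRUE)
}

slow_calls <- 0L  # counts how often the slow path actually runs

# Stand-in for the expensive path: evaluates `r expr` spans in one string.
slow_evaluate <- function(text) {
  slow_calls <<- slow_calls + 1L
  m <- gregexpr("`r [^`]+`", text)
  regmatches(text, m) <- lapply(regmatches(text, m), function(hits) {
    vapply(hits, function(h) {
      expr <- substr(h, 4, nchar(h) - 1)  # drop the leading "`r " and trailing "`"
      as.character(eval(parse(text = expr)))
    }, character(1), USE.NAMES = FALSE)
  })
  text
}

# Guarded entry point: text without inline code is returned untouched.
markdown_evaluate_sketch <- function(text) {
  if (!has_inline_code(text)) {
    return(text)
  }
  slow_evaluate(text)
}

plain  <- markdown_evaluate_sketch("Plain text.")
inline <- markdown_evaluate_sketch("Total: `r 1 + 1`.")
```

Only the second call reaches `slow_evaluate()`, which is where the saving comes from when almost no tags contain inline code.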

Add fast grepl() pre-check in escape_rd_for_md() to skip expensive regex search and string manipulation when text contains no Rd macros. Saves ~28ms per roxygenize() run (92x faster per call for plain text). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
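A guard like this is only safe if it never rejects text that the slow path would have changed. The check below is a sketch under the assumption that every Rd macro starts with a backslash; `macro_rx` is a simplified illustrative pattern, not the regex roxygen2 uses.

```r
# Assumed, simplified pattern for an Rd macro such as \code{...}.
macro_rx <- "\\\\[a-zA-Z]+[{]"

has_backslash <- function(x) grepl("\\", x, fixed = TRUE)  # cheap guard
has_macro     <- function(x) grepl(macro_rx, x)            # expensive check

samples <- c("plain text", "uses \\code{x}", "a \\ alone", "50% of {braces}")

guard <- has_backslash(samples)
macro <- has_macro(samples)

# Soundness: every string containing a macro must also pass the guard.
stopifnot(all(guard[macro]))

print(data.frame(samples, guard, macro))
```

The guard may let through text with no macro (a lone backslash), which only costs time; it must never block text that does contain one.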

Add an environment-based cache for is_s3_generic() that is active only during roxygenize() runs. With 729 calls and 301 unique names (59% redundancy), this saves ~3ms per run. Cache is cleared after each run to avoid cross-environment contamination. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
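The 59% figure follows from (729 - 301) / 729. A run-scoped environment cache could look roughly like the sketch below; all names here (`the_cache`, `slow_is_generic()`, `run()`) are hypothetical, and the lookup is a stand-in rather than roxygen2's `is_s3_generic()`.

```r
the_cache <- new.env(parent = emptyenv())
lookups <- 0L  # counts calls to the slow lookup

# Stand-in for the expensive lookup (assumption, not the real logic).
slow_is_generic <- function(name) {
  lookups <<- lookups + 1L
  name %in% c("print", "format", "summary")
}

cached_is_generic <- function(name) {
  hit <- the_cache[[name]]  # NULL when the name has not been seen yet
  if (is.null(hit)) {
    hit <- slow_is_generic(name)
    assign(name, hit, envir = the_cache)
  }
  hit
}

clear_cache <- function() {
  rm(list = ls(the_cache, all.names = TRUE), envir = the_cache)
}

# The cache lives only for one run and is cleared even if the run errors.
run <- function(names) {
  on.exit(clear_cache(), add = TRUE)
  vapply(names, cached_is_generic, logical(1), USE.NAMES = FALSE)
}

res <- run(c("print", "mean", "print", "print", "mean"))
redundancy <- (729 - 301) / 729
```

Five calls trigger two slow lookups, and the environment is empty again once `run()` returns.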

Use base strsplit() in find_generic() instead of stringr::str_split(). The stringr wrapper adds ~12µs overhead per call for a simple fixed-character split. With 280 calls per run, this saves ~3.4ms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
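For reference, a fixed-character split in base R looks like the sketch below. Splitting on "." is an assumption about what find_generic() does with S3 method names; the example name is illustrative.

```r
name <- "print.data.frame"

# fixed = TRUE treats "." as a literal character, not a regex wildcard.
pieces <- strsplit(name, ".", fixed = TRUE)[[1]]

# Without fixed = TRUE the "." matches every character, so the result differs.
regex_pieces <- strsplit("a.b", ".")[[1]]

# Estimated saving from the commit message: 280 calls at ~12 microseconds each.
saving_ms <- 280 * 12e-6 * 1000

print(pieces)
print(saving_ms)
```

The arithmetic gives 3.36ms, which the commit message rounds to ~3.4ms.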

Add fast grepl() pre-check in get_md_linkrefs() to avoid expensive str_match_all() regex when text contains no [ character. Saves ~4.5ms per roxygenize() run (205 of 240 calls skip the regex). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
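When the input is a character vector, the same idea can be applied to the whole vector at once so that the regex only sees the elements containing `[`. A sketch, with `link_rx` as a simplified assumed pattern and `find_links()` as a hypothetical helper rather than get_md_linkrefs() itself:

```r
link_rx <- "\\[[^]]+\\]"  # assumed pattern: "[", one or more non-"]" characters, "]"

find_links <- function(texts) {
  out <- rep(list(character()), length(texts))
  has <- grepl("[", texts, fixed = TRUE)  # cheap fixed-string scan
  if (any(has)) {
    # Run the regex only on the elements that can possibly match.
    out[has] <- regmatches(texts[has], gregexpr(link_rx, texts[has]))
  }
  out
}

texts <- c("plain", "see [foo]", "also plain", "[a] and [b()]")
links <- find_links(texts)

# Share of calls skipped in the PR's measurement: 205 of 240.
skipped <- 205 / 240
```

With 205 of 240 calls skipped, roughly 85% of inputs never reach the regex.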

End-to-end roxygenize() on roxygen2: 936ms -> 839ms (10.4% faster). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mcol · 2026-03-25T09:34:01Z

I've tested this quite eagerly on Luminescence, and unfortunately I have to report a minor slowdown compared to the current main branch. Measuring with system.time(roxygen2::roxygenise()), user time went from 6.6s to 7s (median of 5 runs each), on a Linux laptop with Intel Ultra 7 and R 4.5.0.
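For anyone wanting to repeat this kind of measurement, a minimal sketch of a median-of-N user-time comparison is below. `median_user_time()` is a hypothetical helper and the workload is a stand-in; in practice the function passed in would wrap `roxygen2::roxygenise()` on each branch.

```r
# Hypothetical helper: median user CPU time (seconds) over `runs` calls of f().
# "user.self" excludes time spent in child processes.
median_user_time <- function(f, runs = 5) {
  times <- vapply(
    seq_len(runs),
    function(i) system.time(f())[["user.self"]],
    numeric(1)
  )
  median(times)
}

# Stand-in workload; replace with function() roxygen2::roxygenise().
workload <- function() sum(sqrt(seq_len(1e5)))
med <- median_user_time(workload)

# Relative change for the figures reported above (6.6s -> 7s).
slowdown_pct <- (7 - 6.6) / 6.6 * 100
print(round(slowdown_pct, 1))
```

On the reported numbers this works out to a slowdown of about 6%.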

hadley and others added 7 commits March 20, 2026 07:41

Use base strsplit() in find_generic() instead of stringr::str_split()

c23ba73

Add overall benchmark results to optimization-results.md

d1ced21

Update docs

20e0973

hadley wants to merge 7 commits into main from optimisation-2
