Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions R/markdown-code.R
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,9 @@
#' @keywords internal
markdown_evaluate <- function(text) {
text <- paste(text, collapse = "\n")
if (!grepl("`r ", text, fixed = TRUE) && !grepl("```{", text, fixed = TRUE)) {
return(text)
}
mdxml <- xml_ns_strip(md_to_mdxml(text, sourcepos = TRUE))
code_nodes <- xml_find_all(mdxml, ".//code | .//code_block")
rcode_nodes <- keep(code_nodes, is_markdown_code_node)
Expand Down
11 changes: 11 additions & 0 deletions R/markdown-escaping.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,18 @@
#' * `unescape_rd_for_md`: the original Rd text.
#' @rdname markdown-internals
#' @keywords internal
empty_rd_tags <- data.frame(
tag = character(),
start = integer(),
end = integer(),
argend = integer()
)

escape_rd_for_md <- function(text) {
if (!grepl("\\", text, fixed = TRUE)) {
attr(text, "roxygen-markdown-subst") <- list(tags = empty_rd_tags, id = "")
return(text)
}
rd_tags <- find_fragile_rd_tags(text, escaped_for_md)
protected <- protect_rd_tags(text, rd_tags)
double_escape_md(protected)
Expand Down
3 changes: 3 additions & 0 deletions R/markdown-link.R
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,9 @@ add_linkrefs_to_md <- function(text) {
}

get_md_linkrefs <- function(text) {
if (!grepl("[", text, fixed = TRUE)) {
return(character())
}
refs <- str_match_all(
text,
regex(
Expand Down
31 changes: 30 additions & 1 deletion R/object-s3.R
Original file line number Diff line number Diff line change
@@ -1,3 +1,14 @@
s3_generic_cache <- new_environment()

s3_generic_cache_reset <- function() {
env_unbind(s3_generic_cache, env_names(s3_generic_cache))
s3_generic_cache$active <- TRUE
}

s3_generic_cache_clear <- function() {
env_unbind(s3_generic_cache, env_names(s3_generic_cache))
}

#' Determine if a function is an S3 generic or S3 method
#'
#' @description
Expand All @@ -16,6 +27,24 @@ is_s3_generic <- function(name, env = parent.frame()) {
if (name == "") {
return(FALSE)
}

if (isTRUE(s3_generic_cache$active)) {
cached <- env_get(s3_generic_cache, name, default = NULL)
if (!is.null(cached)) {
return(cached)
}
}

result <- is_s3_generic_impl(name, env)

if (isTRUE(s3_generic_cache$active)) {
env_poke(s3_generic_cache, name, result)
}

result
}

is_s3_generic_impl <- function(name, env) {
if (!exists(name, envir = env, mode = "function")) {
return(FALSE)
}
Expand Down Expand Up @@ -86,7 +115,7 @@ find_generic <- function(name, env = parent.frame()) {
return(c("all.equal", method))
}

pieces <- str_split(name, fixed("."))[[1]]
pieces <- strsplit(name, ".", fixed = TRUE)[[1]]
n <- length(pieces)

# No . in name, so can't be method
Expand Down
2 changes: 2 additions & 0 deletions R/roxygenize.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ roxygenize <- function(
is_first <- roxygen_setup(base_path)

find_package_cache_reset()
s3_generic_cache_reset()
withr::defer(s3_generic_cache_clear())
roxy_meta_load(base_path)
# Load required packages for method registration
packages <- roxy_meta_get("packages")
Expand Down
12 changes: 6 additions & 6 deletions man/markdown-internals.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

63 changes: 63 additions & 0 deletions optimization-plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Optimization plan

## Profiling summary

Profiled `roxygenize(load_code = "source")` on the roxygen2 package itself (5 iterations, ~2.9s total, interval = 5ms). Key bottlenecks by total time:

| Area | % of total | Key functions |
|------|-----------|---------------|
| Tag parsing / markdown | 23% | `parse_tags` → `markdown_if_active` → `markdown` |
| Markdown pass 2 (md→Rd) | 8% | `markdown_pass2` → `mdxml_children_to_rd_top` |
| CLI warnings | 8.5% | `warn_roxy` → `cli::cli_inform` |
| S3 generic detection | 12% | `block_find_object` → `is_s3_generic` / `find_generic` |
| R6 method extraction | 10% | `r6_extract_methods` → `warn_roxy_tag` (76% of its time!) |

## Optimization experiments

### Experiment 1: Cache `is_s3_generic()` results

**Bottleneck:** `find_generic()` calls `is_s3_generic()` in a loop over dot-split pieces of every function name. `is_s3_generic()` does expensive work: `exists()`, `get()`, `tryCatch(getNamespaceName(...))`, `body()`, and recursive `calls_use_method()`. The same generic names (e.g., `print`, `format`, `[`) are tested repeatedly across many functions in a package.

**Proposed fix:** Add a simple environment-based cache inside `is_s3_generic()` keyed by `(name, env)`. Clear the cache at the start of each `roxygenize()` call.

**Risk:** Low — pure function of name + env, env doesn't change during a roxygenize run.

### Experiment 2: Cache `methods::isGeneric()` in `find_generic()`

**Bottleneck:** `find_generic()` calls `methods::isGeneric()` for every function name. This is an S4-method lookup that hits the methods infrastructure.

**Proposed fix:** Cache `methods::isGeneric()` results alongside `is_s3_generic()`.

**Risk:** Low — same reasoning as experiment 1.

### Experiment 3: Replace `cli::cli_inform()` with simpler message in `warn_roxy()`

**Bottleneck:** `warn_roxy()` → `cli::cli_inform()` accounts for ~8.5% of total time. Each call involves `cli::style_hyperlink()`, `cli::format_inline()`, and the full cli formatting pipeline including selector matching. When documenting roxygen2 itself, R6 warnings generate many calls.

**Proposed fix:** Use `cli::cli_inform()` only in interactive sessions or batch the warnings. Alternatively, use `warning()` or `message()` with pre-formatted strings to bypass cli overhead.

**Risk:** Medium — changes user-visible output format. Need to be careful to preserve behavior.

### Experiment 4: Avoid `cli::format_inline()` in `warn_roxy_tag()`

**Bottleneck:** `warn_roxy_tag()` calls `cli::format_inline("{.strong @{tag$tag}} ")` for every warning. This is expensive relative to the simple output it produces.

**Proposed fix:** Replace with `paste0("**@", tag$tag, "** ")` or a simpler string formatting approach, since the bold markup is the only cli feature used.

**Risk:** Low — the formatted tag name is just concatenated into the message anyway.

### Experiment 5: Precompute `internal_f("tools", ".get_internal_S3_generics")()`

**Bottleneck:** In `is_s3_generic()`, for primitive functions, `internal_f("tools", ".get_internal_S3_generics")()` is called each time. This fetches and calls an internal function from the tools package.

**Proposed fix:** Compute this once and cache the result (it never changes within a session).

**Risk:** Very low — the tools package internal generics list is static.

### Experiment 6: Optimize `find_generic()` string splitting

**Bottleneck:** `find_generic()` calls `str_split(name, fixed("."))` for every function name being analyzed, and then `paste0(pieces[seq_len(i)], collapse = ".")` in a loop.

**Proposed fix:** Use `strsplit()` (base R) instead of `str_split()` which has stringr overhead. Pre-compute the candidate generic names more efficiently.

**Risk:** Very low — straightforward string operation replacement.
103 changes: 103 additions & 0 deletions optimization-results.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# Optimization results

## Overall results

End-to-end `roxygenize(load_code = "source")` on the roxygen2 package itself (10 iterations, `devtools::load_all()` in a clean R subprocess):

| Version | min | median | Improvement |
|---------|-----|--------|-------------|
| Before (main) | 863ms | 936ms | — |
| After (all experiments) | 805ms | 839ms | 10.4% faster |

The optimizations are all fast-path short-circuits that avoid expensive work when it's not needed. They have the largest impact on packages where most roxygen tags contain simple text without inline R code, Rd macros, or markdown links (which is the common case).

## Experiment 1: Skip XML parsing in `markdown_evaluate()` for text without inline R code

**Hypothesis:** `markdown_evaluate()` parses every tag's text to XML to check for inline R code (`\`r ...\``), but the vast majority of tags don't contain any. A fast `grepl()` pre-check can skip the expensive XML parsing.

**Data:** Of 240 calls to `markdown_evaluate()` during `roxygenize()` on roxygen2, only 2 contain inline R code (238 are pure text).

**Change:** Added early return in `markdown_evaluate()` when text doesn't contain `` `r `` or `` ```{ `` patterns.

**Microbenchmark (240 typical calls, no inline R code):**

| Version | min | median |
|---------|-----|--------|
| Before | 13.23ms | 13.47ms |
| After | 1.01ms | 1.18ms |

**Result:** 11x faster for `markdown_evaluate()`, saving ~12ms per `roxygenize()` run. All 1105 tests pass.

**Verdict:** SUCCESS — committed.

## Experiment 2: Skip Rd tag escaping in `escape_rd_for_md()` for text without backslashes

**Hypothesis:** `escape_rd_for_md()` searches for Rd macros (e.g., `\code{}`, `\link{}`) to protect them from markdown parsing. But most roxygen text doesn't contain any Rd macros. A fast `grepl("\\", text, fixed = TRUE)` check can skip all the work.

**Data:** Of 240 calls during `roxygenize()` on roxygen2, only 4 contain backslashes (236 are plain text).

**Change:** Added early return in `escape_rd_for_md()` when text contains no backslash character, attaching the required empty attribute structure for `unescape_rd_for_md()`.

**Microbenchmark (per call, simple text):**

| Version | min | median |
|---------|-----|--------|
| Before | 106µs | 121µs |
| After | 1.1µs | 1.3µs |

**Result:** 92x faster for simple text, saving ~28ms per `roxygenize()` run (236 × 120µs). All 1105 tests pass.

**Verdict:** SUCCESS — committed.

## Experiment 3: Cache `is_s3_generic()` results during `roxygenize()`

**Hypothesis:** `is_s3_generic()` is called 729 times during `roxygenize()` on roxygen2, with only 301 unique names (59% redundancy). Each call involves `exists()`, `get()`, `tryCatch(getNamespaceName(...))`, and potentially recursive `calls_use_method()`. Caching results by name avoids redundant work.

**Change:** Added an environment-based cache that is activated during `roxygenize()` and cleared afterwards. The cache is inactive during direct calls (e.g., in tests) to avoid cross-environment contamination.

**Microbenchmark (729 calls matching a real roxygenize run):**

| Version | min | median |
|---------|-----|--------|
| Before | 16.4ms | 17.3ms |
| After | 13.6ms | 14.3ms |

**Result:** ~17% faster for `is_s3_generic()` calls, saving ~3ms per `roxygenize()` run. Savings scale with package size (more S3 methods = more cache hits). All 1105 tests pass.

**Verdict:** SUCCESS — committed. Small but clean improvement.

## Experiment 4: Use base `strsplit()` instead of `stringr::str_split()` in `find_generic()`

**Hypothesis:** `stringr::str_split()` has overhead from the stringr/stringi infrastructure. For a simple fixed-character split, base `strsplit()` is faster.

**Change:** Replaced `str_split(name, fixed("."))[[1]]` with `strsplit(name, ".", fixed = TRUE)[[1]]` in `find_generic()`.

**Microbenchmark (per call):**

| Version | min | median |
|---------|-----|--------|
| str_split | 10.5µs | 12.6µs |
| strsplit | 410ns | 533ns |

**Result:** 24x faster per call, saving ~3.4ms per `roxygenize()` run (280 calls × 12µs). All 1105 tests pass.

**Verdict:** SUCCESS — committed.

## Experiment 5: Skip link-reference regex in `get_md_linkrefs()` when no `[` present

**Hypothesis:** `get_md_linkrefs()` runs a complex regex via `str_match_all()` on every tag's text to find markdown link references like `[function()]`. Most tags don't contain `[` at all.

**Data:** Of 240 calls, only 35 contain `[` (205 are plain text without brackets).

**Change:** Added early return when `grepl("[", text, fixed = TRUE)` is FALSE.

**Microbenchmark (per call, simple text):**

| Version | min | median |
|---------|-----|--------|
| Before | 19.4µs | 23.3µs |
| After | 779ns | 1.4µs |

**Result:** 17x faster for plain text, saving ~4.5ms per `roxygenize()` run (205 × 22µs). All 1105 tests pass.

**Verdict:** SUCCESS — committed.
Loading