Conversation
This substantially improves performance of a common parsing bottleneck.
|
@gaborcsardi could you take a bit of a look at this? I'm not 100% convinced that it's worth reviewing this code but it is a lot faster, and I think underlying ideas are actually a bit easier to see in C++. This function is called on just about every tag component, so it is a reasonable place to optimise. |
gaborcsardi
left a comment
There was a problem hiding this comment.
Was this a bottleneck in devtools::document()?
According to my not very sophisticated timings, this PR does make devtools::document() very slightly (3-4%) faster for some packages (ps, processx, rlang), and has essentially no effect for others (testthat). E.g. this is testthat:
Before:
> system.time(devtools::document())
ℹ Updating testthat documentation
ℹ Loading testthat
user system elapsed
1.520 0.171 1.845After:
> system.time(devtools::document())
ℹ Updating testthat documentation
ℹ Loading testthat
user system elapsed
1.501 0.173 1.836(The fastest runs for each case from several runs.)
The C++ code seems mostly straightforward. The risk I see is that there might be edge cases we don't anticipate. I tried it on a bunch of packages, it seems to be OK, but people might be doing weird things. Should that happen, they can still use an older roxygen2 until we fix up the edge cases. So if you think that this code is better and easier to maintain, then we should merge it.
Btw. to speed this up further for repeated runs, we could memoize this function.
| This is a regression test for Markdown escaping. | ||
| } | ||
| \details{ | ||
|
|
There was a problem hiding this comment.
Are these changes expected?
| \code{escape_rd_for_md()} replaces fragile Rd tags with placeholders, to avoid | ||
| interpreting them as markdown. \code{unescape_rd_for_md()} puts the original | ||
| text back in place of the placeholders after the markdown parsing is done. | ||
| The fragile tags are listed in \code{escaped_for_md}. |
| ### Some Rd tags can't contain markdown | ||
|
|
||
| When mixing `Rd` and Markdown notation, most `Rd` tags may contain Markdown markup, the ones that can *not* are: `r paste0("\x60", roxygen2:::escaped_for_md, "\x60", collapse = ", ")`. | ||
| When mixing `Rd` and Markdown notation, most `Rd` tags may contain Markdown markup, the ones that can *not* are: `\acronym`, `\code`, `\command`, `\CRANpkg`, `\deqn`, `\doi`, `\dontrun`, `\dontshow`, `\donttest`, `\email`, `\env`, `\eqn`, `\figure`, `\file`, `\if`, `\ifelse`, `\kbd`, `\link`, `\linkS4class`, `\method`, `\mjeqn`, `\mjdeqn`, `\mjseqn`, `\mjsdeqn`, `\mjteqn`, `\mjtdeqn`, `\newcommand`, `\option`, `\out`, `\packageAuthor`, `\packageDescription`, `\packageDESCRIPTION`, `\packageIndices`, `\packageMaintainer`, `\packageTitle`, `\pkg`, `\PR`, `\preformatted`, `\renewcommand`, `\S3method`, `\S4method`, `\samp`, `\special`, `\testonly`, `\url`, `\var`, `\verb`. |
There was a problem hiding this comment.
I think it would make sense to generated this list programmatically instead of repeating it.
There was a problem hiding this comment.
It's now in a C++ vector, so not easy to pull. But given that it changes rarely and there's a reminder comment in the C++ code (which claude is likely to read), I think it's low risk.
There was a problem hiding this comment.
We can add a c++ function that just returns that vector as an R character vector, no?
| i = j; | ||
|
|
||
| // Check if the tag has arguments (next char must be '{') | ||
| if (i >= n || text[i] != '{') { |
There was a problem hiding this comment.
I think the next character can also be a [, e.g. \link[=dest]{name}.
This branch:
Main branch: