Reimplement markdown escaping in C++ by hadley · Pull Request #1838 · r-lib/roxygen2

hadley · 2026-03-20T17:43:07Z

x <- r"(See \code{foo()} and \link{bar})"
y <- strrep("x", 1e3)

bench::mark(
  escape_rd_for_md_c(x),
  escape_rd_for_md_c(y),
  check = FALSE
)[1:5]

This branch:

# A tibble: 2 × 5
  expression                 min   median `itr/sec` mem_alloc
  <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 escape_rd_for_md_c(x)   2.34µs   3.32µs   248920.        0B
2 escape_rd_for_md_c(y)   4.39µs   4.67µs   206356.        0B

Main branch:

# A tibble: 2 × 5
  expression               min   median `itr/sec` mem_alloc
  <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 escape_rd_for_md(x)    154µs    170µs     5688.    3.56KB
2 escape_rd_for_md(y)    109µs    121µs     8112.   64.07KB

This substantially improves performance of a common parsing bottleneck.

hadley · 2026-03-21T12:30:19Z

@gaborcsardi could you take a bit of a look at this? I'm not 100% convinced that it's worth reviewing this code, but it is a lot faster, and I think the underlying ideas are actually a bit easier to see in C++. This function is called on just about every tag component, so it is a reasonable place to optimise.

gaborcsardi

Was this a bottleneck in devtools::document()?

According to my not very sophisticated timings, this PR does make devtools::document() very slightly (3-4%) faster for some packages (ps, processx, rlang), and has essentially no effect for others (testthat). E.g. this is testthat:
Before:

> system.time(devtools::document())
ℹ Updating testthat documentation
ℹ Loading testthat
   user  system elapsed
  1.520   0.171   1.845

After:

> system.time(devtools::document())
ℹ Updating testthat documentation
ℹ Loading testthat
   user  system elapsed
  1.501   0.173   1.836

(The fastest runs for each case from several runs.)

The C++ code seems mostly straightforward. The risk I see is that there might be edge cases we don't anticipate. I tried it on a bunch of packages and it seems to be OK, but people might be doing weird things. Should that happen, they can still use an older roxygen2 until we fix up the edge cases. So if you think that this code is better and easier to maintain, then we should merge it.

Btw. to speed this up further for repeated runs, we could memoize this function.
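A minimal sketch of what memoizing could look like, assuming the escaped result depends only on the input text. `MemoizedEscaper` and the wrapped function are hypothetical names for illustration, not part of roxygen2.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper: caches the result of a pure string -> string function.
class MemoizedEscaper {
public:
  explicit MemoizedEscaper(std::function<std::string(const std::string&)> fn)
      : fn_(std::move(fn)) {}

  // Returns the cached result when the same text has been seen before.
  const std::string& operator()(const std::string& text) {
    auto it = cache_.find(text);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;
    }
    // References to unordered_map elements stay valid across rehashing.
    return cache_.emplace(text, fn_(text)).first->second;
  }

  std::size_t hits() const { return hits_; }
  std::size_t size() const { return cache_.size(); }

private:
  std::function<std::string(const std::string&)> fn_;
  std::unordered_map<std::string, std::string> cache_;
  std::size_t hits_ = 0;
};
```

The cache only pays off when identical tag text recurs, and it holds every distinct input in memory, so whether it is worth it depends on how repetitive the inputs are.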

gaborcsardi · 2026-03-22T16:00:44Z

man/double_escape_md.Rd

 This is a regression test for Markdown escaping.
-}
-\details{
+


Are these changes expected?

gaborcsardi · 2026-03-22T16:01:13Z

man/markdown-internals.Rd

 \code{escape_rd_for_md()} replaces fragile Rd tags with placeholders, to avoid
 interpreting them as markdown. \code{unescape_rd_for_md()} puts the original
 text back in place of the placeholders after the markdown parsing is done.
-The fragile tags are listed in \code{escaped_for_md}.


Is this change expected?
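For context, the escape/unescape round trip described in that Rd text can be sketched roughly as follows. This is an illustration only, not the code in this PR: the placeholder format, the `Escaped` struct, and the function names are assumptions, and it only handles tags directly followed by `{`.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Escaped {
  std::string text;                // text with placeholders substituted in
  std::vector<std::string> saved;  // original tag text, indexed by placeholder id
};

// Hypothetical placeholder format; the real one in roxygen2 may differ.
inline std::string placeholder(std::size_t id) {
  return "<<RD" + std::to_string(id) + ">>";
}

// Index one past the '}' matching the '{' at `open`, or npos if unbalanced.
inline std::size_t match_brace(const std::string& s, std::size_t open) {
  int depth = 0;
  for (std::size_t i = open; i < s.size(); ++i) {
    if (s[i] == '\\' && i + 1 < s.size()) { ++i; continue; }  // skip escaped char
    if (s[i] == '{') ++depth;
    else if (s[i] == '}' && --depth == 0) return i + 1;
  }
  return std::string::npos;
}

// Single pass: replace each fragile \tag{...} with a placeholder.
inline Escaped escape_fragile(const std::string& text,
                              const std::set<std::string>& fragile) {
  Escaped out;
  std::size_t i = 0;
  const std::size_t n = text.size();
  while (i < n) {
    if (text[i] == '\\') {
      std::size_t j = i + 1;
      while (j < n && std::isalnum(static_cast<unsigned char>(text[j]))) ++j;
      if (j == i + 1 && j < n) {  // escaped character such as "\\" or "\{"
        out.text.append(text, i, 2);
        i += 2;
        continue;
      }
      const std::string tag = text.substr(i, j - i);
      if (j < n && text[j] == '{' && fragile.count(tag)) {
        const std::size_t end = match_brace(text, j);
        if (end != std::string::npos) {
          out.text += placeholder(out.saved.size());
          out.saved.push_back(text.substr(i, end - i));
          i = end;
          continue;
        }
      }
    }
    out.text += text[i++];
  }
  return out;
}

// Put the saved tag text back in place of each placeholder.
inline std::string unescape_fragile(const Escaped& esc) {
  std::string out = esc.text;
  for (std::size_t id = 0; id < esc.saved.size(); ++id) {
    const std::string ph = placeholder(id);
    const std::size_t pos = out.find(ph);
    if (pos != std::string::npos) out.replace(pos, ph.size(), esc.saved[id]);
  }
  return out;
}
```

Because the ids are assigned in order of appearance, the same input always yields the same placeholders. The sketch does not guard against input that already contains placeholder-like text.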

gaborcsardi · 2026-03-22T16:02:18Z

vignettes/rd-formatting.Rmd

 ### Some Rd tags can't contain markdown

-When mixing `Rd` and Markdown notation, most `Rd` tags may contain Markdown markup, the ones that can *not* are: `r paste0("\x60", roxygen2:::escaped_for_md, "\x60", collapse = ", ")`.
+When mixing `Rd` and Markdown notation, most `Rd` tags may contain Markdown markup, the ones that can *not* are: `\acronym`, `\code`, `\command`, `\CRANpkg`, `\deqn`, `\doi`, `\dontrun`, `\dontshow`, `\donttest`, `\email`, `\env`, `\eqn`, `\figure`, `\file`, `\if`, `\ifelse`, `\kbd`, `\link`, `\linkS4class`, `\method`, `\mjeqn`, `\mjdeqn`, `\mjseqn`, `\mjsdeqn`, `\mjteqn`, `\mjtdeqn`, `\newcommand`, `\option`, `\out`, `\packageAuthor`, `\packageDescription`, `\packageDESCRIPTION`, `\packageIndices`, `\packageMaintainer`, `\packageTitle`, `\pkg`, `\PR`, `\preformatted`, `\renewcommand`, `\S3method`, `\S4method`, `\samp`, `\special`, `\testonly`, `\url`, `\var`, `\verb`.


I think it would make sense to generate this list programmatically instead of repeating it.

It's now in a C++ vector, so not easy to pull. But given that it changes rarely and there's a reminder comment in the C++ code (which Claude is likely to read), I think it's low risk.

We can add a C++ function that just returns that vector as an R character vector, no?
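A rough sketch of that idea in plain C++, assuming the tag list lives in one function-local vector. The names `escaped_for_md_tags()` and `format_tag_list()` are hypothetical, the list is abbreviated to three entries from the vignette text, and the R-facing wrapper (which would go through whatever R/C++ binding the package uses) is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Single source of truth for the fragile tags.
// Abbreviated for illustration; the full list is the one shown in the vignette.
inline const std::vector<std::string>& escaped_for_md_tags() {
  static const std::vector<std::string> tags = {
    "\\acronym", "\\code", "\\command"
  };
  return tags;
}

// Mirrors the vignette's paste0("`", tags, "`", collapse = ", ").
inline std::string format_tag_list(const std::vector<std::string>& tags) {
  std::string out;
  for (std::size_t i = 0; i < tags.size(); ++i) {
    if (i > 0) out += ", ";
    out += "`" + tags[i] + "`";
  }
  return out;
}
```

With an exported accessor like this, the vignette could keep its inline R expression and the Rd text could keep pointing at one list.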

gaborcsardi · 2026-03-23T06:41:18Z

src/markdown-escaping.cpp

+          i = j;
+
+          // Check if the tag has arguments (next char must be '{')
+          if (i >= n || text[i] != '{') {


I think the next character can also be a [, e.g. \link[=dest]{name}.
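One way the check could accept an optional bracketed argument, sketched under the assumption that the option block contains no nested `]`. `find_arg_open()` is a hypothetical helper, not a function from this PR.

```cpp
#include <cstddef>
#include <string>

// Given the index just past a tag name, return the index of the '{' that
// opens the tag's first braced argument, skipping one optional "[...]" block.
// Returns std::string::npos when the tag has no braced argument.
inline std::size_t find_arg_open(const std::string& text, std::size_t i) {
  const std::size_t n = text.size();
  if (i < n && text[i] == '[') {
    const std::size_t close = text.find(']', i + 1);
    if (close == std::string::npos) return std::string::npos;  // unterminated
    i = close + 1;
  }
  if (i >= n || text[i] != '{') return std::string::npos;
  return i;
}
```

For `\link[=dest]{name}` this skips `[=dest]` and lands on the `{`, while `\link{bar}` behaves as before.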

hadley added 3 commits March 20, 2026 08:42

Rewrite rd/md escape in C++

c10d6f6

This substantially improves performance of a common parsing bottleneck.

Just do a single pass

bcdd74b

IDs are now deterministic

bb537a9

gaborcsardi approved these changes Mar 23, 2026

hadley wants to merge 3 commits into main from md-rd-escaping-cpp
