Skip to content

feat: more semantic tokens for Markdown/Verso#13674

Open
david-christiansen wants to merge 3 commits into
leanprover:masterfrom
david-christiansen:doc-hl
Open

feat: more semantic tokens for Markdown/Verso#13674
david-christiansen wants to merge 3 commits into
leanprover:masterfrom
david-christiansen:doc-hl

Conversation

@david-christiansen
Copy link
Copy Markdown
Contributor

@david-christiansen david-christiansen commented May 7, 2026

This PR adds semantic tokens for Markdown and Verso that correspond to those available in editor themes for Markdown. With a small update to editors, docstrings can be consistently formatted without needing to reimplement either complex parser client-side.

The changes include:

  • New semantic tokens that represent markup. One of them is not use in Lean per se: markupPlainText is intended for Verso proper, which should not be styled as though it's in a doc comment.

  • The Lean.Server.FileWorker.Markdown module hierarchy, which includes a mostly-conformant CommonMark parser that is very particular about tracking source locations. It deviates from CommonMark as follows:

    • Link reference names are case-sensitive outside of A-Z, rather than implementing the Unicode case folding specified by the spec
    • HTML named entities are not validated, and only a small subset are expanded
    • Embedded HTML is not parsed specially

    It passes all other tests in the spec.

  • Lean.Server.FileWorker.SemanticHighlighting now emits the new tokens based on a Markdown parse. I considered moving this code out of that module, but it would have required a broader reorganization so I erred on the side of not disturbing things. I can do it if you'd like, though.

  • Lean.Server.ProtocolOverview mentions the custom tokens in the semantic token method descriptions. I didn't see another natural place to describe it.

To test this, I Clauded up a verso of vscode-lean4 with the following extra rules:
Under semanticTokenScopes, I added:

                    "markupDocText": [
                        "comment.documentation",
                        "comment.block.documentation"
                    ],
                    "markupBold": [
                        "markup.bold"
                    ],
                    "markupItalic": [
                        "markup.italic"
                    ],
                    "markupBoldItalic": [
                        "markup.bold.italic",
                        "markup.italic.bold",
                        "markup.bold",
                        "markup.italic"
                    ],
                    "markupHeading": [
                        "markup.heading"
                    ],
                    "markupBoldHeading": [
                        "markup.bold.heading",
                        "markup.bold",
                        "markup.heading"
                    ],
                    "markupItalicHeading": [
                        "markup.italic.heading",
                        "markup.italic",
                        "markup.heading"
                    ],
                    "markupBoldItalicHeading": [
                        "markup.bold.italic.heading",
                        "markup.italic.bold.heading",
                        "markup.bold",
                        "markup.italic",
                        "markup.heading"
                    ],
                    "markupQuote": [
                        "markup.quote"
                    ],
                    "markupBoldQuote": [
                        "markup.bold.quote",
                        "markup.bold",
                        "markup.quote"
                    ],
                    "markupItalicQuote": [
                        "markup.italic.quote",
                        "markup.italic",
                        "markup.quote"
                    ],
                    "markupBoldItalicQuote": [
                        "markup.bold.italic.quote",
                        "markup.italic.bold.quote",
                        "markup.bold",
                        "markup.italic",
                        "markup.quote"
                    ],
                    "markupList": [
                        "markup.list"
                    ],
                    "markupBoldList": [
                        "markup.bold.list",
                        "markup.bold",
                        "markup.list"
                    ],
                    "markupItalicList": [
                        "markup.italic.list",
                        "markup.italic",
                        "markup.list"
                    ],
                    "markupBoldItalicList": [
                        "markup.bold.italic.list",
                        "markup.italic.bold.list",
                        "markup.bold",
                        "markup.italic",
                        "markup.list"
                    ],
                    "markupInlineCode": [
                        "markup.inline.raw",
                        "string"
                    ],
                    "markupCodeBlock": [
                        "markup.fenced_code.block",
                        "string"
                    ],
                    "markupPlainText": [
                        "text",
                        "variable",
                        "string",
                        "markup.raw"
                    ],
                    "markupUrl": [
                        "markup.underline.link",
                        "string.other.link"
                    ],
                    "markupCrossReference": [
                        "markup.underline.link",
                        "string.other.link"
                    ]

and under configurationDefaults:

            "editor.semanticTokenColorCustomizations": {
                "rules": {
                    "markupPlainText:lean4": {
                        "fontStyle": ""
                    }
                }
            },

There's apparently no way to have markupPlainText just be the default face, so I did this somewhat baroque setup. The others all just made sense.

Here's the result:
image

This PR adds semantic tokens for Markdown and Verso that correspond to
those available in editor themes for Markdown. With a small update to
editors, docstrings can be consistently formatted without needing to
reimplement either complex parser client-side.
@david-christiansen david-christiansen requested a review from mhuisi as a code owner May 7, 2026 00:17
@david-christiansen david-christiansen added the changelog-server Language server, widgets, and IDE extensions label May 7, 2026
@david-christiansen david-christiansen changed the title feat: Markdown semantic tokens feat: more semantic tokens for Markdown/Verso May 7, 2026
@github-actions github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label May 7, 2026
@mathlib-lean-pr-testing
Copy link
Copy Markdown

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 5d5642107d0433519265f155ddbfbfb98007a80b --onto ea6e76707835317858b7dd36c95322679c50aaac. You can force Mathlib CI using the force-mathlib-ci label. (2026-05-07 01:59:07)

@leanprover-bot
Copy link
Copy Markdown
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 5d5642107d0433519265f155ddbfbfb98007a80b --onto 3fc99eef102549c743c6c63547983319bfae6f01. You can force reference manual CI using the force-manual-ci label. (2026-05-07 01:59:08)

@Julian
Copy link
Copy Markdown

Julian commented May 8, 2026

I took a closer look at this for lean.nvim last night. TL;DR it's awesome having semantic tokens here, we had disabled our markdown parser as getting it right alongside Lean was torture. How precisely it looks depends on the theme of course, but e.g. out of the box on mine, this is what this looks like:

leanmd-pr

A few specific comments in case helpful:

  • markdown punctuation looks like it's being sent with the keyword token type -- e.g. * in *foo*, or # foo in headings, or === / ---, or backticks for code blocks, etc. -- I think per the spec that is often going to be a prominent color as it's meant for true language keywords, so it makes things like * get styled the same color as def the true Lean keyword. Perhaps it'd be nicer if it were a markdown specific token, or left unstyled entirely so it falls back to the comment styling?
  • I notice you're not using token modifiers -- instead you have (markupBoldHeading, markupItalicHeading, markupBoldItalicHeading, markupBoldQuote … markupBoldItalicList) -- basically on the Neovim side that means I have to parse each of these and extract what styling they require, as opposed to having e.g. markupHeading / markupQuote / markupList and sending modifiers for bold | italic styling. Perhaps worth considering but I'm not sure what VSCode's support looks like?
  • In Neovim, styling is "additive", meaning when you send a style for bold, that gets added to styles underneath, and since in general that's "Comment", and many themes render that token type as italic, it means that the comment in lean.nvim rendered as bold+italic even for normal bold, and that all text basically renders italic. I put in a workaround here to get the expected visual, but flagging as I'm not sure if the same can happen in VSCode -- e.g. is there such a thing as using a VSCode theme where the default comment style is italic, and if so how does this PR look in such a theme?

All the above being said, yeah this is a great improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-server Language server, widgets, and IDE extensions toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants