Fix a number of em/strong issues (#641, #642, #643)#644
Merged
nicholasserra merged 5 commits intotrentm:masterfrom Oct 6, 2025
Merged
Fix a number of em/strong issues (#641, #642, #643)#644nicholasserra merged 5 commits intotrentm:masterfrom
nicholasserra merged 5 commits intotrentm:masterfrom
Conversation
Collaborator
|
This all looks reasonable, thank you for taking on all these edge cases! |
|
Thanks!
Actually, GitHub seems to output |
|
Quick report for gaps/regressions, I'll create issues later unless you ninja-fix it: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes #641, fixes #642 and fixes #643.
I put all these fixes in the same PR because I wanted to make sure they were compatible with each other, but that does mean it's a bit messy.
Middle word em breaking
(*em*)(#641)The middle word em regex would check for
*_chars that aren't preceded by another em char or a whitespace. The text(*em*)matches that, as the leading em char is not preceded by a space. It would then prevent this text from being processed as a valid em.I've updated the regex to look for ems that aren't preceded by non-word chars (instead of whitespace) and that fixed this issue.
The result is that we process this as expected:
Improve handling for leading underscores (#642)
In this issue we had what looked like an
<em>span, but it was straddling two other<strong>spans:This is not a valid em. Spans can be nested but they shouldn't stay open after the parent span closes.
I added some additional logic in the italics and bold stage that will check to see if the matched strong/em has any nested spans and that those spans are balanced and closed. If not, the strong/em is deemed invalid.
The result is that we process the strongs here, but not the em:
Consecutive strong/em can overlap (#643)
The strong/em regexes were starting their matches early as possible, and including as much text in the span as possible. This lead to the following text being processed like so:
**strong***em***strong**<strong>strong*</strong>em<strong>*strong</strong><strong>strong<em></strong>em<strong></em>strong</strong>This renders fine in most browsers, but is invalid html.
To fix this, I modified the strong regex to try to ignore as many leading
*_chars as possible to try to get the opening<strong>tag as close to the actual contents as possible, and try to close the</strong>as soon as possible.Previously the strong regex would process
***abc***as<strong>*abc*</strong>but now it will do*<strong>abc</strong>The effect of this is when we have consecutive strong and ems, they won't overlap anymore.
The unfortunate side effect is Github will process
***abc**as<strong>*abc</strong>, but we will output*<strong>abc</strong>instead, omitting that first em char from the span.