From b692934547ac187e948f37c471836541d4319e1e Mon Sep 17 00:00:00 2001
From: Jasmin Lapalme
Date: Tue, 31 May 2022 14:22:35 -0400
Subject: [PATCH] Replace the colon use in href by an underscore

The RFC3986 says that we cannot use the colon in HREF (section 2.2)
---
 DevelopmentNotes/DevelopmentNotes.html   |  6 +--
 QuickStart/QuickStart.html               | 20 +++---
 src/html.c                               | 28 +++++-----
 tests/MMD6Tests/Abbreviations.html       | 12 ++---
 tests/MMD6Tests/Citations.html           | 36 ++++++-------
 tests/MMD6Tests/Dutch.html               | 12 ++---
 tests/MMD6Tests/English.html             | 12 ++---
 tests/MMD6Tests/French.html              | 12 ++---
 tests/MMD6Tests/German Guillemets.html   | 12 ++---
 tests/MMD6Tests/German.html              | 12 ++---
 tests/MMD6Tests/Glossaries.html          | 66 ++++++++++++------------
 tests/MMD6Tests/Inline Citations.html    | 12 ++---
 tests/MMD6Tests/Inline Footnotes.html    | 32 ++++++------
 tests/MMD6Tests/Integrated.html          | 30 +++++------
 tests/MMD6Tests/Reference Footnotes.html | 20 +++---
 tests/MMD6Tests/Spanish.html             | 12 ++---
 tests/MMD6Tests/Swedish.html             | 12 ++---
 17 files changed, 173 insertions(+), 173 deletions(-)

diff --git a/DevelopmentNotes/DevelopmentNotes.html b/DevelopmentNotes/DevelopmentNotes.html
index f20b42f..a136904 100644
--- a/DevelopmentNotes/DevelopmentNotes.html
+++ b/DevelopmentNotes/DevelopmentNotes.html
@@ -26,7 +26,7 @@

Why a New Version?

  • Accuracy – MMD v4 and v5 were the most accurate versions yet, and a lot of effort went into finding and resolving various edge cases. However, it began to feel like a game of whack-a-mole where new bugs would creep in every time I
-fixed an old one. The PEG began to feel rather convoluted in spots, even
+fixed an old one. The PEG began to feel rather convoluted in spots, even
 though it did allow for a precise (if not always accurate) specification of the grammar.

  • Performance – “Back in the day” peg-markdown was one of the fastest
@@ -1142,8 +1142,8 @@

    Changelog


-
-PEG: Parsing Expression Grammar https://en.wikipedia.org/wiki/Parsing_expression_grammar  ↩︎
+
+PEG: Parsing Expression Grammar https://en.wikipedia.org/wiki/Parsing_expression_grammar  ↩︎

diff --git a/QuickStart/QuickStart.html b/QuickStart/QuickStart.html
index d949e50..098fdd5 100644
--- a/QuickStart/QuickStart.html
+++ b/QuickStart/QuickStart.html
@@ -59,7 +59,7 @@

    Performance

    When developing MMD v6, one of my goals was to keep MMD at least in the ballpark of the fastest processors. Of course, being the fastest would be fantastic, but I was more concerned with ensuring that the code was easily understood, and easily updated with new features in the future.

-MMD v3 – v5 used a PEG to handle the parsing. This made it easy to understand the relationship between the MMD grammar and the parsing code, since they were one and the same. However, the parsing code generated by the parsers was not particularly fast, and was prone to troublesome edge cases with terrible performance characteristics.
+MMD v3 – v5 used a PEG to handle the parsing. This made it easy to understand the relationship between the MMD grammar and the parsing code, since they were one and the same. However, the parsing code generated by the parsers was not particularly fast, and was prone to troublesome edge cases with terrible performance characteristics.

    The first step in MMD v6 parsing is to break the source text into a series of tokens, which may consist of plain text, whitespace, or special characters such as ‘*’, ‘[’, etc. This chain of tokens is then used to perform the actual parsing.

    @@ -67,7 +67,7 @@

    Performance

    1. Block parsing consists of identifying the “type” of each line of the source text, and grouping the lines into blocks (e.g. paragraphs, lists, blockquotes, etc.) Some blocks are a single line (e.g. ATX headers), and others can be many lines long. The block parsing in MMD v6 is handled by a parser generated by lemon. This parser allows the block structure to be more readily understood by non-programmers, but the generated parser is still fast.

-    2. Span parsing consists of identifying Markdown/MMD structures that occur inside of blocks, such as links, images, strong, emph, etc. Most of these structures require matching pairs of tokens to specify where the span starts and where it ends. Most of these spans allow arbitrary levels of nesting as well. This made parsing them correctly in the PEG-based code difficult and slow. MMD v6 uses a different approach that is accurate and has good performance characteristics even with edge cases. Basically, it keeps a stack of each “opening” token as it steps through the token chain. When a “closing” token is found, it is paired with the most recent appropriate opener on the stack. Any tokens in between the opener and closer are removed, as they are not able to be matched any more. To avoid unnecessary searches for non-existent openers, the parser keeps track of which opening tokens have been discovered. This allows the parser to continue moving forwards without having to go backwards and re-parse any previously visited tokens.
+    2. Span parsing consists of identifying Markdown/MMD structures that occur inside of blocks, such as links, images, strong, emph, etc. Most of these structures require matching pairs of tokens to specify where the span starts and where it ends. Most of these spans allow arbitrary levels of nesting as well. This made parsing them correctly in the PEG-based code difficult and slow. MMD v6 uses a different approach that is accurate and has good performance characteristics even with edge cases. Basically, it keeps a stack of each “opening” token as it steps through the token chain. When a “closing” token is found, it is paired with the most recent appropriate opener on the stack. Any tokens in between the opener and closer are removed, as they are not able to be matched any more. To avoid unnecessary searches for non-existent openers, the parser keeps track of which opening tokens have been discovered. This allows the parser to continue moving forwards without having to go backwards and re-parse any previously visited tokens.

    The result of this redesigned MMD parser is that it can parse short documents more quickly than CommonMark, and takes only 15% – 20% longer to parse long documents. I have not delved too deeply into this, but I presume that CommonMark has a bit more “set-up” time that becomes expensive when parsing a short document (e.g. a paragraph or two). But this cost becomes negligible when parsing longer documents (e.g. file sizes of 1 MB). So depending on your use case, CommonMark may well be faster than MMD, but we’re talking about splitting hairs here…. Recent comparisons show MMD v6 taking approximately 4.37 seconds to parse a 108 MB file (approximately 24.8 MB/second), and CommonMark taking 3.72 seconds for the same file (29.2 MB/second). For comparison, MMD v5.4 took approximately 94 seconds for the same file (1.15 MB/second).

    @@ -91,7 +91,7 @@

    Parse Tree

  • Use the resulting token tree for your own purposes.

-The token tree (AST) includes starting offsets and length of each token, allowing you to use MMD as part of a syntax highlighter. MMD v5 did not have this functionality in the public version, in part because the PEG parsers used did not provide reliable offset positions, requiring a great deal of effort when I adapted MMD for use in MultiMarkdown Composer.
+The token tree (AST) includes starting offsets and length of each token, allowing you to use MMD as part of a syntax highlighter. MMD v5 did not have this functionality in the public version, in part because the PEG parsers used did not provide reliable offset positions, requiring a great deal of effort when I adapted MMD for use in MultiMarkdown Composer.

    These steps are managed using the mmd_engine “object”. An individual mmd_engine cannot be used by multiple threads simultaneously, so if libMultiMarkdown is to be used in a multithreaded program, a separate mmd_engine should be created for each thread. Alternatively, just use the slightly more abstracted mmd_convert_string() function that handles creating and destroying the mmd_engine automatically.

    @@ -160,7 +160,7 @@

    Footnotes

    Glossary Terms

-If there are terms in your document you wish to define in a glossary at the end of your document, you can define them using the glossary syntax.
+If there are terms in your document you wish to define in a glossary at the end of your document, you can define them using the glossary syntax.

    Glossary terms can be specified using inline or reference syntax. The inline variant requires that the abbreviation be wrapped in parentheses and immediately follow the ?.

    @@ -410,16 +410,16 @@

    Future Steps


-
-PEG: Parsing Expression Grammar https://en.wikipedia.org/wiki/Parsing_expression_grammar  ↩
+
+PEG: Parsing Expression Grammar https://en.wikipedia.org/wiki/Parsing_expression_grammar  ↩

-
-AST: Abstract Syntax Tree https://en.wikipedia.org/wiki/Abstract_syntax_tree  ↩
+
+AST: Abstract Syntax Tree https://en.wikipedia.org/wiki/Abstract_syntax_tree  ↩

-
-glossary: The glossary collects information about important terms used in your document  ↩
+
+glossary: The glossary collects information about important terms used in your document  ↩

diff --git a/src/html.c b/src/html.c
index 719ae02..c12ca13 100644
--- a/src/html.c
+++ b/src/html.c
@@ -838,7 +838,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 				scratch->footnote_para_counter--;
 
 				if (scratch->footnote_para_counter == 0) {
-					printf("  ↩︎", scratch->citation_being_printed, LC("return to body"));
+					printf("  ↩︎", scratch->citation_being_printed, LC("return to body"));
 				}
 			}
 
@@ -853,7 +853,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 					temp_short = rand() % 32000 + 1;
 				}
 
-				printf("  ↩︎", temp_short, LC("return to body"));
+				printf("  ↩︎", temp_short, LC("return to body"));
 			}
 		}
 
@@ -861,7 +861,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 				scratch->footnote_para_counter--;
 
 				if (scratch->footnote_para_counter == 0) {
-					printf("  ↩︎", scratch->glossary_being_printed, LC("return to body"));
+					printf("  ↩︎", scratch->glossary_being_printed, LC("return to body"));
 				}
 			}
 
@@ -1575,11 +1575,11 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 
 				if (temp_short2 == scratch->used_citations->size) {
 					// This is a re-use of a previously used note
-					printf("(%d)",
+					printf("(%d)",
 						   temp_short, LC("see citation"), temp_short);
 				} else {
 					// This is the first time this note was used
-					printf("(%d)",
+					printf("(%d)",
 						   temp_short, temp_short, LC("see citation"), temp_short);
 				}
 			} else {
@@ -1587,11 +1587,11 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 
 				if (temp_short2 == scratch->used_citations->size) {
 					// This is a re-use of a previously used note
-					printf("(%s, %d)",
+					printf("(%s, %d)",
 						   temp_short, LC("see citation"), temp_char, temp_short);
 				} else {
 					// This is the first time this note was used
-					printf("(%s, %d)",
+					printf("(%s, %d)",
 						   temp_short, temp_short, LC("see citation"), temp_char, temp_short);
 				}
 			}
@@ -1638,7 +1638,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 					temp_short3 = temp_short;
 				}
 
-				printf("%d",
+				printf("%d",
 					   temp_short3, LC("see footnote"), temp_short);
 			} else {
 				// This is the first time this note was used
@@ -1650,7 +1650,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 					temp_short3 = temp_short;
 				}
 
-				printf("%d",
+				printf("%d",
 					   temp_short3, temp_short3, LC("see footnote"), temp_short);
 			}
 		} else {
@@ -1690,7 +1690,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 				if (temp_short2 == scratch->used_glossaries->size) {
 					// This is a re-use of a previously used note
 
-					printf("",
+					printf("",
 						   temp_short, LC("see glossary"));
 					mmd_print_string_html(out, temp_note->clean_text, false, true);
 					print_const("");
@@ -1698,7 +1698,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
 
 					// This is the first time this note was used
 
-					printf("",
+					printf("",
 						   temp_short, temp_short, LC("see glossary"));
 					mmd_print_string_html(out, temp_note->clean_text, false, true);
 					print_const("");
@@ -2479,7 +2479,7 @@ void mmd_export_footnote_list_html(DString * out, const char * source, scratch_p
 
 		// Export footnote
 		pad(out, 2, scratch);
-		printf("\n", i + 1);
+		printf("\n", i + 1);
 		scratch->padded = 6;
 
 		note = stack_peek_index(scratch->used_footnotes, i);
@@ -2527,7 +2527,7 @@ void mmd_export_glossary_list_html(DString * out, const char * source, scratch_p
 
 		// Export glossary
 		pad(out, 2, scratch);
-		printf("\n", i + 1);
+		printf("\n", i + 1);
 		scratch->padded = 6;
 
 		note = stack_peek_index(scratch->used_glossaries, i);
@@ -2580,7 +2580,7 @@ void mmd_export_citation_list_html(DString * out, const char * source, scratch_p
 
 		// Export footnote
 		pad(out, 2, scratch);
-		printf("\n", i + 1);
+		printf("\n", i + 1);
 		scratch->padded = 6;
 
 		note = stack_peek_index(scratch->used_citations, i);
diff --git a/tests/MMD6Tests/Abbreviations.html b/tests/MMD6Tests/Abbreviations.html
index 3cae759..47dc627 100644
--- a/tests/MMD6Tests/Abbreviations.html
+++ b/tests/MMD6Tests/Abbreviations.html
@@ -80,9 +80,9 @@

    foo bar

    20

-1
+1

-2
+2