6 changes: 3 additions & 3 deletions DevelopmentNotes/DevelopmentNotes.html
@@ -26,7 +26,7 @@ <h4 id="whyanewversion">Why a New Version?</h4>
<li><p>Accuracy &#8211; <abbr title="MultiMarkdown">MMD</abbr> v4 and v5 were the most accurate versions yet, and a lot of
effort went into finding and resolving various edge cases. However, it began
to feel like a game of whack-a-mole where new bugs would creep in every time I
-fixed an old one. The <a href="#gn:1" id="gnref:1" title="see glossary" class="glossary">PEG</a> began to feel rather convoluted in spots, even
+fixed an old one. The <a href="#gn_1" id="gnref_1" title="see glossary" class="glossary">PEG</a> began to feel rather convoluted in spots, even
though it did allow for a precise (if not always accurate) specification of
the grammar.</p></li>
<li><p>Performance &#8211; &#8220;Back in the day&#8221; <a href="https://github.com/jgm/peg-markdown">peg-markdown</a> was one of the fastest
@@ -1142,8 +1142,8 @@ <h3 id="changelog">Changelog</h3>
<hr />
<ol>

-<li id="gn:1">
-PEG: <p>Parsing Expression Grammar <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar">https://en.wikipedia.org/wiki/Parsing_expression_grammar</a> <a href="#gnref:1" title="return to body" class="reverseglossary">&#160;&#8617;&#xfe0e;</a></p>
+<li id="gn_1">
+PEG: <p>Parsing Expression Grammar <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar">https://en.wikipedia.org/wiki/Parsing_expression_grammar</a> <a href="#gnref_1" title="return to body" class="reverseglossary">&#160;&#8617;&#xfe0e;</a></p>
</li>

</ol>
20 changes: 10 additions & 10 deletions QuickStart/QuickStart.html
@@ -59,15 +59,15 @@ <h3 id="performance">Performance</h3>

<p>When developing <abbr title="MultiMarkdown">MMD</abbr> v6, one of my goals was to keep <abbr title="MultiMarkdown">MMD</abbr> at least in the ballpark of the fastest processors. Of course, being <em>the</em> fastest would be fantastic, but I was more concerned with ensuring that the code was easily understood, and easily updated with new features in the future.</p>

<p><abbr title="MultiMarkdown">MMD</abbr> v3 &#8211; v5 used a <a href="#gn:1" id="gnref:1" title="see glossary" class="glossary">PEG</a> to handle the parsing. This made it easy to understand the relationship between the <abbr title="MultiMarkdown">MMD</abbr> grammar and the parsing code, since they were one and the same. However, the parsing code generated by the parsers was not particularly fast, and was prone to troublesome edge cases with terrible performance characteristics.</p>
<p><abbr title="MultiMarkdown">MMD</abbr> v3 &#8211; v5 used a <a href="#gn_1" id="gnref_1" title="see glossary" class="glossary">PEG</a> to handle the parsing. This made it easy to understand the relationship between the <abbr title="MultiMarkdown">MMD</abbr> grammar and the parsing code, since they were one and the same. However, the parsing code generated by the parsers was not particularly fast, and was prone to troublesome edge cases with terrible performance characteristics.</p>

<p>The first step in <abbr title="MultiMarkdown">MMD</abbr> v6 parsing is to break the source text into a series of tokens, which may consist of plain text, whitespace, or special characters such as &#8216;*&#8217;, &#8216;[&#8217;, etc. This chain of tokens is then used to perform the actual parsing.</p>

<p><abbr title="MultiMarkdown">MMD</abbr> v6 divides the parsing into two separate phases, which actually fits more with Markdown&#8217;s design philosophically.</p>

<ol>
<li><p>Block parsing consists of identifying the &#8220;type&#8221; of each line of the source text, and grouping the lines into blocks (e.g. paragraphs, lists, blockquotes, etc.) Some blocks are a single line (e.g. ATX headers), and others can be many lines long. The block parsing in <abbr title="MultiMarkdown">MMD</abbr> v6 is handled by a parser generated by <a href="http://www.hwaci.com/sw/lemon/">lemon</a>. This parser allows the block structure to be more readily understood by non-programmers, but the generated parser is still fast.</p></li>
-<li><p>Span parsing consists of identifying Markdown/<abbr title="MultiMarkdown">MMD</abbr> structures that occur inside of blocks, such as links, images, strong, emph, etc. Most of these structures require matching pairs of tokens to specify where the span starts and where it ends. Most of these spans allow arbitrary levels of nesting as well. This made parsing them correctly in the <a href="#gn:1" title="see glossary" class="glossary">PEG</a>-based code difficult and slow. <abbr title="MultiMarkdown">MMD</abbr> v6 uses a different approach that is accurate and has good performance characteristics even with edge cases. Basically, it keeps a stack of each &#8220;opening&#8221; token as it steps through the token chain. When a &#8220;closing&#8221; token is found, it is paired with the most recent appropriate opener on the stack. Any tokens in between the opener and closer are removed, as they are not able to be matched any more. To avoid unnecessary searches for non- existent openers, the parser keeps track of which opening tokens have been discovered. This allows the parser to continue moving forwards without having to go backwards and re-parse any previously visited tokens.</p></li>
+<li><p>Span parsing consists of identifying Markdown/<abbr title="MultiMarkdown">MMD</abbr> structures that occur inside of blocks, such as links, images, strong, emph, etc. Most of these structures require matching pairs of tokens to specify where the span starts and where it ends. Most of these spans allow arbitrary levels of nesting as well. This made parsing them correctly in the <a href="#gn_1" title="see glossary" class="glossary">PEG</a>-based code difficult and slow. <abbr title="MultiMarkdown">MMD</abbr> v6 uses a different approach that is accurate and has good performance characteristics even with edge cases. Basically, it keeps a stack of each &#8220;opening&#8221; token as it steps through the token chain. When a &#8220;closing&#8221; token is found, it is paired with the most recent appropriate opener on the stack. Any tokens in between the opener and closer are removed, as they are not able to be matched any more. To avoid unnecessary searches for non- existent openers, the parser keeps track of which opening tokens have been discovered. This allows the parser to continue moving forwards without having to go backwards and re-parse any previously visited tokens.</p></li>
</ol>

<p>The result of this redesigned <abbr title="MultiMarkdown">MMD</abbr> parser is that it can parse short documents more quickly than <a href="http://commonmark.org/">CommonMark</a>, and takes only 15% &#8211; 20% longer to parse long documents. I have not delved too deeply into this, but I presume that CommonMark has a bit more &#8220;set-up&#8221; time that becomes expensive when parsing a short document (e.g. a paragraph or two). But this cost becomes negligible when parsing longer documents (e.g. file sizes of 1 MB). So depending on your use case, CommonMark may well be faster than <abbr title="MultiMarkdown">MMD</abbr>, but we&#8217;re talking about splitting hairs here&#8230;. Recent comparisons show <abbr title="MultiMarkdown">MMD</abbr> v6 taking approximately 4.37 seconds to parse a 108 MB file (approximately 24.8 MB/second), and CommonMark took 3.72 seconds for the same file (29.2 MB/second). For comparison, <abbr title="MultiMarkdown">MMD</abbr> v5.4 took approximately 94 second for the same file (1.15 MB/second).</p>
@@ -91,7 +91,7 @@ <h3 id="parsetree">Parse Tree</h3>
<li><p>Use the resulting token tree for your own purposes.</p></li>
</ol>

-<p>The token tree (<a href="#gn:2" id="gnref:2" title="see glossary" class="glossary">AST</a>) includes starting offsets and length of each token, allowing you to use <abbr title="MultiMarkdown">MMD</abbr> as part of a syntax highlighter. <abbr title="MultiMarkdown">MMD</abbr> v5 did not have this functionality in the public version, in part because the <a href="#gn:1" title="see glossary" class="glossary">PEG</a> parsers used did not provide reliable offset positions, requiring a great deal of effort when I adapted <abbr title="MultiMarkdown">MMD</abbr> for use in <a href="http://multimarkdown.com/">MultiMarkdown Composer</a>.</p>
+<p>The token tree (<a href="#gn_2" id="gnref_2" title="see glossary" class="glossary">AST</a>) includes starting offsets and length of each token, allowing you to use <abbr title="MultiMarkdown">MMD</abbr> as part of a syntax highlighter. <abbr title="MultiMarkdown">MMD</abbr> v5 did not have this functionality in the public version, in part because the <a href="#gn_1" title="see glossary" class="glossary">PEG</a> parsers used did not provide reliable offset positions, requiring a great deal of effort when I adapted <abbr title="MultiMarkdown">MMD</abbr> for use in <a href="http://multimarkdown.com/">MultiMarkdown Composer</a>.</p>

<p>These steps are managed using the <code>mmd_engine</code> &#8220;object&#8221;. An individual <code>mmd_engine</code> cannot be used by multiple threads simultaneously, so if libMultiMarkdown is to be used in a multithreaded program, a separate <code>mmd_engine</code> should be created for each thread. Alternatively, just use the slightly more abstracted <code>mmd_convert_string()</code> function that handles creating and destroying the <code>mmd_engine</code> automatically.</p>

@@ -160,7 +160,7 @@ <h4 id="footnotes">Footnotes</h4>

<h4 id="glossaryterms">Glossary Terms</h4>

-<p>If there are terms in your document you wish to define in a <a href="#gn:3" id="gnref:3" title="see glossary" class="glossary">glossary</a> at the end of your document, you can define them using the glossary syntax.</p>
+<p>If there are terms in your document you wish to define in a <a href="#gn_3" id="gnref_3" title="see glossary" class="glossary">glossary</a> at the end of your document, you can define them using the glossary syntax.</p>

<p>Glossary terms can be specified using inline or reference syntax. The inline variant requires that the abbreviation be wrapped in parentheses and immediately follows the <code>?</code>.</p>

@@ -410,16 +410,16 @@ <h3 id="futuresteps">Future Steps</h3>
<hr />
<ol>

-<li id="gn:1">
-PEG: <p>Parsing Expression Grammar <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar">https://en.wikipedia.org/wiki/Parsing_expression_grammar</a> <a href="#gnref:1" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
+<li id="gn_1">
+PEG: <p>Parsing Expression Grammar <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar">https://en.wikipedia.org/wiki/Parsing_expression_grammar</a> <a href="#gnref_1" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
</li>

-<li id="gn:2">
-AST: <p>Abstract Syntax Tree <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">https://en.wikipedia.org/wiki/Abstract_syntax_tree</a> <a href="#gnref:2" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
+<li id="gn_2">
+AST: <p>Abstract Syntax Tree <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">https://en.wikipedia.org/wiki/Abstract_syntax_tree</a> <a href="#gnref_2" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
</li>

-<li id="gn:3">
-glossary: <p>The glossary collects information about important terms used in your document <a href="#gnref:3" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
+<li id="gn_3">
+glossary: <p>The glossary collects information about important terms used in your document <a href="#gnref_3" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
</li>

</ol>
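The span-parsing paragraph in the QuickStart context lines describes the pairing strategy only in prose. The following is a simplified C sketch of that idea, not MMD v6's actual code: the token kinds (`TOK_BRACKET_OPEN`, etc.), the `tok` struct, `pair_tokens()`, and the fixed `STACK_MAX` capacity are all hypothetical. Unmatched openers go on a stack; a closer pairs with the most recent opener of its kind, openers popped on the way are discarded, and a per-kind counter lets a closer with no possible opener be skipped without searching, so the pass never moves backwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration; MMD v6's real token types
   and structures are different. */
typedef enum {
    TOK_TEXT,
    TOK_BRACKET_OPEN,   /* [ */
    TOK_BRACKET_CLOSE,  /* ] */
    TOK_PAREN_OPEN,     /* ( */
    TOK_PAREN_CLOSE,    /* ) */
    TOK_KIND_COUNT
} tok_kind;

typedef struct {
    tok_kind kind;
    int mate;           /* index of the paired token, or -1 if unmatched */
} tok;

#define STACK_MAX 256   /* fixed capacity keeps the sketch short */

/* Opener kind that a closer pairs with; TOK_TEXT if k is not a closer. */
static tok_kind opener_for(tok_kind k) {
    switch (k) {
    case TOK_BRACKET_CLOSE: return TOK_BRACKET_OPEN;
    case TOK_PAREN_CLOSE:   return TOK_PAREN_OPEN;
    default:                return TOK_TEXT;
    }
}

static int is_opener(tok_kind k) {
    return k == TOK_BRACKET_OPEN || k == TOK_PAREN_OPEN;
}

/* One forward pass over the token chain. Unmatched openers sit on a stack;
   open_count[k] records how many openers of kind k are on it, so a closer
   whose opener never appeared is skipped without searching the stack. */
static void pair_tokens(tok *t, size_t n) {
    size_t stack[STACK_MAX];
    size_t top = 0;
    size_t open_count[TOK_KIND_COUNT] = {0};

    for (size_t i = 0; i < n; i++) {
        t[i].mate = -1;

        if (is_opener(t[i].kind)) {
            assert(top < STACK_MAX);
            stack[top++] = i;
            open_count[t[i].kind]++;
            continue;
        }

        tok_kind want = opener_for(t[i].kind);
        if (want == TOK_TEXT || open_count[want] == 0)
            continue;   /* plain text, or no opener of this kind to match */

        /* Pop back to the most recent matching opener. Openers popped on
           the way lie between the pair and can never be matched now. */
        while (top > 0) {
            size_t j = stack[--top];
            open_count[t[j].kind]--;
            if (t[j].kind == want) {
                t[j].mate = (int)i;
                t[i].mate = (int)j;
                break;
            }
        }
    }
}
```

For the chain `[ ( ] )`, the `]` pairs with the `[`, the `(` popped along the way is discarded, and the trailing `)` is skipped at once because no `(` remains on the stack.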
28 changes: 14 additions & 14 deletions src/html.c
@@ -838,7 +838,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
scratch->footnote_para_counter--;

if (scratch->footnote_para_counter == 0) {
-printf(" <a href=\"#cnref:%d\" title=\"%s\" class=\"reversecitation\">&#160;&#8617;&#xfe0e;</a>", scratch->citation_being_printed, LC("return to body"));
+printf(" <a href=\"#cnref_%d\" title=\"%s\" class=\"reversecitation\">&#160;&#8617;&#xfe0e;</a>", scratch->citation_being_printed, LC("return to body"));
}
}

@@ -853,15 +853,15 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
temp_short = rand() % 32000 + 1;
}

-printf(" <a href=\"#fnref:%d\" title=\"%s\" class=\"reversefootnote\">&#160;&#8617;&#xfe0e;</a>", temp_short, LC("return to body"));
+printf(" <a href=\"#fnref_%d\" title=\"%s\" class=\"reversefootnote\">&#160;&#8617;&#xfe0e;</a>", temp_short, LC("return to body"));
}
}

if (scratch->glossary_being_printed) {
scratch->footnote_para_counter--;

if (scratch->footnote_para_counter == 0) {
-printf(" <a href=\"#gnref:%d\" title=\"%s\" class=\"reverseglossary\">&#160;&#8617;&#xfe0e;</a>", scratch->glossary_being_printed, LC("return to body"));
+printf(" <a href=\"#gnref_%d\" title=\"%s\" class=\"reverseglossary\">&#160;&#8617;&#xfe0e;</a>", scratch->glossary_being_printed, LC("return to body"));
}
}

@@ -1575,23 +1575,23 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc

if (temp_short2 == scratch->used_citations->size) {
// This is a re-use of a previously used note
-printf("<a href=\"#cn:%d\" title=\"%s\" class=\"citation\">(%d)</a>",
+printf("<a href=\"#cn_%d\" title=\"%s\" class=\"citation\">(%d)</a>",
temp_short, LC("see citation"), temp_short);
} else {
// This is the first time this note was used
-printf("<a href=\"#cn:%d\" id=\"cnref:%d\" title=\"%s\" class=\"citation\">(%d)</a>",
+printf("<a href=\"#cn_%d\" id=\"cnref_%d\" title=\"%s\" class=\"citation\">(%d)</a>",
temp_short, temp_short, LC("see citation"), temp_short);
}
} else {
// Locator present

if (temp_short2 == scratch->used_citations->size) {
// This is a re-use of a previously used note
-printf("<a href=\"#cn:%d\" title=\"%s\" class=\"citation\">(%s, %d)</a>",
+printf("<a href=\"#cn_%d\" title=\"%s\" class=\"citation\">(%s, %d)</a>",
temp_short, LC("see citation"), temp_char, temp_short);
} else {
// This is the first time this note was used
-printf("<a href=\"#cn:%d\" id=\"cnref:%d\" title=\"%s\" class=\"citation\">(%s, %d)</a>",
+printf("<a href=\"#cn_%d\" id=\"cnref_%d\" title=\"%s\" class=\"citation\">(%s, %d)</a>",
temp_short, temp_short, LC("see citation"), temp_char, temp_short);
}
}
@@ -1638,7 +1638,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
temp_short3 = temp_short;
}

-printf("<a href=\"#fn:%d\" title=\"%s\" class=\"footnote\"><sup>%d</sup></a>",
+printf("<a href=\"#fn_%d\" title=\"%s\" class=\"footnote\"><sup>%d</sup></a>",
temp_short3, LC("see footnote"), temp_short);
} else {
// This is the first time this note was used
@@ -1650,7 +1650,7 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
temp_short3 = temp_short;
}

-printf("<a href=\"#fn:%d\" id=\"fnref:%d\" title=\"%s\" class=\"footnote\"><sup>%d</sup></a>",
+printf("<a href=\"#fn_%d\" id=\"fnref_%d\" title=\"%s\" class=\"footnote\"><sup>%d</sup></a>",
temp_short3, temp_short3, LC("see footnote"), temp_short);
}
} else {
@@ -1690,15 +1690,15 @@ void mmd_export_token_html(DString * out, const char * source, token * t, scratc
if (temp_short2 == scratch->used_glossaries->size) {
// This is a re-use of a previously used note

-printf("<a href=\"#gn:%d\" title=\"%s\" class=\"glossary\">",
+printf("<a href=\"#gn_%d\" title=\"%s\" class=\"glossary\">",
temp_short, LC("see glossary"));
mmd_print_string_html(out, temp_note->clean_text, false, true);
print_const("</a>");
} else {
// This is the first time this note was used


-printf("<a href=\"#gn:%d\" id=\"gnref:%d\" title=\"%s\" class=\"glossary\">",
+printf("<a href=\"#gn_%d\" id=\"gnref_%d\" title=\"%s\" class=\"glossary\">",
temp_short, temp_short, LC("see glossary"));
mmd_print_string_html(out, temp_note->clean_text, false, true);
print_const("</a>");
@@ -2479,7 +2479,7 @@ void mmd_export_footnote_list_html(DString * out, const char * source, scratch_p
// Export footnote
pad(out, 2, scratch);

-printf("<li id=\"fn:%d\">\n", i + 1);
+printf("<li id=\"fn_%d\">\n", i + 1);
scratch->padded = 6;

note = stack_peek_index(scratch->used_footnotes, i);
@@ -2527,7 +2527,7 @@ void mmd_export_glossary_list_html(DString * out, const char * source, scratch_p
// Export glossary
pad(out, 2, scratch);

-printf("<li id=\"gn:%d\">\n", i + 1);
+printf("<li id=\"gn_%d\">\n", i + 1);
scratch->padded = 6;

note = stack_peek_index(scratch->used_glossaries, i);
@@ -2580,7 +2580,7 @@ void mmd_export_citation_list_html(DString * out, const char * source, scratch_p
// Export footnote
pad(out, 2, scratch);

-printf("<li id=\"cn:%d\">\n", i + 1);
+printf("<li id=\"cn_%d\">\n", i + 1);
scratch->padded = 6;

note = stack_peek_index(scratch->used_citations, i);
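The html.c hunks switch every generated anchor from `kind:N` to `kind_N` (`fn`/`fnref`, `cn`/`cnref`, `gn`/`gnref`), and the in-text link, the list item, and the backlink must all change together or the links break. The diff does not state the motivation; one likely reason is that in a CSS selector `#fn:1` is read as the id `fn` followed by a pseudo-class, so styling or scripting these anchors needs the escaped form `#fn\:1`, while `#fn_1` works as written. The sketch below restates the new scheme; `note_ref`, `note_item`, `note_backlink`, and `uses_underscore_ids` are hypothetical helpers, not libMultiMarkdown API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not libMultiMarkdown functions. They mirror the
   format strings in the patched html.c: note N of a given kind ("fn",
   "cn", or "gn") lives at id "<kind>_N", and the first in-text reference
   to it carries id "<kind>ref_N" so the backlink can return to it. */

/* In-text reference, e.g. <a href="#fn_1" id="fnref_1"> */
static int note_ref(char *buf, size_t size, const char *kind, int n) {
    int len = snprintf(buf, size, "<a href=\"#%s_%d\" id=\"%sref_%d\">",
                       kind, n, kind, n);
    assert(len >= 0 && (size_t)len < size);   /* output was not truncated */
    return len;
}

/* Note list item, e.g. <li id="fn_1"> */
static int note_item(char *buf, size_t size, const char *kind, int n) {
    int len = snprintf(buf, size, "<li id=\"%s_%d\">", kind, n);
    assert(len >= 0 && (size_t)len < size);
    return len;
}

/* Backlink at the end of the note, e.g. <a href="#fnref_1"> */
static int note_backlink(char *buf, size_t size, const char *kind, int n) {
    int len = snprintf(buf, size, "<a href=\"#%sref_%d\">", kind, n);
    assert(len >= 0 && (size_t)len < size);
    return len;
}

/* Returns 1 if the markup contains no ':' (the old separator). */
static int uses_underscore_ids(const char *markup) {
    return strchr(markup, ':') == NULL;
}
```

Keeping the kind prefix and the number in one place means the link target and the list-item id cannot drift apart, which is the invariant the patch has to preserve across all three note types.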
12 changes: 6 additions & 6 deletions tests/MMD6Tests/Abbreviations.html
@@ -80,9 +80,9 @@ <h1 id="foobar">foo bar</h1>

<p>20</p>

-<p><a href="#fn:1" id="fnref:1" title="see footnote" class="footnote"><sup>1</sup></a></p>
+<p><a href="#fn_1" id="fnref_1" title="see footnote" class="footnote"><sup>1</sup></a></p>

-<p><a href="#fn:2" id="fnref:2" title="see footnote" class="footnote"><sup>2</sup></a></p>
+<p><a href="#fn_2" id="fnref_2" title="see footnote" class="footnote"><sup>2</sup></a></p>

<ul>
<li><abbr title="FOO">foo</abbr></li>
@@ -94,12 +94,12 @@ <h1 id="foobar">foo bar</h1>
<hr />
<ol>

-<li id="fn:1">
-<p><abbr title="FOO">foo</abbr> and <abbr title="BAR">bar</abbr> <a href="#fnref:1" title="return to body" class="reversefootnote">&#160;&#8617;&#xfe0e;</a></p>
+<li id="fn_1">
+<p><abbr title="FOO">foo</abbr> and <abbr title="BAR">bar</abbr> <a href="#fnref_1" title="return to body" class="reversefootnote">&#160;&#8617;&#xfe0e;</a></p>
</li>

-<li id="fn:2">
-<p>foo and bar <a href="#fnref:2" title="return to body" class="reversefootnote">&#160;&#8617;&#xfe0e;</a></p>
+<li id="fn_2">
+<p>foo and bar <a href="#fnref_2" title="return to body" class="reversefootnote">&#160;&#8617;&#xfe0e;</a></p>
</li>

</ol>