Bug Description
The content-start-position check incorrectly reports a fail on pages
where real content immediately follows a heading but is in table format
rather than prose paragraphs.
Affected check
- ID: content-start-position
- Category: page-size
- afdocs version: 0.9.1
Example page
https://developer.atlassian.com/platform/forge/limits-scheduled-trigger.md
Reported contentStartPercent: 95% (fail) — but the page has legitimate
content immediately after the first heading. The content is a limits table,
not navigation or boilerplate.
Root cause
In headingFollowedByContent(), a heading is only considered to have real
content following it if the next lines match prose patterns:
```ts
// Current prose detection:
t.length > NAV_MAX_LENGTH && linkDensity(t) < 0.5 // long non-link line
/[.!?]$/.test(t) && t.length >= 10 // sentence with punctuation
```
This correctly filters nav/sidebar headings but incorrectly skips headings followed by tables, because:
• Markdown table rows end with |, not with ., ! or ?
• HTML <table> tags don't match any prose pattern
• Short table rows may be shorter than NAV_MAX_LENGTH, so they also fail the long-line check
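To make the mismatch concrete, here is a minimal TypeScript sketch of the two prose conditions quoted above, applied to table lines. `NAV_MAX_LENGTH = 60` and the body of `linkDensity()` are hypothetical stand-ins (the real definitions live inside afdocs and may differ), so this illustrates the logic rather than reproducing the actual implementation.

```typescript
// Hypothetical stand-ins: the real NAV_MAX_LENGTH and linkDensity() are
// defined inside afdocs and may differ.
const NAV_MAX_LENGTH = 60;

// Fraction of the line's characters that sit inside markdown links [text](url).
function linkDensity(t: string): number {
  if (t.length === 0) return 0;
  const linked = (t.match(/\[[^\]]*\]\([^)]*\)/g) ?? []).join("").length;
  return linked / t.length;
}

// The two prose conditions quoted in the issue, combined with ||.
function looksLikeProse(t: string): boolean {
  return (
    (t.length > NAV_MAX_LENGTH && linkDensity(t) < 0.5) || // long non-link line
    (/[.!?]$/.test(t) && t.length >= 10) // sentence with punctuation
  );
}

const samples = [
  "| Trigger interval | Max executions |", // markdown header row
  "|-----|-----|", // delimiter row
  "| Every 5 minutes | 50 per hour |", // data row
  "<table>", // raw HTML table tag
  "This page lists the limits that apply to scheduled triggers.", // ordinary prose
];

// Prints false for the four table-related lines and true for the prose sentence.
for (const s of samples) {
  console.log(looksLikeProse(s), s);
}
```

Every table line is short and ends in `|` or `>`, so neither condition can fire, no matter how obviously it is content.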
So for a page like:
```markdown
# Limits for Scheduled Triggers
| Trigger interval | Max executions |
|-----------------|----------------|
| Every 5 minutes | 50 per hour |
```
The heading is skipped as if it were a nav heading, the algorithm keeps scanning, and reports a falsely high contentStartPercent.
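The scanning behaviour can be modelled in a few lines. This is a simplified, hypothetical model, not the afdocs source: it assumes `headingFollowedByContent()` inspects the first non-blank line after a heading, that `findContentStart()` returns the character offset of the first heading that passes, and that `contentStartPercent` is that offset as a percentage of the document length. The sample page appends a hypothetical footer heading so the scan has somewhere to land.

```typescript
// Simplified model of the scan; names mirror the issue, but the logic is an assumption.
const NAV_MAX_LENGTH = 60; // hypothetical value

function looksLikeProse(t: string): boolean {
  // Link density is omitted here (treated as 0) to keep the model short.
  return t.length > NAV_MAX_LENGTH || (/[.!?]$/.test(t) && t.length >= 10);
}

// Does the first non-blank line after lines[i] look like real content?
function headingFollowedByContent(lines: string[], i: number): boolean {
  for (let j = i + 1; j < lines.length; j++) {
    const t = lines[j].trim();
    if (t === "") continue;
    return looksLikeProse(t);
  }
  return false;
}

// Character offset of the first heading followed by content, or the text length if none.
function findContentStart(text: string): number {
  const lines = text.split("\n");
  let offset = 0;
  for (let i = 0; i < lines.length; i++) {
    if (/^#{1,6}\s/.test(lines[i]) && headingFollowedByContent(lines, i)) return offset;
    offset += lines[i].length + 1; // +1 for the newline removed by split
  }
  return text.length;
}

function contentStartPercent(text: string): number {
  return Math.round((findContentStart(text) / text.length) * 100);
}

const page = [
  "# Limits for Scheduled Triggers",
  "| Trigger interval | Max executions |",
  "|-----|-----|",
  "| Every 5 minutes | 50 per hour |",
  "## Feedback", // hypothetical footer heading
  "Was this page helpful?",
].join("\n");

console.log(contentStartPercent(page));
```

Under these assumptions the scan skips the title heading and lands on the footer heading, so the score is high even though the limits table starts on the very next line.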
Three affected scenarios
1. Raw .md files with markdown tables
```markdown
# Page Title
| Col A | Col B |
|-------|-------|
| val | val |
```
Table rows end with | — not matched by prose detection. ❌
2. HTML pages with HTML tables
```html
<h1>Page Title</h1>
<table><tr><th>Col A</th><th>Col B</th></tr></table>
```
After htmlToMarkdown() conversion → becomes markdown table → same issue. ❌
3. .md files containing raw HTML tables
```markdown
# Page Title
<table>
<tr><th>Col A</th><th>Col B</th></tr>
<tr><td>val</td><td>val</td></tr>
</table>
```
Since .md files skip htmlToMarkdown() conversion (raw body used directly), the <table> HTML tags are passed to findContentStart() as-is — not matched by any existing pattern. ❌
This third case is common in documentation sites that use HTML tables inside markdown files for complex layouts (e.g. developer.atlassian.com).
Suggested fix
Add table detection to headingFollowedByContent() alongside the existing prose checks:
```ts
// Markdown table row
if (/^\|.+\|/.test(t)) return true;
// HTML table tag inside .md files (not converted by htmlToMarkdown)
if (/^<table[\s>]/i.test(t)) return true;
// HTML table row inside .md files
if (/^<tr[\s>]/i.test(t)) return true;
```
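findContentStart() receives either converted markdown (HTML pages) or the raw body (.md files). Matching markdown table rows covers scenarios 1 and 2, and matching raw <table>/<tr> tags covers scenario 3, so this single change fixes all three.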
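A minimal sketch of the patched predicate, assuming the proposed table checks are simply OR-ed with the existing prose conditions. `NAV_MAX_LENGTH` and `linkDensity()` are again hypothetical stand-ins, and the scenario 2 line is an assumed example of what `htmlToMarkdown()` emits for a table, not its verified output.

```typescript
// Hypothetical stand-ins for values defined inside afdocs.
const NAV_MAX_LENGTH = 60;

function linkDensity(t: string): number {
  if (t.length === 0) return 0;
  const linked = (t.match(/\[[^\]]*\]\([^)]*\)/g) ?? []).join("").length;
  return linked / t.length;
}

// Proposed table checks first, then the existing prose conditions.
function looksLikeContent(t: string): boolean {
  if (/^\|.+\|/.test(t)) return true; // markdown table row
  if (/^<table[\s>]/i.test(t)) return true; // raw HTML <table> in a .md file
  if (/^<tr[\s>]/i.test(t)) return true; // raw HTML <tr> in a .md file
  if (t.length > NAV_MAX_LENGTH && linkDensity(t) < 0.5) return true; // long non-link line
  return /[.!?]$/.test(t) && t.length >= 10; // sentence with punctuation
}

// First line after the heading in each scenario.
const scenarios: Array<[string, string]> = [
  ["1: .md with markdown table", "| Col A | Col B |"],
  ["2: HTML table after htmlToMarkdown() (assumed output)", "| Col A | Col B |"],
  ["3: .md with raw HTML table", "<table>"],
];

for (const [scenario, line] of scenarios) {
  console.log(scenario, looksLikeContent(line)); // true for all three
}

// Nav-style lines are still rejected: short and not ending in sentence punctuation.
console.log(looksLikeContent("- [Overview](/overview)")); // false
```

The `[\s>]` suffix keeps tags such as `<track>` or `<tbody>` from matching the `<tr` check by accident.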
Impact
• False fail results on legitimate reference/API documentation pages
• Affects any docs site that uses tables for reference content (API limits, config options, CLI flags, pricing tables, etc.)
• Particularly severe for .md files containing HTML tables — a common pattern in documentation platforms
• Misleads developers into thinking their content structure is wrong when it is perfectly valid