content-start-position: false positive on pages where content is a table (markdown, HTML, or HTML-in-md) #20

@Chris-ganta

Bug Description

The content-start-position check incorrectly reports a fail on pages
where real content immediately follows a heading but is in table format
rather than prose paragraphs.

Affected check

  • ID: content-start-position
  • Category: page-size
  • afdocs version: 0.9.1

Example page

https://developer.atlassian.com/platform/forge/limits-scheduled-trigger.md

Reported contentStartPercent: 95% (fail) — but the page has legitimate
content immediately after the first heading. The content is a limits table,
not navigation or boilerplate.

Root cause

In headingFollowedByContent(), a heading is only considered to have real
content following it if the next lines match prose patterns:

// Current prose detection:
t.length > NAV_MAX_LENGTH && linkDensity(t) < 0.5  // long non-link line
/[.!?]$/.test(t) && t.length >= 10                 // sentence with punctuation

This correctly filters nav/sidebar headings but incorrectly skips headings followed by tables, because:

  • Markdown table rows end with | rather than . ! or ?
  • HTML <table> tags don't match any prose pattern
  • Short table cells may be under NAV_MAX_LENGTH

So for a page like:

# Limits for Scheduled Triggers

| Trigger interval | Max executions |
|-----------------|----------------|
| Every 5 minutes | 50 per hour    |
The heading is skipped as if it were a nav heading, the algorithm keeps scanning, and reports a falsely high contentStartPercent.
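
The behaviour can be reproduced with a minimal sketch of the two prose patterns quoted above. The value of NAV_MAX_LENGTH, the body of linkDensity(), and the looksLikeProse() wrapper are assumptions made for illustration and are not taken from the afdocs source:

```javascript
// Assumed value; the real NAV_MAX_LENGTH in afdocs may differ.
const NAV_MAX_LENGTH = 80;

// Hypothetical stand-in for afdocs' linkDensity(): the share of characters
// on the line that belong to markdown links.
function linkDensity(line) {
  if (line.length === 0) return 0;
  const links = line.match(/\[[^\]]*\]\([^)]*\)/g) ?? [];
  const linkChars = links.reduce((sum, link) => sum + link.length, 0);
  return linkChars / line.length;
}

// Hypothetical wrapper: the two quoted prose patterns, combined with OR.
function looksLikeProse(line) {
  const t = line.trim();
  const longNonLinkLine = t.length > NAV_MAX_LENGTH && linkDensity(t) < 0.5;
  const sentence = /[.!?]$/.test(t) && t.length >= 10;
  return longNonLinkLine || sentence;
}

// The table from the example page: no row ends in . ! or ?, and every row
// is shorter than the assumed NAV_MAX_LENGTH.
const tableLines = [
  "| Trigger interval | Max executions |",
  "|-----------------|----------------|",
  "| Every 5 minutes | 50 per hour    |",
];
const proseResults = tableLines.map(looksLikeProse);
console.log(proseResults);
```

Under these assumptions no table row counts as prose, so the heading above the table would be treated like a nav heading.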

Three affected scenarios

1. Raw .md files with markdown tables

# Page Title
| Col A | Col B |
|-------|-------|
| val   | val   |
Table rows end with |, so they are not matched by prose detection.

2. HTML pages with HTML tables

<h1>Page Title</h1>
<table><tr><th>Col A</th><th>Col B</th></tr></table>
After htmlToMarkdown() conversion this becomes a markdown table, so it hits the same issue.

3. .md files containing raw HTML tables

# Page Title

<table>
  <tr><th>Col A</th><th>Col B</th></tr>
  <tr><td>val</td><td>val</td></tr>
</table>
Since .md files skip htmlToMarkdown() conversion (the raw body is used directly), the <table> HTML tags are passed to findContentStart() as-is and are not matched by any existing pattern.

This third case is common in documentation sites that use HTML tables inside markdown files for complex layouts (e.g. developer.atlassian.com).

Suggested fix

Add table detection to headingFollowedByContent() alongside the existing prose checks:

// Markdown table row
if (/^\|.+\|/.test(t)) return true;

// HTML table tag inside .md files (not converted by htmlToMarkdown)
if (/^<table[\s>]/i.test(t)) return true;

// HTML table row inside .md files
if (/^<tr[\s>]/i.test(t)) return true;
Since findContentStart() receives the same kind of text in every case (markdown converted from HTML, or the raw .md body), this single fix covers all three scenarios.
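
As a rough check, the three suggested patterns can be wrapped in a small helper and run against lines from each scenario. The helper name looksLikeTable() and the sample lines are hypothetical, and the trim() call assumes that `t` in headingFollowedByContent() is already a trimmed line:

```javascript
// Hypothetical helper wrapping the three patterns from the suggested fix.
function looksLikeTable(line) {
  const t = line.trim(); // assumption: the real code tests trimmed lines

  // Markdown table row (also matches the |---|---| separator row)
  if (/^\|.+\|/.test(t)) return true;

  // HTML table tag inside .md files (not converted by htmlToMarkdown)
  if (/^<table[\s>]/i.test(t)) return true;

  // HTML table row inside .md files
  if (/^<tr[\s>]/i.test(t)) return true;

  return false;
}

// One representative line per scenario, plus two lines that should not match.
const samples = [
  "| Col A | Col B |",                        // scenarios 1 and 2
  "|-------|-------|",                        // separator row
  "<table>",                                  // scenario 3
  "  <tr><th>Col A</th><th>Col B</th></tr>",  // scenario 3, indented row
  "[Home](/) | [Docs](/docs)",                // nav-style line
  "Getting started",                          // short heading-like text
];
const tableResults = samples.map(looksLikeTable);
console.log(tableResults);
```

One design point worth checking in review: the markdown pattern accepts any line that starts with | and contains a second |, so a pipe-delimited nav line that begins with | would also count as content.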

Impact

•  False fail results on legitimate reference/API documentation pages
•  Affects any docs site that uses tables for reference content (API limits, config options, CLI flags, pricing tables, etc.)
•  Particularly severe for .md files containing HTML tables — a common pattern in documentation platforms
•  Misleads developers into thinking their content structure is wrong when it is perfectly valid
