Content negotiation check doesn't detect markdown-formatted error pages #29

@dacharyc

Problem

The content-negotiation check in markdown-availability can produce false positives when a site returns an error page formatted as markdown with HTTP 200 and Content-Type: text/markdown.

Found this while scoring Next.js. Requesting a page with Accept: text/markdown returns:

HTTP 200
Content-Type: text/markdown

# Page Not Found

The URL `/docs/llm-digest/app/getting-started/installation` does not exist.

## How to find the correct page
...

The check passes because:

  • Status is 200 (and the check never validates the status code anyway)
  • Content-Type is text/markdown
  • Body has markdown headings and links, so looksLikeMarkdown() returns true
  • Body has no HTML tags

This gave Next.js a falsely inflated markdown-availability score of 100 (A+), when content negotiation doesn't actually work for that site (only .md URL suffix does).
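
To make the failure mode concrete, here is a minimal sketch of a purely structural detector run against the error body above. The function `looksStructurallyLikeMarkdown` is a hypothetical stand-in written for this illustration. It is not the project's `looksLikeMarkdown()`, whose real logic may differ.

```typescript
// Hypothetical stand-in for a purely structural markdown detector.
// NOT the project's looksLikeMarkdown(); it only illustrates the failure mode.
function looksStructurallyLikeMarkdown(body: string): boolean {
  // A line starting with 1-6 '#' characters followed by text counts as a heading.
  const hasHeading = /^#{1,6}\s+\S/m.test(body);
  // Anything shaped like an opening or closing HTML tag counts as HTML.
  const hasHtmlTag = /<\/?[a-z][^>]*>/i.test(body);
  return hasHeading && !hasHtmlTag;
}

// The error page body quoted in this issue (truncated where the issue truncates it).
const errorBody = [
  "# Page Not Found",
  "",
  "The URL `/docs/llm-digest/app/getting-started/installation` does not exist.",
  "",
  "## How to find the correct page",
].join("\n");

// The body has headings and no HTML, so a structural check accepts it
// even though it is semantically an error page.
const passesStructuralCheck = looksStructurallyLikeMarkdown(errorBody);
console.log(passesStructuralCheck);
```

Any check that looks only at shape will accept this response, which is why the fix needs a semantic signal as well.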

Affected code

  • src/checks/markdown-availability/content-negotiation.ts (lines 46-69): classification logic never checks status code or body semantics
  • src/helpers/detect-markdown.ts: looksLikeMarkdown() is purely structural, not semantic
  • src/checks/markdown-availability/markdown-url-support.ts: shares the same vulnerability for .md suffix checks (though less likely to trigger in practice)

Secondary impact

The error page content gets cached in pageCache with source 'content-negotiation' (lines 52-66), which can poison downstream checks like markdown-content-parity.

Existing prior art

The http-status-codes check in url-stability already has a SOFT_404_PATTERNS regex:

const SOFT_404_PATTERNS = /not\s*found|page\s*not\s*found|404|does\s*not\s*exist/i;

This could be reused or adapted.
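
One possible adaptation, sketched below, is to match only the first heading instead of the whole body. The regex is copied from above. The helpers `firstHeading` and `looksLikeSoft404` are hypothetical names for this sketch and do not exist in the codebase as far as this issue shows. The motivation is that a whole-body scan would also flag a legitimate docs page that merely mentions "404".

```typescript
// Copied from the issue text; in the real codebase it lives in the http-status-codes check.
const SOFT_404_PATTERNS = /not\s*found|page\s*not\s*found|404|does\s*not\s*exist/i;

// Hypothetical helper: return the text of the first markdown heading, or "" if none.
function firstHeading(body: string): string {
  const match = /^#{1,6}\s+(.+)$/m.exec(body);
  return match ? match[1].trim() : "";
}

// Hypothetical helper: treat the page as a soft 404 only when its first heading matches.
function looksLikeSoft404(body: string): boolean {
  return SOFT_404_PATTERNS.test(firstHeading(body));
}

const errorPage = "# Page Not Found\n\nThe URL `/docs/x` does not exist.";
const legitPage = "# Custom error pages\n\nReturn a 404 status when a route is missing.";

console.log(looksLikeSoft404(errorPage));       // heading matches the pattern
console.log(looksLikeSoft404(legitPage));       // heading does not match
console.log(SOFT_404_PATTERNS.test(legitPage)); // whole-body scan matches on "404"
```

The trade-off is that a heading-only scan misses error pages whose heading is generic, so the whole-body scan may still be the better default if false negatives matter more.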

Suggested fix

Two complementary checks:

  1. Validate HTTP status code is 2xx before classifying as successful
  2. Scan body for error-page patterns (reuse SOFT_404_PATTERNS or similar) and reject matches
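
A rough sketch of how the two checks could combine. The `FetchedPage` shape, the `NegotiationResult` type, and `classifyNegotiation` are all assumptions made for this example. The actual classification code in content-negotiation.ts is not shown in this issue and likely has more outcomes than pass/fail.

```typescript
// Copied from the issue text above.
const SOFT_404_PATTERNS = /not\s*found|page\s*not\s*found|404|does\s*not\s*exist/i;

// Hypothetical types for this sketch only.
type NegotiationResult = "pass" | "fail";

interface FetchedPage {
  status: number;
  contentType: string;
  body: string;
}

function classifyNegotiation(page: FetchedPage): NegotiationResult {
  // Check 1: only 2xx responses can count as successful negotiation.
  const statusOk = page.status >= 200 && page.status < 300;
  // Content-Type may carry parameters such as "; charset=utf-8".
  const isMarkdownType = page.contentType.toLowerCase().startsWith("text/markdown");
  // Check 2: reject bodies that read like an error page, even with status 200.
  const isErrorPage = SOFT_404_PATTERNS.test(page.body);
  return statusOk && isMarkdownType && !isErrorPage ? "pass" : "fail";
}

// The Next.js response from this issue: 200 + text/markdown + error body.
const nextJsResponse: FetchedPage = {
  status: 200,
  contentType: "text/markdown",
  body: "# Page Not Found\n\nThe URL `/docs/llm-digest/app/getting-started/installation` does not exist.",
};

console.log(classifyNegotiation(nextJsResponse)); // "fail"
```

If the cache write to pageCache happened only on a "pass" result, the secondary impact described above would presumably go away too.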
