Current behavior: The `markdown-content-parity` check assumes that any drift between markdown and HTML content means agents are getting outdated information and treats all divergence as a problem.
Problem: Content drift isn't always a deficiency. Some sites intentionally serve different content to different audiences — for example, using tags or other mechanisms to provide agent-optimized markdown alongside human-optimized HTML. In those cases, divergence is a feature, not a bug.
Desired behavior: The check should distinguish between two scenarios:
- Unintentional drift: markdown and HTML have fallen out of sync (the current assumption, and a real problem).
- Intentional audience segmentation: the site deliberately serves different content for agents vs. humans, e.g., through structured tags or metadata indicating the split is purposeful. For example, Fern does this with `<llms-only>` and `<llms-ignore>` tags.
I don't have a specific implementation in mind, but roughly: the check should pass if markdown == HTML or if there's some evidence the content is intentionally segmented by audience.
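A minimal TypeScript sketch of that rule, assuming both pages have already been reduced to plain text and that the raw sources are available to scan for markers. `checkContentParity`, `ParityVerdict`, and `SEGMENTATION_MARKERS` are hypothetical names, not part of the existing check. Fern's `<llms-only>`/`<llms-ignore>` tags are used as the only segmentation signal, and whether those tags actually survive in the content the check fetches is itself an assumption.

```typescript
// Hypothetical verdict type for a relaxed markdown-content-parity check.
type ParityVerdict =
  | { status: "pass"; reason: "identical" | "intentional-segmentation" }
  | { status: "fail"; reason: "unintentional-drift" };

// Assumed markers signalling a deliberate agent/human split (Fern's tags).
// Other sites might use different tags or page metadata instead.
const SEGMENTATION_MARKERS: readonly RegExp[] = [
  /<llms-only\b[^>]*>/i,
  /<llms-ignore\b[^>]*>/i,
];

// Collapse whitespace so formatting-only differences don't count as drift.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// True if any raw source (markdown or HTML) contains a segmentation marker.
function hasSegmentationEvidence(rawSources: readonly string[]): boolean {
  return rawSources.some((src) =>
    SEGMENTATION_MARKERS.some((marker) => marker.test(src)),
  );
}

// markdownText / htmlText: plain text already extracted from each version.
// rawSources: unprocessed markdown/HTML to scan for segmentation markers.
function checkContentParity(
  markdownText: string,
  htmlText: string,
  rawSources: readonly string[] = [],
): ParityVerdict {
  if (normalize(markdownText) === normalize(htmlText)) {
    return { status: "pass", reason: "identical" };
  }
  if (hasSegmentationEvidence(rawSources)) {
    return { status: "pass", reason: "intentional-segmentation" };
  }
  return { status: "fail", reason: "unintentional-drift" };
}
```

Whitespace-only differences count as parity here, and any other divergence without evidence of segmentation is still reported as drift, so the current behavior stays the default.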