
Add Cloudflare Markdown for Agents pre-check to fetch workflows #928

Open
catchingknives wants to merge 1 commit into danielmiessler:main from catchingknives:feature/cloudflare-markdown-content-negotiation

@catchingknives

Idea: Free markdown before the tier chain even starts

Cloudflare launched Markdown for Agents in February 2026: sites behind Cloudflare (Pro+) can now serve native markdown via Accept: text/markdown content negotiation. Their benchmark shows ~80% token savings (16,180 → 3,150 tokens on a typical blog post). Server-side conversion is more accurate than client-side HTML parsing. Non-Cloudflare sites simply ignore the header and return HTML, so the only cost of trying is one extra request before the normal chain runs.
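As a quick check on the headline number, using only the two token counts from Cloudflare's benchmark quoted above:

$$
\text{savings} = 1 - \frac{3{,}150}{16{,}180} \approx 1 - 0.195 = 0.805 \approx 80\%
$$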

This PR adds a lightweight curl pre-check to FourTierScrape and Retrieve that probes for markdown before entering the existing tier/layer chain:

```shell
curl -sL -H "Accept: text/markdown" "[URL]" | head -n 5
```

If the body comes back as markdown (it starts with YAML frontmatter or a # heading rather than <!DOCTYPE), use it directly and skip the tier chain. If it comes back as HTML, proceed to Tier 1 as normal.
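The detection rule above can be sketched in shell. This is an illustration rather than the workflow text in the PR: the function names looks_like_markdown and markdown_precheck are hypothetical, and the heuristic only inspects the first non-blank line, treating --- (frontmatter) or # (heading) as markdown and anything else, such as <!DOCTYPE html>, as HTML. It also adds curl's -f (--fail) flag, an assumption beyond the original probe, so that HTTP error pages fall through instead of being mistaken for content.

```shell
# Hypothetical sketch of the markdown pre-check (names are illustrative).

# Succeeds (returns 0) if the body on stdin looks like markdown: the first
# non-blank line starts with "---" (YAML frontmatter) or "#" (a heading).
# Anything else, e.g. "<!DOCTYPE html>", or an empty body, returns 1.
looks_like_markdown() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                            # tolerate CRLF line endings
    [[ -z ${line//[[:space:]]/} ]] && continue    # skip blank lines
    case $line in
      ---*|'#'*) return 0 ;;                      # frontmatter or heading
      *)         return 1 ;;                      # HTML or anything else
    esac
  done
  return 1                                        # empty body
}

# Probes the URL with "Accept: text/markdown". On a markdown body, prints it
# and returns 0 so the caller can skip the tier chain; otherwise returns 1 so
# the caller proceeds to Tier 1 (FourTierScrape) or WebFetch (Retrieve).
# -f (--fail) makes HTTP errors fall through instead of being used as content.
markdown_precheck() {
  local url=$1 body
  body=$(curl -fsL -H 'Accept: text/markdown' "$url") || return 1
  looks_like_markdown <<<"$body" || return 1
  printf '%s\n' "$body"
}
```

One design choice differs from the head -n 5 probe: the sketch keeps the whole body, so a markdown hit does not need a second request. Intended use would look like markdown_precheck "https://blog.cloudflare.com/markdown-for-agents/" || echo "fall through to Tier 1".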

Why "pre-check" and not "Tier 0"

Deliberately framed as a preamble rather than a new numbered tier so the four-tier naming/numbering stays intact. No rename, no renumbering, just an additive optimization before the chain starts.

What changed

| File | Change |
|---|---|
| Scraping/BrightData/Workflows/FourTierScrape.md | Added pre-check section before Tier 1, updated decision diagram, updated Tier 2 Accept header to prefer text/markdown |
| Research/Workflows/Retrieve.md | Added pre-fetch note in Layer 1 section |
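The exact Tier 2 header string isn't reproduced in this description. One conventional way to say "prefer markdown, accept HTML" is with HTTP quality values; the sketch below assumes that form, and TIER2_ACCEPT and tier2_fetch are hypothetical names. Whether a given server honors a multi-type Accept list like this (as opposed to Accept: text/markdown alone, which the pre-check sends) is server-specific, so treat it as an assumption to verify.

```shell
# Hypothetical Tier 2 Accept header: markdown preferred (implicit q=1),
# HTML as the fallback, anything else last. Servers that don't negotiate
# on Accept typically ignore this and return HTML as usual.
TIER2_ACCEPT='text/markdown, text/html;q=0.9, */*;q=0.8'

# Hypothetical Tier 2 fetch using that header; $1 is the URL.
tier2_fetch() {
  curl -fsL -H "Accept: ${TIER2_ACCEPT}" "$1"
}
```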

Gotcha discovered during testing

At the time of testing, Cloudflare's CDN returns content-type: text/html in the response header even when the body is markdown (verified on blog.cloudflare.com). The vary: accept header shows the response is negotiated on the Accept header, but content-type can't be used to tell which variant came back. Detection should therefore check the body format rather than rely on the content-type header. This is noted in the workflow.
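Because content-type can't be trusted here, a classifier can log the headers for diagnostics but decide on the body alone. A minimal sketch, assuming the response was saved with curl's -D (header dump) and -o (body) options; classify_response is a hypothetical name, and its Vary check is a rough regex rather than a full header parser. With -L, curl -D writes the headers of every response in a redirect chain, so a Vary match may come from an intermediate hop.

```shell
# Hypothetical helper: classify a saved response as "markdown" or "html".
#   $1 = header dump written by `curl -D`
#   $2 = body written by `curl -o`
# The verdict (stdout) depends only on the body. Content-Type and whether
# Vary lists "accept" go to stderr as diagnostics, since the CDN may send
# content-type: text/html even for a markdown body.
classify_response() {
  local headers=$1 body=$2 ctype first

  # Last Content-Type seen (the final hop when curl followed redirects).
  ctype=$(tr -d '\r' <"$headers" | grep -i '^content-type:' | tail -n 1)

  # Rough check for "accept" as its own token in a Vary header
  # (so "accept-encoding" alone does not count).
  if tr -d '\r' <"$headers" | grep -iqE '^vary:(.*[ ,])?accept[[:space:]]*(,|$)'; then
    echo "debug: ${ctype:-no content-type}; vary includes accept" >&2
  else
    echo "debug: ${ctype:-no content-type}; vary does not include accept" >&2
  fi

  # Decide on the first non-blank line of the body.
  first=$(tr -d '\r' <"$body" | grep -m 1 -v '^[[:space:]]*$')
  case $first in
    ---*|'#'*) echo markdown ;;   # YAML frontmatter or heading
    *)         echo html ;;       # <!DOCTYPE html>, <html>, or anything else
  esac
}
```

In practice it would be fed from a single request, for example curl -fsL -H "Accept: text/markdown" -D "$tmp/h" -o "$tmp/b" "$url" followed by classify_response "$tmp/h" "$tmp/b".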

Tested against

  • blog.cloudflare.com/markdown-for-agents/ with Accept: text/markdown → body is clean markdown with YAML frontmatter ✓
  • Same URL with Accept: text/html → body is raw HTML ✓
  • example.com with Accept: text/markdown → body is HTML, graceful fallthrough ✓

Open to feedback on framing, placement, or whether this should live somewhere else entirely.


🤖 Generated with Claude Code

Adds a lightweight curl probe before the existing tier chain in
FourTierScrape and before WebFetch in Retrieve. Sites behind
Cloudflare (Pro+) with Markdown for Agents enabled serve native
markdown via Accept: text/markdown content negotiation, saving
~80% of tokens. Non-Cloudflare sites ignore the header, so the
worst case is one extra request.

Framed as a pre-check rather than a new tier to preserve the
existing four-tier naming and numbering.

Also updates Tier 2 Accept header to prefer text/markdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>