Add Cloudflare Markdown for Agents pre-check to fetch workflows by catchingknives · Pull Request #928 · danielmiessler/Personal_AI_Infrastructure

catchingknives · 2026-03-07T20:13:47Z

Idea: Free markdown before the tier chain even starts

Cloudflare launched Markdown for Agents in February 2026: sites behind Cloudflare (Pro+) can now serve native markdown via `Accept: text/markdown` content negotiation. Cloudflare's benchmark shows roughly 80% token savings (16,180 → 3,150 tokens on a typical blog post). Server-side conversion is more accurate than client-side HTML parsing. Non-Cloudflare sites simply ignore the header and return HTML, so the only cost of trying is one extra request.
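As a quick sanity check of the "~80%" figure, using only the two token counts from Cloudflare's benchmark:

$$
1 - \frac{3150}{16180} \approx 1 - 0.195 = 0.805 \quad (\approx 80.5\%\ \text{fewer tokens})
$$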

This PR adds a lightweight curl pre-check to FourTierScrape and Retrieve that probes for markdown before entering the existing tier/layer chain:

```bash
curl -sL -H "Accept: text/markdown" "[URL]" | head -5
```

If the body comes back as markdown (it starts with YAML frontmatter or a `#` heading rather than `<!DOCTYPE`), use it directly and skip the tier chain. If it comes back as HTML, proceed to Tier 1 as normal.
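A minimal sketch of that body-based check, assuming bash and curl. The function names `looks_like_markdown` and `markdown_precheck` are hypothetical (they are not taken from the workflow files), and the check encodes only the heuristic described here: frontmatter or a `#` heading means markdown; anything else, including an empty body, falls through to Tier 1.

```shell
#!/usr/bin/env bash
# Sketch of the markdown pre-check (function names are hypothetical).

# looks_like_markdown: read a response body on stdin; succeed (exit 0) if the
# first non-blank line is YAML frontmatter ("---") or an ATX heading ("#").
# Anything else, e.g. "<!DOCTYPE html>", or an empty body, fails (exit 1).
looks_like_markdown() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"                         # tolerate CRLF line endings
    [ -z "${line//[[:space:]]/}" ] && continue   # skip blank lines
    case "$line" in
      ---* | '#'*) return 0 ;;
      *) return 1 ;;
    esac
  done
  return 1
}

# markdown_precheck URL: fetch with Accept: text/markdown. On success, print the
# full body; on failure print nothing, so the caller falls through to Tier 1.
markdown_precheck() {
  local body
  body="$(curl -sfL -H 'Accept: text/markdown' "$1")" || return 1   # -f: HTTP errors count as a miss
  looks_like_markdown <<<"$body" || return 1
  printf '%s\n' "$body"
}
```

A caller might use it as `if md="$(markdown_precheck "$url")"; then ...; else ...; fi`, with the `else` branch entering Tier 1. Unlike piping to `head -5`, keeping the whole body in a variable means the markdown can be used directly without a second download.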

Why "pre-check" and not "Tier 0"

Deliberately framed as a preamble rather than a new numbered tier so the four-tier naming/numbering stays intact. No rename, no renumbering, just an additive optimization before the chain starts.

What changed

| File | Change |
|---|---|
| `Scraping/BrightData/Workflows/FourTierScrape.md` | Added a pre-check section before Tier 1, updated the decision diagram, updated the Tier 2 `Accept` header to prefer `text/markdown` |
| `Research/Workflows/Retrieve.md` | Added a pre-fetch note in the Layer 1 section |
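The PR does not spell out the exact Tier 2 header value. One way to express "prefer markdown, accept HTML" is with HTTP quality values (RFC 9110). The q-values and the names `TIER2_ACCEPT` and `tier2_fetch` below are illustrative assumptions, not the workflow's actual contents.

```shell
#!/usr/bin/env bash
# Illustrative Tier 2 Accept value (an assumption, not copied from the workflow):
# text/markdown carries the implicit q=1, so a server that can negotiate picks it;
# text/html, then anything else, remain acceptable fallbacks.
TIER2_ACCEPT='text/markdown, text/html;q=0.9, */*;q=0.8'

# tier2_fetch URL: print the response body, following redirects.
tier2_fetch() {
  curl -sL -H "Accept: ${TIER2_ACCEPT}" "$1"
}
```

A server without markdown support ignores `text/markdown` and serves HTML, which the later tiers already handle.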

Gotcha discovered during testing

Cloudflare's CDN currently returns `content-type: text/html` in the response headers even when the body is markdown (verified on blog.cloudflare.com). The `vary: accept` header confirms that content negotiation occurred. Detection should therefore check the body format rather than relying on the `content-type` header. This is noted in the workflow.
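A sketch of inspecting the negotiation headers separately from the body, assuming they were saved with curl's `-D FILE` option (for example `curl -sL -H 'Accept: text/markdown' -D headers.txt -o body.txt "$url"`). `header_value` and `vary_has_accept` are hypothetical helper names. Because `-L` makes curl write every redirect hop into the dump, only the final response block is read. These helpers are for logging or diagnostics; the markdown/HTML decision still comes from the body, since `content-type` may say `text/html`.

```shell
#!/usr/bin/env bash
# header_value FILE NAME: print NAME's value from the last response in a
# `curl -D` header dump (case-insensitive; repeated headers joined with ", ").
header_value() {
  awk -v want="$2" '
    BEGIN { want = tolower(want) }
    /^HTTP\// { val = "" }                  # new response block: forget earlier hops
    {
      sub(/\r$/, "")                        # curl dumps CRLF line endings
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == want) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)
        val = (val == "" ? v : val ", " v)
      }
    }
    END { print val }' "$1"
}

# vary_has_accept FILE: succeed if the final response's Vary lists the exact
# token "accept" (so "Accept-Encoding" alone does not count).
vary_has_accept() {
  header_value "$1" vary | tr '[:upper:]' '[:lower:]' | tr ',' '\n' | tr -d ' \t' | grep -qx 'accept'
}
```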

Tested against

- `blog.cloudflare.com/markdown-for-agents/` with `Accept: text/markdown` → body is clean markdown with YAML frontmatter ✓
- Same URL with `Accept: text/html` → body is raw HTML ✓
- `example.com` with `Accept: text/markdown` → body is HTML; graceful fallthrough ✓

Open to feedback on framing, placement, or whether this should live somewhere else entirely.

🤖 Generated with Claude Code

Adds a lightweight curl probe before the existing tier chain in FourTierScrape and before WebFetch in Retrieve. Sites behind Cloudflare (Pro+) with Markdown for Agents enabled serve native markdown via Accept: text/markdown content negotiation, saving ~80% of tokens. Non-Cloudflare sites ignore the header, so the only cost is one extra request. Framed as a pre-check rather than a new tier to preserve the existing four-tier naming and numbering. Also updates the Tier 2 Accept header to prefer text/markdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

catchingknives wants to merge 1 commit into danielmiessler:main from catchingknives:feature/cloudflare-markdown-content-negotiation
