Problem
When docs live at a subpath (e.g., https://swagger.io/docs/), discoverSitemapUrls() only checks:
- {origin}/robots.txt for Sitemap: directives
- {origin}/sitemap.xml as a fallback
If neither exists, discovery fails entirely, even when a valid sitemap is available at the docs URL path.
Example: Swagger UI docs at https://swagger.io/docs/ have a Starlight-generated sitemap at /docs/sitemap-index.xml (90 pages). But both robots.txt and /sitemap.xml return 404, so afdocs finds 0 pages and falls back to testing only the root URL.
Proposed fix
When the input URL has a path component (i.e., it's not at the origin root), add these as fallback candidates in discoverSitemapUrls() after the existing origin-level checks:
- {url}/sitemap.xml
- {url}/sitemap-index.xml
The sitemap-index.xml filename is the Astro/Starlight convention (Starlight v0.21.5+ generates this by default). Other SSGs may use sitemap.xml at the docs subpath.
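As a rough sketch of the proposed ordering, assume a hypothetical helper `sitemapCandidates()` that receives any Sitemap: URLs already parsed from robots.txt. This is not afdocs's actual code or signature; it only illustrates keeping the origin-level checks first and appending the subpath candidates when the URL is not at the origin root.

```typescript
// Hypothetical helper (not part of afdocs): orders sitemap candidates so the
// existing origin-level checks come first and subpath candidates are fallbacks.
function sitemapCandidates(
  inputUrl: string,
  robotsSitemaps: readonly string[] = [],
): string[] {
  const url = new URL(inputUrl);
  // Existing behavior: robots.txt Sitemap: directives, then {origin}/sitemap.xml.
  const candidates: string[] = [...robotsSitemaps, `${url.origin}/sitemap.xml`];

  // Only add subpath candidates when the URL has a path component.
  if (url.pathname !== "/") {
    // Ensure a trailing slash so new URL(name, base) appends to the docs path
    // instead of replacing its last segment (/docs -> /docs/sitemap.xml).
    const base = url.pathname.endsWith("/")
      ? `${url.origin}${url.pathname}`
      : `${url.origin}${url.pathname}/`;
    for (const name of ["sitemap.xml", "sitemap-index.xml"]) {
      candidates.push(new URL(name, base).toString());
    }
  }

  // Drop duplicates (e.g. a robots.txt directive that already points at one),
  // keeping the first occurrence so the original priority is preserved.
  return Array.from(new Set(candidates));
}
```

For https://swagger.io/docs/ with no robots.txt directives, this yields https://swagger.io/sitemap.xml, then https://swagger.io/docs/sitemap.xml, then https://swagger.io/docs/sitemap-index.xml. For an origin-root URL it returns only {origin}/sitemap.xml, so behavior for root-hosted docs would be unchanged.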
Note: fetchDocsSitemap() in llms-txt-freshness.ts already checks {baseUrl}/sitemap.xml for the freshness comparison, but that logic isn't used during primary page discovery. This would bring the same awareness to the discovery path.
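Because the subpath entries are only fallbacks, discovery can stop at the first candidate that resolves. A minimal probe might look like the following, assuming a hypothetical `firstReachableSitemap()` helper with an injected fetcher so it can be stubbed in tests; the name and shape are illustrative, not afdocs's API.

```typescript
// Minimal response shape the probe needs; a real fetch Response satisfies it.
type ProbeResponse = { ok: boolean };
type Fetcher = (url: string) => Promise<ProbeResponse>;

// Hypothetical helper: returns the first candidate URL that responds with a
// 2xx status, or null if none do. Candidates are tried strictly in order,
// so origin-level candidates placed first keep today's behavior.
async function firstReachableSitemap(
  candidates: readonly string[],
  fetcher: Fetcher,
): Promise<string | null> {
  for (const url of candidates) {
    try {
      const res = await fetcher(url);
      if (res.ok) return url;
    } catch {
      // Network errors (DNS, TLS, timeouts) are treated as "not found".
    }
  }
  return null;
}
```

In production the fetcher could simply be `(url) => fetch(url)`. A fuller version would probably also check that the body is XML, since some hosts answer unknown paths with a 200 HTML page.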
Affected sites
- Swagger UI (swagger.io/docs/): Astro Starlight, sitemap at /docs/sitemap-index.xml
- Potentially any Starlight-based docs site hosted at a subpath