ci: deploy docs via GitHub Actions to py.sdk.modelcontextprotocol.io #2632

Closed
maxisbey wants to merge 1 commit into main from docs-deploy-github-pages-actions

@maxisbey
Contributor

Motivation and Context

Fixes #2614. The Python SDK docs link on https://modelcontextprotocol.io/docs/sdk
points to https://py.sdk.modelcontextprotocol.io/, which returns 404. Every
other SDK serves correctly at its *.sdk.modelcontextprotocol.io domain.

The DNS is already fully configured in modelcontextprotocol/dns (py.sdk
CNAME → modelcontextprotocol.github.io, plus the GitHub Pages domain
verification TXT records) and resolves correctly. The 404 is entirely a
Pages-side gap: no Pages site claims the custom domain
(gh api repos/modelcontextprotocol/python-sdk/pages shows cname: null).

The root cause is the deployment mechanism. Docs are published with
mkdocs gh-deploy --force, which force-pushes the built site to the
gh-pages branch. With branch-based Pages, a custom domain only works if a
CNAME file is present in that branch — and a force-push overwrites it on
every publish. This is a known wart of the gh-deploy model.

This PR migrates to the GitHub Actions Pages artifact pipeline
(configure-pages → upload-pages-artifact → deploy-pages), which is
GitHub's recommended modern model and what the TypeScript SDK already uses:

  • The custom domain becomes a persistent repository Pages setting, not a
    file that gets overwritten — it survives every deploy.
  • Drops contents: write (a bot pushing commits) in favour of scoped,
    short-lived pages: write + id-token: write (OIDC).
  • No generated HTML accumulating in a gh-pages branch; atomic deploys
    with a real github-pages environment.
  • Deploys continuously on push to main instead of only via manual
    workflow_dispatch, so docs no longer drift from main
    (workflow_dispatch is retained as a manual trigger).

site_url is also updated to https://py.sdk.modelcontextprotocol.io/ so
canonical links and sitemap.xml match the published domain. This subsumes
and supersedes #2615 (which changed only site_url — necessary but not
sufficient on its own).

Prior art: this mirrors the TypeScript SDK's docs deployment
(modelcontextprotocol/typescript-sdk#1109 introduced GitHub Pages
deployment; modelcontextprotocol/typescript-sdk#1584 moved it to the
artifact-based deploy-pages model used here).

Maintainer action required after merge

The workflow change alone does not flip the Pages mechanism — a repo admin
must do the following one-time settings change (the manual-publish path
required a maintainer too, so this adds no new gate). Recommended order to
avoid a broken intermediate state:

  1. Settings → Pages → Source: GitHub Actions (switches build_type
    from legacy to workflow).
  2. Merge this PR (or run the workflow via workflow_dispatch) and confirm
    the first Actions deployment succeeds.
  3. Settings → Pages → Custom domain: py.sdk.modelcontextprotocol.io,
    Save. (Equivalently:
    gh api -X PUT repos/modelcontextprotocol/python-sdk/pages -f cname=py.sdk.modelcontextprotocol.io.)
    DNS is already verified, so this is near-instant.
  4. Once the certificate is issued, tick Enforce HTTPS.

No DNS changes are needed. After step 3 the custom domain persists across
all future deploys. The old modelcontextprotocol.github.io/python-sdk/
path keeps working — GitHub 301-redirects it to the custom domain.

Verify:

gh api repos/modelcontextprotocol/python-sdk/pages --jq '{build_type,cname,https_enforced}'
# expect: {"build_type":"workflow","cname":"py.sdk.modelcontextprotocol.io","https_enforced":true}
curl -sI https://py.sdk.modelcontextprotocol.io/   # expect HTTP/2 200

The stale gh-pages branch can be deleted once the custom domain is
confirmed serving.
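
The verification above can also be scripted. This is a minimal sketch, assuming the JSON returned by `gh api repos/modelcontextprotocol/python-sdk/pages` has already been parsed into a dict. The names `EXPECTED_PAGES_STATE` and `pages_mismatches` are hypothetical helpers for illustration, not part of this PR or of any library.

```python
# Expected Pages settings once the maintainer checklist is complete, taken
# from the "expect" comment in the verification commands above.
EXPECTED_PAGES_STATE = {
    "build_type": "workflow",
    "cname": "py.sdk.modelcontextprotocol.io",
    "https_enforced": True,
}


def pages_mismatches(actual, expected=EXPECTED_PAGES_STATE):
    """Return {field: (actual_value, expected_value)} for each field that differs."""
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    }


# State described in this PR before any change: legacy build, no custom domain.
# https_enforced is not stated in the PR, so it is omitted and reported as None.
before = {"build_type": "legacy", "cname": None}
print(pages_mismatches(before))
```

An empty result means all three settings match, so a non-empty dict can be used as a failing condition in a follow-up check.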

How Has This Been Tested?

uv run --frozen --no-sync mkdocs build was run locally: it builds in
strict mode, outputs to site/ (matching the artifact path:), and the
generated index.html canonical tag and sitemap.xml correctly use
https://py.sdk.modelcontextprotocol.io/. The Pages source/custom-domain
toggle is a repo setting and is covered in the maintainer checklist above.
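
The canonical-tag check described above could be automated along these lines. This is a sketch only: it uses the standard-library `html.parser`, the `CanonicalFinder` class and `canonical_urls` function are hypothetical names, and the `sample` string is a stand-in for the built `site/index.html` rather than real build output.

```python
from html.parser import HTMLParser

SITE_URL = "https://py.sdk.modelcontextprotocol.io/"


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical":
            self.hrefs.append(attr.get("href"))


def canonical_urls(html_text):
    """Return the canonical URLs declared in an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.hrefs


# Hypothetical stand-in for the generated index.html.
sample = (
    '<html><head><link rel="canonical" '
    'href="https://py.sdk.modelcontextprotocol.io/"></head></html>'
)
assert all(url.startswith(SITE_URL) for url in canonical_urls(sample))
```

Against a real build, the same function could be fed the text of `site/index.html` after running `mkdocs build`.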

Breaking Changes

No. User-facing docs URLs are unchanged (the new domain is where the docs
were always meant to be served; the old github.io path redirects).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Pages actions are pinned to commit SHAs to match the repository's existing
workflow convention.

AI Disclaimer

Replace the manual gh-deploy workflow with the GitHub Actions Pages
artifact pipeline (configure-pages -> upload-pages-artifact ->
deploy-pages) and deploy on push to main.

The previous workflow pushed the built site to the gh-pages branch with
mkdocs gh-deploy --force. With branch-based Pages, a custom domain only
works via a CNAME file in the branch, which a force-push overwrites on
every publish. Switching to artifact-based deployment lets the custom
domain live as a persistent repository Pages setting instead, so it
survives every deploy, and drops the contents: write permission in
favour of the scoped pages: write and id-token: write permissions.

Also point site_url at https://py.sdk.modelcontextprotocol.io/ so
canonical links and the sitemap match the SDK's published domain.

Closes #2614.
Comment on lines +9 to +11
concurrency:
  group: deploy-docs
  cancel-in-progress: true

🟡 GitHub's official Pages starter workflows set cancel-in-progress: false for the deploy concurrency group, with the rationale that production Pages deployments should be allowed to complete rather than be cancelled mid-flight — and the TypeScript SDK workflow this PR cites as prior art does the same. Consider flipping this to false so a second push to main queues behind an in-progress deploy instead of cancelling it.

Extended reasoning...

What it is. The new deploy-docs.yml declares a concurrency group with cancel-in-progress: true. GitHub's own Pages starter workflows (e.g. actions/starter-workflows/pages/mkdocs.yml, static.yml, jekyll.yml) all set this to false, with an explicit inline comment: "Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. However, do NOT cancel in-progress runs as we want to allow these production deployments to complete." The TypeScript SDK workflow that this PR's description cites as prior art (modelcontextprotocol/typescript-sdk#1584) also follows the false pattern.

The code path. With cancel-in-progress: true, two pushes to main in quick succession will cause the second run to cancel the first. If the first run is already inside the actions/deploy-pages step — after the artifact has been registered with the Pages API but before the deployment has been promoted — the cancellation aborts a production deployment mid-flight.

Why nothing else prevents it. The concurrency group correctly serialises runs (only one at a time), but cancel-in-progress: true overrides the default queueing behaviour. There's no other guard in the workflow that distinguishes "safe to cancel" (still building the site) from "don't cancel" (already calling the Pages deploy API).

Impact. In practice the blast radius is small: GitHub Pages deployments are largely atomic on the server side, so the most likely outcome of a cancelled deploy is that the previous version keeps serving, not a half-applied site. The newer push will deploy strictly newer content anyway. The realistic bad case is narrow — deploy A is cancelled, then deploy B fails for an unrelated reason, leaving you with no successful deploy when A would have landed. Still, it deviates from both GitHub's documented guidance and the prior art the PR explicitly follows.

Step-by-step example.

  1. Commit X is pushed to main → run A starts, builds the site, reaches actions/deploy-pages.
  2. Commit Y is pushed to main 30 seconds later → run B starts in the deploy-docs concurrency group.
  3. Because cancel-in-progress: true, GitHub cancels run A mid-way through the deploy-pages step.
  4. Run B builds and deploys commit Y. If B succeeds, all is well. If B fails (transient runner issue, broken build, etc.), the docs are now stale and there is no record of A having succeeded — even though A had already finished building and was deploying.

Fix. One-line change:

concurrency:
  group: deploy-docs
  cancel-in-progress: false

This matches GitHub's starter templates and the TS SDK workflow: the in-flight deploy is allowed to finish, intermediate queued runs are skipped, and only the latest queued run is kept.
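
To make the difference between the two settings concrete, here is a toy model of a single concurrency group. It is a sketch under simplifying assumptions, not GitHub's implementation: the group holds at most one running and one pending run, every run that is allowed to finish succeeds, and the function name `simulate` and the event format are hypothetical.

```python
def simulate(events, cancel_in_progress):
    """Replay events against one concurrency group.

    Each event is either the string "finish" (the running run completes)
    or a run name (a new run is queued by a push).
    Returns {run_name: "completed" | "cancelled"}.
    """
    running = None  # run currently executing in the group
    pending = None  # at most one run waits behind it
    outcome = {}

    for event in events:
        if event == "finish":
            if running is not None:
                outcome[running] = "completed"
            # The pending run, if any, starts next.
            running, pending = pending, None
        elif running is None:
            running = event  # group is idle: start immediately
        elif cancel_in_progress:
            outcome[running] = "cancelled"  # in-flight run is aborted
            running = event
        else:
            if pending is not None:
                outcome[pending] = "cancelled"  # superseded while waiting
            pending = event
    return outcome


# Two pushes in quick succession, as in the step-by-step example above.
pushes = ["A", "B", "finish", "finish"]
print(simulate(pushes, cancel_in_progress=True))   # {'A': 'cancelled', 'B': 'completed'}
print(simulate(pushes, cancel_in_progress=False))  # {'A': 'completed', 'B': 'completed'}
```

With `cancel_in_progress=False`, only runs that were still waiting are ever dropped, which is the behaviour the suggested fix is after.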

@maxisbey closed this May 18, 2026

Development

Successfully merging this pull request may close these issues.

Broken link to Python SDK docs from MCP docs

1 participant