
fix(source_status): use REST API for SHA lookup to support SAML-protected private repos #173

Open
hsperker wants to merge 1 commit into microsoft:main from hsperker:fix/saml-private-repo-update-check


hsperker commented May 8, 2026

Fixes amplifier update failing with Unexpected error checking {module}: ValueError: Could not extract commit SHA from Atom feed for any module hosted in a SAML-enforced private GitHub org.

Repro

  1. Have an Amplifier bundle that references a module in a SAML-enforced private org (e.g. git+https://github.com/hsperker/amplifier-module-hooks-langfuse@main)
  2. gh auth login with a token that has repo scope but is not SSO-authorized for that org (the common case — SSO auth has to be granted explicitly per token)
  3. amplifier update

Root cause

_get_github_commit_sha calls the web atom feed at github.com/{owner}/{repo}/commits/{ref}.atom. This endpoint authenticates via session cookie, not Bearer token — the PAT in Authorization: Bearer ... is silently ignored. For a private repo accessed without a valid session, GitHub returns:

HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 0

(no redirect, no 401, no 404). response.raise_for_status() passes, the SHA regex finds no match, and ValueError("Could not extract commit SHA from Atom feed: ...") is raised. That ValueError falls through the carefully-crafted HTTPStatusError handler in _check_all_cached_modules (added in #100) and ends up in the generic except Exception branch, surfacing as Unexpected error checking ... instead of the friendly "private repo or rate limited" message #100 was meant to provide.
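
The fall-through described above can be illustrated with a small stdlib-only sketch. Everything here is hypothetical and is not the project's actual code: `FakeResponse` and the local `HTTPStatusError` are stand-ins for the HTTP client's types, `extract_sha` and `check_module` are invented names, and the regex is only an assumption about what the Atom parser looks for.

```python
import re


class HTTPStatusError(Exception):
    """Stand-in for the HTTP client's status error (hypothetical)."""


class FakeResponse:
    """Minimal stand-in for an HTTP response object."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text

    def raise_for_status(self) -> None:
        # Only 4xx/5xx responses raise; an empty 200 passes silently.
        if self.status_code >= 400:
            raise HTTPStatusError(f"HTTP {self.status_code}")


# Assumed pattern: a 40-hex-digit SHA following "Commit/" in an entry id.
SHA_RE = re.compile(r"Commit/([0-9a-f]{40})")


def extract_sha(response: FakeResponse) -> str:
    response.raise_for_status()
    match = SHA_RE.search(response.text)
    if match is None:
        # An empty body lands here, not in the status-error path.
        raise ValueError("Could not extract commit SHA from Atom feed")
    return match.group(1)


def check_module(response: FakeResponse) -> str:
    try:
        return extract_sha(response)
    except HTTPStatusError:
        return "private repo or rate limited"
    except Exception as exc:
        return f"Unexpected error checking module: {type(exc).__name__}: {exc}"


empty_200 = check_module(FakeResponse(200, ""))
forbidden = check_module(FakeResponse(403, ""))
```

With the empty 200, `empty_200` carries the generic "Unexpected error" text, while a real 403 in `forbidden` reaches the friendly message. Only the second case is what the status-error handler was written for.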

PR #100 added auth headers on the assumption GitHub would return 401/403/404 for unauthorized atom requests. It does not — the empty-200 path was missed because the test suite mocked atom XML responses and never exercised the SAML/private-repo case.

Fix

Switch _get_github_commit_sha to the REST API:

https://api.github.com/repos/{owner}/{repo}/commits/{ref}

Properties:

  • Honors Authorization: Bearer ... (it is the canonical PAT auth surface)
  • Returns proper 401/403/404 for unauthorized requests, so the existing HTTPStatusError handler from #100 fires correctly
  • Same 5,000 req/hr authenticated rate limit
  • Already used elsewhere in the same file (_get_commit_details), so no new auth or dependency surface
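
A minimal sketch of a REST-based lookup, assuming only the endpoint shape shown above. It uses the standard library's `urllib` rather than the project's HTTP client, and the names `build_commit_request`, `sha_from_payload`, and `fetch_commit_sha` are hypothetical, not the ones in this PR.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def build_commit_request(
    owner: str, repo: str, ref: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for the commit that `ref` points at."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # The REST API honors Bearer tokens, unlike the web Atom feed.
        headers["Authorization"] = f"Bearer {token}"
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits/{ref}"
    return urllib.request.Request(url, headers=headers)


def sha_from_payload(body: bytes) -> str:
    """Pull the commit SHA out of the JSON response body."""
    return json.loads(body)["sha"]


def fetch_commit_sha(
    owner: str, repo: str, ref: str, token: Optional[str] = None
) -> str:
    """Perform the lookup; urlopen raises HTTPError on 401/403/404."""
    request = build_commit_request(owner, repo, ref, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return sha_from_payload(response.read())
```

Because an unauthorized request surfaces as an exception carrying the status code, the caller can tell "no access" apart from "malformed response" without inspecting the body.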

Tests

  • Existing test_source_status_auth.py tests updated from atom XML mocks to JSON mocks (the auth-header behavior under test is unchanged)
  • New regression class TestUsesRestApiNotAtomFeed:
    • test_calls_rest_api_endpoint: asserts the URL starts with https://api.github.com/ and contains no .atom — catches accidental reversion
    • test_403_surfaces_as_http_status_error: asserts that a 403 raises HTTPStatusError rather than the previous opaque ValueError, preserving the contract _check_all_cached_modules relies on

All 8 tests in test_source_status_auth.py pass locally.
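
The two regression checks could look roughly like the following self-contained sketch. It is not the test code from this PR: `RecordingClient`, the local `HTTPStatusError`, and `get_commit_sha` are hypothetical stand-ins so the example runs without the project or a network connection.

```python
import unittest


class HTTPStatusError(Exception):
    """Stand-in for the HTTP client's status error (hypothetical)."""


class RecordingClient:
    """Fake client that records requested URLs and returns a canned reply."""

    def __init__(self, status_code: int, payload: dict) -> None:
        self.status_code = status_code
        self.payload = payload
        self.requested_urls = []

    def get_json(self, url: str) -> dict:
        self.requested_urls.append(url)
        if self.status_code >= 400:
            raise HTTPStatusError(f"HTTP {self.status_code}")
        return self.payload


def get_commit_sha(client: RecordingClient, owner: str, repo: str, ref: str) -> str:
    """Hypothetical stand-in for the function under test."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{ref}"
    return client.get_json(url)["sha"]


class TestUsesRestApiNotAtomFeed(unittest.TestCase):
    def test_calls_rest_api_endpoint(self) -> None:
        client = RecordingClient(200, {"sha": "a" * 40})
        get_commit_sha(client, "owner", "repo", "main")
        url = client.requested_urls[0]
        self.assertTrue(url.startswith("https://api.github.com/"))
        self.assertNotIn(".atom", url)

    def test_403_surfaces_as_http_status_error(self) -> None:
        client = RecordingClient(403, {})
        with self.assertRaises(HTTPStatusError):
            get_commit_sha(client, "owner", "repo", "main")


def run_suite() -> unittest.TestResult:
    """Run both tests and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestUsesRestApiNotAtomFeed)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Injecting the client keeps both assertions independent of real credentials, which is the gap the earlier Atom XML mocks left open.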

Evidence

$ curl -sIL -H "Authorization: Bearer $(gh auth token)" \
    https://github.com/hsperker/amplifier-module-hooks-langfuse/commits/main.atom
HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 0

vs. the API path with the same token:

$ gh api repos/hsperker/amplifier-module-hooks-langfuse/commits/main | jq -r .sha
<40-char SHA>

(This output is from after the token was SSO-authorized for the org. Without SSO authorization, the API returns 403 with a clear message rather than a silent empty 200.)

Note on local test run

tests/test_pre_turn_repair.py fails to collect locally with ImportError: cannot import name 'diagnose_transcript' from 'amplifier_foundation.session'. This is pre-existing on main (unrelated to this PR — same failure with my changes reverted) and appears to be a version skew between this repo and the installed amplifier-foundation. All other 508 tests collect; the 8 in test_source_status_auth.py pass.

…cted private repos

The web atom feed at github.com/.../commits/{ref}.atom does not accept
Bearer auth. For SAML-protected private repos, GitHub silently returns
HTTP 200 with an empty body — the SHA regex finds no match and raises
ValueError, which bypasses the HTTPStatusError handler in
_check_all_cached_modules and surfaces as 'Unexpected error checking
{module}: ValueError: Could not extract commit SHA from Atom feed'.

PR microsoft#100 added Bearer auth on the assumption GitHub would return
401/403/404 for unauthorized atom requests. It does not — the empty-200
response path was missed.

Switch _get_github_commit_sha to the REST API
(api.github.com/repos/{owner}/{repo}/commits/{ref}), which honors PAT
auth and returns proper status codes. _get_commit_details in the same
file already uses this endpoint, so no new auth or dependency surface.
Rate limits are equivalent (5000/hr authenticated).

hsperker commented May 8, 2026

@microsoft-github-policy-service agree
