Description
Problem
GitHub's REST API limits page-based pagination to page 100 (10,000 items with per_page=100). Repositories with more than 10,000 issues/PRs (like mne-tools/mne-python with 13,000+) trigger an HTTP 422 error when requesting beyond page 100.
Current error:
HTTP 422 on page 101 for mne-tools/mne-python issues
This means we silently miss issues/PRs beyond the 10,000 limit.
Proposed Solution
Switch from page-based to cursor-based pagination, using either the REST API's since parameter or GitHub's GraphQL API.
Option A: Use since parameter (REST API)
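For issues/PRs, pass the since parameter (an ISO 8601 timestamp) and sort by update time in ascending order. Note that since filters on the last-updated time, not the creation time. Whenever a query approaches page 100, restart from page 1 with since set to the latest updated_at seen so far, so that no single query needs more than 100 pages. Items on the window boundary can be returned twice and should be deduplicated by issue number.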
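A minimal sketch of the windowing idea in Option A, not the actual code in github_sync.py. It assumes a hypothetical `fetch_page(since, page)` helper that wraps `GET /repos/{owner}/{repo}/issues` with `state=all`, `sort=updated`, `direction=asc` and returns one decoded page; the function name `iter_issues_since` and its parameters are also illustrative.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Issue = Dict[str, Any]
# Hypothetical helper signature: (since, page) -> one page of issues,
# sorted by updated_at ascending.
FetchPage = Callable[[Optional[str], int], List[Issue]]


def iter_issues_since(
    fetch_page: FetchPage,
    since: Optional[str] = None,
    per_page: int = 100,
    max_pages: int = 100,
) -> Iterator[Issue]:
    """Yield every issue once, never requesting a page beyond max_pages."""
    seen = set()    # issue numbers already yielded (boundary items repeat)
    cursor = since  # lower bound on updated_at for the current window
    while True:
        newest = cursor
        exhausted = False
        for page in range(1, max_pages + 1):
            batch = fetch_page(cursor, page)
            for issue in batch:
                if issue["number"] not in seen:
                    seen.add(issue["number"])
                    yield issue
                # Results are ascending, so the last item is the newest.
                newest = issue["updated_at"]
            if len(batch) < per_page:
                exhausted = True  # a short page means there is no more data
                break
        if exhausted:
            return
        if newest == cursor:
            # A whole window shares one timestamp, so the window cannot move.
            raise RuntimeError("cannot advance the since cursor")
        cursor = newest  # start the next window at the newest item seen
```

Because an issue can be updated while the sync is running, it may reappear in a later window with a newer updated_at; this sketch yields it only once and keeps the first copy. Whether that is acceptable depends on how the existing incremental sync stores items.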
Option B: Switch to GraphQL API
GitHub's GraphQL API uses cursor-based pagination natively (an after argument plus pageInfo.endCursor) and has no page limit. Each request returns at most 100 nodes.
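A rough sketch of what Option B could look like. It assumes a hypothetical `run_query(query, variables)` helper that posts to the GraphQL endpoint and returns the decoded `data` object; `iter_issues_graphql` and the selected fields are illustrative only. Note that in GraphQL the `issues` connection does not include pull requests, so PRs would need a similar loop over `pullRequests`.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ISSUES_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor,
           orderBy: {field: CREATED_AT, direction: ASC}) {
      nodes { number title updatedAt }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""

# Hypothetical helper signature: (query, variables) -> decoded "data" object.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_issues_graphql(
    run_query: RunQuery, owner: str, name: str
) -> Iterator[Dict[str, Any]]:
    """Follow pageInfo.endCursor until hasNextPage is false."""
    cursor: Optional[str] = None  # None requests the first page
    while True:
        variables = {"owner": owner, "name": name, "cursor": cursor}
        data = run_query(ISSUES_QUERY, variables)
        connection = data["repository"]["issues"]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]
```

The cursor is opaque, so the loop only passes it back unchanged. Resuming an interrupted sync would require persisting the last endCursor, which is a design choice this sketch leaves open.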
Implementation
- Modify src/knowledge/github_sync.py sync functions
- Replace page=N iteration with cursor-based approach
- Keep backward compatibility with existing incremental sync logic
- Add tests for repos with >10,000 items
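For the test item, one possible approach is an in-memory fake that reproduces the page-100 cap, so tests do not need network access or a real 13,000-issue repository. Everything here (`FakeIssuesAPI`, `PageLimitError`, `list_issues`) is a hypothetical test helper, not part of the existing code base.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


class PageLimitError(Exception):
    """Stands in for the HTTP 422 response returned beyond the page limit."""


class FakeIssuesAPI:
    """In-memory stand-in for the issues endpoint with a page cap."""

    def __init__(self, total: int, per_page: int = 100, max_page: int = 100):
        self.per_page = per_page
        self.max_page = max_page
        base = datetime(2020, 1, 1, tzinfo=timezone.utc)
        # One issue per second, so updated_at strictly increases with number.
        self.issues: List[Dict[str, Any]] = [
            {
                "number": n,
                "updated_at": (base + timedelta(seconds=n)).strftime(
                    "%Y-%m-%dT%H:%M:%SZ"
                ),
            }
            for n in range(1, total + 1)
        ]

    def list_issues(
        self, page: int, since: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        if page > self.max_page:
            raise PageLimitError(f"HTTP 422 on page {page}")
        # Fixed-width UTC timestamps compare correctly as plain strings.
        matching = [
            issue
            for issue in self.issues
            if since is None or issue["updated_at"] >= since
        ]
        start = (page - 1) * self.per_page
        return matching[start : start + self.per_page]
```

A test can then assert that the sync collects all 13,000 fake issues and that no request ever goes past page 100.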
Context
Discovered during MNE community onboarding. mne-tools/mne-python has 13,000+ issues, exceeding GitHub's page-based pagination limit. The sync currently stops at 10,000 items silently (or errors on page 101).