Description
pull_request.synchronize updates the canonical PR row to the new head_sha/base_sha and resets scoring_data_stored=false, but fetch-pr-files jobs are keyed only by repository and PR number. If an older file-fetch job is already active, BullMQ can dedupe the newer enqueue against that active job while the old worker continues fetching files for the PR row it read before the push.
When that old worker finishes, it can mark the current PR row as complete even though the stored file rows were fetched for an older head. The public files endpoint can then expose the newer head_sha with stale file contents and scoring_data_stored=true.
Steps to Reproduce
- Start a
fetch-pr-files job for owner/repo#7 while the PR row has head_sha='H1' and base_sha='B1'.
- Before that job finishes, process a
pull_request.synchronize webhook for the same PR with head_sha='H2' and base_sha='B2'.
- Observe that the webhook updates the PR row and resets
scoring_data_stored=false, then enqueues another fetch-pr-files job with the same stable job id.
- Because the older job is still active, BullMQ can treat the new enqueue as a duplicate of the active job.
- Let the old job finish after storing files fetched for
H1.
- Query
GET /api/v1/pulls/owner/repo/7/files.
Expected Behavior
Each PR file-fetch job should be tied to the PR head/base generation it is meant to fetch. A stale job should not be able to mark scoring_data_stored=true for a newer PR row, and a newer push should result in a current-generation file fetch.
Actual Behavior
The active old job can store stale file rows and then mark the current PR row as complete. Validators may see head_sha='H2' and scoring_data_stored=true while the file rows still correspond to H1.
Environment
- OS: Any server environment
- Runtime/Node version: Node 20, as used by CI
- Browser (if applicable): N/A
Additional Context
Relevant code paths:
packages/das/src/webhook/handlers/pull-request.handler.ts updates the PR row on synchronize, resets scoring_data_stored=false, and enqueues fetch-pr-files.
packages/das/src/webhook/handlers/pull-request.handler.ts uses a stable file job id based only on repoFullName and prNumber.
packages/das/src/webhook/github-fetcher.service.ts reads the PR row once and fetches files/content using that snapshot.
packages/das/src/queue/fetch.processor.ts marks scoring_data_stored=true after fetchAndStorePrFiles returns.
packages/das/src/api/pulls/pulls.service.ts exposes head_sha, base_sha, scoring_data_stored, and stored file rows together.
This breaks the scoring-data contract: scoring_data_stored=true should mean the persisted file rows match the PR head/base currently exposed by the files API.
Description
pull_request.synchronizeupdates the canonical PR row to the newhead_sha/base_shaand resetsscoring_data_stored=false, butfetch-pr-filesjobs are keyed only by repository and PR number. If an older file-fetch job is already active, BullMQ can dedupe the newer enqueue against that active job while the old worker continues fetching files for the PR row it read before the push.When that old worker finishes, it can mark the current PR row as complete even though the stored file rows were fetched for an older head. The public files endpoint can then expose the newer
head_shawith stale file contents andscoring_data_stored=true.Steps to Reproduce
fetch-pr-filesjob forowner/repo#7while the PR row hashead_sha='H1'andbase_sha='B1'.pull_request.synchronizewebhook for the same PR withhead_sha='H2'andbase_sha='B2'.scoring_data_stored=false, then enqueues anotherfetch-pr-filesjob with the same stable job id.H1.GET /api/v1/pulls/owner/repo/7/files.Expected Behavior
Each PR file-fetch job should be tied to the PR head/base generation it is meant to fetch. A stale job should not be able to mark
scoring_data_stored=truefor a newer PR row, and a newer push should result in a current-generation file fetch.Actual Behavior
The active old job can store stale file rows and then mark the current PR row as complete. Validators may see
head_sha='H2'andscoring_data_stored=truewhile the file rows still correspond toH1.Environment
Additional Context
Relevant code paths:
packages/das/src/webhook/handlers/pull-request.handler.tsupdates the PR row on synchronize, resetsscoring_data_stored=false, and enqueuesfetch-pr-files.packages/das/src/webhook/handlers/pull-request.handler.tsuses a stable file job id based only onrepoFullNameandprNumber.packages/das/src/webhook/github-fetcher.service.tsreads the PR row once and fetches files/content using that snapshot.packages/das/src/queue/fetch.processor.tsmarksscoring_data_stored=trueafterfetchAndStorePrFilesreturns.packages/das/src/api/pulls/pulls.service.tsexposeshead_sha,base_sha,scoring_data_stored, and stored file rows together.This breaks the scoring-data contract:
scoring_data_stored=trueshould mean the persisted file rows match the PR head/base currently exposed by the files API.