
fix(dispatcher): add per-workflow locking to prevent run ID race conditions #11

Merged
skylenet merged 1 commit into master from fix-race-condition-same-workflow-file on Jan 12, 2026

@skylenet skylenet commented Jan 12, 2026

Summary

  • Add per-workflow-template locking to prevent race conditions when multiple groups dispatch the same workflow
  • Wait inline for run ID matching after dispatch while holding the lock
  • Ensure sequential dispatch for jobs targeting the same owner/repo/workflow_id

Problem

When multiple groups (A, B) triggered the same workflow (owner/repo/workflow.yaml), a race condition could occur:

T0:       Group A triggers workflow X
T0+100ms: Group B triggers workflow X (same workflow)
T0+5s:    trackRuns() polls GitHub, finds Run1 and Run2
          -> Job A's findWorkflowRun() picks Run2 (most recent) ❌ WRONG
          -> Job B's findWorkflowRun() picks Run1 ❌ WRONG

GitHub's workflow_dispatch API returns 204 No Content with no run ID, so the system must poll and match by timestamp. Without locking, concurrent dispatches to the same workflow could get each other's run IDs.
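
To make the cross-match concrete, here is a minimal Go sketch that replays the timeline above. It is not the dispatcher's actual code: the `Run` type, `pickRun`, `crossMatchDemo`, and the rule "take the newest unclaimed run created at or after the trigger time" are all assumptions chosen to reproduce the described behaviour.

```go
package main

import "time"

// Run is a hypothetical, trimmed-down view of a workflow run.
type Run struct {
	ID        int64
	CreatedAt time.Time
}

// pickRun returns the newest unclaimed run created at or after triggeredAt
// and marks it as claimed. It returns 0 when nothing matches.
// This matching rule is an assumption, not the PR's findWorkflowRun().
func pickRun(runs []Run, triggeredAt time.Time, claimed map[int64]bool) int64 {
	var best *Run
	for i := range runs {
		r := &runs[i]
		if claimed[r.ID] || r.CreatedAt.Before(triggeredAt) {
			continue
		}
		if best == nil || r.CreatedAt.After(best.CreatedAt) {
			best = r
		}
	}
	if best == nil {
		return 0
	}
	claimed[best.ID] = true
	return best.ID
}

// crossMatchDemo dispatches both jobs before either is matched, then
// matches them in a single polling pass, as in the unlocked flow.
func crossMatchDemo() (jobA, jobB int64) {
	t0 := time.Unix(0, 0)
	triggerA := t0
	triggerB := t0.Add(100 * time.Millisecond)
	runs := []Run{
		{ID: 1, CreatedAt: t0.Add(300 * time.Millisecond)}, // started by job A
		{ID: 2, CreatedAt: t0.Add(400 * time.Millisecond)}, // started by job B
	}
	claimed := map[int64]bool{}
	jobA = pickRun(runs, triggerA, claimed) // both runs qualify, newest wins
	jobB = pickRun(runs, triggerB, claimed) // only run 1 is left
	return jobA, jobB
}
```

Both runs fall inside both jobs' time windows, so the timestamp alone cannot tell them apart: in this sketch job A ends up with run 2 and job B with run 1.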

Solution

Add per-workflow locking with inline run ID matching:

  1. Lock key: owner/repo/workflow_id
  2. Flow:
    • Acquire lock before triggering
    • Trigger workflow dispatch
    • Mark job as triggered
    • Poll for run ID (up to 60s, 5s intervals) while holding lock
    • Release lock after run ID found or timeout

This ensures that only one job at a time can dispatch a given workflow template and match its run ID.
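
The flow above could look roughly like the following Go sketch. It assumes a lazily populated map of mutexes keyed by owner/repo/workflow_id; `keyedLocks`, `dispatchAndMatch`, `errNoMatch`, and the `trigger`/`findRun` callbacks are hypothetical names, not the identifiers used in this PR. The real change polls every 5s for up to 60s; the demo uses much shorter values and a fake in-memory run list instead of the GitHub API.

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// keyedLocks lazily creates one mutex per key, e.g. "owner/repo/workflow_id".
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (k *keyedLocks) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.locks == nil {
		k.locks = make(map[string]*sync.Mutex)
	}
	if _, ok := k.locks[key]; !ok {
		k.locks[key] = &sync.Mutex{}
	}
	return k.locks[key]
}

var errNoMatch = errors.New("run ID not matched before timeout")

// dispatchAndMatch holds the per-workflow lock across both the trigger and
// the polling for the run ID, so no other job can dispatch in between.
func dispatchAndMatch(ctx context.Context, locks *keyedLocks, key string,
	trigger func() error, findRun func() (int64, bool),
	interval, timeout time.Duration) (int64, error) {
	l := locks.get(key)
	l.Lock()
	defer l.Unlock()

	if err := trigger(); err != nil {
		return 0, err
	}
	// The PR marks the job as triggered at this point.

	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if id, ok := findRun(); ok {
			return id, nil
		}
		select {
		case <-ctx.Done():
			return 0, errNoMatch // caller can fall back to the tracking loop
		case <-ticker.C:
		}
	}
}

// demoTwoGroups dispatches the same workflow from two goroutines against a
// fake in-memory run list and reports whether each job got its own run.
func demoTwoGroups() bool {
	var (
		locks keyedLocks
		mu    sync.Mutex // guards runs
		runs  []int64
		wg    sync.WaitGroup
	)
	ok := make([]bool, 2)
	for i := range ok {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			var mine int64
			trigger := func() error {
				mu.Lock()
				defer mu.Unlock()
				mine = int64(len(runs) + 1)
				runs = append(runs, mine)
				return nil
			}
			newest := func() (int64, bool) {
				mu.Lock()
				defer mu.Unlock()
				if len(runs) == 0 {
					return 0, false
				}
				return runs[len(runs)-1], true
			}
			got, err := dispatchAndMatch(context.Background(), &locks,
				"owner/repo/workflow.yaml", trigger, newest,
				10*time.Millisecond, time.Second)
			ok[i] = err == nil && got == mine
		}(i)
	}
	wg.Wait()
	return ok[0] && ok[1]
}
```

Because the lock spans trigger and match, "newest run" is unambiguous while it is held. The trade-off is that dispatches to the same workflow are serialized and each can hold the lock for up to the full timeout.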

Test plan

  • Build passes
  • Configure two groups with the same workflow template
  • Enqueue jobs in both groups simultaneously
  • Verify each job gets the correct run ID (check logs for sequential dispatch)
  • Verify tracking loop still works as fallback if inline matching times out

…itions

When multiple groups trigger the same workflow (owner/repo/workflow_id),
there was a race condition where jobs could get incorrect run IDs due to
timestamp-based matching in a shared time window.

Add per-workflow-template locking that:
- Acquires a mutex keyed by owner/repo/workflow_id before dispatching
- Waits inline (up to 60s) for run ID to be matched after trigger
- Releases lock only after run ID is found or timeout

This ensures sequential dispatch for jobs targeting the same workflow,
preventing run ID cross-matching between concurrent dispatches.
@skylenet skylenet merged commit baff303 into master Jan 12, 2026
5 of 6 checks passed
@skylenet skylenet deleted the fix-race-condition-same-workflow-file branch January 12, 2026 15:10