Skip to content

Fix Migrate() hang on panic abort during throttling#1677

Open
jakubpliszka wants to merge 2 commits into
masterfrom
jakubpliszka/fix-panic-deadlock-during-throttle
Open

Fix Migrate() hang on panic abort during throttling#1677
jakubpliszka wants to merge 2 commits into
masterfrom
jakubpliszka/fix-panic-deadlock-during-throttle

Conversation

@jakubpliszka
Copy link
Copy Markdown
Contributor

Description

This PR makes consumeRowCopyComplete() context aware so it always unblocks and exits cleanly on abort.

When gh-ost is panic aborted while throttled, the migration never exits because consumeRowCopyComplete() blocks forever waiting on rowCopyComplete. This happens when context cancellation causes the completion signal to be dropped, leaving teardown and shutdown logic unreachable.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Copilot AI review requested due to automatic review settings May 15, 2026 19:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a hang in Migrator.consumeRowCopyComplete() that occurred when gh-ost was panic-aborted during throttling. Previously, the function blocked unconditionally on rowCopyComplete, so if the abort path drained/cancelled the migration before any value was sent on that channel, teardown could never proceed. The fix wraps the receive in a select that also returns when the migration context is cancelled.

Changes:

  • Make consumeRowCopyComplete() context-aware via select on rowCopyComplete and migrationContext.GetContext().Done().
  • Preserve existing behavior of calling mgtr.abort(err) and returning without marking row copy complete on error.
Show a summary per file
File Description
go/logic/migrator.go Adds a context-cancellation case to the initial receive in consumeRowCopyComplete() so it unblocks on abort.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 1/1 changed files
  • Comments generated: 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants