@pauloricardomg

When a node crashed during its first auto-repair, the repair was incorrectly skipped by the "too soon" check after restart. This happened because tooSoonToRunRepair() uses repair_finish_ts, which had been set to the record creation time rather than the time a repair actually finished. As a result, the in-progress detection logic in myTurnToRunRepair() was never reached.

The fix adds an in-progress check within tooSoonToRunRepair() that detects when repair_start_ts > repair_finish_ts and allows the repair to resume.
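
The check described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: the class name, method signatures, millisecond timestamps, and the `minRepairInterval` parameter are all assumptions made for the example; only the `repair_start_ts > repair_finish_ts` comparison comes from the description.

```java
import java.time.Duration;

// Illustrative sketch only: the class name, signatures, and parameters are
// assumptions, not the actual Cassandra auto-repair code.
final class AutoRepairGateSketch {

    // A start timestamp later than the finish timestamp means a repair began
    // after the last recorded finish and never recorded a new finish.
    static boolean repairInProgress(long repairStartTs, long repairFinishTs) {
        return repairStartTs > repairFinishTs;
    }

    // Returns true when the next repair should be skipped for now.
    static boolean tooSoonToRunRepair(long repairStartTs, long repairFinishTs,
                                      long nowMillis, Duration minRepairInterval) {
        if (repairInProgress(repairStartTs, repairFinishTs)) {
            return false; // interrupted repair: allow it to resume
        }
        return nowMillis - repairFinishTs < minRepairInterval.toMillis();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofHours(24);

        long created = 1_000L;                 // finish timestamp set at record creation
        long started = created + 50L;          // repair started, node crashed before finishing
        long afterRestart = started + 60_000L; // node comes back one minute later
        System.out.println(tooSoonToRunRepair(started, created, afterRestart, interval)); // false

        // A repair that completed normally (finish after start) is still rate-limited.
        System.out.println(tooSoonToRunRepair(1_000L, 5_000L, 6_000L, interval)); // true
    }
}
```

Placing the in-progress test ahead of the interval comparison is what keeps a stale finish timestamp from masking an interrupted repair.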

Patch by Paulo Motta; reviewed by X for CASSANDRA-21115


Change the comparison from > to >= to catch the edge case where
repair_start_ts equals repair_finish_ts (the repair record was created but
the repair never progressed). This prevents the "too soon" check from
incorrectly skipping incomplete repairs after a node restart.

Adds a test case for the start_ts == finish_ts edge case.
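
To make the edge case concrete, here is a small sketch contrasting the strict and inclusive comparisons when both timestamps are equal. The class and method names are hypothetical and this is not the test added by the patch; it only illustrates why `>=` catches a record that `>` misses.

```java
// Illustrative sketch only: names are hypothetical, not the actual
// Cassandra code or the test case added by this change.
final class RepairResumeCheckSketch {

    // Strict comparison: misses a record whose timestamps are identical.
    static boolean incompleteStrict(long repairStartTs, long repairFinishTs) {
        return repairStartTs > repairFinishTs;
    }

    // Inclusive comparison: also treats equal timestamps as an unfinished repair.
    static boolean incompleteInclusive(long repairStartTs, long repairFinishTs) {
        return repairStartTs >= repairFinishTs;
    }

    public static void main(String[] args) {
        long ts = 1_700_000_000_000L; // record created; start and finish share one timestamp

        System.out.println(incompleteStrict(ts, ts));             // false
        System.out.println(incompleteInclusive(ts, ts));          // true
        System.out.println(incompleteInclusive(ts, ts + 5_000L)); // false: finished after start
    }
}
```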