CASSANDRA-21115: Fix auto-repair skipping incomplete first repair after node restart #4560
+131
−2
When a node crashed during its first auto-repair, the repair was incorrectly skipped by the "too soon" check after restart. This happened because tooSoonToRunRepair() uses repair_finish_ts, which was set to the record creation time rather than to the time a repair actually finished. As a result, the in-progress detection logic in myTurnToRunRepair() was never reached.
The fix adds an in-progress check within tooSoonToRunRepair() that detects when repair_start_ts > repair_finish_ts and allows the repair to resume.
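
The check described above can be sketched roughly as follows. This is a simplified illustration, not the patch itself: the class name `TooSoonSketch`, the helper `repairInProgress`, the parameter names, and the use of plain millisecond timestamps with a `minIntervalMs` threshold are all assumptions made for the example. Only the comparison `repair_start_ts > repair_finish_ts` comes from the description.

```java
// Hypothetical, simplified model of the check; not the actual Cassandra code.
class TooSoonSketch {

    // A repair is treated as interrupted when its recorded start is later
    // than its recorded finish (repair_start_ts > repair_finish_ts).
    static boolean repairInProgress(long repairStartTs, long repairFinishTs) {
        return repairStartTs > repairFinishTs;
    }

    // Returns true when a new repair should be skipped as "too soon".
    // minIntervalMs is an assumed name for the minimum gap between repairs.
    static boolean tooSoonToRunRepair(long nowMs,
                                      long repairStartTs,
                                      long repairFinishTs,
                                      long minIntervalMs) {
        // Assumed ordering: the in-progress check runs first, so an
        // interrupted repair is never rejected by the interval test below.
        if (repairInProgress(repairStartTs, repairFinishTs)) {
            return false;
        }
        return nowMs - repairFinishTs < minIntervalMs;
    }
}
```

In this sketch, a node whose record was created at time `t`, started repairing at `t + 2000`, and crashed before finishing has a start timestamp later than its finish timestamp, so `tooSoonToRunRepair` returns false and the repair can resume. Without the first branch, the same node would be rejected because `repair_finish_ts` still holds the recent creation time.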
Patch by Paulo Motta; reviewed by X for CASSANDRA-21115