drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle #832
Open
blktests-ci[bot] wants to merge 1 commit into
…ttle

drbd_rs_c_min_rate_throttle() is intended to slow down resync when genuine application I/O is competing for the backing device. It used to detect "application I/O" by comparing the total sector count from the backing device (part_stat_read_accum) against the resync sector counter (rs_sect_ev), and throttling when the resync speed exceeds c-min-rate.

That curr_events heuristic produces false positives:

1) On the receiver path, rs_sect_ev is incremented *after* the throttle check. The current resync I/O is already reflected in part_stat counters but not yet in rs_sect_ev, creating a persistent positive delta that looks like application I/O.

2) The per-cpu part_stat counters and the atomic rs_sect_ev are not read under any common lock, so transient skew between them can push the delta above 64 sectors even when no application I/O is present.

When the false positive fires, the function compares the resync speed against c-min-rate (default 35840 KB/s ~ 35 MB/s). On modern hardware capable of 300+ MB/s resync the condition is almost always true, so the caller sleeps 100 ms (HZ/10) per resync request or stops issuing new requests, capping throughput at roughly c-min-rate.

This was observed in production on a Distributed Cloud controller where drbd-dc-vault (100 GB) resynced at ~30 MB/s instead of the expected ~360 MB/s. Setting c-min-rate above the actual resync speed (e.g. 350 MB/s) or disabling the feature (c-min-rate 0) restored full throughput, confirming false-positive throttling as root cause.

Switch the gate to ap_bio_cnt. inc_ap_bio() is called for every application bio at the top of drbd_make_request(), before any activity-log handling, and dec_ap_bio() runs on completion. That makes ap_bio_cnt the authoritative "application I/O in flight" signal, independent of part_stat update timing, per-cpu skew, and activity-log fastpath outcomes.

Backport of the drbd 9.x fix to the in-tree drbd 8.4 driver.
Suggested-by: Ionut Nechita <ionut.nechita@windriver.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
[inechita: backport to drbd 8.4 - ap_bio_cnt is scalar, not array]
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
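
The difference between the two gates described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the function names, the parameter names, and the EVENT_SLACK_SECTORS macro are hypothetical, and only the 64-sector threshold and the counter names in the comments come from the message above.

```c
#include <stdbool.h>

/* Threshold named in the commit message; the macro name is hypothetical. */
#define EVENT_SLACK_SECTORS 64

/*
 * Hypothetical model of the old curr_events heuristic: any backing-device
 * sectors not yet accounted to resync are treated as application I/O.
 *   backing_sectors: stands in for the part_stat sector total
 *   rs_sect_ev:      sectors accounted to resync so far
 *   rs_last_events:  delta recorded at the previous check
 */
static bool old_gate_sees_app_io(long backing_sectors, long rs_sect_ev,
                                 long rs_last_events)
{
    long curr_events = backing_sectors - rs_sect_ev;

    return curr_events - rs_last_events > EVENT_SLACK_SECTORS;
}

/*
 * Hypothetical model of the new gate: application I/O is present only
 * while at least one application bio is in flight (ap_bio_cnt > 0).
 */
static bool new_gate_sees_app_io(int ap_bio_cnt)
{
    return ap_bio_cnt > 0;
}
```

In this model, a 128-sector resync request that has reached the backing-device counter but not yet rs_sect_ev makes old_gate_sees_app_io(128, 0, 0) return true with no application I/O at all, while new_gate_sees_app_io(0) returns false. Once the resync counter catches up, old_gate_sees_app_io(128, 128, 0) is false again, which is the ordering problem described in item 1.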
Upstream branch: aa54b1d
Pull request for series with
subject: drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1094336