drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle #832

Open
blktests-ci[bot] wants to merge 1 commit into linus-master_base from series/1094336=>linus-master

blktests-ci Bot commented May 13, 2026

Pull request for series with
subject: drbd: fix false positive resync throttling in drbd_rs_c_min_rate_throttle
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1094336

…ttle

drbd_rs_c_min_rate_throttle() is intended to slow down resync when
genuine application I/O is competing for the backing device.  Until
now it detected "application I/O" by comparing the total sector count
from the backing device (part_stat_read_accum) against the resync
sector counter (rs_sect_ev), and throttled when the resync speed
exceeded c-min-rate.

That curr_events heuristic produces false positives:

1) On the receiver path, rs_sect_ev is incremented *after* the throttle
   check.  The current resync I/O is already reflected in part_stat
   counters but not yet in rs_sect_ev, creating a persistent positive
   delta that looks like application I/O.

2) The per-cpu part_stat counters and the atomic rs_sect_ev are not
   read under any common lock, so transient skew between them can
   push the delta above 64 sectors even when no application I/O is
   present.
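
The ordering problem in case 1 can be illustrated with a small user-space
sketch.  This is not the kernel code: the struct, the function names, and
the zeroed starting counters are assumptions made for illustration, and the
baseline bookkeeping between checks is left out.  Only the two counter names
and the 64-sector threshold come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold named in the commit message: more than 64 sectors of
 * unexplained device activity is treated as application I/O. */
#define APP_IO_THRESHOLD_SECTORS 64

/* Hypothetical stand-ins for the two counters being compared. */
struct counters_model {
	long part_stat_sectors; /* every sector the backing device has seen */
	long rs_sect_ev;        /* sectors accounted as resync I/O */
};

/* Old-style heuristic: device activity not explained by resync counters. */
static bool looks_like_app_io(const struct counters_model *c, long last_events)
{
	long curr_events = c->part_stat_sectors - c->rs_sect_ev;

	return curr_events - last_events > APP_IO_THRESHOLD_SECTORS;
}

/* Receiver-path ordering: the check runs before rs_sect_ev is
 * updated for the current request. */
static bool check_before_accounting(long resync_sectors)
{
	struct counters_model c = { 0, 0 };
	bool flagged;

	assert(resync_sectors >= 0);
	c.part_stat_sectors += resync_sectors; /* already visible in device stats */
	flagged = looks_like_app_io(&c, 0);    /* throttle check */
	c.rs_sect_ev += resync_sectors;        /* accounted too late */
	return flagged;
}

/* Same request, but accounted before the check: the delta is zero. */
static bool check_after_accounting(long resync_sectors)
{
	struct counters_model c = { 0, 0 };

	assert(resync_sectors >= 0);
	c.part_stat_sectors += resync_sectors;
	c.rs_sect_ev += resync_sectors;
	return looks_like_app_io(&c, 0);
}
```

In this model a single 128-sector resync request is flagged as application
I/O when the check precedes the accounting, and is not flagged when the
accounting comes first, although no application I/O exists in either case.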

When the false positive fires, the function compares the resync speed
against c-min-rate (default 35840 KB/s ~ 35 MB/s).  On modern
hardware capable of 300+ MB/s resync the condition is almost always
true, so the caller sleeps 100 ms (HZ/10) per resync request or stops
issuing new requests, capping throughput at roughly c-min-rate.

This was observed in production on a Distributed Cloud controller
where drbd-dc-vault (100 GB) resynced at ~30 MB/s instead of the
expected ~360 MB/s.  Setting c-min-rate above the actual resync speed
(e.g. 350 MB/s) or disabling the feature (c-min-rate 0) restored full
throughput, confirming false-positive throttling as the root cause.
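
The effect of the rate comparison, and of the two workarounds, can be
checked with a simplified model.  The function names and signatures below
are hypothetical, and the sketch assumes the old gate reduces to "application
I/O suspected and resync rate above c-min-rate"; the real function does more
bookkeeping than this.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Convert MB/s to KB/s the same way the message does (35 MB/s = 35840 KB/s). */
static unsigned int mb_to_kb(unsigned int mb)
{
	assert(mb <= UINT_MAX / 1024u); /* guard against overflow */
	return mb * 1024u;
}

/* Hypothetical model of the old decision; all rates are in KB/s. */
static bool old_gate_throttles(bool app_io_suspected,
			       unsigned int resync_rate_kb,
			       unsigned int c_min_rate_kb)
{
	if (c_min_rate_kb == 0)  /* c-min-rate 0 disables the feature */
		return false;
	if (!app_io_suspected)   /* heuristic saw no application I/O */
		return false;
	return resync_rate_kb > c_min_rate_kb;
}
```

With a false positive from the heuristic, a 300 MB/s resync is throttled
against the 35840 KB/s setting.  It is not throttled once c-min-rate is 0 or
is raised above the measured rate, which matches the workarounds described
above.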

Switch the gate to ap_bio_cnt.  inc_ap_bio() is called for every
application bio at the top of drbd_make_request(), before any
activity-log handling, and dec_ap_bio() runs on completion.  That
makes ap_bio_cnt the authoritative "application I/O in flight"
signal, independent of part_stat update timing, per-cpu skew, and
activity-log fastpath outcomes.
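
A minimal user-space sketch of a gate driven by an in-flight counter follows.
It is a model, not the patch: the struct and the model_* helpers are
hypothetical stand-ins for inc_ap_bio()/dec_ap_bio(), C11 atomics replace the
kernel's atomic_t, and the sketch assumes the rate comparison against
c-min-rate is kept unchanged behind the new gate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model holding only the in-flight counter. */
struct device_model {
	atomic_int ap_bio_cnt; /* application bios currently in flight */
};

static void model_init(struct device_model *d)
{
	atomic_init(&d->ap_bio_cnt, 0);
}

/* Called when an application bio enters the request path. */
static void model_inc_ap_bio(struct device_model *d)
{
	atomic_fetch_add(&d->ap_bio_cnt, 1);
}

/* Called when an application bio completes. */
static void model_dec_ap_bio(struct device_model *d)
{
	int prev = atomic_fetch_sub(&d->ap_bio_cnt, 1);

	assert(prev > 0); /* completion without a matching submission */
}

/* Throttle only while application I/O is in flight and the resync
 * rate (KB/s) exceeds c-min-rate (KB/s); 0 disables the feature. */
static bool new_gate_throttles(struct device_model *d,
			       unsigned int resync_rate_kb,
			       unsigned int c_min_rate_kb)
{
	if (c_min_rate_kb == 0)
		return false;
	if (atomic_load(&d->ap_bio_cnt) == 0)
		return false;
	return resync_rate_kb > c_min_rate_kb;
}
```

In this model an idle device is never throttled, however fast the resync
runs.  Throttling starts only between a submission and its completion, and
only if the resync rate is above c-min-rate.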

Backport of the drbd 9.x fix to the in-tree drbd 8.4 driver.

Suggested-by: Ionut Nechita <ionut.nechita@windriver.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
[inechita: backport to drbd 8.4 - ap_bio_cnt is scalar, not array]
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>

blktests-ci Bot commented May 13, 2026

Upstream branch: aa54b1d
series: https://patchwork.kernel.org/project/linux-block/list/?series=1094336
version: 1
