scsi/block: NUMA-local scan allocations, shared-tag path cleanup, and SCSI I/O counters #752
blktests-ci[bot] wants to merge 3 commits into linus-master_base
…apter

When a host adapter is attached to a specific NUMA node, allocating scsi_device and scsi_target via kzalloc() may place them on a remote node. All hot-path I/O accesses to these structures then cross the NUMA interconnect, adding latency and consuming inter-node bandwidth.

Use kzalloc_node() with dev_to_node(shost->dma_dev) so allocations land on the same node as the HBA, reducing cross-node traffic and improving I/O performance on NUMA systems.

Signed-off-by: James Rizzo <james.rizzo@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
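The allocation pattern described above can be sketched in userspace C. This is only an illustration under loud assumptions: every `mock_*` name is hypothetical, `mock_dev_to_node()` and `mock_zalloc_node()` merely stand in for the kernel's dev_to_node() and kzalloc_node(), and the "node" is recorded rather than honored, since plain calloc() has no NUMA placement.

```c
#include <assert.h>
#include <stdlib.h>

#define MOCK_NUMA_NO_NODE (-1)

/* Hypothetical stand-ins for the kernel's device, host and scsi_device. */
struct mock_device { int numa_node; };
struct mock_host   { struct mock_device *dma_dev; };
struct mock_sdev   { struct mock_host *host; int alloc_node; };

/* Stand-in for dev_to_node(): report the node the DMA device sits on. */
static int mock_dev_to_node(const struct mock_device *dev)
{
	return dev ? dev->numa_node : MOCK_NUMA_NO_NODE;
}

/*
 * Stand-in for kzalloc_node(): returns zeroed memory. The requested node
 * is only written to *node_used so a caller can inspect the decision.
 */
static void *mock_zalloc_node(size_t size, int node, int *node_used)
{
	*node_used = node;
	return calloc(1, size);
}

/* Allocate a device object on the node of the host's DMA device. */
static struct mock_sdev *mock_sdev_alloc(struct mock_host *shost)
{
	int node = mock_dev_to_node(shost->dma_dev);
	int used = MOCK_NUMA_NO_NODE;
	struct mock_sdev *sdev = mock_zalloc_node(sizeof(*sdev), node, &used);

	assert(shost != NULL);
	if (!sdev)
		return NULL;
	sdev->host = shost;
	sdev->alloc_node = used;
	return sdev;
}
```

With a host whose DMA device reports node 1, `mock_sdev_alloc()` records node 1 as the allocation target; the point of the sketch is that the node is derived from the adapter, not from whichever CPU happens to run the scan.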
Original patch [1] by Bart Van Assche; this version is rebased onto the current tree. In testing it improves IOPS by roughly 16-18% by removing the fair-sharing throttle on shared tag queues.

This patch removes the following code and structure members:
- The function hctx_may_queue().
- blk_mq_hw_ctx.nr_active and request_queue.nr_active_requests_shared_tags, and all the code that modifies these two member variables.

[1]: https://lore.kernel.org/linux-block/20240529213921.3166462-1-bvanassche@acm.org/

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
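To make the removed throttle concrete, here is a simplified userspace model of a fair-sharing check of the kind hctx_may_queue() performed. The function names are hypothetical, and the exact rule (round-up division of the tag depth by the number of active queues, with a floor of 4) is an assumption based on the upstream helper rather than something stated in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed floor so that every active queue can make some progress. */
#define MIN_FAIR_DEPTH 4U

/* Per-queue tag allowance: ceil(total_tags / active_queues), at least 4. */
static unsigned int fair_share_depth(unsigned int total_tags,
				     unsigned int active_queues)
{
	unsigned int depth;

	assert(active_queues > 0);
	depth = (total_tags + active_queues - 1) / active_queues;
	return depth > MIN_FAIR_DEPTH ? depth : MIN_FAIR_DEPTH;
}

/*
 * Returns true if a queue with nr_active in-flight requests may take
 * another tag. With no active queues recorded, nothing is throttled.
 */
static bool may_queue_fair(unsigned int total_tags,
			   unsigned int active_queues,
			   unsigned int nr_active)
{
	if (active_queues == 0)
		return true;
	return nr_active < fair_share_depth(total_tags, active_queues);
}
```

In this model, 64 shared tags split across 4 active queues caps each queue at 16 in-flight requests, even when the other queues are nearly idle. Enforcing such a cap also requires maintaining an in-flight count on every dispatch and completion, which is the accounting (nr_active, nr_active_requests_shared_tags) the patch deletes.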
iorequest_cnt and iodone_cnt are updated on every command dispatch and completion, often from different CPUs under high-queue-depth workloads. Keeping them as adjacent atomic_t fields causes cache line contention between the submission and completion paths.

Represent these statistics with struct percpu_counter so increments are mostly local to each CPU, avoiding false sharing without growing struct scsi_device further for cache-line padding.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
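The idea behind a per-CPU counter can be sketched in userspace C as follows. This is not the kernel's struct percpu_counter (which, among other things, folds per-CPU deltas into a central count in batches); the `sketch_*` names, the fixed slot count, and the 64-byte cache line are all assumptions made for illustration. Each writer is assumed to own one slot, so updates never touch another writer's cache line.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_SLOTS   4   /* assumed number of CPUs/writers */
#define SKETCH_CACHE_LINE 64  /* assumed cache line size in bytes */

/* One slot per writer, aligned so no two slots share a cache line. */
struct sketch_slot {
	_Alignas(SKETCH_CACHE_LINE) int64_t count;
};

_Static_assert(sizeof(struct sketch_slot) == SKETCH_CACHE_LINE,
	       "each slot should occupy exactly one cache line");

struct sketch_counter {
	struct sketch_slot slots[SKETCH_NR_SLOTS];
};

/* Hot path: update only the caller's own slot. */
static void sketch_counter_add(struct sketch_counter *c, unsigned int cpu,
			       int64_t amount)
{
	assert(cpu < SKETCH_NR_SLOTS);
	c->slots[cpu].count += amount;
}

/* Slow path: walk every slot to produce the total. */
static int64_t sketch_counter_sum(const struct sketch_counter *c)
{
	int64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_NR_SLOTS; i++)
		sum += c->slots[i].count;
	return sum;
}
```

The trade-off this illustrates is that increments become cheap and local while reads become a walk over all slots, which suits statistics such as I/O request and completion counts that are written constantly and read rarely. The sketch does no synchronization, so a sum taken while writers are running is only approximate.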
Upstream branch: aa54b1d
Pull request for series with
subject: scsi/block: NUMA-local scan allocations, shared-tag path cleanup, and SCSI I/O counters
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=1083261