nvme: Convert tag_list mutex to rwsemaphore to avoid deadlock #363
The blk_mq_{add,del}_queue_tag_set() functions add queues to and remove
queues from a tagset. They make sure that the tagset and its queues are
marked as shared when two or more queues are attached to the same
tagset. A tagset starts out unshared; when the number of attached queues
reaches two, blk_mq_add_queue_tag_set() marks it as shared along with
all the queues attached to it. When the number of attached queues drops
to one, blk_mq_del_queue_tag_set() needs to mark both the tagset and the
remaining queue as unshared.
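For illustration, here is a condensed sketch of the add-side transition
just described. It follows the shape of the helpers in block/blk-mq.c,
with the freeze-and-update walk abridged behind
blk_mq_update_tag_set_shared():

        static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
                                             struct request_queue *q)
        {
                mutex_lock(&set->tag_list_lock);

                /* A second queue is being attached: go shared. */
                if (!list_empty(&set->tag_list) &&
                    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
                        set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
                        /* Freezes every attached queue, sets the shared
                         * flag on each hctx, then unfreezes. */
                        blk_mq_update_tag_set_shared(set, true);
                }
                if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
                        queue_set_hctx_shared(q, true);
                list_add_tail(&q->tag_set_list, &set->tag_list);

                mutex_unlock(&set->tag_list_lock);
        }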
Both functions need to freeze the current queues in the tagset before
setting or unsetting the BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so,
both functions hold the set->tag_list_lock mutex, which makes sense: we
do not want queues to be added or deleted in the process. This used to
work fine until commit 98d81f0 ("nvme: use blk_mq_[un]quiesce_tagset")
made the nvme driver quiesce the tagset instead of quiescing individual
queues. blk_mq_quiesce_tagset() quiesces the queues in set->tag_list
while also holding set->tag_list_lock.
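After that commit, the quiesce path looks roughly like this (a sketch
modeled on the upstream helper; the skip-quiesce special case is
elided):

        void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
        {
                struct request_queue *q;

                /* Same mutex the add/del paths hold across their freeze. */
                mutex_lock(&set->tag_list_lock);
                list_for_each_entry(q, &set->tag_list, tag_set_list)
                        blk_mq_quiesce_queue_nowait(q);
                blk_mq_wait_quiesce_done(set);
                mutex_unlock(&set->tag_list_lock);
        }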
This results in a deadlock between two threads, with the following
stacktraces:
__schedule+0x48e/0xed0
schedule+0x5a/0xc0
schedule_preempt_disabled+0x11/0x20
__mutex_lock.constprop.0+0x3cc/0x760
blk_mq_quiesce_tagset+0x26/0xd0
nvme_dev_disable_locked+0x77/0x280 [nvme]
nvme_timeout+0x268/0x320 [nvme]
blk_mq_handle_expired+0x5d/0x90
bt_iter+0x7e/0x90
blk_mq_queue_tag_busy_iter+0x2b2/0x590
? __blk_mq_complete_request_remote+0x10/0x10
? __blk_mq_complete_request_remote+0x10/0x10
blk_mq_timeout_work+0x15b/0x1a0
process_one_work+0x133/0x2f0
? mod_delayed_work_on+0x90/0x90
worker_thread+0x2ec/0x400
? mod_delayed_work_on+0x90/0x90
kthread+0xe2/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x50
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20

__schedule+0x48e/0xed0
schedule+0x5a/0xc0
blk_mq_freeze_queue_wait+0x62/0x90
? destroy_sched_domains_rcu+0x30/0x30
blk_mq_exit_queue+0x151/0x180
disk_release+0xe3/0xf0
device_release+0x31/0x90
kobject_put+0x6d/0x180
nvme_scan_ns+0x858/0xc90 [nvme_core]
? nvme_scan_work+0x281/0x560 [nvme_core]
nvme_scan_work+0x281/0x560 [nvme_core]
process_one_work+0x133/0x2f0
? mod_delayed_work_on+0x90/0x90
worker_thread+0x2ec/0x400
? mod_delayed_work_on+0x90/0x90
kthread+0xe2/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x50
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
The top stacktrace shows nvme_timeout() being called to handle an nvme
command timeout. The timeout handler tries to disable the controller,
and as a first step it calls blk_mq_quiesce_tagset() to tell blk-mq not
to invoke the queue callback handlers. This thread is stuck waiting for
set->tag_list_lock as it tries to walk the queues in set->tag_list.
The lock is held by the second thread in the bottom stack, which is
waiting for one of the queues to be frozen. The queue usage counter
would only drop to zero once nvme_timeout() finishes, and that will
never happen because the timeout thread is waiting on this mutex
forever.
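In short, the circular wait is:

        timeout worker:  holds a timed-out in-flight request
                         -> waits on set->tag_list_lock in
                            blk_mq_quiesce_tagset()
        scan worker:     holds set->tag_list_lock
                         -> waits in blk_mq_freeze_queue_wait() for the
                            queue usage counter to drop, which needs the
                            timed-out request to finish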
Convert the set->tag_list_lock mutex to a set->tag_list_rwsem
rw_semaphore to avoid the deadlock. Update blk_mq_[un]quiesce_tagset()
to take the semaphore for read, since that is enough to guarantee no
queues will be added or removed. Update blk_mq_{add,del}_queue_tag_set()
to take the semaphore for write while updating set->tag_list, and to
downgrade it to read while freezing the queues. It should be safe to
update set->flags and hctx->flags while holding the semaphore for read,
since the queues are already frozen.
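A sketch of the resulting locking under the scheme described above;
going_shared() and mark_tag_set_shared() are hypothetical stand-ins for
the transition check and the freeze-and-flag walk:

        /* Readers: quiescing only walks the list, so shared access is
         * enough, and it no longer blocks behind a holder that is
         * merely freezing queues. */
        void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
        {
                struct request_queue *q;

                down_read(&set->tag_list_rwsem);
                list_for_each_entry(q, &set->tag_list, tag_set_list)
                        blk_mq_quiesce_queue_nowait(q);
                blk_mq_wait_quiesce_done(set);
                up_read(&set->tag_list_rwsem);
        }

        /* Writers: exclusive access while editing set->tag_list,
         * downgraded to read before freezing so that a concurrent
         * quiesce can still proceed. */
        static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
                                             struct request_queue *q)
        {
                down_write(&set->tag_list_rwsem);
                list_add_tail(&q->tag_set_list, &set->tag_list);
                if (going_shared(set)) {  /* hypothetical check */
                        downgrade_write(&set->tag_list_rwsem);
                        /* hypothetical: freeze queues, set
                         * BLK_MQ_F_TAG_QUEUE_SHARED on set and hctxs,
                         * unfreeze -- all under the read lock */
                        mark_tag_set_shared(set);
                        up_read(&set->tag_list_rwsem);
                        return;
                }
                up_write(&set->tag_list_rwsem);
        }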
Fixes: 98d81f0 ("nvme: use blk_mq_[un]quiesce_tagset")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Pull request for series with
subject: nvme: Convert tag_list mutex to rwsemaphore to avoid deadlock
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1023161