* [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
From: Roman Pen @ 2018-06-10 20:38 UTC (permalink / raw)
To: linux-block
Cc: Jinpu Wang, Gi-Oh Kim, Danil Kipnis, Roman Pen, Jens Axboe,
Bart Van Assche, Christoph Hellwig, Sagi Grimberg, Ming Lei
Reinitializing the q->tag_set_list list entry before the RCU grace
period has completed is not allowed; doing so causes the following soft
lockup in blk_mq_sched_restart():
[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664] <IRQ>
[ 1064.256824] blk_mq_free_request+0xea/0x100
[ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007] irq_poll_softirq+0xb7/0xe0
[ 1064.258165] __do_softirq+0x106/0x2a2
[ 1064.258328] irq_exit+0x92/0xa0
[ 1064.258509] do_IRQ+0x4a/0xd0
[ 1064.258660] common_interrupt+0x7a/0x7a
[ 1064.258818] </IRQ>
Meanwhile, another context is freeing a different queue that uses the
same set of shared tags:
[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash D 0 5910 5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315] schedule+0x32/0x80
[ 1288.202462] schedule_timeout+0x1e5/0x380
[ 1288.203838] wait_for_completion+0xb0/0x120
[ 1288.204137] __wait_rcu_gp+0x125/0x160
[ 1288.204287] synchronize_sched+0x6e/0x80
[ 1288.204770] blk_mq_free_queue+0x74/0xe0
[ 1288.204922] blk_cleanup_queue+0xc7/0x110
[ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548] kernfs_fop_write+0x109/0x180
[ 1288.206328] vfs_write+0xb3/0x1a0
[ 1288.206476] SyS_write+0x52/0xc0
[ 1288.206624] do_syscall_64+0x68/0x1d0
[ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
What happened is the following:
1. There are several MQ queues with shared tags.
2. One queue is about to be freed, and its task is now in
blk_mq_del_queue_tag_set().
3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
the tag list in order to find an hctx to restart.
Because the linked list entry was reinitialized in
blk_mq_del_queue_tag_set() without first waiting for a grace period, a
reader standing on that entry finds it pointing to itself, so
blk_mq_sched_restart() never terminates and spins in
list_for_each_entry_rcu_rr(), hence the soft lockup.
The fix is simple: reinit the list entry only after an RCU grace period
has elapsed.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org
---
block/blk-mq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..2a40d60950f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
--
2.13.1
* Re: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
From: Christoph Hellwig @ 2018-06-11 6:24 UTC (permalink / raw)
To: Roman Pen
Cc: linux-block, Jinpu Wang, Gi-Oh Kim, Danil Kipnis, Jens Axboe,
Bart Van Assche, Christoph Hellwig, Sagi Grimberg, Ming Lei
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
From: Ming Lei @ 2018-06-11 6:33 UTC (permalink / raw)
To: Roman Pen
Cc: linux-block, Jinpu Wang, Gi-Oh Kim, Danil Kipnis, Jens Axboe,
Bart Van Assche, Christoph Hellwig, Sagi Grimberg, Ming Lei
On Mon, Jun 11, 2018 at 4:38 AM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:
> It is not allowed to reinit q->tag_set_list list entry while RCU grace
> period has not completed yet, otherwise the following soft lockup in
> blk_mq_sched_restart() happens:
>
> [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
> [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
> [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
> [ 1064.256510] Call Trace:
> [ 1064.256664] <IRQ>
> [ 1064.256824] blk_mq_free_request+0xea/0x100
> [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
> [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
> [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
> [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
> [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
> [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
> [ 1064.258007] irq_poll_softirq+0xb7/0xe0
> [ 1064.258165] __do_softirq+0x106/0x2a2
> [ 1064.258328] irq_exit+0x92/0xa0
> [ 1064.258509] do_IRQ+0x4a/0xd0
> [ 1064.258660] common_interrupt+0x7a/0x7a
> [ 1064.258818] </IRQ>
>
> Meanwhile another context frees other queue but with the same set of
> shared tags:
>
> [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
> [ 1288.201833] bash D 0 5910 5820 0x00000000
> [ 1288.202016] Call Trace:
> [ 1288.202315] schedule+0x32/0x80
> [ 1288.202462] schedule_timeout+0x1e5/0x380
> [ 1288.203838] wait_for_completion+0xb0/0x120
> [ 1288.204137] __wait_rcu_gp+0x125/0x160
> [ 1288.204287] synchronize_sched+0x6e/0x80
> [ 1288.204770] blk_mq_free_queue+0x74/0xe0
> [ 1288.204922] blk_cleanup_queue+0xc7/0x110
> [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
> [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
> [ 1288.205548] kernfs_fop_write+0x109/0x180
> [ 1288.206328] vfs_write+0xb3/0x1a0
> [ 1288.206476] SyS_write+0x52/0xc0
> [ 1288.206624] do_syscall_64+0x68/0x1d0
> [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> What happened is the following:
>
> 1. There are several MQ queues with shared tags.
> 2. One queue is about to be freed and now task is in
> blk_mq_del_queue_tag_set().
> 3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
> tag list in order to find hctx to restart.
>
> Because linked list entry was modified in blk_mq_del_queue_tag_set()
> without proper waiting for a grace period, blk_mq_sched_restart()
> never ends, spinning in list_for_each_entry_rcu_rr(), thus soft lockup.
>
> Fix is simple: reinit list entry after an RCU grace period elapsed.
>
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: linux-block@vger.kernel.org
> ---
> block/blk-mq.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..2a40d60950f4 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>
> mutex_lock(&set->tag_list_lock);
> list_del_rcu(&q->tag_set_list);
> - INIT_LIST_HEAD(&q->tag_set_list);
> if (list_is_singular(&set->tag_list)) {
> /* just transitioned to unshared */
> set->flags &= ~BLK_MQ_F_TAG_SHARED;
> @@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
> blk_mq_update_tag_set_depth(set, false);
> }
> mutex_unlock(&set->tag_list_lock);
> -
> synchronize_rcu();
> + INIT_LIST_HEAD(&q->tag_set_list);
> }
>
> static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
> --
> 2.13.1
>
Good catch:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming Lei
* Re: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
From: Bart Van Assche @ 2018-06-11 13:26 UTC (permalink / raw)
To: roman.penyaev, linux-block
Cc: hch, gi-oh.kim, ming.lei, sagi, jinpu.wang, danil.kipnis, axboe
On Sun, 2018-06-10 at 22:38 +0200, Roman Pen wrote:
> It is not allowed to reinit q->tag_set_list list entry while RCU grace
> period has not completed yet, otherwise the following soft lockup in
> blk_mq_sched_restart() happens:

Please add the following:

Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
* Re: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
From: Jens Axboe @ 2018-06-11 14:14 UTC (permalink / raw)
To: Roman Pen, linux-block
Cc: Jinpu Wang, Gi-Oh Kim, Danil Kipnis, Bart Van Assche,
Christoph Hellwig, Sagi Grimberg, Ming Lei
On 6/10/18 2:38 PM, Roman Pen wrote:
> It is not allowed to reinit q->tag_set_list list entry while RCU grace
> period has not completed yet, otherwise the following soft lockup in
> blk_mq_sched_restart() happens:
>
> [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
> [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
> [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
> [ 1064.256510] Call Trace:
> [ 1064.256664] <IRQ>
> [ 1064.256824] blk_mq_free_request+0xea/0x100
> [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
> [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
> [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
> [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
> [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
> [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
> [ 1064.258007] irq_poll_softirq+0xb7/0xe0
> [ 1064.258165] __do_softirq+0x106/0x2a2
> [ 1064.258328] irq_exit+0x92/0xa0
> [ 1064.258509] do_IRQ+0x4a/0xd0
> [ 1064.258660] common_interrupt+0x7a/0x7a
> [ 1064.258818] </IRQ>
>
> Meanwhile another context frees other queue but with the same set of
> shared tags:
>
> [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
> [ 1288.201833] bash D 0 5910 5820 0x00000000
> [ 1288.202016] Call Trace:
> [ 1288.202315] schedule+0x32/0x80
> [ 1288.202462] schedule_timeout+0x1e5/0x380
> [ 1288.203838] wait_for_completion+0xb0/0x120
> [ 1288.204137] __wait_rcu_gp+0x125/0x160
> [ 1288.204287] synchronize_sched+0x6e/0x80
> [ 1288.204770] blk_mq_free_queue+0x74/0xe0
> [ 1288.204922] blk_cleanup_queue+0xc7/0x110
> [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
> [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
> [ 1288.205548] kernfs_fop_write+0x109/0x180
> [ 1288.206328] vfs_write+0xb3/0x1a0
> [ 1288.206476] SyS_write+0x52/0xc0
> [ 1288.206624] do_syscall_64+0x68/0x1d0
> [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> What happened is the following:
>
> 1. There are several MQ queues with shared tags.
> 2. One queue is about to be freed and now task is in
> blk_mq_del_queue_tag_set().
> 3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
> tag list in order to find hctx to restart.
>
> Because linked list entry was modified in blk_mq_del_queue_tag_set()
> without proper waiting for a grace period, blk_mq_sched_restart()
> never ends, spinning in list_for_each_entry_rcu_rr(), thus soft lockup.
>
> Fix is simple: reinit list entry after an RCU grace period elapsed.
Good catch, applied.
--
Jens Axboe