From: Ming Lei <tom.leiming@gmail.com>
To: Roman Pen <roman.penyaev@profitbricks.com>
Cc: linux-block <linux-block@vger.kernel.org>,
	Jinpu Wang <jinpu.wang@profitbricks.com>,
	Gi-Oh Kim <gi-oh.kim@profitbricks.com>,
	Danil Kipnis <danil.kipnis@profitbricks.com>,
	Jens Axboe <axboe@kernel.dk>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
Date: Mon, 11 Jun 2018 14:33:11 +0800
Message-ID: <CACVXFVMW7jiWKS0f3H9S=Fb=p7m7Gf0zZQ8b9Q76xgei82N-9Q@mail.gmail.com>
In-Reply-To: <20180610203824.16512-1-roman.penyaev@profitbricks.com>

On Mon, Jun 11, 2018 at 4:38 AM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:
> The q->tag_set_list list entry must not be reinitialized before the RCU
> grace period has completed; otherwise the following soft lockup in
> blk_mq_sched_restart() happens:
>
> [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
> [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
> [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
> [ 1064.256510] Call Trace:
> [ 1064.256664]  <IRQ>
> [ 1064.256824]  blk_mq_free_request+0xea/0x100
> [ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
> [ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
> [ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
> [ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
> [ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
> [ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
> [ 1064.258007]  irq_poll_softirq+0xb7/0xe0
> [ 1064.258165]  __do_softirq+0x106/0x2a2
> [ 1064.258328]  irq_exit+0x92/0xa0
> [ 1064.258509]  do_IRQ+0x4a/0xd0
> [ 1064.258660]  common_interrupt+0x7a/0x7a
> [ 1064.258818]  </IRQ>
>
> Meanwhile, another context frees a different queue that uses the same
> set of shared tags:
>
> [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
> [ 1288.201833] bash            D    0  5910   5820 0x00000000
> [ 1288.202016] Call Trace:
> [ 1288.202315]  schedule+0x32/0x80
> [ 1288.202462]  schedule_timeout+0x1e5/0x380
> [ 1288.203838]  wait_for_completion+0xb0/0x120
> [ 1288.204137]  __wait_rcu_gp+0x125/0x160
> [ 1288.204287]  synchronize_sched+0x6e/0x80
> [ 1288.204770]  blk_mq_free_queue+0x74/0xe0
> [ 1288.204922]  blk_cleanup_queue+0xc7/0x110
> [ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
> [ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
> [ 1288.205548]  kernfs_fop_write+0x109/0x180
> [ 1288.206328]  vfs_write+0xb3/0x1a0
> [ 1288.206476]  SyS_write+0x52/0xc0
> [ 1288.206624]  do_syscall_64+0x68/0x1d0
> [ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> What happened is the following:
>
> 1. There are several MQ queues with shared tags.
> 2. One queue is about to be freed, and the task is now in
>    blk_mq_del_queue_tag_set().
> 3. Another CPU is in blk_mq_sched_restart() and loops over all queues
>    in the tag list in order to find an hctx to restart.
>
> Because the list entry was reinitialized in blk_mq_del_queue_tag_set()
> without first waiting for a grace period, a reader still standing on
> that entry finds it pointing to itself, so blk_mq_sched_restart() never
> terminates and spins in list_for_each_entry_rcu_rr(); hence the soft
> lockup.
>
> The fix is simple: reinit the list entry only after an RCU grace period
> has elapsed.
>
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: linux-block@vger.kernel.org
> ---
>  block/blk-mq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..2a40d60950f4 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>
>         mutex_lock(&set->tag_list_lock);
>         list_del_rcu(&q->tag_set_list);
> -       INIT_LIST_HEAD(&q->tag_set_list);
>         if (list_is_singular(&set->tag_list)) {
>                 /* just transitioned to unshared */
>                 set->flags &= ~BLK_MQ_F_TAG_SHARED;
> @@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>                 blk_mq_update_tag_set_depth(set, false);
>         }
>         mutex_unlock(&set->tag_list_lock);
> -
>         synchronize_rcu();
> +       INIT_LIST_HEAD(&q->tag_set_list);
>  }
>
>  static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
> --
> 2.13.1
>

Good catch.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
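
For readers following along, the failure mode can be shown in a few
lines of userspace C. This is only a sketch under stated assumptions:
struct list_head and the helpers below are simplified stand-ins for the
kernel's list primitives, not the kernel code itself, and the reader
here stops at the list head like a plain list walk rather than using
the round-robin list_for_each_entry_rcu_rr(). The names
list_del_keep_next(), steps_to_head() and reader_after_removal() are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): the entry points at itself. */
static void init_list_head(struct list_head *e)
{
	e->next = e;
	e->prev = e;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/*
 * Modeled on list_del_rcu(): unlink the entry from its neighbours but
 * leave e->next intact, so a reader standing on e can still move on.
 * (Assumption: the kernel poisons ->prev; NULL is used here instead.)
 */
static void list_del_keep_next(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->prev = NULL;
}

/*
 * Reader: starting on pos, follow ->next until head is reached.
 * Returns the number of steps, or -1 if head is not reached within
 * max_steps (standing in for "spins forever").
 */
static int steps_to_head(const struct list_head *pos,
			 const struct list_head *head, int max_steps)
{
	assert(pos && head);
	for (int n = 0; n < max_steps; n++) {
		if (pos == head)
			return n;
		pos = pos->next;
	}
	return -1;
}

/*
 * Build head -> a -> b -> c, remove b while a reader is standing on
 * it, and optionally reinit b before the reader has moved on.
 */
static int reader_after_removal(int reinit_early)
{
	struct list_head head, a, b, c;

	init_list_head(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_del_keep_next(&b);
	if (reinit_early)
		init_list_head(&b);	/* the bug: b.next now points to b */

	return steps_to_head(&b, &head, 1000);
}
```

With these definitions, reader_after_removal(0) returns 2 (the reader
on the removed entry still walks forward to the head), while
reader_after_removal(1) returns -1 (the entry points at itself, so the
walk never leaves it). The second case corresponds to the spin seen in
the soft lockup; deferring the reinit until after synchronize_rcu()
means no reader can still be standing on the entry when it changes.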


Thanks,
Ming Lei

Thread overview: 5+ messages
2018-06-10 20:38 [PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period Roman Pen
2018-06-11  6:24 ` Christoph Hellwig
2018-06-11  6:33 ` Ming Lei [this message]
2018-06-11 13:26 ` Bart Van Assche
2018-06-11 14:14 ` Jens Axboe