linux-nvme.lists.infradead.org archive mirror
From: Keith Busch <kbusch@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Tejun Heo <tj@kernel.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: Re: lockdep WARNING at blktests block/011
Date: Wed, 5 Oct 2022 08:20:00 -0600
Message-ID: <Yz2SkNORASzmL+jq@kbusch-mbp.dhcp.thefacebook.com>
In-Reply-To: <15c6e51f-a2a4-38ff-15a4-9efee32824d3@I-love.SAKURA.ne.jp>

On Wed, Oct 05, 2022 at 07:00:30PM +0900, Tetsuo Handa wrote:
> On 2022/10/05 17:31, Shinichiro Kawasaki wrote:
> > @@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
> >  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
> >  {
> >  	struct nvme_ns *ns;
> > +	LIST_HEAD(splice);
> >  
> > -	down_read(&ctrl->namespaces_rwsem);
> > -	list_for_each_entry(ns, &ctrl->namespaces, list)
> > +	/*
> > +	 * A blk_sync_queue() call inside the ctrl->namespaces_rwsem critical
> > +	 * section triggers a lockdep deadlock warning, since cancel_work_sync()
> > +	 * in blk_sync_queue() waits for nvme_timeout() work to complete, and that
> > +	 * work may take ctrl->namespaces_rwsem. To avoid the possible deadlock,
> > +	 * call blk_sync_queue() outside the critical section by temporarily
> > +	 * moving the ctrl->namespaces list elements to a list head on the stack.
> > +	 */
> > +
> > +	down_write(&ctrl->namespaces_rwsem);
> > +	list_splice_init(&ctrl->namespaces, &splice);
> > +	up_write(&ctrl->namespaces_rwsem);
> 
> Does this work?
> 
> ctrl->namespaces being empty when calling blk_sync_queue() means that
> e.g. nvme_start_freeze() cannot find namespaces to freeze, doesn't it?

There can't be anything to time out at this point. The controller is disabled
prior to syncing the queues. Not only is there no IO for the timeout work to
operate on, the controller state is already disabled, so a subsequent freeze
would be skipped.
 
>   blk_mq_timeout_work(work) { // Is blocking __flush_work() from cancel_work_sync().
>     blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
>       bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
>         blk_mq_rq_timed_out() {
>           req->q->mq_ops->timeout(req) == nvme_timeout(req) {
>             nvme_dev_disable() {
>               mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock
>               nvme_start_freeze(&dev->ctrl) {
>                 down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem which might block
>                 //blk_freeze_queue_start(ns->queue); // <= Never be called because ctrl->namespaces is empty.
>                 up_read(&ctrl->namespaces_rwsem);
>               }
>               mutex_unlock(&dev->shutdown_lock);
>             }
>           }
>         }
>       }
>     }
>   }
> 
> Are you sure that down_read(&ctrl->namespaces_rwsem) users won't run
> while ctrl->namespaces is temporarily empty? (And if you are sure
> that down_read(&ctrl->namespaces_rwsem) users won't run while
> ctrl->namespaces is temporarily empty, why does ctrl->namespaces_rwsem
> need to be an rwsem rather than a plain mutex or spinlock?)

We iterate the list in some target fast paths, so we don't want this to be an
exclusive section for readers.
 
> > +	list_for_each_entry(ns, &splice, list)
> >  		blk_sync_queue(ns->queue);
> > -	up_read(&ctrl->namespaces_rwsem);
> > +
> > +	down_write(&ctrl->namespaces_rwsem);
> > +	list_splice(&splice, &ctrl->namespaces);
> > +	up_write(&ctrl->namespaces_rwsem);
> >  }
> >  EXPORT_SYMBOL_GPL(nvme_sync_io_queues);
> 
> I don't know about dependency chain, but you might be able to add
> "struct nvme_ctrl"->sync_io_queue_mutex which is held for serializing
> nvme_sync_io_queues() and down_write(&ctrl->namespaces_rwsem) users?
> 
> If we can guarantee that ctrl->namespaces_rwsem => ctrl->sync_io_queue_mutex
> is impossible, nvme_sync_io_queues() can use ctrl->sync_io_queue_mutex
> rather than ctrl->namespaces_rwsem, and down_write(&ctrl->namespaces_rwsem)/
> up_write(&ctrl->namespaces_rwsem) users are replaced with
>   mutex_lock(&ctrl->sync_io_queue_mutex);
>   down_write(&ctrl->namespaces_rwsem);
> and
>   up_write(&ctrl->namespaces_rwsem);
>   mutex_unlock(&ctrl->sync_io_queue_mutex);
> sequences respectively.
> 

