All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: [PATCH 10/10] nvme: implement multipath access to nvme subsystems
Date: Tue, 29 Aug 2017 16:55:59 +0200	[thread overview]
Message-ID: <20170829145559.GA32760@lst.de> (raw)
In-Reply-To: <20170829145417.GA4428@localhost.localdomain>

On Tue, Aug 29, 2017 at 10:54:17AM -0400, Keith Busch wrote:
> On Wed, Aug 23, 2017 at 07:58:15PM +0200, Christoph Hellwig wrote:
> > +	/* Anything else could be a path failure, so should be retried */
> > +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> > +	blk_steal_bios(&ns->head->requeue_list, req);
> > +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> > +
> > +	nvme_reset_ctrl(ns->ctrl);
> > +	kblockd_schedule_work(&ns->head->requeue_work);
> > +	return true;
> > +}
> 
> It appears this isn't going to cause the path selection to failover for
> the requeued work. The bio's bi_disk is unchanged from the failed path when the
> requeue_work submits the bio again so it will use the same path, right? 

Oh.  This did indeed break with the bi_bdev -> bi_disk refactoring
I did just before sending this out.

> It also looks like new submissions will get a new path only from the
> fact that the original/primary is being reset. The controller reset
> itself seems a bit heavy-handed. Can we just set head->current_path to
> the next active controller in the list?

For ANA we'll have to do that anyway, but if we got a failure
that clearly indicates a path failure what benefit is there in not
resetting the controller?  But yeah, maybe we can just switch the
path for non-ANA controllers and wait for timeouts to do their work.

> 
> > +static void nvme_requeue_work(struct work_struct *work)
> > +{
> > +	struct nvme_ns_head *head =
> > +		container_of(work, struct nvme_ns_head, requeue_work);
> > +	struct bio *bio, *next;
> > +
> > +	spin_lock_irq(&head->requeue_lock);
> > +	next = bio_list_get(&head->requeue_list);
> > +	spin_unlock_irq(&head->requeue_lock);
> > +
> > +	while ((bio = next) != NULL) {
> > +		next = bio->bi_next;
> > +		bio->bi_next = NULL;
> > +		generic_make_request_fast(bio);
> > +	}
> > +}
> 
> Here, I think we need to reevaluate the path (nvme_find_path) and set
> bio->bi_disk accordingly.

Yes.  Previously this was opencoded and always used head->disk, but
I messed it up last minute.  In the end it still worked for my cases
because the controller would either already be reset or fail all
I/O, but this behavior clearly is not intended and suboptimal.

WARNING: multiple messages have this Message-ID (diff)
From: hch@lst.de (Christoph Hellwig)
Subject: [PATCH 10/10] nvme: implement multipath access to nvme subsystems
Date: Tue, 29 Aug 2017 16:55:59 +0200	[thread overview]
Message-ID: <20170829145559.GA32760@lst.de> (raw)
In-Reply-To: <20170829145417.GA4428@localhost.localdomain>

On Tue, Aug 29, 2017@10:54:17AM -0400, Keith Busch wrote:
> On Wed, Aug 23, 2017@07:58:15PM +0200, Christoph Hellwig wrote:
> > +	/* Anything else could be a path failure, so should be retried */
> > +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> > +	blk_steal_bios(&ns->head->requeue_list, req);
> > +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> > +
> > +	nvme_reset_ctrl(ns->ctrl);
> > +	kblockd_schedule_work(&ns->head->requeue_work);
> > +	return true;
> > +}
> 
> It appears this isn't going to cause the path selection to failover for
> the requeued work. The bio's bi_disk is unchanged from the failed path when the
> requeue_work submits the bio again so it will use the same path, right? 

Oh.  This did indeed break with the bi_bdev -> bi_disk refactoring
I did just before sending this out.

> It also looks like new submissions will get a new path only from the
> fact that the original/primary is being reset. The controller reset
> itself seems a bit heavy-handed. Can we just set head->current_path to
> the next active controller in the list?

For ANA we'll have to do that anyway, but if we got a failure
that clearly indicates a path failure what benefit is there in not
resetting the controller?  But yeah, maybe we can just switch the
path for non-ANA controllers and wait for timeouts to do their work.

> 
> > +static void nvme_requeue_work(struct work_struct *work)
> > +{
> > +	struct nvme_ns_head *head =
> > +		container_of(work, struct nvme_ns_head, requeue_work);
> > +	struct bio *bio, *next;
> > +
> > +	spin_lock_irq(&head->requeue_lock);
> > +	next = bio_list_get(&head->requeue_list);
> > +	spin_unlock_irq(&head->requeue_lock);
> > +
> > +	while ((bio = next) != NULL) {
> > +		next = bio->bi_next;
> > +		bio->bi_next = NULL;
> > +		generic_make_request_fast(bio);
> > +	}
> > +}
> 
> Here, I think we need to reevaluate the path (nvme_find_path) and set
> bio->bi_disk accordingly.

Yes.  Previously this was opencoded and always used head->disk, but
I messed it up last minute.  In the end it still worked for my cases
because the controller would either already be reset or fail all
I/O, but this behavior clearly is not intended and suboptimal.

  reply	other threads:[~2017-08-29 14:55 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23 17:58 RFC: nvme multipath support Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-23 17:58 ` [PATCH 01/10] nvme: report more detailed status codes to the block layer Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:06   ` Sagi Grimberg
2017-08-28  6:06     ` Sagi Grimberg
2017-08-28 18:50   ` Keith Busch
2017-08-28 18:50     ` Keith Busch
2017-08-23 17:58 ` [PATCH 02/10] nvme: allow calling nvme_change_ctrl_state from irq context Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:06   ` Sagi Grimberg
2017-08-28  6:06     ` Sagi Grimberg
2017-08-28 18:50   ` Keith Busch
2017-08-28 18:50     ` Keith Busch
2017-08-23 17:58 ` [PATCH 03/10] nvme: remove unused struct nvme_ns fields Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:07   ` Sagi Grimberg
2017-08-28  6:07     ` Sagi Grimberg
2017-08-28 19:13   ` Keith Busch
2017-08-28 19:13     ` Keith Busch
2017-08-23 17:58 ` [PATCH 04/10] nvme: remove nvme_revalidate_ns Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:12   ` Sagi Grimberg
2017-08-28  6:12     ` Sagi Grimberg
2017-08-28 19:14   ` Keith Busch
2017-08-28 19:14     ` Keith Busch
2017-08-23 17:58 ` [PATCH 05/10] nvme: don't blindly overwrite identifiers on disk revalidate Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:17   ` Sagi Grimberg
2017-08-28  6:17     ` Sagi Grimberg
2017-08-28  6:23     ` Christoph Hellwig
2017-08-28  6:23       ` Christoph Hellwig
2017-08-28  6:32       ` Sagi Grimberg
2017-08-28  6:32         ` Sagi Grimberg
2017-08-28 19:15   ` Keith Busch
2017-08-28 19:15     ` Keith Busch
2017-08-23 17:58 ` [PATCH 06/10] nvme: track subsystems Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-23 22:04   ` Keith Busch
2017-08-23 22:04     ` Keith Busch
2017-08-24  8:52     ` Christoph Hellwig
2017-08-24  8:52       ` Christoph Hellwig
2017-08-28  6:22   ` Sagi Grimberg
2017-08-28  6:22     ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 07/10] nvme: track shared namespaces Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  6:51   ` Sagi Grimberg
2017-08-28  6:51     ` Sagi Grimberg
2017-08-28  8:50     ` Christoph Hellwig
2017-08-28  8:50       ` Christoph Hellwig
2017-08-28 20:21     ` J Freyensee
2017-08-28 20:21       ` J Freyensee
2017-08-29  8:25       ` Christoph Hellwig
2017-08-29  8:25         ` Christoph Hellwig
2017-08-29  6:54     ` Guan Junxiong
2017-08-29  6:54       ` Guan Junxiong
2017-08-28 12:04   ` javigon
2017-08-28 12:04     ` javigon
2017-08-28 12:41   ` Guan Junxiong
2017-08-28 12:41     ` Guan Junxiong
2017-08-28 14:30     ` Christoph Hellwig
2017-08-28 14:30       ` Christoph Hellwig
2017-08-29  2:42       ` Guan Junxiong
2017-08-29  2:42         ` Guan Junxiong
2017-08-29  8:30         ` Christoph Hellwig
2017-08-29  8:30           ` Christoph Hellwig
2017-08-29  8:29     ` Christoph Hellwig
2017-08-29  8:29       ` Christoph Hellwig
2017-08-28 19:18   ` Keith Busch
2017-08-28 19:18     ` Keith Busch
2017-08-23 17:58 ` [PATCH 08/10] block: provide a generic_make_request_fast helper Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  7:00   ` Sagi Grimberg
2017-08-28  7:00     ` Sagi Grimberg
2017-08-28  8:54     ` Christoph Hellwig
2017-08-28  8:54       ` Christoph Hellwig
2017-08-28 11:01       ` Sagi Grimberg
2017-08-28 11:01         ` Sagi Grimberg
2017-08-28 11:54         ` Christoph Hellwig
2017-08-28 11:54           ` Christoph Hellwig
2017-08-28 12:38           ` Sagi Grimberg
2017-08-28 12:38             ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 09/10] blk-mq: add a blk_steal_bios helper Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-28  7:04   ` Sagi Grimberg
2017-08-28  7:04     ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 10/10] nvme: implement multipath access to nvme subsystems Christoph Hellwig
2017-08-23 17:58   ` Christoph Hellwig
2017-08-23 18:21   ` Bart Van Assche
2017-08-23 18:21     ` Bart Van Assche
2017-08-24  8:59     ` hch
2017-08-24  8:59       ` hch
2017-08-24 20:17       ` Bart Van Assche
2017-08-24 20:17         ` Bart Van Assche
2017-09-05 11:53         ` Christoph Hellwig
2017-09-05 11:53           ` Christoph Hellwig
2017-09-11  6:34           ` Tony Yang
2017-08-23 22:53   ` Keith Busch
2017-08-23 22:53     ` Keith Busch
2017-08-24  8:52     ` Christoph Hellwig
2017-08-24  8:52       ` Christoph Hellwig
2017-08-28  7:23   ` Sagi Grimberg
2017-08-28  7:23     ` Sagi Grimberg
2017-08-28  9:06     ` Christoph Hellwig
2017-08-28  9:06       ` Christoph Hellwig
2017-08-28 13:40       ` Sagi Grimberg
2017-08-28 13:40         ` Sagi Grimberg
2017-08-28 14:24         ` Christoph Hellwig
2017-08-28 14:24           ` Christoph Hellwig
2017-09-07 15:17       ` Tony Yang
2017-08-29 10:22   ` Guan Junxiong
2017-08-29 10:22     ` Guan Junxiong
2017-08-29 14:51     ` Christoph Hellwig
2017-08-29 14:51       ` Christoph Hellwig
2017-08-29 14:54   ` Keith Busch
2017-08-29 14:54     ` Keith Busch
2017-08-29 14:55     ` Christoph Hellwig [this message]
2017-08-29 14:55       ` Christoph Hellwig
2017-08-29 15:41       ` Keith Busch
2017-08-29 15:41         ` Keith Busch
2017-09-18  0:17         ` Christoph Hellwig
2017-09-18  0:17           ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170829145559.GA32760@lst.de \
    --to=hch@lst.de \
    --cc=axboe@kernel.dk \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.