linux-nvme.lists.infradead.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-nvme@lists.infradead.org, Sagi Grimberg <sagi@grimberg.me>,
	Keith Busch <keith.busch@wdc.com>
Subject: Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
Date: Tue, 6 Oct 2020 07:48:40 +0200
Message-ID: <8d7d4803-5808-0839-ee4f-e36a12756497@suse.de>
In-Reply-To: <20201005125201.GB1125@lst.de>

On 10/5/20 2:52 PM, Christoph Hellwig wrote:
> On Mon, Oct 05, 2020 at 02:45:00PM +0200, Hannes Reinecke wrote:
>> Currently namespaces behave differently depending on the 'CMIC'
>> setting. If CMIC is zero, the device is removed once the last path
>> goes away. If CMIC has the multipath bit set, the device is retained
>> even if the last path is removed.
>> This is okay for fabrics, where one can do an explicit disconnect
>> to remove the device, but for nvme-pci this induces a regression
>> with PCI hotplug.
>> When the NVMe device is opened (eg by MD), the NVMe device is not
>> removed after a PCI hot-remove. Hence MD will not be notified about
>> the event, and will continue to consider this device as operational.
>> Consequently, upon PCI hot-add the device shows up as a new NVMe
>> device, and MD will fail to reattach the device.
>> So this patch adds NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
>> to restore the original behaviour for non-fabrics NVMe devices.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c      | 10 +++++++++-
>>   drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
>>   drivers/nvme/host/nvme.h      |  2 ++
>>   3 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 4459a40b057c..e21c32ea4b51 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
>>   		container_of(ref, struct nvme_ns_head, ref);
>>   
>>   #ifdef CONFIG_NVME_MULTIPATH
>> -	if (head->disk)
>> +	if (head->disk) {
>> +		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
>> +			nvme_mpath_remove_disk(head);
>>   		put_disk(head->disk);
>> +	}
>>   #endif
>>   	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>>   	cleanup_srcu_struct(&head->srcu);
>> @@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
>>   #ifdef CONFIG_NVME_MULTIPATH
>>   	&dev_attr_ana_grpid.attr,
>>   	&dev_attr_ana_state.attr,
>> +	&dev_attr_queue_if_no_path.attr,
>>   #endif
>>   	NULL,
>>   };
>> @@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>>   		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
>>   			return 0;
>>   	}
>> +	if (a == &dev_attr_queue_if_no_path.attr) {
>> +		if (dev_to_disk(dev)->fops == &nvme_fops)
>> +			return 0;
>> +	}
>>   #endif
>>   	return a->mode;
>>   }
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 55045291b4de..bbdad5917112 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>>   	/* set to a default value for 512 until disk is validated */
>>   	blk_queue_logical_block_size(q, 512);
>>   	blk_set_stacking_limits(&q->limits);
>> +	/* Enable queue_if_no_path semantics for fabrics */
>> +	if (ctrl->ops->flags & NVME_F_FABRICS)
>> +		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
> 
> Well, that is blindingly obvious from the code.  But why would we treat
> fabrics special?
> 
Because it is the established behaviour of the current code, and
changing it now has the potential to break existing setups.

For PCI (i.e. non-fabrics) the current behaviour is arguably a corner case,
as one needs a PCI NVMe device with the CMIC multipath bit set. But once
you have such a device, PCI hotplug no longer works. So there we really
want to change the behaviour to get the same user experience for all NVMe
drives.
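
To make the intended split concrete, here is a minimal user-space sketch
of the semantics described above. It is not kernel code: the struct, the
enum and the helper names are hypothetical, and the single boolean only
stands in for NVME_NSHEAD_QUEUE_IF_NO_PATH. It assumes, as the patch
description states, that fabrics keeps the head device when the last path
goes away while non-fabrics removes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not the kernel's nvme_ns_head. */
enum transport { TRANSPORT_PCI, TRANSPORT_FABRICS };

struct ns_head_model {
	bool queue_if_no_path;	/* stands in for NVME_NSHEAD_QUEUE_IF_NO_PATH */
	int nr_paths;		/* number of live paths to the namespace */
	bool disk_present;	/* is the head block device still visible? */
};

/* Default: only fabrics keeps the device around without a path. */
static void head_init(struct ns_head_model *h, enum transport t, int nr_paths)
{
	assert(h != NULL);
	h->queue_if_no_path = (t == TRANSPORT_FABRICS);
	h->nr_paths = nr_paths;
	h->disk_present = true;
}

/*
 * Drop one path. When the last path is gone, the device is removed
 * unless queue_if_no_path is set, in which case it is retained.
 */
static void head_remove_path(struct ns_head_model *h)
{
	assert(h != NULL);
	if (h->nr_paths > 0)
		h->nr_paths--;
	if (h->nr_paths == 0 && !h->queue_if_no_path)
		h->disk_present = false;
}
```

With one path each, removing that path leaves the modelled fabrics head
in place and removes the modelled PCI head, which is what lets a consumer
such as MD notice the hot-remove.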

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

