linux-nvme.lists.infradead.org archive mirror
* [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality
@ 2020-10-05 12:44 Hannes Reinecke
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
  0 siblings, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Hi all,

this is a patchset based on Keith's original patch for restoring the
pre-fabrics behaviour when the last path to a multipath device is removed.
Originally, the nvme device was removed once the underlying hardware was
gone. With the introduction of multipath support things changed: it is now
the 'CMIC' field in the controller identification that controls the
behaviour. If the multi-controller bit is set, the device is retained even
if the hardware is removed. While this is fine for fabrics (as we can
manually connect and disconnect the devices), for nvme-pci it means that
PCI hotplug ceases to work: the device is never removed, and when the
hardware is reinserted a new nvme device is created.
This patchset introduces a 'queue_if_no_path' flag to control the handling
of the last path; it is set for fabrics to retain the current behaviour,
but left unset for PCI to revert to the original, pre-fabrics behaviour.
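
To make the intended semantics explicit, the last-path handling boils down
to roughly the following; this is a condensed sketch of what the two patches
below implement, not code to be applied as-is:

static void nvme_mpath_check_last_path(struct nvme_ns_head *head)
{
        if (!head->disk || !list_empty(&head->list))
                return;
        if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
                /* fabrics default: keep the node and requeue I/O */
                kblockd_schedule_work(&head->requeue_work);
        else
                /* nvme-pci default: tear the node down, as pre-fabrics nvme did */
                nvme_mpath_remove_disk(head);
}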

Changes since v2:
- Merge two patches
- Fix to make it compile without CONFIG_NVME_MULTIPATH
- Move sysfs attributes to be compiled in only for CONFIG_NVME_MULTIPATH

Hannes Reinecke (1):
  nvme: add 'queue_if_no_path' semantics

Keith Busch (1):
  nvme-mpath: delete disk after last connection

 drivers/nvme/host/core.c      | 13 ++++++++++++-
 drivers/nvme/host/multipath.c | 39 ++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      |  4 +++-
 3 files changed, 53 insertions(+), 3 deletions(-)

-- 
2.16.4



* [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
@ 2020-10-05 12:44 ` Hannes Reinecke
  2020-10-05 12:50   ` Christoph Hellwig
  2021-03-04 14:34   ` Daniel Wagner
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
  1 sibling, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch

From: Keith Busch <kbusch@kernel.org>

The multipath code currently deletes the disk only after all references
to it are dropped rather than when the last path to that disk is lost.
This has been reported to cause problems with some use cases, such as MD RAID.

Delete the disk when the last path is gone. This is the same behavior we
currently have with non-multipathed nvme devices.

The following simple example demonstrates what is currently observed
using an nvme loopback setup (loop setup file not shown):

 # nvmetcli restore loop.json
 [   31.156452] nvmet: adding nsid 1 to subsystem testnqn1
 [   31.159140] nvmet: adding nsid 1 to subsystem testnqn2

 # nvme connect -t loop -n testnqn1 -q hostnqn
 [   36.866302] nvmet: creating controller 1 for subsystem testnqn1 for NQN hostnqn.
 [   36.872926] nvme nvme3: new ctrl: "testnqn1"

 # nvme connect -t loop -n testnqn1 -q hostnqn
 [   38.227186] nvmet: creating controller 2 for subsystem testnqn1 for NQN hostnqn.
 [   38.234450] nvme nvme4: new ctrl: "testnqn1"

 # nvme connect -t loop -n testnqn2 -q hostnqn
 [   43.902761] nvmet: creating controller 3 for subsystem testnqn2 for NQN hostnqn.
 [   43.907401] nvme nvme5: new ctrl: "testnqn2"

 # nvme connect -t loop -n testnqn2 -q hostnqn
 [   44.627689] nvmet: creating controller 4 for subsystem testnqn2 for NQN hostnqn.
 [   44.641773] nvme nvme6: new ctrl: "testnqn2"

 # mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/nvme3n1 /dev/nvme5n1
 [   53.497038] md/raid1:md0: active with 2 out of 2 mirrors
 [   53.501717] md0: detected capacity change from 0 to 66060288

 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 nvme5n1[1] nvme3n1[0]
       64512 blocks super 1.2 [2/2] [UU]

Now delete all paths to one of the namespaces:

 # echo 1 > /sys/class/nvme/nvme3/delete_controller
 # echo 1 > /sys/class/nvme/nvme4/delete_controller

We have no path, but mdstat says:

 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active (auto-read-only) raid1 nvme5n1[1]
       64512 blocks super 1.2 [2/1] [_U]

And this is reported to cause a problem.

With the proposed patch, the following messages appear:

 [  227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
 [  227.516807] md/raid1:md0: Operation continuing on 1 devices.

And mdstat shows only the viable members:

 # cat /proc/mdstat
 Personalities : [raid1]
 md0 : active (auto-read-only) raid1 nvme5n1[1]
       64512 blocks super 1.2 [2/1] [_U]

Reported-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      | 5 ++++-
 drivers/nvme/host/multipath.c | 1 -
 drivers/nvme/host/nvme.h      | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 385b10317873..4459a40b057c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -474,7 +474,10 @@ static void nvme_free_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
-	nvme_mpath_remove_disk(head);
+#ifdef CONFIG_NVME_MULTIPATH
+	if (head->disk)
+		put_disk(head->disk);
+#endif
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74896be40c17..55045291b4de 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -697,7 +697,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 		 */
 		head->disk->queue = NULL;
 	}
-	put_disk(head->disk);
 }
 
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 566776100126..b6180bb3361d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -670,7 +670,7 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 	struct nvme_ns_head *head = ns->head;
 
 	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
+		nvme_mpath_remove_disk(head);
 }
 
 static inline void nvme_trace_bio_complete(struct request *req,
-- 
2.16.4



* [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
@ 2020-10-05 12:45 ` Hannes Reinecke
  2020-10-05 12:52   ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Currently namespaces behave differently depending on the 'CMIC'
setting. If CMIC is zero, the device is removed once the last path
goes away. If CMIC has the multipath bit set, the device is retained
even if the last path is removed.
This is okay for fabrics, where one can do an explicit disconnect
to remove the device, but for nvme-pci this induces a regression
with PCI hotplug.
When the NVMe device is held open (e.g. by MD), it is not removed
after a PCI hot-remove. Hence MD is not notified about the event and
continues to consider the device operational.
Consequently, upon PCI hot-add the device shows up as a new NVMe
device, and MD fails to reattach it.
So this patch adds an NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the
namespace head to restore the original behaviour for non-fabrics
NVMe devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c      | 10 +++++++++-
 drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h      |  2 ++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4459a40b057c..e21c32ea4b51 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
 		container_of(ref, struct nvme_ns_head, ref);
 
 #ifdef CONFIG_NVME_MULTIPATH
-	if (head->disk)
+	if (head->disk) {
+		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
+			nvme_mpath_remove_disk(head);
 		put_disk(head->disk);
+	}
 #endif
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
@@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
 #ifdef CONFIG_NVME_MULTIPATH
 	&dev_attr_ana_grpid.attr,
 	&dev_attr_ana_state.attr,
+	&dev_attr_queue_if_no_path.attr,
 #endif
 	NULL,
 };
@@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
 			return 0;
 	}
+	if (a == &dev_attr_queue_if_no_path.attr) {
+		if (dev_to_disk(dev)->fops == &nvme_fops)
+			return 0;
+	}
 #endif
 	return a->mode;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 55045291b4de..bbdad5917112 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	/* set to a default value for 512 until disk is validated */
 	blk_queue_logical_block_size(q, 512);
 	blk_set_stacking_limits(&q->limits);
+	/* Enable queue_if_no_path semantics for fabrics */
+	if (ctrl->ops->flags & NVME_F_FABRICS)
+		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
 
 	/* we need to propagate up the VMC settings */
 	if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
@@ -640,6 +643,37 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
 }
 DEVICE_ATTR_RO(ana_state);
 
+static ssize_t queue_if_no_path_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nvme_ns_head *head = nvme_get_ns_from_dev(dev)->head;
+
+	return sprintf(buf, "%s\n",
+		       test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags) ?
+		       "on" : "off");
+}
+
+static ssize_t queue_if_no_path_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct nvme_ns_head *head = nvme_get_ns_from_dev(dev)->head;
+	int err;
+	bool queue_if_no_path;
+
+	err = kstrtobool(buf, &queue_if_no_path);
+	if (err)
+		return -EINVAL;
+
+	if (queue_if_no_path)
+		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
+	else
+		clear_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
+
+	return count;
+}
+DEVICE_ATTR(queue_if_no_path, S_IRUGO | S_IWUSR,
+	queue_if_no_path_show, queue_if_no_path_store);
+
 static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
 		struct nvme_ana_group_desc *desc, void *data)
 {
@@ -682,6 +716,10 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 	if (!head->disk)
 		return;
+	if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags)) {
+		kblockd_schedule_work(&head->requeue_work);
+		return;
+	}
 	if (head->disk->flags & GENHD_FL_UP)
 		del_gendisk(head->disk);
 	blk_set_queue_dying(head->disk->queue);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b6180bb3361d..94ae06cdc934 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -411,6 +411,7 @@ struct nvme_ns_head {
 	struct mutex		lock;
 	unsigned long		flags;
 #define NVME_NSHEAD_DISK_LIVE	0
+#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
 	struct nvme_ns __rcu	*current_path[];
 #endif
 };
@@ -684,6 +685,7 @@ static inline void nvme_trace_bio_complete(struct request *req,
 
 extern struct device_attribute dev_attr_ana_grpid;
 extern struct device_attribute dev_attr_ana_state;
+extern struct device_attribute dev_attr_queue_if_no_path;
 extern struct device_attribute subsys_attr_iopolicy;
 
 #else
-- 
2.16.4



* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
@ 2020-10-05 12:50   ` Christoph Hellwig
  2021-03-05 20:06     ` Sagi Grimberg
  2021-03-04 14:34   ` Daniel Wagner
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-05 12:50 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

> -	nvme_mpath_remove_disk(head);
> +#ifdef CONFIG_NVME_MULTIPATH
> +	if (head->disk)
> +		put_disk(head->disk);
> +#endif

Please move this into a helper in multipath.c with a stub for the
non-multipath case.
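
For illustration, such a helper could look roughly like this (the name and
exact placement are made up here, not taken from a posted patch):

/* multipath.c */
void nvme_mpath_put_disk(struct nvme_ns_head *head)
{
        if (head->disk)
                put_disk(head->disk);
}

/* nvme.h, !CONFIG_NVME_MULTIPATH stub */
static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
{
}

nvme_free_ns_head() would then call it unconditionally, without the #ifdef.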


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
@ 2020-10-05 12:52   ` Christoph Hellwig
  2020-10-06  5:48     ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-05 12:52 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Mon, Oct 05, 2020 at 02:45:00PM +0200, Hannes Reinecke wrote:
> Currently namespaces behave differently depending on the 'CMIC'
> setting. If CMIC is zero, the device is removed once the last path
> goes away. If CMIC has the multipath bit set, the device is retained
> even if the last path is removed.
> This is okay for fabrics, where one can do an explicit disconnect
> to remove the device, but for nvme-pci this induces a regression
> with PCI hotplug.
> When the NVMe device is opened (eg by MD), the NVMe device is not
> removed after a PCI hot-remove. Hence MD will not be notified about
> the event, and will continue to consider this device as operational.
> Consequently, upon PCI hot-add the device shows up as a new NVMe
> device, and MD will fail to reattach the device.
> So this patch adds NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
> to restore the original behaviour for non-fabrics NVMe devices.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/nvme/host/core.c      | 10 +++++++++-
>  drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h      |  2 ++
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 4459a40b057c..e21c32ea4b51 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
>  		container_of(ref, struct nvme_ns_head, ref);
>  
>  #ifdef CONFIG_NVME_MULTIPATH
> -	if (head->disk)
> +	if (head->disk) {
> +		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
> +			nvme_mpath_remove_disk(head);
>  		put_disk(head->disk);
> +	}
>  #endif
>  	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>  	cleanup_srcu_struct(&head->srcu);
> @@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
>  #ifdef CONFIG_NVME_MULTIPATH
>  	&dev_attr_ana_grpid.attr,
>  	&dev_attr_ana_state.attr,
> +	&dev_attr_queue_if_no_path.attr,
>  #endif
>  	NULL,
>  };
> @@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>  		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
>  			return 0;
>  	}
> +	if (a == &dev_attr_queue_if_no_path.attr) {
> +		if (dev_to_disk(dev)->fops == &nvme_fops)
> +			return 0;
> +	}
>  #endif
>  	return a->mode;
>  }
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 55045291b4de..bbdad5917112 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>  	/* set to a default value for 512 until disk is validated */
>  	blk_queue_logical_block_size(q, 512);
>  	blk_set_stacking_limits(&q->limits);
> +	/* Enable queue_if_no_path semantics for fabrics */
> +	if (ctrl->ops->flags & NVME_F_FABRICS)
> +		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);

Well, that is blindingly obvious from the code.  But why would we treat
fabrics specially?


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:52   ` Christoph Hellwig
@ 2020-10-06  5:48     ` Hannes Reinecke
  2020-10-06  7:51       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06  5:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/5/20 2:52 PM, Christoph Hellwig wrote:
> On Mon, Oct 05, 2020 at 02:45:00PM +0200, Hannes Reinecke wrote:
>> Currently namespaces behave differently depending on the 'CMIC'
>> setting. If CMIC is zero, the device is removed once the last path
>> goes away. If CMIC has the multipath bit set, the device is retained
>> even if the last path is removed.
>> This is okay for fabrics, where one can do an explicit disconnect
>> to remove the device, but for nvme-pci this induces a regression
>> with PCI hotplug.
>> When the NVMe device is opened (eg by MD), the NVMe device is not
>> removed after a PCI hot-remove. Hence MD will not be notified about
>> the event, and will continue to consider this device as operational.
>> Consequently, upon PCI hot-add the device shows up as a new NVMe
>> device, and MD will fail to reattach the device.
>> So this patch adds NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
>> to restore the original behaviour for non-fabrics NVMe devices.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c      | 10 +++++++++-
>>   drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
>>   drivers/nvme/host/nvme.h      |  2 ++
>>   3 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 4459a40b057c..e21c32ea4b51 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
>>   		container_of(ref, struct nvme_ns_head, ref);
>>   
>>   #ifdef CONFIG_NVME_MULTIPATH
>> -	if (head->disk)
>> +	if (head->disk) {
>> +		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
>> +			nvme_mpath_remove_disk(head);
>>   		put_disk(head->disk);
>> +	}
>>   #endif
>>   	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>>   	cleanup_srcu_struct(&head->srcu);
>> @@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
>>   #ifdef CONFIG_NVME_MULTIPATH
>>   	&dev_attr_ana_grpid.attr,
>>   	&dev_attr_ana_state.attr,
>> +	&dev_attr_queue_if_no_path.attr,
>>   #endif
>>   	NULL,
>>   };
>> @@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>>   		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
>>   			return 0;
>>   	}
>> +	if (a == &dev_attr_queue_if_no_path.attr) {
>> +		if (dev_to_disk(dev)->fops == &nvme_fops)
>> +			return 0;
>> +	}
>>   #endif
>>   	return a->mode;
>>   }
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 55045291b4de..bbdad5917112 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>>   	/* set to a default value for 512 until disk is validated */
>>   	blk_queue_logical_block_size(q, 512);
>>   	blk_set_stacking_limits(&q->limits);
>> +	/* Enable queue_if_no_path semantics for fabrics */
>> +	if (ctrl->ops->flags & NVME_F_FABRICS)
>> +		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
> 
> Well, that is blindingly obvious from the code.  But why would we treat
> fabrics specially?
> 
Well, because it's established behaviour of the current code.
Changing it now has the potential to break existing scenarios.

For PCI (ie non-fabrics) the current behaviour is arguably a corner case 
(as one needs to have a PCI-NVMe with CMIC bit set), but once you have 
it PCI hotplug is done for. So there we really want to change the 
behaviour to get the same user experience for all NVMe drives.
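
For reference, the multipath node is only instantiated when the
multi-controller bit in CMIC is set; roughly (paraphrased, not verbatim)
from nvme_mpath_alloc_disk():

int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
        ...
        /* only set up the multipath node for multi-controller subsystems */
        if (!(ctrl->subsys->cmic & (1 << 1)) || !multipath)
                return 0;
        ...
}

Without that bit the namespace head never gets a gendisk and the old hotplug
behaviour is kept; with it set you fall into the retain-on-last-path code
discussed here.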

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  5:48     ` Hannes Reinecke
@ 2020-10-06  7:51       ` Christoph Hellwig
  2020-10-06  8:07         ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06  7:51 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
> Well, because it's established behaviour of the current code.
> Changing it now has the potential to break existing scenarios.
>
> For PCI (ie non-fabrics) the current behaviour is arguably a corner case 
> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it 
> PCI hotplug is done for. So there we really want to change the behaviour to 
> get the same user experience for all NVMe drives.

Dual port PCI NVMe devices are pretty common in the enterprise space.
I really don't see any good reason to differ here.


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  7:51       ` Christoph Hellwig
@ 2020-10-06  8:07         ` Hannes Reinecke
  2020-10-06  8:27           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06  8:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 9:51 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
>> Well, because it's established behaviour of the current code.
>> Changing it now has the potential to break existing scenarios.
>>
>> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
>> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
>> PCI hotplug is done for. So there we really want to change the behaviour to
>> get the same user experience for all NVMe drives.
> 
> Dual port PCI NVMe devices are pretty common in the enterprise space.
> I really don't see any good reason to differ here.
> 
?
Differ from what?

Normal (non-dual ported) PCI-NVMe?
NVMe-oF?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:07         ` Hannes Reinecke
@ 2020-10-06  8:27           ` Christoph Hellwig
  2020-10-06  8:29             ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06  8:27 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 10:07:49AM +0200, Hannes Reinecke wrote:
> On 10/6/20 9:51 AM, Christoph Hellwig wrote:
>> On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
>>> Well, because it's established behaviour of the current code.
>>> Changing it now has the potential to break existing scenarios.
>>>
>>> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
>>> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
>>> PCI hotplug is done for. So there we really want to change the behaviour to
>>> get the same user experience for all NVMe drives.
>>
>> Dual port PCI NVMe devices are pretty common in the enterprise space.
>> I really don't see any good reason to differ here.
>>
> ?
> Differ from what?
>
> Normal (non-dual ported) PCI-NVMe?
> NVMe-oF?

All multipath devices should behave the same.  No special casing for
PCIe, please.


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:27           ` Christoph Hellwig
@ 2020-10-06  8:29             ` Hannes Reinecke
  2020-10-06  8:39               ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06  8:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 10:27 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 10:07:49AM +0200, Hannes Reinecke wrote:
>> On 10/6/20 9:51 AM, Christoph Hellwig wrote:
>>> On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
>>>> Well, because it's established behaviour of the current code.
>>>> Changing it now has the potential to break existing scenarios.
>>>>
>>>> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
>>>> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
>>>> PCI hotplug is done for. So there we really want to change the behaviour to
>>>> get the same user experience for all NVMe drives.
>>>
>>> Dual port PCI NVMe devices are pretty common in the enterprise space.
>>> I really don't see any good reason to differ here.
>>>
>> ?
>> Differ from what?
>>
>> Normal (non-dual ported) PCI-NVMe?
>> NVMe-oF?
> 
> All multipath devices should behave the same.  No special casing for
> PCIe, please.
> 
Even if the default behaviour breaks PCI hotplug?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:29             ` Hannes Reinecke
@ 2020-10-06  8:39               ` Christoph Hellwig
  2020-10-06 13:30                 ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06  8:39 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>> All multipath devices should behave the same.  No special casing for
>> PCIe, please.
>>
> Even if the default behaviour breaks PCI hotplug?

Why would it "break" PCI hotplug?


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:39               ` Christoph Hellwig
@ 2020-10-06 13:30                 ` Hannes Reinecke
  2020-10-06 13:45                   ` Hannes Reinecke
  2020-10-06 17:41                   ` Keith Busch
  0 siblings, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 13:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 10:39 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>>> All multipath devices should behave the same.  No special casing for
>>> PCIe, please.
>>>
>> Even if the default behaviour breaks PCI hotplug?
> 
> Why would it "break" PCI hotplug?
> 
When running under MD RAID:
Before hotplug:
# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
/dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
/dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
After hotplug:

# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
/dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           -1          0.00   B /   0.00   B      1   B +  0 B   1.0
/dev/nvme1n2     SLESNVME2            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
/dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0

And MD hasn't been notified that the device is gone:
# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0]
       4189184 blocks super 1.2 2 near-copies [2/2] [UU]
       bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Once I do some I/O to it MD recognized a faulty device:

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0](F)
       4189184 blocks super 1.2 2 near-copies [2/1] [_U]
       bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

but the re-added device isn't added to the MD RAID.
In fact, it has been assigned a _different_ namespace ID:

[  904.299065] pcieport 0000:00:08.0: pciehp: Slot(0-1): Card present
[  904.299067] pcieport 0000:00:08.0: pciehp: Slot(0-1): Link Up
[  904.435314] pci 0000:02:00.0: [8086:5845] type 00 class 0x010802
[  904.435523] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[  904.435676] pci 0000:02:00.0: reg 0x20: [mem 0x00000000-0x00000fff]
[  904.436982] pci 0000:02:00.0: BAR 0: assigned [mem 0xc1200000-0xc1201fff 64bit]
[  904.437086] pci 0000:02:00.0: BAR 4: assigned [mem 0xc1202000-0xc1202fff]
[  904.437118] pcieport 0000:00:08.0: PCI bridge to [bus 02]
[  904.437137] pcieport 0000:00:08.0:   bridge window [io  0x7000-0x7fff]
[  904.439024] pcieport 0000:00:08.0:   bridge window [mem 0xc1200000-0xc13fffff]
[  904.440229] pcieport 0000:00:08.0:   bridge window [mem 0x802000000-0x803ffffff 64bit pref]
[  904.447150] nvme nvme3: pci function 0000:02:00.0
[  904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
[  904.458880] nvme nvme3: 1/0/0 default/read/poll queues
[  904.461296] nvme1n2: detected capacity change from 0 to 4294967296

and the 'old', pre-hotplug device still lingers on in the 'nvme list' 
output.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:30                 ` Hannes Reinecke
@ 2020-10-06 13:45                   ` Hannes Reinecke
  2021-03-05 20:31                     ` Sagi Grimberg
  2020-10-06 17:41                   ` Keith Busch
  1 sibling, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 13:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme

On 10/6/20 3:30 PM, Hannes Reinecke wrote:
> On 10/6/20 10:39 AM, Christoph Hellwig wrote:
>> On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>>>> All multipath devices should behave the same.  No special casing for
>>>> PCIe, please.
>>>>
>>> Even if the default behaviour breaks PCI hotplug?
>>
>> Why would it "break" PCI hotplug?
>>
> When running under MD RAID:
> Before hotplug:
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
> /dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
> After hotplug:
> 
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
> /dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           -1          0.00   B /   0.00   B      1   B +  0 B   1.0
> /dev/nvme1n2     SLESNVME2            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
> 
> And MD hasn't been notified that the device is gone:
> # cat /proc/mdstat
> Personalities : [raid10]
> md0 : active raid10 nvme2n1[1] nvme1n1[0]
>        4189184 blocks super 1.2 2 near-copies [2/2] [UU]
>        bitmap: 0/1 pages [0KB], 65536KB chunk
> 
> unused devices: <none>
> 
> Once I do some I/O to it MD recognized a faulty device:
> 
> # cat /proc/mdstat
> Personalities : [raid10]
> md0 : active raid10 nvme2n1[1] nvme1n1[0](F)
>        4189184 blocks super 1.2 2 near-copies [2/1] [_U]
>        bitmap: 0/1 pages [0KB], 65536KB chunk
> 
> unused devices: <none>
> 
> but the re-added device isn't added to the MD RAID.
> In fact, it has been assigned a _different_ namespace ID:
> 
> [  904.299065] pcieport 0000:00:08.0: pciehp: Slot(0-1): Card present
> [  904.299067] pcieport 0000:00:08.0: pciehp: Slot(0-1): Link Up
> [  904.435314] pci 0000:02:00.0: [8086:5845] type 00 class 0x010802
> [  904.435523] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> [  904.435676] pci 0000:02:00.0: reg 0x20: [mem 0x00000000-0x00000fff]
> [  904.436982] pci 0000:02:00.0: BAR 0: assigned [mem 0xc1200000-0xc1201fff 64bit]
> [  904.437086] pci 0000:02:00.0: BAR 4: assigned [mem 0xc1202000-0xc1202fff]
> [  904.437118] pcieport 0000:00:08.0: PCI bridge to [bus 02]
> [  904.437137] pcieport 0000:00:08.0:   bridge window [io  0x7000-0x7fff]
> [  904.439024] pcieport 0000:00:08.0:   bridge window [mem 0xc1200000-0xc13fffff]
> [  904.440229] pcieport 0000:00:08.0:   bridge window [mem 0x802000000-0x803ffffff 64bit pref]
> [  904.447150] nvme nvme3: pci function 0000:02:00.0
> [  904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
> [  904.458880] nvme nvme3: 1/0/0 default/read/poll queues
> [  904.461296] nvme1n2: detected capacity change from 0 to 4294967296
> 
> and the 'old', pre-hotplug device still lingers on in the 'nvme list' 
> output.
> 
Compare that to the 'standard', non-CMIC nvme, where with the same setup 
MD would detach the nvme on its own:

# cat /proc/mdstat
Personalities : [raid10]
md127 : active (auto-read-only) raid10 nvme2n1[1]
       4189184 blocks super 1.2 2 near-copies [2/1] [_U]
       bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
/dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0

And yes, this is exactly the same setup, the only difference being the 
CMIC setting for the NVMe device.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:30                 ` Hannes Reinecke
  2020-10-06 13:45                   ` Hannes Reinecke
@ 2020-10-06 17:41                   ` Keith Busch
  2021-03-05 20:11                     ` Sagi Grimberg
  1 sibling, 1 reply; 21+ messages in thread
From: Keith Busch @ 2020-10-06 17:41 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Tue, Oct 06, 2020 at 03:30:14PM +0200, Hannes Reinecke wrote:
> but the re-added device isn't added to the MD RAID.
> In fact, it has been assigned a _different_ namespace ID:

That's actually just a driver-assigned instance number rather than a
Namespace ID.
 
> [  904.299065] pcieport 0000:00:08.0: pciehp: Slot(0-1): Card present
> [  904.299067] pcieport 0000:00:08.0: pciehp: Slot(0-1): Link Up
> [  904.435314] pci 0000:02:00.0: [8086:5845] type 00 class 0x010802
> [  904.435523] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> [  904.435676] pci 0000:02:00.0: reg 0x20: [mem 0x00000000-0x00000fff]
> [  904.436982] pci 0000:02:00.0: BAR 0: assigned [mem 0xc1200000-0xc1201fff 64bit]
> [  904.437086] pci 0000:02:00.0: BAR 4: assigned [mem 0xc1202000-0xc1202fff]
> [  904.437118] pcieport 0000:00:08.0: PCI bridge to [bus 02]
> [  904.437137] pcieport 0000:00:08.0:   bridge window [io  0x7000-0x7fff]
> [  904.439024] pcieport 0000:00:08.0:   bridge window [mem 0xc1200000-0xc13fffff]
> [  904.440229] pcieport 0000:00:08.0:   bridge window [mem 0x802000000-0x803ffffff 64bit pref]
> [  904.447150] nvme nvme3: pci function 0000:02:00.0
> [  904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
> [  904.458880] nvme nvme3: 1/0/0 default/read/poll queues
> [  904.461296] nvme1n2: detected capacity change from 0 to 4294967296
> 
> and the 'old', pre-hotplug device still lingers on in the 'nvme list'
> output.

I think what you're showing is a problem for everyone, including
fabrics. The namespace was attached to the previously existing
subsystem, nvme1, but we didn't find the previously existing "head" to
reattach the namespace.

I just threw the below together very hastily, so I'm sure there's
something wrong with it, but just to capture what's wrong with the
current code:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3833584b5cee..0b3bbbe6d1e8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	mutex_lock(&ns->ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
-	if (list_empty(&ns->head->list))
-		list_del_init(&ns->head->entry);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	synchronize_rcu(); /* guarantee not available in head->list */
--


* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
  2020-10-05 12:50   ` Christoph Hellwig
@ 2021-03-04 14:34   ` Daniel Wagner
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Wagner @ 2021-03-04 14:34 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch

On Mon, Oct 05, 2020 at 02:44:59PM +0200, Hannes Reinecke wrote:
>  # cat /proc/mdstat
>  Personalities : [raid1]
>  md0 : active raid1 nvme5n1[1] nvme3n1[0]
>        64512 blocks super 1.2 [2/2] [UU]
> 
> Now delete all paths to one of the namespaces:
> 
>  # echo 1 > /sys/class/nvme/nvme3/delete_controller
>  # echo 1 > /sys/class/nvme/nvme4/delete_controller
> 
> We have no path, but mdstat says:
> 
>  # cat /proc/mdstat
>  Personalities : [raid1]
>  md0 : active (auto-read-only) raid1 nvme5n1[1]
>        64512 blocks super 1.2 [2/1] [_U]

Shouldn't there be nvme3n1 still listed?



* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:50   ` Christoph Hellwig
@ 2021-03-05 20:06     ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:06 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke; +Cc: Keith Busch, linux-nvme, Keith Busch


>> -	nvme_mpath_remove_disk(head);
>> +#ifdef CONFIG_NVME_MULTIPATH
>> +	if (head->disk)
>> +		put_disk(head->disk);
>> +#endif
> 
> Please move this into a helper in multipath.c with a stub for the
> non-multipath case.

Other than this comment, it looks good to me.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 17:41                   ` Keith Busch
@ 2021-03-05 20:11                     ` Sagi Grimberg
  2021-03-11 12:41                       ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:11 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, Keith Busch


> I think what you're showing is a problem for everyone, including
> fabrics. The namespace was attached to the previously existing
> subsystem, nvme1, but we didn't find the previously existing "head" to
> reattach the namespace.
> 
> I just threw the below together very hastily, so I'm sure there's
> something wrong with it, but just to capture what's wrong with the
> current code:
> 
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3833584b5cee..0b3bbbe6d1e8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>   
>   	mutex_lock(&ns->ctrl->subsys->lock);
>   	list_del_rcu(&ns->siblings);
> -	if (list_empty(&ns->head->list))
> -		list_del_init(&ns->head->entry);

Not removing it ever from the head->list?

>   	mutex_unlock(&ns->ctrl->subsys->lock);
>   
>   	synchronize_rcu(); /* guarantee not available in head->list */
> --

This is a problem, because if you don't remove the ns from the
head->list, nothing prevents it from being re-selected as the
current_path after we synchronize the srcu, even though it is going away...
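
To spell out the concern: path selection walks head->list, so a namespace
that stays on that list can be picked again. Abridged (details elided) from
__nvme_find_path():

static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
{
        struct nvme_ns *found = NULL, *ns;

        list_for_each_entry_rcu(ns, &head->list, siblings) {
                if (nvme_path_is_disabled(ns))
                        continue;
                /* ... ANA state and distance checks elided ... */
                found = ns;
        }
        return found;
}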


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:45                   ` Hannes Reinecke
@ 2021-03-05 20:31                     ` Sagi Grimberg
  2021-03-08 13:17                       ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:31 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: linux-nvme, Keith Busch


> Compare that to the 'standard', non-CMIC nvme, where with the same setup 
> MD would detach the nvme on its own:
> 
> # cat /proc/mdstat
> Personalities : [raid10]
> md127 : active (auto-read-only) raid10 nvme2n1[1]
>        4189184 blocks super 1.2 2 near-copies [2/1] [_U]
>        bitmap: 0/1 pages [0KB], 65536KB chunk
> 
> unused devices: <none>
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
> 
> And yes, this is exactly the same setup, the only difference being the 
> CMIC setting for the NVMe device.

I agree with Christoph that we should do exactly the same for all.

Hannes, my understanding here is that you want the device to go away
after the last path disappears because keeping it around breaks md.
Why don't you want this behaviour for fabrics as well?

You mention that in fabrics one can manually disconnect, but why should
the user resort to a manual disconnect?

Is something else broken with this behavior? Maybe I am missing
something here?


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-05 20:31                     ` Sagi Grimberg
@ 2021-03-08 13:17                       ` Hannes Reinecke
  2021-03-15 17:21                         ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2021-03-08 13:17 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: linux-nvme, Keith Busch

On 3/5/21 9:31 PM, Sagi Grimberg wrote:
> 
>> Compare that to the 'standard', non-CMIC nvme, where with the same
>> setup MD would detach the nvme on its own:
>>
>> # cat /proc/mdstat
>> Personalities : [raid10]
>> md127 : active (auto-read-only) raid10 nvme2n1[1]
>>        4189184 blocks super 1.2 2 near-copies [2/1] [_U]
>>        bitmap: 0/1 pages [0KB], 65536KB chunk
>>
>> unused devices: <none>
>> # nvme list
>> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
>> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
>> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1          17.18  GB /  17.18  GB    512   B +  0 B   1.0
>> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1           4.29  GB /   4.29  GB    512   B +  0 B   1.0
>>
>> And yes, this is exactly the same setup, the only difference being the
>> CMIC setting for the NVMe device.
> 
> I agree with Christoph that we should do exactly the same for all.
> 
> Hannes, My understanding here is that you want the device to go away
> after the last path disappeared because it breaks md, why don't you
> want to have this also for fabrics?
> 
Oh, I would _love_ to have it for fabrics by default.
If you agree I can resend Keith's original patch, which solves
the issue without the need for any additional settings.

> You mention that in fabrics one can manually disconnect, but why should
> the user resort to a manual disconnect?
> 
> Is something else broken with this behavior? Maybe I am missing
> something here?

I'm just trying to understand why the current logic was built into
fabrics in the first place.
But if no-one remembers (or there was no particular reason) I'd happily
convert to the PCI logic.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-05 20:11                     ` Sagi Grimberg
@ 2021-03-11 12:41                       ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2021-03-11 12:41 UTC (permalink / raw)
  To: linux-nvme

On 3/5/21 9:11 PM, Sagi Grimberg wrote:
> 
>> I think what you're showing is a problem for everyone, including
>> fabrics. The namespace was attached to the previously existing
>> subsystem, nvme1, but we didn't find the previously existing "head" to
>> reattach the namespace.
>>
>> I just threw the below together very hastily, so I'm sure there's
>> something wrong with it, but just to capture what's wrong with the
>> current code:
>>
>> ---
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 3833584b5cee..0b3bbbe6d1e8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>         mutex_lock(&ns->ctrl->subsys->lock);
>>       list_del_rcu(&ns->siblings);
>> -    if (list_empty(&ns->head->list))
>> -        list_del_init(&ns->head->entry);
> 
> Not removing it ever from the head->list?
> 
>>       mutex_unlock(&ns->ctrl->subsys->lock);
>>         synchronize_rcu(); /* guarantee not available in head->list */
>> -- 
> 
> This is a problem. beacuse if you don't remove the ns from the
> head->list nothing prevents it from being re-selected as the
> current_path after we synchronize the srcu, although it is going away...
> 
You misread the patch.
I'm not skipping the removal from 'head->list'; I'm skipping the removal
from 'subsys->nsheads' (which is what 'head->entry' links into).
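
For reference, the two are separate fields in struct nvme_ns_head (abridged):

struct nvme_ns_head {
        struct list_head        list;   /* the per-path nvme_ns siblings */
        ...
        struct list_head        entry;  /* link in subsys->nsheads */
        ...
};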


But I'll be sending an updated patch which clarifies this situation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-08 13:17                       ` Hannes Reinecke
@ 2021-03-15 17:21                         ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-15 17:21 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: linux-nvme, Keith Busch


>>> And yes, this is exactly the same setup, the only difference being the
>>> CMIC setting for the NVMe device.
>>
>> I agree with Christoph that we should do exactly the same for all.
>>
>> Hannes, My understanding here is that you want the device to go away
>> after the last path disappeared because it breaks md, why don't you
>> want to have this also for fabrics?
>>
> Oh, I would _love_ to have it for fabrics per default.
> If you agree with it I can resend Keiths original patch, which solves
> the issue without the need for any additional settings.

I think that this is what we should do. Care to resend this for
both fabrics and pci?


end of thread

Thread overview: 21+ messages
2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
2020-10-05 12:50   ` Christoph Hellwig
2021-03-05 20:06     ` Sagi Grimberg
2021-03-04 14:34   ` Daniel Wagner
2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
2020-10-05 12:52   ` Christoph Hellwig
2020-10-06  5:48     ` Hannes Reinecke
2020-10-06  7:51       ` Christoph Hellwig
2020-10-06  8:07         ` Hannes Reinecke
2020-10-06  8:27           ` Christoph Hellwig
2020-10-06  8:29             ` Hannes Reinecke
2020-10-06  8:39               ` Christoph Hellwig
2020-10-06 13:30                 ` Hannes Reinecke
2020-10-06 13:45                   ` Hannes Reinecke
2021-03-05 20:31                     ` Sagi Grimberg
2021-03-08 13:17                       ` Hannes Reinecke
2021-03-15 17:21                         ` Sagi Grimberg
2020-10-06 17:41                   ` Keith Busch
2021-03-05 20:11                     ` Sagi Grimberg
2021-03-11 12:41                       ` Hannes Reinecke
