linux-nvme.lists.infradead.org archive mirror
* [PATCHv2 0/2] nvme: fixup MD RAID usage
@ 2021-02-25 11:05 Hannes Reinecke
  2021-02-25 11:05 ` [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute Hannes Reinecke
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-25 11:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Hi all,

Ever since its implementation, NVMe-oF has not worked together with MD RAID.
MD RAID expects the device to return an I/O error on failure, and to remove
the block device if the underlying hardware is removed.
This is contrary to the implementation of NVMe-oF, which will keep on retrying
I/O while the controller is being reset, and will only remove the block device
once the last _user_ is gone.

While we already have a 'fast_io_fail_tmo' setting which ensures
that I/O errors are returned after the timeout expires, this attribute
can currently only be set during the initial 'connect' call.
So the first patch adds a 'fast_io_fail_tmo' controller sysfs
attribute which allows the admin to modify it at runtime.
To fix the second issue a new attribute 'no_path_detach' is
implemented, which causes the disk to be removed once the
last controller holding a path to the namespace is removed (ie after
all reconnect attempts for that controller are exhausted).
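
For illustration (not part of the patches), a minimal userspace sketch of
how an admin tool could set both attributes at runtime; the sysfs paths are
examples and depend on the actual controller/namespace naming:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* write a string value to a sysfs attribute, return 0 on success */
static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* fail pending I/O 30 seconds after the controller starts reconnecting */
	sysfs_write("/sys/class/nvme/nvme0/fast_io_fail_tmo", "30");
	/* remove the multipathed disk once the last path is gone */
	sysfs_write("/sys/block/nvme0n1/no_path_detach", "1");
	return 0;
}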

This is a rework of the earlier patch by Keith Busch ('nvme-mpath: delete disk
after last connection'). Kudos to him for suggesting this approach.

Changes to v1:
- Integrated reviews from Sagi
- Use fast_io_fail_tmo instead of a new sysfs attribute
- Rename attribute to 'no_path_detach'

Hannes Reinecke (2):
  nvme: add 'fast_io_fail_tmo' controller sysfs attribute
  nvme: delete disk when last path is gone

 drivers/nvme/host/core.c      | 41 ++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/multipath.c | 31 +++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      | 21 +++++++++++++++---
 3 files changed, 88 insertions(+), 5 deletions(-)

-- 
2.29.2



* [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute
  2021-02-25 11:05 [PATCHv2 0/2] nvme: fixup MD RAID usage Hannes Reinecke
@ 2021-02-25 11:05 ` Hannes Reinecke
  2021-03-05 21:10   ` Sagi Grimberg
  2021-02-25 11:05 ` [PATCH 2/2] nvme: delete disk when last path is gone Hannes Reinecke
  2021-03-03  1:01 ` [PATCHv2 0/2] nvme: fixup MD RAID usage Minwoo Im
  2 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-25 11:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Add a 'fast_io_fail_tmo' controller sysfs attribute to make the
setting configurable at runtime.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4de6a3a13575..ba639049e385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3658,6 +3658,37 @@ static ssize_t nvme_ctrl_loss_tmo_store(struct device *dev,
 static DEVICE_ATTR(ctrl_loss_tmo, S_IRUGO | S_IWUSR,
 	nvme_ctrl_loss_tmo_show, nvme_ctrl_loss_tmo_store);
 
+static ssize_t nvme_fast_io_fail_tmo_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	struct nvmf_ctrl_options *opts = ctrl->opts;
+
+	if (opts->fast_io_fail_tmo == -1)
+		return sprintf(buf, "off\n");
+	return sprintf(buf, "%d\n", opts->fast_io_fail_tmo);
+}
+
+static ssize_t nvme_fast_io_fail_tmo_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	struct nvmf_ctrl_options *opts = ctrl->opts;
+	int fast_io_fail_tmo, err;
+
+	err = kstrtoint(buf, 10, &fast_io_fail_tmo);
+	if (err)
+		return -EINVAL;
+
+	if (fast_io_fail_tmo < 0)
+		opts->fast_io_fail_tmo = -1;
+	else
+		opts->fast_io_fail_tmo = fast_io_fail_tmo;
+	return count;
+}
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+	nvme_fast_io_fail_tmo_show, nvme_fast_io_fail_tmo_store);
+
 static ssize_t nvme_ctrl_reconnect_delay_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -3703,6 +3734,7 @@ static struct attribute *nvme_dev_attrs[] = {
 	&dev_attr_hostnqn.attr,
 	&dev_attr_hostid.attr,
 	&dev_attr_ctrl_loss_tmo.attr,
+	&dev_attr_fast_io_fail_tmo.attr,
 	&dev_attr_reconnect_delay.attr,
 	NULL
 };
-- 
2.29.2



* [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-25 11:05 [PATCHv2 0/2] nvme: fixup MD RAID usage Hannes Reinecke
  2021-02-25 11:05 ` [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute Hannes Reinecke
@ 2021-02-25 11:05 ` Hannes Reinecke
  2021-02-25 16:59   ` Keith Busch
  2021-03-03  1:01 ` [PATCHv2 0/2] nvme: fixup MD RAID usage Minwoo Im
  2 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-25 11:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

The multipath code currently deletes the disk only after all references
to it are dropped rather than when the last path to that disk is lost.
This differs from the behaviour in the non-multipathed case where the
disk is deleted once the controller is removed.
This has been reported to cause problems with some use cases like MD RAID.

This patch implements an alternative behaviour of deleting the disk when
the last path is gone, ie the same behaviour as non-multipathed nvme
devices. The alternative behaviour can be enabled with the new sysfs
attribute 'no_path_detach'.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c      |  9 ++++++++-
 drivers/nvme/host/multipath.c | 31 ++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      | 21 ++++++++++++++++++---
 3 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ba639049e385..7af6ba18f461 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -541,7 +541,9 @@ static void nvme_free_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
-	nvme_mpath_remove_disk(head);
+	if (!test_bit(NVME_NSHEAD_NO_PATH_DETACH, &head->flags))
+		nvme_mpath_remove_disk(head);
+	nvme_mpath_put_disk(head);
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
@@ -3464,6 +3466,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
 #ifdef CONFIG_NVME_MULTIPATH
 	&dev_attr_ana_grpid.attr,
 	&dev_attr_ana_state.attr,
+	&dev_attr_no_path_detach.attr,
 #endif
 	NULL,
 };
@@ -3494,6 +3497,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
 			return 0;
 	}
+	if (a == &dev_attr_no_path_detach.attr) {
+		if (dev_to_disk(dev)->fops == &nvme_bdev_ops)
+			return 0;
+	}
 #endif
 	return a->mode;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0696319adaf6..7d38f9272490 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -641,6 +641,36 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
 }
 DEVICE_ATTR_RO(ana_state);
 
+static ssize_t no_path_detach_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct nvme_ns_head *head = disk->private_data;
+
+	return sprintf(buf, "%d\n",
+		       test_bit(NVME_NSHEAD_NO_PATH_DETACH, &head->flags) ?
+		       1 : 0);
+}
+
+static ssize_t no_path_detach_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct nvme_ns_head *head = disk->private_data;
+	int err, no_path_detach;
+
+	err = kstrtoint(buf, 10, &no_path_detach);
+	if (err || no_path_detach < 0)
+		return -EINVAL;
+	if (no_path_detach)
+		set_bit(NVME_NSHEAD_NO_PATH_DETACH, &head->flags);
+	else
+		clear_bit(NVME_NSHEAD_NO_PATH_DETACH, &head->flags);
+	return count;
+}
+DEVICE_ATTR(no_path_detach, S_IRUGO | S_IWUSR,
+	no_path_detach_show, no_path_detach_store);
+
 static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
 		struct nvme_ana_group_desc *desc, void *data)
 {
@@ -702,7 +732,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 		 */
 		head->disk->queue = NULL;
 	}
-	put_disk(head->disk);
 }
 
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..317f8a6cb7f4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -417,7 +417,8 @@ struct nvme_ns_head {
 	struct work_struct	requeue_work;
 	struct mutex		lock;
 	unsigned long		flags;
-#define NVME_NSHEAD_DISK_LIVE	0
+#define NVME_NSHEAD_DISK_LIVE		0
+#define NVME_NSHEAD_NO_PATH_DETACH	1
 	struct nvme_ns __rcu	*current_path[];
 #endif
 };
@@ -680,8 +681,12 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 {
 	struct nvme_ns_head *head = ns->head;
 
-	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
+	if (head->disk && list_empty(&head->list)) {
+		if (test_bit(NVME_NSHEAD_NO_PATH_DETACH, &head->flags))
+			nvme_mpath_remove_disk(head);
+		else
+			kblockd_schedule_work(&head->requeue_work);
+	}
 }
 
 static inline void nvme_trace_bio_complete(struct request *req)
@@ -692,8 +697,15 @@ static inline void nvme_trace_bio_complete(struct request *req)
 		trace_block_bio_complete(ns->head->disk->queue, req->bio);
 }
 
+static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
+{
+	if (head->disk)
+		put_disk(head->disk);
+}
+
 extern struct device_attribute dev_attr_ana_grpid;
 extern struct device_attribute dev_attr_ana_state;
+extern struct device_attribute dev_attr_no_path_detach;
 extern struct device_attribute subsys_attr_iopolicy;
 
 #else
@@ -729,6 +741,9 @@ static inline void nvme_mpath_add_disk(struct nvme_ns *ns,
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 }
+static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
+{
+}
 static inline bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
 	return false;
-- 
2.29.2



* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-25 11:05 ` [PATCH 2/2] nvme: delete disk when last path is gone Hannes Reinecke
@ 2021-02-25 16:59   ` Keith Busch
  2021-02-25 17:37     ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2021-02-25 16:59 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Thu, Feb 25, 2021 at 12:05:34PM +0100, Hannes Reinecke wrote:
> The multipath code currently deletes the disk only after all references
> to it are dropped rather than when the last path to that disk is lost.
> This differs from the behaviour in the non-multipathed case where the
> disk is deleted once the controller is removed.
> This has been reported to cause problems with some use cases like MD RAID.
> 
> This patch implements an alternative behaviour of deleting the disk when
> the last path is gone, ie the same behaviour as non-multipathed nvme
> devices. The alternative behaviour can be enabled with the new sysfs
> attribute 'no_path_detach'.

This looks ok to me. I have heard from a few people that they expected
it to work this way with the option enabled, but I suppose we do need to
retain the old behavior as default.

Reviewed-by: Keith Busch <kbusch@kernel.org>


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-25 16:59   ` Keith Busch
@ 2021-02-25 17:37     ` Hannes Reinecke
  2021-03-05 21:12       ` Sagi Grimberg
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-25 17:37 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On 2/25/21 5:59 PM, Keith Busch wrote:
> On Thu, Feb 25, 2021 at 12:05:34PM +0100, Hannes Reinecke wrote:
>> The multipath code currently deletes the disk only after all references
>> to it are dropped rather than when the last path to that disk is lost.
>> This differs from the behaviour in the non-multipathed case where the
>> disk is deleted once the controller is removed.
>> This has been reported to cause problems with some use cases like MD RAID.
>>
>> This patch implements an alternative behaviour of deleting the disk when
>> the last path is gone, ie the same behaviour as non-multipathed nvme
>> devices. The alternative behaviour can be enabled with the new sysfs
>> attribute 'no_path_detach'.
> 
> This looks ok to me. I have heard from a few people that they expected
> it to work this way with the option enabled, but I suppose we do need to
> retain the old behavior as default.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> 
Oh, I would _love_ to kill the old behaviour.
Especially as we now have fast_io_fail_tmo and ctrl_loss_tmo, which 
give us enough control over how the controller and the remaining paths 
should behave (and which weren't present when fabrics got implemented).

We can make this behaviour the default, and kill the old approach next 
year if there are no complaints :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCHv2 0/2] nvme: fixup MD RAID usage
  2021-02-25 11:05 [PATCHv2 0/2] nvme: fixup MD RAID usage Hannes Reinecke
  2021-02-25 11:05 ` [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute Hannes Reinecke
  2021-02-25 11:05 ` [PATCH 2/2] nvme: delete disk when last path is gone Hannes Reinecke
@ 2021-03-03  1:01 ` Minwoo Im
  2 siblings, 0 replies; 13+ messages in thread
From: Minwoo Im @ 2021-03-03  1:01 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, Sagi Grimberg, Keith Busch

On 21-02-25 12:05:32, Hannes Reinecke wrote:
> Hi all,
> 
> Ever since its implementation, NVMe-oF has not worked together with MD RAID.
> MD RAID expects the device to return an I/O error on failure, and to remove
> the block device if the underlying hardware is removed.
> This is contrary to the implementation of NVMe-oF, which will keep on retrying
> I/O while the controller is being reset, and will only remove the block device
> once the last _user_ is gone.
> 
> While we already have a 'fast_io_fail_tmo' setting which ensures
> that I/O errors are returned after the timeout expires, this attribute
> can currently only be set during the initial 'connect' call.
> So the first patch adds a 'fast_io_fail_tmo' controller sysfs
> attribute which allows the admin to modify it at runtime.
> To fix the second issue a new attribute 'no_path_detach' is
> implemented, which causes the disk to be removed once the
> last controller holding a path to the namespace is removed (ie after
> all reconnect attempts for that controller are exhausted).
> 
> This is a rework of the earlier patch by Keith Busch ('nvme-mpath: delete disk
> after last connection'). Kudos to him for suggesting this approach.
> 
> Changes to v1:
> - Integrated reviews from Sagi
> - Use fast_io_fail_tmo instead of a new sysfs attribute
> - Rename attribute to 'no_path_detach'
> 
> Hannes Reinecke (2):
>   nvme: add 'fast_io_fail_tmo' controller sysfs attribute
>   nvme: delete disk when last path is gone
> 
>  drivers/nvme/host/core.c      | 41 ++++++++++++++++++++++++++++++++++-
>  drivers/nvme/host/multipath.c | 31 +++++++++++++++++++++++++-
>  drivers/nvme/host/nvme.h      | 21 +++++++++++++++---
>  3 files changed, 88 insertions(+), 5 deletions(-)

Looks good to me.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>


* Re: [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute
  2021-02-25 11:05 ` [PATCH 1/2] nvme: add 'fast_io_fail_tmo' controller sysfs attribute Hannes Reinecke
@ 2021-03-05 21:10   ` Sagi Grimberg
  0 siblings, 0 replies; 13+ messages in thread
From: Sagi Grimberg @ 2021-03-05 21:10 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-25 17:37     ` Hannes Reinecke
@ 2021-03-05 21:12       ` Sagi Grimberg
  0 siblings, 0 replies; 13+ messages in thread
From: Sagi Grimberg @ 2021-03-05 21:12 UTC (permalink / raw)
  To: Hannes Reinecke, Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme


>>> The multipath code currently deletes the disk only after all references
>>> to it are dropped rather than when the last path to that disk is lost.
>>> This differs from the behaviour in the non-multipathed case where the
>>> disk is deleted once the controller is removed.
>>> This has been reported to cause problems with some use cases like MD 
>>> RAID.
>>>
>>> This patch implements an alternative behaviour of deleting the disk when
>>> the last path is gone, ie the same behaviour as non-multipathed nvme
>>> devices. The alternative behaviour can be enabled with the new sysfs
>>> attribute 'no_path_detach'.
>>
>> This looks ok to me. I have heard from a few people that they expected
>> it to work this way with the option enabled, but I suppose we do need to
>> retain the old behavior as default.
>>
>> Reviewed-by: Keith Busch <kbusch@kernel.org>
>>
> Oh, I would _love_ to kill the old behaviour.
> Especially as we now have fast_io_fail_tmo and ctrl_loss_tmo, which 
> give us enough control over how the controller and the remaining paths 
> should behave (and which weren't present when fabrics got implemented).
> 
> We can make this behaviour the default, and kill the old approach next 
> year if there are no complaints :-)

If this is broken, and this behavior doesn't break anything else, then I
don't see why we shouldn't do that.


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-24 22:40   ` Sagi Grimberg
@ 2021-02-25  8:37     ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-25  8:37 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Keith Busch

On 2/24/21 11:40 PM, Sagi Grimberg wrote:
> 
>> The multipath code currently deletes the disk only after all references
>> to it are dropped rather than when the last path to that disk is lost.
>> This has been reported to cause problems with some use cases like MD 
>> RAID.
> 
> What is the exact problem?
> 
> Can you describe the problem you see now and what you expect
> to see (unrelated to patch #1)?
> 
The problem is a difference in behaviour between multipathed and 
non-multipathed namespaces (ie whether 'CMIC' is set or not).
If the CMIC bit is _not_ set, the disk device will be removed once
the controller is gone; if the CMIC bit is set the disk device will be 
retained, and only removed once the last _reference_ is dropped.

This is causing customer issues, as some vendors produce nearly 
identical PCI NVMe devices, which differ in the CMIC bit.
So depending on which device the customer uses, he might be getting one 
or the other behaviour.
And this is causing issues when said customer deploys MD RAID on them;
with one set of devices PCI hotplug works, with the other set of devices 
it doesn't.
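
For reference, the CMIC byte can be inspected from userspace with a plain
admin passthrough Identify; a rough sketch (the device node is just an
example, error handling kept minimal):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	uint8_t id[4096];
	struct nvme_admin_cmd cmd = {
		.opcode   = 0x06,		/* Identify */
		.addr     = (uint64_t)(uintptr_t)id,
		.data_len = sizeof(id),
		.cdw10    = 0x01,		/* CNS 1: Identify Controller */
	};
	int fd = open("/dev/nvme0", O_RDONLY);

	if (fd < 0 || ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
		perror("identify");
		return 1;
	}
	/* CMIC is byte 76; bit 1 = subsystem may contain two or more controllers */
	printf("CMIC=0x%02x multi-controller=%d\n", id[76], !!(id[76] & 0x02));
	close(fd);
	return 0;
}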

>> This patch implements an alternative behaviour of deleting the disk when
>> the last path is gone, ie the same behaviour as non-multipathed nvme
>> devices.
> 
> But we also don't remove the non-multipath'd nvme device until the
> last reference drops (e.g. if you have a mounted filesystem on top).
> 
Au contraire.

When doing PCI hotplug in the non-multipathed case, the controller is 
removed and put_disk() is called during nvme_free_ns().
When doing PCI hotplug in the multipathed case, the controller is 
removed, too, but put_disk() is only called on the namespace itself; the 
'nshead' disk is still kept around, and put_disk() on the 'nshead' disk 
is only called after the last reference is dropped.

> This would be the equivalent to running raid on top of dm-mpath on
> top of scsi devices right? And if all the mpath device nodes go away
> the mpath device is deleted even if it has an open reference to it?
> 
See above. The prime motivator behind this patch is to get equivalent 
behaviour between multipathed and non-multipathed devices.
It just so happens that MD RAID exercises this particular issue.

>> The new behaviour will be selected with the 'fail_if_no_path'
>> attribute, as it's arguably the same functionality.
> 
> But it's not the same functionality.

Agreed. But as the first patch will be dropped (see my other mail) I'll 
be redoing the patchset anyway.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-23 11:59 ` [PATCH 2/2] nvme: delete disk when last path is gone Hannes Reinecke
  2021-02-23 12:56   ` Minwoo Im
@ 2021-02-24 22:40   ` Sagi Grimberg
  2021-02-25  8:37     ` Hannes Reinecke
  1 sibling, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2021-02-24 22:40 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Keith Busch


> The multipath code currently deletes the disk only after all references
> to it are dropped rather than when the last path to that disk is lost.
> This has been reported to cause problems with some use cases like MD RAID.

What is the exact problem?

Can you describe the problem you see now and what you expect
to see (unrelated to patch #1)?

> This patch implements an alternative behaviour of deleting the disk when
> the last path is gone, ie the same behaviour as non-multipathed nvme
> devices.

But we also don't remove the non-multipath'd nvme device until the
last reference drops (e.g. if you have a mounted filesystem on top).

This would be the equivalent to running raid on top of dm-mpath on
top of scsi devices right? And if all the mpath device nodes go away
the mpath device is deleted even if it has an open reference to it?

> The new behaviour will be selected with the 'fail_if_no_path'
> attribute, as it's arguably the same functionality.

But it's not the same functionality.


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-23 12:56   ` Minwoo Im
@ 2021-02-23 14:07     ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-23 14:07 UTC (permalink / raw)
  To: Minwoo Im
  Cc: Keith Busch, Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 2/23/21 1:56 PM, Minwoo Im wrote:
> On 21-02-23 12:59:22, Hannes Reinecke wrote:
>> The multipath code currently deletes the disk only after all references
>> to it are dropped rather than when the last path to that disk is lost.
>> This has been reported to cause problems with some use cases like MD RAID.
>>
>> This patch implements an alternative behaviour of deleting the disk when
>> the last path is gone, ie the same behaviour as non-multipathed nvme
>> devices. The new behaviour will be selected with the 'fail_if_no_path'
>> attribute, as it's arguably the same functionality.
>>
>> Suggested-by: Keith Busch <kbusch@kernel.org>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/nvme/host/core.c      |  1 +
>>  drivers/nvme/host/multipath.c |  3 ++-
>>  drivers/nvme/host/nvme.h      | 17 +++++++++++++++--
>>  3 files changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 2fb3ecc0c53b..d717a6283d6e 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -542,6 +542,7 @@ static void nvme_free_ns_head(struct kref *ref)
>>  		container_of(ref, struct nvme_ns_head, ref);
>>  
>>  	nvme_mpath_remove_disk(head);
>> +	nvme_mpath_put_disk(head);
>>  	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>>  	cleanup_srcu_struct(&head->srcu);
>>  	nvme_put_subsystem(head->subsys);
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index d5773ea105b1..f995b8234622 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -724,6 +724,8 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
>>  
>>  void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>>  {
>> +	if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
>> +		return;
>>  	if (!head->disk)
>>  		return;
>>  	if (head->disk->flags & GENHD_FL_UP)
>> @@ -741,7 +743,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>>  		 */
>>  		head->disk->queue = NULL;
>>  	}
>> -	put_disk(head->disk);
>>  }
>>  
>>  int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 3d2513f8194d..e6efa085f08a 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -681,8 +681,12 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
>>  {
>>  	struct nvme_ns_head *head = ns->head;
>>  
>> -	if (head->disk && list_empty(&head->list))
>> -		kblockd_schedule_work(&head->requeue_work);
>> +	if (head->disk && list_empty(&head->list)) {
>> +		if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
>> +			nvme_mpath_remove_disk(head);
> 
> Does it need to call nvme_mpath_remove_disk here? It looks like it
> returns right away without doing anything if NVME_NSHEAD_FAIL_IF_NO_PATH is set.
> 
Argl. Yes, you are correct.

I'll be reworking that one.
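
The rework will basically drop the early return from
nvme_mpath_remove_disk() and do the flag check in the caller instead;
roughly (sketch only, in nvme_free_ns_head()):

	if (!test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
		nvme_mpath_remove_disk(head);
	nvme_mpath_put_disk(head);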

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-23 11:59 ` [PATCH 2/2] nvme: delete disk when last path is gone Hannes Reinecke
@ 2021-02-23 12:56   ` Minwoo Im
  2021-02-23 14:07     ` Hannes Reinecke
  2021-02-24 22:40   ` Sagi Grimberg
  1 sibling, 1 reply; 13+ messages in thread
From: Minwoo Im @ 2021-02-23 12:56 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 21-02-23 12:59:22, Hannes Reinecke wrote:
> The multipath code currently deletes the disk only after all references
> to it are dropped rather than when the last path to that disk is lost.
> This has been reported to cause problems with some use cases like MD RAID.
> 
> This patch implements an alternative behaviour of deleting the disk when
> the last path is gone, ie the same behaviour as non-multipathed nvme
> devices. The new behaviour will be selected with the 'fail_if_no_path'
> attribute, as it's arguably the same functionality.
> 
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/nvme/host/core.c      |  1 +
>  drivers/nvme/host/multipath.c |  3 ++-
>  drivers/nvme/host/nvme.h      | 17 +++++++++++++++--
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 2fb3ecc0c53b..d717a6283d6e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -542,6 +542,7 @@ static void nvme_free_ns_head(struct kref *ref)
>  		container_of(ref, struct nvme_ns_head, ref);
>  
>  	nvme_mpath_remove_disk(head);
> +	nvme_mpath_put_disk(head);
>  	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>  	cleanup_srcu_struct(&head->srcu);
>  	nvme_put_subsystem(head->subsys);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index d5773ea105b1..f995b8234622 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -724,6 +724,8 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
>  
>  void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>  {
> +	if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
> +		return;
>  	if (!head->disk)
>  		return;
>  	if (head->disk->flags & GENHD_FL_UP)
> @@ -741,7 +743,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>  		 */
>  		head->disk->queue = NULL;
>  	}
> -	put_disk(head->disk);
>  }
>  
>  int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 3d2513f8194d..e6efa085f08a 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -681,8 +681,12 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
>  {
>  	struct nvme_ns_head *head = ns->head;
>  
> -	if (head->disk && list_empty(&head->list))
> -		kblockd_schedule_work(&head->requeue_work);
> +	if (head->disk && list_empty(&head->list)) {
> +		if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
> +			nvme_mpath_remove_disk(head);

Does it need to call nvme_mpath_remove_disk here? It looks like it
returns right away without doing anything if NVME_NSHEAD_FAIL_IF_NO_PATH is set.


* [PATCH 2/2] nvme: delete disk when last path is gone
  2021-02-23 11:59 [PATCH 0/2] nvme: fix regression with MD RAID Hannes Reinecke
@ 2021-02-23 11:59 ` Hannes Reinecke
  2021-02-23 12:56   ` Minwoo Im
  2021-02-24 22:40   ` Sagi Grimberg
  0 siblings, 2 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-02-23 11:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

The multipath code currently deletes the disk only after all references
to it are dropped rather than when the last path to that disk is lost.
This has been reported to cause problems with some use cases like MD RAID.

This patch implements an alternative behaviour of deleting the disk when
the last path is gone, ie the same behaviour as non-multipathed nvme
devices. The new behaviour will be selected with the 'fail_if_no_path'
attribute, as it's arguably the same functionality.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c      |  1 +
 drivers/nvme/host/multipath.c |  3 ++-
 drivers/nvme/host/nvme.h      | 17 +++++++++++++++--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2fb3ecc0c53b..d717a6283d6e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -542,6 +542,7 @@ static void nvme_free_ns_head(struct kref *ref)
 		container_of(ref, struct nvme_ns_head, ref);
 
 	nvme_mpath_remove_disk(head);
+	nvme_mpath_put_disk(head);
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d5773ea105b1..f995b8234622 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -724,6 +724,8 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
 
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
+	if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
+		return;
 	if (!head->disk)
 		return;
 	if (head->disk->flags & GENHD_FL_UP)
@@ -741,7 +743,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 		 */
 		head->disk->queue = NULL;
 	}
-	put_disk(head->disk);
 }
 
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3d2513f8194d..e6efa085f08a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -681,8 +681,12 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 {
 	struct nvme_ns_head *head = ns->head;
 
-	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
+	if (head->disk && list_empty(&head->list)) {
+		if (test_bit(NVME_NSHEAD_FAIL_IF_NO_PATH, &head->flags))
+			nvme_mpath_remove_disk(head);
+		else
+			kblockd_schedule_work(&head->requeue_work);
+	}
 }
 
 static inline void nvme_trace_bio_complete(struct request *req)
@@ -693,6 +697,12 @@ static inline void nvme_trace_bio_complete(struct request *req)
 		trace_block_bio_complete(ns->head->disk->queue, req->bio);
 }
 
+static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
+{
+	if (head->disk)
+		put_disk(head->disk);
+}
+
 extern struct device_attribute dev_attr_ana_grpid;
 extern struct device_attribute dev_attr_ana_state;
 extern struct device_attribute dev_attr_fail_if_no_path;
@@ -731,6 +741,9 @@ static inline void nvme_mpath_add_disk(struct nvme_ns *ns,
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 }
+static inline void nvme_mpath_put_disk(struct nvme_ns_head *head)
+{
+}
 static inline bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
 	return false;
-- 
2.29.2


