* [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality
@ 2020-10-05 12:44 Hannes Reinecke
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
  0 siblings, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Hi all,

this is a patchset based on Keith's original patch for restoring the
pre-fabrics behaviour when the last path to a multipath device is
removed.

Originally, the nvme device had been removed once the underlying
hardware had been removed. With the introduction of multipath support
things changed; it's now the 'CMIC' bit in the controller
identification which controls the behaviour. If it's set to non-zero,
the device is retained even if the hardware is removed.

While this is okay for fabrics (as we can manually connect and
disconnect the devices), for nvme-pci this means that PCI hotplug
ceases to work: the device is never removed, and when reinserting the
hardware a new nvme device is created.

This patchset introduces a 'queue_if_no_path' flag to control the
handling of the last path; it's set for fabrics to retain the current
functionality, but unset for PCI to revert to the original,
pre-fabrics behaviour.
Changes to v2:
- Merge two patches
- Fix to make it compile without CONFIG_NVME_MULTIPATH
- Move sysfs attributes to be compiled in only for CONFIG_NVME_MULTIPATH

Hannes Reinecke (1):
  nvme: add 'queue_if_no_path' semantics

Keith Busch (1):
  nvme-mpath: delete disk after last connection

 drivers/nvme/host/core.c      | 13 ++++++++++++-
 drivers/nvme/host/multipath.c | 39 ++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      |  4 +++-
 3 files changed, 53 insertions(+), 3 deletions(-)

-- 
2.16.4

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
@ 2020-10-05 12:44 ` Hannes Reinecke
  2020-10-05 12:50   ` Christoph Hellwig
  2021-03-04 14:34   ` Daniel Wagner
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
  1 sibling, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch

From: Keith Busch <kbusch@kernel.org>

The multipath code currently deletes the disk only after all references
to it are dropped, rather than when the last path to that disk is lost.
This has been reported to cause problems with some usage, like MD RAID.

Delete the disk when the last path is gone. This is the same behavior
we currently have with non-multipathed nvme devices.

The following is a simple example that demonstrates what is currently
observed using an nvme loopback target (loop setup file not shown):

# nvmetcli restore loop.json
[   31.156452] nvmet: adding nsid 1 to subsystem testnqn1
[   31.159140] nvmet: adding nsid 1 to subsystem testnqn2

# nvme connect -t loop -n testnqn1 -q hostnqn
[   36.866302] nvmet: creating controller 1 for subsystem testnqn1 for NQN hostnqn.
[   36.872926] nvme nvme3: new ctrl: "testnqn1"

# nvme connect -t loop -n testnqn1 -q hostnqn
[   38.227186] nvmet: creating controller 2 for subsystem testnqn1 for NQN hostnqn.
[   38.234450] nvme nvme4: new ctrl: "testnqn1"

# nvme connect -t loop -n testnqn2 -q hostnqn
[   43.902761] nvmet: creating controller 3 for subsystem testnqn2 for NQN hostnqn.
[   43.907401] nvme nvme5: new ctrl: "testnqn2"

# nvme connect -t loop -n testnqn2 -q hostnqn
[   44.627689] nvmet: creating controller 4 for subsystem testnqn2 for NQN hostnqn.
[   44.641773] nvme nvme6: new ctrl: "testnqn2"

# mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/nvme3n1 /dev/nvme5n1
[   53.497038] md/raid1:md0: active with 2 out of 2 mirrors
[   53.501717] md0: detected capacity change from 0 to 66060288

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 nvme5n1[1] nvme3n1[0]
      64512 blocks super 1.2 [2/2] [UU]

Now delete all paths to one of the namespaces:

# echo 1 > /sys/class/nvme/nvme3/delete_controller
# echo 1 > /sys/class/nvme/nvme4/delete_controller

We have no path, but mdstat says:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active (auto-read-only) raid1 nvme5n1[1]
      64512 blocks super 1.2 [2/1] [_U]

And this is reported to cause a problem. With the proposed patch, the
following messages appear:

[  227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
[  227.516807] md/raid1:md0: Operation continuing on 1 devices.

And mdstat shows only the viable members:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active (auto-read-only) raid1 nvme5n1[1]
      64512 blocks super 1.2 [2/1] [_U]

Reported-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c      | 5 ++++-
 drivers/nvme/host/multipath.c | 1 -
 drivers/nvme/host/nvme.h      | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 385b10317873..4459a40b057c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -474,7 +474,10 @@ static void nvme_free_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
-	nvme_mpath_remove_disk(head);
+#ifdef CONFIG_NVME_MULTIPATH
+	if (head->disk)
+		put_disk(head->disk);
+#endif
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74896be40c17..55045291b4de 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -697,7 +697,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 		 */
 		head->disk->queue = NULL;
 	}
-	put_disk(head->disk);
 }
 
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 566776100126..b6180bb3361d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -670,7 +670,7 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 	struct nvme_ns_head *head = ns->head;
 
 	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
+		nvme_mpath_remove_disk(head);
 }
 
 static inline void nvme_trace_bio_complete(struct request *req,
-- 
2.16.4
* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
@ 2020-10-05 12:50   ` Christoph Hellwig
  2021-03-05 20:06     ` Sagi Grimberg
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-05 12:50 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

> -	nvme_mpath_remove_disk(head);
> +#ifdef CONFIG_NVME_MULTIPATH
> +	if (head->disk)
> +		put_disk(head->disk);
> +#endif

Please move this into a helper in multipath.c with a stub for the
non-multipath case.
* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:50 ` Christoph Hellwig
@ 2021-03-05 20:06   ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:06 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke; +Cc: Keith Busch, linux-nvme, Keith Busch

>> -	nvme_mpath_remove_disk(head);
>> +#ifdef CONFIG_NVME_MULTIPATH
>> +	if (head->disk)
>> +		put_disk(head->disk);
>> +#endif
> 
> Please move this into a helper in multipath.c with a stub for the
> non-multipath case.

Other than this comment, it looks good to me.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 1/2] nvme-mpath: delete disk after last connection
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
  2020-10-05 12:50   ` Christoph Hellwig
@ 2021-03-04 14:34   ` Daniel Wagner
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Wagner @ 2021-03-04 14:34 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Sagi Grimberg, Keith Busch

On Mon, Oct 05, 2020 at 02:44:59PM +0200, Hannes Reinecke wrote:
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 nvme5n1[1] nvme3n1[0]
>       64512 blocks super 1.2 [2/2] [UU]
> 
> Now delete all paths to one of the namespaces:
> 
> # echo 1 > /sys/class/nvme/nvme3/delete_controller
> # echo 1 > /sys/class/nvme/nvme4/delete_controller
> 
> We have no path, but mdstat says:
> 
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active (auto-read-only) raid1 nvme5n1[1]
>       64512 blocks super 1.2 [2/1] [_U]

Shouldn't nvme3n1 still be listed here?
* [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
  2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
@ 2020-10-05 12:45 ` Hannes Reinecke
  2020-10-05 12:52   ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-05 12:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke

Currently namespaces behave differently depending on the 'CMIC'
setting. If CMIC is zero, the device is removed once the last path
goes away. If CMIC has the multipath bit set, the device is retained
even if the last path is removed.

This is okay for fabrics, where one can do an explicit disconnect
to remove the device, but for nvme-pci this induces a regression
with PCI hotplug.

When the NVMe device is opened (e.g. by MD), the NVMe device is not
removed after a PCI hot-remove. Hence MD will not be notified about
the event, and will continue to consider this device as operational.
Consequently, upon PCI hot-add the device shows up as a new NVMe
device, and MD will fail to reattach the device.

So this patch adds an NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
to restore the original behaviour for non-fabrics NVMe devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c      | 10 +++++++++-
 drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h      |  2 ++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4459a40b057c..e21c32ea4b51 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
 		container_of(ref, struct nvme_ns_head, ref);
 
 #ifdef CONFIG_NVME_MULTIPATH
-	if (head->disk)
+	if (head->disk) {
+		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
+			nvme_mpath_remove_disk(head);
 		put_disk(head->disk);
+	}
 #endif
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
@@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
 #ifdef CONFIG_NVME_MULTIPATH
 	&dev_attr_ana_grpid.attr,
 	&dev_attr_ana_state.attr,
+	&dev_attr_queue_if_no_path.attr,
 #endif
 	NULL,
 };
@@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
 			return 0;
 	}
+	if (a == &dev_attr_queue_if_no_path.attr) {
+		if (dev_to_disk(dev)->fops == &nvme_fops)
+			return 0;
+	}
 #endif
 	return a->mode;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 55045291b4de..bbdad5917112 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	/* set to a default value for 512 until disk is validated */
 	blk_queue_logical_block_size(q, 512);
 	blk_set_stacking_limits(&q->limits);
+	/* Enable queue_if_no_path semantics for fabrics */
+	if (ctrl->ops->flags & NVME_F_FABRICS)
+		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
 
 	/* we need to propagate up the VMC settings */
 	if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
@@ -640,6 +643,37 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
 }
 DEVICE_ATTR_RO(ana_state);
 
+static ssize_t queue_if_no_path_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nvme_ns_head *head = nvme_get_ns_from_dev(dev)->head;
+
+	return sprintf(buf, "%s\n",
+		       test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags) ?
+		       "on" : "off");
+}
+
+static ssize_t queue_if_no_path_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct nvme_ns_head *head = nvme_get_ns_from_dev(dev)->head;
+	int err;
+	bool queue_if_no_path;
+
+	err = kstrtobool(buf, &queue_if_no_path);
+	if (err)
+		return -EINVAL;
+
+	if (queue_if_no_path)
+		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
+	else
+		clear_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
+
+	return count;
+}
+DEVICE_ATTR(queue_if_no_path, S_IRUGO | S_IWUSR,
+	    queue_if_no_path_show, queue_if_no_path_store);
+
 static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
 		struct nvme_ana_group_desc *desc, void *data)
 {
@@ -682,6 +716,10 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 	if (!head->disk)
 		return;
+	if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags)) {
+		kblockd_schedule_work(&head->requeue_work);
+		return;
+	}
 	if (head->disk->flags & GENHD_FL_UP)
 		del_gendisk(head->disk);
 	blk_set_queue_dying(head->disk->queue);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b6180bb3361d..94ae06cdc934 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -411,6 +411,7 @@ struct nvme_ns_head {
 	struct mutex lock;
 	unsigned long flags;
 #define NVME_NSHEAD_DISK_LIVE	0
+#define NVME_NSHEAD_QUEUE_IF_NO_PATH	1
 	struct nvme_ns __rcu *current_path[];
 #endif
 };
@@ -684,6 +685,7 @@ static inline void nvme_trace_bio_complete(struct request *req,
 
 extern struct device_attribute dev_attr_ana_grpid;
 extern struct device_attribute dev_attr_ana_state;
+extern struct device_attribute dev_attr_queue_if_no_path;
 extern struct device_attribute subsys_attr_iopolicy;
 #else
-- 
2.16.4
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
@ 2020-10-05 12:52   ` Christoph Hellwig
  2020-10-06  5:48     ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-05 12:52 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Mon, Oct 05, 2020 at 02:45:00PM +0200, Hannes Reinecke wrote:
> Currently namespaces behave differently depending on the 'CMIC'
> setting. If CMIC is zero, the device is removed once the last path
> goes away. If CMIC has the multipath bit set, the device is retained
> even if the last path is removed.
> This is okay for fabrics, where one can do an explicit disconnect
> to remove the device, but for nvme-pci this induces a regression
> with PCI hotplug.
> When the NVMe device is opened (eg by MD), the NVMe device is not
> removed after a PCI hot-remove. Hence MD will not be notified about
> the event, and will continue to consider this device as operational.
> Consequently, upon PCI hot-add the device shows up as a new NVMe
> device, and MD will fail to reattach the device.
> So this patch adds NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
> to restore the original behaviour for non-fabrics NVMe devices.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/nvme/host/core.c      | 10 +++++++++-
>  drivers/nvme/host/multipath.c | 38 ++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h      |  2 ++
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 4459a40b057c..e21c32ea4b51 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -475,8 +475,11 @@ static void nvme_free_ns_head(struct kref *ref)
>  		container_of(ref, struct nvme_ns_head, ref);
>  
>  #ifdef CONFIG_NVME_MULTIPATH
> -	if (head->disk)
> +	if (head->disk) {
> +		if (test_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags))
> +			nvme_mpath_remove_disk(head);
>  		put_disk(head->disk);
> +	}
>  #endif
>  	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>  	cleanup_srcu_struct(&head->srcu);
> @@ -3357,6 +3360,7 @@ static struct attribute *nvme_ns_id_attrs[] = {
>  #ifdef CONFIG_NVME_MULTIPATH
>  	&dev_attr_ana_grpid.attr,
>  	&dev_attr_ana_state.attr,
> +	&dev_attr_queue_if_no_path.attr,
>  #endif
>  	NULL,
>  };
> @@ -3387,6 +3391,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>  		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
>  			return 0;
>  	}
> +	if (a == &dev_attr_queue_if_no_path.attr) {
> +		if (dev_to_disk(dev)->fops == &nvme_fops)
> +			return 0;
> +	}
>  #endif
>  	return a->mode;
>  }
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 55045291b4de..bbdad5917112 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>  	/* set to a default value for 512 until disk is validated */
>  	blk_queue_logical_block_size(q, 512);
>  	blk_set_stacking_limits(&q->limits);
> +	/* Enable queue_if_no_path semantics for fabrics */
> +	if (ctrl->ops->flags & NVME_F_FABRICS)
> +		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);

Well, that is blindly obvious from the code.  But why would we treat
fabrics special?
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-05 12:52 ` Christoph Hellwig
@ 2020-10-06  5:48   ` Hannes Reinecke
  2020-10-06  7:51     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 5:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/5/20 2:52 PM, Christoph Hellwig wrote:
> On Mon, Oct 05, 2020 at 02:45:00PM +0200, Hannes Reinecke wrote:
>> Currently namespaces behave differently depending on the 'CMIC'
>> setting. If CMIC is zero, the device is removed once the last path
>> goes away. If CMIC has the multipath bit set, the device is retained
>> even if the last path is removed.
>> This is okay for fabrics, where one can do an explicit disconnect
>> to remove the device, but for nvme-pci this induces a regression
>> with PCI hotplug.
>> When the NVMe device is opened (eg by MD), the NVMe device is not
>> removed after a PCI hot-remove. Hence MD will not be notified about
>> the event, and will continue to consider this device as operational.
>> Consequently, upon PCI hot-add the device shows up as a new NVMe
>> device, and MD will fail to reattach the device.
>> So this patch adds NVME_NSHEAD_QUEUE_IF_NO_PATH flag to the nshead
>> to restore the original behaviour for non-fabrics NVMe devices.
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 55045291b4de..bbdad5917112 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -381,6 +381,9 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>>  	/* set to a default value for 512 until disk is validated */
>>  	blk_queue_logical_block_size(q, 512);
>>  	blk_set_stacking_limits(&q->limits);
>> +	/* Enable queue_if_no_path semantics for fabrics */
>> +	if (ctrl->ops->flags & NVME_F_FABRICS)
>> +		set_bit(NVME_NSHEAD_QUEUE_IF_NO_PATH, &head->flags);
> 
> Well, that is blindly obvious from the code.  But why would we treat
> fabrics special?
> 
Well, because it's the established behaviour of the current code.
Changing it now has the potential to break existing scenarios.

For PCI (ie non-fabrics) the current behaviour is arguably a corner
case (as one needs to have a PCI NVMe device with the CMIC bit set),
but once you have one, PCI hotplug is done for. So there we really
want to change the behaviour to get the same user experience for all
NVMe drives.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  5:48 ` Hannes Reinecke
@ 2020-10-06  7:51   ` Christoph Hellwig
  2020-10-06  8:07     ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06 7:51 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
> Well, because it's established behaviour of the current code.
> Changing it now has the potential to break existing scenarios.
> 
> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
> PCI hotplug is done for. So there we really want to change the behaviour to
> get the same user experience for all NVMe drives.

Dual port PCI NVMe devices are pretty common in the enterprise space.
I really don't see any good reason to differ here.
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  7:51 ` Christoph Hellwig
@ 2020-10-06  8:07   ` Hannes Reinecke
  2020-10-06  8:27     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 8:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 9:51 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
>> Well, because it's established behaviour of the current code.
>> Changing it now has the potential to break existing scenarios.
>>
>> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
>> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
>> PCI hotplug is done for. So there we really want to change the behaviour to
>> get the same user experience for all NVMe drives.
> 
> Dual port PCI NVMe devices are pretty common in the enterprise space.
> I really don't see any good reason to differ here.
> 
?
Differ from what?

Normal (non-dual-ported) PCI NVMe?
NVMe-oF?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:07 ` Hannes Reinecke
@ 2020-10-06  8:27   ` Christoph Hellwig
  2020-10-06  8:29     ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06 8:27 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 10:07:49AM +0200, Hannes Reinecke wrote:
> On 10/6/20 9:51 AM, Christoph Hellwig wrote:
>> On Tue, Oct 06, 2020 at 07:48:40AM +0200, Hannes Reinecke wrote:
>>> Well, because it's established behaviour of the current code.
>>> Changing it now has the potential to break existing scenarios.
>>>
>>> For PCI (ie non-fabrics) the current behaviour is arguably a corner case
>>> (as one needs to have a PCI-NVMe with CMIC bit set), but once you have it
>>> PCI hotplug is done for. So there we really want to change the behaviour to
>>> get the same user experience for all NVMe drives.
>>
>> Dual port PCI NVMe devices are pretty common in the enterprise space.
>> I really don't see any good reason to differ here.
>>
> ?
> Differ from what?
> 
> Normal (non-dual ported) PCI-NVMe?
> NVMe-oF?

All multipath devices should behave the same. No special casing for
PCIe, please.
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:27 ` Christoph Hellwig
@ 2020-10-06  8:29   ` Hannes Reinecke
  2020-10-06  8:39     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 8:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 10:27 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 10:07:49AM +0200, Hannes Reinecke wrote:
>> ?
>> Differ from what?
>>
>> Normal (non-dual ported) PCI-NVMe?
>> NVMe-oF?
> 
> All multipath devices should behave the same. No special casing for
> PCIe, please.
> 
Even if the default behaviour breaks PCI hotplug?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:29 ` Hannes Reinecke
@ 2020-10-06  8:39   ` Christoph Hellwig
  2020-10-06 13:30     ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2020-10-06 8:39 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg

On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>> All multipath devices should behave the same. No special casing for
>> PCIe, please.
>>
> Even if the default behaviour breaks PCI hotplug?

Why would it "break" PCI hotplug?
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06  8:39 ` Christoph Hellwig
@ 2020-10-06 13:30   ` Hannes Reinecke
  2020-10-06 13:45     ` Hannes Reinecke
  2020-10-06 17:41     ` Keith Busch
  0 siblings, 2 replies; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 13:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch

On 10/6/20 10:39 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>>> All multipath devices should behave the same. No special casing for
>>> PCIe, please.
>>>
>> Even if the default behaviour breaks PCI hotplug?
> 
> Why would it "break" PCI hotplug?
> 
When running under MD RAID:

Before hotplug:

# nvme list
Node             SN        Model           Namespace Usage                  Format         FW Rev
---------------- --------- --------------- --------- ---------------------- -------------- --------
/dev/nvme0n1     SLESNVME1 QEMU NVMe Ctrl  1         17.18 GB / 17.18 GB    512 B + 0 B    1.0
/dev/nvme1n1     SLESNVME2 QEMU NVMe Ctrl  1          4.29 GB /  4.29 GB    512 B + 0 B    1.0
/dev/nvme2n1     SLESNVME3 QEMU NVMe Ctrl  1          4.29 GB /  4.29 GB    512 B + 0 B    1.0

After hotplug:

# nvme list
Node             SN        Model           Namespace Usage                  Format         FW Rev
---------------- --------- --------------- --------- ---------------------- -------------- --------
/dev/nvme0n1     SLESNVME1 QEMU NVMe Ctrl  1         17.18 GB / 17.18 GB    512 B + 0 B    1.0
/dev/nvme1n1     SLESNVME2 QEMU NVMe Ctrl  -1         0.00  B /  0.00  B      1 B + 0 B    1.0
/dev/nvme1n2     SLESNVME2 QEMU NVMe Ctrl  1          4.29 GB /  4.29 GB    512 B + 0 B    1.0
/dev/nvme2n1     SLESNVME3 QEMU NVMe Ctrl  1          4.29 GB /  4.29 GB    512 B + 0 B    1.0

And MD hasn't been notified that the device is gone:

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0]
      4189184 blocks super 1.2 2 near-copies [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Once I do some I/O to it MD recognizes a faulty device:

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0](F)
      4189184 blocks super 1.2 2 near-copies [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

but the re-added device isn't added to the MD RAID. In fact, it has
been assigned a _different_ namespace ID:

[  904.299065] pcieport 0000:00:08.0: pciehp: Slot(0-1): Card present
[  904.299067] pcieport 0000:00:08.0: pciehp: Slot(0-1): Link Up
[  904.435314] pci 0000:02:00.0: [8086:5845] type 00 class 0x010802
[  904.435523] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[  904.435676] pci 0000:02:00.0: reg 0x20: [mem 0x00000000-0x00000fff]
[  904.436982] pci 0000:02:00.0: BAR 0: assigned [mem 0xc1200000-0xc1201fff 64bit]
[  904.437086] pci 0000:02:00.0: BAR 4: assigned [mem 0xc1202000-0xc1202fff]
[  904.437118] pcieport 0000:00:08.0: PCI bridge to [bus 02]
[  904.437137] pcieport 0000:00:08.0:   bridge window [io  0x7000-0x7fff]
[  904.439024] pcieport 0000:00:08.0:   bridge window [mem 0xc1200000-0xc13fffff]
[  904.440229] pcieport 0000:00:08.0:   bridge window [mem 0x802000000-0x803ffffff 64bit pref]
[  904.447150] nvme nvme3: pci function 0000:02:00.0
[  904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
[  904.458880] nvme nvme3: 1/0/0 default/read/poll queues
[  904.461296] nvme1n2: detected capacity change from 0 to 4294967296

and the 'old', pre-hotplug device still lingers on in the 'nvme list'
output.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:30 ` Hannes Reinecke
@ 2020-10-06 13:45 ` Hannes Reinecke
  2021-03-05 20:31   ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2020-10-06 13:45 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme

On 10/6/20 3:30 PM, Hannes Reinecke wrote:
> On 10/6/20 10:39 AM, Christoph Hellwig wrote:
>>
>> Why would it "break" PCI hotplug?
>>
> When running under MD RAID:
>
> [...]
>
> but the re-added device isn't added to the MD RAID.
> In fact, it has been assigned a _different_ namespace ID:
>
> [...]
>
> and the 'old', pre-hotplug device still lingers on in the 'nvme list'
> output.
>
Compare that to the 'standard', non-CMIC nvme, where with the same setup
MD would detach the nvme on its own:

# cat /proc/mdstat
Personalities : [raid10]
md127 : active (auto-read-only) raid10 nvme2n1[1]
      4189184 blocks super 1.2 2 near-copies [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
# nvme list
Node          SN         Model           Namespace  Usage                Format       FW Rev
------------- ---------- --------------- ---------- -------------------- ------------ ------
/dev/nvme0n1  SLESNVME1  QEMU NVMe Ctrl  1          17.18 GB / 17.18 GB  512 B + 0 B  1.0
/dev/nvme2n1  SLESNVME3  QEMU NVMe Ctrl  1           4.29 GB /  4.29 GB  512 B + 0 B  1.0

And yes, this is exactly the same setup, the only difference being the
CMIC setting for the NVMe device.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:45 ` Hannes Reinecke
@ 2021-03-05 20:31 ` Sagi Grimberg
  2021-03-08 13:17   ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:31 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig; +Cc: linux-nvme, Keith Busch

> Compare that to the 'standard', non-CMIC nvme, where with the same setup
> MD would detach the nvme on its own:
>
> [...]
>
> And yes, this is exactly the same setup, the only difference being the
> CMIC setting for the NVMe device.

I agree with Christoph that we should do exactly the same for all.

Hannes, my understanding here is that you want the device to go away
after the last path disappeared because it breaks md, why don't you
want to have this also for fabrics?

You mention that in fabrics one can manually disconnect, but why should
the user resort to a manual disconnect?

Is something else broken with this behavior? Maybe I am missing
something here?
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-05 20:31 ` Sagi Grimberg
@ 2021-03-08 13:17 ` Hannes Reinecke
  2021-03-15 17:21   ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2021-03-08 13:17 UTC (permalink / raw)
To: Sagi Grimberg, Christoph Hellwig; +Cc: linux-nvme, Keith Busch

On 3/5/21 9:31 PM, Sagi Grimberg wrote:
>
>> [...]
>>
>> And yes, this is exactly the same setup, the only difference being the
>> CMIC setting for the NVMe device.
>
> I agree with Christoph that we should do exactly the same for all.
>
> Hannes, my understanding here is that you want the device to go away
> after the last path disappeared because it breaks md, why don't you
> want to have this also for fabrics?
>
Oh, I would _love_ to have it for fabrics per default.
If you agree with it I can resend Keith's original patch, which solves
the issue without the need for any additional settings.

> You mention that in fabrics one can manually disconnect, but why should
> the user resort to a manual disconnect?
>
> Is something else broken with this behavior? Maybe I am missing
> something here?

I'm just trying to understand why the current logic was built into
fabrics in the first place.
But if no-one remembers (or there was none) I'd happily convert to the
PCI logic.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-08 13:17 ` Hannes Reinecke
@ 2021-03-15 17:21 ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-15 17:21 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig; +Cc: linux-nvme, Keith Busch

>>> And yes, this is exactly the same setup, the only difference being the
>>> CMIC setting for the NVMe device.
>>
>> I agree with Christoph that we should do exactly the same for all.
>>
>> Hannes, my understanding here is that you want the device to go away
>> after the last path disappeared because it breaks md, why don't you
>> want to have this also for fabrics?
>>
> Oh, I would _love_ to have it for fabrics per default.
> If you agree with it I can resend Keith's original patch, which solves
> the issue without the need for any additional settings.

I think that this is what we should do. Care to resend this for both
fabrics and pci?
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 13:30 ` Hannes Reinecke
  2020-10-06 13:45 ` Hannes Reinecke
@ 2020-10-06 17:41 ` Keith Busch
  2021-03-05 20:11   ` Sagi Grimberg
  1 sibling, 1 reply; 21+ messages in thread
From: Keith Busch @ 2020-10-06 17:41 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Tue, Oct 06, 2020 at 03:30:14PM +0200, Hannes Reinecke wrote:
> but the re-added device isn't added to the MD RAID.
> In fact, it has been assigned a _different_ namespace ID:

That's actually just a driver assigned instance rather than a Namespace
ID.

> [...]
> [  904.447150] nvme nvme3: pci function 0000:02:00.0
> [  904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
> [  904.458880] nvme nvme3: 1/0/0 default/read/poll queues
> [  904.461296] nvme1n2: detected capacity change from 0 to 4294967296
>
> and the 'old', pre-hotplug device still lingers on in the 'nvme list'
> output.

I think what you're showing is a problem for everyone, including
fabrics. The namespace was attached to the previously existing
subsystem, nvme1, but we didn't find the previously existing "head" to
reattach the namespace.

I just threw the below together very hastily, so I'm sure there's
something wrong with it, but just to capture what's wrong with the
current code:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3833584b5cee..0b3bbbe6d1e8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	mutex_lock(&ns->ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
-	if (list_empty(&ns->head->list))
-		list_del_init(&ns->head->entry);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	synchronize_rcu(); /* guarantee not available in head->list */
--
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2020-10-06 17:41 ` Keith Busch
@ 2021-03-05 20:11 ` Sagi Grimberg
  2021-03-11 12:41   ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2021-03-05 20:11 UTC (permalink / raw)
To: Keith Busch, Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, Keith Busch

> I think what you're showing is a problem for everyone, including
> fabrics. The namespace was attached to the previously existing
> subsystem, nvme1, but we didn't find the previously existing "head" to
> reattach the namespace.
>
> I just threw the below together very hastily, so I'm sure there's
> something wrong with it, but just to capture what's wrong with the
> current code:
>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3833584b5cee..0b3bbbe6d1e8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>
>  	mutex_lock(&ns->ctrl->subsys->lock);
>  	list_del_rcu(&ns->siblings);
> -	if (list_empty(&ns->head->list))
> -		list_del_init(&ns->head->entry);

Not removing it ever from the head->list?

>  	mutex_unlock(&ns->ctrl->subsys->lock);
>
>  	synchronize_rcu(); /* guarantee not available in head->list */
> --

This is a problem, because if you don't remove the ns from the
head->list nothing prevents it from being re-selected as the
current_path after we synchronize the srcu, although it is going
away...
* Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
  2021-03-05 20:11 ` Sagi Grimberg
@ 2021-03-11 12:41 ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2021-03-11 12:41 UTC (permalink / raw)
To: linux-nvme

On 3/5/21 9:11 PM, Sagi Grimberg wrote:
>
>> I think what you're showing is a problem for everyone, including
>> fabrics. The namespace was attached to the previously existing
>> subsystem, nvme1, but we didn't find the previously existing "head" to
>> reattach the namespace.
>>
>> [...]
>> @@ -3897,8 +3897,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>  	mutex_lock(&ns->ctrl->subsys->lock);
>>  	list_del_rcu(&ns->siblings);
>> -	if (list_empty(&ns->head->list))
>> -		list_del_init(&ns->head->entry);
>
> Not removing it ever from the head->list?
>
> This is a problem, because if you don't remove the ns from the
> head->list nothing prevents it from being re-selected as the
> current_path after we synchronize the srcu, although it is going
> away...
>
You misread the patch. I'm not skipping the removal from 'head->list',
I'm skipping the removal from 'subsys->nsheads' (which is where
'head->entry' is pointing to).

But I'll be sending an updated patch which clarifies this situation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
end of thread, other threads: [~2021-03-15 17:22 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-05 12:44 [RFC PATCHv3 0/2] nvme: queue_if_no_path functionality Hannes Reinecke
2020-10-05 12:44 ` [PATCH 1/2] nvme-mpath: delete disk after last connection Hannes Reinecke
2020-10-05 12:50   ` Christoph Hellwig
2021-03-05 20:06   ` Sagi Grimberg
2021-03-04 14:34   ` Daniel Wagner
2020-10-05 12:45 ` [PATCH 2/2] nvme: add 'queue_if_no_path' semantics Hannes Reinecke
2020-10-05 12:52   ` Christoph Hellwig
2020-10-06  5:48     ` Hannes Reinecke
2020-10-06  7:51       ` Christoph Hellwig
2020-10-06  8:07         ` Hannes Reinecke
2020-10-06  8:27           ` Christoph Hellwig
2020-10-06  8:29             ` Hannes Reinecke
2020-10-06  8:39               ` Christoph Hellwig
2020-10-06 13:30                 ` Hannes Reinecke
2020-10-06 13:45                   ` Hannes Reinecke
2021-03-05 20:31                     ` Sagi Grimberg
2021-03-08 13:17                       ` Hannes Reinecke
2021-03-15 17:21                         ` Sagi Grimberg
2020-10-06 17:41                   ` Keith Busch
2021-03-05 20:11                     ` Sagi Grimberg
2021-03-11 12:41                       ` Hannes Reinecke