* [PATCH] nvme-core: Fix subsystem instance mismatches @ 2019-08-31 0:01 Logan Gunthorpe 2019-08-31 15:29 ` Keith Busch 0 siblings, 1 reply; 15+ messages in thread From: Logan Gunthorpe @ 2019-08-31 0:01 UTC (permalink / raw) To: linux-kernel, linux-nvme, Christoph Hellwig Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, Jens Axboe, Keith Busch, Logan Gunthorpe With multipath enabled (which is now default in many distros), nvme controllers and their respective namespaces can be numbered differently. For example: nvme0n1 might actually belong to controller nvme1, which is super confusing (and may have broken any scripts that rely on the numbering relation). To make matters worse, the mismatches can sometimes change from boot to boot so anyone dealing with the nvme control device has to inspect sysfs every boot to ensure they get the right one. The reason for this is that the X in nvmeXn1 is the subsystem's instance number (when multipath is enabled) which is distinct from the controller's instance and the subsystem instance is assigned from a separate IDA after the controller gets identified and this can race a bit when multiple controllers are being setup. To fix this, assign the subsystem's instance based on the instance number of the controller's instance that first created it. There should always be fewer subsystems than controllers so the should not be a need to create extra subsystems that overlap existing controllers. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- drivers/nvme/host/core.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d3d6b7bd6903..ca201b71ab49 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq); struct workqueue_struct *nvme_delete_wq; EXPORT_SYMBOL_GPL(nvme_delete_wq); -static DEFINE_IDA(nvme_subsystems_ida); static LIST_HEAD(nvme_subsystems); static DEFINE_MUTEX(nvme_subsystems_lock); @@ -2332,7 +2331,6 @@ static void nvme_release_subsystem(struct device *dev) struct nvme_subsystem *subsys = container_of(dev, struct nvme_subsystem, dev); - ida_simple_remove(&nvme_subsystems_ida, subsys->instance); kfree(subsys); } @@ -2461,12 +2459,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) subsys = kzalloc(sizeof(*subsys), GFP_KERNEL); if (!subsys) return -ENOMEM; - ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL); - if (ret < 0) { - kfree(subsys); - return ret; - } - subsys->instance = ret; + subsys->instance = ctrl->instance; mutex_init(&subsys->lock); kref_init(&subsys->ref); INIT_LIST_HEAD(&subsys->ctrls); @@ -4074,7 +4067,6 @@ static int __init nvme_core_init(void) static void __exit nvme_core_exit(void) { - ida_destroy(&nvme_subsystems_ida); class_destroy(nvme_subsys_class); class_destroy(nvme_class); unregister_chrdev_region(nvme_chr_devt, NVME_MINORS); -- 2.20.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-08-31 0:01 [PATCH] nvme-core: Fix subsystem instance mismatches Logan Gunthorpe @ 2019-08-31 15:29 ` Keith Busch 2019-09-03 16:08 ` Logan Gunthorpe 0 siblings, 1 reply; 15+ messages in thread From: Keith Busch @ 2019-08-31 15:29 UTC (permalink / raw) To: Logan Gunthorpe Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Keith Busch, Christoph Hellwig On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote: > To fix this, assign the subsystem's instance based on the instance > number of the controller's instance that first created it. There should > always be fewer subsystems than controllers so the should not be a need > to create extra subsystems that overlap existing controllers. The subsystem's lifetime is not tied to the controller's. When the controller is removed and releases its instance, the next controller to take that available instance will create naming collisions with the subsystem still using it. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-08-31 15:29 ` Keith Busch @ 2019-09-03 16:08 ` Logan Gunthorpe 2019-09-03 16:46 ` Keith Busch 0 siblings, 1 reply; 15+ messages in thread From: Logan Gunthorpe @ 2019-09-03 16:08 UTC (permalink / raw) To: Keith Busch Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Keith Busch, Christoph Hellwig On 2019-08-31 9:29 a.m., Keith Busch wrote: > On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote: >> To fix this, assign the subsystem's instance based on the instance >> number of the controller's instance that first created it. There should >> always be fewer subsystems than controllers so the should not be a need >> to create extra subsystems that overlap existing controllers. > > The subsystem's lifetime is not tied to the controller's. When the > controller is removed and releases its instance, the next controller > to take that available instance will create naming collisions with the > subsystem still using it. > Hmm, yes, ok. So perhaps we can just make the subsystem prefer the ctrl's instance when allocating the ID? Then at least, in the common case, the controller numbers will match the subsystem numbers. Only when there's random hot-plugs would the numbers get out of sync. Logan _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-03 16:08 ` Logan Gunthorpe @ 2019-09-03 16:46 ` Keith Busch 2019-09-03 18:13 ` Logan Gunthorpe 2019-09-04 6:05 ` Christoph Hellwig 0 siblings, 2 replies; 15+ messages in thread From: Keith Busch @ 2019-09-03 16:46 UTC (permalink / raw) To: Logan Gunthorpe Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Keith Busch, Christoph Hellwig On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote: > On 2019-08-31 9:29 a.m., Keith Busch wrote: > > On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote: > >> To fix this, assign the subsystem's instance based on the instance > >> number of the controller's instance that first created it. There should > >> always be fewer subsystems than controllers so the should not be a need > >> to create extra subsystems that overlap existing controllers. > > > > The subsystem's lifetime is not tied to the controller's. When the > > controller is removed and releases its instance, the next controller > > to take that available instance will create naming collisions with the > > subsystem still using it. > > > > Hmm, yes, ok. > > So perhaps we can just make the subsystem prefer the ctrl's instance > when allocating the ID? Then at least, in the common case, the > controller numbers will match the subsystem numbers. Only when there's > random hot-plugs would the numbers get out of sync. I really don't know about a patch that works only on static configurations. Connects and disconnects do happen on live systems, so the numerals will inevitably get out of sync. Could we possibly make /dev/nvmeX be a subsystem handle without causing trouble for anyone? This would essentially be the same thing as today for non-CMIC controllers with a device-per-controller and only affects the CMIC ones. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-03 16:46 ` Keith Busch @ 2019-09-03 18:13 ` Logan Gunthorpe 2019-09-04 6:05 ` Christoph Hellwig 1 sibling, 0 replies; 15+ messages in thread From: Logan Gunthorpe @ 2019-09-03 18:13 UTC (permalink / raw) To: Keith Busch Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Keith Busch, Christoph Hellwig On 2019-09-03 10:46 a.m., Keith Busch wrote: > On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote: >> On 2019-08-31 9:29 a.m., Keith Busch wrote: >>> On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote: >>>> To fix this, assign the subsystem's instance based on the instance >>>> number of the controller's instance that first created it. There should >>>> always be fewer subsystems than controllers so the should not be a need >>>> to create extra subsystems that overlap existing controllers. >>> >>> The subsystem's lifetime is not tied to the controller's. When the >>> controller is removed and releases its instance, the next controller >>> to take that available instance will create naming collisions with the >>> subsystem still using it. >>> >> >> Hmm, yes, ok. >> >> So perhaps we can just make the subsystem prefer the ctrl's instance >> when allocating the ID? Then at least, in the common case, the >> controller numbers will match the subsystem numbers. Only when there's >> random hot-plugs would the numbers get out of sync. > > I really don't know about a patch that works only on static > configurations. Connects and disconnects do happen on live systems, > so the numerals will inevitably get out of sync. Well this depends on how big a problem we think the number mismatch is. Right now it's pretty annoying because numbers aren't matching for non-CMIC controllers in simple setups on boot. I think having a small patch that makes it more consistent for the static would be worth it and if CMIC controllers with significant hot-plug events have mismatches that seems more understandable to me. > Could we possibly make /dev/nvmeX be a subsystem handle without causing > trouble for anyone? This would essentially be the same thing as today > for non-CMIC controllers with a device-per-controller and only affects > the CMIC ones. Well then we'd have to be able to do everything that's possible with a controller via the subsystem and it would have to multiplex all admin commands for CMIC ones, etc to a sensible controller. This might make sense in the long term but it sounds like a larger project than I have time to take on. Logan _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-03 16:46 ` Keith Busch 2019-09-03 18:13 ` Logan Gunthorpe @ 2019-09-04 6:05 ` Christoph Hellwig 2019-09-04 14:44 ` Keith Busch 1 sibling, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2019-09-04 6:05 UTC (permalink / raw) To: Keith Busch Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Keith Busch, Logan Gunthorpe, Christoph Hellwig On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote: > Could we possibly make /dev/nvmeX be a subsystem handle without causing > trouble for anyone? This would essentially be the same thing as today > for non-CMIC controllers with a device-per-controller and only affects > the CMIC ones. A per-subsyste character device doesn't make sense, as a lot of admin command require a specific controller. If this really is an isue for people we'll just need to refcount the handle allocation. That is: - nvme_init_ctrl allocates a new nvme_instance or so object, which does the ida_simple_get. - we allocate a new subsystem that reuses the handle and grabs a reference in nvme_init_subsystem, then if we find an existing subsystem we drop that reference again. - last free of a ctrl or subsystem also drops a reference, with the final free releasing the ida _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 6:05 ` Christoph Hellwig @ 2019-09-04 14:44 ` Keith Busch 2019-09-04 15:42 ` Christoph Hellwig 0 siblings, 1 reply; 15+ messages in thread From: Keith Busch @ 2019-09-04 14:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe On Wed, Sep 04, 2019 at 08:05:58AM +0200, Christoph Hellwig wrote: > On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote: > > Could we possibly make /dev/nvmeX be a subsystem handle without causing > > trouble for anyone? This would essentially be the same thing as today > > for non-CMIC controllers with a device-per-controller and only affects > > the CMIC ones. > > A per-subsyste character device doesn't make sense, as a lot of admin > command require a specific controller. Yeah, I was hoping to provide something special for CMIC controllers so you can do path specific admin, but that looks sure to break user space. > If this really is an isue for people we'll just need to refcount the > handle allocation. That is: > > - nvme_init_ctrl allocates a new nvme_instance or so object, which > does the ida_simple_get. > - we allocate a new subsystem that reuses the handle and grabs > a reference in nvme_init_subsystem, then if we find an existing > subsystem we drop that reference again. > - last free of a ctrl or subsystem also drops a reference, with > the final free releasing the ida Let me step through an example: Ctrl A gets instance 0. Its subsystem gets the same instance, and takes ref count on it: all namespaces in this subsystem will use '0'. Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so no new subsytem is allocated. Ctrl A is disconnected, dropping its ref on instance 0, but the subsystem still has its refcount, making it unavailable. Ctrl A is reconnected, and allocates instance 2 because 0 is still in use. Now all the namespaces in this subsystem are prefixed with nvme0, but no controller exists with the same prefix. We still have inevitable naming mismatch, right? _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 14:44 ` Keith Busch @ 2019-09-04 15:42 ` Christoph Hellwig 2019-09-04 15:54 ` Keith Busch 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2019-09-04 15:42 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe, Christoph Hellwig On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote: > Let me step through an example: > > Ctrl A gets instance 0. > > Its subsystem gets the same instance, and takes ref count on it: > all namespaces in this subsystem will use '0'. > > Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so > no new subsytem is allocated. > > Ctrl A is disconnected, dropping its ref on instance 0, but the > subsystem still has its refcount, making it unavailable. > > Ctrl A is reconnected, and allocates instance 2 because 0 is still in > use. > > Now all the namespaces in this subsystem are prefixed with nvme0, but no > controller exists with the same prefix. We still have inevitable naming > mismatch, right? I think th major confusion was that we can use the same handle for and unrelated subsystem vs controller, and that would avoid it. I don't see how we can avoid the controller is entirely different from namespace problem ever. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 15:42 ` Christoph Hellwig @ 2019-09-04 15:54 ` Keith Busch 2019-09-04 16:02 ` Christoph Hellwig 2019-09-04 16:07 ` Logan Gunthorpe 0 siblings, 2 replies; 15+ messages in thread From: Keith Busch @ 2019-09-04 15:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote: > On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote: > > Let me step through an example: > > > > Ctrl A gets instance 0. > > > > Its subsystem gets the same instance, and takes ref count on it: > > all namespaces in this subsystem will use '0'. > > > > Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so > > no new subsytem is allocated. > > > > Ctrl A is disconnected, dropping its ref on instance 0, but the > > subsystem still has its refcount, making it unavailable. > > > > Ctrl A is reconnected, and allocates instance 2 because 0 is still in > > use. > > > > Now all the namespaces in this subsystem are prefixed with nvme0, but no > > controller exists with the same prefix. We still have inevitable naming > > mismatch, right? > > I think th major confusion was that we can use the same handle for > and unrelated subsystem vs controller, and that would avoid it. > > I don't see how we can avoid the controller is entirely different > from namespace problem ever. Can we just ensure there is never a matching controller then? This patch will accomplish that and simpler than wrapping the instance in a refcount'ed object: http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 15:54 ` Keith Busch @ 2019-09-04 16:02 ` Christoph Hellwig 2019-09-04 16:07 ` Logan Gunthorpe 1 sibling, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2019-09-04 16:02 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe, Christoph Hellwig On Wed, Sep 04, 2019 at 09:54:45AM -0600, Keith Busch wrote: > Can we just ensure there is never a matching controller then? This > patch will accomplish that and simpler than wrapping the instance in a > refcount'ed object: > > http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html I guess that is fine to. Btw, what happened to the plan for udev rules in that thread? _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 15:54 ` Keith Busch 2019-09-04 16:02 ` Christoph Hellwig @ 2019-09-04 16:07 ` Logan Gunthorpe 2019-09-04 16:35 ` Keith Busch 1 sibling, 1 reply; 15+ messages in thread From: Logan Gunthorpe @ 2019-09-04 16:07 UTC (permalink / raw) To: Keith Busch, Christoph Hellwig Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe On 2019-09-04 9:54 a.m., Keith Busch wrote: > On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote: >> On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote: >>> Let me step through an example: >>> >>> Ctrl A gets instance 0. >>> >>> Its subsystem gets the same instance, and takes ref count on it: >>> all namespaces in this subsystem will use '0'. >>> >>> Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so >>> no new subsytem is allocated. >>> >>> Ctrl A is disconnected, dropping its ref on instance 0, but the >>> subsystem still has its refcount, making it unavailable. >>> >>> Ctrl A is reconnected, and allocates instance 2 because 0 is still in >>> use. >>> >>> Now all the namespaces in this subsystem are prefixed with nvme0, but no >>> controller exists with the same prefix. We still have inevitable naming >>> mismatch, right? >> >> I think th major confusion was that we can use the same handle for >> and unrelated subsystem vs controller, and that would avoid it. >> >> I don't see how we can avoid the controller is entirely different >> from namespace problem ever. Yes, I agree, we can't solve the mismatch problem in the general case: with sequences of hot plug events there will always be a case that mismatches. I just think we can do better in the simple common default case. > Can we just ensure there is never a matching controller then? This > patch will accomplish that and simpler than wrapping the instance in a > refcount'ed object: > > http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html I don't really like that idea. It reduces the confusion caused by mismatching numbers, but causes the controller to never match the namespace, which is also confusing but in a different way. I like the nvme_instance idea. It's not going to be perfect but it has some nice properties: the subsystem will try to match the controller's instance whenever possible, but in cases where it doesn't, the instance number of the subsystem will never be the same as an existing controller. I'll see if I can work up a quick patch set and see what people think. Logan _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 16:07 ` Logan Gunthorpe @ 2019-09-04 16:35 ` Keith Busch 2019-09-04 17:01 ` Logan Gunthorpe 0 siblings, 1 reply; 15+ messages in thread From: Keith Busch @ 2019-09-04 16:35 UTC (permalink / raw) To: Logan Gunthorpe Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote: > Yes, I agree, we can't solve the mismatch problem in the general case: > with sequences of hot plug events there will always be a case that > mismatches. I just think we can do better in the simple common default case. This may be something where udev can help us. I might be able to find some time to look at that, but not today. > > Can we just ensure there is never a matching controller then? This > > patch will accomplish that and simpler than wrapping the instance in a > > refcount'ed object: > > > > http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html > > I don't really like that idea. It reduces the confusion caused by > mismatching numbers, but causes the controller to never match the > namespace, which is also confusing but in a different way. > > I like the nvme_instance idea. It's not going to be perfect but it has > some nice properties: the subsystem will try to match the controller's > instance whenever possible, but in cases where it doesn't, the instance > number of the subsystem will never be the same as an existing controller. > > I'll see if I can work up a quick patch set and see what people think. How about this: we have the subsys copy the controller's instance, and the nvme_free_ctrl() doesn't release it if its subsys matches? --- diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 14c0bfb55615..8a8279ece5ee 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq); struct workqueue_struct *nvme_delete_wq; EXPORT_SYMBOL_GPL(nvme_delete_wq); -static DEFINE_IDA(nvme_subsystems_ida); static LIST_HEAD(nvme_subsystems); static DEFINE_MUTEX(nvme_subsystems_lock); @@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev) struct nvme_subsystem *subsys = container_of(dev, struct nvme_subsystem, dev); - ida_simple_remove(&nvme_subsystems_ida, subsys->instance); + if (subsys->instance >= 0) + ida_simple_remove(&nvme_instance_ida, subsys->instance); kfree(subsys); } @@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) subsys = kzalloc(sizeof(*subsys), GFP_KERNEL); if (!subsys) return -ENOMEM; - ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL); - if (ret < 0) { - kfree(subsys); - return ret; - } - subsys->instance = ret; + + subsys->instance = -1; mutex_init(&subsys->lock); kref_init(&subsys->ref); INIT_LIST_HEAD(&subsys->ctrls); @@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) subsys->dev.class = nvme_subsys_class; subsys->dev.release = nvme_release_subsystem; subsys->dev.groups = nvme_subsys_attrs_groups; - dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance); + dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance); device_initialize(&subsys->dev); mutex_lock(&nvme_subsystems_lock); @@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) goto out_put_subsystem; } + if (!found) + subsys->instance = ctrl->instance; ctrl->subsys = subsys; list_add_tail(&ctrl->subsys_entry, &subsys->ctrls); mutex_unlock(&nvme_subsystems_lock); @@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev) container_of(dev, struct nvme_ctrl, ctrl_device); struct nvme_subsystem *subsys = ctrl->subsys; - ida_simple_remove(&nvme_instance_ida, ctrl->instance); + if (subsys && ctrl->instance != subsys->instance) + ida_simple_remove(&nvme_instance_ida, ctrl->instance); + kfree(ctrl->effects); nvme_mpath_uninit(ctrl); __free_page(ctrl->discard_page); @@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void) static void __exit nvme_core_exit(void) { - ida_destroy(&nvme_subsystems_ida); class_destroy(nvme_subsys_class); class_destroy(nvme_class); unregister_chrdev_region(nvme_chr_devt, NVME_MINORS); -- _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 16:35 ` Keith Busch @ 2019-09-04 17:01 ` Logan Gunthorpe 2019-09-04 17:14 ` Keith Busch 0 siblings, 1 reply; 15+ messages in thread From: Logan Gunthorpe @ 2019-09-04 17:01 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig On 2019-09-04 10:35 a.m., Keith Busch wrote: > On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote: >> Yes, I agree, we can't solve the mismatch problem in the general case: >> with sequences of hot plug events there will always be a case that >> mismatches. I just think we can do better in the simple common default case. > > This may be something where udev can help us. I might be able to find > some time to look at that, but not today. > >>> Can we just ensure there is never a matching controller then? This >>> patch will accomplish that and simpler than wrapping the instance in a >>> refcount'ed object: >>> >>> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html >> >> I don't really like that idea. It reduces the confusion caused by >> mismatching numbers, but causes the controller to never match the >> namespace, which is also confusing but in a different way. >> >> I like the nvme_instance idea. It's not going to be perfect but it has >> some nice properties: the subsystem will try to match the controller's >> instance whenever possible, but in cases where it doesn't, the instance >> number of the subsystem will never be the same as an existing controller. >> >> I'll see if I can work up a quick patch set and see what people think. > > How about this: we have the subsys copy the controller's instance, > and the nvme_free_ctrl() doesn't release it if its subsys matches? Oh, yes that's simpler than the struct/kref method and looks like it will accomplish the same thing. I did some brief testing with it and it seems to work for me (though I don't have any subsystems with multiple controllers). If you want to make a patch out of it you can add my Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Thanks! Logan > --- > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 14c0bfb55615..8a8279ece5ee 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq); > struct workqueue_struct *nvme_delete_wq; > EXPORT_SYMBOL_GPL(nvme_delete_wq); > > -static DEFINE_IDA(nvme_subsystems_ida); > static LIST_HEAD(nvme_subsystems); > static DEFINE_MUTEX(nvme_subsystems_lock); > > @@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev) > struct nvme_subsystem *subsys = > container_of(dev, struct nvme_subsystem, dev); > > - ida_simple_remove(&nvme_subsystems_ida, subsys->instance); > + if (subsys->instance >= 0) > + ida_simple_remove(&nvme_instance_ida, subsys->instance); > kfree(subsys); > } > > @@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) > subsys = kzalloc(sizeof(*subsys), GFP_KERNEL); > if (!subsys) > return -ENOMEM; > - ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL); > - if (ret < 0) { > - kfree(subsys); > - return ret; > - } > - subsys->instance = ret; > + > + subsys->instance = -1; > mutex_init(&subsys->lock); > kref_init(&subsys->ref); > INIT_LIST_HEAD(&subsys->ctrls); > @@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) > subsys->dev.class = nvme_subsys_class; > subsys->dev.release = nvme_release_subsystem; > subsys->dev.groups = nvme_subsys_attrs_groups; > - dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance); > + dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance); > device_initialize(&subsys->dev); > > mutex_lock(&nvme_subsystems_lock); > @@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) > goto out_put_subsystem; > } > > + if (!found) > + subsys->instance = ctrl->instance; > ctrl->subsys = subsys; > list_add_tail(&ctrl->subsys_entry, &subsys->ctrls); > mutex_unlock(&nvme_subsystems_lock); > @@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev) > container_of(dev, struct nvme_ctrl, ctrl_device); > struct nvme_subsystem *subsys = ctrl->subsys; > > - ida_simple_remove(&nvme_instance_ida, ctrl->instance); > + if (subsys && ctrl->instance != subsys->instance) > + ida_simple_remove(&nvme_instance_ida, ctrl->instance); > + > kfree(ctrl->effects); > nvme_mpath_uninit(ctrl); > __free_page(ctrl->discard_page); > @@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void) > > static void __exit nvme_core_exit(void) > { > - ida_destroy(&nvme_subsystems_ida); > class_destroy(nvme_subsys_class); > class_destroy(nvme_class); > unregister_chrdev_region(nvme_chr_devt, NVME_MINORS); > -- > _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 17:01 ` Logan Gunthorpe @ 2019-09-04 17:14 ` Keith Busch 2019-09-04 17:29 ` Logan Gunthorpe 0 siblings, 1 reply; 15+ messages in thread From: Keith Busch @ 2019-09-04 17:14 UTC (permalink / raw) To: Logan Gunthorpe Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote: > Oh, yes that's simpler than the struct/kref method and looks like it > will accomplish the same thing. I did some brief testing with it and it > seems to work for me (though I don't have any subsystems with multiple > controllers). If you want to make a patch out of it you can add my > > Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Thanks! I'll make it a proper patch and send shortly. For testing multi-controller subsystems, I haven't got proper hardware either, so I really like the nvme loop target. Here's a very simple json defining a two namespace subsystem backed by two real nvme devices: loop.json: --- { "ports": [ { "addr": { "adrfam": "", "traddr": "", "treq": "not specified", "trsvcid": "", "trtype": "loop" }, "portid": 1, "referrals": [], "subsystems": [ "testnqn" ] } ], "subsystems": [ { "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "ef90689c-6c46-d44c-89c1-4067801309a8", "path": "/dev/nvme0n1" }, "enable": 1, "nsid": 1 }, { "device": { "nguid": "ef90689c-6c46-d44c-89c1-4067801309a9", "path": "/dev/nvme1n1" }, "enable": 1, "nsid": 2 } ], "nqn": "testnqn" } ] } -- Configure the target: # nvmetcli restore loop.json Connect to it twice: # nvme connect -n testnqn -t loop # nvme connect -n testnqn -t loop List the result: # nvme list -v NVM Express Subsystems Subsystem Subsystem-NQN Controllers ---------------- ------------------------------------------------------------------------------------------------ ---------------- nvme-subsys0 nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-17335943:ICDPC5ED2ORA6.4T nvme0 nvme-subsys1 nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-27335943:ICDPC5ED2ORA6.4T nvme1 nvme-subsys2 testnqn nvme2, nvme3 NVM Express Controllers Device SN MN FR TxPort Address Subsystem Namespaces -------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ---------------- nvme0 PHLE7200015N6P4BGN-1 7335943:ICDPC5ED2ORA6.4T QDV1RD07 pcie 0000:88:00.0 nvme-subsys0 nvme0n1 nvme1 PHLE7200015N6P4BGN-2 7335943:ICDPC5ED2ORA6.4T QDV1RD03 pcie 0000:89:00.0 nvme-subsys1 nvme1n1 nvme2 9eb72cbeecc6fdb0 Linux 5.3.0-rc loop nvme-subsys2 nvme2n1, nvme2n2 nvme3 9eb72cbeecc6fdb0 Linux 5.3.0-rc loop nvme-subsys2 nvme2n1, nvme2n2 NVM Express Namespaces Device NSID Usage Format Controllers ------------ -------- -------------------------- ---------------- ---------------- nvme0n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme0 nvme1n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme1 nvme2n1 1 3.20 TB / 3.20 TB 512 B + 0 B nvme2, nvme3 nvme2n2 2 3.20 TB / 3.20 TB 512 B + 0 B nvme2, nvme3 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] nvme-core: Fix subsystem instance mismatches 2019-09-04 17:14 ` Keith Busch @ 2019-09-04 17:29 ` Logan Gunthorpe 0 siblings, 0 replies; 15+ messages in thread From: Logan Gunthorpe @ 2019-09-04 17:29 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig On 2019-09-04 11:14 a.m., Keith Busch wrote: > On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote: >> Oh, yes that's simpler than the struct/kref method and looks like it >> will accomplish the same thing. I did some brief testing with it and it >> seems to work for me (though I don't have any subsystems with multiple >> controllers). If you want to make a patch out of it you can add my >> >> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> > > Thanks! I'll make it a proper patch and send shortly. > > For testing multi-controller subsystems, I haven't got proper hardware > either, so I really like the nvme loop target. Here's a very simple json > defining a two namespace subsystem backed by two real nvme devices: Cool right, thanks for the tip, I should have thought of that. I just did some more loop testing with your patch and it behaves roughly as we expect. The controller and subsystem IDs never overlap unless they are created at the same time and it doesn't look like any IDs are ever leaked. With simple non-CMIC devices the ctrl and subsystem always have the same instance number. Logan _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2019-09-04 17:29 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-31 0:01 [PATCH] nvme-core: Fix subsystem instance mismatches Logan Gunthorpe 2019-08-31 15:29 ` Keith Busch 2019-09-03 16:08 ` Logan Gunthorpe 2019-09-03 16:46 ` Keith Busch 2019-09-03 18:13 ` Logan Gunthorpe 2019-09-04 6:05 ` Christoph Hellwig 2019-09-04 14:44 ` Keith Busch 2019-09-04 15:42 ` Christoph Hellwig 2019-09-04 15:54 ` Keith Busch 2019-09-04 16:02 ` Christoph Hellwig 2019-09-04 16:07 ` Logan Gunthorpe 2019-09-04 16:35 ` Keith Busch 2019-09-04 17:01 ` Logan Gunthorpe 2019-09-04 17:14 ` Keith Busch 2019-09-04 17:29 ` Logan Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).