Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] nvme-core: Fix subsystem instance mismatches
@ 2019-08-31  0:01 Logan Gunthorpe
  2019-08-31 15:29 ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Logan Gunthorpe @ 2019-08-31  0:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, Christoph Hellwig
  Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen, Jens Axboe,
	Keith Busch, Logan Gunthorpe

With multipath enabled (which is now default in many distros), nvme
controllers and their respective namespaces can be numbered
differently. For example: nvme0n1 might actually belong to controller
nvme1, which is super confusing (and may have broken any scripts that
rely on the numbering relation). To make matters worse, the mismatches
can sometimes change from boot to boot so anyone dealing with the
nvme control device has to inspect sysfs every boot to ensure they
get the right one.

The reason for this is that the X in nvmeXn1 is the subsystem's instance
number (when multipath is enabled) which is distinct from the
controller's instance and the subsystem instance is assigned from a
separate IDA after the controller gets identified and this can race
a bit when multiple controllers are being setup.

To fix this, assign the subsystem's instance based on the instance
number of the controller's instance that first created it. There should
always be fewer subsystems than controllers so the should not be a need
to create extra subsystems that overlap existing controllers.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/host/core.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d3d6b7bd6903..ca201b71ab49 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
 struct workqueue_struct *nvme_delete_wq;
 EXPORT_SYMBOL_GPL(nvme_delete_wq);
 
-static DEFINE_IDA(nvme_subsystems_ida);
 static LIST_HEAD(nvme_subsystems);
 static DEFINE_MUTEX(nvme_subsystems_lock);
 
@@ -2332,7 +2331,6 @@ static void nvme_release_subsystem(struct device *dev)
 	struct nvme_subsystem *subsys =
 		container_of(dev, struct nvme_subsystem, dev);
 
-	ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
 	kfree(subsys);
 }
 
@@ -2461,12 +2459,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
 	if (!subsys)
 		return -ENOMEM;
-	ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
-	if (ret < 0) {
-		kfree(subsys);
-		return ret;
-	}
-	subsys->instance = ret;
+	subsys->instance = ctrl->instance;
 	mutex_init(&subsys->lock);
 	kref_init(&subsys->ref);
 	INIT_LIST_HEAD(&subsys->ctrls);
@@ -4074,7 +4067,6 @@ static int __init nvme_core_init(void)
 
 static void __exit nvme_core_exit(void)
 {
-	ida_destroy(&nvme_subsystems_ida);
 	class_destroy(nvme_subsys_class);
 	class_destroy(nvme_class);
 	unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
-- 
2.20.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-08-31  0:01 [PATCH] nvme-core: Fix subsystem instance mismatches Logan Gunthorpe
@ 2019-08-31 15:29 ` Keith Busch
  2019-09-03 16:08   ` Logan Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2019-08-31 15:29 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Keith Busch,
	Christoph Hellwig

On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
> To fix this, assign the subsystem's instance based on the instance
> number of the controller's instance that first created it. There should
> always be fewer subsystems than controllers so the should not be a need
> to create extra subsystems that overlap existing controllers.

The subsystem's lifetime is not tied to the controller's. When the
controller is removed and releases its instance, the next controller
to take that available instance will create naming collisions with the
subsystem still using it.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-08-31 15:29 ` Keith Busch
@ 2019-09-03 16:08   ` Logan Gunthorpe
  2019-09-03 16:46     ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Logan Gunthorpe @ 2019-09-03 16:08 UTC (permalink / raw)
  To: Keith Busch
  Cc: Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Keith Busch,
	Christoph Hellwig



On 2019-08-31 9:29 a.m., Keith Busch wrote:
> On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
>> To fix this, assign the subsystem's instance based on the instance
>> number of the controller's instance that first created it. There should
>> always be fewer subsystems than controllers so the should not be a need
>> to create extra subsystems that overlap existing controllers.
> 
> The subsystem's lifetime is not tied to the controller's. When the
> controller is removed and releases its instance, the next controller
> to take that available instance will create naming collisions with the
> subsystem still using it.
> 

Hmm, yes, ok.

So perhaps we can just make the subsystem prefer the ctrl's instance
when allocating the ID? Then at least, in the common case, the
controller numbers will match the subsystem numbers. Only when there's
random hot-plugs would the numbers get out of sync.

Logan

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-03 16:08   ` Logan Gunthorpe
@ 2019-09-03 16:46     ` Keith Busch
  2019-09-03 18:13       ` Logan Gunthorpe
  2019-09-04  6:05       ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2019-09-03 16:46 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Keith Busch, Christoph Hellwig

On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote:
> On 2019-08-31 9:29 a.m., Keith Busch wrote:
> > On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
> >> To fix this, assign the subsystem's instance based on the instance
> >> number of the controller's instance that first created it. There should
> >> always be fewer subsystems than controllers so the should not be a need
> >> to create extra subsystems that overlap existing controllers.
> > 
> > The subsystem's lifetime is not tied to the controller's. When the
> > controller is removed and releases its instance, the next controller
> > to take that available instance will create naming collisions with the
> > subsystem still using it.
> > 
> 
> Hmm, yes, ok.
> 
> So perhaps we can just make the subsystem prefer the ctrl's instance
> when allocating the ID? Then at least, in the common case, the
> controller numbers will match the subsystem numbers. Only when there's
> random hot-plugs would the numbers get out of sync.

I really don't know about a patch that works only on static
configurations. Connects and disconnects do happen on live systems,
so the numerals will inevitably get out of sync.

Could we possibly make /dev/nvmeX be a subsystem handle without causing
trouble for anyone? This would essentially be the same thing as today
for non-CMIC controllers with a device-per-controller and only affects
the CMIC ones.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-03 16:46     ` Keith Busch
@ 2019-09-03 18:13       ` Logan Gunthorpe
  2019-09-04  6:05       ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Logan Gunthorpe @ 2019-09-03 18:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Keith Busch, Christoph Hellwig



On 2019-09-03 10:46 a.m., Keith Busch wrote:
> On Tue, Sep 03, 2019 at 10:08:01AM -0600, Logan Gunthorpe wrote:
>> On 2019-08-31 9:29 a.m., Keith Busch wrote:
>>> On Fri, Aug 30, 2019 at 06:01:39PM -0600, Logan Gunthorpe wrote:
>>>> To fix this, assign the subsystem's instance based on the instance
>>>> number of the controller's instance that first created it. There should
>>>> always be fewer subsystems than controllers so the should not be a need
>>>> to create extra subsystems that overlap existing controllers.
>>>
>>> The subsystem's lifetime is not tied to the controller's. When the
>>> controller is removed and releases its instance, the next controller
>>> to take that available instance will create naming collisions with the
>>> subsystem still using it.
>>>
>>
>> Hmm, yes, ok.
>>
>> So perhaps we can just make the subsystem prefer the ctrl's instance
>> when allocating the ID? Then at least, in the common case, the
>> controller numbers will match the subsystem numbers. Only when there's
>> random hot-plugs would the numbers get out of sync.
> 
> I really don't know about a patch that works only on static
> configurations. Connects and disconnects do happen on live systems,
> so the numerals will inevitably get out of sync.

Well this depends on how big a problem we think the number mismatch is.
Right now it's pretty annoying because numbers aren't matching for
non-CMIC controllers in simple setups on boot. I think having a small
patch that makes it more consistent for the static would be worth it and
if CMIC controllers with significant hot-plug events have mismatches
that seems more understandable to me.

> Could we possibly make /dev/nvmeX be a subsystem handle without causing
> trouble for anyone? This would essentially be the same thing as today
> for non-CMIC controllers with a device-per-controller and only affects
> the CMIC ones.

Well then we'd have to be able to do everything that's possible with a
controller via the subsystem and it would have to multiplex all admin
commands for CMIC ones, etc to a sensible controller. This might make
sense in the long term but it sounds like a larger project than I have
time to take on.

Logan

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-03 16:46     ` Keith Busch
  2019-09-03 18:13       ` Logan Gunthorpe
@ 2019-09-04  6:05       ` Christoph Hellwig
  2019-09-04 14:44         ` Keith Busch
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-09-04  6:05 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Keith Busch, Logan Gunthorpe,
	Christoph Hellwig

On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote:
> Could we possibly make /dev/nvmeX be a subsystem handle without causing
> trouble for anyone? This would essentially be the same thing as today
> for non-CMIC controllers with a device-per-controller and only affects
> the CMIC ones.

A per-subsyste character device doesn't make sense, as a lot of admin
command require a specific controller.

If this really is an isue for people we'll just need to refcount the
handle allocation.  That is:

 - nvme_init_ctrl allocates a new nvme_instance or so object, which
   does the ida_simple_get.
 - we allocate a new subsystem that reuses the handle and grabs
   a reference in nvme_init_subsystem, then if we find an existing
   subsystem we drop that reference again.
 - last free of a ctrl or subsystem also drops a reference, with
   the final free releasing the ida

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04  6:05       ` Christoph Hellwig
@ 2019-09-04 14:44         ` Keith Busch
  2019-09-04 15:42           ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2019-09-04 14:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe

On Wed, Sep 04, 2019 at 08:05:58AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 10:46:20AM -0600, Keith Busch wrote:
> > Could we possibly make /dev/nvmeX be a subsystem handle without causing
> > trouble for anyone? This would essentially be the same thing as today
> > for non-CMIC controllers with a device-per-controller and only affects
> > the CMIC ones.
> 
> A per-subsyste character device doesn't make sense, as a lot of admin
> command require a specific controller.

Yeah, I was hoping to provide something special for CMIC controllers
so you can do path specific admin, but that looks sure to break user
space.

> If this really is an isue for people we'll just need to refcount the
> handle allocation.  That is:
> 
>  - nvme_init_ctrl allocates a new nvme_instance or so object, which
>    does the ida_simple_get.
>  - we allocate a new subsystem that reuses the handle and grabs
>    a reference in nvme_init_subsystem, then if we find an existing
>    subsystem we drop that reference again.
>  - last free of a ctrl or subsystem also drops a reference, with
>    the final free releasing the ida

Let me step through an example:

  Ctrl A gets instance 0.

  Its subsystem gets the same instance, and takes ref count on it:
  all namespaces in this subsystem will use '0'.

  Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
  no new subsytem is allocated.

  Ctrl A is disconnected, dropping its ref on instance 0, but the
  subsystem still has its refcount, making it unavailable.

  Ctrl A is reconnected, and allocates instance 2 because 0 is still in
  use.

Now all the namespaces in this subsystem are prefixed with nvme0, but no
controller exists with the same prefix. We still have inevitable naming
mismatch, right?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 14:44         ` Keith Busch
@ 2019-09-04 15:42           ` Christoph Hellwig
  2019-09-04 15:54             ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-09-04 15:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe,
	Christoph Hellwig

On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
> Let me step through an example:
> 
>   Ctrl A gets instance 0.
> 
>   Its subsystem gets the same instance, and takes ref count on it:
>   all namespaces in this subsystem will use '0'.
> 
>   Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
>   no new subsytem is allocated.
> 
>   Ctrl A is disconnected, dropping its ref on instance 0, but the
>   subsystem still has its refcount, making it unavailable.
> 
>   Ctrl A is reconnected, and allocates instance 2 because 0 is still in
>   use.
> 
> Now all the namespaces in this subsystem are prefixed with nvme0, but no
> controller exists with the same prefix. We still have inevitable naming
> mismatch, right?

I think th major confusion was that we can use the same handle for
and unrelated subsystem vs controller, and that would avoid it.

I don't see how we can avoid the controller is entirely different
from namespace problem ever.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 15:42           ` Christoph Hellwig
@ 2019-09-04 15:54             ` Keith Busch
  2019-09-04 16:02               ` Christoph Hellwig
  2019-09-04 16:07               ` Logan Gunthorpe
  0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2019-09-04 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe

On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
> > Let me step through an example:
> > 
> >   Ctrl A gets instance 0.
> > 
> >   Its subsystem gets the same instance, and takes ref count on it:
> >   all namespaces in this subsystem will use '0'.
> > 
> >   Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
> >   no new subsytem is allocated.
> > 
> >   Ctrl A is disconnected, dropping its ref on instance 0, but the
> >   subsystem still has its refcount, making it unavailable.
> > 
> >   Ctrl A is reconnected, and allocates instance 2 because 0 is still in
> >   use.
> > 
> > Now all the namespaces in this subsystem are prefixed with nvme0, but no
> > controller exists with the same prefix. We still have inevitable naming
> > mismatch, right?
> 
> I think th major confusion was that we can use the same handle for
> and unrelated subsystem vs controller, and that would avoid it.
>
> I don't see how we can avoid the controller is entirely different
> from namespace problem ever.

Can we just ensure there is never a matching controller then? This
patch will accomplish that and simpler than wrapping the instance in a
refcount'ed object:

http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 15:54             ` Keith Busch
@ 2019-09-04 16:02               ` Christoph Hellwig
  2019-09-04 16:07               ` Logan Gunthorpe
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-09-04 16:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Logan Gunthorpe,
	Christoph Hellwig

On Wed, Sep 04, 2019 at 09:54:45AM -0600, Keith Busch wrote:
> Can we just ensure there is never a matching controller then? This
> patch will accomplish that and simpler than wrapping the instance in a
> refcount'ed object:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

I guess that is fine to.  Btw, what happened to the plan for udev
rules in that thread?


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 15:54             ` Keith Busch
  2019-09-04 16:02               ` Christoph Hellwig
@ 2019-09-04 16:07               ` Logan Gunthorpe
  2019-09-04 16:35                 ` Keith Busch
  1 sibling, 1 reply; 15+ messages in thread
From: Logan Gunthorpe @ 2019-09-04 16:07 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe



On 2019-09-04 9:54 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 05:42:15PM +0200, Christoph Hellwig wrote:
>> On Wed, Sep 04, 2019 at 08:44:27AM -0600, Keith Busch wrote:
>>> Let me step through an example:
>>>
>>>   Ctrl A gets instance 0.
>>>
>>>   Its subsystem gets the same instance, and takes ref count on it:
>>>   all namespaces in this subsystem will use '0'.
>>>
>>>   Ctrl B gets instance 1, and it's in the same subsystem as Ctrl A so
>>>   no new subsytem is allocated.
>>>
>>>   Ctrl A is disconnected, dropping its ref on instance 0, but the
>>>   subsystem still has its refcount, making it unavailable.
>>>
>>>   Ctrl A is reconnected, and allocates instance 2 because 0 is still in
>>>   use.
>>>
>>> Now all the namespaces in this subsystem are prefixed with nvme0, but no
>>> controller exists with the same prefix. We still have inevitable naming
>>> mismatch, right?
>>
>> I think th major confusion was that we can use the same handle for
>> and unrelated subsystem vs controller, and that would avoid it.
>>
>> I don't see how we can avoid the controller is entirely different
>> from namespace problem ever.


Yes, I agree, we can't solve the mismatch problem in the general case:
with sequences of hot plug events there will always be a case that
mismatches. I just think we can do better in the simple common default case.


> Can we just ensure there is never a matching controller then? This
> patch will accomplish that and simpler than wrapping the instance in a
> refcount'ed object:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html

I don't really like that idea. It reduces the confusion caused by
mismatching numbers, but causes the controller to never match the
namespace, which is also confusing but in a different way.

I like the nvme_instance idea. It's not going to be perfect but it has
some nice properties: the subsystem will try to match the controller's
instance whenever possible, but in cases where it doesn't, the instance
number of the subsystem will never be the same as an existing controller.

I'll see if I can work up a quick patch set and see what people think.

Logan

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 16:07               ` Logan Gunthorpe
@ 2019-09-04 16:35                 ` Keith Busch
  2019-09-04 17:01                   ` Logan Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2019-09-04 16:35 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig

On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote:
> Yes, I agree, we can't solve the mismatch problem in the general case:
> with sequences of hot plug events there will always be a case that
> mismatches. I just think we can do better in the simple common default case.

This may be something where udev can help us. I might be able to find
some time to look at that, but not today.
 
> > Can we just ensure there is never a matching controller then? This
> > patch will accomplish that and simpler than wrapping the instance in a
> > refcount'ed object:
> > 
> > http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html
> 
> I don't really like that idea. It reduces the confusion caused by
> mismatching numbers, but causes the controller to never match the
> namespace, which is also confusing but in a different way.
> 
> I like the nvme_instance idea. It's not going to be perfect but it has
> some nice properties: the subsystem will try to match the controller's
> instance whenever possible, but in cases where it doesn't, the instance
> number of the subsystem will never be the same as an existing controller.
> 
> I'll see if I can work up a quick patch set and see what people think.

How about this: we have the subsys copy the controller's instance,
and the nvme_free_ctrl() doesn't release it if its subsys matches?

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 14c0bfb55615..8a8279ece5ee 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
 struct workqueue_struct *nvme_delete_wq;
 EXPORT_SYMBOL_GPL(nvme_delete_wq);
 
-static DEFINE_IDA(nvme_subsystems_ida);
 static LIST_HEAD(nvme_subsystems);
 static DEFINE_MUTEX(nvme_subsystems_lock);
 
@@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev)
 	struct nvme_subsystem *subsys =
 		container_of(dev, struct nvme_subsystem, dev);
 
-	ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
+	if (subsys->instance >= 0)
+		ida_simple_remove(&nvme_instance_ida, subsys->instance);
 	kfree(subsys);
 }
 
@@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
 	if (!subsys)
 		return -ENOMEM;
-	ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
-	if (ret < 0) {
-		kfree(subsys);
-		return ret;
-	}
-	subsys->instance = ret;
+
+	subsys->instance = -1;
 	mutex_init(&subsys->lock);
 	kref_init(&subsys->ref);
 	INIT_LIST_HEAD(&subsys->ctrls);
@@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	subsys->dev.class = nvme_subsys_class;
 	subsys->dev.release = nvme_release_subsystem;
 	subsys->dev.groups = nvme_subsys_attrs_groups;
-	dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance);
+	dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance);
 	device_initialize(&subsys->dev);
 
 	mutex_lock(&nvme_subsystems_lock);
@@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		goto out_put_subsystem;
 	}
 
+	if (!found)
+		subsys->instance = ctrl->instance;
 	ctrl->subsys = subsys;
 	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
 	mutex_unlock(&nvme_subsystems_lock);
@@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev)
 		container_of(dev, struct nvme_ctrl, ctrl_device);
 	struct nvme_subsystem *subsys = ctrl->subsys;
 
-	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
+	if (subsys && ctrl->instance != subsys->instance)
+		ida_simple_remove(&nvme_instance_ida, ctrl->instance);
+
 	kfree(ctrl->effects);
 	nvme_mpath_uninit(ctrl);
 	__free_page(ctrl->discard_page);
@@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void)
 
 static void __exit nvme_core_exit(void)
 {
-	ida_destroy(&nvme_subsystems_ida);
 	class_destroy(nvme_subsys_class);
 	class_destroy(nvme_class);
 	unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
--

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 16:35                 ` Keith Busch
@ 2019-09-04 17:01                   ` Logan Gunthorpe
  2019-09-04 17:14                     ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Logan Gunthorpe @ 2019-09-04 17:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig



On 2019-09-04 10:35 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 10:07:12AM -0600, Logan Gunthorpe wrote:
>> Yes, I agree, we can't solve the mismatch problem in the general case:
>> with sequences of hot plug events there will always be a case that
>> mismatches. I just think we can do better in the simple common default case.
> 
> This may be something where udev can help us. I might be able to find
> some time to look at that, but not today.
>  
>>> Can we just ensure there is never a matching controller then? This
>>> patch will accomplish that and simpler than wrapping the instance in a
>>> refcount'ed object:
>>>
>>> http://lists.infradead.org/pipermail/linux-nvme/2019-May/024142.html
>>
>> I don't really like that idea. It reduces the confusion caused by
>> mismatching numbers, but causes the controller to never match the
>> namespace, which is also confusing but in a different way.
>>
>> I like the nvme_instance idea. It's not going to be perfect but it has
>> some nice properties: the subsystem will try to match the controller's
>> instance whenever possible, but in cases where it doesn't, the instance
>> number of the subsystem will never be the same as an existing controller.
>>
>> I'll see if I can work up a quick patch set and see what people think.
> 
> How about this: we have the subsys copy the controller's instance,
> and the nvme_free_ctrl() doesn't release it if its subsys matches?

Oh, yes that's simpler than the struct/kref method and looks like it
will accomplish the same thing. I did some brief testing with it and it
seems to work for me (though I don't have any subsystems with multiple
controllers). If you want to make a patch out of it you can add my

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Thanks!

Logan

> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 14c0bfb55615..8a8279ece5ee 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -81,7 +81,6 @@ EXPORT_SYMBOL_GPL(nvme_reset_wq);
>  struct workqueue_struct *nvme_delete_wq;
>  EXPORT_SYMBOL_GPL(nvme_delete_wq);
>  
> -static DEFINE_IDA(nvme_subsystems_ida);
>  static LIST_HEAD(nvme_subsystems);
>  static DEFINE_MUTEX(nvme_subsystems_lock);
>  
> @@ -2344,7 +2343,8 @@ static void nvme_release_subsystem(struct device *dev)
>  	struct nvme_subsystem *subsys =
>  		container_of(dev, struct nvme_subsystem, dev);
>  
> -	ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
> +	if (subsys->instance >= 0)
> +		ida_simple_remove(&nvme_instance_ida, subsys->instance);
>  	kfree(subsys);
>  }
>  
> @@ -2473,12 +2473,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>  	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
>  	if (!subsys)
>  		return -ENOMEM;
> -	ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
> -	if (ret < 0) {
> -		kfree(subsys);
> -		return ret;
> -	}
> -	subsys->instance = ret;
> +
> +	subsys->instance = -1;
>  	mutex_init(&subsys->lock);
>  	kref_init(&subsys->ref);
>  	INIT_LIST_HEAD(&subsys->ctrls);
> @@ -2497,7 +2493,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>  	subsys->dev.class = nvme_subsys_class;
>  	subsys->dev.release = nvme_release_subsystem;
>  	subsys->dev.groups = nvme_subsys_attrs_groups;
> -	dev_set_name(&subsys->dev, "nvme-subsys%d", subsys->instance);
> +	dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance);
>  	device_initialize(&subsys->dev);
>  
>  	mutex_lock(&nvme_subsystems_lock);
> @@ -2528,6 +2524,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>  		goto out_put_subsystem;
>  	}
>  
> +	if (!found)
> +		subsys->instance = ctrl->instance;
>  	ctrl->subsys = subsys;
>  	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
>  	mutex_unlock(&nvme_subsystems_lock);
> @@ -3803,7 +3801,9 @@ static void nvme_free_ctrl(struct device *dev)
>  		container_of(dev, struct nvme_ctrl, ctrl_device);
>  	struct nvme_subsystem *subsys = ctrl->subsys;
>  
> -	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
> +	if (subsys && ctrl->instance != subsys->instance)
> +		ida_simple_remove(&nvme_instance_ida, ctrl->instance);
> +
>  	kfree(ctrl->effects);
>  	nvme_mpath_uninit(ctrl);
>  	__free_page(ctrl->discard_page);
> @@ -4085,7 +4085,6 @@ static int __init nvme_core_init(void)
>  
>  static void __exit nvme_core_exit(void)
>  {
> -	ida_destroy(&nvme_subsystems_ida);
>  	class_destroy(nvme_subsys_class);
>  	class_destroy(nvme_class);
>  	unregister_chrdev_region(nvme_chr_devt, NVME_MINORS);
> --
> 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 17:01                   ` Logan Gunthorpe
@ 2019-09-04 17:14                     ` Keith Busch
  2019-09-04 17:29                       ` Logan Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2019-09-04 17:14 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig

On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote:
> Oh, yes that's simpler than the struct/kref method and looks like it
> will accomplish the same thing. I did some brief testing with it and it
> seems to work for me (though I don't have any subsystems with multiple
> controllers). If you want to make a patch out of it you can add my
>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Thanks! I'll make it a proper patch and send shortly.

For testing multi-controller subsystems, I haven't got proper hardware
either, so I really like the nvme loop target. Here's a very simple json
defining a two namespace subsystem backed by two real nvme devices:

loop.json:
---
{
  "ports": [
    {
      "addr": {
        "adrfam": "",
        "traddr": "",
        "treq": "not specified",
        "trsvcid": "",
        "trtype": "loop"
      },
      "portid": 1,
      "referrals": [],
      "subsystems": [
        "testnqn"
      ]
    }
  ],
  "subsystems": [
    {
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "ef90689c-6c46-d44c-89c1-4067801309a8",
            "path": "/dev/nvme0n1"
          },
          "enable": 1,
          "nsid": 1
        },
        {
          "device": {
            "nguid": "ef90689c-6c46-d44c-89c1-4067801309a9",
            "path": "/dev/nvme1n1"
          },
          "enable": 1,
          "nsid": 2
        }
      ],
      "nqn": "testnqn"
    }
  ]
}
--

Configure the target:

  # nvmetcli restore loop.json

Connect to it twice:

  # nvme connect -n testnqn -t loop
  # nvme connect -n testnqn -t loop

List the result:

  # nvme list -v
  NVM Express Subsystems

  Subsystem        Subsystem-NQN                                                                                    Controllers
  ---------------- ------------------------------------------------------------------------------------------------ ----------------
  nvme-subsys0     nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-17335943:ICDPC5ED2ORA6.4T                  nvme0
  nvme-subsys1     nqn.2014.08.org.nvmexpress:8086108ePHLE7200015N6P4BGN-27335943:ICDPC5ED2ORA6.4T                  nvme1
  nvme-subsys2     testnqn                                                                                          nvme2, nvme3

  NVM Express Controllers

  Device   SN                   MN                                       FR       TxPort Address        Subsystem    Namespaces
  -------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
  nvme0    PHLE7200015N6P4BGN-1 7335943:ICDPC5ED2ORA6.4T                 QDV1RD07 pcie   0000:88:00.0   nvme-subsys0 nvme0n1
  nvme1    PHLE7200015N6P4BGN-2 7335943:ICDPC5ED2ORA6.4T                 QDV1RD03 pcie   0000:89:00.0   nvme-subsys1 nvme1n1
  nvme2    9eb72cbeecc6fdb0     Linux                                    5.3.0-rc loop                  nvme-subsys2 nvme2n1, nvme2n2
  nvme3    9eb72cbeecc6fdb0     Linux                                    5.3.0-rc loop                  nvme-subsys2 nvme2n1, nvme2n2

  NVM Express Namespaces

  Device       NSID     Usage                      Format           Controllers
  ------------ -------- -------------------------- ---------------- ----------------
  nvme0n1      1          3.20  TB /   3.20  TB    512   B +  0 B   nvme0
  nvme1n1      1          3.20  TB /   3.20  TB    512   B +  0 B   nvme1
  nvme2n1      1          3.20  TB /   3.20  TB    512   B +  0 B   nvme2, nvme3
  nvme2n2      2          3.20  TB /   3.20  TB    512   B +  0 B   nvme2, nvme3



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nvme-core: Fix subsystem instance mismatches
  2019-09-04 17:14                     ` Keith Busch
@ 2019-09-04 17:29                       ` Logan Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Logan Gunthorpe @ 2019-09-04 17:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Hannes Reinecke, Sagi Grimberg, Martin K . Petersen,
	linux-kernel, linux-nvme, Jens Axboe, Christoph Hellwig



On 2019-09-04 11:14 a.m., Keith Busch wrote:
> On Wed, Sep 04, 2019 at 11:01:22AM -0600, Logan Gunthorpe wrote:
>> Oh, yes that's simpler than the struct/kref method and looks like it
>> will accomplish the same thing. I did some brief testing with it and it
>> seems to work for me (though I don't have any subsystems with multiple
>> controllers). If you want to make a patch out of it you can add my
>>
>> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> 
> Thanks! I'll make it a proper patch and send shortly.
> 
> For testing multi-controller subsystems, I haven't got proper hardware
> either, so I really like the nvme loop target. Here's a very simple json
> defining a two namespace subsystem backed by two real nvme devices:

Cool right, thanks for the tip, I should have thought of that. I just
did some more loop testing with your patch and it behaves roughly as we
expect. The controller and subsystem IDs never overlap unless they are
created at the same time and it doesn't look like any IDs are ever
leaked. With simple non-CMIC devices the ctrl and subsystem always have
the same instance number.

Logan

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-31  0:01 [PATCH] nvme-core: Fix subsystem instance mismatches Logan Gunthorpe
2019-08-31 15:29 ` Keith Busch
2019-09-03 16:08   ` Logan Gunthorpe
2019-09-03 16:46     ` Keith Busch
2019-09-03 18:13       ` Logan Gunthorpe
2019-09-04  6:05       ` Christoph Hellwig
2019-09-04 14:44         ` Keith Busch
2019-09-04 15:42           ` Christoph Hellwig
2019-09-04 15:54             ` Keith Busch
2019-09-04 16:02               ` Christoph Hellwig
2019-09-04 16:07               ` Logan Gunthorpe
2019-09-04 16:35                 ` Keith Busch
2019-09-04 17:01                   ` Logan Gunthorpe
2019-09-04 17:14                     ` Keith Busch
2019-09-04 17:29                       ` Logan Gunthorpe

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org linux-nvme@archiver.kernel.org
	public-inbox-index linux-nvme


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/ public-inbox