All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device
@ 2022-05-27 15:49 wenxiong
  2022-05-27 20:52 ` Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: wenxiong @ 2022-05-27 15:49 UTC (permalink / raw)
  To: hch; +Cc: kbusch, linux-nvme, hare, wenxiong, Wen Xiong

From: Wen Xiong <wenxiong@linux.ibm.com>

If we have an active partition/namespace on nvme device, nvme device driver
won’t release these controller IDs/Namespaces IDs when dlpar remove.
So we got the wrong nvme devices names if nvme device driver still use old
subsystem id as controller id when dlpar add it back. We expect to see new
nvme devices with new controller IDs/Namespace IDs.

For example, we have a nvme U2 device with 4 namespaces in linux,

After system boots up,
Controller id:  0
Subsystem id:   0
namespaces ids: 1, 2, 3 and 4
nvme devices  : nvme0n1, nvme0n2, nvme0n3 and nvme0n4

Dlpar remove with active partitions on nvme0n1,
Controller id :   ID=0 still hold reference, ID=0 won’t be removed,  but /dev/nvme0 not exists anymore
Subsystem id :    0(nvme-subsys0 still exist in nvme list-subsys)
Namespace ids:    ID=1 still hold reference, ID=1 won’t be removed, but /dev/nvme0n1 not exists anymore
Nvme devices:  all nvme devices are gone in /dev

Dlpar Add back
Controller id:  1
Subsystem id :  0 still use old controller id=0
namespaces ids: 2, 3, 4 and 5
nvme devices:   We saw "nvme0n2, nvme0n3, nvme0n4, nvme0n5" in /dev
                /dev/nvme0 not exist anymore,
                We expect to see: nvme1n2, nvme1n3, nvme1n4 and nvme1n5

The patch fixes the issue.

Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
---
 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d464fdf978fb..945974ceef3c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -495,7 +495,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	head->disk->fops = &nvme_ns_head_ops;
 	head->disk->private_data = head;
 	sprintf(head->disk->disk_name, "nvme%dn%d",
-			ctrl->subsys->instance, head->instance);
+			ctrl->instance, head->instance);
 
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, head->disk->queue);
 	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device
  2022-05-27 15:49 [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device wenxiong
@ 2022-05-27 20:52 ` Keith Busch
  2022-05-28  5:17 ` Christoph Hellwig
  2022-05-28 12:16 ` Hannes Reinecke
  2 siblings, 0 replies; 4+ messages in thread
From: Keith Busch @ 2022-05-27 20:52 UTC (permalink / raw)
  To: wenxiong; +Cc: hch, linux-nvme, hare, wenxiong

On Fri, May 27, 2022 at 10:49:37AM -0500, wenxiong@linux.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.ibm.com>
> 
> If we have an active partition/namespace on nvme device, nvme device driver
> won’t release these controller IDs/Namespaces IDs when dlpar remove.
> So we got the wrong nvme devices names if nvme device driver still use old
> subsystem id as controller id when dlpar add it back. We expect to see new
> nvme devices with new controller IDs/Namespace IDs.

It was done this way on purpose. You can dynamically detach and attach
namespaces to any controller in the subsystem, so naming the namespace after
the controller instance doesn't help.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device
  2022-05-27 15:49 [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device wenxiong
  2022-05-27 20:52 ` Keith Busch
@ 2022-05-28  5:17 ` Christoph Hellwig
  2022-05-28 12:16 ` Hannes Reinecke
  2 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2022-05-28  5:17 UTC (permalink / raw)
  To: wenxiong; +Cc: hch, kbusch, linux-nvme, hare, wenxiong

On Fri, May 27, 2022 at 10:49:37AM -0500, wenxiong@linux.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.ibm.com>
> 
> If we have an active partition/namespace on nvme device, nvme device driver
> won’t release these controller IDs/Namespaces IDs when dlpar remove.

What is a dlpar?

> So we got the wrong nvme devices names if nvme device driver still use old
> subsystem id as controller id when dlpar add it back. We expect to see new
> nvme devices with new controller IDs/Namespace IDs.

What is wrong?

> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index d464fdf978fb..945974ceef3c 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -495,7 +495,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>  	head->disk->fops = &nvme_ns_head_ops;
>  	head->disk->private_data = head;
>  	sprintf(head->disk->disk_name, "nvme%dn%d",
> -			ctrl->subsys->instance, head->instance);
> +			ctrl->instance, head->instance);

No, this is broken.  The name needs to refer to the subsystem instance
as otherwise we will run into naming conflicts.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device
  2022-05-27 15:49 [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device wenxiong
  2022-05-27 20:52 ` Keith Busch
  2022-05-28  5:17 ` Christoph Hellwig
@ 2022-05-28 12:16 ` Hannes Reinecke
  2 siblings, 0 replies; 4+ messages in thread
From: Hannes Reinecke @ 2022-05-28 12:16 UTC (permalink / raw)
  To: wenxiong, hch; +Cc: kbusch, linux-nvme, wenxiong

On 5/27/22 17:49, wenxiong@linux.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.ibm.com>
> 
> If we have an active partition/namespace on nvme device, nvme device driver
> won’t release these controller IDs/Namespaces IDs when dlpar remove.
> So we got the wrong nvme devices names if nvme device driver still use old
> subsystem id as controller id when dlpar add it back. We expect to see new
> nvme devices with new controller IDs/Namespace IDs.
> 
> For example, we have a nvme U2 device with 4 namespaces in linux,
> 
> After system boots up,
> Controller id:  0
> Subsystem id:   0
> namespaces ids: 1, 2, 3 and 4
> nvme devices  : nvme0n1, nvme0n2, nvme0n3 and nvme0n4
> 
> Dlpar remove with active partitions on nvme0n1,
> Controller id :   ID=0 still hold reference, ID=0 won’t be removed,  but /dev/nvme0 not exists anymore
> Subsystem id :    0(nvme-subsys0 still exist in nvme list-subsys)
> Namespace ids:    ID=1 still hold reference, ID=1 won’t be removed, but /dev/nvme0n1 not exists anymore
> Nvme devices:  all nvme devices are gone in /dev
> 
> Dlpar Add back
> Controller id:  1
> Subsystem id :  0 still use old controller id=0
> namespaces ids: 2, 3, 4 and 5
> nvme devices:   We saw "nvme0n2, nvme0n3, nvme0n4, nvme0n5" in /dev
>                  /dev/nvme0 not exist anymore,
>                  We expect to see: nvme1n2, nvme1n3, nvme1n4 and nvme1n5
> 
> The patch fixes the issue.
> 
> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
> ---
>   drivers/nvme/host/multipath.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index d464fdf978fb..945974ceef3c 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -495,7 +495,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>   	head->disk->fops = &nvme_ns_head_ops;
>   	head->disk->private_data = head;
>   	sprintf(head->disk->disk_name, "nvme%dn%d",
> -			ctrl->subsys->instance, head->instance);
> +			ctrl->instance, head->instance);
>   
>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, head->disk->queue);
>   	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);

No, this is wrong.
The nvme subsystem is an internal construct within the linux subsystem, 
and the lifetime of that is determined by the lifetime of the associated 
controllers and namespaces.
So as long as a controller object (!) or namespace object exists which 
refer to the subsystem NQN, that subsystem object will exist.
When a controller is removed/detached by the dlpar the device nodes will 
be removed from the system, and the controller object itself will be 
freed/removed after the last reference drops. And only then will the 
reference to the subsystem be dropped, causing the subsystem object to 
be removed.
So if a new controller referring to the same subsystem NQN is connected 
while the subsystem object is still present it will naturally select 
that subsystem.

Everything is working as designed.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-28 12:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-27 15:49 [PATCH 1/1] nvme-multipath: show wrong nvme ns after dlpar a nvme device wenxiong
2022-05-27 20:52 ` Keith Busch
2022-05-28  5:17 ` Christoph Hellwig
2022-05-28 12:16 ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.