* Saw I/O errors while delete/create/attach a namespace on nvme device.
@ 2023-11-07  4:10 Wen Xiong
  2023-11-07  4:36 ` Chaitanya Kulkarni
  2023-11-07  8:56 ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07  4:10 UTC (permalink / raw)
  To: linux-nvme; +Cc: Wenxiong

Hi All,

I am working on a new NVMe device and found that the NGUID changes when I
delete, create, and attach a namespace. We then see error messages in the
Linux log.

Are these error messages expected when a namespace is re-created, or is
there an issue in the error path?

# dmesg
[10524.330156] nvme nvme0: rescanning namespaces.
[10524.338810] nvme nvme0: identifiers changed for nsid 1
[10524.343448] block nvme0n1: no usable path - requeuing I/O
[10524.480468] block nvme0n1: no available path - failing I/O
[10524.480497] block nvme0n1: no available path - failing I/O
[10524.480501] Buffer I/O error on dev nvme0n1, logical block 781402912, async page read
[10524.480508] block nvme0n1: no available path - failing I/O
[10524.480510] Buffer I/O error on dev nvme0n1, logical block 781402913, async page read
[10524.480515] block nvme0n1: no available path - failing I/O
[10524.480517] Buffer I/O error on dev nvme0n1, logical block 781402914, async page read
[10524.480524] block nvme0n1: no available path - failing I/O
[10524.480525] Buffer I/O error on dev nvme0n1, logical block 781402915, async page read
[10524.480528] block nvme0n1: no available path - failing I/O
[10524.480529] Buffer I/O error on dev nvme0n1, logical block 781402916, async page read
[10524.480531] block nvme0n1: no available path - failing I/O
[10524.480535] Buffer I/O error on dev nvme0n1, logical block 781402917, async page read
[10524.480539] block nvme0n1: no available path - failing I/O
[10524.480541] Buffer I/O error on dev nvme0n1, logical block 781402918, async page read
[10524.480546] block nvme0n1: no available path - failing I/O
[10524.480548] Buffer I/O error on dev nvme0n1, logical block 781402919, async page read

Thanks for your help
Wendy




* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  4:10 Saw I/O errors while delete/create/attach a namespace on nvme device Wen Xiong
@ 2023-11-07  4:36 ` Chaitanya Kulkarni
  2023-11-07  8:56 ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07  4:36 UTC (permalink / raw)
  To: Wen Xiong; +Cc: linux-nvme

On 11/6/23 20:10, Wen Xiong wrote:
> Hi All,
>
> I am working on new nvme device and found this: nguid is changed while 
> delete/create/attach a namespace, we saw some error messages in linux log.


Please provide the detailed steps, from the start, that led to this
behavior. Also please provide the git repo, branch, and branch head that
you are using.

-ck




* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  4:10 Saw I/O errors while delete/create/attach a namespace on nvme device Wen Xiong
  2023-11-07  4:36 ` Chaitanya Kulkarni
@ 2023-11-07  8:56 ` Christoph Hellwig
  2023-11-07 10:25   ` Chaitanya Kulkarni
  2023-11-07 13:26   ` Wen Xiong
  1 sibling, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-11-07  8:56 UTC (permalink / raw)
  To: Wen Xiong; +Cc: linux-nvme, Wenxiong

On Mon, Nov 06, 2023 at 10:10:33PM -0600, Wen Xiong wrote:
> Hi All,
> 
> I am working on new nvme device and found this: nguid is changed while
> delete/create/attach a namespace, we saw some error messages in linux log.
> 
> Should we see these errors messages since recreating namespaces or are there
> some issues in error path?

What exactly are you doing?  To the Linux host code this looks like the
NGUID changed for an existing namespace.  Are you deleting and
recreating nsid1 rapidly and the controller is assigning a different
nguid?




* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  8:56 ` Christoph Hellwig
@ 2023-11-07 10:25   ` Chaitanya Kulkarni
  2023-11-07 14:31     ` Wen Xiong
  2023-11-07 13:26   ` Wen Xiong
  1 sibling, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07 10:25 UTC (permalink / raw)
  To: Christoph Hellwig, Wen Xiong; +Cc: linux-nvme, Wenxiong

On 11/7/23 00:56, Christoph Hellwig wrote:
> On Mon, Nov 06, 2023 at 10:10:33PM -0600, Wen Xiong wrote:
>> Hi All,
>>
>> I am working on new nvme device and found this: nguid is changed while
>> delete/create/attach a namespace, we saw some error messages in linux log.
>>
>> Should we see these errors messages since recreating namespaces or are there
>> some issues in error path?
> What exactly are you doing?  To the Linux host code this looks like the
> NGUID changed for an existing namespace.  Are you deleting and
> recreating nsid1 rapidly and the controller is assigning a different
> nguid?
>
>

Exactly. "identifiers changed for nsid XXX" comes from nvme_validate_ns(),
and its only caller is nvme_scan_ns(), on the path where the ns is already
present. So the scan found the namespace and nvme_ns_ids_equal() returned
false, hence the 2nd printed message. But as said earlier, we really need
to see the exact steps ...

To check exactly which id is problematic, something like [1] can be
used, totally untested ...

-ck

[1]

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 75a1b58a7a43..84651c922548 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1749,10 +1749,23 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)

  static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
  {
-       return uuid_equal(&a->uuid, &b->uuid) &&
-               memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 &&
-               memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0 &&
-               a->csi == b->csi;
+       if (uuid_equal(&a->uuid, &b->uuid) == false) {
+               pr_info("uuid mismatch\n");
+               return false;
+       }
+       if (memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) != 0) {
+               pr_info("nguid mismatch\n");
+               return false;
+       }
+       if (memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) != 0) {
+               pr_info("eui64 mismatch\n");
+               return false;
+       }
+       if (a->csi != b->csi) {
+               pr_info("csi mismatch\n");
+               return false;
+       }
+       return true;
  }

  static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)




* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  8:56 ` Christoph Hellwig
  2023-11-07 10:25   ` Chaitanya Kulkarni
@ 2023-11-07 13:26   ` Wen Xiong
  1 sibling, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 13:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Wenxiong


> What exactly are you doing?  To the Linux host code this looks like the
> NGUID changed for an existing namespace.  Are you deleting and
> recreating nsid1 rapidly and the controller is assigning a different
> nguid?

Hi Christoph,

Good morning!

Yes, the controller assigned a different NGUID to the namespace.

Steps:
Executed nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
Executed nvme delete-ns /dev/nvme0 --namespace-id=1
Executed nvme create-ns /dev/nvme0 --nsze=1562805846 --ncap=1562805846 --flbas=0 -dps=0 -nmic=1
Executed nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81

The NGUID changed, and the I/O errors appeared when executing the attach-ns command.

Thanks a lot!
Wendy



* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 10:25   ` Chaitanya Kulkarni
@ 2023-11-07 14:31     ` Wen Xiong
  2023-11-07 15:18       ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 14:31 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: Christoph Hellwig, linux-nvme, Wenxiong


> To check exactly which id is problematic something like in [1] can be
> used, totally untested ...
> 
Steps:
# nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
# nvme delete-ns /dev/nvme0 --namespace-id=1
# nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0 -dps=0 -nmic=1
# nvme attach-ns /dev/nvme0 -n 1  --controller=0x81

Below is the Linux log with your patch:

[  149.570987] nvme nvme0: rescanning namespaces.
[  149.578714] nguid mismatch
[  149.578719] nvme nvme0: identifiers changed for nsid 1
[  149.582291] block nvme0n1: no usable path - requeuing I/O
[  149.722140] block nvme0n1: no available path - failing I/O
[  149.722157] block nvme0n1: no available path - failing I/O
[  149.722165] Buffer I/O error on dev nvme0n1, logical block 281402912, async page read
[  149.722171] block nvme0n1: no available path - failing I/O
[  149.722175] Buffer I/O error on dev nvme0n1, logical block 281402913, async page read
[  149.722181] block nvme0n1: no available path - failing I/O
[  149.722185] Buffer I/O error on dev nvme0n1, logical block 281402914, async page read
[  149.722191] block nvme0n1: no available path - failing I/O
[  149.722195] Buffer I/O error on dev nvme0n1, logical block 281402915, async page read
[  149.722203] block nvme0n1: no available path - failing I/O
[  149.722208] Buffer I/O error on dev nvme0n1, logical block 281402916, async page read
[  149.722217] block nvme0n1: no available path - failing I/O
[  149.722226] Buffer I/O error on dev nvme0n1, logical block 281402917, async page read
[  149.722231] block nvme0n1: no available path - failing I/O
[  149.722233] Buffer I/O error on dev nvme0n1, logical block 281402918, async page read
[  149.722237] block nvme0n1: no available path - failing I/O
[  149.722239] Buffer I/O error on dev nvme0n1, logical block 281402919, async page read
[root@ltcrain119-lp4 ~]#

Below is the nguid change:
# nvme id-ns /dev/nvme0n1|grep nguid
nguid   : 37444630577000630025384700000245

# nvme id-ns /dev/nvme0n1|grep nguid
nguid   : 37444630577000630025384700000246
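
The two values above differ only in the last byte (0x45 vs 0x46). As a
rough sketch, the comparison can be scripted by pulling the nguid field
out of saved `nvme id-ns` output. The helper names nguid_of and
nguid_changed are made up for illustration, and the field layout is
assumed to match the output shown above:

```sh
#!/bin/sh
# Print the NGUID found in `nvme id-ns` text read on stdin.
# Assumes a line of the form "nguid   : <hex digits>" as shown above.
nguid_of() {
    sed -n 's/^nguid[[:space:]]*:[[:space:]]*\([0-9A-Fa-f]*\).*$/\1/p'
}

# Return 0 (true) if the two NGUIDs differ, 1 if they are identical.
nguid_changed() {
    [ "$1" != "$2" ]
}

# Example using the two captures from this thread.
before=$(printf 'nguid   : 37444630577000630025384700000245\n' | nguid_of)
after=$(printf 'nguid   : 37444630577000630025384700000246\n' | nguid_of)
if nguid_changed "$before" "$after"; then
    echo "NGUID changed: $before -> $after"
fi
```

On a live system the input would come from something like
`nvme id-ns /dev/nvme0n1 | nguid_of`, captured once before the delete and
once after the attach.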

Thanks a lot!
Wendy



* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 14:31     ` Wen Xiong
@ 2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Keith Busch @ 2023-11-07 15:18 UTC (permalink / raw)
  To: Wen Xiong; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On Tue, Nov 07, 2023 at 08:31:37AM -0600, Wen Xiong wrote:
> 
> > To check exactly which id is problematic something like in [1] can be
> > used, totally untested ...
> > 
> Steps:
> # nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
> # nvme delete-ns /dev/nvme0 --namespace-id=1
> # nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0 -dps=0 -nmic=1
> # nvme attach-ns /dev/nvme0 -n 1  --controller=0x81
> 
> Below is linux log with your patch:
> 
> [  149.570987] nvme nvme0: rescanning namespaces.

Are you running these commands in quick succession? There should be a
"rescanning namespaces" message right after the 'detach-ns' command, and
before subsequent 'attach-ns' command. It looks here that the rescan
didn't run until after the 'attach-ns' occurred. Instead of tearing down
the original, the driver just sees the namespace it previously knew
about has changed unexpectedly; the processing for the namespace removal
didn't happen prior to the attach-ns command.
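
One way to spot this situation from user space is to look at which
messages the driver logged and in what order. The following is only a
sketch: changed_nsids and rescan_count are hypothetical helper names, and
the patterns are taken from the log lines quoted in this thread.

```sh
#!/bin/sh
# Read kernel log text on stdin and print each nsid for which the driver
# reported "identifiers changed for nsid N", one per line.
changed_nsids() {
    sed -n 's/.*identifiers changed for nsid \([0-9][0-9]*\).*/\1/p'
}

# Count "rescanning namespaces" messages in kernel log text on stdin.
rescan_count() {
    grep -c 'rescanning namespaces'
}

# Example using three lines from the log above.
sample='[  149.570987] nvme nvme0: rescanning namespaces.
[  149.578714] nguid mismatch
[  149.578719] nvme nvme0: identifiers changed for nsid 1'
printf '%s\n' "$sample" | changed_nsids   # prints 1
printf '%s\n' "$sample" | rescan_count    # prints 1
```

Running `dmesg | rescan_count` after each step of the sequence shows
whether a rescan was logged between the detach-ns and the attach-ns.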

> [  149.578714] nguid mismatch
> [  149.578719] nvme nvme0: identifiers changed for nsid 1
> [  149.582291] block nvme0n1: no usable path - requeuing I/O
> [  149.722140] block nvme0n1: no available path - failing I/O
> [  149.722157] block nvme0n1: no available path - failing I/O

If you drop all open references to /dev/nvme0n1, then the handle should
get deleted, and a manual rescan after that should get your new
namespace visible.
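
A sketch of what that sequence could look like, untested and under
several assumptions: wait_until_gone and recreate_ns are made-up names,
the 30 second timeout is arbitrary, and waiting for the block node to
disappear is only a guess at a usable signal that the removal has been
processed. The device, sizes, and controller id are the ones from this
thread; --dps and --nmic are written in their long form.

```sh
#!/bin/sh
# Wait until a path no longer exists, polling once per second.
# Returns 0 once the path is gone, 1 if it is still there after $2 retries.
wait_until_gone() {
    path=$1
    tries=$2
    while [ -e "$path" ]; do
        [ "$tries" -gt 0 ] || return 1
        sleep 1
        tries=$((tries - 1))
    done
    return 0
}

# Re-create nsid 1 on nvme0, giving the driver time to drop the old node.
recreate_ns() {
    nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81 || return 1
    # Anything holding /dev/nvme0n1 open (mounts, running I/O) must be
    # closed by now, otherwise the node will not go away.
    wait_until_gone /dev/nvme0n1 30 || return 1
    nvme delete-ns /dev/nvme0 --namespace-id=1 || return 1
    nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 \
        --flbas=0 --dps=0 --nmic=1 || return 1
    nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81 || return 1
    # The manual rescan goes to the controller device, since the
    # namespace node may not exist yet.
    nvme ns-rescan /dev/nvme0
}
```

The script only defines the functions; calling recreate_ns as root runs
the sequence.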



* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
@ 2023-11-07 15:53         ` Wen Xiong
  2023-11-07 19:22         ` Wen Xiong
  2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 15:53 UTC (permalink / raw)
  To: Keith Busch; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On 2023-11-07 09:18, Keith Busch wrote:

Hi Keith,

> "rescanning namespaces" message right after the 'detach-ns' command, 
> and before subsequent 'attach-ns' command. It looks here that the rescan
> didn't run until after the 'attach-ns' occurred. Instead of tearing down
> the original, the driver just sees the namespace it previously knew
> about has changed unexpectedly; the processing for the namespace removal
> didn't happen prior to the attach-ns command.

Re-did:
# nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
detach-ns: Success, nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces
# nvme delete-ns /dev/nvme0 --namespace-id=1
delete-ns: Success, deleted nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.
# nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0 -dps=0 -nmic=1
create-ns: Success, created nsid:1
[root@ltcrain119-lp4 ~]# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.


> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

# nvme attach-ns /dev/nvme0 -n 1  --controller=0x81
attach-ns: Success, nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.
[ 5219.493625] nvme nvme0: rescanning namespaces.
[ 5219.502136] nguid mismatch
[ 5219.502146] nvme nvme0: identifiers changed for nsid 1
[ 5219.506668] block nvme0n1: no usable path - requeuing I/O
[ 5219.662788] block nvme0n1: no available path - failing I/O
[ 5219.662824] block nvme0n1: no available path - failing I/O
[ 5219.662841] Buffer I/O error on dev nvme0n1, logical block 281402912, async page read
[ 5219.662859] block nvme0n1: no available path - failing I/O
[ 5219.662875] Buffer I/O error on dev nvme0n1, logical block 281402913, async page read
[ 5219.662887] block nvme0n1: no available path - failing I/O
[ 5219.662894] Buffer I/O error on dev nvme0n1, logical block 281402914, async page read
[ 5219.662913] block nvme0n1: no available path - failing I/O
[ 5219.662926] Buffer I/O error on dev nvme0n1, logical block 281402915, async page read
[ 5219.662956] block nvme0n1: no available path - failing I/O
[ 5219.662970] Buffer I/O error on dev nvme0n1, logical block 281402916, async page read
[ 5219.662985] bio_check_eod: 7 callbacks suppressed
[ 5219.662988] systemd-udevd: attempt to access beyond end of device
                nvme0n1: rw=0, sector=4502446672, nr_sectors = 16 limit=0
[ 5219.663022] Buffer I/O error on dev nvme0n1, logical block 281402917, async page read
[ 5219.663035] systemd-udevd: attempt to access beyond end of device
                nvme0n1: rw=0, sector=4502446688, nr_sectors = 16 limit=0
[ 5219.663052] Buffer I/O error on dev nvme0n1, logical block 281402918, async page read
[ 5219.663065] systemd-udevd: attempt to access beyond end of device
                nvme0n1: rw=0, sector=4502446704, nr_sectors = 16 limit=0
[ 5219.663099] Buffer I/O error on dev nvme0n1, logical block 281402919, async page read
# nvme ns-rescan /dev/nvme0n1
/dev/nvme0n1: No such file or directory
Usage: nvme ns-rescan <device> [OPTIONS]

Rescans the NVMe namespaces

# nvme ns-rescan /dev/nvme0n1
/dev/nvme0n1: No such file or directory
Usage: nvme ns-rescan <device> [OPTIONS]

Rescans the NVMe namespaces

# ls -l /dev/nvme*
crw-------. 1 root root 240, 0 Nov  7 08:26 /dev/nvme0
crw-------. 1 root root 240, 1 Nov  7 08:13 /dev/nvme1
brw-rw----. 1 root disk 259, 1 Nov  7 08:13 /dev/nvme1n1

[root@ltcrain119-lp4 ~]# nvme attach-ns /dev/nvme0 -n 1  --controller=0x81
NVMe status: Namespace Already Attached: The controller is already attached to the namespace specified(0x2118)

[root@ltcrain119-lp4 ~]# ls -l /dev/nvme*
crw-------. 1 root root 240, 0 Nov  7 08:26 /dev/nvme0
brw-rw----. 1 root disk 259, 3 Nov  7 09:48 /dev/nvme0n1
crw-------. 1 root root 240, 1 Nov  7 08:13 /dev/nvme1
brw-rw----. 1 root disk 259, 1 Nov  7 08:13 /dev/nvme1n1

After the attach-ns command, /dev/nvme0n1 did not show up under /dev.
Somehow I had to run attach-ns a second time; nvme ns-rescan only works
after the 2nd attach-ns.

Is this a firmware issue on the NVMe device?

Thanks,
Wendy




* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
@ 2023-11-07 19:22         ` Wen Xiong
  2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 19:22 UTC (permalink / raw)
  To: Keith Busch; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On 2023-11-07 09:18, Keith Busch wrote:

Hi Keith,

> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

Should the driver call nvme_scan_ns(ctrl, nsid) again if the
nguid/uuid/eui64/csi changed when a namespace was re-created? Then
customers would not need to run attach-ns/ns-rescan manually from user
space.

Thanks!
Wendy



* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
  2023-11-07 19:22         ` Wen Xiong
@ 2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-11-08  7:15 UTC (permalink / raw)
  To: Keith Busch
  Cc: Wen Xiong, Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On Tue, Nov 07, 2023 at 08:18:17AM -0700, Keith Busch wrote:
> Are you running these commands in quick succession? There should be a
> "rescanning namespaces" message right after the 'detach-ns' command, and
> before subsequent 'attach-ns' command. It looks here that the rescan
> didn't run until after the 'attach-ns' occurred. Instead of tearing down
> the original, the driver just sees the namespace it previously knew
> about has changed unexpectedly; the processing for the namespace removal
> didn't happen prior to the attach-ns command.

Yep.

> 
> > [  149.578714] nguid mismatch
> > [  149.578719] nvme nvme0: identifiers changed for nsid 1
> > [  149.582291] block nvme0n1: no usable path - requeuing I/O
> > [  149.722140] block nvme0n1: no available path - failing I/O
> > [  149.722157] block nvme0n1: no available path - failing I/O
> 
> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

I fear we still need to handle this somehow.  For actual per-spec
namespace management we'll just need to snoop the namespace management
commands and update the ns_head membership.  For out of band management
there's not much we can do as-is.  A good addition to the spec would
be to add the concept of a namespace generation that is incremented
every time the nsid is reused.
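
To illustrate the idea only: the toy model below tracks a single nsid.
The "generation" value is the hypothetical counter proposed above and
does not exist in the NVMe spec or in the Linux driver today, and the
function and variable names are invented for this sketch.

```sh
#!/bin/sh
# Toy model of a per-nsid generation counter (hypothetical, see above).
ctrl_gen=0      # controller side: bumped every time the nsid is reused
host_gen=       # host side: generation seen at the last scan (empty = none)
scan_result=

# Controller creates (or re-creates) the namespace under the same nsid.
ctrl_create_ns() {
    ctrl_gen=$((ctrl_gen + 1))
}

# Host rescan: compare the cached generation with the controller's.
host_scan_ns() {
    if [ -z "$host_gen" ]; then
        scan_result=new
    elif [ "$host_gen" = "$ctrl_gen" ]; then
        scan_result=unchanged
    else
        # Same nsid, different generation: a different namespace, so the
        # host can tear down the old one and add the new one.
        scan_result=replaced
    fi
    host_gen=$ctrl_gen
}
```

With such a counter the host could tell a reused nsid apart from an
existing namespace whose identifiers changed unexpectedly, without
having observed the namespace management commands itself.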




