* [Bug Report] nvme-cli commands fails to open head disk node and print error
@ 2024-03-28  6:30 Nilay Shroff
  2024-03-28  7:15 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Nilay Shroff @ 2024-03-28  6:30 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-nvme, linux-block, axboe, Gregory Joyce

Hi,

We observed that nvme-cli commands (nvme list, nvme list-subsys, nvme show-topology, etc.) print an error message before printing the actual output.

Notes and observations:
=======================
This issue is observed on the latest Linus kernel tree (v6.9-rc1). This was working well in kernel v6.8.

Test details:
=============
I have an NVMe disk which has two controllers and two namespaces, and it is multipath capable:

# nvme list-ns /dev/nvme0 
[   0]:0x1
[   1]:0x3

One of the namespaces has zero disk capacity:

# nvme id-ns /dev/nvme0 -n 0x3
NVME Identify Namespace 3:
nsze    : 0
ncap    : 0
nuse    : 0
nsfeat  : 0x14
nlbaf   : 4
flbas   : 0
<snip>

Another namespace has non-zero disk capacity:

# nvme id-ns /dev/nvme0 -n 0x1 
NVME Identify Namespace 1:
nsze    : 0x156d56
ncap    : 0x156d56
nuse    : 0
nsfeat  : 0x14
nlbaf   : 4
flbas   : 0
<snip>
 
6.8 kernel:
----------

# nvme list -v 

Subsystem        Subsystem-NQN                                                                                    Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0     nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1                                                 nvme0, nvme2

Device   SN                   MN                                       FR       TxPort Address        Slot   Subsystem    Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0    3D60A04906N1         1.6TB NVMe Gen4 U.2 SSD IV               REV.CAS2 pcie   0524:28:00.0          nvme-subsys0 nvme0n1
nvme2    3D60A04906N1         1.6TB NVMe Gen4 U.2 SSD IV               REV.CAS2 pcie   0584:28:00.0          nvme-subsys0 

Device       Generic      NSID       Usage                      Format           Controllers     
------------ ------------ ---------- -------------------------- ---------------- ----------------
/dev/nvme0n1 /dev/ng0n1   0x1          0.00   B /   5.75  GB      4 KiB +  0 B   nvme0

As we can see above, the namespace (0x3) with zero disk capacity is not listed in the output.
Furthermore, no head disk node (i.e. /dev/nvmeXnY) is created for a namespace with zero disk
capacity, and there is no entry for such a disk under /sys/block/.

6.9-rc1 kernel:
---------------

# nvme list -v 

Failed to open ns nvme0n3, errno 2 <== error is printed first followed by output

Subsystem        Subsystem-NQN                                                                                    Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0     nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1                                                 nvme0, nvme2

Device   SN                   MN                                       FR       TxPort Address        Slot   Subsystem    Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------ ------------ ----------------
nvme0    3D60A04906N1         1.6TB NVMe Gen4 U.2 SSD IV               REV.CAS2 pcie   0524:28:00.0          nvme-subsys0 nvme0n1
nvme2    3D60A04906N1         1.6TB NVMe Gen4 U.2 SSD IV               REV.CAS2 pcie   0584:28:00.0          nvme-subsys0 

Device       Generic      NSID       Usage                      Format           Controllers     
------------ ------------ ---------- -------------------------- ---------------- ----------------
/dev/nvme0n1 /dev/ng0n1   0x1          0.00   B /   5.75  GB      4 KiB +  0 B   nvme0


# nvme list-subsys 

Failed to open ns nvme0n3, errno 2 <== error is printed first followed by output

nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
               iopolicy=numa
\
 +- nvme2 pcie 0584:28:00.0 live
 +- nvme0 pcie 0524:28:00.0 live

# nvme show-topology

Failed to open ns nvme0n3, errno 2 <== error is printed first followed by output

nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
               iopolicy=numa
\
 +- ns 1
 \
  +- nvme0 pcie 0524:28:00.0 live optimized

From the above output it's evident that nvme-cli attempts to open the disk node /dev/nvme0n3;
however, that entry doesn't exist. Apparently, on the 6.9-rc1 kernel, though the head disk node
/dev/nvme0n3 doesn't exist, the relevant entries /sys/block/nvme0c0n3 and /sys/block/nvme0n3 are present.

As I understand it, the nvme-cli commands typically build the NVMe subsystem topology before
printing the output. In this case, nvme-cli finds nvme0c0n3 and nvme0n3 under /sys/block and so
assumes that a corresponding disk node entry /dev/nvme0n3 is present; however, when nvme-cli
attempts to open /dev/nvme0n3 it fails, causing the observed symptom.
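
For illustration, below is a minimal sketch of what I believe the older nvme-cli/libnvme
effectively does while building the topology (hypothetical code, not the actual libnvme
sources): walk /sys/block for head nodes and open the matching /dev node, which is exactly
the open() that fails here with errno 2 (ENOENT).

/* sketch: scan /sys/block for nvme head nodes and open the matching /dev node */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	DIR *d = opendir("/sys/block");
	struct dirent *de;
	int subsys, nsid;

	if (!d)
		return 1;

	while ((de = readdir(d))) {
		/* head nodes are named nvme<subsys>n<nsid>; nvme<X>c<Y>n<Z> paths won't match */
		if (sscanf(de->d_name, "nvme%dn%d", &subsys, &nsid) != 2)
			continue;

		char dev[300];
		snprintf(dev, sizeof(dev), "/dev/%s", de->d_name);

		int fd = open(dev, O_RDONLY);
		if (fd < 0) {
			/* for nvme0n3 on 6.9-rc1: the sysfs entry exists but the /dev node doesn't */
			fprintf(stderr, "Failed to open ns %s, errno %d\n", de->d_name, errno);
			continue;
		}
		close(fd);
	}
	closedir(d);
	return 0;
}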

Git bisect:
===========
The git bisect points to the below commit:

commit 46e7422cda8482aa3074c9caf4c224cf2fb74d71 (HEAD)
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Mar 4 07:04:54 2024 -0700

    nvme: move common logic into nvme_update_ns_info
    
    nvme_update_ns_info_generic and nvme_update_ns_info_block share a
    fair amount of logic related to not fully supported namespace
    formats and updating the multipath information.  Move this logic
    into the common caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>


In 6.9-rc1, it seems that with the above code restructuring, the head disk node nvmeXnY no longer
shows up under /dev, while the corresponding entries nvmeXcYnZ and nvmeXnY do exist under
/sys/block/. On the 6.8 kernel, neither the disk node under /dev nor the corresponding entries
under /sys/block are created when the disk capacity is zero.
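
To make the behavior change concrete, here is my reading of the restructured common caller,
paraphrased as a sketch (this is not the exact upstream code): when the per-format helper
returns -ENODEV for the zero-capacity namespace, the caller now keeps the namespace around as
a hidden gendisk and still performs the multipath setup, which is why the /sys/block entries
appear even though no usable /dev node exists.

/* paraphrased sketch of the 6.9-rc1 common caller, not the exact upstream code */
static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
{
	int ret = nvme_update_ns_info_block(ns, info);	/* returns -ENODEV when ncap == 0 */

	/*
	 * If probing fails due to an unsupported feature, hide the block
	 * device but still allow other access (e.g. the char device).
	 */
	if (ret == -ENODEV) {
		ns->disk->flags |= GENHD_FL_HIDDEN;
		set_bit(NVME_NS_READY, &ns->flags);
		ret = 0;
	}

	/* the multipath head (nvmeXnY) is now also set up for the hidden case */
	if (!ret && nvme_ns_head_multipath(ns->head))
		nvme_mpath_add_disk(ns, info->anagrpid);

	return ret;
}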

Thanks,
--Nilay





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  6:30 [Bug Report] nvme-cli commands fails to open head disk node and print error Nilay Shroff
@ 2024-03-28  7:15 ` Christoph Hellwig
  2024-03-28 10:25   ` Nilay Shroff
  2024-03-28  8:45 ` Daniel Wagner
  2024-04-02 15:04 ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2024-03-28  7:15 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, linux-block, axboe,
	Gregory Joyce

On Thu, Mar 28, 2024 at 12:00:07PM +0530, Nilay Shroff wrote:
> One of namespaces has zero disk capacity:
> 
> # nvme id-ns /dev/nvme0 -n 0x3
> NVME Identify Namespace 3:
> nsze    : 0
> ncap    : 0
> nuse    : 0
> nsfeat  : 0x14
> nlbaf   : 4
> flbas   : 0
> <snip>

How can you have a namespace with zero capacity?  NCAP is the check
used on legacy pre-ns-scan controllers to verify that the namespace
exists.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  6:30 [Bug Report] nvme-cli commands fails to open head disk node and print error Nilay Shroff
  2024-03-28  7:15 ` Christoph Hellwig
@ 2024-03-28  8:45 ` Daniel Wagner
  2024-03-28 10:05   ` Nilay Shroff
  2024-04-02 22:07   ` Kamaljit Singh
  2024-04-02 15:04 ` Christoph Hellwig
  2 siblings, 2 replies; 10+ messages in thread
From: Daniel Wagner @ 2024-03-28  8:45 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, linux-block, axboe,
	Gregory Joyce

On Thu, Mar 28, 2024 at 12:00:07PM +0530, Nilay Shroff wrote:
> From the above output it's evident that nvme-cli attempts to open the disk node /dev/nvme0n3;
> however, that entry doesn't exist. Apparently, on the 6.9-rc1 kernel, though the head disk node
> /dev/nvme0n3 doesn't exist, the relevant entries /sys/block/nvme0c0n3 and /sys/block/nvme0n3 are present.

I assume you are not using the latest version of nvme-cli/libnvme. The
latest version does not try to open any block devices when scanning the
sysfs topology.

What does `nvme version` say?
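
Roughly, the difference looks like this (a hypothetical helper, not the actual libnvme code):
instead of open()ing /dev/nvmeXnY, the newer code reads what it needs straight out of sysfs,
so a missing /dev node no longer matters during the scan.

/* hypothetical sketch, not the actual libnvme code */
#include <stdio.h>

static int read_nsid_from_sysfs(const char *blkname, unsigned int *nsid)
{
	char path[300];
	FILE *f;

	/* assumes the kernel exposes the "nsid" attribute for the namespace */
	snprintf(path, sizeof(path), "/sys/block/%s/nsid", blkname);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* sysfs entry missing */
	if (fscanf(f, "%u", nsid) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;		/* no /dev open needed */
}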

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  8:45 ` Daniel Wagner
@ 2024-03-28 10:05   ` Nilay Shroff
  2024-04-02 22:07   ` Kamaljit Singh
  1 sibling, 0 replies; 10+ messages in thread
From: Nilay Shroff @ 2024-03-28 10:05 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, linux-block, axboe,
	Gregory Joyce



On 3/28/24 14:15, Daniel Wagner wrote:
> On Thu, Mar 28, 2024 at 12:00:07PM +0530, Nilay Shroff wrote:
>> From the above output it's evident that nvme-cli attempts to open the disk node /dev/nvme0n3;
>> however, that entry doesn't exist. Apparently, on the 6.9-rc1 kernel, though the head disk node
>> /dev/nvme0n3 doesn't exist, the relevant entries /sys/block/nvme0c0n3 and /sys/block/nvme0n3 are present.
> 
> I assume you are not using the latest version of nvme-cli/libnvme. The
> latest version does not try to open any block devices when scanning the
> sysfs topology.
> 
> What does `nvme version` say?
> 
Yes, you are correct; my nvme-cli version was not the latest:

# nvme --version 
nvme version 2.6 (git 2.6)
libnvme version 1.6 (git 1.6)

I have just upgraded to the latest version 2.8 and I no longer see this issue.
I see that the newer version of nvme-cli doesn't need to open the head disk node
if the kernel version is >= 6.8.

Thanks,
--Nilay






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  7:15 ` Christoph Hellwig
@ 2024-03-28 10:25   ` Nilay Shroff
  0 siblings, 0 replies; 10+ messages in thread
From: Nilay Shroff @ 2024-03-28 10:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, linux-block, axboe, Gregory Joyce



On 3/28/24 12:45, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 12:00:07PM +0530, Nilay Shroff wrote:
>> One of the namespaces has zero disk capacity:
>>
>> # nvme id-ns /dev/nvme0 -n 0x3
>> NVME Identify Namespace 3:
>> nsze    : 0
>> ncap    : 0
>> nuse    : 0
>> nsfeat  : 0x14
>> nlbaf   : 4
>> flbas   : 0
>> <snip>
> 
> How can you have a namespace with zero capacity?  NCAP is the check
> used on legacy pre-ns-scan controllers to verify that the namespace
> exists.
> 
> 
I have this NVMe disk, which has a namespace with zero capacity.

# nvme list -v 
Subsystem        Subsystem-NQN                                                                                    Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys2     nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1                                                 nvme0, nvme2

<snip>

Device       Generic      NSID       Usage                      Format           Controllers     
------------ ------------ ---------- -------------------------- ---------------- ----------------
/dev/nvme2n1 /dev/ng2n1   0x1          0.00   B /  46.01  GB      4 KiB +  0 B   nvme0
nvme2n2      /dev/ng2n2   0x3          0.00   B /   0.00   B    512   B +  0 B   nvme0

I didn't create that namespace with zero capacity, and I didn't know about it until I recently
upgraded to the 6.9-rc1 kernel; it only became apparent with the latest kernel. However, as Daniel
mentioned elsewhere in the thread, the latest nvme-cli doesn't exhibit the issue I reported. And
I'm going to wipe this namespace anyway.

Thanks,
--Nilay



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  6:30 [Bug Report] nvme-cli commands fails to open head disk node and print error Nilay Shroff
  2024-03-28  7:15 ` Christoph Hellwig
  2024-03-28  8:45 ` Daniel Wagner
@ 2024-04-02 15:04 ` Christoph Hellwig
  2024-04-03  7:03   ` Nilay Shroff
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2024-04-02 15:04 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, linux-block, axboe,
	Gregory Joyce

Hi Nilay,

can you see if this patch makes a difference for your weird controller
with the listed but zero-capacity namespaces?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3b0498f320e6b9..ad60cf5581a419 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2089,7 +2089,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 	if (id->ncap == 0) {
 		/* namespace not allocated or attached */
 		info->is_removed = true;
-		ret = -ENODEV;
+		ret = -ENXIO;
 		goto out;
 	}
 	lbaf = nvme_lbaf_index(id->flbas);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-03-28  8:45 ` Daniel Wagner
  2024-03-28 10:05   ` Nilay Shroff
@ 2024-04-02 22:07   ` Kamaljit Singh
  2024-04-03  3:07     ` Keith Busch
  1 sibling, 1 reply; 10+ messages in thread
From: Kamaljit Singh @ 2024-04-02 22:07 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, linux-block, axboe,
	Gregory Joyce, Nilay Shroff


Hi Daniel,
Your question about the nvme-cli version makes me wonder if there is a version compatibility matrix (nvme-cli vs kernel) somewhere you could point me to? I didn't see such info in the nvme-cli release notes.

For example, I've seen issues with versions newer than nvme-cli v1.16 on Ubuntu 22.04 (stock & newer kernels). From a compatibility perspective I do wonder whether circumventing a distro's package manager and directly installing newer nvme-cli versions might be a bad idea. This could possibly become dire if there were intentional version dependencies across the stack.
 
Thanks,
Kamaljit
 

From: Linux-nvme <linux-nvme-bounces@lists.infradead.org> on behalf of Daniel Wagner <dwagner@suse.de>
Date: Thursday, March 28, 2024 at 01:46
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>, linux-nvme@lists.infradead.org <linux-nvme@lists.infradead.org>, linux-block@vger.kernel.org <linux-block@vger.kernel.org>, axboe@fb.com <axboe@fb.com>, Gregory Joyce <gjoyce@ibm.com>
Subject: Re: [Bug Report] nvme-cli commands fails to open head disk node and print error


On Thu, Mar 28, 2024 at 12:00:07PM +0530, Nilay Shroff wrote:
> From the above output it's evident that nvme-cli attempts to open the disk node /dev/nvme0n3;
> however, that entry doesn't exist. Apparently, on the 6.9-rc1 kernel, though the head disk node
> /dev/nvme0n3 doesn't exist, the relevant entries /sys/block/nvme0c0n3 and /sys/block/nvme0n3 are present.

I assume you are not using the latest version of nvme-cli/libnvme. The
latest version does not try to open any block devices when scanning the
sysfs topology.

What does `nvme version` say?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-04-02 22:07   ` Kamaljit Singh
@ 2024-04-03  3:07     ` Keith Busch
  2024-04-03 10:10       ` Daniel Wagner
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2024-04-03  3:07 UTC (permalink / raw)
  To: Kamaljit Singh
  Cc: Daniel Wagner, Christoph Hellwig, linux-nvme, linux-block, axboe,
	Gregory Joyce, Nilay Shroff

On Tue, Apr 02, 2024 at 10:07:25PM +0000, Kamaljit Singh wrote:
> 
> Hi Daniel,
> Your question about the nvme-cli version makes me wonder if there is a
> version compatibility matrix (nvme-cli vs kernel) somewhere you could
> point me to? I didn't see such info in the nvme-cli release notes.

I don't believe there's ever been an intentional incompatibility for
nvme-cli vs. kernel versions. Most of the incompatibility problems come
from sysfs dependencies, but those should not be necessary for the core
passthrough commands on any version pairing.

And yeah, there should be sane fallbacks for older kernels in case a new
feature introduces a regression, but it's not always perfect. We try to
fix them as we learn about them, so bug reports on the github are useful
for tracking that.

> For example, I've seen issues with versions newer than nvme-cli v1.16 on Ubuntu
> 22.04 (stock & newer kernels). From a compatibility perspective I do
> wonder whether circumventing a distro's package manager and directly
> installing newer nvme-cli versions might be a bad idea. This could
> possibly become dire if there were intentional version dependencies
> across the stack.

The struggle is real, isn't it? New protocol features are added upstream
faster than distro package updates deliver them to users. On the other
hand, distros may be cautious about potential instability.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-04-02 15:04 ` Christoph Hellwig
@ 2024-04-03  7:03   ` Nilay Shroff
  0 siblings, 0 replies; 10+ messages in thread
From: Nilay Shroff @ 2024-04-03  7:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, linux-block, axboe, Gregory Joyce


Hi Christoph,

On 4/2/24 20:34, Christoph Hellwig wrote:
> Hi Nilay,
> 
> can you see if this patch makes a difference for your weird controller
> with the listed but zero-capacity namespaces?
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3b0498f320e6b9..ad60cf5581a419 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2089,7 +2089,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
>  	if (id->ncap == 0) {
>  		/* namespace not allocated or attached */
>  		info->is_removed = true;
> -		ret = -ENODEV;
> +		ret = -ENXIO;
>  		goto out;
>  	}
>  	lbaf = nvme_lbaf_index(id->flbas);
> 
I have just tested the above patch on my controller, which has zero-capacity
namespaces. The patch works as expected and I don't encounter any errors while
using nvme-cli commands. Please note that I am using the older version of
nvme-cli here (nvme version 2.6 / libnvme version 1.6).

Furthermore, with this patch, I no longer find any hidden disk nodes 
(i.e. nvmeXcYnZ or nvmeXnY) created for namespaces with zero capacity 
under /sys/block. So the behavior is similar to that of kernel v6.8.

IMO, we should upstream this patch.
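
For what it's worth, my understanding of why the one-line change helps, again as a paraphrased
sketch of the common caller rather than the exact upstream code: only -ENODEV is special-cased
into the hidden-gendisk path, so -ENXIO simply propagates and the namespace is never registered,
matching the v6.8 behavior.

	ret = nvme_update_ns_info_block(ns, info);	/* now -ENXIO when ncap == 0 */

	if (ret == -ENODEV) {
		/* unsupported format: keep a hidden gendisk (sysfs entries, no usable /dev node) */
		ns->disk->flags |= GENHD_FL_HIDDEN;
		set_bit(NVME_NS_READY, &ns->flags);
		ret = 0;
	}

	/*
	 * Anything else (such as -ENXIO) just propagates: the scan code tears
	 * the namespace down, and neither /dev/nvmeXnY nor the /sys/block
	 * entries are created.
	 */
	return ret;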

Thanks,
--Nilay


 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug Report] nvme-cli commands fails to open head disk node and print error
  2024-04-03  3:07     ` Keith Busch
@ 2024-04-03 10:10       ` Daniel Wagner
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Wagner @ 2024-04-03 10:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: Kamaljit Singh, Christoph Hellwig, linux-nvme, linux-block,
	axboe, Gregory Joyce, Nilay Shroff

On Tue, Apr 02, 2024 at 09:07:55PM -0600, Keith Busch wrote:
> On Tue, Apr 02, 2024 at 10:07:25PM +0000, Kamaljit Singh wrote:
> > Your question about the nvme-cli version makes me wonder if there is a
> > version compatibility matrix (nvme-cli vs kernel) somewhere you could
> > point me to? I didn't see such info in the nvme-cli release notes.
> 
> I don't believe there's ever been an intentional incompatibility for
> nvme-cli vs. kernel versions. Most of the incompatibility problems come
> from sysfs dependencies, but those should not be necessary for the core
> passthrough commands on any version pairing.
> 
> And yeah, there should be sane fallbacks for older kernels in case a new
> feature introduces a regression, but it's not always perfect. We try to
> fix them as we learn about them, so bug reports on the github are useful
> for tracking that.

Indeed, all new features are auto-detected. So if the kernel provides
them, nvme-cli/libnvme will be able to use them. Obviously, sometimes
there are regressions, but we avoid increasing the minimum kernel
dependency. Many things are also behind CONFIG options, thus the only
viable way is to auto-detect features. Note, these new features are
almost exclusively in the fabrics code base. The PCI-related bits are
pretty stable.
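
As a rough illustration of what auto-detection means in practice (a hypothetical helper, not
the actual libnvme code): probe for the optional sysfs attribute at runtime and fall back to
the old behaviour when the running kernel doesn't provide it.

/* hypothetical sketch, not the actual libnvme code */
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool ctrl_attr_present(const char *ctrl, const char *attr)
{
	char path[300];

	/* e.g. ctrl = "nvme0"; attr is whatever optional attribute the feature needs */
	snprintf(path, sizeof(path), "/sys/class/nvme/%s/%s", ctrl, attr);
	return access(path, F_OK) == 0;
}

/*
 * Callers then take the new code path only when the attribute exists and
 * otherwise fall back to the old behaviour.
 */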

> > For example, I've seen issues with versions newer than nvme-cli v1.16 on Ubuntu
> > 22.04 (stock & newer kernels). From a compatibility perspective I do
> > wonder whether circumventing a distro's package manager and directly
> > installing newer nvme-cli versions might be a bad idea. This could
> > possibly become dire if there were intentional version dependencies
> > across the stack.
> 
> The struggle is real, isn't it? New protocol features are added upstream
> faster than distro package updates deliver them to users. On the other
> hand, distros may be cautious about potential instability.

We get a lot of requests to provide up-to-date binaries for old distros.
For this reason we provide an AppImage binary to play around with. So if
you want to try the latest and greatest, it's fairly simple to do so.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-03 10:10 UTC | newest]

Thread overview: 10+ messages
2024-03-28  6:30 [Bug Report] nvme-cli commands fails to open head disk node and print error Nilay Shroff
2024-03-28  7:15 ` Christoph Hellwig
2024-03-28 10:25   ` Nilay Shroff
2024-03-28  8:45 ` Daniel Wagner
2024-03-28 10:05   ` Nilay Shroff
2024-04-02 22:07   ` Kamaljit Singh
2024-04-03  3:07     ` Keith Busch
2024-04-03 10:10       ` Daniel Wagner
2024-04-02 15:04 ` Christoph Hellwig
2024-04-03  7:03   ` Nilay Shroff
