* nvme-format: protection information enabled although metadata size is 0
@ 2022-10-31  9:27 Binarus
  2022-11-02 15:34 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Binarus @ 2022-10-31  9:27 UTC (permalink / raw)
  To: linux-nvme

Dear all,

after having read the subjects of the posts in October, I am afraid that 
a dumb newbie question like the following may be inappropriate here. But 
due to the lack of other options (tried other Q & A sites without 
success), I'll be brave ... Having said this:

On a machine with Debian Bullseye and nvme-cli 1.12, I have formatted an 
Intel DC P3700 the following way:

   nvme format /dev/nvme0 -l 3 -i 1 -f

That command completed within a few seconds without any error. 
But as I understand it, it should have failed. '-l 3' selects 4096-byte 
LBAs without metadata, but '-i 1' enables T10 protection information, 
which needs 8 bytes of metadata.

Afterwards, I have checked the output of

   nvme id-ns /dev/nvme0n1 -H

It is quite long, so I am shortening it; the relevant lines are

   dps     : 0x1
     [3:3] : 0     Protection Information is Transferred as Last 8 Bytes of Metadata
     [2:0] : 0x1   Protection Information Type 1 Enabled
   ...
   LBA Format  3 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)

As expected, and according to the format command, a metadata size of 0 
is in use, but the protection information is enabled.

Could somebody please explain that in simple words? How can the PI be 
enabled although there is no room for the checksums, and how does the 
device actually behave now?
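
For readers who want to check a namespace for the same mismatch, here is a minimal sketch in Python. It assumes the DPS bit layout shown in the id-ns output above (bits 2:0 hold the PI type, bit 3 selects first or last 8 bytes of metadata). The helper names decode_dps and pi_is_consistent are hypothetical and not part of nvme-cli.

```python
PI_BYTES = 8  # protection information size assumed here, per the 8 bytes mentioned above


def decode_dps(dps):
    """Split the DPS byte into the PI type and the PI position flag."""
    return {
        "pi_type": dps & 0x7,          # 0 means PI is disabled
        "pi_first": bool(dps & 0x8),   # False: PI sits in the last 8 bytes of metadata
    }


def pi_is_consistent(dps, metadata_size):
    """PI can only be stored if the in-use LBA format has room for it."""
    pi_type = dps & 0x7
    return pi_type == 0 or metadata_size >= PI_BYTES


# Values reported by the device in this thread: dps 0x1, metadata size 0.
print(decode_dps(0x1))
print(pi_is_consistent(0x1, 0))
```

With the values from this thread the check reports an inconsistent state: PI type 1 is flagged as enabled, yet the active format has no metadata bytes to hold it.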

Best regards, and thank you very much in advance,

Binarus

P.S. From the revision 2c of the NVMe base specification, page 172, I 
also got the impression that the format command shown above should have 
failed. In figure 190, in the first table row, there is:

Invalid Format: The format specified is invalid. This may be due to 
various conditions, including:
1. specifying an invalid User Data Format number;
2. enabling protection information when there are not sufficient 
metadata resources; or
3. the specified format is not available in the current configuration.

Item 2 reflects the situation described above, doesn't it? But then the 
format command should return an error, and actually should not format 
the device, correct?
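
The rule quoted from figure 190 can be sketched as a small validation function. This is only an illustration of what a compliant controller would be expected to decide, not actual firmware logic. The format table is an assumption: LBAF 3 is taken from the id-ns output above, and LBAF 4 with 8 bytes of metadata is a guess for this model.

```python
PI_BYTES = 8  # metadata bytes needed for protection information, as stated above

# Hypothetical LBA format table: index -> (metadata size, data size) in bytes.
EXAMPLE_FORMATS = {
    3: (0, 4096),   # from the id-ns output in this post
    4: (8, 4096),   # assumed, not verified against the device
}


def check_format(lbaf, pi_type, lba_formats):
    """Return the status a compliant controller would be expected to report."""
    if lbaf not in lba_formats:
        # Condition 1: invalid User Data Format number.
        return "Invalid Format"
    metadata_size, _data_size = lba_formats[lbaf]
    if pi_type != 0 and metadata_size < PI_BYTES:
        # Condition 2: PI requested without sufficient metadata resources.
        return "Invalid Format"
    return "Success"


print(check_format(3, 1, EXAMPLE_FORMATS))  # the command from this post
print(check_format(4, 1, EXAMPLE_FORMATS))
```

Under these assumptions, the command from this post (-l 3 -i 1) falls under condition 2, while the same PI setting with an 8-byte metadata format would be accepted.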



* Re: nvme-format: protection information enabled although metadata size is 0
  2022-10-31  9:27 nvme-format: protection information enabled although metadata size is 0 Binarus
@ 2022-11-02 15:34 ` Keith Busch
  2022-11-02 15:42   ` Binarus
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2022-11-02 15:34 UTC (permalink / raw)
  To: Binarus; +Cc: linux-nvme

On Mon, Oct 31, 2022 at 10:27:34AM +0100, Binarus wrote:
> Dear all,
> 
> after having read the subjects of the posts in October, I am afraid that a
> dumb newbie question like the following may be inappropriate here. But due
> to the lack of other options (tried other Q & A sites without success), I'll
> be brave ... Having said this:
> 
> On a machine with Debian Bullseye and nvme-cli 1.12, I have formatted an
> Intel DC P3700 the following way:
> 
>   nvme format /dev/nvme0 -l 3 -i 1 -f

You correctly surmised that the device is behaving out-of-compliance
with the specification. The device should have returned an "Invalid
Format" error since, as you found, this is the error to return when
"enabling protection information when there are not sufficient metadata
resources".

You're unlikely to find a fix for this EOL device, though. It's just a
firmware bug that you should avoid.



* Re: nvme-format: protection information enabled although metadata size is 0
  2022-11-02 15:34 ` Keith Busch
@ 2022-11-02 15:42   ` Binarus
  2022-11-02 15:59     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Binarus @ 2022-11-02 15:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme

On 02.11.2022 16:34, Keith Busch wrote:
> On Mon, Oct 31, 2022 at 10:27:34AM +0100, Binarus wrote:
>> On a machine with Debian Bullseye and nvme-cli 1.12, I have formatted an
>> Intel DC P3700 the following way:
>>
>>    nvme format /dev/nvme0 -l 3 -i 1 -f
> 
> You correctly surmised that the device is behaving out-of-compliance
> with the specification. The device should have returned an "Invalid
> Format" error since, as you found, this is the error to return when
> "enabling protection information when there are not sufficient metadata
> resources".
> 
> You're unlikely to find a fix for this EOL device, though. It's just a
> firmware bug that you should avoid.

Thank you very much for confirming that this is a bug. May I steal your time again and ask what you would do in that situation? Throw away the device because we can't trust it, or format it with 8 bytes of metadata and hope that the PI works correctly then?

In any case, I have seen in the meantime that the firmware on the device is hopelessly outdated. First, I'll try to upgrade it. If that works, I'll repeat the test and report back.

Thanks again, and best regards,

Binarus




* Re: nvme-format: protection information enabled although metadata size is 0
  2022-11-02 15:42   ` Binarus
@ 2022-11-02 15:59     ` Keith Busch
  2022-11-02 19:32       ` Binarus
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2022-11-02 15:59 UTC (permalink / raw)
  To: Binarus; +Cc: linux-nvme

On Wed, Nov 02, 2022 at 04:42:21PM +0100, Binarus wrote:
> On 02.11.2022 16:34, Keith Busch wrote:
> > On Mon, Oct 31, 2022 at 10:27:34AM +0100, Binarus wrote:
> > > On a machine with Debian Bullseye and nvme-cli 1.12, I have formatted an
> > > Intel DC P3700 the following way:
> > > 
> > >    nvme format /dev/nvme0 -l 3 -i 1 -f
> > 
> > You correctly surmised that the device is behaving out-of-compliance
> > with the specification. The device should have returned an "Invalid
> > Format" error since, as you found, this is the error to return when
> > "enabling protection information when there are not sufficient metadata
> > resources".
> > 
> > You're unlikely to find a fix for this EOL device, though. It's just a
> > firmware bug that you should avoid.
> 
> Thank you very much for confirming that this is a bug. May I steal your time again and ask what you would do in that situation? Throw away the device because we can't trust it, or format it with 8 bytes of metadata and hope that the PI works correctly then?

I think that's going too far. To the best of my knowledge, the device
works fine. You just hit an untested parameter combo. The device may
report you requested PI, but without metadata, it's going to behave the
same as a non-PI 4k format. If you supply valid parameters for PI
formats, then the device will correctly honor that.

I'm not sure if you're familiar with the different nvme metadata types
though, so I'll add that this particular model does not work with the
Linux kernel's end-to-end protection. This device supports only the
"extended" metadata, not the "separate" that the Linux block stack
requires. You won't be able to use the generic block layer for IO with
protection information, but you should be able to use it in passthrough
modes. And if you are using the 8-byte format (LBAF 4, I believe), then
the driver will have the device strip/generate PI without the host ever
seeing it.



* Re: nvme-format: protection information enabled although metadata size is 0
  2022-11-02 15:59     ` Keith Busch
@ 2022-11-02 19:32       ` Binarus
  2022-11-02 19:47         ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Binarus @ 2022-11-02 19:32 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme

On 02.11.2022 16:59, Keith Busch wrote:
> On Wed, Nov 02, 2022 at 04:42:21PM +0100, Binarus wrote:
>> Thank you very much for confirming that this is a bug. May I steal your time again and ask what you would do in that situation? Throw away the device because we can't trust it, or format it with 8 bytes of metadata and hope that the PI works correctly then?
> 
> I think that's going too far. To the best of my knowledge, the device
> works fine. You just hit an untested parameter combo. The device may
> report you requested PI, but without metadata, it's going to behave the
> same as a non-PI 4k format. If you supply valid parameters for PI
> formats, then the device will correctly honor that.

In the meantime, I have updated the firmware and can confirm that this did 
not change the faulty behavior. There is still no error message when 
using nvme-format with the wrong parameters shown in my first post.

Thank you very much for confirming that using the correct parameters 
(i.e. LBAF 4) will actually enable the PI. That's the way I would have 
expected it.

> I'm not sure if you're familiar with the different nvme metadata types
> though, so I'll add that this particular model does not work with the
> Linux kernel's end-to-end protection. This device supports only the
> "extended" metadata, not the "separate" that the Linux block stack
> requires. You won't be able to use the generic block layer for IO with
> protection information, but you should be able to use it in passthrough
> modes. And if you are using the 8-byte format (LBAF 4, I believe), then
> the driver will have the device strip/generate PI without the host ever
> seeing it.

I have a vague notion of the metadata types, and have recognized 
something which worries me even more:

In the datasheet / manual for the P3700 from October 2015 (newest 
version I could find), in table 34 on page 38 which describes the 
Identify Namespace data structure, it clearly says that byte 27 will 
report value 0x3, which means that both metadata types (extended and 
separate) are supported. From the "Interpretation" column of the "MC" row:

"Indicated support for metadata transferred with the extended data LBA 
and in separate buffer - both are supported."

However, when I execute nvme id-ns /dev/nvme0n1 on the machine in 
question, it shows the value 0x1 for the MC, which means that it 
supports only the extended LBA metadata.

That means that either the datasheet / manual or nvme is wrong. I guess 
that the former is the case, and your statement supports that.
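
To make the two MC values concrete, here is a small Python sketch that decodes the byte. It assumes the usual Identify Namespace layout, in which bit 0 signals extended-LBA metadata and bit 1 signals a separate metadata buffer. The function name decode_mc is hypothetical.

```python
def decode_mc(mc):
    """Decode the Metadata Capabilities (MC) byte of Identify Namespace."""
    return {
        "extended_lba": bool(mc & 0x1),     # metadata interleaved with the data
        "separate_buffer": bool(mc & 0x2),  # metadata in its own buffer
    }


print(decode_mc(0x1))  # value reported by nvme id-ns on this machine
print(decode_mc(0x3))  # value claimed by the datasheet
```

Read this way, 0x1 means extended LBA only, and 0x3 would mean that both transfer methods are available.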

I had absolutely no clue that the standard Linux IO does not support 
extended LBA metadata, and thus does not support extended LBA PI. That's 
quite disappointing. Currently, I don't know what the passthrough mode 
you have mentioned is, but I'll research it.

Perhaps I am using it already, because the SSD in question acts as a 
cache device in a ZFS pool. Since ZFS circumvents the normal I/O layer 
at some places, maybe it can use extended LBA PI.

I am aware that I wouldn't need the PI anyway with ZFS (because ZFS has 
its own checksumming which sets it apart from other file systems), but 
I'm eager to learn more about NVMe and the PI for future cases and other 
constellations, so I'll read about passthrough and Linux. Plus, we have 
a few (consumer) Samsung SSD 980 Pro in Windows machines here, and of 
course we would like to learn how to turn on the PI on them (if Windows 
supports it at all).

Thank you very much again, and best regards,

Binarus




* Re: nvme-format: protection information enabled although metadata size is 0
  2022-11-02 19:32       ` Binarus
@ 2022-11-02 19:47         ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2022-11-02 19:47 UTC (permalink / raw)
  To: Binarus; +Cc: linux-nvme

On Wed, Nov 02, 2022 at 08:32:19PM +0100, Binarus wrote:
> On 02.11.2022 16:59, Keith Busch wrote:
> > though, so I'll add that this particular model does not work with the
> > Linux kernel's end-to-end protection. This device supports only the
> > "extended" metadata, not the "separate" that the Linux block stack
> > requires. You won't be able to use the generic block layer for IO with
> > protection information, but you should be able to use it in passthrough
> > modes. And if you are using the 8-byte format (LBAF 4, I believe), then
> > the driver will have the device strip/generate PI without the host ever
> > seeing it.
> 
> I have a vague notion of the metadata types, and have recognized something
> which worries me even more:
> 
> In the datasheet / manual for the P3700 from October 2015 (newest version I
> could find), in table 34 on page 38 which describes the Identify Namespace
> data structure, it clearly says that byte 27 will report value 0x3, which
> means that both metadata types (extended and separate) are supported. From
> the "Interpretation" column of the "MC" row:
> 
> "Indicated support for metadata transferred with the extended data LBA and
> in separate buffer - both are supported."
> 
> However, when I execute nvme id-ns /dev/nvme0n1 on the machine in question,
> it shows the value 0x1 for the MC, which means that it supports only the
> extended LBA metadata.
> 
> That means that either the datasheet / manual or nvme is wrong. I guess that
> the former is the case, and your statement supports that.

Your data sheet is wrong. This family of controllers never supported
anything but interleaved metadata.
 
> I had absolutely no clue that the standard Linux IO does not support
> extended LBA metadata, and thus does not support extended LBA PI. That's
> quite disappointing.

How could it be supported? The format requires data+metadata be
virtually contiguous, but that's impossible from the user app that only
provides the data.

The only option would be for the kernel to bounce it through a new
buffer, and that's more horrible than it sounds, not to mention a
complete disaster for memory reclaim. This was my last attempt at it:

  http://lists.infradead.org/pipermail/linux-nvme/2018-February/015844.html
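
To illustrate the layout problem, here is a minimal user-space sketch in Python of what such a bounce buffer would have to contain. It is not kernel code and not taken from the patch linked above. It only shows how data and metadata must be interleaved block by block for an extended LBA format, assuming 4096-byte data and 8-byte metadata per block. The function name interleave is hypothetical.

```python
def interleave(data, metadata, lba_size=4096, md_size=8):
    """Build one contiguous buffer of extended LBAs: data0+md0, data1+md1, ..."""
    if len(data) % lba_size:
        raise ValueError("data length must be a multiple of the LBA size")
    nblocks = len(data) // lba_size
    if len(metadata) != nblocks * md_size:
        raise ValueError("need exactly one metadata chunk per block")
    out = bytearray()
    for i in range(nblocks):
        # Each block's data is followed directly by that block's metadata.
        out += data[i * lba_size:(i + 1) * lba_size]
        out += metadata[i * md_size:(i + 1) * md_size]
    return bytes(out)


buf = interleave(bytes(2 * 4096), bytes(2 * 8))
print(len(buf))  # two extended LBAs of 4104 bytes each
```

Every byte of user data has to be copied into the new buffer, which is the extra cost of the bounce approach.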

> Currently, I don't know what the passthrough mode you
> have mentioned is, but I'll research it.

From user space, you'd have to use ioctl NVME_IOCTL_IO_CMD instead of
normal read/write.
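
A rough sketch of what that could look like from Python, using ctypes to mirror struct nvme_passthru_cmd from <linux/nvme_ioctl.h>. Several things here are assumptions: the ioctl number is computed with the generic _IOWR encoding (as on x86 and arm64), the extended LBA size is 4096+8 bytes, and the PRINFO bits in cdw12 are left at zero. The helpers build_read and read_extended are hypothetical, and nothing here was run against a P3700.

```python
import ctypes
import fcntl
import os


class NvmePassthruCmd(ctypes.Structure):
    """Mirrors struct nvme_passthru_cmd from <linux/nvme_ioctl.h>."""
    _fields_ = [
        ("opcode", ctypes.c_uint8), ("flags", ctypes.c_uint8),
        ("rsvd1", ctypes.c_uint16), ("nsid", ctypes.c_uint32),
        ("cdw2", ctypes.c_uint32), ("cdw3", ctypes.c_uint32),
        ("metadata", ctypes.c_uint64), ("addr", ctypes.c_uint64),
        ("metadata_len", ctypes.c_uint32), ("data_len", ctypes.c_uint32),
        ("cdw10", ctypes.c_uint32), ("cdw11", ctypes.c_uint32),
        ("cdw12", ctypes.c_uint32), ("cdw13", ctypes.c_uint32),
        ("cdw14", ctypes.c_uint32), ("cdw15", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32), ("result", ctypes.c_uint32),
    ]


def _iowr(type_char, nr, size):
    # Generic _IOWR() encoding; some architectures use different direction bits.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


NVME_IOCTL_IO_CMD = _iowr("N", 0x43, ctypes.sizeof(NvmePassthruCmd))
NVME_CMD_READ = 0x02  # NVM command set Read opcode


def build_read(nsid, slba, nblocks, buf, ext_lba_size):
    """Fill in a Read of nblocks extended LBAs starting at slba."""
    if len(buf) < nblocks * ext_lba_size:
        raise ValueError("buffer too small for the requested transfer")
    cmd = NvmePassthruCmd()
    cmd.opcode = NVME_CMD_READ
    cmd.nsid = nsid
    cmd.addr = ctypes.addressof(buf)
    cmd.data_len = nblocks * ext_lba_size  # data and metadata travel together
    cmd.cdw10 = slba & 0xFFFFFFFF          # starting LBA, low 32 bits
    cmd.cdw11 = slba >> 32                 # starting LBA, high 32 bits
    cmd.cdw12 = nblocks - 1                # NLB is zero-based; PRINFO left clear
    return cmd


def read_extended(dev_path, nsid, slba, nblocks, ext_lba_size=4096 + 8):
    """Issue the Read via NVME_IOCTL_IO_CMD; returns (status, raw bytes)."""
    buf = ctypes.create_string_buffer(nblocks * ext_lba_size)
    cmd = build_read(nsid, slba, nblocks, buf, ext_lba_size)
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        status = fcntl.ioctl(fd, NVME_IOCTL_IO_CMD, cmd)
    finally:
        os.close(fd)
    return status, buf.raw
```

A call such as read_extended("/dev/nvme0n1", 1, 0, 1) would need root privileges and a namespace formatted with an extended LBA format. The returned bytes would then hold each block's data followed by its metadata.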

> Perhaps I am using it already, because the SSD in question acts as a cache
> device in a ZFS pool. Since ZFS circumvents the normal I/O layer at some
> places, maybe it can use extended LBA PI.

Kernel space can also issue passthrough commands if they really really
want to via REQ_OP_DRV_IN/OUT requests, but I seriously doubt that's
happening. That'd be quite fragile for an out-of-tree filesystem to
attempt.


