linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
@ 2021-04-29 23:07 Robert Straw
  2021-04-30 20:51 ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Straw @ 2021-04-29 23:07 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, Robert Straw

The SM951/PM951, when used in conjunction with the vfio-pci driver and
passed to a KVM guest, can exhibit the fatal state addressed by the
existing `nvme_disable_and_flr` quirk. If the guest cleanly shuts down
the SSD, and vfio-pci attempts an FLR to the device while it is in this 
state, the nvme driver will fail when it attempts to bind to the device 
after the FLR due to the frozen config area, e.g:

  nvme nvme2: frozen state error detected, reset controller
  nvme nvme2: Removing after probe failure status: -12

By including this older model (Samsung 950 PRO) of the controller in the
existing quirk: the device is able to be cleanly reset after being used
by a KVM guest.

Signed-off-by: Robert Straw <drbawb@fatalsyntax.com>
---
 drivers/pci/quirks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3b..e339ca238 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3920,6 +3920,7 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 		reset_ivb_igd },
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M2_VGA,
 		reset_ivb_igd },
+	{ PCI_VENDOR_ID_SAMSUNG, 0xa802, nvme_disable_and_flr },
 	{ PCI_VENDOR_ID_SAMSUNG, 0xa804, nvme_disable_and_flr },
 	{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
 	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
  2021-04-29 23:07 [PATCH] pci: add NVMe FLR quirk to the SM951 SSD Robert Straw
@ 2021-04-30 20:51 ` Bjorn Helgaas
  2021-04-30 22:34   ` Robert Straw
  2021-05-15 17:20   ` Robert Straw
  0 siblings, 2 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2021-04-30 20:51 UTC (permalink / raw)
  To: Robert Straw; +Cc: bhelgaas, linux-pci, linux-kernel, Alex Williamson

[+cc Alex, author of ffb0863426eb]

Please make your subject line match ffb0863426eb ("PCI: Disable
Samsung SM961/PM961 NVMe before FLR")

On Thu, Apr 29, 2021 at 06:07:30PM -0500, Robert Straw wrote:
> The SM951/PM951, when used in conjunction with the vfio-pci driver and
> passed to a KVM guest, can exhibit the fatal state addressed by the
> existing `nvme_disable_and_flr` quirk. If the guest cleanly shuts down
> the SSD, and vfio-pci attempts an FLR to the device while it is in this 
> state, the nvme driver will fail when it attempts to bind to the device 
> after the FLR due to the frozen config area, e.g:
> 
>   nvme nvme2: frozen state error detected, reset controller
>   nvme nvme2: Removing after probe failure status: -12
> 
> By including this older model (Samsung 950 PRO) of the controller in the
> existing quirk: the device is able to be cleanly reset after being used
> by a KVM guest.

I don't see anything in the PCIe spec about software being required to
do something special before initiating an FLR, so I assume this is a
hardware defect in the Samsung 950 PRO?  Has Samsung published an
erratum or at least acknowledged it?

There's always the possibility that we are doing something wrong in
Linux *after* the FLR, e.g., not waiting long enough, not
reinitializing something correctly, etc.

> Signed-off-by: Robert Straw <drbawb@fatalsyntax.com>
> ---
>  drivers/pci/quirks.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3b..e339ca238 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3920,6 +3920,7 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_ivb_igd },
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M2_VGA,
>  		reset_ivb_igd },
> +	{ PCI_VENDOR_ID_SAMSUNG, 0xa802, nvme_disable_and_flr },
>  	{ PCI_VENDOR_ID_SAMSUNG, 0xa804, nvme_disable_and_flr },
>  	{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
>  	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
  2021-04-30 20:51 ` Bjorn Helgaas
@ 2021-04-30 22:34   ` Robert Straw
  2021-05-15 17:20   ` Robert Straw
  1 sibling, 0 replies; 6+ messages in thread
From: Robert Straw @ 2021-04-30 22:34 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: bhelgaas, linux-pci, linux-kernel, Alex Williamson

On Fri Apr 30, 2021 at 3:51 PM CDT, Bjorn Helgaas wrote:
> Please make your subject line match ffb0863426eb ("PCI: Disable
> Samsung SM961/PM961 NVMe before FLR")

Understood, I will send a revision ASAP.

> There's always the possibility that we are doing something wrong in
> Linux *after* the FLR, e.g., not waiting long enough, not
> reinitializing something correctly, etc.

In my experience I was not able to get my particular drive to enter this
state while issuing various types of resets purely from the Linux host. 
The issue only appeared when I pass the device to a KVM guest *and allow
that guest to cleanly shut-down.* The last part is crucial: if the guest 
is forcibly powered off Linux was able to reset the drive just fine.

So I suspect the issue here is related to the interaction between
whatever state the guest leaves the NVMe drive in, and the Linux kernel's 
own reset code triggering some pathological behavior in the controller.

FWIW even a remove/rescan, with an interim suspend to RAM, was not
enough to unfreeze the controller. The only way I've found to get the
device back (apart from this patch) was a full reboot.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
  2021-04-30 20:51 ` Bjorn Helgaas
  2021-04-30 22:34   ` Robert Straw
@ 2021-05-15 17:20   ` Robert Straw
  2021-05-19  8:44     ` Christoph Hellwig
  1 sibling, 1 reply; 6+ messages in thread
From: Robert Straw @ 2021-05-15 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: bhelgaas, linux-pci, linux-kernel, Alex Williamson

On Fri Apr 30, 2021 at 3:51 PM CDT, Bjorn Helgaas wrote:

> Please make your subject line match ffb0863426eb ("PCI: Disable
> Samsung SM961/PM961 NVMe before FLR")

I had done this in a V2 of this patch, but after some additional
research I'm thinking the behavior of this quirk might be in-line 
w/ the NVMe specification more generally, I'll elaborate more below.

> I don't see anything in the PCIe spec about software being required to
> do something special before initiating an FLR, so I assume this is a
> hardware defect in the Samsung 950 PRO? Has Samsung published an
> erratum or at least acknowledged it?
>
> There's always the possibility that we are doing something wrong in
> Linux *after* the FLR, e.g., not waiting long enough, not
> reinitializing something correctly, etc.

I did some dumping of registers both with and without this patch, and
determined the following to be true in my use-case:

1. My guest VM leaves the device in a state where SHN (shutdown
notification) is set to 0b01 (normal shutdown)

2. The guest also leaves CC.EN (controller enable) set to 0b1

3. vfio-pci attempts to issue an FLR while the device is in this state.


On page 40, sec 3.1.6 of the NVMe 1.1 spec, the documentation on SHST 
states the following:

> To start executing commands on the controller after a shutdown 
> operation (CSTS.SHST set to 10b), a reset (CC.EN cleared to ‘0’) 
> is required. If host software submits commands to the controller 
> without issuing a reset, the behavior is undefined.

In the case of the SM951/SM961 it appears the undefined behavior is that
they stop responding to attempts to change their configuration if you do
an FLR while the device is in this state. The reason this patch
resolved the issue I was seeing is because the toggle of the CC.EN flag 
puts the drive in a known-good state after the guest's shutdown 
notification.

Knowing this I would suspect we'd actually want to treat most NVMe
drives in this manner *if the kernel sees the SHN/SHST has been set
prior.* Perhaps other NVMe devices are more tolerant of not doing this?

~ robert straw

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
  2021-05-15 17:20   ` Robert Straw
@ 2021-05-19  8:44     ` Christoph Hellwig
  2021-05-19 12:54       ` Robert Straw
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-05-19  8:44 UTC (permalink / raw)
  To: Robert Straw
  Cc: Bjorn Helgaas, bhelgaas, linux-pci, linux-kernel, Alex Williamson

On Sat, May 15, 2021 at 12:20:05PM -0500, Robert Straw wrote:
> On page 40, sec 3.1.6 of the NVMe 1.1 spec, the documentation on SHST 
> states the following:

While it doesn't matter here, NVMe 1.1 is very much out of data, being
a more than 8 year old specification.  The current version is 1.4b,
with NVMe 2.0 about to be released.

> Knowing this I would suspect we'd actually want to treat most NVMe
> drives in this manner *if the kernel sees the SHN/SHST has been set
> prior.* Perhaps other NVMe devices are more tolerant of not doing this?

No, we don't.  This is a bug particular to a specific implementation.
In fact the whole existing NVMe shutdown before reset quirk is rather
broken and dangerous, as it concurrently accesses the NVMe registers
with the actual driver, which could be trivially triggered through the
sysfs reset attribute.

I'd much rather quirk these broken Samsung drivers to not allow
assigning them to VFIO.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
  2021-05-19  8:44     ` Christoph Hellwig
@ 2021-05-19 12:54       ` Robert Straw
  0 siblings, 0 replies; 6+ messages in thread
From: Robert Straw @ 2021-05-19 12:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, bhelgaas, linux-pci, linux-kernel, Alex Williamson

On Wed May 19, 2021 at 3:44 AM CDT, Christoph Hellwig wrote:
> On Sat, May 15, 2021 at 12:20:05PM -0500, Robert Straw wrote:
> While it doesn't matter here, NVMe 1.1 is very much out of data, being
> a more than 8 year old specification. The current version is 1.4b,
> with NVMe 2.0 about to be released.

I can't comment on 2.0, but yes 1.4b has the same aside regarding undefined
behavior on the SHST field (on p. 50). The only reason I was looking at
1.1a is because it's specifically listed on the datasheet for the SM951.
(The device under test.)

> No, we don't. This is a bug particular to a specific implementation.
> In fact the whole existing NVMe shutdown before reset quirk is rather
> broken and dangerous, as it concurrently accesses the NVMe registers
> with the actual driver, which could be trivially triggered through the
> sysfs reset attribute.

I'm not exactly clear in what way the nvme driver would  be racing against 
vfio-pci here. (a) vfio-pci is the driver bound in this scenario, and (b)
the vfio-pci driver triggers this quirk by issuing an FLR, which is done 
with the device locked. (e.g: vfio_pci.c:499.)

In my testing *without this patch* vfio-pci is still bound to the device 
for at least 60s after guest shutdown, at which point the FLR times out.
After this FLR the device is useless w/o a full reboot of the host. 
Rebinding it to *either* another guest w/ vfio-pci, or the Linux nvme 
driver doesn't matter: as the device can no longer be reconfigured.

As I understand it: vfio-pci should not blindly issue an FLR to an NVMe class 
device w/o obeying the protocol. The protocol seems clear that after a 
shutdown CC->EN must transition from 1 to 0. (I would argue the guest OS 
leaving the device in this state is the actual violation of the spec. As 
I'm unable to change that behavior: having vfio-pci clean up the mess w/ 
this quirk seems to be an adequate workaround.)

I am currently testing  a version of this patch that only disables the
controller if the device has been previously shutdown. I am trying to
gauge whether this would be preferable to just blanket-disabling these 
bugged devices before relinquishing control of them back to the host.

> I'd much rather quirk these broken Samsung drivers to not allow
> assigning them to VFIO.

I'd much rather keep using my storage devices. I will leave the 
quirk limited to these known-bugged devices.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-19 14:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29 23:07 [PATCH] pci: add NVMe FLR quirk to the SM951 SSD Robert Straw
2021-04-30 20:51 ` Bjorn Helgaas
2021-04-30 22:34   ` Robert Straw
2021-05-15 17:20   ` Robert Straw
2021-05-19  8:44     ` Christoph Hellwig
2021-05-19 12:54       ` Robert Straw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).