linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Robert Straw" <drbawb@fatalsyntax.com>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	"Alex Williamson" <alex.williamson@redhat.com>
Subject: Re: [PATCH] pci: add NVMe FLR quirk to the SM951 SSD
Date: Sat, 15 May 2021 12:20:05 -0500	[thread overview]
Message-ID: <CBDZPI1DXKMS.88UVUXVIGC5V@nagato> (raw)
In-Reply-To: <20210430205105.GA683965@bjorn-Precision-5520>

On Fri Apr 30, 2021 at 3:51 PM CDT, Bjorn Helgaas wrote:

> Please make your subject line match ffb0863426eb ("PCI: Disable
> Samsung SM961/PM961 NVMe before FLR")

I had done this in a V2 of this patch, but after some additional
research I'm thinking the behavior of this quirk might be in-line 
w/ the NVMe specification more generally, I'll elaborate more below.

> I don't see anything in the PCIe spec about software being required to
> do something special before initiating an FLR, so I assume this is a
> hardware defect in the Samsung 950 PRO? Has Samsung published an
> erratum or at least acknowledged it?
>
> There's always the possibility that we are doing something wrong in
> Linux *after* the FLR, e.g., not waiting long enough, not
> reinitializing something correctly, etc.

I did some dumping of registers both with and without this patch, and
determined the following to be true in my use-case:

1. My guest VM leaves the device in a state where SHN (shutdown
notification) is set to 0b01 (normal shutdown)

2. The guest also leaves CC.EN (controller enable) set to 0b1

3. vfio-pci attempts to issue an FLR while the device is in this state.


On page 40, sec 3.1.6 of the NVMe 1.1 spec, the documentation on SHST 
states the following:

> To start executing commands on the controller after a shutdown 
> operation (CSTS.SHST set to 10b), a reset (CC.EN cleared to ‘0’) 
> is required. If host software submits commands to the controller 
> without issuing a reset, the behavior is undefined.

In the case of the SM951/SM961 it appears the undefined behavior is that
they stop responding to attempts to change their configuration if you do
an FLR while the device is in this state. The reason this patch
resolved the issue I was seeing is because the toggle of the CC.EN flag 
puts the drive in a known-good state after the guest's shutdown 
notification.

Knowing this I would suspect we'd actually want to treat most NVMe
drives in this manner *if the kernel sees the SHN/SHST has been set
prior.* Perhaps other NVMe devices are more tolerant of not doing this?

~ robert straw

  parent reply	other threads:[~2021-05-15 17:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29 23:07 [PATCH] pci: add NVMe FLR quirk to the SM951 SSD Robert Straw
2021-04-30 20:51 ` Bjorn Helgaas
2021-04-30 22:34   ` Robert Straw
2021-05-15 17:20   ` Robert Straw [this message]
2021-05-19  8:44     ` Christoph Hellwig
2021-05-19 12:54       ` Robert Straw

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CBDZPI1DXKMS.88UVUXVIGC5V@nagato \
    --to=drbawb@fatalsyntax.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).