All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Marcos Scriven <marcos@scriven.org>
Cc: linux-pci@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH] PCI: Avoid FLR for AMD Matisse HD Audio and USB Controllers
Date: Wed, 20 May 2020 18:29:54 -0500	[thread overview]
Message-ID: <20200520232954.GA1124908@bjorn-Precision-5520> (raw)
In-Reply-To: <CAAri2DoF-6A3qcag4etdWh3vQQUGqzfebw6syeU8HFeph5tWQw@mail.gmail.com>

On Wed, May 20, 2020 at 10:41:03AM +0100, Marcos Scriven wrote:
> On Mon, 18 May 2020 at 20:26, Marcos Scriven <marcos@scriven.org> wrote:
> >
> > On Mon, 18 May 2020 at 17:17, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Alex]
> > >
> > > On Sat, May 16, 2020 at 02:37:23PM +0100, Marcos Scriven wrote:
> > > > This patch fixes an FLR bug on the following two devices:
> > > >
> > > > AMD Matisse HD Audio Controller 0x1487
> > > > AMD Matisse USB 3.0 Host Controller 0x149c
> > > >
> > > > As there was already such a quirk for an Intel network device, I have
> > > > renamed that method and updated the comments, trying to make it
> > > > clearer what the specific original devices that were affected are
> > > > (based on the commit message this was original done:
> > > > https://git.thm.de/mhnn55/eco32-linux-ba/commit/f65fd1aa4f9881d5540192d11f7b8ed2fec936db).
> > > >
> > > > I have ordered them by hex product ID.
> > > >
> > > > I have verified this works on a X570 I AORUS PRO WIFI (rev. 1.0) motherboard.
> > >
> > > If we avoid FLR, is there another method used to reset these devices
> > > between attachments to different VMs?  Does the lack of FLR mean we
> > > can leak information between VMs?
> > >
> > > Would additional delay after the FLR work around this, e.g., something
> > > like 51ba09452d11 ("PCI: Delay after FLR of Intel DC P3700 NVMe")?
> > >
> >
> > Thanks for looking at this patch Bjorn.
> >
> > To take your three points:
> >
> > 1. Certainly I can see those devices able to be passed back and forth
> > between host and guest multiple times, once this patch is applied.
> >
> > 2. I don't know the answer to that question; would appreciate guidance
> > on how to determine this. Do you mean perhaps some buffered data in
> > the USB controller, for instance?
> >
> > 3. I have not tried an additional delay. This is the logs I see when
> > the error is occurring:
> >
> > [ 2423.556570] vfio-pci 0000:0c:00.3: not ready 1023ms after FLR; waiting
> > [ 2425.604526] vfio-pci 0000:0c:00.3: not ready 2047ms after FLR; waiting
> > [ 2428.804509] vfio-pci 0000:0c:00.3: not ready 4095ms after FLR; waiting
> > [ 2433.924409] vfio-pci 0000:0c:00.3: not ready 8191ms after FLR; waiting
> > [ 2443.140721] vfio-pci 0000:0c:00.3: not ready 16383ms after FLR; waiting
> > [ 2461.571944] vfio-pci 0000:0c:00.3: not ready 32767ms after FLR; waiting
> > [ 2496.387544] vfio-pci 0000:0c:00.3: not ready 65535ms after FLR; giving up
> >
> > What makes this bug especially bad is the host never recovers, and
> > eventually hangs or crashes.
> >
> > For reference, the delay example you're talking about is:
> >
> > static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
> > {
> > if (!pcie_has_flr(dev))
> > return -ENOTTY;
> >
> > if (probe)
> > return 0;
> >
> > pcie_flr(dev);
> >
> > msleep(250);
> >
> > return 0;
> > }
> >
> > I don't know if it would work, but I will try it out and report back.
> >
> > Marcos
> >
> >
> 
> Bjorn/Alex
> 
> I have just tried the alternate approach of adding a 250ms delay to
> the function level reset - this unfortunately results in the same
> broken behaviour, with the host itself never recovering.
> 
> [   76.905410] vfio-pci 0000:0d:00.3: not ready 1023ms after FLR; waiting
> [   79.018014] vfio-pci 0000:0d:00.3: not ready 2047ms after FLR; waiting
> [   82.089390] vfio-pci 0000:0d:00.3: not ready 4095ms after FLR; waiting
> [   87.209416] vfio-pci 0000:0d:00.3: not ready 8191ms after FLR; waiting
> [   96.425440] vfio-pci 0000:0d:00.3: not ready 16383ms after FLR; waiting
> [  114.615491] vfio-pci 0000:0d:00.3: not ready 32767ms after FLR; waiting
> [  149.417712] vfio-pci 0000:0d:00.3: not ready 65535ms after FLR; giving up
> 
> I also tried a full second, to no avail.
> 
> What would be the next step in proceeding with the original patch please?

Implementation of FLR is "strongly recommended" by the spec but is
optional.  So I don't see a problem with just avoiding it via your
patch.

I applied it to pci/virtualization for v5.8, thanks!

Bjorn

  reply	other threads:[~2020-05-20 23:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-16 13:37 [PATCH] PCI: Avoid FLR for AMD Matisse HD Audio and USB Controllers Marcos Scriven
2020-05-16 15:19 ` Marcos Scriven
2020-05-18 16:17 ` Bjorn Helgaas
2020-05-18 19:26   ` Marcos Scriven
2020-05-20  9:41     ` Marcos Scriven
2020-05-20 23:29       ` Bjorn Helgaas [this message]
2020-05-21  7:05         ` Marcos Scriven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200520232954.GA1124908@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=marcos@scriven.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.