linux-pci.vger.kernel.org archive mirror
* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
       [not found] <CAAri2DpQnrGH5bnjC==W+HmnD4XMh8gcp9u-_LQ=K-jtrdHwAg@mail.gmail.com>
@ 2020-07-13 22:14 ` Bjorn Helgaas
  2020-07-13 22:48   ` Deucher, Alexander
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2020-07-13 22:14 UTC
  To: Marcos Scriven
  Cc: Shah, Nehal-bakulchandra, Deucher, Alexander, Kevin Buettner,
	linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

On Mon, Jul 13, 2020 at 01:44:44PM +0100, Marcos Scriven wrote:
> On Thu, 25 Jun 2020 at 11:22, Marcos Scriven <marcos@scriven.org> wrote:
> > On Tue, 9 Jun 2020 at 12:47, Shah, Nehal-bakulchandra
> > <nehal-bakulchandra.shah@amd.com> wrote:
> > > On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> > > > On Thu, 28 May 2020 at 09:12, Marcos Scriven <marcos@scriven.org>
> > wrote:
> > > >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> > > >> <Alexander.Deucher@amd.com> wrote:
> > > >>>> -----Original Message-----
> > > >>>> From: Bjorn Helgaas <helgaas@kernel.org>
> > > >>>>
> > > >>>> [+cc Alex D, Christian -- do you guys have any contacts or insight
> > into why we
> > > >>>> suddenly have three new AMD devices that advertise FLR support but
> > it
> > > >>>> doesn't work?  Are we doing something wrong in Linux, or are these
> > devices
> > > >>>> defective?
> > > >>> +Nehal who handles our USB drivers.
> > > >>>
> > > >>> Nehal any ideas about FLR or whether it should be advertised?
> > > >>>
> > > Sorry for the delay. We are looking into this with BIOS team. I shall
> > revert soon on this.
> >
> > Sorry to keep pestering about this, but wondering if there's any
> > movement on this?
> >
> > Is it something that's likely to be fixed and actually rolled out by
> > motherboard manufacturers?
> >
> > There's been some grumblings in the community about adding workarounds
> > rather than fixing, so it would be good to pass on expectations here.
> 
> Any word on this please? Would be keen to know if the BIOS can be fixed,
> and this workaround can eventually be dropped.

Just to clarify what the possible outcomes are:

  1) If these AMD devices are defective, but future ones are fixed, we
  keep the quirk.

  2) If these AMD devices are defective *and* future ones are also
  defective, we keep the quirk and keep adding device IDs to it.

  3) If the BIOS is defective, we keep the quirk.  If anybody cares
  about FLR enough, they can make the quirk smart enough to identify
  fixed BIOS versions and enable FLR.  

  4) If Linux is defective, we can fix Linux and drop the quirk.

The ideal outcome would be 4), but we don't have any indication that
Linux is doing something wrong.

What we're really trying to avoid is 2) because that means new devices
will break Linux until somebody figures out the problem again, updates
the quirk, and gets the update into distro kernels.

In case 3), we don't drop the quirk because that forces people to
upgrade their BIOS, and most people will not.  We can't drop the
quirk, reintroduce the problem on old BIOSes, and hide behind the
excuse of "you need to upgrade the BIOS."  That wastes the user's time
and our time.
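
For illustration only, case 3) could look roughly like the userspace
model below.  It is a sketch, not kernel code: struct model_dev,
DEV_FLAG_NO_FLR, bios_known_fixed() and the date arguments are
hypothetical stand-ins, and nobody in this thread has identified a
fixed BIOS version.  Only the device IDs are taken from the
quirk_no_flr list in the patch.

```c
/* Userspace model of a BIOS-aware "no FLR" quirk; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_AMD      0x1022
#define VENDOR_INTEL    0x8086
/* Hypothetical stand-in for PCI_DEV_FLAGS_NO_FLR_RESET. */
#define DEV_FLAG_NO_FLR (1u << 0)

/* Hypothetical, trimmed-down device record. */
struct model_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int flags;
};

/* Device IDs listed for quirk_no_flr in the patch. */
static const struct {
	uint16_t vendor;
	uint16_t device;
} no_flr_ids[] = {
	{ VENDOR_AMD,   0x1487 },	/* Starship/Matisse HD Audio */
	{ VENDOR_AMD,   0x148c },	/* Starship USB 3.0 */
	{ VENDOR_AMD,   0x149c },	/* Matisse USB 3.0 */
	{ VENDOR_INTEL, 0x1502 },	/* 82579LM Gigabit Ethernet */
	{ VENDOR_INTEL, 0x1503 },	/* 82579V Gigabit Ethernet */
};

/*
 * Hypothetical check: dates are YYYYMMDD integers.  A first_fixed_date
 * of 0 means "no fixed BIOS is known", which is the situation today.
 */
static bool bios_known_fixed(uint32_t bios_date, uint32_t first_fixed_date)
{
	return first_fixed_date != 0 && bios_date >= first_fixed_date;
}

/* Set the no-FLR flag for listed devices unless the BIOS is known good. */
static void model_quirk_no_flr(struct model_dev *dev, uint32_t bios_date,
			       uint32_t first_fixed_date)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(no_flr_ids) / sizeof(no_flr_ids[0]); i++) {
		if (dev->vendor != no_flr_ids[i].vendor ||
		    dev->device != no_flr_ids[i].device)
			continue;
		if (!bios_known_fixed(bios_date, first_fixed_date))
			dev->flags |= DEV_FLAG_NO_FLR;
		return;
	}
}
```

Passing 0 as first_fixed_date reproduces the current behavior: every
listed device gets the flag, whatever BIOS it runs on.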

> > > >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> > > >>>>> 43a0c2ce635e..b1db58d00d2b 100644
> > > >>>>> --- a/drivers/pci/quirks.c
> > > >>>>> +++ b/drivers/pci/quirks.c
> > > >>>>> @@ -5133,6 +5133,7 @@
> > > >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > > >>>> quirk_intel_qat_vf_cap);
> > > >>>>>   * FLR may cause the following to devices to hang:
> > > >>>>>   *
> > > >>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
> > > >>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > >>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > >>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503 @@ -5143,6
> > +5144,7
> > > >>>>> @@ static void quirk_no_flr(struct pci_dev *dev)
> > > >>>>>     dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;  }
> > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > > >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c,
> > > >>>> quirk_no_flr);
> > > >>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c,
> > > >>>> quirk_no_flr);
> > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> > > >>>> quirk_no_flr);
> > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> > > >>>> quirk_no_flr);
> > >
> > > Regards
> > >
> > > Nehal Shah
> > >
> >


* RE: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-07-13 22:14 ` [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0 Bjorn Helgaas
@ 2020-07-13 22:48   ` Deucher, Alexander
  2020-08-14  8:52     ` Marcos Scriven
  0 siblings, 1 reply; 11+ messages in thread
From: Deucher, Alexander @ 2020-07-13 22:48 UTC
  To: Bjorn Helgaas, Marcos Scriven
  Cc: Shah, Nehal-bakulchandra, Kevin Buettner, linux-pci,
	Bjorn Helgaas, Alex Williamson, Koenig, Christian

[AMD Public Use]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Monday, July 13, 2020 6:15 PM
> To: Marcos Scriven <marcos@scriven.org>
> Cc: Shah, Nehal-bakulchandra <Nehal-bakulchandra.Shah@amd.com>;
> Deucher, Alexander <Alexander.Deucher@amd.com>; Kevin Buettner
> <kevinb@redhat.com>; linux-pci@vger.kernel.org; Bjorn Helgaas
> <bhelgaas@google.com>; Alex Williamson <alex.williamson@redhat.com>;
> Koenig, Christian <Christian.Koenig@amd.com>
> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> 
> On Mon, Jul 13, 2020 at 01:44:44PM +0100, Marcos Scriven wrote:
> > On Thu, 25 Jun 2020 at 11:22, Marcos Scriven <marcos@scriven.org> wrote:
> > > On Tue, 9 Jun 2020 at 12:47, Shah, Nehal-bakulchandra
> > > <nehal-bakulchandra.shah@amd.com> wrote:
> > > > On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> > > > > On Thu, 28 May 2020 at 09:12, Marcos Scriven
> > > > > <marcos@scriven.org>
> > > wrote:
> > > > >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> > > > >> <Alexander.Deucher@amd.com> wrote:
> > > > >>>> -----Original Message-----
> > > > >>>> From: Bjorn Helgaas <helgaas@kernel.org>
> > > > >>>>
> > > > >>>> [+cc Alex D, Christian -- do you guys have any contacts or
> > > > >>>> insight
> > > into why we
> > > > >>>> suddenly have three new AMD devices that advertise FLR
> > > > >>>> support but
> > > it
> > > > >>>> doesn't work?  Are we doing something wrong in Linux, or are
> > > > >>>> these
> > > devices
> > > > >>>> defective?
> > > > >>> +Nehal who handles our USB drivers.
> > > > >>>
> > > > >>> Nehal any ideas about FLR or whether it should be advertised?
> > > > >>>
> > > > Sorry for the delay. We are looking into this with BIOS team. I
> > > > shall
> > > revert soon on this.
> > >
> > > Sorry to keep pestering about this, but wondering if there's any
> > > movement on this?
> > >
> > > Is it something that's likely to be fixed and actually rolled out by
> > > motherboard manufacturers?
> > >
> > > There's been some grumblings in the community about adding
> > > workarounds rather than fixing, so it would be good to pass on
> expectations here.
> >
> > Any word on this please? Would be keen to know if the BIOS can be
> > fixed, and this workaround can eventually be dropped.
> 
> Just to clarify what the possible outcomes are:
> 
>   1) If these AMD devices are defective, but future ones are fixed, we
>   keep the quirk.
> 
>   2) If these AMD devices are defective *and* future ones are also
>   defective, we keep the quirk and keep adding device IDs to it.
> 
>   3) If the BIOS is defective, we keep the quirk.  If anybody cares
>   about FLR enough, they can make the quirk smart enough to identify
>   fixed BIOS versions and enable FLR.
> 
>   4) If Linux is defective, we can fix Linux and drop the quirk.
> 
> The ideal outcome would be 4), but we don't have any indication that Linux is
> doing something wrong.
> 
> What we're really trying to avoid is 2) because that means new devices will
> break Linux until somebody figures out the problem again, updates the quirk,
> and gets the update into distro kernels.
> 
> In case 3), we don't drop the quirk because that forces people to upgrade
> their BIOS, and most people will not.  We can't drop the quirk, reintroduce
> the problem on old BIOSes, and hide behind the excuse of "you need to
> upgrade the BIOS."  That wastes the user's time and our time.
> 

Understood.  Just trying to find the right people internally to understand what has been validated and productized with respect to FLR on various peripherals.

Alex

> > > > >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > >>>>> index 43a0c2ce635e..b1db58d00d2b 100644
> > > > >>>>> --- a/drivers/pci/quirks.c
> > > > >>>>> +++ b/drivers/pci/quirks.c
> > > > >>>>> @@ -5133,6 +5133,7 @@
> > > > >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > > > >>>> quirk_intel_qat_vf_cap);
> > > > >>>>>   * FLR may cause the following to devices to hang:
> > > > >>>>>   *
> > > > >>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > > >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
> > > > >>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > > >>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > > >>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503 @@
> > > > >>>>> -5143,6
> > > +5144,7
> > > > >>>>> @@ static void quirk_no_flr(struct pci_dev *dev)
> > > > >>>>>     dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;  }
> > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487,
> > > > >>>>> quirk_no_flr);
> > > > >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c,
> > > > >>>> quirk_no_flr);
> > > > >>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c,
> > > > >>>> quirk_no_flr);
> > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> > > > >>>> quirk_no_flr);
> > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> > > > >>>> quirk_no_flr);
> > > >
> > > > Regards
> > > >
> > > > Nehal Shah
> > > >
> > >


* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-07-13 22:48   ` Deucher, Alexander
@ 2020-08-14  8:52     ` Marcos Scriven
  2020-08-14 14:46       ` Deucher, Alexander
  0 siblings, 1 reply; 11+ messages in thread
From: Marcos Scriven @ 2020-08-14  8:52 UTC
  To: Deucher, Alexander
  Cc: Bjorn Helgaas, Shah, Nehal-bakulchandra, Kevin Buettner,
	linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

On Mon, 13 Jul 2020 at 23:48, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>
> [AMD Public Use]
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Monday, July 13, 2020 6:15 PM
> > To: Marcos Scriven <marcos@scriven.org>
> > Cc: Shah, Nehal-bakulchandra <Nehal-bakulchandra.Shah@amd.com>;
> > Deucher, Alexander <Alexander.Deucher@amd.com>; Kevin Buettner
> > <kevinb@redhat.com>; linux-pci@vger.kernel.org; Bjorn Helgaas
> > <bhelgaas@google.com>; Alex Williamson <alex.williamson@redhat.com>;
> > Koenig, Christian <Christian.Koenig@amd.com>
> > Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> >
> > On Mon, Jul 13, 2020 at 01:44:44PM +0100, Marcos Scriven wrote:
> > > On Thu, 25 Jun 2020 at 11:22, Marcos Scriven <marcos@scriven.org> wrote:
> > > > On Tue, 9 Jun 2020 at 12:47, Shah, Nehal-bakulchandra
> > > > <nehal-bakulchandra.shah@amd.com> wrote:
> > > > > On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> > > > > > On Thu, 28 May 2020 at 09:12, Marcos Scriven
> > > > > > <marcos@scriven.org>
> > > > wrote:
> > > > > >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> > > > > >> <Alexander.Deucher@amd.com> wrote:
> > > > > >>>> -----Original Message-----
> > > > > >>>> From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > >>>>
> > > > > >>>> [+cc Alex D, Christian -- do you guys have any contacts or
> > > > > >>>> insight
> > > > into why we
> > > > > >>>> suddenly have three new AMD devices that advertise FLR
> > > > > >>>> support but
> > > > it
> > > > > >>>> doesn't work?  Are we doing something wrong in Linux, or are
> > > > > >>>> these
> > > > devices
> > > > > >>>> defective?
> > > > > >>> +Nehal who handles our USB drivers.
> > > > > >>>
> > > > > >>> Nehal any ideas about FLR or whether it should be advertised?
> > > > > >>>
> > > > > Sorry for the delay. We are looking into this with BIOS team. I
> > > > > shall
> > > > revert soon on this.
> > > >
> > > > Sorry to keep pestering about this, but wondering if there's any
> > > > movement on this?
> > > >
> > > > Is it something that's likely to be fixed and actually rolled out by
> > > > motherboard manufacturers?
> > > >
> > > > There's been some grumblings in the community about adding
> > > > workarounds rather than fixing, so it would be good to pass on
> > expectations here.
> > >
> > > Any word on this please? Would be keen to know if the BIOS can be
> > > fixed, and this workaround can eventually be dropped.
> >
> > Just to clarify what the possible outcomes are:
> >
> >   1) If these AMD devices are defective, but future ones are fixed, we
> >   keep the quirk.
> >
> >   2) If these AMD devices are defective *and* future ones are also
> >   defective, we keep the quirk and keep adding device IDs to it.
> >
> >   3) If the BIOS is defective, we keep the quirk.  If anybody cares
> >   about FLR enough, they can make the quirk smart enough to identify
> >   fixed BIOS versions and enable FLR.
> >
> >   4) If Linux is defective, we can fix Linux and drop the quirk.
> >
> > The ideal outcome would be 4), but we don't have any indication that Linux is
> > doing something wrong.
> >
> > What we're really trying to avoid is 2) because that means new devices will
> > break Linux until somebody figures out the problem again, updates the quirk,
> > and gets the update into distro kernels.
> >
> > In case 3), we don't drop the quirk because that forces people to upgrade
> > their BIOS, and most people will not.  We can't drop the quirk, reintroduce
> > the problem on old BIOSes, and hide behind the excuse of "you need to
> > upgrade the BIOS."  That wastes the user's time and our time.
> >
>
> Understood.  Just trying to find the right people internally to understand what has been validated and productized with respect to FLR on various peripherals.
>
> Alex
>

Hi Alex

Sorry to keep bugging - wondering if you'd had any success finding the
people internally to look at this?

My main personal concern is that I faced some criticism from users for
submitting the quirk, as people felt that it took the pressure off AMD
to fix the problem.

Thanks

Marcos

> > > > > >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > > >>>>> index 43a0c2ce635e..b1db58d00d2b 100644
> > > > > >>>>> --- a/drivers/pci/quirks.c
> > > > > >>>>> +++ b/drivers/pci/quirks.c
> > > > > >>>>> @@ -5133,6 +5133,7 @@
> > > > > >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > > > > >>>> quirk_intel_qat_vf_cap);
> > > > > >>>>>   * FLR may cause the following to devices to hang:
> > > > > >>>>>   *
> > > > > >>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > > > >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
> > > > > >>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > > > >>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > > > >>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503 @@
> > > > > >>>>> -5143,6
> > > > +5144,7
> > > > > >>>>> @@ static void quirk_no_flr(struct pci_dev *dev)
> > > > > >>>>>     dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;  }
> > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487,
> > > > > >>>>> quirk_no_flr);
> > > > > >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c,
> > > > > >>>> quirk_no_flr);
> > > > > >>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c,
> > > > > >>>> quirk_no_flr);
> > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> > > > > >>>> quirk_no_flr);
> > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> > > > > >>>> quirk_no_flr);
> > > > >
> > > > > Regards
> > > > >
> > > > > Nehal Shah
> > > > >
> > > >


* RE: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-08-14  8:52     ` Marcos Scriven
@ 2020-08-14 14:46       ` Deucher, Alexander
  0 siblings, 0 replies; 11+ messages in thread
From: Deucher, Alexander @ 2020-08-14 14:46 UTC
  To: Marcos Scriven
  Cc: Bjorn Helgaas, Shah, Nehal-bakulchandra, Kevin Buettner,
	linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

[AMD Public Use]

> -----Original Message-----
> From: Marcos Scriven <marcos@scriven.org>
> Sent: Friday, August 14, 2020 4:53 AM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>; Shah, Nehal-bakulchandra <Nehal-
> bakulchandra.Shah@amd.com>; Kevin Buettner <kevinb@redhat.com>;
> linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
> Williamson <alex.williamson@redhat.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> 
> On Mon, 13 Jul 2020 at 23:48, Deucher, Alexander
> <Alexander.Deucher@amd.com> wrote:
> >
> > [AMD Public Use]
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: Monday, July 13, 2020 6:15 PM
> > > To: Marcos Scriven <marcos@scriven.org>
> > > Cc: Shah, Nehal-bakulchandra <Nehal-bakulchandra.Shah@amd.com>;
> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Kevin Buettner
> > > <kevinb@redhat.com>; linux-pci@vger.kernel.org; Bjorn Helgaas
> > > <bhelgaas@google.com>; Alex Williamson
> <alex.williamson@redhat.com>;
> > > Koenig, Christian <Christian.Koenig@amd.com>
> > > Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> > >
> > > On Mon, Jul 13, 2020 at 01:44:44PM +0100, Marcos Scriven wrote:
> > > > On Thu, 25 Jun 2020 at 11:22, Marcos Scriven <marcos@scriven.org>
> wrote:
> > > > > On Tue, 9 Jun 2020 at 12:47, Shah, Nehal-bakulchandra
> > > > > <nehal-bakulchandra.shah@amd.com> wrote:
> > > > > > On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> > > > > > > On Thu, 28 May 2020 at 09:12, Marcos Scriven
> > > > > > > <marcos@scriven.org>
> > > > > wrote:
> > > > > > >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> > > > > > >> <Alexander.Deucher@amd.com> wrote:
> > > > > > >>>> -----Original Message-----
> > > > > > >>>> From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > > >>>>
> > > > > > >>>> [+cc Alex D, Christian -- do you guys have any contacts
> > > > > > >>>> or insight
> > > > > into why we
> > > > > > >>>> suddenly have three new AMD devices that advertise FLR
> > > > > > >>>> support but
> > > > > it
> > > > > > >>>> doesn't work?  Are we doing something wrong in Linux, or
> > > > > > >>>> are these
> > > > > devices
> > > > > > >>>> defective?
> > > > > > >>> +Nehal who handles our USB drivers.
> > > > > > >>>
> > > > > > >>> Nehal any ideas about FLR or whether it should be advertised?
> > > > > > >>>
> > > > > > Sorry for the delay. We are looking into this with BIOS team.
> > > > > > I shall
> > > > > revert soon on this.
> > > > >
> > > > > Sorry to keep pestering about this, but wondering if there's any
> > > > > movement on this?
> > > > >
> > > > > Is it something that's likely to be fixed and actually rolled
> > > > > out by motherboard manufacturers?
> > > > >
> > > > > There's been some grumblings in the community about adding
> > > > > workarounds rather than fixing, so it would be good to pass on
> > > expectations here.
> > > >
> > > > Any word on this please? Would be keen to know if the BIOS can be
> > > > fixed, and this workaround can eventually be dropped.
> > >
> > > Just to clarify what the possible outcomes are:
> > >
> > >   1) If these AMD devices are defective, but future ones are fixed, we
> > >   keep the quirk.
> > >
> > >   2) If these AMD devices are defective *and* future ones are also
> > >   defective, we keep the quirk and keep adding device IDs to it.
> > >
> > >   3) If the BIOS is defective, we keep the quirk.  If anybody cares
> > >   about FLR enough, they can make the quirk smart enough to identify
> > >   fixed BIOS versions and enable FLR.
> > >
> > >   4) If Linux is defective, we can fix Linux and drop the quirk.
> > >
> > > The ideal outcome would be 4), but we don't have any indication that
> > > Linux is doing something wrong.
> > >
> > > What we're really trying to avoid is 2) because that means new
> > > devices will break Linux until somebody figures out the problem
> > > again, updates the quirk, and gets the update into distro kernels.
> > >
> > > In case 3), we don't drop the quirk because that forces people to
> > > upgrade their BIOS, and most people will not.  We can't drop the
> > > quirk, reintroduce the problem on old BIOSes, and hide behind the
> > > excuse of "you need to upgrade the BIOS."  That wastes the user's time
> and our time.
> > >
> >
> > Understood.  Just trying to find the right people internally to understand
> what has been validated and productized with respect to FLR on various
> peripherals.
> >
> > Alex
> >
> 
> Hi Alex
> 
> Sorry to keep bugging - wondering if you'd had any success finding the
> people internally to look at this?

Sorry, I have not.

> 
> My main personal concern is that I faced some criticism from users in
> submitting the quirk, as people felt that took the pressure off AMD to fix.

There is hardware out there that apparently needs the quirk, so even if it were a BIOS issue or something like that, that doesn't help the hardware that is already out in the wild.  If there is ultimately some programming fix, we can always revert the patch once that is available.  If FLR is actually broken, then there is nothing to fix; the quirk is correct.

Alex

> 
> Thanks
> 
> Marcos
> 
> > > > > > >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > > > >>>>> index 43a0c2ce635e..b1db58d00d2b 100644
> > > > > > >>>>> --- a/drivers/pci/quirks.c
> > > > > > >>>>> +++ b/drivers/pci/quirks.c
> > > > > > >>>>> @@ -5133,6 +5133,7 @@
> > > > > > >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > > > > > >>>> quirk_intel_qat_vf_cap);
> > > > > > >>>>>   * FLR may cause the following to devices to hang:
> > > > > > >>>>>   *
> > > > > > >>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > > > > >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
> > > > > > >>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > > > > >>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > > > > >>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503 @@
> > > > > > >>>>> -5143,6
> > > > > +5144,7
> > > > > > >>>>> @@ static void quirk_no_flr(struct pci_dev *dev)
> > > > > > >>>>>     dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;  }
> > > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487,
> > > > > > >>>>> quirk_no_flr);
> > > > > > >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD,
> 0x148c,
> > > > > > >>>> quirk_no_flr);
> > > > > > >>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD,
> 0x149c,
> > > > > > >>>> quirk_no_flr);
> > > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL,
> 0x1502,
> > > > > > >>>> quirk_no_flr);
> > > > > > >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL,
> 0x1503,
> > > > > > >>>> quirk_no_flr);
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Nehal Shah
> > > > > >
> > > > >


* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-06-09 11:47         ` Shah, Nehal-bakulchandra
@ 2020-06-25 10:22           ` Marcos Scriven
  0 siblings, 0 replies; 11+ messages in thread
From: Marcos Scriven @ 2020-06-25 10:22 UTC
  To: Shah, Nehal-bakulchandra
  Cc: Deucher, Alexander, Bjorn Helgaas, Kevin Buettner, linux-pci,
	Bjorn Helgaas, Alex Williamson, Koenig, Christian

On Tue, 9 Jun 2020 at 12:47, Shah, Nehal-bakulchandra
<nehal-bakulchandra.shah@amd.com> wrote:
>
> Hi
>
> On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> > On Thu, 28 May 2020 at 09:12, Marcos Scriven <marcos@scriven.org> wrote:
> >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> >> <Alexander.Deucher@amd.com> wrote:
> >>> [AMD Official Use Only - Internal Distribution Only]
> >>>
> >>>> -----Original Message-----
> >>>> From: Bjorn Helgaas <helgaas@kernel.org>
> >>>> Sent: Wednesday, May 27, 2020 5:32 PM
> >>>> To: Kevin Buettner <kevinb@redhat.com>
> >>>> Cc: linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
> >>>> Williamson <alex.williamson@redhat.com>; Deucher, Alexander
> >>>> <Alexander.Deucher@amd.com>; Koenig, Christian
> >>>> <Christian.Koenig@amd.com>
> >>>> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> >>>>
> >>>> [+cc Alex D, Christian -- do you guys have any contacts or insight into why we
> >>>> suddenly have three new AMD devices that advertise FLR support but it
> >>>> doesn't work?  Are we doing something wrong in Linux, or are these devices
> >>>> defective?
> >>> +Nehal who handles our USB drivers.
> >>>
> >>> Nehal any ideas about FLR or whether it should be advertised?
> >>>
> >>> Alex
> >>>
> Sorry for the delay. We are looking into this with BIOS team. I shall revert soon on this.
>
>

Hi Nehal

Sorry to keep pestering about this, but wondering if there's any
movement on this?

Is it something that's likely to be fixed and actually rolled out by
motherboard manufacturers?

There have been some grumblings in the community about adding
workarounds rather than fixing the problem, so it would be good to
pass on expectations here.

Marcos

> >> I had read somewhere that the IO die in the Ryzen/Threadripper
> >> packages are identical to the ones used in the motherboard chipsets.
> >>
> >> Since the latter do reset ok, it would seem a BIOS update of the AGESA
> >> may potentially fix the issue.
> >>
> >> Unfortunately, it's not something motherboard manufacturer's customer
> >> support people know how to deal with or pass back up the chain to AMD
> >> engineers. Actual use of this feature seems to be fairly niche.
> >>
> >> After I added the workaround for the USB and audio controllers on the
> >> 3rd-gen Ryzen, I tried contacting Kim Phillips (who I found as a
> >> kernel committer to x86/cpu/amd), but haven't heard back.
> >>
> >> It would be wonderful to know if this can potentially be fixed in CPU
> >> firmware, and whether there's any likelihood of it actually being
> >> distributed by motherboard manufacturers.
> >>
> >> Marcos
> >>
> >>
> >>
> > Dear Alex/Nehal
> >
> > I wonder if you're able to comment please on whether FLR should be advertised?
> >
> > Is there any chance this could be fixed at the bios/AGESA level, and
> > effectively rolled out?
> >
> > Thanks
> >
> > Marcos
> >
> >>>> https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
> >>>>   AMD Starship USB 3.0 host controller
> >>>> https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
> >>>>   AMD Matisse HD Audio & USB 3.0 host controller ]
> >>>>
> >>>> On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
> >>>>> This commit adds an entry to the quirk_no_flr table for the AMD
> >>>>> Starship USB 3.0 host controller.
> >>>>>
> >>>>> Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
> >>>>> motherboard with an AMD Ryzen Threadripper 3970X.
> >>>>>
> >>>>> Without this patch, when attempting to assign (pass through) an AMD
> >>>>> Starship USB 3.0 host controller to a guest OS, the system becomes
> >>>>> increasingly unresponsive over the course of several minutes,
> >>>>> eventually requiring a hard reset.
> >>>>>
> >>>>> Shortly after attempting to start the guest, I see these messages:
> >>>>>
> >>>>> May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
> >>>>> May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
> >>>>> May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
> >>>>> May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
> >>>>>
> >>>>> And then eventually:
> >>>>>
> >>>>> May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
> >>>>> May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
> >>>>> May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> >>>>> May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
> >>>>> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
> >>>>> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
> >>>>> ...
> >>>>>  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> >>>>> May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> >>>>>
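
The "not ready" values quoted above (1023, 2047, 4095, 8191, ...,
65535) are consistent with a wait loop that doubles its delay on every
attempt.  A minimal userspace sketch of that pattern follows.  The
1 ms initial delay, the 1000 ms threshold before logging starts, the
60000 ms timeout, and the name model_flr_wait() are assumptions chosen
to match the log; they are not taken from this thread.

```c
/* Userspace model of a doubling-delay wait after FLR; NOT kernel code. */
#include <assert.h>
#include <stdio.h>

#define RESET_WAIT_MS 1000	/* assumed: stay quiet for the first second */
#define READY_POLL_MS 60000	/* assumed: give up once delay exceeds this */

/*
 * Model a device that never becomes ready again.  Returns the number
 * of milliseconds reported in the "giving up" message and stores the
 * number of "waiting" messages printed in *waiting_msgs.
 */
static int model_flr_wait(int timeout_ms, int *waiting_msgs)
{
	int delay = 1;

	assert(waiting_msgs != NULL);
	*waiting_msgs = 0;

	for (;;) {
		if (delay > timeout_ms) {
			printf("not ready %dms after FLR; giving up\n",
			       delay - 1);
			return delay - 1;
		}
		if (delay > RESET_WAIT_MS) {
			printf("not ready %dms after FLR; waiting\n",
			       delay - 1);
			(*waiting_msgs)++;
		}
		/* A real loop would sleep for `delay` ms and re-probe here. */
		delay *= 2;
	}
}
```

Because the delays sum to 1 + 2 + ... + 2^(n-1) = 2^n - 1, the value
printed is the total time already spent sleeping, which is why every
figure in the log is one less than a power of two.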
> >>>>> The above log snippets were obtained using the aforementioned hardware
> >>>>> running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
> >>>>> fix was applied to a local copy of the F32 kernel package, then
> >>>>> rebuilt, etc.
> >>>>>
> >>>>> With this patch in place, the host kernel doesn't exhibit these
> >>>>> problems.  The guest OS (also Fedora 32) starts up and works as
> >>>>> expected with the passed-through USB host controller.
> >>>>>
> >>>>> Signed-off-by: Kevin Buettner <kevinb@redhat.com>
> >>>> Applied to pci/virtualization for v5.8, thanks!
> >>>>
> >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> >>>>> 43a0c2ce635e..b1db58d00d2b 100644
> >>>>> --- a/drivers/pci/quirks.c
> >>>>> +++ b/drivers/pci/quirks.c
> >>>>> @@ -5133,6 +5133,7 @@
> >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> >>>> quirk_intel_qat_vf_cap);
> >>>>>   * FLR may cause the following to devices to hang:
> >>>>>   *
> >>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
> >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
> >>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
> >>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> >>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503 @@ -5143,6 +5144,7
> >>>>> @@ static void quirk_no_flr(struct pci_dev *dev)
> >>>>>     dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;  }
> >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c,
> >>>> quirk_no_flr);
> >>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c,
> >>>> quirk_no_flr);
> >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> >>>> quirk_no_flr);
> >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> >>>> quirk_no_flr);
>
> Regards
>
> Nehal Shah
>


* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-06-08 17:47       ` Marcos Scriven
@ 2020-06-09 11:47         ` Shah, Nehal-bakulchandra
  2020-06-25 10:22           ` Marcos Scriven
  0 siblings, 1 reply; 11+ messages in thread
From: Shah, Nehal-bakulchandra @ 2020-06-09 11:47 UTC (permalink / raw)
  To: Marcos Scriven, Deucher, Alexander
  Cc: Bjorn Helgaas, Kevin Buettner, linux-pci, Bjorn Helgaas,
	Alex Williamson, Koenig, Christian

Hi

On 6/8/2020 11:17 PM, Marcos Scriven wrote:
> On Thu, 28 May 2020 at 09:12, Marcos Scriven <marcos@scriven.org> wrote:
>> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
>> <Alexander.Deucher@amd.com> wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>>> -----Original Message-----
>>>> From: Bjorn Helgaas <helgaas@kernel.org>
>>>> Sent: Wednesday, May 27, 2020 5:32 PM
>>>> To: Kevin Buettner <kevinb@redhat.com>
>>>> Cc: linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
>>>> Williamson <alex.williamson@redhat.com>; Deucher, Alexander
>>>> <Alexander.Deucher@amd.com>; Koenig, Christian
>>>> <Christian.Koenig@amd.com>
>>>> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
>>>>
>>>> [+cc Alex D, Christian -- do you guys have any contacts or insight into why we
>>>> suddenly have three new AMD devices that advertise FLR support but it
>>>> doesn't work?  Are we doing something wrong in Linux, or are these devices
>>>> defective?
>>> +Nehal who handles our USB drivers.
>>>
>>> Nehal any ideas about FLR or whether it should be advertised?
>>>
>>> Alex
>>>
Sorry for the delay. We are looking into this with the BIOS team. I shall get back to you soon on this.


>> I had read somewhere that the IO die in the Ryzen/Threadripper
>> packages is identical to the one used in the motherboard chipsets.
>>
>> Since the latter do reset ok, it would seem a BIOS update of the AGESA
>> may potentially fix the issue.
>>
>> Unfortunately, it's not something motherboard manufacturers' customer
>> support people know how to deal with or pass back up the chain to AMD
>> engineers. Actual use of this feature seems to be fairly niche.
>>
>> After I added the workaround for the USB and audio controllers on the
>> 3rd-gen Ryzen, I tried contacting Kim Phillips (who I found as a
>> kernel committer to x86/cpu/amd), but haven't heard back.
>>
>> It would be wonderful to know if this can potentially be fixed in CPU
>> firmware, and whether there's any likelihood of it actually being
>> distributed by motherboard manufacturers.
>>
>> Marcos
>>
>>
>>
> Dear Alex/Nehal
>
> I wonder if you're able to comment please on whether FLR should be advertised?
>
> Is there any chance this could be fixed at the bios/AGESA level, and
> effectively rolled out?
>
> Thanks
>
> Marcos
>
>>>> https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
>>>>   AMD Starship USB 3.0 host controller
>>>> https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
>>>>   AMD Matisse HD Audio & USB 3.0 host controller ]
>>>>
>>>> On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
>>>>> This commit adds an entry to the quirk_no_flr table for the AMD
>>>>> Starship USB 3.0 host controller.
>>>>>
>>>>> Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
>>>>> motherboard with an AMD Ryzen Threadripper 3970X.
>>>>>
>>>>> Without this patch, when attempting to assign (pass through) an AMD
>>>>> Starship USB 3.0 host controller to a guest OS, the system becomes
>>>>> increasingly unresponsive over the course of several minutes,
>>>>> eventually requiring a hard reset.
>>>>>
>>>>> Shortly after attempting to start the guest, I see these messages:
>>>>>
>>>>> May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
>>>>> May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
>>>>> May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
>>>>> May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
>>>>>
>>>>> And then eventually:
>>>>>
>>>>> May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
>>>>> May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
>>>>> May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
>>>>> May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
>>>>> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
>>>>> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
>>>>> ...
>>>>>  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
>>>>> May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
>>>>>
>>>>> The above log snippets were obtained using the aforementioned hardware
>>>>> running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
>>>>> fix was applied to a local copy of the F32 kernel package, then
>>>>> rebuilt, etc.
>>>>>
>>>>> With this patch in place, the host kernel doesn't exhibit these
>>>>> problems.  The guest OS (also Fedora 32) starts up and works as
>>>>> expected with the passed-through USB host controller.
>>>>>
>>>>> Signed-off-by: Kevin Buettner <kevinb@redhat.com>
>>>> Applied to pci/virtualization for v5.8, thanks!
>>>>
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 43a0c2ce635e..b1db58d00d2b 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>>>>>   * FLR may cause the following to devices to hang:
>>>>>   *
>>>>>   * AMD Starship/Matisse HD Audio Controller 0x1487
>>>>> + * AMD Starship USB 3.0 Host Controller 0x148c
>>>>>   * AMD Matisse USB 3.0 Host Controller 0x149c
>>>>>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
>>>>>   * Intel 82579V Gigabit Ethernet Controller 0x1503
>>>>> @@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
>>>>>  	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>>>>>  }
>>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
>>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
>>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
>>>>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);

Regards

Nehal Shah



* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-05-28  8:12     ` Marcos Scriven
@ 2020-06-08 17:47       ` Marcos Scriven
  2020-06-09 11:47         ` Shah, Nehal-bakulchandra
  0 siblings, 1 reply; 11+ messages in thread
From: Marcos Scriven @ 2020-06-08 17:47 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Bjorn Helgaas, Kevin Buettner, Shah, Nehal-bakulchandra,
	linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

On Thu, 28 May 2020 at 09:12, Marcos Scriven <marcos@scriven.org> wrote:
>
> On Wed, 27 May 2020 at 22:42, Deucher, Alexander
> <Alexander.Deucher@amd.com> wrote:
> >
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: Wednesday, May 27, 2020 5:32 PM
> > > To: Kevin Buettner <kevinb@redhat.com>
> > > Cc: linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
> > > Williamson <alex.williamson@redhat.com>; Deucher, Alexander
> > > <Alexander.Deucher@amd.com>; Koenig, Christian
> > > <Christian.Koenig@amd.com>
> > > Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> > >
> > > [+cc Alex D, Christian -- do you guys have any contacts or insight into why we
> > > suddenly have three new AMD devices that advertise FLR support but it
> > > doesn't work?  Are we doing something wrong in Linux, or are these devices
> > > defective?
> >
> > +Nehal who handles our USB drivers.
> >
> > Nehal any ideas about FLR or whether it should be advertised?
> >
> > Alex
> >
>
> I had read somewhere that the IO die in the Ryzen/Threadripper
> packages is identical to the one used in the motherboard chipsets.
>
> Since the latter do reset ok, it would seem a BIOS update of the AGESA
> may potentially fix the issue.
>
> Unfortunately, it's not something motherboard manufacturers' customer
> support people know how to deal with or pass back up the chain to AMD
> engineers. Actual use of this feature seems to be fairly niche.
>
> After I added the workaround for the USB and audio controllers on the
> 3rd-gen Ryzen, I tried contacting Kim Phillips (who I found as a
> kernel committer to x86/cpu/amd), but haven't heard back.
>
> It would be wonderful to know if this can potentially be fixed in CPU
> firmware, and whether there's any likelihood of it actually being
> distributed by motherboard manufacturers.
>
> Marcos
>
>
>

Dear Alex/Nehal

I wonder if you're able to comment please on whether FLR should be advertised?

Is there any chance this could be fixed at the bios/AGESA level, and
effectively rolled out?

Thanks

Marcos

> > >
> > > https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
> > >   AMD Starship USB 3.0 host controller
> > > https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
> > >   AMD Matisse HD Audio & USB 3.0 host controller ]
> > >
> > > On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
> > > > This commit adds an entry to the quirk_no_flr table for the AMD
> > > > Starship USB 3.0 host controller.
> > > >
> > > > Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
> > > > motherboard with an AMD Ryzen Threadripper 3970X.
> > > >
> > > > Without this patch, when attempting to assign (pass through) an AMD
> > > > Starship USB 3.0 host controller to a guest OS, the system becomes
> > > > increasingly unresponsive over the course of several minutes,
> > > > eventually requiring a hard reset.
> > > >
> > > > Shortly after attempting to start the guest, I see these messages:
> > > >
> > > > May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
> > > > May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
> > > > May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
> > > > May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
> > > >
> > > > And then eventually:
> > > >
> > > > May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
> > > > May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
> > > > May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> > > > May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
> > > > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
> > > > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
> > > > ...
> > > >  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> > > > May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> > > >
> > > > The above log snippets were obtained using the aforementioned hardware
> > > > running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
> > > > fix was applied to a local copy of the F32 kernel package, then
> > > > rebuilt, etc.
> > > >
> > > > With this patch in place, the host kernel doesn't exhibit these
> > > > problems.  The guest OS (also Fedora 32) starts up and works as
> > > > expected with the passed-through USB host controller.
> > > >
> > > > Signed-off-by: Kevin Buettner <kevinb@redhat.com>
> > >
> > > Applied to pci/virtualization for v5.8, thanks!
> > >
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 43a0c2ce635e..b1db58d00d2b 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> > > >   * FLR may cause the following to devices to hang:
> > > >   *
> > > >   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > > + * AMD Starship USB 3.0 Host Controller 0x148c
> > > >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > >   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > >   * Intel 82579V Gigabit Ethernet Controller 0x1503
> > > > @@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> > > >  	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> > > >  }
> > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
> > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> > > >


* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-05-27 21:42   ` Deucher, Alexander
@ 2020-05-28  8:12     ` Marcos Scriven
  2020-06-08 17:47       ` Marcos Scriven
  0 siblings, 1 reply; 11+ messages in thread
From: Marcos Scriven @ 2020-05-28  8:12 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Bjorn Helgaas, Kevin Buettner, Shah, Nehal-bakulchandra,
	linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

On Wed, 27 May 2020 at 22:42, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Wednesday, May 27, 2020 5:32 PM
> > To: Kevin Buettner <kevinb@redhat.com>
> > Cc: linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
> > Williamson <alex.williamson@redhat.com>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Koenig, Christian
> > <Christian.Koenig@amd.com>
> > Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> >
> > [+cc Alex D, Christian -- do you guys have any contacts or insight into why we
> > suddenly have three new AMD devices that advertise FLR support but it
> > doesn't work?  Are we doing something wrong in Linux, or are these devices
> > defective?
>
> +Nehal who handles our USB drivers.
>
> Nehal any ideas about FLR or whether it should be advertised?
>
> Alex
>

I had read somewhere that the IO die in the Ryzen/Threadripper
packages is identical to the one used in the motherboard chipsets.

Since the latter do reset ok, it would seem a BIOS update of the AGESA
may potentially fix the issue.

Unfortunately, it's not something motherboard manufacturers' customer
support people know how to deal with or pass back up the chain to AMD
engineers. Actual use of this feature seems to be fairly niche.

After I added the workaround for the USB and audio controllers on the
3rd-gen Ryzen, I tried contacting Kim Phillips (who I found as a
kernel committer to x86/cpu/amd), but haven't heard back.

It would be wonderful to know if this can potentially be fixed in CPU
firmware, and whether there's any likelihood of it actually being
distributed by motherboard manufacturers.

Marcos



> >
> > https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
> >   AMD Starship USB 3.0 host controller
> > https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
> >   AMD Matisse HD Audio & USB 3.0 host controller ]
> >
> > On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
> > > This commit adds an entry to the quirk_no_flr table for the AMD
> > > Starship USB 3.0 host controller.
> > >
> > > Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
> > > motherboard with an AMD Ryzen Threadripper 3970X.
> > >
> > > Without this patch, when attempting to assign (pass through) an AMD
> > > Starship USB 3.0 host controller to a guest OS, the system becomes
> > > increasingly unresponsive over the course of several minutes,
> > > eventually requiring a hard reset.
> > >
> > > Shortly after attempting to start the guest, I see these messages:
> > >
> > > May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
> > > May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
> > > May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
> > > May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
> > >
> > > And then eventually:
> > >
> > > May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
> > > May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
> > > May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> > > May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
> > > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
> > > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
> > > ...
> > >  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> > > May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> > >
> > > The above log snippets were obtained using the aforementioned hardware
> > > running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
> > > fix was applied to a local copy of the F32 kernel package, then
> > > rebuilt, etc.
> > >
> > > With this patch in place, the host kernel doesn't exhibit these
> > > problems.  The guest OS (also Fedora 32) starts up and works as
> > > expected with the passed-through USB host controller.
> > >
> > > Signed-off-by: Kevin Buettner <kevinb@redhat.com>
> >
> > Applied to pci/virtualization for v5.8, thanks!
> >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 43a0c2ce635e..b1db58d00d2b 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> > >   * FLR may cause the following to devices to hang:
> > >   *
> > >   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > + * AMD Starship USB 3.0 Host Controller 0x148c
> > >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > >   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > >   * Intel 82579V Gigabit Ethernet Controller 0x1503
> > > @@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> > >  	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> > >  }
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> > >


* RE: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-05-27 21:31 ` Bjorn Helgaas
@ 2020-05-27 21:42   ` Deucher, Alexander
  2020-05-28  8:12     ` Marcos Scriven
  0 siblings, 1 reply; 11+ messages in thread
From: Deucher, Alexander @ 2020-05-27 21:42 UTC (permalink / raw)
  To: Bjorn Helgaas, Kevin Buettner, Shah, Nehal-bakulchandra
  Cc: linux-pci, Bjorn Helgaas, Alex Williamson, Koenig, Christian

[AMD Official Use Only - Internal Distribution Only]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, May 27, 2020 5:32 PM
> To: Kevin Buettner <kevinb@redhat.com>
> Cc: linux-pci@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Alex
> Williamson <alex.williamson@redhat.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
> 
> [+cc Alex D, Christian -- do you guys have any contacts or insight into why we
> suddenly have three new AMD devices that advertise FLR support but it
> doesn't work?  Are we doing something wrong in Linux, or are these devices
> defective?

+Nehal who handles our USB drivers.

Nehal any ideas about FLR or whether it should be advertised?  

Alex


> 
> https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
>   AMD Starship USB 3.0 host controller
> https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
>   AMD Matisse HD Audio & USB 3.0 host controller ]
> 
> On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
> > This commit adds an entry to the quirk_no_flr table for the AMD
> > Starship USB 3.0 host controller.
> >
> > Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
> > motherboard with an AMD Ryzen Threadripper 3970X.
> >
> > Without this patch, when attempting to assign (pass through) an AMD
> > Starship USB 3.0 host controller to a guest OS, the system becomes
> > increasingly unresponsive over the course of several minutes,
> > eventually requiring a hard reset.
> >
> > Shortly after attempting to start the guest, I see these messages:
> >
> > May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
> > May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
> > May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
> > May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
> >
> > And then eventually:
> >
> > May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
> > May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
> > May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> > May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
> > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
> > May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
> > ...
> >  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> > May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> >
> > The above log snippets were obtained using the aforementioned hardware
> > running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
> > fix was applied to a local copy of the F32 kernel package, then
> > rebuilt, etc.
> >
> > With this patch in place, the host kernel doesn't exhibit these
> > problems.  The guest OS (also Fedora 32) starts up and works as
> > expected with the passed-through USB host controller.
> >
> > Signed-off-by: Kevin Buettner <kevinb@redhat.com>
> 
> Applied to pci/virtualization for v5.8, thanks!
> 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 43a0c2ce635e..b1db58d00d2b 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> >   * FLR may cause the following to devices to hang:
> >   *
> >   * AMD Starship/Matisse HD Audio Controller 0x1487
> > + * AMD Starship USB 3.0 Host Controller 0x148c
> >   * AMD Matisse USB 3.0 Host Controller 0x149c
> >   * Intel 82579LM Gigabit Ethernet Controller 0x1502
> >   * Intel 82579V Gigabit Ethernet Controller 0x1503
> > @@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> >  	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> >  }
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> >


* Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
  2020-05-24  7:35 Kevin Buettner
@ 2020-05-27 21:31 ` Bjorn Helgaas
  2020-05-27 21:42   ` Deucher, Alexander
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2020-05-27 21:31 UTC (permalink / raw)
  To: Kevin Buettner
  Cc: linux-pci, Bjorn Helgaas, Alex Williamson, Alex Deucher,
	Christian König

[+cc Alex D, Christian -- do you guys have any contacts or insight
into why we suddenly have three new AMD devices that advertise FLR
support but it doesn't work?  Are we doing something wrong in Linux,
or are these devices defective?

https://lore.kernel.org/r/20200524003529.598434ff@f31-4.lan
  AMD Starship USB 3.0 host controller
https://lore.kernel.org/r/CAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4zmt1RA@mail.gmail.com
  AMD Matisse HD Audio & USB 3.0 host controller
]

On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote:
> This commit adds an entry to the quirk_no_flr table for the AMD
> Starship USB 3.0 host controller.
> 
> Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
> motherboard with an AMD Ryzen Threadripper 3970X.
> 
> Without this patch, when attempting to assign (pass through) an AMD
> Starship USB 3.0 host controller to a guest OS, the system becomes
> increasingly unresponsive over the course of several minutes,
> eventually requiring a hard reset.
> 
> Shortly after attempting to start the guest, I see these messages:
> 
> May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
> May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
> May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
> May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
> 
> And then eventually:
> 
> May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
> May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
> May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
> May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
> ...
>  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
> 
> The above log snippets were obtained using the aforementioned hardware
> running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
> fix was applied to a local copy of the F32 kernel package, then
> rebuilt, etc.
> 
> With this patch in place, the host kernel doesn't exhibit these
> problems.  The guest OS (also Fedora 32) starts up and works as
> expected with the passed-through USB host controller.
> 
> Signed-off-by: Kevin Buettner <kevinb@redhat.com>

Applied to pci/virtualization for v5.8, thanks!

> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 43a0c2ce635e..b1db58d00d2b 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>   * FLR may cause the following to devices to hang:
>   *
>   * AMD Starship/Matisse HD Audio Controller 0x1487
> + * AMD Starship USB 3.0 Host Controller 0x148c
>   * AMD Matisse USB 3.0 Host Controller 0x149c
>   * Intel 82579LM Gigabit Ethernet Controller 0x1502
>   * Intel 82579V Gigabit Ethernet Controller 0x1503
> @@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
>  	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> 

* [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0
@ 2020-05-24  7:35 Kevin Buettner
  2020-05-27 21:31 ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Buettner @ 2020-05-24  7:35 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, alex.williamson

This commit adds an entry to the quirk_no_flr table for the AMD
Starship USB 3.0 host controller.

Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40
motherboard with an AMD Ryzen Threadripper 3970X.

Without this patch, when attempting to assign (pass through) an AMD
Starship USB 3.0 host controller to a guest OS, the system becomes
increasingly unresponsive over the course of several minutes,
eventually requiring a hard reset.

Shortly after attempting to start the guest, I see these messages:

May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready 1023ms after FLR; waiting
May 23 22:59:48 mesquite kernel: vfio-pci 0000:05:00.3: not ready 2047ms after FLR; waiting
May 23 22:59:51 mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; waiting
May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not ready 8191ms after FLR; waiting
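
The intervals in these messages (1023, 2047, 4095, 8191 ms) are consistent
with a polling loop that doubles its sleep after every failed readiness
check and reports the accumulated wait.  Below is a minimal user-space
sketch of that pattern.  It is not the kernel's code: the function name
flr_wait_sim() and the POLL_TIMEOUT_MS and FIRST_REPORT_MS values are
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values for illustration only; not taken from the patch. */
#define POLL_TIMEOUT_MS 60000	/* give up once the next sleep would exceed this */
#define FIRST_REPORT_MS 1000	/* stay quiet while the waits are still short */

/*
 * Simulate waiting for a device that becomes ready after ready_after_ms
 * (pass a negative value for a device that never becomes ready).
 * Returns the total time "slept" in ms on success, or -1 after giving up.
 */
static long flr_wait_sim(long ready_after_ms)
{
	long delay = 1;		/* next sleep in ms, doubled each round */
	long waited = 0;	/* total slept so far */

	while (ready_after_ms < 0 || waited < ready_after_ms) {
		/* With doubling sleeps of 1, 2, 4, ... the total is delay - 1. */
		assert(waited == delay - 1);

		if (delay > POLL_TIMEOUT_MS) {
			printf("not ready %ldms after FLR; giving up\n", waited);
			return -1;
		}
		if (delay > FIRST_REPORT_MS)
			printf("not ready %ldms after FLR; waiting\n", waited);

		waited += delay;	/* stands in for sleeping `delay` ms */
		delay *= 2;
	}
	return waited;
}
```

With these assumed constants, flr_wait_sim(-1) reports waits of 1023,
2047, 4095, 8191, 16383 and 32767 ms and then gives up at 65535 ms, the
same value that appears in the "giving up" message in the log.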

And then eventually:

May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready 65535ms after FLR; giving up
May 23 23:01:05 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
May 23 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 82.270 msecs
May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs
May 23 23:01:08 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 100.952 msecs
...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]
May 23 23:01:25 mesquite kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487]

The above log snippets were obtained using the aforementioned hardware
running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64.  My
fix was applied to a local copy of the F32 kernel package, then
rebuilt, etc.

With this patch in place, the host kernel doesn't exhibit these
problems.  The guest OS (also Fedora 32) starts up and works as
expected with the passed-through USB host controller.

Signed-off-by: Kevin Buettner <kevinb@redhat.com>

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 43a0c2ce635e..b1db58d00d2b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5133,6 +5133,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
  * FLR may cause the following to devices to hang:
  *
  * AMD Starship/Matisse HD Audio Controller 0x1487
+ * AMD Starship USB 3.0 Host Controller 0x148c
  * AMD Matisse USB 3.0 Host Controller 0x149c
  * Intel 82579LM Gigabit Ethernet Controller 0x1502
  * Intel 82579V Gigabit Ethernet Controller 0x1503
@@ -5143,6 +5144,7 @@ static void quirk_no_flr(struct pci_dev *dev)
 	dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
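
For readers unfamiliar with the quirk mechanism, the effect of the added
DECLARE_PCI_FIXUP_EARLY line can be sketched in user space as a table of
(vendor, device, hook) entries that is walked for each device, with the
hook setting a flag that the reset path checks later.  This is a
simplified model, not the kernel implementation: every sim_* name and
SIM_FLAG_NO_FLR are hypothetical stand-ins, and only the device IDs and
the well-known vendor IDs (AMD 0x1022, Intel 0x8086) are real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_VENDOR_AMD   0x1022
#define SIM_VENDOR_INTEL 0x8086
#define SIM_FLAG_NO_FLR  0x1u	/* hypothetical stand-in for PCI_DEV_FLAGS_NO_FLR_RESET */

struct sim_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int flags;
};

struct sim_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct sim_dev *dev);
};

/* Mirrors the body of quirk_no_flr(): mark the device as "do not FLR". */
static void sim_quirk_no_flr(struct sim_dev *dev)
{
	dev->flags |= SIM_FLAG_NO_FLR;
}

/* One entry per DECLARE_PCI_FIXUP_EARLY(..., quirk_no_flr) line in the diff. */
static const struct sim_fixup sim_fixups[] = {
	{ SIM_VENDOR_AMD,   0x1487, sim_quirk_no_flr },
	{ SIM_VENDOR_AMD,   0x148c, sim_quirk_no_flr },	/* entry added by this patch */
	{ SIM_VENDOR_AMD,   0x149c, sim_quirk_no_flr },
	{ SIM_VENDOR_INTEL, 0x1502, sim_quirk_no_flr },
	{ SIM_VENDOR_INTEL, 0x1503, sim_quirk_no_flr },
};

/* Run every hook whose vendor/device pair matches the device. */
static void sim_apply_fixups(struct sim_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(sim_fixups) / sizeof(sim_fixups[0]); i++) {
		if (sim_fixups[i].vendor == dev->vendor &&
		    sim_fixups[i].device == dev->device)
			sim_fixups[i].hook(dev);
	}
}

/* The reset path would skip FLR when the flag is set. */
static int sim_flr_allowed(const struct sim_dev *dev)
{
	return !(dev->flags & SIM_FLAG_NO_FLR);
}
```

In this model, a device with vendor 0x1022 and device 0x148c ends up with
the flag set after sim_apply_fixups(), so sim_flr_allowed() returns 0 and
the hanging FLR is never attempted; devices not listed in the table are
unaffected.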


Thread overview: 11+ messages
     [not found] <CAAri2DpQnrGH5bnjC==W+HmnD4XMh8gcp9u-_LQ=K-jtrdHwAg@mail.gmail.com>
2020-07-13 22:14 ` [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0 Bjorn Helgaas
2020-07-13 22:48   ` Deucher, Alexander
2020-08-14  8:52     ` Marcos Scriven
2020-08-14 14:46       ` Deucher, Alexander
2020-05-24  7:35 Kevin Buettner
2020-05-27 21:31 ` Bjorn Helgaas
2020-05-27 21:42   ` Deucher, Alexander
2020-05-28  8:12     ` Marcos Scriven
2020-06-08 17:47       ` Marcos Scriven
2020-06-09 11:47         ` Shah, Nehal-bakulchandra
2020-06-25 10:22           ` Marcos Scriven
