linux-pci.vger.kernel.org archive mirror
* Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer
From: Robert Hancock @ 2020-07-11  0:28 UTC
  To: linux-kernel, linux-pci

On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@gmail.com> wrote:
>
> Noticed a problem on my desktop with an Asus PRIME H270-PRO
> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
> periodically there are PCIe AER errors getting spewed in dmesg that
> weren't happening before, and this also seems to cause suspend to
> fail - the system just wakes back up again right away, I am assuming
> due to some AER errors interrupting the process. 5.6 kernels didn't
> have this problem. Setting "pcie=noaer" on the kernel command line
> works around the issue, but I'm not sure what would have changed to
> trigger this to occur?

Correction: the workaround option is "pci=noaer".


* Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer
From: Robert Hancock @ 2020-07-21 23:55 UTC
  To: linux-kernel, linux-pci, Kai-Heng Feng; +Cc: Bjorn Helgaas

On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@gmail.com> wrote:
>
> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@gmail.com> wrote:
> >
> > Noticed a problem on my desktop with an Asus PRIME H270-PRO
> > motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
> > periodically there are PCIe AER errors getting spewed in dmesg that
> > weren't happening before, and this also seems to cause suspend to
> > fail - the system just wakes back up again right away, I am assuming
> > due to some AER errors interrupting the process. 5.6 kernels didn't
> > have this problem. Setting "pcie=noaer" on the kernel command line
> > works around the issue, but I'm not sure what would have changed to
> > trigger this to occur?
>
> Correction: the workaround option is "pci=noaer".

As a follow-up: after some more experimentation, disabling PCIe ASPM
with setpci on both the ASMedia PCIe-PCI bridge and the PCIe root port
it is connected to appears to silence the AER errors and allow
suspend/resume to work again:

setpci -s 00:1c.0 0x50.B=0x00
setpci -s 02:00.0 0x90.B=0x00
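
[Editorial sketch, not part of the original message.] The two writes
above zero a whole byte. Assuming offsets 0x50 and 0x90 are the low
byte of the PCIe Link Control register on these two devices (PCI
Express capability offset + 0x10; the thread does not confirm this),
only bits [1:0] of that byte are the ASPM Control field, and a full
byte write also clears unrelated bits such as Common Clock
Configuration (bit 6). The helper name clear_aspm below is
hypothetical; it only shows the masked arithmetic, with the matching
setpci value:mask form left as a comment.

```shell
#!/bin/sh
# ASPM Control is bits [1:0] of Link Control (00b = ASPM disabled).
ASPM_MASK=0x03

# clear_aspm VALUE: print the Link Control low byte with only the
# ASPM Control bits cleared; every other bit is preserved.
clear_aspm() {
    printf '0x%02x\n' $(( $1 & ~$ASPM_MASK & 0xff ))
}

# Example: 0x43 = L0s and L1 enabled (0x03) plus Common Clock (0x40).
clear_aspm 0x43

# Masked write on real hardware (needs root, so not run here). setpci
# parses numbers as hex, so "+10" means offset 0x10, and "0:3" means
# "write value 0 to the bits selected by mask 0x3":
#   setpci -s 00:1c.0 CAP_EXP+10.b=0:3
#   setpci -s 02:00.0 CAP_EXP+10.b=0:3
```

Addressing the register as CAP_EXP+10 rather than by a fixed offset
avoids assuming where each device places its PCI Express capability.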

It appears the behavior changed as a result of this patch (which went
into the stable tree for 5.7.6 and so affects 5.7 kernels as well):

commit 66ff14e59e8a30690755b08bc3042359703fb07a
Author: Kai-Heng Feng <kai.heng.feng@canonical.com>
Date:   Wed May 6 01:34:21 2020 +0800

    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges

    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
    Linux to enable ASPM, but for some undocumented reason, it didn't enable
    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.

    Remove this exclusion so we can enable ASPM on these links.

    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
    PCIe-to-PCI Bridge.  Enabling ASPM on the link leading to it allows the
    Intel SoC to enter deeper Package C-states, which is a significant power
    savings.

    [bhelgaas: commit log]
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com
    Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Unfortunately it appears that this ASMedia PCIe-PCI bridge:

02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)

doesn't cope with ASPM properly and causes a bunch of PCIe link
errors. (This is in addition to some broken-ness known as far back as
2012 with these ASM1083/1085 chips with regard to PCI interrupts
getting stuck, but this ASPM problem causes issues even if no devices
are connected to the PCI side of the bridge, as is the case on my
system.)
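
[Editorial sketch, not part of the original message.] One way to check
whether ASPM is currently enabled on a link is to look at the "LnkCtl:"
line that lspci -vv prints for the device. The helper name aspm_state
is hypothetical, and the sample line is illustrative of the usual
pciutils format rather than output captured from the affected system.

```shell
#!/bin/sh
# aspm_state LINE: print the ASPM field of an lspci "LnkCtl:" line,
# i.e. the text between "ASPM " and the first ";".
aspm_state() {
    printf '%s\n' "$1" | sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Illustrative sample line (assumed format, not from the reporter's machine):
aspm_state 'LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk+'

# On real hardware (root is needed for lspci to show capabilities):
#   aspm_state "$(lspci -vv -s 02:00.0 | grep 'LnkCtl:')"
```

Running it before and after the setpci workaround, on both 00:1c.0 and
02:00.0, would show whether the ASPM field changed as intended.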

Might need a quirk to disable ASPM on this device?


* Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer
From: Kai-Heng Feng @ 2020-07-24 14:32 UTC
  To: Robert Hancock; +Cc: linux-kernel, open list:PCI SUBSYSTEM, Bjorn Helgaas

Hi Robert,

> On Jul 22, 2020, at 07:55, Robert Hancock <hancockrwd@gmail.com> wrote:
> 
> On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@gmail.com> wrote:
>> 
>> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@gmail.com> wrote:
>>> 
>>> Noticed a problem on my desktop with an Asus PRIME H270-PRO
>>> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
>>> periodically there are PCIe AER errors getting spewed in dmesg that
>>> weren't happening before, and this also seems to cause suspend to
>>> fail - the system just wakes back up again right away, I am assuming
>>> due to some AER errors interrupting the process. 5.6 kernels didn't
>>> have this problem. Setting "pcie=noaer" on the kernel command line
>>> works around the issue, but I'm not sure what would have changed to
>>> trigger this to occur?
>> 
>> Correction: the workaround option is "pci=noaer".
> 
> As a follow-up: after some more experimentation, disabling PCIe ASPM
> with setpci on both the ASMedia PCIe-PCI bridge and the PCIe root port
> it is connected to appears to silence the AER errors and allow
> suspend/resume to work again:
> 
> setpci -s 00:1c.0 0x50.B=0x00
> setpci -s 02:00.0 0x90.B=0x00
> 
> It appears the behavior changed as a result of this patch (which went
> into the stable tree for 5.7.6 and so affects 5.7 kernels as well):
> 
> commit 66ff14e59e8a30690755b08bc3042359703fb07a
> Author: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Date:   Wed May 6 01:34:21 2020 +0800
> 
>    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges
> 
>    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
>    Linux to enable ASPM, but for some undocumented reason, it didn't enable
>    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> 
>    Remove this exclusion so we can enable ASPM on these links.
> 
>    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
>    PCIe-to-PCI Bridge.  Enabling ASPM on the link leading to it allows the
>    Intel SoC to enter deeper Package C-states, which is a significant power
>    savings.
> 
>    [bhelgaas: commit log]
>    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
>    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com
>    Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>    Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Unfortunately it appears that this ASMedia PCIe-PCI bridge:
> 
> 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe
> to PCI Bridge [1b21:1080] (rev 04)
> 
> doesn't cope with ASPM properly and causes a bunch of PCIe link
> errors. (This is in addition to some broken-ness known as far back as
> 2012 with these ASM1083/1085 chips with regard to PCI interrupts
> getting stuck, but this ASPM problem causes issues even if no devices
> are connected to the PCI side of the bridge, as is the case on my
> system.)
> 
> Might need a quirk to disable ASPM on this device?

Yes, I think that's a good idea.

Can you please file a bug at [1] so we can continue the discussion there?

[1] https://bugzilla.kernel.org

Kai-Heng


* Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer
From: Robert Hancock @ 2020-07-24 21:08 UTC
  To: Kai-Heng Feng; +Cc: linux-kernel, open list:PCI SUBSYSTEM, Bjorn Helgaas

On Fri, Jul 24, 2020 at 8:32 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> Hi Robert,
>
> > Unfortunately it appears that this ASMedia PCIe-PCI bridge:
> >
> > 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe
> > to PCI Bridge [1b21:1080] (rev 04)
> >
> > doesn't cope with ASPM properly and causes a bunch of PCIe link
> > errors. (This is in addition to some broken-ness known as far back as
> > 2012 with these ASM1083/1085 chips with regard to PCI interrupts
> > getting stuck, but this ASPM problem causes issues even if no devices
> > are connected to the PCI side of the bridge, as is the case on my
> > system.)
> >
> > Might need a quirk to disable ASPM on this device?
>
> Yes, I think that's a good idea.
>
> Can you please file a bug at [1] so we can continue the discussion there?
>
> [1] https://bugzilla.kernel.org

Hi, I already filed a bug earlier as a result of another discussion;
it includes the debug info as well as a proposed patch:
https://bugzilla.kernel.org/show_bug.cgi?id=208667

