* 4.4.x kernel (only) gives pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
From: Marc MERLIN @ 2016-02-13 21:57 UTC
  To: linux-pci

Howdy,

I just upgraded my laptop to a Lenovo ThinkPad P70 (Skylake), moved my Linux
image (4.4.1 kernel) over, and I'm pseudo-randomly getting these:

pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
pcieport 0000:00:1c.4:    [12] Replay Timer Timeout

pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
pcieport 0000:00:1c.4: can't find device of ID00e4
pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)

They did not seem to happen with the 4.3.3 kernel.
With 4.4.1, I've had a boot where I got so many of those that the machine was unusable.
Other times, it happens a bit and then stops.
On my last boot, it didn't happen at all.

Sadly, I have no idea what they mean, what I should do about them, and
why they only seem to be happening with 4.4.1 and not older kernels.
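
For reference, the status/mask words in the log lines above can be decoded
by hand. The sketch below is only an illustration: it assumes the standard
bit layout of the PCIe AER Correctable Error Status register, and the helper
names (decode_correctable, parse_status_mask) are made up for this example,
not taken from the kernel.

```python
# Bit positions in the PCIe AER Correctable Error Status / Mask registers
# (assumed from the PCIe specification; names are spelled out for readability).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of the bits set in a correctable status or mask word."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if (value >> bit) & 1]


def parse_status_mask(line):
    """Split an '... error status/mask=XXXXXXXX/YYYYYYYY' log line into two ints."""
    status_hex, mask_hex = line.rsplit("=", 1)[1].split("/")
    return int(status_hex, 16), int(mask_hex, 16)


line = "pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000"
status, mask = parse_status_mask(line)
print("reported:", decode_correctable(status))
print("masked:  ", decode_correctable(mask))
```

Read this way, status 00001000 is bit 12 alone, which matches the
"[12] Replay Timer Timeout" line, and mask 00002000 is bit 13 (Advisory
Non-Fatal Error), so the reported error is not one of the masked ones.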

Boot log: http://marc.merlins.org/tmp/4.1.4.boot.txt
config.gz: http://marc.merlins.org/tmp/4.1.4.config.gz

8086:a114 is this:
PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1c.4 0604: 8086:a114 (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 123
        Bus: primary=00, secondary=05, subordinate=6f, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: a4000000-ba0fffff
        Prefetchable memory behind bridge: 0000000080000000-00000000a1ffffff
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [90] Subsystem: 17aa:222d
        Capabilities: [a0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Access Control Services
        Capabilities: [220] #19
        Kernel driver in use: pcieport
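
The id=00e4 in the log appears to be the usual 16-bit PCI requester ID
(8 bits of bus, 5 bits of device, 3 bits of function). A minimal sketch of
that decoding, assuming this layout and using a made-up helper name
(decode_requester_id):

```python
def decode_requester_id(rid):
    """Split a 16-bit PCI requester ID into (bus, device, function)."""
    bus = (rid >> 8) & 0xFF      # upper 8 bits
    device = (rid >> 3) & 0x1F   # next 5 bits
    function = rid & 0x7         # low 3 bits
    return bus, device, function


bus, dev, fn = decode_requester_id(int("00e4", 16))
print(f"{bus:02x}:{dev:02x}.{fn}")  # prints "00:1c.4"
```

Under that assumption, 00e4 is 00:1c.4, i.e. the root port itself is the
transmitter named in the log, not a device behind it.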

Can someone offer some suggestions?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: 4.4.x kernel (only) gives pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
From: Bjorn Helgaas @ 2016-02-15 17:14 UTC
  To: Marc MERLIN; +Cc: linux-pci

Hi Marc,

On Sat, Feb 13, 2016 at 01:57:36PM -0800, Marc MERLIN wrote:
> Howdy,
> 
> I just upgraded my laptop to a Lenovo ThinkPad P70 (Skylake), moved my Linux
> image (4.4.1 kernel) over, and I'm pseudo-randomly getting these:
> 
> pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> 
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: can't find device of ID00e4
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> 
> They did not seem to happen with the 4.3.3 kernel.
> With 4.4.1, I've had a boot where I got so many of those that the machine was unusable.
> Other times, it happens a bit and then stops.
> On my last boot, it didn't happen at all.
> 
> Sadly, I have no idea what they mean, what I should do about them, and
> why they only seem to be happening with 4.4.1 and not older kernels.
> 
> Boot log: http://marc.merlins.org/tmp/4.1.4.boot.txt
> config.gz: http://marc.merlins.org/tmp/4.1.4.config.gz
> 
> 8086:a114 is this:
> PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
> 00:1c.4 0604: 8086:a114 (rev f1) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0, IRQ 123
>         Bus: primary=00, secondary=05, subordinate=6f, sec-latency=0
>         I/O behind bridge: 00002000-00002fff
>         Memory behind bridge: a4000000-ba0fffff
>         Prefetchable memory behind bridge: 0000000080000000-00000000a1ffffff
>         Capabilities: [40] Express Root Port (Slot+), MSI 00
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>         Capabilities: [90] Subsystem: 17aa:222d
>         Capabilities: [a0] Power Management version 3
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Access Control Services
>         Capabilities: [220] #19
>         Kernel driver in use: pcieport
> 
> Can someone offer some suggestions?

Thanks a lot for your report.  I think this is probably the same issue
reported in these bug reports:

  https://bugzilla.kernel.org/show_bug.cgi?id=109691
  https://bugzilla.kernel.org/show_bug.cgi?id=111601
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173

Short story: the AER driver receives the corrected error notification
but fails to clear it.  Nobody has stepped up to fix the bug yet.  You
can probably work around it by disabling AER completely by booting
with "pci=noaer".
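
To confirm the workaround actually took effect after a reboot, something
like the following could be used. It is a rough sketch, not a kernel
interface: it assumes the "pci=" parameter takes a comma-separated option
list (so "pci=noaer,nomsi" also counts), and aer_disabled is a hypothetical
helper name.

```python
def aer_disabled(cmdline):
    """Return True if any 'pci=' argument on the kernel command line lists 'noaer'."""
    for arg in cmdline.split():
        if arg.startswith("pci="):
            options = arg[len("pci="):].split(",")
            if "noaer" in options:
                return True
    return False


# Hypothetical command lines, for illustration only.
print(aer_disabled("BOOT_IMAGE=/vmlinuz-4.4.1 root=/dev/sda1 ro pci=noaer"))
print(aer_disabled("BOOT_IMAGE=/vmlinuz-4.4.1 root=/dev/sda1 ro quiet"))
```

On a running system the string to check would come from /proc/cmdline,
e.g. aer_disabled(open("/proc/cmdline").read()).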

I attached your dmesg log to
https://bugzilla.kernel.org/show_bug.cgi?id=111601

Bjorn


* Re: 4.4.x kernel (only) gives pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
From: Marc MERLIN @ 2016-02-15 17:17 UTC
  To: Bjorn Helgaas; +Cc: linux-pci

On Mon, Feb 15, 2016 at 11:14:23AM -0600, Bjorn Helgaas wrote:
> 
> Short story: the AER driver receives the corrected error notification
> but fails to clear it.  Nobody has stepped up to fix the bug yet.  You
> can probably work around it by disabling AER completely by booting
> with "pci=noaer".
 
Thanks, that's good to know. I wanted to make sure I had some idea why
before turning it off :)

> I attached your dmesg log to
> https://bugzilla.kernel.org/show_bug.cgi?id=111601

I appreciate it, thanks.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
