From: "Pali Rohár" <pali@kernel.org>
To: Stefan Roese <sr@denx.de>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Keith Busch <kbusch@kernel.org>,
	Bharat Kumar Gogada <bharatku@xilinx.com>
Subject: Re: PCIe AER generates no interrupts on host (ZynqMP)
Date: Fri, 7 Jan 2022 11:04:58 +0100
Message-ID: <20220107100458.sfqcq7gy6nwwamjt@pali>
In-Reply-To: <4736848c-7b3b-a99d-8fd3-540ec6eb920b@denx.de>

Hello! You asked me in another email to comment on this one, so I'm
replying to it directly...

On Tuesday 04 January 2022 10:02:18 Stefan Roese wrote:
> Hi,
> 
> I'm trying to get the Kernel PCIe AER infrastructure to work on my
> ZynqMP based system. E.g. handle the events (correctable, uncorrectable
> etc). In my current tests, no AER interrupt is generated though. I'm
> currently using the "surprise down error status" in the uncorrectable
> error status register of the connected PCIe switch (PLX / Broadcom
> PEX8718). Here the bit is correctly logged in the PEX switch
> uncorrectable error status register but no interrupt is generated
> to the root-port / system. And hence no AER message(s) reported.
> 
> Does any one of you have some ideas on what might be missing? Why are
> these events not reported to the PCIe rootport driver via IRQ? Might
> this be a problem of the missing MSI-X support of the ZynqMP? The AER
> interrupt is connected as legacy IRQ:
> 
> cat /proc/interrupts | grep -i aer
>  58:          0          0          0          0  nwl_pcie:legacy   0 Level
> PCIe PME, aerdrv

Error events (correctable, non-fatal and fatal) are reported by PCIe
devices to the Root Complex via PCIe error messages (the Message code of
the TLP is set to Error Message) and not via interrupts. The Root Port
is then responsible for "converting" these PCIe error messages into an
MSI(X) interrupt and reporting it to the system. According to the PCIe
spec, AER is supported only via MSI(X) interrupts, not legacy INTx.

Via the Bridge Control register (SERR# enable bit) on the Root Port it
is possible to enable / disable reporting of these errors from PCIe
devices on the other end of the PCIe link to the system. Then via the
Command register (SERR# enable bit) and the Device Control register it
is possible to enable / disable reporting of all errors (from the Root
Port and also from devices on the other end of the link). And via the
AER registers on the Root Port it is also possible to disable generating
MSI(X) interrupts when an error is reported. And IIRC with PCIe
Downstream Port Containment there is also a way to "mask" reporting of
error events. But I do not have PCIe devices with DPC support, so I have
not played with it yet. So there are many places where an error event
can be stopped. But the important thing is that the kernel AER driver
should correctly enable all the required bits to start receiving MSI(X)
interrupts for error events.
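
To make the chain of enable bits above easier to check, here is a
minimal sketch in Python. It assumes you have already read the four
register values from the Root Port's config space (for example with
setpci or from an lspci dump). The bit masks follow the standard
PCI/PCIe register layouts; the function name and the text labels are
only illustrative and do not come from any existing tool.

```python
# Enable bits that can stop an error event on its way to the OS.
# Masks follow the standard PCI/PCIe register layouts; names of the
# function and labels are hypothetical, chosen for this example.
PCI_COMMAND_SERR = 0x0100          # Command register, SERR# Enable
PCI_BRIDGE_CTL_SERR = 0x0002       # Bridge Control register, SERR# Enable
PCI_EXP_DEVCTL_CERE = 0x0001       # Device Control, Correctable Error Reporting
PCI_EXP_DEVCTL_NFERE = 0x0002      # Device Control, Non-Fatal Error Reporting
PCI_EXP_DEVCTL_FERE = 0x0004       # Device Control, Fatal Error Reporting
PCI_ERR_ROOT_CMD_COR_EN = 0x1      # AER Root Error Command, correctable
PCI_ERR_ROOT_CMD_NONFATAL_EN = 0x2  # AER Root Error Command, non-fatal
PCI_ERR_ROOT_CMD_FATAL_EN = 0x4    # AER Root Error Command, fatal


def missing_enables(command, bridge_ctl, devctl, root_cmd):
    """Return labels of the enable bits that are currently cleared."""
    checks = [
        ("Command: SERR# Enable", command, PCI_COMMAND_SERR),
        ("Bridge Control: SERR# Enable", bridge_ctl, PCI_BRIDGE_CTL_SERR),
        ("Device Control: Correctable", devctl, PCI_EXP_DEVCTL_CERE),
        ("Device Control: Non-Fatal", devctl, PCI_EXP_DEVCTL_NFERE),
        ("Device Control: Fatal", devctl, PCI_EXP_DEVCTL_FERE),
        ("Root Error Command: Correctable", root_cmd, PCI_ERR_ROOT_CMD_COR_EN),
        ("Root Error Command: Non-Fatal", root_cmd, PCI_ERR_ROOT_CMD_NONFATAL_EN),
        ("Root Error Command: Fatal", root_cmd, PCI_ERR_ROOT_CMD_FATAL_EN),
    ]
    return [label for label, value, bit in checks if not value & bit]
```

An empty list means all of these bits are set; any label that is
returned is one place where the error event may be stopped. This does
not cover DPC or the AER mask registers.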

On other devices I'm seeing the following issues... Root Ports are not
compliant with the PCIe spec and do not implement error reporting at
all. Or they do not implement those enable/disable bits correctly. Or
they do not implement proper support for the extended PCIe config space
of the Root Port (AER is in the extended space). Or they report error
events via custom proprietary interrupts and not via MSI(X) as required
by the PCIe spec. This is the case for (all?) Marvell PCIe controllers
and I saw here on the linux-pci list that it also applies to PCIe
controllers from some other vendors. Also, drivers for Marvell PCIe
controllers require additional code to access the extended PCIe config
space of the Root Port (accessing the config space of PCIe devices on
the other end of the PCIe link works fine).
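
Whether the extended config space of the Root Port is reachable at all
can be checked by looking for the AER capability in a raw config space
dump. The following is only a sketch: it assumes `cfg` holds the bytes
of the Root Port's config space (for example read as root from the
sysfs `config` file of the device), and the function name is made up for
this example. Extended capabilities start at offset 0x100, and each
header carries the capability ID in bits 15:0 and the next offset in
bits 31:20; the AER capability ID is 0x0001.

```python
import struct

PCI_EXT_CAP_ID_AER = 0x0001  # extended capability ID of AER
EXT_CAP_START = 0x100        # first extended capability header


def find_ext_cap(cfg, cap_id):
    """Return the offset of an extended capability, or None if absent.

    `cfg` is a bytes-like config space dump. A dump shorter than
    0x104 bytes has no extended space, so nothing can be found in it.
    """
    offset = EXT_CAP_START
    seen = set()  # guards against a looping capability list
    while offset >= EXT_CAP_START and offset + 4 <= len(cfg) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            # all zeros or all ones: no capability implemented here
            return None
        if header & 0xFFFF == cap_id:
            return offset
        # bits 31:20 hold the next offset, lowest two bits are reserved
        offset = (header >> 20) & 0xFFC
    return None
```

If this returns None for the Root Port but works for the device on the
other end of the link, that points to the controller (or its driver)
not exposing the extended config space of the Root Port.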

So the first suspicious thing is why the kernel AER driver is using a
legacy shared INTx interrupt, as in most cases the Root Port would not
report any error event via INTx. And the second thing: try to look into
the documentation for the PCIe controller in use, just in case the
vendor "invented" some proprietary and non-compliant way to report error
/ AER events to the OS...

I have seen more issues with PCIe controllers than with PCIe switches,
so in my opinion the issue would be either in the controller driver or
in the controller hw itself. And if you see the event status logged in
the PCIe switch register, I would expect that the switch correctly sent
the PCIe Error message to the Root Complex.
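
One way to see whether the Error message really arrived at the Root
Port is to look at its AER Root Error Status register (offset 0x30 in
the AER capability). A small decoding sketch follows, assuming the
32-bit value was already read from config space. The bit layout is the
standard one; the function name and the text labels are only
illustrative.

```python
# Bits of the AER Root Error Status register (AER capability + 0x30).
ROOT_STATUS_FLAGS = [
    (0x01, "ERR_COR received"),
    (0x02, "Multiple ERR_COR received"),
    (0x04, "ERR_FATAL/NONFATAL received"),
    (0x08, "Multiple ERR_FATAL/NONFATAL received"),
    (0x10, "First uncorrectable error is fatal"),
    (0x20, "Non-fatal error message received"),
    (0x40, "Fatal error message received"),
]
PCI_ERR_ROOT_AER_IRQ = 0xF8000000  # Advanced Error Interrupt Message Number


def decode_root_status(status):
    """Return (list of set flags, interrupt message number)."""
    flags = [label for bit, label in ROOT_STATUS_FLAGS if status & bit]
    msg_num = (status & PCI_ERR_ROOT_AER_IRQ) >> 27
    return flags, msg_num
```

If flags are set here after triggering the error but no interrupt was
delivered, the message reached the Root Port and the problem is on the
interrupt side. If nothing is set, the message was stopped earlier or
the Root Port does not implement this register properly.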

> BTW: This was tested on v5.10 and recent v5.16-rc6.
> 
> Thanks,
> Stefan
