From: "Pali Rohár" <pali@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Stefan Roese <sr@denx.de>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Keith Busch <kbusch@kernel.org>,
	Bharat Kumar Gogada <bharatku@xilinx.com>
Subject: Re: PCIe AER generates no interrupts on host (ZynqMP)
Date: Mon, 10 Jan 2022 12:12:11 +0100
Message-ID: <20220110111211.l6lqmfmyu47dfhjh@pali>
In-Reply-To: <20220108031357.GA443744@bhelgaas>

On Friday 07 January 2022 21:13:57 Bjorn Helgaas wrote:
> On Fri, Jan 07, 2022 at 10:31:06PM +0100, Pali Rohár wrote:
> > On Friday 07 January 2022 14:34:15 Bjorn Helgaas wrote:
> > > On Fri, Jan 07, 2022 at 11:04:58AM +0100, Pali Rohár wrote:
> > > > Hello! You asked me in another email for comments to this email, so I'm
> > > > replying directly to this email...
> > > > 
> > > > On Tuesday 04 January 2022 10:02:18 Stefan Roese wrote:
> > > > > Hi,
> > > > > 
> > > > > I'm trying to get the Kernel PCIe AER infrastructure to work on my
> > > > > ZynqMP based system. E.g. handle the events (correctable, uncorrectable
> > > > > etc). In my current tests, no AER interrupt is generated though. I'm
> > > > > currently using the "surprise down error status" in the uncorrectable
> > > > > error status register of the connected PCIe switch (PLX / Broadcom
> > > > > PEX8718). Here the bit is correctly logged in the PEX switch
> > > > > uncorrectable error status register but no interrupt is generated
> > > > > to the root-port / system. And hence no AER message(s) reported.
> > > 
> > > I think the error should also be logged in the Root Port AER
> > > Capability.  And of course the interrupt enable bits in the Root Error
> > > Command register would have to be set.
> > > 
> > > > > Does any one of you have some ideas on what might be missing? Why are
> > > > > these events not reported to the PCIe rootport driver via IRQ? Might
> > > > > this be a problem of the missing MSI-X support of the ZynqMP? The AER
> > > > > interrupt is connected as legacy IRQ:
> > > > > 
> > > > > cat /proc/interrupts | grep -i aer
> > > > >  58:          0          0          0          0  nwl_pcie:legacy   0 Level
> > > > > PCIe PME, aerdrv
> > > 
> > > I guess this means whatever INTx the Root Port is using is connected
> > > to IRQ 58?  Can you tell whether that INTx works if a device below the
> > > Root Port uses it?  Or whether it is asserted for PMEs?
> > > 
> > > > Error events (correctable, non-fatal and fatal) are reported by PCIe
> > > > devices to the Root Complex via PCIe error messages (Message code of TLP
> > > > is set to Error Message) and not via interrupts. Root Port is then
> > > > responsible to "convert" these PCIe error messages to MSI(X) interrupt
> > > > and report it to the system. According to PCIe spec, AER is supported
> > > > only via MSI(X) interrupts, not legacy INTx.
> > > 
> > > Where does it say that?  PCIe r5.0, sec 6.2.4.1.2 and 6.2.6, both
> > > mention INTx, and the diagram in 6.2.6 even shows possible
> > > platform-specific System Error signaling.
> > 
> > Kernel AER driver is not available when MSI is not supported:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aer.c?h=v5.15#n112
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/portdrv_core.c?h=v5.15#n224
> > Originally this was my primary indication that AER is MSI(X)-only.
> 
> I think that is broken.  Looks like it was added by 3e77a3f7895e
> ("PCI: Disable AER with pci=nomsi"), which says that with "pci=nomsi",
> we see lost interrupts and lockdep inversions.
> 
> I don't know any more of the history behind that, but I suspect that
> turning off AER in that case just covered up some other Linux issue.

I have the same feeling that something else is broken.

And now it reminds me that I was testing AER interrupts about a year
ago. When I booted the kernel with pci=nomsi, the AER kernel driver
refused to start, and I realized that AER is probably MSI-only. I then
checked the spec for the AER Root Error Status Register, and because of
that I probably came to the conclusion that AER must be MSI-only...
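
To make the register reasoning concrete, here is a minimal sketch of how
I read the Root Error Command and Root Error Status fields. The mask
values are the same as the PCI_ERR_ROOT_* definitions in
include/uapi/linux/pci_regs.h; the macro and function names are made up
for this example, and the check is a simplification that ignores the
"multiple" and "first uncorrectable fatal" bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets inside the AER extended capability. */
#define ROOT_COMMAND_OFF      0x2c
#define ROOT_STATUS_OFF       0x30

/* Root Error Command: per-category interrupt enable bits. */
#define ROOT_CMD_COR_EN       0x00000001u
#define ROOT_CMD_NONFATAL_EN  0x00000002u
#define ROOT_CMD_FATAL_EN     0x00000004u

/* Root Error Status: received-message bits and the message number. */
#define ROOT_STS_COR_RCV      0x00000001u  /* ERR_COR received */
#define ROOT_STS_NONFATAL_RCV 0x00000020u  /* ERR_NONFATAL received */
#define ROOT_STS_FATAL_RCV    0x00000040u  /* ERR_FATAL received */
#define ROOT_STS_AER_IRQ      0xf8000000u  /* bits 31:27 */

/*
 * Advanced Error Interrupt Message Number. It only selects the MSI/MSI-X
 * vector; it says nothing about whether INTx may be used instead.
 */
unsigned int aer_irq_msg_num(uint32_t root_status)
{
	return (root_status & ROOT_STS_AER_IRQ) >> 27;
}

/*
 * Hypothetical helper: true if some error category is both logged in
 * Root Error Status and enabled in Root Error Command, i.e. the Root
 * Port would be expected to signal an interrupt.
 */
bool aer_irq_expected(uint32_t root_cmd, uint32_t root_status)
{
	if ((root_status & ROOT_STS_COR_RCV) && (root_cmd & ROOT_CMD_COR_EN))
		return true;
	if ((root_status & ROOT_STS_NONFATAL_RCV) &&
	    (root_cmd & ROOT_CMD_NONFATAL_EN))
		return true;
	if ((root_status & ROOT_STS_FATAL_RCV) &&
	    (root_cmd & ROOT_CMD_FATAL_EN))
		return true;
	return false;
}
```

Feeding it the two dwords read from the Root Port at ROOT_COMMAND_OFF
and ROOT_STATUS_OFF shows whether a missing interrupt is explained by a
cleared enable bit or whether the error message never reached the Root
Port at all.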

Is the kernel going to enable support for AER when MSI is disabled? Or
do we leave it in the current state forever?
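
For reference, this is a small user-space model of the gating as I
understand it from the v5.15 links above. The struct and both function
names are invented for illustration. The first function mirrors the
condition I believe pci_aer_available() uses; the second is only a
hypothetical relaxed variant to show what the question is about, not an
existing patch.

```c
#include <stdbool.h>

/* Stand-ins for kernel state; not real kernel symbols. */
struct aer_gate {
	bool aer_disabled;  /* models pcie_aer_disable (pci=noaer) */
	bool msi_enabled;   /* models pci_msi_enabled(); false with pci=nomsi */
};

/* Condition as I read it in v5.15: AER needs MSI to be enabled. */
bool aer_available_v515(const struct aer_gate *g)
{
	return !g->aer_disabled && g->msi_enabled;
}

/* Hypothetical variant: AER allowed to fall back to INTx without MSI. */
bool aer_available_relaxed(const struct aer_gate *g)
{
	return !g->aer_disabled;
}
```

With pci=nomsi (msi_enabled false) the first check rejects AER and the
second one accepts it, which is the only behavioural difference between
the two.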

> > And my understanding is that AER Root Error Status Register (7.8.4.10)
> > specifies Advanced Error Interrupt Message Number which indicates which
> > MSI(X) interrupt is used. And there is no information about INTx if you
> > enable particular reporting category via AER Root Error Command Register.
> > That is why I was under the impression that AER interrupts are MSI-only.
> > 
> > But now I'm looking at section 6.2.4.1.2 and it seems that AER really
> > can use INTx. So I was wrong here.
> > 
> > But why then does the kernel AER driver check that AER is available
> > only when MSI is enabled, and not when MSI is disabled?
> 
> Apart from the issue behind 3e77a3f7895e, I think this is just because
> the intersection of platforms that lack MSI and people with enough
> interest in AER is small.
> 
> The interrupt configuration in portdrv is nasty.  Although it looks
> like pcie_init_service_irqs() might now be smart enough to use the
> legacy INTx when MSI is not available.

The latest pci-aardvark patches add support for emulating the AER
interrupt via a virtual INTx, and we tested that it works. portdrv
correctly allocates the emulated INTx, and when this interrupt is
triggered, the kernel AER driver sees it and logs the error to dmesg.

So I guess it could probably also work with a real INTx (if implemented
correctly in HW).
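
One simple way to see whether the INTx fires at all is to watch the
counters on the aerdrv line in /proc/interrupts before and after
injecting an error. Below is a minimal sketch of a helper that sums the
per-CPU counters of one such line. The function name is made up, and it
assumes the usual layout of "<irq>:" followed by one decimal counter per
CPU and then the chip name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters of a single /proc/interrupts line.
 * Returns -1 if the line has no "<irq>:" prefix. Parsing stops at the
 * first token that is not a plain decimal number (the chip name).
 */
long long irq_line_total(const char *line)
{
	const char *p = strchr(line, ':');
	long long total = 0;

	if (!p)
		return -1;
	p++;

	for (;;) {
		char *end;
		unsigned long long n;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;

		n = strtoull(p, &end, 10);
		/* A counter must end at whitespace or end of line. */
		if (*end != '\0' && !isspace((unsigned char)*end))
			break;

		total += (long long)n;
		p = end;
	}
	return total;
}
```

For the line quoted earlier in this thread the result is 0, meaning the
handler was never called; a non-zero total after triggering an error
would mean the INTx path works.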
