* pcie-xilinx-nwl: Uncorrectable errors upon PCIe surprise removal
From: Stefan Roese @ 2021-05-27  8:47 UTC (permalink / raw)
  To: linux-pci; +Cc: Bharat Kumar Gogada, Michal Simek

Hi,

On our ZynqMP platform we are seeing uncorrectable errors when we access
the BAR of a PCIe device (an NVMe drive) after it has been removed
(surprise removal):

[  255.743801] nwl-pcie fd0e0000.pcie: Slave error
[  255.745210] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.750714] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.752523] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.753840] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.755174] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
[  255.756706] nwl-pcie fd0e0000.pcie: Non-Fatal Error Detected
[  255.758168] nwl-pcie fd0e0000.pcie: Non-Fatal Error in AER Capability
...

Sometimes these errors are even accompanied by (or start with) a kernel crash:

Internal error: synchronous external abort: 96000210 [#1] SMP

It seems that the "Slave error" (bit 4) can be cleared in
nwl_pcie_misc_handler(), but the two "Non-Fatal" errors cannot.
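To make that observation concrete, here is a minimal userspace sketch of a write-1-to-clear status register in which some bits come straight back after being cleared. It is only a model: the `fake_status_reg` type, the helper names and the "sticky" re-assert behaviour are assumptions for illustration, not the driver's code. Bit 4 is taken from the report above; the two Non-Fatal bit positions are assumed and should be checked against the driver source. Whether the hardware really re-latches these bits is exactly the open question.

```c
#include <stdint.h>

/* Bit 4 is the "Slave error" bit named in the report. The two
 * Non-Fatal positions are ASSUMED for illustration only. */
#define MISC_SR_SLAVE_ERR     (UINT32_C(1) << 4)
#define MISC_SR_NONFATAL_AER  (UINT32_C(1) << 17)  /* assumed position */
#define MISC_SR_NONFATAL_DEV  (UINT32_C(1) << 22)  /* assumed position */

/* Hypothetical model of a write-1-to-clear status register. */
struct fake_status_reg {
	uint32_t value;   /* currently latched status bits */
	uint32_t sticky;  /* bits whose source is still asserted */
};

static uint32_t reg_read(const struct fake_status_reg *r)
{
	return r->value;
}

/* Writing 1 to a bit clears it; bits whose source is still
 * asserted are latched again right away. */
static void reg_write_w1c(struct fake_status_reg *r, uint32_t v)
{
	r->value &= ~v;
	r->value |= r->sticky;
}

/* Clear everything that is pending and return the subset of
 * those bits that is still set afterwards. */
static uint32_t clear_and_report_stuck(struct fake_status_reg *r)
{
	uint32_t pending = reg_read(r);

	reg_write_w1c(r, pending);
	return reg_read(r) & pending;
}
```

A handler that logged the value returned by something like `clear_and_report_stuck()` would show directly which status bits survive the clear, which may help narrow down where the condition is actually latched.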

I'm wondering whether this situation can be resolved somehow, so that
the system "survives" such surprise removals without a crash. What we
would really like to see is that a read from the unavailable PCI space
(BAR area) returns 0xffffffff, as is common for PCI.
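For illustration, this is roughly how a caller could cope with a removed device if reads did return all ones. It is a self-contained userspace sketch: `fake_bar`, `bar_read32()` and `read_reg_checked()` are hypothetical names that simulate the desired behaviour, not kernel APIs and not what the nwl-pcie hardware does today.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_ALL_ONES UINT32_C(0xffffffff)

/* Hypothetical stand-in for a device BAR mapping. */
struct fake_bar {
	bool present;      /* false once the device has been pulled */
	uint32_t regs[4];  /* simulated register contents */
};

/* Simulated 32-bit read: a missing device reads back as all ones,
 * which is the behaviour the report asks for. */
static uint32_t bar_read32(const struct fake_bar *bar, unsigned int idx)
{
	if (!bar->present)
		return PCI_ALL_ONES;
	return bar->regs[idx % 4];
}

/* Read a register and treat all ones as "device gone".
 * Returns true and stores the value in *out on success;
 * leaves *out untouched on failure. */
static bool read_reg_checked(const struct fake_bar *bar, unsigned int idx,
			     uint32_t *out)
{
	uint32_t v = bar_read32(bar, idx);

	if (v == PCI_ALL_ONES)
		return false;
	*out = v;
	return true;
}
```

Note that all ones can also be a legitimate register value, so a check like this only fits registers that can never read as 0xffffffff during normal operation.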

So is it a known issue that accesses to the BAR ranges of removed PCIe
devices result in such errors? If so, why is this the case? Is there
perhaps a way to fully clear the error condition?

Thanks,
Stefan
