All of lore.kernel.org
 help / color / mirror / Atom feed
* LS1043A : "synchronous abort" at boot due to PCI config read
@ 2018-04-13 17:32 Gilles Buloz
  2018-04-27  8:43   ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Buloz @ 2018-04-13 17:32 UTC (permalink / raw)
  To: linux-arm-kernel

Dear developers,

I currently have two functional workarounds for this issue but would like to know which one you would recommend, if any :-)
I'm using a LS1043A CPU (NXP QorIQ Layerscape) and get a "synchronous external abort" when booting because of a PCI config read 
during PCI scan.

I'm using a custom hardware (based on LS1043ARDB) having a PEX8112 PCIe-to-PCI bridge connected to the LS1043A to have a PCI slot 
for legacy devices. This bridge only supports PCI-Compatible config accesses (offset 0x00-0xFF).
On this PCI slot I connect a PCI module made of a PCI-to-PCIe bridge plus PCIe devices behind.
The problem occurs when the kernel probes the PCIe devices : as they are PCIe devices, the kernel does a PCI config read access at 
offset 0x100 to check if "PCIe extended capability registers" are accessible (see drivers/pci/probe.c, function 
pci_cfg_space_size_ext()). Unfortunately the PEX8112 PCIe-to-PCI bridge that is in the path reports an error to the CPU for this 
access, and it seems there's no way to disable that on this bridge.

The first workaround I found was to patch drivers/pci/host/pci-layerscape.c to have PCIE_ABSERR_SETTING set to 0x9400 instead of 
0x9401 (for PCIE_ABSERR register) to disable error reporting. This only impacts an NXP part of the Linux kernel code, but I'm not 
sure this is a good idea (however it seems to be like that on Intel platforms where even MEM accesses to a no-device address return 
FF without any error).

I've also tried another workaround that works : patch drivers/pci/probe.c to use bus_flags to remember if a bus is behind a bridge 
without extended address capability, to avoid PCi config read accesses at offset 0x100 in
pci_cfg_space_size() / pci_cfg_space_size_ext(). But this patch impacts the generic PCI probe method of Linux.

Any Idea to properly handle that issue ?

Best regards

^ permalink raw reply	[flat|nested] 37+ messages in thread
* (no subject)
@ 2018-05-04 20:06 Bjorn Helgaas
  2018-05-07 21:56   ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2018-05-04 20:06 UTC (permalink / raw)
  To: Gilles Buloz
  Cc: Bjorn Helgaas, linux-pci, Minghuan.Lian, linux-arm-kernel,
	Ard Biesheuvel, linux-kernel, Frederick Lawler, Sinan Kaya

Bcc: 
Subject: Re: [PATCH] PCI: Check whether bridges allow access to extended
 config space
Reply-To: 
In-Reply-To: <5AEC8002.5000309@kontron.com>

[+cc Fred, Sinan]

On Fri, May 04, 2018 at 03:45:07PM +0000, Gilles Buloz wrote:
> Le 04/05/2018 00:31, Bjorn Helgaas a écrit :
> > [+cc LKML]
> >
> > On Thu, May 03, 2018 at 12:40:27PM +0000, Gilles Buloz wrote:
> >> Subject:    [PATCH] For exception at PCI probe due to bridge reporting UR
> >>
> >> Even if a device supports extended config access, no such access must be
> >> done to this device If there's a bridge not supporting that in the path
> >> to this device. Doing such access with UR reporting enabled on the root
> >> bridge leads to an exception.
> >>
> >> This is the case on a LS1043A CPU (NXP QorIQ Layerscape) platform with
> >> the following bus topology :
> >>    LS1043 PCIe root
> >>      -> PEX8112 PCIe-to-PCI bridge (not supporting ext cfg on PCI side)
> >>        -> PMC slot connector (for legacy PMC modules)
> >> With a PMC module topology as follows :
> >>    PMC connector
> >>      -> PCI-to-PCIe bridge
> >>        -> PCIe switch (4 ports)
> >>          -> 4 PCIe devices (one on each port)
> >> In this case all devices behind the PEX8112 are supporting extended config
> >> access but this is prohibited by the PEX8112. Without this patch, an
> >> exception (synchronous abort) occurs in pci_cfg_space_size_ext().
> >>
> >> This patch checks the parent bridge of each allocated child bus to know if
> >> extended config access is supported on the child bus, and sets a flag in
> >> child->bus_flags if not supported. This  flag is inherited by all children
> >> buses of this child bus and then is checked to avoid this unsupported
> >> accesses to every device on these buses.
> > Hi Gilles,
> >
> > Thanks for the patch!  I reworked it a little bit to simplify the code
> > in pci_alloc_child_bus().  Can you test it and make sure I didn't
> > break anything?
> >
> Hi Bjorn,
> 
> Your rework works as expected. Tested on LS1043A platform with kernel 4.17-rc1, and with some backport on kernel 4.1.35
> 
> Suggestion : maybe change the pci_info() string to have :
>      pci_bus 0000:xx: extended config space not accessible
> instead of
>      pci_bus 0000:xx: extended config space not accessible on secondary bus
> as xx is already the number of the secondary bus

Oops, when I wrote that I was thinking it would be printed for the
bridge device (not the bus).  I changed it as you suggest.

Interesting, I didn't think about the fact that pci_info() would work
on a struct pci_bus * as well as on a struct pci_dev *, since it's a
macro and they both have a "dev" member.

> Info : with kernel 4.17-rc1, it turns out I need pcie_aspm=off to
> have the PMC devices behind the PCI-to-PCIe bridge of the PMC safely
> detected/configured. But this is not caused by the patch.

> Without pcie_aspm=off I saw this at one boot :
>     "pci 0000:02:0e.0: ASPM: Could not configure common clock" for this bridge, but devices
>     correctly detected/configured
> but at most boots I get :
>     no ASPM message but "pci 0000:04:02.0: bridge configuration invalid ([bus ff-ff]), reconfiguring "
>     instead, and some devices are missing. Also lspci show "rev ff" for some devices.
> I don't see this problem on 4.1.35 with the same backported patch.

This is interesting, especially since you have this unusual topology
of a path to the device that is PCIe, then conventional PCI, then PCIe
again.  We *should* be able to use ASPM on the PCIe links, but it's
definitely not a well-tested scenario.

Can you tell if something is actually broken?  Sinan's recent change,
04875177dbe0 ("PCI/ASPM: Don't warn if already in common clock mode"),
which appeared in v4.17-rc1, turns off the message in some cases.

The "bridge configuration invalid" message just means the firmware
didn't configure the bridge.  We *should* still set it up correctly,
but please report a bug if we don't.

lspci showing "ff" for some devices might be a symptom of the devices
being powered off.  In that case config reads normally return ~0 data
(though on your platform maybe it would cause exceptions).  I've seen
this in other situations and wondered if it would be worth adding a
hint to lspci so it could say "device may be powered off".

Anyway, if you are seeing something broken (more than just the
messages), please start a new thread about each one.  If you do, could
you please:

  - open a report at https://bugzilla.kernel.org/, in the Drivers/PCI
    component (open a separate bug for each issue you see)

  - use kernel version 4.17-rc1 and mark it as a regression if
    appropriate

  - attach (don't paste inline) the complete dmesg log and "lspci -vv"
    output (as root) to the bug

  - post a note to linux-pci@vger.kernel.org, cc Fred, Sinan, and me,
    and include the link to the bugzilla

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-05-10  2:45 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-13 17:32 LS1043A : "synchronous abort" at boot due to PCI config read Gilles Buloz
2018-04-27  8:43 ` Ard Biesheuvel
2018-04-27  8:43   ` Ard Biesheuvel
2018-04-27 12:29   ` Gilles Buloz
2018-04-27 12:29     ` Gilles Buloz
2018-04-27 16:56     ` Bjorn Helgaas
2018-04-27 16:56       ` Bjorn Helgaas
2018-04-30  8:46       ` Gilles Buloz
2018-04-30  8:46         ` Gilles Buloz
2018-04-30 13:36         ` Gilles Buloz
2018-04-30 13:36           ` Gilles Buloz
2018-04-30 17:04           ` Bjorn Helgaas
2018-04-30 17:04             ` Bjorn Helgaas
2018-04-30 17:53             ` Gilles Buloz
2018-04-30 17:53               ` Gilles Buloz
2018-05-02 12:57               ` Gilles Buloz
2018-05-02 12:57                 ` Gilles Buloz
2018-05-02 13:26                 ` Bjorn Helgaas
2018-05-02 13:26                   ` Bjorn Helgaas
2018-05-02 13:48                   ` Gilles Buloz
2018-05-02 13:48                     ` Gilles Buloz
2018-05-02 17:23                     ` Bjorn Helgaas
2018-05-02 17:23                       ` Bjorn Helgaas
2018-05-03 12:40                       ` Gilles Buloz
2018-05-03 12:40                         ` Gilles Buloz
2018-05-03 22:31                         ` [PATCH] PCI: Check whether bridges allow access to extended config space Bjorn Helgaas
2018-05-03 22:31                           ` Bjorn Helgaas
2018-05-03 22:31                           ` Bjorn Helgaas
2018-05-04 15:45                           ` Gilles Buloz
2018-05-04 15:45                             ` Gilles Buloz
2018-05-04 15:45                             ` Gilles Buloz
2018-05-04 20:06 Bjorn Helgaas
2018-05-07 21:56 ` [PATCH] PCI: Check whether bridges allow access to extended config space Bjorn Helgaas
2018-05-07 21:56   ` Bjorn Helgaas
2018-05-07 21:56   ` Bjorn Helgaas
2018-05-09 12:27   ` Gilles Buloz
2018-05-10  2:44     ` Frederick Lawler
2018-05-10  2:44       ` Frederick Lawler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.