linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] rk3399-rockpro64 pcie synchronous external abort
@ 2019-11-04 18:55 Peter Geis
  2019-11-09  1:08 ` Peter Geis
  2019-11-12  0:13 ` Bjorn Helgaas
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Geis @ 2019-11-04 18:55 UTC (permalink / raw)
  To: shawn.lin, Heiko Stuebner; +Cc: linux-pci, linux-rockchip

Good Morning,

I'm attempting to debug an issue with the rockpro64 pcie port.
It would appear that the port does not like various cards, including
cards of the same make that randomly work or do not work, such as
Intel i340 based NICs.
I'm experiencing it with a GTX645 gpu.

This seems to be a long running issue, referenced both at [0], and [1].
There was an attempt to rectify it, by adding a delay between training
and probing [2], but that doesn't seem to be the issue here.
It appears that when we probe further into the card, such as devfn >
1, we trigger the bug.
I've added a print statement that prints the devfn, address, and size
information, which you can see below.

I've attempted setting the available number of lanes to 1 as well, to
no difference.

If anyone could point me in the right direction as to where to
continue debugging, I'd greatly appreciate it.

[0] https://github.com/ayufan-rock64/linux-build/issues/254
[1] https://github.com/rockchip-linux/kernel/issues/116
[2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b

[  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
for per-lane PHY
[  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
[  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
[  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
[  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
[  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
[  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
[  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
[  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
[  198.550656] rockchip-pcie f8000000.pcie:   MEM
0xfa000000..0xfbdfffff -> 0xfa000000
[  198.550676] rockchip-pcie f8000000.pcie:    IO
0xfbe00000..0xfbefffff -> 0xfbe00000
[  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
[  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
[  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
[  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
(bus address [0xfbe00000-0xfbefffff])
[  198.552965] pci_bus 0000:00: scanning bus
[  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
[  198.555508] pci 0000:00:00.0: supports D1
[  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[  198.556023] pci 0000:00:00.0: PME# disabled
[  198.561245] pci_bus 0000:00: fixups for bus
[  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[  198.567008] pci_bus 0000:01: scanning bus
[  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
[  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
64bit pref]
[  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
64bit pref]
[  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
[  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
[  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
[  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
with 8 GT/s x16 link)
[  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
[  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
<snip>
[  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
[  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
[  198.750881] Internal error: synchronous external abort: 96000210
[#1] PREEMPT SMP
[  198.751581] Modules linked in: drm_panel_orientation_quirks
pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
crc_ccitt nf_defrag_ipv6
[  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
[  198.753791] Hardware name: Pine64 RockPro64 (DT)
[  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
[  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
[  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
[  198.755773] sp : ffff8000132af530
[  198.756079] x29: ffff8000132af530 x28: 0000000000000000
[  198.756565] x27: 0000000000000001 x26: 0000000000000000
[  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
[  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
[  198.758018] x21: 0000000000002000 x20: 0000000000000004
[  198.758502] x19: 0000000000102000 x18: 0000000000000001
[  198.758987] x17: 0000000000000000 x16: 0000000000000000
[  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
[  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
[  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
[  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
[  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
[  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
[  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
[  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
[  198.763348] Call trace:
[  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
[  198.764057]  pci_bus_read_config_dword+0x88/0xd0
[  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
[  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
[  198.765413]  pci_scan_single_device+0x84/0xf8
[  198.765812]  pci_scan_slot+0x7c/0x120
[  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
[  198.766579]  pci_scan_bridge_extend+0x350/0x588
[  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
[  198.767430]  pci_scan_child_bus+0x24/0x30
[  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
[  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
[  198.768770]  platform_drv_probe+0x58/0xa8
[  198.769139]  really_probe+0xe0/0x318
[  198.769468]  driver_probe_device+0x5c/0xf0
[  198.769844]  device_driver_attach+0x74/0x80
[  198.770227]  __driver_attach+0x64/0xe8
[  198.770572]  bus_for_each_dev+0x84/0xd8
[  198.770924]  driver_attach+0x30/0x40
[  198.771253]  bus_add_driver+0x188/0x1e8
[  198.771605]  driver_register+0x64/0x110
[  198.771956]  __platform_driver_register+0x54/0x60
[  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
[  198.772998]  do_one_initcall+0x94/0x390
[  198.773353]  do_init_module+0x88/0x268
[  198.773697]  load_module+0x1e18/0x2198
[  198.774043]  __do_sys_finit_module+0xd0/0xe8
[  198.774435]  __arm64_sys_finit_module+0x28/0x38
[  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
[  198.775297]  el0_svc_handler+0x34/0xa0
[  198.775645]  el0_svc+0x14/0x40
[  198.775928]  el0_sync_handler+0x118/0x290
[  198.776295]  el0_sync+0x164/0x180
[  198.776609] Code: bad PC value
[  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-04 18:55 [BUG] rk3399-rockpro64 pcie synchronous external abort Peter Geis
@ 2019-11-09  1:08 ` Peter Geis
  2019-11-09 16:37   ` Doug Anderson
  2019-11-12  0:13 ` Bjorn Helgaas
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-09  1:08 UTC (permalink / raw)
  To: shawn.lin, Heiko Stuebner; +Cc: linux-pci, linux-rockchip

Good Evening,

I'm not sure, but I believe the pcie address space built into the
rk3399 is not large enough to accommodate the pcie addresses the card
requires.
I've been trying to figure out if it's possible to use system ram
instead, but so far I haven't been successful.
Also, the ram layout for the rk3399 is odd considering the TRM, if
anyone has any insights in to that, I'd be grateful.

On Mon, Nov 4, 2019 at 1:55 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> Good Morning,
>
> I'm attempting to debug an issue with the rockpro64 pcie port.
> It would appear that the port does not like various cards, including
> cards of the same make that randomly work or do not work, such as
> Intel i340 based NICs.
> I'm experiencing it with a GTX645 gpu.
>
> This seems to be a long running issue, referenced both at [0], and [1].
> There was an attempt to rectify it, by adding a delay between training
> and probing [2], but that doesn't seem to be the issue here.
> It appears that when we probe further into the card, such as devfn >
> 1, we trigger the bug.
> I've added a print statement that prints the devfn, address, and size
> information, which you can see below.
>
> I've attempted setting the available number of lanes to 1 as well, to
> no difference.
>
> If anyone could point me in the right direction as to where to
> continue debugging, I'd greatly appreciate it.
>
> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> [1] https://github.com/rockchip-linux/kernel/issues/116
> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
>
> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> for per-lane PHY
> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> 0xfa000000..0xfbdfffff -> 0xfa000000
> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> 0xfbe00000..0xfbefffff -> 0xfbe00000
> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> (bus address [0xfbe00000-0xfbefffff])
> [  198.552965] pci_bus 0000:00: scanning bus
> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> [  198.555508] pci 0000:00:00.0: supports D1
> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> [  198.556023] pci 0000:00:00.0: PME# disabled
> [  198.561245] pci_bus 0000:00: fixups for bus
> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> [  198.567008] pci_bus 0000:01: scanning bus
> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> 64bit pref]
> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> 64bit pref]
> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> with 8 GT/s x16 link)
> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> <snip>
> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> [  198.750881] Internal error: synchronous external abort: 96000210
> [#1] PREEMPT SMP
> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> crc_ccitt nf_defrag_ipv6
> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> [  198.755773] sp : ffff8000132af530
> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> [  198.763348] Call trace:
> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> [  198.765413]  pci_scan_single_device+0x84/0xf8
> [  198.765812]  pci_scan_slot+0x7c/0x120
> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> [  198.767430]  pci_scan_child_bus+0x24/0x30
> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> [  198.768770]  platform_drv_probe+0x58/0xa8
> [  198.769139]  really_probe+0xe0/0x318
> [  198.769468]  driver_probe_device+0x5c/0xf0
> [  198.769844]  device_driver_attach+0x74/0x80
> [  198.770227]  __driver_attach+0x64/0xe8
> [  198.770572]  bus_for_each_dev+0x84/0xd8
> [  198.770924]  driver_attach+0x30/0x40
> [  198.771253]  bus_add_driver+0x188/0x1e8
> [  198.771605]  driver_register+0x64/0x110
> [  198.771956]  __platform_driver_register+0x54/0x60
> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> [  198.772998]  do_one_initcall+0x94/0x390
> [  198.773353]  do_init_module+0x88/0x268
> [  198.773697]  load_module+0x1e18/0x2198
> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> [  198.775297]  el0_svc_handler+0x34/0xa0
> [  198.775645]  el0_svc+0x14/0x40
> [  198.775928]  el0_sync_handler+0x118/0x290
> [  198.776295]  el0_sync+0x164/0x180
> [  198.776609] Code: bad PC value
> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-09  1:08 ` Peter Geis
@ 2019-11-09 16:37   ` Doug Anderson
  2019-11-10 15:43     ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Doug Anderson @ 2019-11-09 16:37 UTC (permalink / raw)
  To: Peter Geis
  Cc: Shawn Lin, Heiko Stuebner, linux-pci, open list:ARM/Rockchip SoC...

Hi,

On Fri, Nov 8, 2019 at 5:09 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> Good Evening,
>
> I'm not sure, but I believe the pcie address space built into the
> rk3399 is not large enough to accommodate the pcie addresses the card
> requires.
> I've been trying to figure out if it's possible to use system ram
> instead, but so far I haven't been successful.
> Also, the ram layout for the rk3399 is odd considering the TRM, if
> anyone has any insights in to that, I'd be grateful.
>
> On Mon, Nov 4, 2019 at 1:55 PM Peter Geis <pgwipeout@gmail.com> wrote:
> >
> > Good Morning,
> >
> > I'm attempting to debug an issue with the rockpro64 pcie port.
> > It would appear that the port does not like various cards, including
> > cards of the same make that randomly work or do not work, such as
> > Intel i340 based NICs.
> > I'm experiencing it with a GTX645 gpu.
> >
> > This seems to be a long running issue, referenced both at [0], and [1].
> > There was an attempt to rectify it, by adding a delay between training
> > and probing [2], but that doesn't seem to be the issue here.
> > It appears that when we probe further into the card, such as devfn >
> > 1, we trigger the bug.
> > I've added a print statement that prints the devfn, address, and size
> > information, which you can see below.
> >
> > I've attempted setting the available number of lanes to 1 as well, to
> > no difference.
> >
> > If anyone could point me in the right direction as to where to
> > continue debugging, I'd greatly appreciate it.

I don't have tons of knowledge here, but your thread happened to fly
by my Inbox and it reminded me of issues we faced during the bringup
of rk3399-gru-kevin where our WiFi driver would kill the whole system
in random / hard to debug ways.

If I remember, the problem was that the rk3399 PCIe behaved very badly
when you try to access the bus is in certain power save modes.  I
think that traditional x86-based controllers just return 0xff in the
same state, but rk3399 gives some sorts of asynchronous bus errors.

IIRC our problem was fixed with:

https://crrev.com/c/402092 - FROMLIST: mwifiex: fix corner case power save issue

...that's about all the knowledge I have around this area, but
possibly it could point you in the right direction?

-Doug

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-09 16:37   ` Doug Anderson
@ 2019-11-10 15:43     ` Peter Geis
  2019-11-12  0:03       ` Bjorn Helgaas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-10 15:43 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Shawn Lin, Heiko Stuebner, linux-pci, open list:ARM/Rockchip SoC...

On Sat, Nov 9, 2019 at 11:38 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Nov 8, 2019 at 5:09 PM Peter Geis <pgwipeout@gmail.com> wrote:
> >
> > Good Evening,
> >
> > I'm not sure, but I believe the pcie address space built into the
> > rk3399 is not large enough to accommodate the pcie addresses the card
> > requires.
> > I've been trying to figure out if it's possible to use system ram
> > instead, but so far I haven't been successful.
> > Also, the ram layout for the rk3399 is odd considering the TRM, if
> > anyone has any insights in to that, I'd be grateful.
> >
> > On Mon, Nov 4, 2019 at 1:55 PM Peter Geis <pgwipeout@gmail.com> wrote:
> > >
> > > Good Morning,
> > >
> > > I'm attempting to debug an issue with the rockpro64 pcie port.
> > > It would appear that the port does not like various cards, including
> > > cards of the same make that randomly work or do not work, such as
> > > Intel i340 based NICs.
> > > I'm experiencing it with a GTX645 gpu.
> > >
> > > This seems to be a long running issue, referenced both at [0], and [1].
> > > There was an attempt to rectify it, by adding a delay between training
> > > and probing [2], but that doesn't seem to be the issue here.
> > > It appears that when we probe further into the card, such as devfn >
> > > 1, we trigger the bug.
> > > I've added a print statement that prints the devfn, address, and size
> > > information, which you can see below.
> > >
> > > I've attempted setting the available number of lanes to 1 as well, to
> > > no difference.
> > >
> > > If anyone could point me in the right direction as to where to
> > > continue debugging, I'd greatly appreciate it.
>
> I don't have tons of knowledge here, but your thread happened to fly
> by my Inbox and it reminded me of issues we faced during the bringup
> of rk3399-gru-kevin where our WiFi driver would kill the whole system
> in random / hard to debug ways.
>
> If I remember, the problem was that the rk3399 PCIe behaved very badly
> when you try to access the bus is in certain power save modes.  I
> think that traditional x86-based controllers just return 0xff in the
> same state, but rk3399 gives some sorts of asynchronous bus errors.
>
> IIRC our problem was fixed with:
>
> https://crrev.com/c/402092 - FROMLIST: mwifiex: fix corner case power save issue
>
> ...that's about all the knowledge I have around this area, but
> possibly it could point you in the right direction?
>
> -Doug

Thanks!

Unfortunately this is happening before it even gets done probing the
pcie controller, and has not had a chance to enter a power save state.
I plugged in an i350 two port nic and examined the assigned address spaces.
I've attached it below.
Judging by the usage, I think this controller has enough address space
for another two port NIC, and that's about it.
I'm pretty sure now that the rk3399 controller just doesn't have the
address space to map larger devices.
I'm pretty sure the IOMMU would allow us to address system memory as
pcie address space and overcome this limitation, but I don't know how
to do that.

The address space for the nic is below:
f8000000-f8ffffff : axi-base
fa000000-fbdfffff : MEM
  fa000000-fa4fffff : PCI Bus 0000:01
    fa000000-fa07ffff : 0000:01:00.0
      fa000000-fa07ffff : igb
    fa080000-fa0fffff : 0000:01:00.0
    fa100000-fa17ffff : 0000:01:00.1
      fa100000-fa17ffff : igb
    fa180000-fa1fffff : 0000:01:00.1
    fa200000-fa27ffff : 0000:01:00.0
    fa280000-fa2fffff : 0000:01:00.0
    fa300000-fa37ffff : 0000:01:00.1
    fa380000-fa3fffff : 0000:01:00.1
    fa400000-fa403fff : 0000:01:00.0
      fa400000-fa403fff : igb
    fa404000-fa407fff : 0000:01:00.1
      fa404000-fa407fff : igb
fd000000-fdffffff : f8000000.pcie

$ dmesg | grep pci | grep -v devfn
[   64.408414] pci 0000:01:00.1: reg 0x1c: [mem 0x00000000-0x00003fff]
[   64.408508] pci 0000:01:00.1: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   64.408546] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 512)
[   64.409002] pci 0000:01:00.1: PME# supported from D0 D3hot D3cold
[   64.409016] pci 0000:01:00.1: PME# disabled
[   64.409179] pci 0000:01:00.1: reg 0x184: [mem 0x00000000-0x0000ffff
64bit pref]
[   64.409181] pci 0000:01:00.1: VF(n) BAR0 space: [mem
0x00000000-0x0007ffff 64bit pref] (contains BAR0 for 8 VFs)
[   64.409268] pci 0000:01:00.1: reg 0x190: [mem 0x00000000-0x0000ffff
64bit pref]
[   64.409271] pci 0000:01:00.1: VF(n) BAR3 space: [mem
0x00000000-0x0007ffff 64bit pref] (contains BAR3 for 8 VFs)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-10 15:43     ` Peter Geis
@ 2019-11-12  0:03       ` Bjorn Helgaas
  2019-11-12  0:21         ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2019-11-12  0:03 UTC (permalink / raw)
  To: Peter Geis
  Cc: Doug Anderson, Shawn Lin, Heiko Stuebner, linux-pci,
	open list:ARM/Rockchip SoC...

On Sun, Nov 10, 2019 at 10:43:48AM -0500, Peter Geis wrote:

> I plugged in an i350 two port nic and examined the assigned address spaces.
> I've attached it below.
> Judging by the usage, I think this controller has enough address space
> for another two port NIC, and that's about it.
> I'm pretty sure now that the rk3399 controller just doesn't have the
> address space to map larger devices.
> I'm pretty sure the IOMMU would allow us to address system memory as
> pcie address space and overcome this limitation, but I don't know how
> to do that.

I don't think you're out of MMIO space, at least in this instance.  It
looks like you have 32MB available and the two-port NIC on bus 01 only
takes 5MB.

The IOMMU is used for DMA (e.g., reads/writes initiated by the NIC),
while the MMIO space is used for CPU programmed I/O (reads/writes done
by the driver running on the CPU).

> The address space for the nic is below:
> f8000000-f8ffffff : axi-base
> fa000000-fbdfffff : MEM

32MB.

>   fa000000-fa4fffff : PCI Bus 0000:01

5MB.

>     fa000000-fa07ffff : 0000:01:00.0
>       fa000000-fa07ffff : igb
>     fa080000-fa0fffff : 0000:01:00.0
>     fa100000-fa17ffff : 0000:01:00.1
>       fa100000-fa17ffff : igb
>     fa180000-fa1fffff : 0000:01:00.1
>     fa200000-fa27ffff : 0000:01:00.0
>     fa280000-fa2fffff : 0000:01:00.0
>     fa300000-fa37ffff : 0000:01:00.1
>     fa380000-fa3fffff : 0000:01:00.1
>     fa400000-fa403fff : 0000:01:00.0
>       fa400000-fa403fff : igb
>     fa404000-fa407fff : 0000:01:00.1
>       fa404000-fa407fff : igb
> fd000000-fdffffff : f8000000.pcie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-04 18:55 [BUG] rk3399-rockpro64 pcie synchronous external abort Peter Geis
  2019-11-09  1:08 ` Peter Geis
@ 2019-11-12  0:13 ` Bjorn Helgaas
  2019-11-12  0:30   ` Peter Geis
  1 sibling, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2019-11-12  0:13 UTC (permalink / raw)
  To: Peter Geis; +Cc: shawn.lin, Heiko Stuebner, linux-pci, linux-rockchip

On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> Good Morning,
> 
> I'm attempting to debug an issue with the rockpro64 pcie port.
> It would appear that the port does not like various cards, including
> cards of the same make that randomly work or do not work, such as
> Intel i340 based NICs.
> I'm experiencing it with a GTX645 gpu.
> 
> This seems to be a long running issue, referenced both at [0], and [1].
> There was an attempt to rectify it, by adding a delay between training
> and probing [2], but that doesn't seem to be the issue here.
> It appears that when we probe further into the card, such as devfn >
> 1, we trigger the bug.
> I've added a print statement that prints the devfn, address, and size
> information, which you can see below.
> 
> I've attempted setting the available number of lanes to 1 as well, to
> no difference.
> 
> If anyone could point me in the right direction as to where to
> continue debugging, I'd greatly appreciate it.
> 
> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> [1] https://github.com/rockchip-linux/kernel/issues/116
> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> 
> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> for per-lane PHY
> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> 0xfa000000..0xfbdfffff -> 0xfa000000
> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> 0xfbe00000..0xfbefffff -> 0xfbe00000
> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> (bus address [0xfbe00000-0xfbefffff])
> [  198.552965] pci_bus 0000:00: scanning bus
> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> [  198.555508] pci 0000:00:00.0: supports D1
> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> [  198.556023] pci 0000:00:00.0: PME# disabled
> [  198.561245] pci_bus 0000:00: fixups for bus
> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> [  198.567008] pci_bus 0000:01: scanning bus
> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> 64bit pref]
> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> 64bit pref]
> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> with 8 GT/s x16 link)
> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> <snip>
> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> [  198.750881] Internal error: synchronous external abort: 96000210
> [#1] PREEMPT SMP

Is there really supposed to be a device at 01:00.2?

Maybe this is just the PCIe Unsupported Request error that we expect
to get when trying to read config space of a device that doesn't
exist.

On "most" platforms, we just get ~0 data back when that happens, but
I'm not sure that's always the case on arm64.  I think it depends on
how the PCIe host bridge is designed, and there might be some CPU
configuration, too.

> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> crc_ccitt nf_defrag_ipv6
> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> [  198.755773] sp : ffff8000132af530
> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> [  198.763348] Call trace:
> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> [  198.765413]  pci_scan_single_device+0x84/0xf8
> [  198.765812]  pci_scan_slot+0x7c/0x120
> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> [  198.767430]  pci_scan_child_bus+0x24/0x30
> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> [  198.768770]  platform_drv_probe+0x58/0xa8
> [  198.769139]  really_probe+0xe0/0x318
> [  198.769468]  driver_probe_device+0x5c/0xf0
> [  198.769844]  device_driver_attach+0x74/0x80
> [  198.770227]  __driver_attach+0x64/0xe8
> [  198.770572]  bus_for_each_dev+0x84/0xd8
> [  198.770924]  driver_attach+0x30/0x40
> [  198.771253]  bus_add_driver+0x188/0x1e8
> [  198.771605]  driver_register+0x64/0x110
> [  198.771956]  __platform_driver_register+0x54/0x60
> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> [  198.772998]  do_one_initcall+0x94/0x390
> [  198.773353]  do_init_module+0x88/0x268
> [  198.773697]  load_module+0x1e18/0x2198
> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> [  198.775297]  el0_svc_handler+0x34/0xa0
> [  198.775645]  el0_svc+0x14/0x40
> [  198.775928]  el0_sync_handler+0x118/0x290
> [  198.776295]  el0_sync+0x164/0x180
> [  198.776609] Code: bad PC value
> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12  0:03       ` Bjorn Helgaas
@ 2019-11-12  0:21         ` Peter Geis
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Geis @ 2019-11-12  0:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Doug Anderson, Shawn Lin, Heiko Stuebner, linux-pci,
	open list:ARM/Rockchip SoC...

On Mon, Nov 11, 2019 at 7:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sun, Nov 10, 2019 at 10:43:48AM -0500, Peter Geis wrote:
>
> > I plugged in an i350 two port nic and examined the assigned address spaces.
> > I've attached it below.
> > Judging by the usage, I think this controller has enough address space
> > for another two port NIC, and that's about it.
> > I'm pretty sure now that the rk3399 controller just doesn't have the
> > address space to map larger devices.
> > I'm pretty sure the IOMMU would allow us to address system memory as
> > pcie address space and overcome this limitation, but I don't know how
> > to do that.
>
> I don't think you're out of MMIO space, at least in this instance.  It
> looks like you have 32MB available and the two-port NIC on bus 01 only
> takes 5MB.
>
> The IOMMU is used for DMA (e.g., reads/writes initiated by the NIC),
> while the MMIO space is used for CPU programmed I/O (reads/writes done
> by the driver running on the CPU).
>
> > The address space for the nic is below:
> > f8000000-f8ffffff : axi-base
> > fa000000-fbdfffff : MEM
>
> 32MB.
>
> >   fa000000-fa4fffff : PCI Bus 0000:01
>
> 5MB.
Just a note, this is not the device that triggered the bug.
This is a i350 NIC, i posted it as a comparison to my original email.
>
> >     fa000000-fa07ffff : 0000:01:00.0
> >       fa000000-fa07ffff : igb
> >     fa080000-fa0fffff : 0000:01:00.0
> >     fa100000-fa17ffff : 0000:01:00.1
> >       fa100000-fa17ffff : igb
> >     fa180000-fa1fffff : 0000:01:00.1
> >     fa200000-fa27ffff : 0000:01:00.0
> >     fa280000-fa2fffff : 0000:01:00.0
> >     fa300000-fa37ffff : 0000:01:00.1
> >     fa380000-fa3fffff : 0000:01:00.1
> >     fa400000-fa403fff : 0000:01:00.0
> >       fa400000-fa403fff : igb
> >     fa404000-fa407fff : 0000:01:00.1
> >       fa404000-fa407fff : igb
> > fd000000-fdffffff : f8000000.pcie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12  0:13 ` Bjorn Helgaas
@ 2019-11-12  0:30   ` Peter Geis
  2019-11-12  2:29     ` Bjorn Helgaas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-12  0:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Shawn Lin, Heiko Stuebner, linux-pci, open list:ARM/Rockchip SoC...

On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > Good Morning,
> >
> > I'm attempting to debug an issue with the rockpro64 pcie port.
> > It would appear that the port does not like various cards, including
> > cards of the same make that randomly work or do not work, such as
> > Intel i340 based NICs.
> > I'm experiencing it with a GTX645 gpu.
> >
> > This seems to be a long running issue, referenced both at [0], and [1].
> > There was an attempt to rectify it, by adding a delay between training
> > and probing [2], but that doesn't seem to be the issue here.
> > It appears that when we probe further into the card, such as devfn >
> > 1, we trigger the bug.
> > I've added a print statement that prints the devfn, address, and size
> > information, which you can see below.
> >
> > I've attempted setting the available number of lanes to 1 as well, to
> > no difference.
> >
> > If anyone could point me in the right direction as to where to
> > continue debugging, I'd greatly appreciate it.
> >
> > [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > [1] https://github.com/rockchip-linux/kernel/issues/116
> > [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> >
> > [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > for per-lane PHY
> > [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > 0xfa000000..0xfbdfffff -> 0xfa000000
> > [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > 0xfbe00000..0xfbefffff -> 0xfbe00000
> > [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > (bus address [0xfbe00000-0xfbefffff])
> > [  198.552965] pci_bus 0000:00: scanning bus
> > [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > [  198.555508] pci 0000:00:00.0: supports D1
> > [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > [  198.556023] pci 0000:00:00.0: PME# disabled
> > [  198.561245] pci_bus 0000:00: fixups for bus
> > [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > 00-00]), reconfiguring
> > [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > [  198.567008] pci_bus 0000:01: scanning bus
> > [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > 64bit pref]
> > [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > 64bit pref]
> > [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > with 8 GT/s x16 link)
> > [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > decodes=io+mem,owns=none,locks=none
> > [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > <snip>
> > [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > [  198.750881] Internal error: synchronous external abort: 96000210
> > [#1] PREEMPT SMP
>
> Is there really supposed to be a device at 01:00.2?
>
> Maybe this is just the PCIe Unsupported Request error that we expect
> to get when trying to read config space of a device that doesn't
> exist.
>
> On "most" platforms, we just get ~0 data back when that happens, but
> I'm not sure that's always the case on arm64.  I think it depends on
> how the PCIe host bridge is designed, and there might be some CPU
> configuration, too.

Yes, this is a GTX645 video card.
Nvidia cards usually have two to three devices,
The GPU proper, the audio device for the hdmi output, and the i2c controller.

I do think that this driver is missing sanity checking on the
addressing, since the BRCM driver for the RPI4 doesn't try to
enumerate a video card, since it checks if the MMIO space is large
enough to fit the BAR before assigning the addresses. See [3]. Also in
that thread he was able to increase the address space provided to the
BRCM driver and fix the issue, but I don't see how we could do that on
the rk3399.

pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
128 MB, which already exceeds our address space.
I think the driver is just overflowing the address space.

[3] https://twitter.com/domipheus/status/1167586160077627393
>
> > [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > crc_ccitt nf_defrag_ipv6
> > [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > [  198.755773] sp : ffff8000132af530
> > [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > [  198.763348] Call trace:
> > [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > [  198.765413]  pci_scan_single_device+0x84/0xf8
> > [  198.765812]  pci_scan_slot+0x7c/0x120
> > [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > [  198.767430]  pci_scan_child_bus+0x24/0x30
> > [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > [  198.768770]  platform_drv_probe+0x58/0xa8
> > [  198.769139]  really_probe+0xe0/0x318
> > [  198.769468]  driver_probe_device+0x5c/0xf0
> > [  198.769844]  device_driver_attach+0x74/0x80
> > [  198.770227]  __driver_attach+0x64/0xe8
> > [  198.770572]  bus_for_each_dev+0x84/0xd8
> > [  198.770924]  driver_attach+0x30/0x40
> > [  198.771253]  bus_add_driver+0x188/0x1e8
> > [  198.771605]  driver_register+0x64/0x110
> > [  198.771956]  __platform_driver_register+0x54/0x60
> > [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > [  198.772998]  do_one_initcall+0x94/0x390
> > [  198.773353]  do_init_module+0x88/0x268
> > [  198.773697]  load_module+0x1e18/0x2198
> > [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > [  198.775297]  el0_svc_handler+0x34/0xa0
> > [  198.775645]  el0_svc+0x14/0x40
> > [  198.775928]  el0_sync_handler+0x118/0x290
> > [  198.776295]  el0_sync+0x164/0x180
> > [  198.776609] Code: bad PC value
> > [  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12  0:30   ` Peter Geis
@ 2019-11-12  2:29     ` Bjorn Helgaas
  2019-11-12 15:55       ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2019-11-12  2:29 UTC (permalink / raw)
  To: Peter Geis
  Cc: Shawn Lin, Heiko Stuebner, linux-pci, open list:ARM/Rockchip SoC...

On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > > Good Morning,
> > >
> > > I'm attempting to debug an issue with the rockpro64 pcie port.
> > > It would appear that the port does not like various cards, including
> > > cards of the same make that randomly work or do not work, such as
> > > Intel i340 based NICs.
> > > I'm experiencing it with a GTX645 gpu.
> > >
> > > This seems to be a long running issue, referenced both at [0], and [1].
> > > There was an attempt to rectify it, by adding a delay between training
> > > and probing [2], but that doesn't seem to be the issue here.
> > > It appears that when we probe further into the card, such as devfn >
> > > 1, we trigger the bug.
> > > I've added a print statement that prints the devfn, address, and size
> > > information, which you can see below.
> > >
> > > I've attempted setting the available number of lanes to 1 as well, to
> > > no difference.
> > >
> > > If anyone could point me in the right direction as to where to
> > > continue debugging, I'd greatly appreciate it.
> > >
> > > [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > > [1] https://github.com/rockchip-linux/kernel/issues/116
> > > [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> > >
> > > [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > > for per-lane PHY
> > > [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > > [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > > [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > > [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > > [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > > [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > > [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > > [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > > [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > > 0xfa000000..0xfbdfffff -> 0xfa000000
> > > [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > > 0xfbe00000..0xfbefffff -> 0xfbe00000
> > > [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > > [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > > [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > > [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > > (bus address [0xfbe00000-0xfbefffff])
> > > [  198.552965] pci_bus 0000:00: scanning bus
> > > [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > > [  198.555508] pci 0000:00:00.0: supports D1
> > > [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > > [  198.556023] pci 0000:00:00.0: PME# disabled
> > > [  198.561245] pci_bus 0000:00: fixups for bus
> > > [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > > [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > > 00-00]), reconfiguring
> > > [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > > [  198.567008] pci_bus 0000:01: scanning bus
> > > [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > > [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > > 64bit pref]
> > > [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > > 64bit pref]
> > > [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > > [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > > [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > > [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > > limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > > with 8 GT/s x16 link)
> > > [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > > decodes=io+mem,owns=none,locks=none
> > > [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > > [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > > <snip>
> > > [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > > [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > > [  198.750881] Internal error: synchronous external abort: 96000210
> > > [#1] PREEMPT SMP
> >
> > Is there really supposed to be a device at 01:00.2?
> >
> > Maybe this is just the PCIe Unsupported Request error that we expect
> > to get when trying to read config space of a device that doesn't
> > exist.
> >
> > On "most" platforms, we just get ~0 data back when that happens, but
> > I'm not sure that's always the case on arm64.  I think it depends on
> > how the PCIe host bridge is designed, and there might be some CPU
> > configuration, too.
> 
> Yes, this is a GTX645 video card.
> Nvidia cards usually have two to three devices,
> The GPU proper, the audio device for the hdmi output, and the i2c controller.
> 
> I do think that this driver is missing sanity checking on the
> addressing, since the BRCM driver for the RPI4 doesn't try to
> enumerate a video card, since it checks if the MMIO space is large
> enough to fit the BAR before assigning the addresses. See [3]. Also in
> that thread he was able to increase the address space provided to the
> BRCM driver and fix the issue, but I don't see how we could do that on
> the rk3399.
> 
> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> 128 MB, which already exceeds our address space.
> I think the driver is just overflowing the address space.

If we don't have enough space to assign all the device BARs, I think a
driver will still be able to claim the device, but when the driver
calls pci_enable_device(), it should fail.  Lack of address space
should not cause a PCIe error.

But in this case, none of that matters because we're still enumerating
devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
of trying to bind drivers to devices, so the driver isn't involved at
all yet.

The backtrace says we're trying to read the Vendor ID of a device, and
your debug output suggests we're trying to enumerate 01:00.2.  If you
put that card in another system, you could find out how many functions
it has.

Or if you swapped this with other cards where you know the number of
functions, you could see if the external abort always happens when
probing for the first unimplemented function.

If the Root Port (00:00.0) supports AER, you could also dump out the
status registers from the AER capability and figure out whether it
logged a PCIe error.  This would be sort of like what
aer_process_err_devices() does.  A bit of a hassle to do this by hand
in the exception path, but could be enlightening, just as a debug
tool.

> [3] https://twitter.com/domipheus/status/1167586160077627393
> >
> > > [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > > pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > > crc_ccitt nf_defrag_ipv6
> > > [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > > 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > > [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > > [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > > [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > > [  198.755773] sp : ffff8000132af530
> > > [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > > [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > > [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > > [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > > [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > > [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > > [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > > [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > > [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > > [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > > [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > > [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > > [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > > [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > > [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > > [  198.763348] Call trace:
> > > [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > > [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > > [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > > [  198.765413]  pci_scan_single_device+0x84/0xf8
> > > [  198.765812]  pci_scan_slot+0x7c/0x120
> > > [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > > [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > > [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > > [  198.767430]  pci_scan_child_bus+0x24/0x30
> > > [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > > [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > > [  198.768770]  platform_drv_probe+0x58/0xa8
> > > [  198.769139]  really_probe+0xe0/0x318
> > > [  198.769468]  driver_probe_device+0x5c/0xf0
> > > [  198.769844]  device_driver_attach+0x74/0x80
> > > [  198.770227]  __driver_attach+0x64/0xe8
> > > [  198.770572]  bus_for_each_dev+0x84/0xd8
> > > [  198.770924]  driver_attach+0x30/0x40
> > > [  198.771253]  bus_add_driver+0x188/0x1e8
> > > [  198.771605]  driver_register+0x64/0x110
> > > [  198.771956]  __platform_driver_register+0x54/0x60
> > > [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > > [  198.772998]  do_one_initcall+0x94/0x390
> > > [  198.773353]  do_init_module+0x88/0x268
> > > [  198.773697]  load_module+0x1e18/0x2198
> > > [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > > [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > > [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > > [  198.775297]  el0_svc_handler+0x34/0xa0
> > > [  198.775645]  el0_svc+0x14/0x40
> > > [  198.775928]  el0_sync_handler+0x118/0x290
> > > [  198.776295]  el0_sync+0x164/0x180
> > > [  198.776609] Code: bad PC value
> > > [  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12  2:29     ` Bjorn Helgaas
@ 2019-11-12 15:55       ` Peter Geis
  2019-11-12 19:15         ` Robin Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-12 15:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Shawn Lin, Heiko Stuebner, linux-pci, open list:ARM/Rockchip SoC...

On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> > On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > > > Good Morning,
> > > >
> > > > I'm attempting to debug an issue with the rockpro64 pcie port.
> > > > It would appear that the port does not like various cards, including
> > > > cards of the same make that randomly work or do not work, such as
> > > > Intel i340 based NICs.
> > > > I'm experiencing it with a GTX645 gpu.
> > > >
> > > > This seems to be a long running issue, referenced both at [0], and [1].
> > > > There was an attempt to rectify it, by adding a delay between training
> > > > and probing [2], but that doesn't seem to be the issue here.
> > > > It appears that when we probe further into the card, such as devfn >
> > > > 1, we trigger the bug.
> > > > I've added a print statement that prints the devfn, address, and size
> > > > information, which you can see below.
> > > >
> > > > I've attempted setting the available number of lanes to 1 as well, to
> > > > no difference.
> > > >
> > > > If anyone could point me in the right direction as to where to
> > > > continue debugging, I'd greatly appreciate it.
> > > >
> > > > [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > > > [1] https://github.com/rockchip-linux/kernel/issues/116
> > > > [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> > > >
> > > > [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > > > for per-lane PHY
> > > > [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > > > [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > > > [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > > > [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > > > [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > > > [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > > > [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > > > [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > > > [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > > > 0xfa000000..0xfbdfffff -> 0xfa000000
> > > > [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > > > 0xfbe00000..0xfbefffff -> 0xfbe00000
> > > > [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > > > [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > > > [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > > > [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > > > (bus address [0xfbe00000-0xfbefffff])
> > > > [  198.552965] pci_bus 0000:00: scanning bus
> > > > [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > > > [  198.555508] pci 0000:00:00.0: supports D1
> > > > [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > > > [  198.556023] pci 0000:00:00.0: PME# disabled
> > > > [  198.561245] pci_bus 0000:00: fixups for bus
> > > > [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > > > [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > > > 00-00]), reconfiguring
> > > > [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > > > [  198.567008] pci_bus 0000:01: scanning bus
> > > > [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > > > [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > > [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > > > 64bit pref]
> > > > [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > > > 64bit pref]
> > > > [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > > > [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > > > [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > > [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > > > [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > > > limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > > > with 8 GT/s x16 link)
> > > > [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > > > decodes=io+mem,owns=none,locks=none
> > > > [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > > > [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > > [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > > > <snip>
> > > > [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > > > [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > > > [  198.750881] Internal error: synchronous external abort: 96000210
> > > > [#1] PREEMPT SMP
> > >
> > > Is there really supposed to be a device at 01:00.2?
> > >
> > > Maybe this is just the PCIe Unsupported Request error that we expect
> > > to get when trying to read config space of a device that doesn't
> > > exist.
> > >
> > > On "most" platforms, we just get ~0 data back when that happens, but
> > > I'm not sure that's always the case on arm64.  I think it depends on
> > > how the PCIe host bridge is designed, and there might be some CPU
> > > configuration, too.
> >
> > Yes, this is a GTX645 video card.
> > Nvidia cards usually have two to three devices,
> > The GPU proper, the audio device for the hdmi output, and the i2c controller.
> >
> > I do think that this driver is missing sanity checking on the
> > addressing, since the BRCM driver for the RPI4 doesn't try to
> > enumerate a video card, since it checks if the MMIO space is large
> > enough to fit the BAR before assigning the addresses. See [3]. Also in
> > that thread he was able to increase the address space provided to the
> > BRCM driver and fix the issue, but I don't see how we could do that on
> > the rk3399.
> >
> > pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> > 128 MB, which already exceeds our address space.
> > I think the driver is just overflowing the address space.
>
> If we don't have enough space to assign all the device BARs, I think a
> driver will still be able to claim the device, but when the driver
> calls pci_enable_device(), it should fail.  Lack of address space
> should not cause a PCIe error.
>
> But in this case, none of that matters because we're still enumerating
> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
> of trying to bind drivers to devices, so the driver isn't involved at
> all yet.
For clarification, the driver I'm referring to is the rk3399-pcie host driver.
>
> The backtrace says we're trying to read the Vendor ID of a device, and
> your debug output suggests we're trying to enumerate 01:00.2.  If you
> put that card in another system, you could find out how many functions
> it has.
>
> Or if you swapped this with other cards where you know the number of
> functions, you could see if the external abort always happens when
> probing for the first unimplemented function.
This card definitely has more than one function.
Before my original message I hacked in some code to make the driver
return 0xff when devfn > 1, and the scan passed, but as soon as
nouveau attempted to access the device, the entire kernel exploded.

Another reason I believe the address assignments are overflowing and
corrupting other address assignments is after the external abort, the
entire PCIE controller is inaccessible.
$ lspci
pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
lspci: Unable to read the standard configuration space header of
device 0000:01:00.1
pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
lspci: Unable to read the standard configuration space header of
device 0000:00:00.0
pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
lspci: Unable to read the standard configuration space header of
device 0000:01:00.0

Attempting to rescan the bus or any other pci function results in a
hung kernel task.
>
> If the Root Port (00:00.0) supports AER, you could also dump out the
> status registers from the AER capability and figure out whether it
> logged a PCIe error.  This would be sort of like what
> aer_process_err_devices() does.  A bit of a hassle to do this by hand
> in the exception path, but could be enlightening, just as a debug
> tool.

Is there a way to handle external synchronous aborts in a device driver?
If so, I'll definitely look into plugging in the aer status functions.

>
> > [3] https://twitter.com/domipheus/status/1167586160077627393
> > >
> > > > [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > > > pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > > > crc_ccitt nf_defrag_ipv6
> > > > [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > > > 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > > > [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > > > [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > > > [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > > [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > > > [  198.755773] sp : ffff8000132af530
> > > > [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > > > [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > > > [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > > > [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > > > [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > > > [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > > > [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > > > [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > > > [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > > > [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > > > [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > > > [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > > > [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > > > [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > > > [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > > > [  198.763348] Call trace:
> > > > [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > > [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > > > [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > > > [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > > > [  198.765413]  pci_scan_single_device+0x84/0xf8
> > > > [  198.765812]  pci_scan_slot+0x7c/0x120
> > > > [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > > > [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > > > [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > > > [  198.767430]  pci_scan_child_bus+0x24/0x30
> > > > [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > > > [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > > > [  198.768770]  platform_drv_probe+0x58/0xa8
> > > > [  198.769139]  really_probe+0xe0/0x318
> > > > [  198.769468]  driver_probe_device+0x5c/0xf0
> > > > [  198.769844]  device_driver_attach+0x74/0x80
> > > > [  198.770227]  __driver_attach+0x64/0xe8
> > > > [  198.770572]  bus_for_each_dev+0x84/0xd8
> > > > [  198.770924]  driver_attach+0x30/0x40
> > > > [  198.771253]  bus_add_driver+0x188/0x1e8
> > > > [  198.771605]  driver_register+0x64/0x110
> > > > [  198.771956]  __platform_driver_register+0x54/0x60
> > > > [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > > > [  198.772998]  do_one_initcall+0x94/0x390
> > > > [  198.773353]  do_init_module+0x88/0x268
> > > > [  198.773697]  load_module+0x1e18/0x2198
> > > > [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > > > [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > > > [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > > > [  198.775297]  el0_svc_handler+0x34/0xa0
> > > > [  198.775645]  el0_svc+0x14/0x40
> > > > [  198.775928]  el0_sync_handler+0x118/0x290
> > > > [  198.776295]  el0_sync+0x164/0x180
> > > > [  198.776609] Code: bad PC value
> > > > [  198.776897] ---[ end trace 88fc77651b5e2909 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12 15:55       ` Peter Geis
@ 2019-11-12 19:15         ` Robin Murphy
  2019-11-12 19:41           ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2019-11-12 19:15 UTC (permalink / raw)
  To: Peter Geis, Bjorn Helgaas
  Cc: linux-pci, Shawn Lin, Heiko Stuebner, open list:ARM/Rockchip SoC...

On 12/11/2019 3:55 pm, Peter Geis wrote:
> On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
>>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
>>>>> Good Morning,
>>>>>
>>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
>>>>> It would appear that the port does not like various cards, including
>>>>> cards of the same make that randomly work or do not work, such as
>>>>> Intel i340 based NICs.
>>>>> I'm experiencing it with a GTX645 gpu.
>>>>>
>>>>> This seems to be a long running issue, referenced both at [0], and [1].
>>>>> There was an attempt to rectify it, by adding a delay between training
>>>>> and probing [2], but that doesn't seem to be the issue here.
>>>>> It appears that when we probe further into the card, such as devfn >
>>>>> 1, we trigger the bug.
>>>>> I've added a print statement that prints the devfn, address, and size
>>>>> information, which you can see below.
>>>>>
>>>>> I've attempted setting the available number of lanes to 1 as well, to
>>>>> no difference.
>>>>>
>>>>> If anyone could point me in the right direction as to where to
>>>>> continue debugging, I'd greatly appreciate it.
>>>>>
>>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
>>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
>>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
>>>>>
>>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
>>>>> for per-lane PHY
>>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
>>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
>>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
>>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
>>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
>>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
>>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
>>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
>>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
>>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
>>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
>>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
>>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
>>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
>>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
>>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
>>>>> (bus address [0xfbe00000-0xfbefffff])
>>>>> [  198.552965] pci_bus 0000:00: scanning bus
>>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
>>>>> [  198.555508] pci 0000:00:00.0: supports D1
>>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
>>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
>>>>> [  198.561245] pci_bus 0000:00: fixups for bus
>>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
>>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
>>>>> 00-00]), reconfiguring
>>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
>>>>> [  198.567008] pci_bus 0000:01: scanning bus
>>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
>>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
>>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
>>>>> 64bit pref]
>>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
>>>>> 64bit pref]
>>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
>>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
>>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
>>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
>>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
>>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
>>>>> with 8 GT/s x16 link)
>>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
>>>>> decodes=io+mem,owns=none,locks=none
>>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
>>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
>>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
>>>>> <snip>
>>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
>>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
>>>>> [  198.750881] Internal error: synchronous external abort: 96000210
>>>>> [#1] PREEMPT SMP
>>>>
>>>> Is there really supposed to be a device at 01:00.2?
>>>>
>>>> Maybe this is just the PCIe Unsupported Request error that we expect
>>>> to get when trying to read config space of a device that doesn't
>>>> exist.
>>>>
>>>> On "most" platforms, we just get ~0 data back when that happens, but
>>>> I'm not sure that's always the case on arm64.  I think it depends on
>>>> how the PCIe host bridge is designed, and there might be some CPU
>>>> configuration, too.
>>>
>>> Yes, this is a GTX645 video card.
>>> Nvidia cards usually have two to three devices,
>>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
>>>
>>> I do think that this driver is missing sanity checking on the
>>> addressing, since the BRCM driver for the RPI4 doesn't try to
>>> enumerate a video card, since it checks if the MMIO space is large
>>> enough to fit the BAR before assigning the addresses. See [3]. Also in
>>> that thread he was able to increase the address space provided to the
>>> BRCM driver and fix the issue, but I don't see how we could do that on
>>> the rk3399.
>>>
>>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
>>> 128 MB, which already exceeds our address space.
>>> I think the driver is just overflowing the address space.
>>
>> If we don't have enough space to assign all the device BARs, I think a
>> driver will still be able to claim the device, but when the driver
>> calls pci_enable_device(), it should fail.  Lack of address space
>> should not cause a PCIe error.
>>
>> But in this case, none of that matters because we're still enumerating
>> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
>> of trying to bind drivers to devices, so the driver isn't involved at
>> all yet.
> For clarification, the driver I'm referring to is the rk3399-pcie host driver.
>>
>> The backtrace says we're trying to read the Vendor ID of a device, and
>> your debug output suggests we're trying to enumerate 01:00.2.  If you
>> put that card in another system, you could find out how many functions
>> it has.
>>
>> Or if you swapped this with other cards where you know the number of
>> functions, you could see if the external abort always happens when
>> probing for the first unimplemented function.
> This card definitely has more than one function.

FWIW, random lspci logs I grovelled up off the internet show cards with 
the same IDs only implementing functions 0 and 1, which does suggest 
that maybe function 2 really doesn't exist but the card handles 
unsupported config requests in a way that this particular bridge/root 
port doesn't quite deal with properly.

> Before my original message I hacked in some code to make the driver
> return 0xff when devfn > 1, and the scan passed, but as soon as
> nouveau attempted to access the device, the entire kernel exploded.
> 
> Another reason I believe the address assignments are overflowing and
> corrupting other address assignments is after the external abort, the
> entire PCIE controller is inaccessible.
> $ lspci
> pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
> lspci: Unable to read the standard configuration space header of
> device 0000:01:00.1
> pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
> lspci: Unable to read the standard configuration space header of
> device 0000:00:00.0
> pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
> lspci: Unable to read the standard configuration space header of
> device 0000:01:00.0
> 
> Attempting to rescan the bus or any other pci function results in a
> hung kernel task.

You crashed deep in the kernel part-way through the driver's probe 
routine; don't expect any partly-initialised bits of that driver to 
actually work. I bet you also can't load/unload modules and do various 
other things either, because the thread will almost certainly have been 
holding a bunch of locks, too. It's not worth trying to reason about the 
state of anything after that kind of unexpected crash ;)

>> If the Root Port (00:00.0) supports AER, you could also dump out the
>> status registers from the AER capability and figure out whether it
>> logged a PCIe error.  This would be sort of like what
>> aer_process_err_devices() does.  A bit of a hassle to do this by hand
>> in the exception path, but could be enlightening, just as a debug
>> tool.
> 
> Is there a way to handle external synchronous aborts in a device driver?

Not easily on arm64 - the closest thing we have is some arch-internal 
machinery for undefined instruction hooks. You'd have to hack something 
gruesome into do_sea()...

For starters it might be worth enabling all the debug prints in 
rockchip-pcie to see if there's any error IRQ corresponding to the 
aborted access.

Robin.

> If so, I'll definitely look into plugging in the aer status functions.
> 
>>
>>> [3] https://twitter.com/domipheus/status/1167586160077627393
>>>>
>>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
>>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
>>>>> crc_ccitt nf_defrag_ipv6
>>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
>>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
>>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
>>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
>>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
>>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
>>>>> [  198.755773] sp : ffff8000132af530
>>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
>>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
>>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
>>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
>>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
>>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
>>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
>>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
>>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
>>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
>>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
>>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
>>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
>>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
>>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
>>>>> [  198.763348] Call trace:
>>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
>>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
>>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
>>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
>>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
>>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
>>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
>>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
>>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
>>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
>>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
>>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
>>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
>>>>> [  198.769139]  really_probe+0xe0/0x318
>>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
>>>>> [  198.769844]  device_driver_attach+0x74/0x80
>>>>> [  198.770227]  __driver_attach+0x64/0xe8
>>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
>>>>> [  198.770924]  driver_attach+0x30/0x40
>>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
>>>>> [  198.771605]  driver_register+0x64/0x110
>>>>> [  198.771956]  __platform_driver_register+0x54/0x60
>>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
>>>>> [  198.772998]  do_one_initcall+0x94/0x390
>>>>> [  198.773353]  do_init_module+0x88/0x268
>>>>> [  198.773697]  load_module+0x1e18/0x2198
>>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
>>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
>>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
>>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
>>>>> [  198.775645]  el0_svc+0x14/0x40
>>>>> [  198.775928]  el0_sync_handler+0x118/0x290
>>>>> [  198.776295]  el0_sync+0x164/0x180
>>>>> [  198.776609] Code: bad PC value
>>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
> 
> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12 19:15         ` Robin Murphy
@ 2019-11-12 19:41           ` Peter Geis
  2019-11-13  1:06             ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-12 19:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, linux-pci, Shawn Lin, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Tue, Nov 12, 2019 at 2:15 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 12/11/2019 3:55 pm, Peter Geis wrote:
> > On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>
> >> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> >>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> >>>>> Good Morning,
> >>>>>
> >>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
> >>>>> It would appear that the port does not like various cards, including
> >>>>> cards of the same make that randomly work or do not work, such as
> >>>>> Intel i340 based NICs.
> >>>>> I'm experiencing it with a GTX645 gpu.
> >>>>>
> >>>>> This seems to be a long running issue, referenced both at [0], and [1].
> >>>>> There was an attempt to rectify it, by adding a delay between training
> >>>>> and probing [2], but that doesn't seem to be the issue here.
> >>>>> It appears that when we probe further into the card, such as devfn >
> >>>>> 1, we trigger the bug.
> >>>>> I've added a print statement that prints the devfn, address, and size
> >>>>> information, which you can see below.
> >>>>>
> >>>>> I've attempted setting the available number of lanes to 1 as well, to
> >>>>> no difference.
> >>>>>
> >>>>> If anyone could point me in the right direction as to where to
> >>>>> continue debugging, I'd greatly appreciate it.
> >>>>>
> >>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> >>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
> >>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> >>>>>
> >>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> >>>>> for per-lane PHY
> >>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> >>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> >>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> >>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> >>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> >>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> >>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> >>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> >>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> >>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
> >>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> >>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
> >>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> >>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> >>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> >>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> >>>>> (bus address [0xfbe00000-0xfbefffff])
> >>>>> [  198.552965] pci_bus 0000:00: scanning bus
> >>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> >>>>> [  198.555508] pci 0000:00:00.0: supports D1
> >>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> >>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
> >>>>> [  198.561245] pci_bus 0000:00: fixups for bus
> >>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> >>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> >>>>> 00-00]), reconfiguring
> >>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> >>>>> [  198.567008] pci_bus 0000:01: scanning bus
> >>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> >>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> >>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> >>>>> 64bit pref]
> >>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> >>>>> 64bit pref]
> >>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> >>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> >>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> >>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> >>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> >>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> >>>>> with 8 GT/s x16 link)
> >>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> >>>>> decodes=io+mem,owns=none,locks=none
> >>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> >>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> >>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> >>>>> <snip>
> >>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> >>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> >>>>> [  198.750881] Internal error: synchronous external abort: 96000210
> >>>>> [#1] PREEMPT SMP
> >>>>
> >>>> Is there really supposed to be a device at 01:00.2?
> >>>>
> >>>> Maybe this is just the PCIe Unsupported Request error that we expect
> >>>> to get when trying to read config space of a device that doesn't
> >>>> exist.
> >>>>
> >>>> On "most" platforms, we just get ~0 data back when that happens, but
> >>>> I'm not sure that's always the case on arm64.  I think it depends on
> >>>> how the PCIe host bridge is designed, and there might be some CPU
> >>>> configuration, too.
> >>>
> >>> Yes, this is a GTX645 video card.
> >>> Nvidia cards usually have two to three devices,
> >>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
> >>>
> >>> I do think that this driver is missing sanity checking on the
> >>> addressing, since the BRCM driver for the RPI4 doesn't try to
> >>> enumerate a video card, since it checks if the MMIO space is large
> >>> enough to fit the BAR before assigning the addresses. See [3]. Also in
> >>> that thread he was able to increase the address space provided to the
> >>> BRCM driver and fix the issue, but I don't see how we could do that on
> >>> the rk3399.
> >>>
> >>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> >>> 128 MB, which already exceeds our address space.
> >>> I think the driver is just overflowing the address space.
> >>
> >> If we don't have enough space to assign all the device BARs, I think a
> >> driver will still be able to claim the device, but when the driver
> >> calls pci_enable_device(), it should fail.  Lack of address space
> >> should not cause a PCIe error.
> >>
> >> But in this case, none of that matters because we're still enumerating
> >> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
> >> of trying to bind drivers to devices, so the driver isn't involved at
> >> all yet.
> > For clarification, the driver I'm referring to is the rk3399-pcie host driver.
> >>
> >> The backtrace says we're trying to read the Vendor ID of a device, and
> >> your debug output suggests we're trying to enumerate 01:00.2.  If you
> >> put that card in another system, you could find out how many functions
> >> it has.
> >>
> >> Or if you swapped this with other cards where you know the number of
> >> functions, you could see if the external abort always happens when
> >> probing for the first unimplemented function.
> > This card definitely has more than one function.
>
> FWIW, random lspci logs I grovelled up off the internet show cards with
> the same IDs only implementing functions 0 and 1, which does suggest
> that maybe function 2 really doesn't exist but the card handles
> unsupported config requests in a way that this particular bridge/root
> port doesn't quite deal with properly.

Thanks! Is there a sane way we could make the kernel handle this in
place of the controller?
(It's apparently based on the designware ip, but it doesn't use their
driver for some reason)

>
> > Before my original message I hacked in some code to make the driver
> > return 0xff when devfn > 1, and the scan passed, but as soon as
> > nouveau attempted to access the device, the entire kernel exploded.
> >
> > Another reason I believe the address assignments are overflowing and
> > corrupting other address assignments is after the external abort, the
> > entire PCIE controller is inaccessible.
> > $ lspci
> > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
> > lspci: Unable to read the standard configuration space header of
> > device 0000:01:00.1
> > pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
> > lspci: Unable to read the standard configuration space header of
> > device 0000:00:00.0
> > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
> > lspci: Unable to read the standard configuration space header of
> > device 0000:01:00.0
> >
> > Attempting to rescan the bus or any other pci function results in a
> > hung kernel task.
>
> You crashed deep in the kernel part-way through the driver's probe
> routine; don't expect any partly-initialised bits of that driver to
> actually work. I bet you also can't load/unload modules and do various
> other things either, because the thread will almost certainly have been
> holding a bunch of locks, too. It's not worth trying to reason about the
> state of anything after that kind of unexpected crash ;)

That is observed, but I think it is actually more destructive than that.
Attempting to do an access after the fact, then performing a soft
reset results in a spectacular crash when it tries to bring the BIG
cluster online.

[    0.203160] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[    5.355606] CPU4: failed to come online
[    5.363258] CPU4: failed in unknown state : 0x0
Unhandled Exception in EL3.
x30 =           0xUhandled Exception in L3.
x30 =           UNhandled ExcEption in EL3.
x30 =   0xUnhandled Excepti2n in EL_.
x30 =           0xUnhandled Exception in EL3.
x30 =           lxUnhadled Exception in EL3.
x30 =           0xUnhandled Excepion in EL3.
x30 =           0xUnhandled Exception in EL3.
x30 =   0xUnhandled Eception in EL3.
x30 =           0xUnhandled Exception in EL3.
x30 =           0xUnhandled Exception in EL3.x30 =
0xUnhandled Exception in EL3.
x30 =           0xUnhandled Excepton i2 EL3.
x30 =           0xUnhandUnhandled Exceptionein EL3.lx30 =
 0xUnhandled Exception in EL3.
x30 =           0xUnhandled Unhanded Exception in EL3.
x30 =           0xUnhandled Exceptin in EL3Unhandled Exception inEL3.
x0 =            0xUnhandled Exception in EL3.
x30 =           0xUnhandled Exception in EL3.
x30 =           0xUnhandled Exception in EL.
x30 =           0xUnhandled Exception in EL3.
x30 =           0xUnhandled Exception in EL3.
x30 =           0xUnhandled xception in EL3
x30 =           0xUnhcndled Exception in EL3.
x30=            0xUnhandled Excextion in EL3.
x30 =   0xUnhandled Exception in EL3.
x30 =           0x[   10.484411] CPU5: failed to come online
[   10.484807] CPU5: failed in unknown state : 0x0
[   10.485374] smp: Brought up 1 node, 4 CPUs

It goes on until it tries to do it's first DMA setup, then just hangs.
>
> >> If the Root Port (00:00.0) supports AER, you could also dump out the
> >> status registers from the AER capability and figure out whether it
> >> logged a PCIe error.  This would be sort of like what
> >> aer_process_err_devices() does.  A bit of a hassle to do this by hand
> >> in the exception path, but could be enlightening, just as a debug
> >> tool.
> >
> > Is there a way to handle external synchronous aborts in a device driver?
>
> Not easily on arm64 - the closest thing we have is some arch-internal
> machinery for undefined instruction hooks. You'd have to hack something
> gruesome into do_sea()...
>
> For starters it might be worth enabling all the debug prints in
> rockchip-pcie to see if there's any error IRQ corresponding to the
> aborted access.

Debug prints are already enabled, I had to hack in the print to show
what addresses it was crashing on.

Your explanation was the last piece for me to understand how this all
works, thanks!
>
> Robin.
>
> > If so, I'll definitely look into plugging in the aer status functions.
> >
> >>
> >>> [3] https://twitter.com/domipheus/status/1167586160077627393
> >>>>
> >>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> >>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> >>>>> crc_ccitt nf_defrag_ipv6
> >>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> >>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> >>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> >>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> >>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> >>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> >>>>> [  198.755773] sp : ffff8000132af530
> >>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> >>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> >>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> >>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> >>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> >>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> >>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> >>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> >>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> >>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> >>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> >>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> >>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> >>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> >>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> >>>>> [  198.763348] Call trace:
> >>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> >>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> >>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> >>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> >>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
> >>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
> >>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> >>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> >>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> >>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
> >>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> >>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> >>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
> >>>>> [  198.769139]  really_probe+0xe0/0x318
> >>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
> >>>>> [  198.769844]  device_driver_attach+0x74/0x80
> >>>>> [  198.770227]  __driver_attach+0x64/0xe8
> >>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
> >>>>> [  198.770924]  driver_attach+0x30/0x40
> >>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
> >>>>> [  198.771605]  driver_register+0x64/0x110
> >>>>> [  198.771956]  __platform_driver_register+0x54/0x60
> >>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> >>>>> [  198.772998]  do_one_initcall+0x94/0x390
> >>>>> [  198.773353]  do_init_module+0x88/0x268
> >>>>> [  198.773697]  load_module+0x1e18/0x2198
> >>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> >>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> >>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> >>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
> >>>>> [  198.775645]  el0_svc+0x14/0x40
> >>>>> [  198.775928]  el0_sync_handler+0x118/0x290
> >>>>> [  198.776295]  el0_sync+0x164/0x180
> >>>>> [  198.776609] Code: bad PC value
> >>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
> >
> > _______________________________________________
> > Linux-rockchip mailing list
> > Linux-rockchip@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-rockchip
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-12 19:41           ` Peter Geis
@ 2019-11-13  1:06             ` Peter Geis
  2019-11-13  1:19               ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-13  1:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, linux-pci, Shawn Lin, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Tue, Nov 12, 2019 at 2:41 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 2:15 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 12/11/2019 3:55 pm, Peter Geis wrote:
> > > On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >>
> > >> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> > >>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > >>>>> Good Morning,
> > >>>>>
> > >>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
> > >>>>> It would appear that the port does not like various cards, including
> > >>>>> cards of the same make that randomly work or do not work, such as
> > >>>>> Intel i340 based NICs.
> > >>>>> I'm experiencing it with a GTX645 gpu.
> > >>>>>
> > >>>>> This seems to be a long running issue, referenced both at [0], and [1].
> > >>>>> There was an attempt to rectify it, by adding a delay between training
> > >>>>> and probing [2], but that doesn't seem to be the issue here.
> > >>>>> It appears that when we probe further into the card, such as devfn >
> > >>>>> 1, we trigger the bug.
> > >>>>> I've added a print statement that prints the devfn, address, and size
> > >>>>> information, which you can see below.
> > >>>>>
> > >>>>> I've attempted setting the available number of lanes to 1 as well, to
> > >>>>> no difference.
> > >>>>>
> > >>>>> If anyone could point me in the right direction as to where to
> > >>>>> continue debugging, I'd greatly appreciate it.
> > >>>>>
> > >>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > >>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
> > >>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> > >>>>>
> > >>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > >>>>> for per-lane PHY
> > >>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > >>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > >>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > >>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > >>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > >>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > >>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > >>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > >>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > >>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
> > >>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > >>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
> > >>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > >>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > >>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > >>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > >>>>> (bus address [0xfbe00000-0xfbefffff])
> > >>>>> [  198.552965] pci_bus 0000:00: scanning bus
> > >>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > >>>>> [  198.555508] pci 0000:00:00.0: supports D1
> > >>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > >>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
> > >>>>> [  198.561245] pci_bus 0000:00: fixups for bus
> > >>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > >>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > >>>>> 00-00]), reconfiguring
> > >>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > >>>>> [  198.567008] pci_bus 0000:01: scanning bus
> > >>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > >>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > >>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > >>>>> 64bit pref]
> > >>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > >>>>> 64bit pref]
> > >>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > >>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > >>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > >>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > >>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > >>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > >>>>> with 8 GT/s x16 link)
> > >>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > >>>>> decodes=io+mem,owns=none,locks=none
> > >>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > >>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > >>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > >>>>> <snip>
> > >>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > >>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > >>>>> [  198.750881] Internal error: synchronous external abort: 96000210
> > >>>>> [#1] PREEMPT SMP
> > >>>>
> > >>>> Is there really supposed to be a device at 01:00.2?
> > >>>>
> > >>>> Maybe this is just the PCIe Unsupported Request error that we expect
> > >>>> to get when trying to read config space of a device that doesn't
> > >>>> exist.
> > >>>>
> > >>>> On "most" platforms, we just get ~0 data back when that happens, but
> > >>>> I'm not sure that's always the case on arm64.  I think it depends on
> > >>>> how the PCIe host bridge is designed, and there might be some CPU
> > >>>> configuration, too.
> > >>>
> > >>> Yes, this is a GTX645 video card.
> > >>> Nvidia cards usually have two to three devices,
> > >>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
> > >>>
> > >>> I do think that this driver is missing sanity checking on the
> > >>> addressing, since the BRCM driver for the RPI4 doesn't try to
> > >>> enumerate a video card, since it checks if the MMIO space is large
> > >>> enough to fit the BAR before assigning the addresses. See [3]. Also in
> > >>> that thread he was able to increase the address space provided to the
> > >>> BRCM driver and fix the issue, but I don't see how we could do that on
> > >>> the rk3399.
> > >>>
> > >>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> > >>> 128 MB, which already exceeds our address space.
> > >>> I think the driver is just overflowing the address space.
> > >>
> > >> If we don't have enough space to assign all the device BARs, I think a
> > >> driver will still be able to claim the device, but when the driver
> > >> calls pci_enable_device(), it should fail.  Lack of address space
> > >> should not cause a PCIe error.
> > >>
> > >> But in this case, none of that matters because we're still enumerating
> > >> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
> > >> of trying to bind drivers to devices, so the driver isn't involved at
> > >> all yet.
> > > For clarification, the driver I'm referring to is the rk3399-pcie host driver.
> > >>
> > >> The backtrace says we're trying to read the Vendor ID of a device, and
> > >> your debug output suggests we're trying to enumerate 01:00.2.  If you
> > >> put that card in another system, you could find out how many functions
> > >> it has.
> > >>
> > >> Or if you swapped this with other cards where you know the number of
> > >> functions, you could see if the external abort always happens when
> > >> probing for the first unimplemented function.
> > > This card definitely has more than one function.
> >
> > FWIW, random lspci logs I grovelled up off the internet show cards with
> > the same IDs only implementing functions 0 and 1, which does suggest
> > that maybe function 2 really doesn't exist but the card handles
> > unsupported config requests in a way that this particular bridge/root
> > port doesn't quite deal with properly.
>
> Thanks! Is there a sane way we could make the kernel handle this in
> place of the controller?
> (It's apparently based on the designware ip, but it doesn't use their
> driver for some reason)
>
> >
> > > Before my original message I hacked in some code to make the driver
> > > return 0xff when devfn > 1, and the scan passed, but as soon as
> > > nouveau attempted to access the device, the entire kernel exploded.

This is what happens when probing is prevented from reading above
devfn 1, and nouveau gets a shot.
[   76.658128] SError Interrupt on CPU4, code 0xbf000000 -- SError
[   76.658129] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
[   76.658130] Hardware name: Pine64 RockPro64 (DT)
[   76.658131] pstate: 60400005 (nZCv daif +PAN -UAO)
[   76.658131] pc : nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
[   76.658132] lr : nvkm_device_ctor+0x2c0/0x2e18 [nouveau]
[   76.658133] sp : ffff800012daf740
[   76.658134] x29: ffff800012daf740 x28: ffff0000c2db9800
[   76.658136] x27: 0000000000000000 x26: ffff0000c7f2e0b0
[   76.658138] x25: 0000000000000000 x24: 0000000000000000
[   76.658139] x23: 0000000000000000 x22: ffff800009e1b1e0
[   76.658141] x21: ffff0000c335b940 x20: ffff800009ede660
[   76.658142] x19: 0000000000000000 x18: 0000000000000001
[   76.658144] x17: 0000000000000000 x16: 0000000000000000
[   76.658145] x15: ffff0000c335bdc0 x14: ffff8000140a0000
[   76.658147] x13: 0000000000110000 x12: 0040000000000001
[   76.658148] x11: ffff0000c12b8180 x10: 0000000000000085
[   76.658150] x9 : ffff800013c00000 x8 : 0000000000110000
[   76.658152] x7 : 0000000000000000 x6 : ffff0000f7ff0008
[   76.658153] x5 : ffff800013d0ffff x4 : ffff800013c00000
[   76.658154] x3 : ffff800013c00004 x2 : 0140000000000000
[   76.658156] x1 : 0000000000000000 x0 : ffff800013c00000
[   76.658158] Kernel panic - not syncing: Asynchronous SError Interrupt
[   76.658159] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
[   76.658160] Hardware name: Pine64 RockPro64 (DT)
[   76.658161] Call trace:
[   76.658161]  dump_backtrace+0x0/0x148
[   76.658162]  show_stack+0x24/0x30
[   76.658163]  dump_stack+0xa8/0xf0
[   76.658163]  panic+0x160/0x36c
[   76.658164]  nmi_panic+0x94/0x98
[   76.658164]  arm64_serror_panic+0x84/0x90
[   76.658165]  do_serror+0x88/0x140
[   76.658166]  el1_error+0xc4/0x170
[   76.658166]  nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
[   76.658167]  nvkm_device_pci_new+0xf8/0x2c8 [nouveau]
[   76.658168]  nouveau_drm_probe+0x60/0x288 [nouveau]
[   76.658168]  local_pci_probe+0x44/0x98
[   76.658169]  pci_device_probe+0x118/0x1b0
[   76.658170]  really_probe+0xe0/0x320
[   76.658171]  driver_probe_device+0x5c/0xf0
[   76.658171]  device_driver_attach+0x74/0x80
[   76.658172]  __driver_attach+0x64/0xe0
[   76.658173]  bus_for_each_dev+0x80/0xd0
[   76.658173]  driver_attach+0x30/0x40
[   76.658174]  bus_add_driver+0x14c/0x1f0
[   76.658175]  driver_register+0x64/0x110
[   76.658175]  __pci_register_driver+0x58/0x68
[   76.658176]  nouveau_drm_init+0x17c/0x10000 [nouveau]
[   76.658177]  do_one_initcall+0x50/0x298
[   76.658178]  do_init_module+0x88/0x270
[   76.658178]  load_module+0x2068/0x2600
[   76.658179]  __do_sys_finit_module+0xd0/0xe8
[   76.658180]  __arm64_sys_finit_module+0x28/0x38
[   76.658180]  el0_svc_common.constprop.0+0x7c/0x1f0
[   76.658181]  el0_svc_handler+0x34/0xa0
[   76.658182]  el0_svc+0x14/0x40
[   76.658182]  el0_sync_handler+0x118/0x280
[   76.658183]  el0_sync+0x164/0x180
[   76.658248] SMP: stopping secondary CPUs
[   76.658249] Kernel Offset: disabled
[   76.658250] CPU features: 0x10002,2000600c
[   76.658250] Memory Limit: none


> > >
> > > Another reason I believe the address assignments are overflowing and
> > > corrupting other address assignments is after the external abort, the
> > > entire PCIE controller is inaccessible.
> > > $ lspci
> > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
> > > lspci: Unable to read the standard configuration space header of
> > > device 0000:01:00.1
> > > pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
> > > lspci: Unable to read the standard configuration space header of
> > > device 0000:00:00.0
> > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
> > > lspci: Unable to read the standard configuration space header of
> > > device 0000:01:00.0
> > >
> > > Attempting to rescan the bus or any other pci function results in a
> > > hung kernel task.
> >
> > You crashed deep in the kernel part-way through the driver's probe
> > routine; don't expect any partly-initialised bits of that driver to
> > actually work. I bet you also can't load/unload modules and do various
> > other things either, because the thread will almost certainly have been
> > holding a bunch of locks, too. It's not worth trying to reason about the
> > state of anything after that kind of unexpected crash ;)
>
> That is observed, but I think it is actually more destructive than that.
> Attempting to do an access after the fact, then performing a soft
> reset results in a spectacular crash when it tries to bring the BIG
> cluster online.
>
> [    0.203160] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
> [    5.355606] CPU4: failed to come online
> [    5.363258] CPU4: failed in unknown state : 0x0
> Unhandled Exception in EL3.
> x30 =           0xUhandled Exception in L3.
> x30 =           UNhandled ExcEption in EL3.
> x30 =   0xUnhandled Excepti2n in EL_.
> x30 =           0xUnhandled Exception in EL3.
> x30 =           lxUnhadled Exception in EL3.
> x30 =           0xUnhandled Excepion in EL3.
> x30 =           0xUnhandled Exception in EL3.
> x30 =   0xUnhandled Eception in EL3.
> x30 =           0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Exception in EL3.x30 =
> 0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Excepton i2 EL3.
> x30 =           0xUnhandUnhandled Exceptionein EL3.lx30 =
>  0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Unhanded Exception in EL3.
> x30 =           0xUnhandled Exceptin in EL3Unhandled Exception inEL3.
> x0 =            0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Exception in EL.
> x30 =           0xUnhandled Exception in EL3.
> x30 =           0xUnhandled Exception in EL3.
> x30 =           0xUnhandled xception in EL3
> x30 =           0xUnhcndled Exception in EL3.
> x30=            0xUnhandled Excextion in EL3.
> x30 =   0xUnhandled Exception in EL3.
> x30 =           0x[   10.484411] CPU5: failed to come online
> [   10.484807] CPU5: failed in unknown state : 0x0
> [   10.485374] smp: Brought up 1 node, 4 CPUs
>
> It goes on until it tries to do it's first DMA setup, then just hangs.
> >
> > >> If the Root Port (00:00.0) supports AER, you could also dump out the
> > >> status registers from the AER capability and figure out whether it
> > >> logged a PCIe error.  This would be sort of like what
> > >> aer_process_err_devices() does.  A bit of a hassle to do this by hand
> > >> in the exception path, but could be enlightening, just as a debug
> > >> tool.
> > >
> > > Is there a way to handle external synchronous aborts in a device driver?
> >
> > Not easily on arm64 - the closest thing we have is some arch-internal
> > machinery for undefined instruction hooks. You'd have to hack something
> > gruesome into do_sea()...
> >
> > For starters it might be worth enabling all the debug prints in
> > rockchip-pcie to see if there's any error IRQ corresponding to the
> > aborted access.
>
> Debug prints are already enabled, I had to hack in the print to show
> what addresses it was crashing on.
>
> Your explanation was the last piece for me to understand how this all
> works, thanks!
> >
> > Robin.
> >
> > > If so, I'll definitely look into plugging in the aer status functions.
> > >
> > >>
> > >>> [3] https://twitter.com/domipheus/status/1167586160077627393
> > >>>>
> > >>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > >>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > >>>>> crc_ccitt nf_defrag_ipv6
> > >>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > >>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > >>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > >>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > >>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > >>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > >>>>> [  198.755773] sp : ffff8000132af530
> > >>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > >>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > >>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > >>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > >>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > >>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > >>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > >>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > >>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > >>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > >>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > >>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > >>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > >>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > >>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > >>>>> [  198.763348] Call trace:
> > >>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > >>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > >>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > >>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > >>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
> > >>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
> > >>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > >>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > >>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > >>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
> > >>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > >>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > >>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
> > >>>>> [  198.769139]  really_probe+0xe0/0x318
> > >>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
> > >>>>> [  198.769844]  device_driver_attach+0x74/0x80
> > >>>>> [  198.770227]  __driver_attach+0x64/0xe8
> > >>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
> > >>>>> [  198.770924]  driver_attach+0x30/0x40
> > >>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
> > >>>>> [  198.771605]  driver_register+0x64/0x110
> > >>>>> [  198.771956]  __platform_driver_register+0x54/0x60
> > >>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > >>>>> [  198.772998]  do_one_initcall+0x94/0x390
> > >>>>> [  198.773353]  do_init_module+0x88/0x268
> > >>>>> [  198.773697]  load_module+0x1e18/0x2198
> > >>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > >>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > >>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > >>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
> > >>>>> [  198.775645]  el0_svc+0x14/0x40
> > >>>>> [  198.775928]  el0_sync_handler+0x118/0x290
> > >>>>> [  198.776295]  el0_sync+0x164/0x180
> > >>>>> [  198.776609] Code: bad PC value
> > >>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
> > >
> > > _______________________________________________
> > > Linux-rockchip mailing list
> > > Linux-rockchip@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-rockchip
> > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-13  1:06             ` Peter Geis
@ 2019-11-13  1:19               ` Peter Geis
  2019-11-21  0:36                 ` Peter Geis
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-13  1:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, linux-pci, Shawn Lin, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Tue, Nov 12, 2019 at 8:06 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 2:41 PM Peter Geis <pgwipeout@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 2:15 PM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > On 12/11/2019 3:55 pm, Peter Geis wrote:
> > > > On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >>
> > > >> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> > > >>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > > >>>>> Good Morning,
> > > >>>>>
> > > >>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
> > > >>>>> It would appear that the port does not like various cards, including
> > > >>>>> cards of the same make that randomly work or do not work, such as
> > > >>>>> Intel i340 based NICs.
> > > >>>>> I'm experiencing it with a GTX645 gpu.
> > > >>>>>
> > > >>>>> This seems to be a long running issue, referenced both at [0], and [1].
> > > >>>>> There was an attempt to rectify it, by adding a delay between training
> > > >>>>> and probing [2], but that doesn't seem to be the issue here.
> > > >>>>> It appears that when we probe further into the card, such as devfn >
> > > >>>>> 1, we trigger the bug.
> > > >>>>> I've added a print statement that prints the devfn, address, and size
> > > >>>>> information, which you can see below.
> > > >>>>>
> > > >>>>> I've attempted setting the available number of lanes to 1 as well, to
> > > >>>>> no difference.
> > > >>>>>
> > > >>>>> If anyone could point me in the right direction as to where to
> > > >>>>> continue debugging, I'd greatly appreciate it.
> > > >>>>>
> > > >>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > > >>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
> > > >>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> > > >>>>>
> > > >>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > > >>>>> for per-lane PHY
> > > >>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > > >>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > > >>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > > >>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > > >>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > > >>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > > >>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > > >>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > > >>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > > >>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
> > > >>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > > >>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
> > > >>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > > >>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > > >>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > > >>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > > >>>>> (bus address [0xfbe00000-0xfbefffff])
> > > >>>>> [  198.552965] pci_bus 0000:00: scanning bus
> > > >>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > > >>>>> [  198.555508] pci 0000:00:00.0: supports D1
> > > >>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > > >>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
> > > >>>>> [  198.561245] pci_bus 0000:00: fixups for bus
> > > >>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > > >>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > > >>>>> 00-00]), reconfiguring
> > > >>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > > >>>>> [  198.567008] pci_bus 0000:01: scanning bus
> > > >>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > > >>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > >>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > > >>>>> 64bit pref]
> > > >>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > > >>>>> 64bit pref]
> > > >>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > > >>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > > >>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > >>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > > >>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > > >>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > > >>>>> with 8 GT/s x16 link)
> > > >>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > > >>>>> decodes=io+mem,owns=none,locks=none
> > > >>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > > >>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > >>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > > >>>>> <snip>
> > > >>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > > >>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > > >>>>> [  198.750881] Internal error: synchronous external abort: 96000210
> > > >>>>> [#1] PREEMPT SMP
> > > >>>>
> > > >>>> Is there really supposed to be a device at 01:00.2?
> > > >>>>
> > > >>>> Maybe this is just the PCIe Unsupported Request error that we expect
> > > >>>> to get when trying to read config space of a device that doesn't
> > > >>>> exist.
> > > >>>>
> > > >>>> On "most" platforms, we just get ~0 data back when that happens, but
> > > >>>> I'm not sure that's always the case on arm64.  I think it depends on
> > > >>>> how the PCIe host bridge is designed, and there might be some CPU
> > > >>>> configuration, too.
> > > >>>
> > > >>> Yes, this is a GTX645 video card.
> > > >>> Nvidia cards usually have two to three devices,
> > > >>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
> > > >>>
> > > >>> I do think that this driver is missing sanity checking on the
> > > >>> addressing, since the BRCM driver for the RPI4 doesn't try to
> > > >>> enumerate a video card, since it checks if the MMIO space is large
> > > >>> enough to fit the BAR before assigning the addresses. See [3]. Also in
> > > >>> that thread he was able to increase the address space provided to the
> > > >>> BRCM driver and fix the issue, but I don't see how we could do that on
> > > >>> the rk3399.
> > > >>>
> > > >>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> > > >>> 128 MB, which already exceeds our address space.
> > > >>> I think the driver is just overflowing the address space.
> > > >>
> > > >> If we don't have enough space to assign all the device BARs, I think a
> > > >> driver will still be able to claim the device, but when the driver
> > > >> calls pci_enable_device(), it should fail.  Lack of address space
> > > >> should not cause a PCIe error.
> > > >>
> > > >> But in this case, none of that matters because we're still enumerating
> > > >> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
> > > >> of trying to bind drivers to devices, so the driver isn't involved at
> > > >> all yet.
> > > > For clarification, the driver I'm referring to is the rk3399-pcie host driver.
> > > >>
> > > >> The backtrace says we're trying to read the Vendor ID of a device, and
> > > >> your debug output suggests we're trying to enumerate 01:00.2.  If you
> > > >> put that card in another system, you could find out how many functions
> > > >> it has.
> > > >>
> > > >> Or if you swapped this with other cards where you know the number of
> > > >> functions, you could see if the external abort always happens when
> > > >> probing for the first unimplemented function.
> > > > This card definitely has more than one function.
> > >
> > > FWIW, random lspci logs I grovelled up off the internet show cards with
> > > the same IDs only implementing functions 0 and 1, which does suggest
> > > that maybe function 2 really doesn't exist but the card handles
> > > unsupported config requests in a way that this particular bridge/root
> > > port doesn't quite deal with properly.
> >
> > Thanks! Is there a sane way we could make the kernel handle this in
> > place of the controller?
> > (It's apparently based on the designware ip, but it doesn't use their
> > driver for some reason)
> >
> > >
> > > > Before my original message I hacked in some code to make the driver
> > > > return 0xff when devfn > 1, and the scan passed, but as soon as
> > > > nouveau attempted to access the device, the entire kernel exploded.
>
> This is what happens when probing is prevented from reading above
> devfn 1, and nouveau gets a shot.
> [   76.658128] SError Interrupt on CPU4, code 0xbf000000 -- SError
> [   76.658129] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
> [   76.658130] Hardware name: Pine64 RockPro64 (DT)
> [   76.658131] pstate: 60400005 (nZCv daif +PAN -UAO)
> [   76.658131] pc : nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
> [   76.658132] lr : nvkm_device_ctor+0x2c0/0x2e18 [nouveau]
> [   76.658133] sp : ffff800012daf740
> [   76.658134] x29: ffff800012daf740 x28: ffff0000c2db9800
> [   76.658136] x27: 0000000000000000 x26: ffff0000c7f2e0b0
> [   76.658138] x25: 0000000000000000 x24: 0000000000000000
> [   76.658139] x23: 0000000000000000 x22: ffff800009e1b1e0
> [   76.658141] x21: ffff0000c335b940 x20: ffff800009ede660
> [   76.658142] x19: 0000000000000000 x18: 0000000000000001
> [   76.658144] x17: 0000000000000000 x16: 0000000000000000
> [   76.658145] x15: ffff0000c335bdc0 x14: ffff8000140a0000
> [   76.658147] x13: 0000000000110000 x12: 0040000000000001
> [   76.658148] x11: ffff0000c12b8180 x10: 0000000000000085
> [   76.658150] x9 : ffff800013c00000 x8 : 0000000000110000
> [   76.658152] x7 : 0000000000000000 x6 : ffff0000f7ff0008
> [   76.658153] x5 : ffff800013d0ffff x4 : ffff800013c00000
> [   76.658154] x3 : ffff800013c00004 x2 : 0140000000000000
> [   76.658156] x1 : 0000000000000000 x0 : ffff800013c00000
> [   76.658158] Kernel panic - not syncing: Asynchronous SError Interrupt
> [   76.658159] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
> [   76.658160] Hardware name: Pine64 RockPro64 (DT)
> [   76.658161] Call trace:
> [   76.658161]  dump_backtrace+0x0/0x148
> [   76.658162]  show_stack+0x24/0x30
> [   76.658163]  dump_stack+0xa8/0xf0
> [   76.658163]  panic+0x160/0x36c
> [   76.658164]  nmi_panic+0x94/0x98
> [   76.658164]  arm64_serror_panic+0x84/0x90
> [   76.658165]  do_serror+0x88/0x140
> [   76.658166]  el1_error+0xc4/0x170
> [   76.658166]  nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
> [   76.658167]  nvkm_device_pci_new+0xf8/0x2c8 [nouveau]
> [   76.658168]  nouveau_drm_probe+0x60/0x288 [nouveau]
> [   76.658168]  local_pci_probe+0x44/0x98
> [   76.658169]  pci_device_probe+0x118/0x1b0
> [   76.658170]  really_probe+0xe0/0x320
> [   76.658171]  driver_probe_device+0x5c/0xf0
> [   76.658171]  device_driver_attach+0x74/0x80
> [   76.658172]  __driver_attach+0x64/0xe0
> [   76.658173]  bus_for_each_dev+0x80/0xd0
> [   76.658173]  driver_attach+0x30/0x40
> [   76.658174]  bus_add_driver+0x14c/0x1f0
> [   76.658175]  driver_register+0x64/0x110
> [   76.658175]  __pci_register_driver+0x58/0x68
> [   76.658176]  nouveau_drm_init+0x17c/0x10000 [nouveau]
> [   76.658177]  do_one_initcall+0x50/0x298
> [   76.658178]  do_init_module+0x88/0x270
> [   76.658178]  load_module+0x2068/0x2600
> [   76.658179]  __do_sys_finit_module+0xd0/0xe8
> [   76.658180]  __arm64_sys_finit_module+0x28/0x38
> [   76.658180]  el0_svc_common.constprop.0+0x7c/0x1f0
> [   76.658181]  el0_svc_handler+0x34/0xa0
> [   76.658182]  el0_svc+0x14/0x40
> [   76.658182]  el0_sync_handler+0x118/0x280
> [   76.658183]  el0_sync+0x164/0x180
> [   76.658248] SMP: stopping secondary CPUs
> [   76.658249] Kernel Offset: disabled
> [   76.658250] CPU features: 0x10002,2000600c
> [   76.658250] Memory Limit: none

Last log, I promise.
This is what happens when it is locked out of devfn > 1, and nouveau
is not available.

[    4.199087] vcc3v3_pcie: supplied by vcc12v_dcin
[   55.617783] ehci-pci: EHCI PCI platform driver
[   55.868935] ohci-pci: OHCI PCI platform driver
[   65.426085] rockchip-pcie f8000000.pcie: missing legacy phy; search
for per-lane PHY
[   65.426645] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
[   65.426662] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
[   65.507012] rockchip-pcie f8000000.pcie: current link width is x4
[   65.507078] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
[   65.507088] rockchip-pcie f8000000.pcie: Parsing ranges property...
[   65.507103] rockchip-pcie f8000000.pcie:   MEM
0xfa000000..0xfbdfffff -> 0xfa000000
[   65.507112] rockchip-pcie f8000000.pcie:    IO
0xfbe00000..0xfbffffff -> 0xfbe00000
[   65.507368] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
[   65.507376] pci_bus 0000:00: root bus resource [bus 00-1f]
[   65.507382] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
[   65.507390] pci_bus 0000:00: root bus resource [io
0x0000-0x1fffff] (bus address [0xfbe00000-0xfbffffff])
[   65.507394] pci_bus 0000:00: scanning bus
[   65.507423] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
[   65.507543] pci 0000:00:00.0: supports D1
[   65.507547] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[   65.507555] pci 0000:00:00.0: PME# disabled
[   65.511436] pci_bus 0000:00: fixups for bus
[   65.511452] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[   65.511457] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   65.511471] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[   65.511633] pci_bus 0000:01: scanning bus
[   65.511739] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
[   65.512205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[   65.512274] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
64bit pref]
[   65.512345] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
64bit pref]
[   65.512390] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
[   65.512394] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
[   65.512435] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   65.512485] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
[   65.513319] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x4 link at 0000:00:00.0 (capable of 126.016 Gb/s
with 8 GT/s x16 link)
[   65.513568] pci 0000:01:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[   65.513744] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
[   65.513900] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[   65.514179] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
[   65.515129] rockchip-pcie f8000000.pcie: phy link changes
[   65.527011] pci_bus 0000:01: fixups for bus
[   65.527021] pci_bus 0000:01: bus scan returning with max=01
[   65.527030] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
[   65.527042] pci_bus 0000:00: bus scan returning with max=01
[   65.527076] pci 0000:00:00.0: BAR 14: no space for [mem size 0x0c000000]
[   65.527081] pci 0000:00:00.0: BAR 14: failed to assign [mem size 0x0c000000]
[   65.527091] pci 0000:01:00.0: BAR 1: no space for [mem size
0x08000000 64bit pref]
[   65.527095] pci 0000:01:00.0: BAR 1: failed to assign [mem size
0x08000000 64bit pref]
[   65.527100] pci 0000:01:00.0: BAR 3: no space for [mem size
0x02000000 64bit pref]
[   65.527104] pci 0000:01:00.0: BAR 3: failed to assign [mem size
0x02000000 64bit pref]
[   65.527109] pci 0000:01:00.0: BAR 0: no space for [mem size 0x01000000]
[   65.527119] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   65.527123] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[   65.527129] pci 0000:01:00.0: BAR 6: failed to assign [mem size
0x00080000 pref]
[   65.527134] pci 0000:01:00.1: BAR 0: no space for [mem size 0x00004000]
[   65.527138] pci 0000:01:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   65.527142] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
[   65.527146] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
[   65.527152] pci 0000:00:00.0: PCI bridge to [bus 01]
[   65.529410] pcieport 0000:00:00.0: assign IRQ: got 238
[   65.529454] pcieport 0000:00:00.0: enabling bus mastering
[   65.529807] pcieport 0000:00:00.0: PME: Signaling with IRQ 239
[   65.530265] pcieport 0000:00:00.0: AER: enabled with IRQ 239
[   65.530685] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0


>
>
> > > >
> > > > Another reason I believe the address assignments are overflowing and
> > > > corrupting other address assignments is after the external abort, the
> > > > entire PCIE controller is inaccessible.
> > > > $ lspci
> > > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
> > > > lspci: Unable to read the standard configuration space header of
> > > > device 0000:01:00.1
> > > > pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
> > > > lspci: Unable to read the standard configuration space header of
> > > > device 0000:00:00.0
> > > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
> > > > lspci: Unable to read the standard configuration space header of
> > > > device 0000:01:00.0
> > > >
> > > > Attempting to rescan the bus or any other pci function results in a
> > > > hung kernel task.
> > >
> > > You crashed deep in the kernel part-way through the driver's probe
> > > routine; don't expect any partly-initialised bits of that driver to
> > > actually work. I bet you also can't load/unload modules and do various
> > > other things either, because the thread will almost certainly have been
> > > holding a bunch of locks, too. It's not worth trying to reason about the
> > > state of anything after that kind of unexpected crash ;)
> >
> > That is observed, but I think it is actually more destructive than that.
> > Attempting to do an access after the fact, then performing a soft
> > reset results in a spectacular crash when it tries to bring the BIG
> > cluster online.
> >
> > [    0.203160] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
> > [    5.355606] CPU4: failed to come online
> > [    5.363258] CPU4: failed in unknown state : 0x0
> > Unhandled Exception in EL3.
> > x30 =           0xUhandled Exception in L3.
> > x30 =           UNhandled ExcEption in EL3.
> > x30 =   0xUnhandled Excepti2n in EL_.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =           lxUnhadled Exception in EL3.
> > x30 =           0xUnhandled Excepion in EL3.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =   0xUnhandled Eception in EL3.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Exception in EL3.x30 =
> > 0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Excepton i2 EL3.
> > x30 =           0xUnhandUnhandled Exceptionein EL3.lx30 =
> >  0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Unhanded Exception in EL3.
> > x30 =           0xUnhandled Exceptin in EL3Unhandled Exception inEL3.
> > x0 =            0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Exception in EL.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled Exception in EL3.
> > x30 =           0xUnhandled xception in EL3
> > x30 =           0xUnhcndled Exception in EL3.
> > x30=            0xUnhandled Excextion in EL3.
> > x30 =   0xUnhandled Exception in EL3.
> > x30 =           0x[   10.484411] CPU5: failed to come online
> > [   10.484807] CPU5: failed in unknown state : 0x0
> > [   10.485374] smp: Brought up 1 node, 4 CPUs
> >
> > It goes on until it tries to do it's first DMA setup, then just hangs.
> > >
> > > >> If the Root Port (00:00.0) supports AER, you could also dump out the
> > > >> status registers from the AER capability and figure out whether it
> > > >> logged a PCIe error.  This would be sort of like what
> > > >> aer_process_err_devices() does.  A bit of a hassle to do this by hand
> > > >> in the exception path, but could be enlightening, just as a debug
> > > >> tool.
> > > >
> > > > Is there a way to handle external synchronous aborts in a device driver?
> > >
> > > Not easily on arm64 - the closest thing we have is some arch-internal
> > > machinery for undefined instruction hooks. You'd have to hack something
> > > gruesome into do_sea()...
> > >
> > > For starters it might be worth enabling all the debug prints in
> > > rockchip-pcie to see if there's any error IRQ corresponding to the
> > > aborted access.
> >
> > Debug prints are already enabled, I had to hack in the print to show
> > what addresses it was crashing on.
> >
> > Your explanation was the last piece for me to understand how this all
> > works, thanks!
> > >
> > > Robin.
> > >
> > > > If so, I'll definitely look into plugging in the aer status functions.
> > > >
> > > >>
> > > >>> [3] https://twitter.com/domipheus/status/1167586160077627393
> > > >>>>
> > > >>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > > >>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > > >>>>> crc_ccitt nf_defrag_ipv6
> > > >>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > > >>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > > >>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > > >>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > > >>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > >>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > > >>>>> [  198.755773] sp : ffff8000132af530
> > > >>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > > >>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > > >>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > > >>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > > >>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > > >>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > > >>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > > >>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > > >>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > > >>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > > >>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > > >>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > > >>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > > >>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > > >>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > > >>>>> [  198.763348] Call trace:
> > > >>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > >>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > > >>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > > >>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > > >>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
> > > >>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
> > > >>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > > >>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > > >>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > > >>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
> > > >>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > > >>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > > >>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
> > > >>>>> [  198.769139]  really_probe+0xe0/0x318
> > > >>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
> > > >>>>> [  198.769844]  device_driver_attach+0x74/0x80
> > > >>>>> [  198.770227]  __driver_attach+0x64/0xe8
> > > >>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
> > > >>>>> [  198.770924]  driver_attach+0x30/0x40
> > > >>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
> > > >>>>> [  198.771605]  driver_register+0x64/0x110
> > > >>>>> [  198.771956]  __platform_driver_register+0x54/0x60
> > > >>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > > >>>>> [  198.772998]  do_one_initcall+0x94/0x390
> > > >>>>> [  198.773353]  do_init_module+0x88/0x268
> > > >>>>> [  198.773697]  load_module+0x1e18/0x2198
> > > >>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > > >>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > > >>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > > >>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
> > > >>>>> [  198.775645]  el0_svc+0x14/0x40
> > > >>>>> [  198.775928]  el0_sync_handler+0x118/0x290
> > > >>>>> [  198.776295]  el0_sync+0x164/0x180
> > > >>>>> [  198.776609] Code: bad PC value
> > > >>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
> > > >
> > > > _______________________________________________
> > > > Linux-rockchip mailing list
> > > > Linux-rockchip@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-rockchip
> > > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-13  1:19               ` Peter Geis
@ 2019-11-21  0:36                 ` Peter Geis
  2019-11-21  2:03                   ` Shawn Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-21  0:36 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, linux-pci, Shawn Lin, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Tue, Nov 12, 2019 at 8:19 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 8:06 PM Peter Geis <pgwipeout@gmail.com> wrote:
> >
> > On Tue, Nov 12, 2019 at 2:41 PM Peter Geis <pgwipeout@gmail.com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 2:15 PM Robin Murphy <robin.murphy@arm.com> wrote:
> > > >
> > > > On 12/11/2019 3:55 pm, Peter Geis wrote:
> > > > > On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >>
> > > > >> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
> > > > >>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
> > > > >>>>> Good Morning,
> > > > >>>>>
> > > > >>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
> > > > >>>>> It would appear that the port does not like various cards, including
> > > > >>>>> cards of the same make that randomly work or do not work, such as
> > > > >>>>> Intel i340 based NICs.
> > > > >>>>> I'm experiencing it with a GTX645 gpu.
> > > > >>>>>
> > > > >>>>> This seems to be a long running issue, referenced both at [0], and [1].
> > > > >>>>> There was an attempt to rectify it, by adding a delay between training
> > > > >>>>> and probing [2], but that doesn't seem to be the issue here.
> > > > >>>>> It appears that when we probe further into the card, such as devfn >
> > > > >>>>> 1, we trigger the bug.
> > > > >>>>> I've added a print statement that prints the devfn, address, and size
> > > > >>>>> information, which you can see below.
> > > > >>>>>
> > > > >>>>> I've attempted setting the available number of lanes to 1 as well, to
> > > > >>>>> no difference.
> > > > >>>>>
> > > > >>>>> If anyone could point me in the right direction as to where to
> > > > >>>>> continue debugging, I'd greatly appreciate it.
> > > > >>>>>
> > > > >>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
> > > > >>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
> > > > >>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
> > > > >>>>>
> > > > >>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
> > > > >>>>> for per-lane PHY
> > > > >>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> > > > >>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> > > > >>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
> > > > >>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
> > > > >>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
> > > > >>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
> > > > >>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > > > >>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
> > > > >>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
> > > > >>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
> > > > >>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
> > > > >>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
> > > > >>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > > > >>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
> > > > >>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > > > >>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
> > > > >>>>> (bus address [0xfbe00000-0xfbefffff])
> > > > >>>>> [  198.552965] pci_bus 0000:00: scanning bus
> > > > >>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > > > >>>>> [  198.555508] pci 0000:00:00.0: supports D1
> > > > >>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > > > >>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
> > > > >>>>> [  198.561245] pci_bus 0000:00: fixups for bus
> > > > >>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> > > > >>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
> > > > >>>>> 00-00]), reconfiguring
> > > > >>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> > > > >>>>> [  198.567008] pci_bus 0000:01: scanning bus
> > > > >>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> > > > >>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > > >>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > > > >>>>> 64bit pref]
> > > > >>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > > > >>>>> 64bit pref]
> > > > >>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> > > > >>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> > > > >>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > > >>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > > > >>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
> > > > >>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
> > > > >>>>> with 8 GT/s x16 link)
> > > > >>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
> > > > >>>>> decodes=io+mem,owns=none,locks=none
> > > > >>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> > > > >>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > > >>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> > > > >>>>> <snip>
> > > > >>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
> > > > >>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
> > > > >>>>> [  198.750881] Internal error: synchronous external abort: 96000210
> > > > >>>>> [#1] PREEMPT SMP
> > > > >>>>
> > > > >>>> Is there really supposed to be a device at 01:00.2?
> > > > >>>>
> > > > >>>> Maybe this is just the PCIe Unsupported Request error that we expect
> > > > >>>> to get when trying to read config space of a device that doesn't
> > > > >>>> exist.
> > > > >>>>
> > > > >>>> On "most" platforms, we just get ~0 data back when that happens, but
> > > > >>>> I'm not sure that's always the case on arm64.  I think it depends on
> > > > >>>> how the PCIe host bridge is designed, and there might be some CPU
> > > > >>>> configuration, too.
> > > > >>>
> > > > >>> Yes, this is a GTX645 video card.
> > > > >>> Nvidia cards usually have two to three devices,
> > > > >>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
> > > > >>>
> > > > >>> I do think that this driver is missing sanity checking on the
> > > > >>> addressing, since the BRCM driver for the RPI4 doesn't try to
> > > > >>> enumerate a video card, since it checks if the MMIO space is large
> > > > >>> enough to fit the BAR before assigning the addresses. See [3]. Also in
> > > > >>> that thread he was able to increase the address space provided to the
> > > > >>> BRCM driver and fix the issue, but I don't see how we could do that on
> > > > >>> the rk3399.
> > > > >>>
> > > > >>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
> > > > >>> 128 MB, which already exceeds our address space.
> > > > >>> I think the driver is just overflowing the address space.
> > > > >>
> > > > >> If we don't have enough space to assign all the device BARs, I think a
> > > > >> driver will still be able to claim the device, but when the driver
> > > > >> calls pci_enable_device(), it should fail.  Lack of address space
> > > > >> should not cause a PCIe error.
> > > > >>
> > > > >> But in this case, none of that matters because we're still enumerating
> > > > >> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
> > > > >> of trying to bind drivers to devices, so the driver isn't involved at
> > > > >> all yet.
> > > > > For clarification, the driver I'm referring to is the rk3399-pcie host driver.
> > > > >>
> > > > >> The backtrace says we're trying to read the Vendor ID of a device, and
> > > > >> your debug output suggests we're trying to enumerate 01:00.2.  If you
> > > > >> put that card in another system, you could find out how many functions
> > > > >> it has.
> > > > >>
> > > > >> Or if you swapped this with other cards where you know the number of
> > > > >> functions, you could see if the external abort always happens when
> > > > >> probing for the first unimplemented function.
> > > > > This card definitely has more than one function.
> > > >
> > > > FWIW, random lspci logs I grovelled up off the internet show cards with
> > > > the same IDs only implementing functions 0 and 1, which does suggest
> > > > that maybe function 2 really doesn't exist but the card handles
> > > > unsupported config requests in a way that this particular bridge/root
> > > > port doesn't quite deal with properly.
> > >
> > > Thanks! Is there a sane way we could make the kernel handle this in
> > > place of the controller?
> > > (It's apparently based on the designware ip, but it doesn't use their
> > > driver for some reason)
> > >
> > > >
> > > > > Before my original message I hacked in some code to make the driver
> > > > > return 0xff when devfn > 1, and the scan passed, but as soon as
> > > > > nouveau attempted to access the device, the entire kernel exploded.
> >
> > This is what happens when probing is prevented from reading above
> > devfn 1, and nouveau gets a shot.
> > [   76.658128] SError Interrupt on CPU4, code 0xbf000000 -- SError
> > [   76.658129] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
> > 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
> > [   76.658130] Hardware name: Pine64 RockPro64 (DT)
> > [   76.658131] pstate: 60400005 (nZCv daif +PAN -UAO)
> > [   76.658131] pc : nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
> > [   76.658132] lr : nvkm_device_ctor+0x2c0/0x2e18 [nouveau]
> > [   76.658133] sp : ffff800012daf740
> > [   76.658134] x29: ffff800012daf740 x28: ffff0000c2db9800
> > [   76.658136] x27: 0000000000000000 x26: ffff0000c7f2e0b0
> > [   76.658138] x25: 0000000000000000 x24: 0000000000000000
> > [   76.658139] x23: 0000000000000000 x22: ffff800009e1b1e0
> > [   76.658141] x21: ffff0000c335b940 x20: ffff800009ede660
> > [   76.658142] x19: 0000000000000000 x18: 0000000000000001
> > [   76.658144] x17: 0000000000000000 x16: 0000000000000000
> > [   76.658145] x15: ffff0000c335bdc0 x14: ffff8000140a0000
> > [   76.658147] x13: 0000000000110000 x12: 0040000000000001
> > [   76.658148] x11: ffff0000c12b8180 x10: 0000000000000085
> > [   76.658150] x9 : ffff800013c00000 x8 : 0000000000110000
> > [   76.658152] x7 : 0000000000000000 x6 : ffff0000f7ff0008
> > [   76.658153] x5 : ffff800013d0ffff x4 : ffff800013c00000
> > [   76.658154] x3 : ffff800013c00004 x2 : 0140000000000000
> > [   76.658156] x1 : 0000000000000000 x0 : ffff800013c00000
> > [   76.658158] Kernel panic - not syncing: Asynchronous SError Interrupt
> > [   76.658159] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
> > 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
> > [   76.658160] Hardware name: Pine64 RockPro64 (DT)
> > [   76.658161] Call trace:
> > [   76.658161]  dump_backtrace+0x0/0x148
> > [   76.658162]  show_stack+0x24/0x30
> > [   76.658163]  dump_stack+0xa8/0xf0
> > [   76.658163]  panic+0x160/0x36c
> > [   76.658164]  nmi_panic+0x94/0x98
> > [   76.658164]  arm64_serror_panic+0x84/0x90
> > [   76.658165]  do_serror+0x88/0x140
> > [   76.658166]  el1_error+0xc4/0x170
> > [   76.658166]  nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
> > [   76.658167]  nvkm_device_pci_new+0xf8/0x2c8 [nouveau]
> > [   76.658168]  nouveau_drm_probe+0x60/0x288 [nouveau]
> > [   76.658168]  local_pci_probe+0x44/0x98
> > [   76.658169]  pci_device_probe+0x118/0x1b0
> > [   76.658170]  really_probe+0xe0/0x320
> > [   76.658171]  driver_probe_device+0x5c/0xf0
> > [   76.658171]  device_driver_attach+0x74/0x80
> > [   76.658172]  __driver_attach+0x64/0xe0
> > [   76.658173]  bus_for_each_dev+0x80/0xd0
> > [   76.658173]  driver_attach+0x30/0x40
> > [   76.658174]  bus_add_driver+0x14c/0x1f0
> > [   76.658175]  driver_register+0x64/0x110
> > [   76.658175]  __pci_register_driver+0x58/0x68
> > [   76.658176]  nouveau_drm_init+0x17c/0x10000 [nouveau]
> > [   76.658177]  do_one_initcall+0x50/0x298
> > [   76.658178]  do_init_module+0x88/0x270
> > [   76.658178]  load_module+0x2068/0x2600
> > [   76.658179]  __do_sys_finit_module+0xd0/0xe8
> > [   76.658180]  __arm64_sys_finit_module+0x28/0x38
> > [   76.658180]  el0_svc_common.constprop.0+0x7c/0x1f0
> > [   76.658181]  el0_svc_handler+0x34/0xa0
> > [   76.658182]  el0_svc+0x14/0x40
> > [   76.658182]  el0_sync_handler+0x118/0x280
> > [   76.658183]  el0_sync+0x164/0x180
> > [   76.658248] SMP: stopping secondary CPUs
> > [   76.658249] Kernel Offset: disabled
> > [   76.658250] CPU features: 0x10002,2000600c
> > [   76.658250] Memory Limit: none
>
> Last log, I promise.
> This is what happens when it is locked out of devfn > 1, and nouveau
> is not available.
>
> [    4.199087] vcc3v3_pcie: supplied by vcc12v_dcin
> [   55.617783] ehci-pci: EHCI PCI platform driver
> [   55.868935] ohci-pci: OHCI PCI platform driver
> [   65.426085] rockchip-pcie f8000000.pcie: missing legacy phy; search
> for per-lane PHY
> [   65.426645] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
> [   65.426662] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
> [   65.507012] rockchip-pcie f8000000.pcie: current link width is x4
> [   65.507078] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> [   65.507088] rockchip-pcie f8000000.pcie: Parsing ranges property...
> [   65.507103] rockchip-pcie f8000000.pcie:   MEM
> 0xfa000000..0xfbdfffff -> 0xfa000000
> [   65.507112] rockchip-pcie f8000000.pcie:    IO
> 0xfbe00000..0xfbffffff -> 0xfbe00000
> [   65.507368] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> [   65.507376] pci_bus 0000:00: root bus resource [bus 00-1f]
> [   65.507382] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> [   65.507390] pci_bus 0000:00: root bus resource [io
> 0x0000-0x1fffff] (bus address [0xfbe00000-0xfbffffff])
> [   65.507394] pci_bus 0000:00: scanning bus
> [   65.507423] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> [   65.507543] pci 0000:00:00.0: supports D1
> [   65.507547] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> [   65.507555] pci 0000:00:00.0: PME# disabled
> [   65.511436] pci_bus 0000:00: fixups for bus
> [   65.511452] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> [   65.511457] pci 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   65.511471] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> [   65.511633] pci_bus 0000:01: scanning bus
> [   65.511739] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
> [   65.512205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> [   65.512274] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> 64bit pref]
> [   65.512345] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> 64bit pref]
> [   65.512390] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
> [   65.512394] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
> [   65.512435] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> [   65.512485] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> [   65.513319] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x4 link at 0000:00:00.0 (capable of 126.016 Gb/s
> with 8 GT/s x16 link)
> [   65.513568] pci 0000:01:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [   65.513744] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
> [   65.513900] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [   65.514179] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
> [   65.515129] rockchip-pcie f8000000.pcie: phy link changes
> [   65.527011] pci_bus 0000:01: fixups for bus
> [   65.527021] pci_bus 0000:01: bus scan returning with max=01
> [   65.527030] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
> [   65.527042] pci_bus 0000:00: bus scan returning with max=01
> [   65.527076] pci 0000:00:00.0: BAR 14: no space for [mem size 0x0c000000]
> [   65.527081] pci 0000:00:00.0: BAR 14: failed to assign [mem size 0x0c000000]
> [   65.527091] pci 0000:01:00.0: BAR 1: no space for [mem size
> 0x08000000 64bit pref]
> [   65.527095] pci 0000:01:00.0: BAR 1: failed to assign [mem size
> 0x08000000 64bit pref]
> [   65.527100] pci 0000:01:00.0: BAR 3: no space for [mem size
> 0x02000000 64bit pref]
> [   65.527104] pci 0000:01:00.0: BAR 3: failed to assign [mem size
> 0x02000000 64bit pref]
> [   65.527109] pci 0000:01:00.0: BAR 0: no space for [mem size 0x01000000]
> [   65.527119] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x01000000]
> [   65.527123] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00080000 pref]
> [   65.527129] pci 0000:01:00.0: BAR 6: failed to assign [mem size
> 0x00080000 pref]
> [   65.527134] pci 0000:01:00.1: BAR 0: no space for [mem size 0x00004000]
> [   65.527138] pci 0000:01:00.1: BAR 0: failed to assign [mem size 0x00004000]
> [   65.527142] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
> [   65.527146] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
> [   65.527152] pci 0000:00:00.0: PCI bridge to [bus 01]
> [   65.529410] pcieport 0000:00:00.0: assign IRQ: got 238
> [   65.529454] pcieport 0000:00:00.0: enabling bus mastering
> [   65.529807] pcieport 0000:00:00.0: PME: Signaling with IRQ 239
> [   65.530265] pcieport 0000:00:00.0: AER: enabled with IRQ 239
> [   65.530685] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

An update to this issue, I purchased a pcie switch and tried testing
with a few pcie devices I have around.
Under no circumstances can I get it to read the switch, it aborts
immediately upon reading the first register.
Of note is it keeps reporting the following :
[   64.510788] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[   64.510793] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   64.510807] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
<snip>
[   64.527368] pci_bus 0000:01: fixups for bus
[   64.527399] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 0
[   64.527404] pci 0000:01:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   64.527452] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 1
[   64.527850] pci_bus 0000:02: scanning bus
And aborts here.

Thoughts?

>
>
> >
> >
> > > > >
> > > > > Another reason I believe the address assignments are overflowing and
> > > > > corrupting other address assignments is after the external abort, the
> > > > > entire PCIE controller is inaccessible.
> > > > > $ lspci
> > > > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
> > > > > lspci: Unable to read the standard configuration space header of
> > > > > device 0000:01:00.1
> > > > > pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
> > > > > lspci: Unable to read the standard configuration space header of
> > > > > device 0000:00:00.0
> > > > > pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
> > > > > lspci: Unable to read the standard configuration space header of
> > > > > device 0000:01:00.0
> > > > >
> > > > > Attempting to rescan the bus or any other pci function results in a
> > > > > hung kernel task.
> > > >
> > > > You crashed deep in the kernel part-way through the driver's probe
> > > > routine; don't expect any partly-initialised bits of that driver to
> > > > actually work. I bet you also can't load/unload modules and do various
> > > > other things either, because the thread will almost certainly have been
> > > > holding a bunch of locks, too. It's not worth trying to reason about the
> > > > state of anything after that kind of unexpected crash ;)
> > >
> > > That is observed, but I think it is actually more destructive than that.
> > > Attempting to do an access after the fact, then performing a soft
> > > reset results in a spectacular crash when it tries to bring the BIG
> > > cluster online.
> > >
> > > [    0.203160] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
> > > [    5.355606] CPU4: failed to come online
> > > [    5.363258] CPU4: failed in unknown state : 0x0
> > > Unhandled Exception in EL3.
> > > x30 =           0xUhandled Exception in L3.
> > > x30 =           UNhandled ExcEption in EL3.
> > > x30 =   0xUnhandled Excepti2n in EL_.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =           lxUnhadled Exception in EL3.
> > > x30 =           0xUnhandled Excepion in EL3.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =   0xUnhandled Eception in EL3.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Exception in EL3.x30 =
> > > 0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Excepton i2 EL3.
> > > x30 =           0xUnhandUnhandled Exceptionein EL3.lx30 =
> > >  0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Unhanded Exception in EL3.
> > > x30 =           0xUnhandled Exceptin in EL3Unhandled Exception inEL3.
> > > x0 =            0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Exception in EL.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled Exception in EL3.
> > > x30 =           0xUnhandled xception in EL3
> > > x30 =           0xUnhcndled Exception in EL3.
> > > x30=            0xUnhandled Excextion in EL3.
> > > x30 =   0xUnhandled Exception in EL3.
> > > x30 =           0x[   10.484411] CPU5: failed to come online
> > > [   10.484807] CPU5: failed in unknown state : 0x0
> > > [   10.485374] smp: Brought up 1 node, 4 CPUs
> > >
> > > It goes on until it tries to do it's first DMA setup, then just hangs.
> > > >
> > > > >> If the Root Port (00:00.0) supports AER, you could also dump out the
> > > > >> status registers from the AER capability and figure out whether it
> > > > >> logged a PCIe error.  This would be sort of like what
> > > > >> aer_process_err_devices() does.  A bit of a hassle to do this by hand
> > > > >> in the exception path, but could be enlightening, just as a debug
> > > > >> tool.
> > > > >
> > > > > Is there a way to handle external synchronous aborts in a device driver?
> > > >
> > > > Not easily on arm64 - the closest thing we have is some arch-internal
> > > > machinery for undefined instruction hooks. You'd have to hack something
> > > > gruesome into do_sea()...
> > > >
> > > > For starters it might be worth enabling all the debug prints in
> > > > rockchip-pcie to see if there's any error IRQ corresponding to the
> > > > aborted access.
> > >
> > > Debug prints are already enabled, I had to hack in the print to show
> > > what addresses it was crashing on.
> > >
> > > Your explanation was the last piece for me to understand how this all
> > > works, thanks!
> > > >
> > > > Robin.
> > > >
> > > > > If so, I'll definitely look into plugging in the aer status functions.
> > > > >
> > > > >>
> > > > >>> [3] https://twitter.com/domipheus/status/1167586160077627393
> > > > >>>>
> > > > >>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
> > > > >>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
> > > > >>>>> crc_ccitt nf_defrag_ipv6
> > > > >>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
> > > > >>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
> > > > >>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
> > > > >>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
> > > > >>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > > >>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
> > > > >>>>> [  198.755773] sp : ffff8000132af530
> > > > >>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
> > > > >>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
> > > > >>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
> > > > >>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
> > > > >>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
> > > > >>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
> > > > >>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
> > > > >>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
> > > > >>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
> > > > >>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
> > > > >>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
> > > > >>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
> > > > >>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
> > > > >>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
> > > > >>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
> > > > >>>>> [  198.763348] Call trace:
> > > > >>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
> > > > >>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
> > > > >>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
> > > > >>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
> > > > >>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
> > > > >>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
> > > > >>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
> > > > >>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
> > > > >>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
> > > > >>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
> > > > >>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
> > > > >>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
> > > > >>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
> > > > >>>>> [  198.769139]  really_probe+0xe0/0x318
> > > > >>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
> > > > >>>>> [  198.769844]  device_driver_attach+0x74/0x80
> > > > >>>>> [  198.770227]  __driver_attach+0x64/0xe8
> > > > >>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
> > > > >>>>> [  198.770924]  driver_attach+0x30/0x40
> > > > >>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
> > > > >>>>> [  198.771605]  driver_register+0x64/0x110
> > > > >>>>> [  198.771956]  __platform_driver_register+0x54/0x60
> > > > >>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
> > > > >>>>> [  198.772998]  do_one_initcall+0x94/0x390
> > > > >>>>> [  198.773353]  do_init_module+0x88/0x268
> > > > >>>>> [  198.773697]  load_module+0x1e18/0x2198
> > > > >>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
> > > > >>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
> > > > >>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
> > > > >>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
> > > > >>>>> [  198.775645]  el0_svc+0x14/0x40
> > > > >>>>> [  198.775928]  el0_sync_handler+0x118/0x290
> > > > >>>>> [  198.776295]  el0_sync+0x164/0x180
> > > > >>>>> [  198.776609] Code: bad PC value
> > > > >>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
> > > > >
> > > > > _______________________________________________
> > > > > Linux-rockchip mailing list
> > > > > Linux-rockchip@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-rockchip
> > > > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-21  0:36                 ` Peter Geis
@ 2019-11-21  2:03                   ` Shawn Lin
  2019-11-22  1:02                     ` Peter Geis
  2019-11-22 14:36                     ` Bjorn Helgaas
  0 siblings, 2 replies; 20+ messages in thread
From: Shawn Lin @ 2019-11-21  2:03 UTC (permalink / raw)
  To: Peter Geis, Robin Murphy
  Cc: shawn.lin, Bjorn Helgaas, linux-pci, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

[-- Attachment #1: Type: text/plain, Size: 27337 bytes --]


On 2019/11/21 8:36, Peter Geis wrote:
> On Tue, Nov 12, 2019 at 8:19 PM Peter Geis <pgwipeout@gmail.com> wrote:
>>
>> On Tue, Nov 12, 2019 at 8:06 PM Peter Geis <pgwipeout@gmail.com> wrote:
>>>
>>> On Tue, Nov 12, 2019 at 2:41 PM Peter Geis <pgwipeout@gmail.com> wrote:
>>>>
>>>> On Tue, Nov 12, 2019 at 2:15 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>>>>
>>>>> On 12/11/2019 3:55 pm, Peter Geis wrote:
>>>>>> On Mon, Nov 11, 2019 at 9:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>>>
>>>>>>> On Mon, Nov 11, 2019 at 07:30:15PM -0500, Peter Geis wrote:
>>>>>>>> On Mon, Nov 11, 2019 at 7:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>>>>> On Mon, Nov 04, 2019 at 01:55:40PM -0500, Peter Geis wrote:
>>>>>>>>>> Good Morning,
>>>>>>>>>>
>>>>>>>>>> I'm attempting to debug an issue with the rockpro64 pcie port.
>>>>>>>>>> It would appear that the port does not like various cards, including
>>>>>>>>>> cards of the same make that randomly work or do not work, such as
>>>>>>>>>> Intel i340 based NICs.
>>>>>>>>>> I'm experiencing it with a GTX645 gpu.
>>>>>>>>>>
>>>>>>>>>> This seems to be a long running issue, referenced both at [0], and [1].
>>>>>>>>>> There was an attempt to rectify it, by adding a delay between training
>>>>>>>>>> and probing [2], but that doesn't seem to be the issue here.
>>>>>>>>>> It appears that when we probe further into the card, such as devfn >
>>>>>>>>>> 1, we trigger the bug.
>>>>>>>>>> I've added a print statement that prints the devfn, address, and size
>>>>>>>>>> information, which you can see below.
>>>>>>>>>>
>>>>>>>>>> I've attempted setting the available number of lanes to 1 as well, to
>>>>>>>>>> no difference.
>>>>>>>>>>
>>>>>>>>>> If anyone could point me in the right direction as to where to
>>>>>>>>>> continue debugging, I'd greatly appreciate it.
>>>>>>>>>>
>>>>>>>>>> [0] https://github.com/ayufan-rock64/linux-build/issues/254
>>>>>>>>>> [1] https://github.com/rockchip-linux/kernel/issues/116
>>>>>>>>>> [2] https://github.com/ayufan-rock64/linux-kernel/commit/3cde5c624c9c39aa03251a55c2d26a48b5bdca5b
>>>>>>>>>>
>>>>>>>>>> [  198.491458] rockchip-pcie f8000000.pcie: missing legacy phy; search
>>>>>>>>>> for per-lane PHY
>>>>>>>>>> [  198.492986] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
>>>>>>>>>> [  198.493060] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
>>>>>>>>>> [  198.550444] rockchip-pcie f8000000.pcie: current link width is x1
>>>>>>>>>> [  198.550458] rockchip-pcie f8000000.pcie: idling lane 1
>>>>>>>>>> [  198.550479] rockchip-pcie f8000000.pcie: idling lane 2
>>>>>>>>>> [  198.550490] rockchip-pcie f8000000.pcie: idling lane 3
>>>>>>>>>> [  198.550608] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
>>>>>>>>>> [  198.550625] rockchip-pcie f8000000.pcie: Parsing ranges property...
>>>>>>>>>> [  198.550656] rockchip-pcie f8000000.pcie:   MEM
>>>>>>>>>> 0xfa000000..0xfbdfffff -> 0xfa000000
>>>>>>>>>> [  198.550676] rockchip-pcie f8000000.pcie:    IO
>>>>>>>>>> 0xfbe00000..0xfbefffff -> 0xfbe00000
>>>>>>>>>> [  198.552908] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
>>>>>>>>>> [  198.552933] pci_bus 0000:00: root bus resource [bus 00-1f]
>>>>>>>>>> [  198.552943] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
>>>>>>>>>> [  198.552954] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
>>>>>>>>>> (bus address [0xfbe00000-0xfbefffff])
>>>>>>>>>> [  198.552965] pci_bus 0000:00: scanning bus
>>>>>>>>>> [  198.554198] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
>>>>>>>>>> [  198.555508] pci 0000:00:00.0: supports D1
>>>>>>>>>> [  198.555516] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
>>>>>>>>>> [  198.556023] pci 0000:00:00.0: PME# disabled
>>>>>>>>>> [  198.561245] pci_bus 0000:00: fixups for bus
>>>>>>>>>> [  198.561269] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
>>>>>>>>>> [  198.561277] pci 0000:00:00.0: bridge configuration invalid ([bus
>>>>>>>>>> 00-00]), reconfiguring
>>>>>>>>>> [  198.566429] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
>>>>>>>>>> [  198.567008] pci_bus 0000:01: scanning bus
>>>>>>>>>> [  198.567171] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
>>>>>>>>>> [  198.567420] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
>>>>>>>>>> [  198.567515] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
>>>>>>>>>> 64bit pref]
>>>>>>>>>> [  198.567608] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
>>>>>>>>>> 64bit pref]
>>>>>>>>>> [  198.567665] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
>>>>>>>>>> [  198.567673] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
>>>>>>>>>> [  198.567730] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
>>>>>>>>>> [  198.567815] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
>>>>>>>>>> [  198.569051] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
>>>>>>>>>> limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
>>>>>>>>>> with 8 GT/s x16 link)
>>>>>>>>>> [  198.570225] pci 0000:01:00.0: vgaarb: VGA device added:
>>>>>>>>>> decodes=io+mem,owns=none,locks=none
>>>>>>>>>> [  198.570481] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
>>>>>>>>>> [  198.570663] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
>>>>>>>>>> [  198.571039] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
>>>>>>>>>> <snip>
>>>>>>>>>> [  198.749857] pci_bus 0000:01: read pcie, devfn 1, at 100, size 2
>>>>>>>>>> [  198.750252] pci_bus 0000:01: read pcie, devfn 2, at 0, size 4
>>>>>>>>>> [  198.750881] Internal error: synchronous external abort: 96000210
>>>>>>>>>> [#1] PREEMPT SMP
>>>>>>>>>
>>>>>>>>> Is there really supposed to be a device at 01:00.2?
>>>>>>>>>
>>>>>>>>> Maybe this is just the PCIe Unsupported Request error that we expect
>>>>>>>>> to get when trying to read config space of a device that doesn't
>>>>>>>>> exist.
>>>>>>>>>
>>>>>>>>> On "most" platforms, we just get ~0 data back when that happens, but
>>>>>>>>> I'm not sure that's always the case on arm64.  I think it depends on
>>>>>>>>> how the PCIe host bridge is designed, and there might be some CPU
>>>>>>>>> configuration, too.
>>>>>>>>
>>>>>>>> Yes, this is a GTX645 video card.
>>>>>>>> Nvidia cards usually have two to three devices,
>>>>>>>> The GPU proper, the audio device for the hdmi output, and the i2c controller.
>>>>>>>>
>>>>>>>> I do think that this driver is missing sanity checking on the
>>>>>>>> addressing, since the BRCM driver for the RPI4 doesn't try to
>>>>>>>> enumerate a video card, since it checks if the MMIO space is large
>>>>>>>> enough to fit the BAR before assigning the addresses. See [3]. Also in
>>>>>>>> that thread he was able to increase the address space provided to the
>>>>>>>> BRCM driver and fix the issue, but I don't see how we could do that on
>>>>>>>> the rk3399.
>>>>>>>>
>>>>>>>> pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] is
>>>>>>>> 128 MB, which already exceeds our address space.
>>>>>>>> I think the driver is just overflowing the address space.
>>>>>>>
>>>>>>> If we don't have enough space to assign all the device BARs, I think a
>>>>>>> driver will still be able to claim the device, but when the driver
>>>>>>> calls pci_enable_device(), it should fail.  Lack of address space
>>>>>>> should not cause a PCIe error.
>>>>>>>
>>>>>>> But in this case, none of that matters because we're still enumerating
>>>>>>> devices in pci_scan_root_bus_bridge().  We haven't gotten to the point
>>>>>>> of trying to bind drivers to devices, so the driver isn't involved at
>>>>>>> all yet.
>>>>>> For clarification, the driver I'm referring to is the rk3399-pcie host driver.
>>>>>>>
>>>>>>> The backtrace says we're trying to read the Vendor ID of a device, and
>>>>>>> your debug output suggests we're trying to enumerate 01:00.2.  If you
>>>>>>> put that card in another system, you could find out how many functions
>>>>>>> it has.
>>>>>>>
>>>>>>> Or if you swapped this with other cards where you know the number of
>>>>>>> functions, you could see if the external abort always happens when
>>>>>>> probing for the first unimplemented function.
>>>>>> This card definitely has more than one function.
>>>>>
>>>>> FWIW, random lspci logs I grovelled up off the internet show cards with
>>>>> the same IDs only implementing functions 0 and 1, which does suggest
>>>>> that maybe function 2 really doesn't exist but the card handles
>>>>> unsupported config requests in a way that this particular bridge/root
>>>>> port doesn't quite deal with properly.
>>>>
>>>> Thanks! Is there a sane way we could make the kernel handle this in
>>>> place of the controller?
>>>> (It's apparently based on the designware ip, but it doesn't use their
>>>> driver for some reason)
>>>>
>>>>>
>>>>>> Before my original message I hacked in some code to make the driver
>>>>>> return 0xff when devfn > 1, and the scan passed, but as soon as
>>>>>> nouveau attempted to access the device, the entire kernel exploded.
>>>
>>> This is what happens when probing is prevented from reading above
>>> devfn 1, and nouveau gets a shot.
>>> [   76.658128] SError Interrupt on CPU4, code 0xbf000000 -- SError
>>> [   76.658129] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
>>> [   76.658130] Hardware name: Pine64 RockPro64 (DT)
>>> [   76.658131] pstate: 60400005 (nZCv daif +PAN -UAO)
>>> [   76.658131] pc : nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
>>> [   76.658132] lr : nvkm_device_ctor+0x2c0/0x2e18 [nouveau]
>>> [   76.658133] sp : ffff800012daf740
>>> [   76.658134] x29: ffff800012daf740 x28: ffff0000c2db9800
>>> [   76.658136] x27: 0000000000000000 x26: ffff0000c7f2e0b0
>>> [   76.658138] x25: 0000000000000000 x24: 0000000000000000
>>> [   76.658139] x23: 0000000000000000 x22: ffff800009e1b1e0
>>> [   76.658141] x21: ffff0000c335b940 x20: ffff800009ede660
>>> [   76.658142] x19: 0000000000000000 x18: 0000000000000001
>>> [   76.658144] x17: 0000000000000000 x16: 0000000000000000
>>> [   76.658145] x15: ffff0000c335bdc0 x14: ffff8000140a0000
>>> [   76.658147] x13: 0000000000110000 x12: 0040000000000001
>>> [   76.658148] x11: ffff0000c12b8180 x10: 0000000000000085
>>> [   76.658150] x9 : ffff800013c00000 x8 : 0000000000110000
>>> [   76.658152] x7 : 0000000000000000 x6 : ffff0000f7ff0008
>>> [   76.658153] x5 : ffff800013d0ffff x4 : ffff800013c00000
>>> [   76.658154] x3 : ffff800013c00004 x2 : 0140000000000000
>>> [   76.658156] x1 : 0000000000000000 x0 : ffff800013c00000
>>> [   76.658158] Kernel panic - not syncing: Asynchronous SError Interrupt
>>> [   76.658159] CPU: 4 PID: 1752 Comm: systemd-udevd Not tainted
>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #16
>>> [   76.658160] Hardware name: Pine64 RockPro64 (DT)
>>> [   76.658161] Call trace:
>>> [   76.658161]  dump_backtrace+0x0/0x148
>>> [   76.658162]  show_stack+0x24/0x30
>>> [   76.658163]  dump_stack+0xa8/0xf0
>>> [   76.658163]  panic+0x160/0x36c
>>> [   76.658164]  nmi_panic+0x94/0x98
>>> [   76.658164]  arm64_serror_panic+0x84/0x90
>>> [   76.658165]  do_serror+0x88/0x140
>>> [   76.658166]  el1_error+0xc4/0x170
>>> [   76.658166]  nvkm_device_ctor+0x2cc/0x2e18 [nouveau]
>>> [   76.658167]  nvkm_device_pci_new+0xf8/0x2c8 [nouveau]
>>> [   76.658168]  nouveau_drm_probe+0x60/0x288 [nouveau]
>>> [   76.658168]  local_pci_probe+0x44/0x98
>>> [   76.658169]  pci_device_probe+0x118/0x1b0
>>> [   76.658170]  really_probe+0xe0/0x320
>>> [   76.658171]  driver_probe_device+0x5c/0xf0
>>> [   76.658171]  device_driver_attach+0x74/0x80
>>> [   76.658172]  __driver_attach+0x64/0xe0
>>> [   76.658173]  bus_for_each_dev+0x80/0xd0
>>> [   76.658173]  driver_attach+0x30/0x40
>>> [   76.658174]  bus_add_driver+0x14c/0x1f0
>>> [   76.658175]  driver_register+0x64/0x110
>>> [   76.658175]  __pci_register_driver+0x58/0x68
>>> [   76.658176]  nouveau_drm_init+0x17c/0x10000 [nouveau]
>>> [   76.658177]  do_one_initcall+0x50/0x298
>>> [   76.658178]  do_init_module+0x88/0x270
>>> [   76.658178]  load_module+0x2068/0x2600
>>> [   76.658179]  __do_sys_finit_module+0xd0/0xe8
>>> [   76.658180]  __arm64_sys_finit_module+0x28/0x38
>>> [   76.658180]  el0_svc_common.constprop.0+0x7c/0x1f0
>>> [   76.658181]  el0_svc_handler+0x34/0xa0
>>> [   76.658182]  el0_svc+0x14/0x40
>>> [   76.658182]  el0_sync_handler+0x118/0x280
>>> [   76.658183]  el0_sync+0x164/0x180
>>> [   76.658248] SMP: stopping secondary CPUs
>>> [   76.658249] Kernel Offset: disabled
>>> [   76.658250] CPU features: 0x10002,2000600c
>>> [   76.658250] Memory Limit: none
>>
>> Last log, I promise.
>> This is what happens when it is locked out of devfn > 1, and nouveau
>> is not available.
>>
>> [    4.199087] vcc3v3_pcie: supplied by vcc12v_dcin
>> [   55.617783] ehci-pci: EHCI PCI platform driver
>> [   55.868935] ohci-pci: OHCI PCI platform driver
>> [   65.426085] rockchip-pcie f8000000.pcie: missing legacy phy; search
>> for per-lane PHY
>> [   65.426645] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
>> [   65.426662] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
>> [   65.507012] rockchip-pcie f8000000.pcie: current link width is x4
>> [   65.507078] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
>> [   65.507088] rockchip-pcie f8000000.pcie: Parsing ranges property...
>> [   65.507103] rockchip-pcie f8000000.pcie:   MEM
>> 0xfa000000..0xfbdfffff -> 0xfa000000
>> [   65.507112] rockchip-pcie f8000000.pcie:    IO
>> 0xfbe00000..0xfbffffff -> 0xfbe00000
>> [   65.507368] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
>> [   65.507376] pci_bus 0000:00: root bus resource [bus 00-1f]
>> [   65.507382] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
>> [   65.507390] pci_bus 0000:00: root bus resource [io
>> 0x0000-0x1fffff] (bus address [0xfbe00000-0xfbffffff])
>> [   65.507394] pci_bus 0000:00: scanning bus
>> [   65.507423] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
>> [   65.507543] pci 0000:00:00.0: supports D1
>> [   65.507547] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
>> [   65.507555] pci 0000:00:00.0: PME# disabled
>> [   65.511436] pci_bus 0000:00: fixups for bus
>> [   65.511452] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
>> [   65.511457] pci 0000:00:00.0: bridge configuration invalid ([bus
>> 00-00]), reconfiguring
>> [   65.511471] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
>> [   65.511633] pci_bus 0000:01: scanning bus
>> [   65.511739] pci 0000:01:00.0: [10de:11c4] type 00 class 0x030000
>> [   65.512205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
>> [   65.512274] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
>> 64bit pref]
>> [   65.512345] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
>> 64bit pref]
>> [   65.512390] pci 0000:01:00.0: reg 0x24: initial BAR value 0x00000000 invalid
>> [   65.512394] pci 0000:01:00.0: reg 0x24: [io  size 0x0080]
>> [   65.512435] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
>> [   65.512485] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
>> [   65.513319] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth,
>> limited by 2.5 GT/s x4 link at 0000:00:00.0 (capable of 126.016 Gb/s
>> with 8 GT/s x16 link)
>> [   65.513568] pci 0000:01:00.0: vgaarb: VGA device added:
>> decodes=io+mem,owns=none,locks=none
>> [   65.513744] pci 0000:01:00.1: [10de:0e0b] type 00 class 0x040300
>> [   65.513900] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
>> [   65.514179] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 256)
>> [   65.515129] rockchip-pcie f8000000.pcie: phy link changes
>> [   65.527011] pci_bus 0000:01: fixups for bus
>> [   65.527021] pci_bus 0000:01: bus scan returning with max=01
>> [   65.527030] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
>> [   65.527042] pci_bus 0000:00: bus scan returning with max=01
>> [   65.527076] pci 0000:00:00.0: BAR 14: no space for [mem size 0x0c000000]
>> [   65.527081] pci 0000:00:00.0: BAR 14: failed to assign [mem size 0x0c000000]
>> [   65.527091] pci 0000:01:00.0: BAR 1: no space for [mem size
>> 0x08000000 64bit pref]
>> [   65.527095] pci 0000:01:00.0: BAR 1: failed to assign [mem size
>> 0x08000000 64bit pref]
>> [   65.527100] pci 0000:01:00.0: BAR 3: no space for [mem size
>> 0x02000000 64bit pref]
>> [   65.527104] pci 0000:01:00.0: BAR 3: failed to assign [mem size
>> 0x02000000 64bit pref]
>> [   65.527109] pci 0000:01:00.0: BAR 0: no space for [mem size 0x01000000]
>> [   65.527119] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x01000000]
>> [   65.527123] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00080000 pref]
>> [   65.527129] pci 0000:01:00.0: BAR 6: failed to assign [mem size
>> 0x00080000 pref]
>> [   65.527134] pci 0000:01:00.1: BAR 0: no space for [mem size 0x00004000]
>> [   65.527138] pci 0000:01:00.1: BAR 0: failed to assign [mem size 0x00004000]
>> [   65.527142] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
>> [   65.527146] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
>> [   65.527152] pci 0000:00:00.0: PCI bridge to [bus 01]
>> [   65.529410] pcieport 0000:00:00.0: assign IRQ: got 238
>> [   65.529454] pcieport 0000:00:00.0: enabling bus mastering
>> [   65.529807] pcieport 0000:00:00.0: PME: Signaling with IRQ 239
>> [   65.530265] pcieport 0000:00:00.0: AER: enabled with IRQ 239
>> [   65.530685] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
> 
> An update to this issue, I purchased a pcie switch and tried testing
> with a few pcie devices I have around.
> Under no circumstances can I get it to read the switch, it aborts
> immediately upon reading the first register.
> Of note is it keeps reporting the following :
> [   64.510788] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
> [   64.510793] pci 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   64.510807] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> <snip>
> [   64.527368] pci_bus 0000:01: fixups for bus
> [   64.527399] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 0
> [   64.527404] pci 0000:01:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [   64.527452] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 1
> [   64.527850] pci_bus 0000:02: scanning bus
> And aborts here.
> 
> Thoughts?

Not having too much time to follow up w/ upstream development, sorry.
I attach a patch based on Linux-next for debugging, but I don't know if
it could work as expected since I can't test it now. But at least it
shows what the problems were.

> 
>>
>>
>>>
>>>
>>>>>>
>>>>>> Another reason I believe the address assignments are overflowing and
>>>>>> corrupting other address assignments is after the external abort, the
>>>>>> entire PCIE controller is inaccessible.
>>>>>> $ lspci
>>>>>> pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.1/config
>>>>>> lspci: Unable to read the standard configuration space header of
>>>>>> device 0000:01:00.1
>>>>>> pcilib: Cannot open /sys/bus/pci/devices/0000:00:00.0/config
>>>>>> lspci: Unable to read the standard configuration space header of
>>>>>> device 0000:00:00.0
>>>>>> pcilib: Cannot open /sys/bus/pci/devices/0000:01:00.0/config
>>>>>> lspci: Unable to read the standard configuration space header of
>>>>>> device 0000:01:00.0
>>>>>>
>>>>>> Attempting to rescan the bus or any other pci function results in a
>>>>>> hung kernel task.
>>>>>
>>>>> You crashed deep in the kernel part-way through the driver's probe
>>>>> routine; don't expect any partly-initialised bits of that driver to
>>>>> actually work. I bet you also can't load/unload modules and do various
>>>>> other things either, because the thread will almost certainly have been
>>>>> holding a bunch of locks, too. It's not worth trying to reason about the
>>>>> state of anything after that kind of unexpected crash ;)
>>>>
>>>> That is observed, but I think it is actually more destructive than that.
>>>> Attempting to do an access after the fact, then performing a soft
>>>> reset results in a spectacular crash when it tries to bring the BIG
>>>> cluster online.
>>>>
>>>> [    0.203160] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
>>>> [    5.355606] CPU4: failed to come online
>>>> [    5.363258] CPU4: failed in unknown state : 0x0
>>>> Unhandled Exception in EL3.
>>>> x30 =           0xUhandled Exception in L3.
>>>> x30 =           UNhandled ExcEption in EL3.
>>>> x30 =   0xUnhandled Excepti2n in EL_.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =           lxUnhadled Exception in EL3.
>>>> x30 =           0xUnhandled Excepion in EL3.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =   0xUnhandled Eception in EL3.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Exception in EL3.x30 =
>>>> 0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Excepton i2 EL3.
>>>> x30 =           0xUnhandUnhandled Exceptionein EL3.lx30 =
>>>>   0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Unhanded Exception in EL3.
>>>> x30 =           0xUnhandled Exceptin in EL3Unhandled Exception inEL3.
>>>> x0 =            0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Exception in EL.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled Exception in EL3.
>>>> x30 =           0xUnhandled xception in EL3
>>>> x30 =           0xUnhcndled Exception in EL3.
>>>> x30=            0xUnhandled Excextion in EL3.
>>>> x30 =   0xUnhandled Exception in EL3.
>>>> x30 =           0x[   10.484411] CPU5: failed to come online
>>>> [   10.484807] CPU5: failed in unknown state : 0x0
>>>> [   10.485374] smp: Brought up 1 node, 4 CPUs
>>>>
>>>> It goes on until it tries to do it's first DMA setup, then just hangs.
>>>>>
>>>>>>> If the Root Port (00:00.0) supports AER, you could also dump out the
>>>>>>> status registers from the AER capability and figure out whether it
>>>>>>> logged a PCIe error.  This would be sort of like what
>>>>>>> aer_process_err_devices() does.  A bit of a hassle to do this by hand
>>>>>>> in the exception path, but could be enlightening, just as a debug
>>>>>>> tool.
>>>>>>
>>>>>> Is there a way to handle external synchronous aborts in a device driver?
>>>>>
>>>>> Not easily on arm64 - the closest thing we have is some arch-internal
>>>>> machinery for undefined instruction hooks. You'd have to hack something
>>>>> gruesome into do_sea()...
>>>>>
>>>>> For starters it might be worth enabling all the debug prints in
>>>>> rockchip-pcie to see if there's any error IRQ corresponding to the
>>>>> aborted access.
>>>>
>>>> Debug prints are already enabled, I had to hack in the print to show
>>>> what addresses it was crashing on.
>>>>
>>>> Your explanation was the last piece for me to understand how this all
>>>> works, thanks!
>>>>>
>>>>> Robin.
>>>>>
>>>>>> If so, I'll definitely look into plugging in the aer status functions.
>>>>>>
>>>>>>>
>>>>>>>> [3] https://twitter.com/domipheus/status/1167586160077627393
>>>>>>>>>
>>>>>>>>>> [  198.751581] Modules linked in: drm_panel_orientation_quirks
>>>>>>>>>> pcie_rockchip_host(+) cpufreq_dt sch_fq_codel ip_tables x_tables ipv6
>>>>>>>>>> crc_ccitt nf_defrag_ipv6
>>>>>>>>>> [  198.752861] CPU: 1 PID: 1686 Comm: systemd-udevd Not tainted
>>>>>>>>>> 5.4.0-rc5-next-20191031-00001-gddbfb17ac1c4-dirty #5
>>>>>>>>>> [  198.753791] Hardware name: Pine64 RockPro64 (DT)
>>>>>>>>>> [  198.754215] pstate: 60400085 (nZCv daIf +PAN -UAO)
>>>>>>>>>> [  198.754672] pc : __raw_readl+0x0/0x8 [pcie_rockchip_host]
>>>>>>>>>> [  198.755172] lr : rockchip_pcie_rd_conf+0x140/0x1dc [pcie_rockchip_host]
>>>>>>>>>> [  198.755773] sp : ffff8000132af530
>>>>>>>>>> [  198.756079] x29: ffff8000132af530 x28: 0000000000000000
>>>>>>>>>> [  198.756565] x27: 0000000000000001 x26: 0000000000000000
>>>>>>>>>> [  198.757049] x25: ffff0000c20ac000 x24: 0000000000002000
>>>>>>>>>> [  198.757534] x23: ffff0000c20ae5c0 x22: ffff8000132af5d4
>>>>>>>>>> [  198.758018] x21: 0000000000002000 x20: 0000000000000004
>>>>>>>>>> [  198.758502] x19: 0000000000102000 x18: 0000000000000001
>>>>>>>>>> [  198.758987] x17: 0000000000000000 x16: 0000000000000000
>>>>>>>>>> [  198.759472] x15: ffffffffffffffff x14: ffff80001159bcc8
>>>>>>>>>> [  198.759957] x13: 0000000000000000 x12: ffff800011b2c000
>>>>>>>>>> [  198.760441] x11: ffff8000115bf000 x10: ffff800011310018
>>>>>>>>>> [  198.760926] x9 : 00000000fffb9fff x8 : 0000000000000001
>>>>>>>>>> [  198.761410] x7 : 0000000000000000 x6 : ffff0000f7492548
>>>>>>>>>> [  198.761894] x5 : 0000000000000001 x4 : ffff0000f7492548
>>>>>>>>>> [  198.762379] x3 : 0000000000000000 x2 : 0000000000c00008
>>>>>>>>>> [  198.762863] x1 : ffff80001dc00008 x0 : ffff80001a102000
>>>>>>>>>> [  198.763348] Call trace:
>>>>>>>>>> [  198.763583]  __raw_readl+0x0/0x8 [pcie_rockchip_host]
>>>>>>>>>> [  198.764057]  pci_bus_read_config_dword+0x88/0xd0
>>>>>>>>>> [  198.764484]  pci_bus_generic_read_dev_vendor_id+0x40/0x1b8
>>>>>>>>>> [  198.764982]  pci_bus_read_dev_vendor_id+0x58/0x88
>>>>>>>>>> [  198.765413]  pci_scan_single_device+0x84/0xf8
>>>>>>>>>> [  198.765812]  pci_scan_slot+0x7c/0x120
>>>>>>>>>> [  198.766149]  pci_scan_child_bus_extend+0x68/0x2dc
>>>>>>>>>> [  198.766579]  pci_scan_bridge_extend+0x350/0x588
>>>>>>>>>> [  198.766992]  pci_scan_child_bus_extend+0x21c/0x2dc
>>>>>>>>>> [  198.767430]  pci_scan_child_bus+0x24/0x30
>>>>>>>>>> [  198.767797]  pci_scan_root_bus_bridge+0xc4/0xd0
>>>>>>>>>> [  198.768215]  rockchip_pcie_probe+0x610/0x74c [pcie_rockchip_host]
>>>>>>>>>> [  198.768770]  platform_drv_probe+0x58/0xa8
>>>>>>>>>> [  198.769139]  really_probe+0xe0/0x318
>>>>>>>>>> [  198.769468]  driver_probe_device+0x5c/0xf0
>>>>>>>>>> [  198.769844]  device_driver_attach+0x74/0x80
>>>>>>>>>> [  198.770227]  __driver_attach+0x64/0xe8
>>>>>>>>>> [  198.770572]  bus_for_each_dev+0x84/0xd8
>>>>>>>>>> [  198.770924]  driver_attach+0x30/0x40
>>>>>>>>>> [  198.771253]  bus_add_driver+0x188/0x1e8
>>>>>>>>>> [  198.771605]  driver_register+0x64/0x110
>>>>>>>>>> [  198.771956]  __platform_driver_register+0x54/0x60
>>>>>>>>>> [  198.772388]  rockchip_pcie_driver_init+0x28/0x10000 [pcie_rockchip_host]
>>>>>>>>>> [  198.772998]  do_one_initcall+0x94/0x390
>>>>>>>>>> [  198.773353]  do_init_module+0x88/0x268
>>>>>>>>>> [  198.773697]  load_module+0x1e18/0x2198
>>>>>>>>>> [  198.774043]  __do_sys_finit_module+0xd0/0xe8
>>>>>>>>>> [  198.774435]  __arm64_sys_finit_module+0x28/0x38
>>>>>>>>>> [  198.774858]  el0_svc_common.constprop.3+0xa4/0x1d8
>>>>>>>>>> [  198.775297]  el0_svc_handler+0x34/0xa0
>>>>>>>>>> [  198.775645]  el0_svc+0x14/0x40
>>>>>>>>>> [  198.775928]  el0_sync_handler+0x118/0x290
>>>>>>>>>> [  198.776295]  el0_sync+0x164/0x180
>>>>>>>>>> [  198.776609] Code: bad PC value
>>>>>>>>>> [  198.776897] ---[ end trace 88fc77651b5e2909 ]---
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-rockchip mailing list
>>>>>> Linux-rockchip@lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/linux-rockchip
>>>>>>
> 
> 
> 

[-- Attachment #2: 0001-WFT-PCI-rockchip-play-game-with-unsupported-request-.patch --]
[-- Type: text/plain, Size: 9960 bytes --]

From da9db487615c3687a0823c54d8d0790cbd4f79a2 Mon Sep 17 00:00:00 2001
From: Shawn Lin <shawn.lin@rock-chips.com>
Date: Thu, 21 Nov 2019 09:45:12 +0800
Subject: [PATCH] WFT: PCI: rockchip: play game with unsupported request from
 EP

Native defect prevents this RC far from supporting any response
from EP which UR filed is set. Take a big hammer to take over
Serror handler and work around it.

Maybe ATF is the right place in the future to cover this kind
of defect, but all right, for ARM32, imx6q_pcie_abort_handler
almost handle the same defect.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---
 arch/arm64/include/asm/system_misc.h        |  5 +-
 arch/arm64/kernel/entry.S                   | 86 ++++++++++++++++++++---------
 arch/arm64/mm/fault.c                       | 44 ++++++++++++++-
 drivers/pci/controller/pcie-rockchip-host.c | 15 +++++
 drivers/pci/controller/pcie-rockchip.h      |  4 ++
 drivers/pci/probe.c                         |  7 ++-
 6 files changed, 132 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 1ab63cf..b45e4e1 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -28,7 +28,10 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
 void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
 					     struct pt_regs *),
 			   int sig, int code, const char *name);
-
+void hook_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
+			struct pt_regs *), int sig, int code, const char *name);
+void *hook_serror_handler(int (*fn)(unsigned long, unsigned int,
+		        struct pt_regs *));
 struct mm_struct;
 extern void __show_regs(struct pt_regs *);
 
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 583f71a..321f9ff 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -471,7 +471,7 @@ ENTRY(vectors)
 	kernel_ventry	1, sync_invalid			// Synchronous EL1t
 	kernel_ventry	1, irq_invalid			// IRQ EL1t
 	kernel_ventry	1, fiq_invalid			// FIQ EL1t
-	kernel_ventry	1, error_invalid		// Error EL1t
+	kernel_ventry   1, error                // Error EL1h
 
 	kernel_ventry	1, sync				// Synchronous EL1h
 	kernel_ventry	1, irq				// IRQ EL1h
@@ -562,6 +562,36 @@ el1_sync_invalid:
 	inv_entry 1, BAD_SYNC
 ENDPROC(el1_sync_invalid)
 
+       .align  6
+el1_error:
+       kernel_entry 1
+       mrs     x1, esr_el1                     // read the syndrome register
+       lsr     x24, x1, #ESR_ELx_EC_SHIFT      // exception class
+       cmp     x24, #ESR_ELx_EC_SERROR         // SError exception in EL1
+       b.ne    el1_error_inv
+el1_serr:
+       mrs     x0, far_el1
+       enable_dbg
+       // re-enable interrupts if they were enabled in the aborted context
+       tbnz    x23, #7, 1f                     // PSR_I_BIT
+       //enable_irq
+       msr     daifclr, #2
+1:
+       mov     x2, sp                          // struct pt_regs
+       bl      do_serr_abort
+
+       // disable interrupts before pulling preserved data off the stack
+       //disable_irq
+       msr     daifset, #2
+       kernel_exit 1
+el1_error_inv:
+       enable_dbg
+       mov     x0, sp
+       mov     x2, x1
+       mov     x1, #BAD_ERROR
+       b       bad_mode
+ENDPROC(el1_error)
+
 el1_irq_invalid:
 	inv_entry 1, BAD_IRQ
 ENDPROC(el1_irq_invalid)
@@ -667,6 +697,7 @@ el0_irq_compat:
 	kernel_entry 0, 32
 	b	el0_irq_naked
 
+       .align  6
 el0_error_compat:
 	kernel_entry 0, 32
 	b	el0_error_naked
@@ -697,30 +728,6 @@ el0_irq_naked:
 	b	ret_to_user
 ENDPROC(el0_irq)
 
-el1_error:
-	kernel_entry 1
-	mrs	x1, esr_el1
-	gic_prio_kentry_setup tmp=x2
-	enable_dbg
-	mov	x0, sp
-	bl	do_serror
-	kernel_exit 1
-ENDPROC(el1_error)
-
-el0_error:
-	kernel_entry 0
-el0_error_naked:
-	mrs	x25, esr_el1
-	gic_prio_kentry_setup tmp=x2
-	ct_user_exit_irqoff
-	enable_dbg
-	mov	x0, sp
-	mov	x1, x25
-	bl	do_serror
-	enable_da_f
-	b	ret_to_user
-ENDPROC(el0_error)
-
 /*
  * Ok, we need to do extra processing, enter the slow path.
  */
@@ -865,6 +872,35 @@ __entry_tramp_data_start:
 #endif /* CONFIG_RANDOMIZE_BASE */
 #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
 
+       .align  6
+el0_error:
+       kernel_entry 0
+el0_error_naked:
+       mrs     x25, esr_el1                    // read the syndrome register
+       lsr     x24, x25, #ESR_ELx_EC_SHIFT     // exception class
+       cmp     x24, #ESR_ELx_EC_SERROR         // SError exception in EL0
+       b.ne    el0_error_inv
+el0_serr:
+       mrs     x26, far_el1
+       // enable interrupts before calling the main handler
+       msr     daifclr, #(8 | 2)
+       //ct_user_exit
+#ifdef CONFIG_CONTEXT_TRACKING
+               bl      context_tracking_user_exit
+#endif
+       bic     x0, x26, #(0xff << 56)
+       mov     x1, x25
+       mov     x2, sp
+       bl      do_serr_abort
+       b       ret_to_user
+el0_error_inv:
+       enable_dbg
+       mov     x0, sp
+       mov     x1, #BAD_ERROR
+       mov     x2, x25
+       b       bad_mode
+ENDPROC(el0_error)
+
 /*
  * Register switch for AArch64. The callee-saved registers need to be saved
  * and restored. On entry:
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 077b02a..5adb7ef 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -48,7 +48,7 @@ struct fault_info {
 	const char *name;
 };
 
-static const struct fault_info fault_info[];
+static struct fault_info fault_info[] __ro_after_init;
 static struct fault_info debug_fault_info[];
 
 static inline const struct fault_info *esr_to_fault_info(unsigned int esr)
@@ -660,7 +660,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 0;
 }
 
-static const struct fault_info fault_info[] = {
+static struct fault_info fault_info[] __ro_after_init = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 2 address size fault"	},
@@ -727,6 +727,46 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
+void __init hook_fault_code(int nr,
+                       int (*fn)(unsigned long, unsigned int, struct pt_regs *),
+                       int sig, int code, const char *name)
+{
+       BUG_ON(nr < 0 || nr >= ARRAY_SIZE(fault_info));
+
+       fault_info[nr].fn       = fn;
+       fault_info[nr].sig      = sig;
+       fault_info[nr].code     = code;
+       fault_info[nr].name     = name;
+}
+
+static int (*serror_handler)(unsigned long, unsigned int,
+                            struct pt_regs *) __ro_after_init;
+
+void *__init hook_serror_handler(int (*fn)(unsigned long, unsigned int,
+                                struct pt_regs *))
+{
+       void *ret = serror_handler;
+
+       serror_handler = fn;
+       return ret;
+}
+
+asmlinkage void  do_serr_abort(unsigned long addr, unsigned int esr,
+                                        struct pt_regs *regs)
+{
+	const struct fault_info *inf;
+
+	inf = esr_to_fault_info(esr);
+       if (serror_handler)
+               if (!serror_handler(addr, esr, regs))
+                       return;
+
+       pr_alert("Unhandled SError: (0x%08x) at 0x%016lx\n", esr, addr);
+       __show_regs(regs);
+
+       arm64_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr, esr);
+}
+
 void do_mem_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 {
 	const struct fault_info *inf = esr_to_fault_info(esr);
diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
index ee83f84..5c5578b 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -958,6 +958,17 @@ static int __maybe_unused rockchip_pcie_resume_noirq(struct device *dev)
 	return err;
 }
 
+#ifdef CONFIG_ARM64
+static int (*serror_chain)(unsigned long addr, unsigned int esr,
+                               struct pt_regs *regs);
+static int do_rockchip_pcie_serror(unsigned long addr, unsigned int esr,
+                               struct pt_regs *regs)
+{
+       printk("%s:hey!\n", __func__);
+       return 0;
+}
+#endif
+
 static int rockchip_pcie_probe(struct platform_device *pdev)
 {
 	struct rockchip_pcie *rockchip;
@@ -1028,6 +1039,10 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
 	bridge->map_irq = of_irq_parse_and_map_pci;
 	bridge->swizzle_irq = pci_common_swizzle;
 
+       #ifdef CONFIG_ARM64
+       serror_chain = hook_serror_handler(do_rockchip_pcie_serror);
+       #endif
+
 	err = pci_scan_root_bus_bridge(bridge);
 	if (err < 0)
 		goto err_remove_irq_domain;
diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
index d90dfb3..aa0829c 100644
--- a/drivers/pci/controller/pcie-rockchip.h
+++ b/drivers/pci/controller/pcie-rockchip.h
@@ -13,6 +13,10 @@
 
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#ifdef CONFIG_ARM64
+#include <asm/signal.h>
+#include <asm/system_misc.h>
+#endif
 
 /*
  * The upper 16 bits of PCIE_CLIENT_CONFIG are a write mask for the lower 16
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb43..0d04bc5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2254,7 +2254,12 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
 	    bridge->device == 0x80b5)
 		return pci_idt_bus_quirk(bus, devfn, l, timeout);
 #endif
-
+	/*
+	 * Some broken switches need RC write something to its RO vender ID,
+	 * otherwise it kick back 0x0 for any CFG read. I see how windows +
+	 * BIOS enumerate it from analyzer.... Nee proper quirks help here?
+	 * */
+	pci_bus_write_config_dword(bus, devfn, PCI_VENDOR_ID, 0x00240000);
 	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
 }
 EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-21  2:03                   ` Shawn Lin
@ 2019-11-22  1:02                     ` Peter Geis
  2019-11-22 14:11                       ` Peter Geis
  2019-11-22 14:36                     ` Bjorn Helgaas
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Geis @ 2019-11-22  1:02 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Robin Murphy, Bjorn Helgaas, linux-pci, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Wed, Nov 20, 2019 at 9:05 PM Shawn Lin <shawn.lin@rock-chips.com> wrote:
>
<snip>
>
> Not having too much time to follow up w/ upstream development, sorry.
> I attach a patch based on Linux-next for debugging, but I don't know if
> it could work as expected since I can't test it now. But at least it
> shows what the problems were.

Well it is certainly hideous, and does not compile cleanly (built-in
complains, module is broken), but it finished scanning all buses.
I'll do some more serious testing and get back to you with more results.
Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-22  1:02                     ` Peter Geis
@ 2019-11-22 14:11                       ` Peter Geis
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Geis @ 2019-11-22 14:11 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Robin Murphy, Bjorn Helgaas, linux-pci, Heiko Stuebner,
	open list:ARM/Rockchip SoC...

On Thu, Nov 21, 2019 at 8:02 PM Peter Geis <pgwipeout@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 9:05 PM Shawn Lin <shawn.lin@rock-chips.com> wrote:
> >
> <snip>
> >
> > Not having too much time to follow up w/ upstream development, sorry.
> > I attach a patch based on Linux-next for debugging, but I don't know if
> > it could work as expected since I can't test it now. But at least it
> > shows what the problems were.
>
> Well it is certainly hideous, and does not compile cleanly (built-in
> complains, module is broken), but it finished scanning all buses.
> I'll do some more serious testing and get back to you with more results.
> Thanks!

As promised, here is the results of this patch:
# lspci
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3399 PCI
Express Root Port
01:00.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
02:01.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
02:03.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
02:05.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
02:07.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
04:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce
GTX 645 OEM] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
06:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe
SATA 6.0 Gb/s controller (rev 11)
06:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE
Controller (rev 11)

# lspci -t
-[0000:00]---00.0-[01-06]----00.0-[02-06]--+-01.0-[03]--
                                           +-03.0-[04]--+-00.0
                                           |            \-00.1
                                           +-05.0-[05]--
                                           \-07.0-[06]--+-00.0
                                                        \-00.1

# lspci -v
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3399 PCI
Express Root Port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 228
        Bus: primary=00, secondary=01, subordinate=06, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: None
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
        Capabilities: [c0] Express Root Port (Slot+), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [274] Transaction Processing Hints
        Kernel driver in use: pcieport

01:00.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 229
        Bus: primary=01, secondary=02, subordinate=06, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Upstream Port, MSI 00
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Vendor Specific Information: ID=0000 Rev=0
Len=c00 <?>
        Kernel driver in use: pcieport

02:01.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 231
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Kernel driver in use: pcieport

02:03.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 233
        Bus: primary=02, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Kernel driver in use: pcieport

02:05.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 234
        Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Kernel driver in use: pcieport

02:07.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 235
        Bus: primary=02, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Kernel driver in use: pcieport

04:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce
GTX 645 OEM] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Pegatron GK106 [GeForce GTX 645 OEM]
        Flags: fast devsel, IRQ 232
        Memory at <unassigned> (64-bit, prefetchable) [disabled]
        Memory at <unassigned> (64-bit, prefetchable) [disabled]
        I/O ports at <unassigned> [disabled]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1
Len=024 <?>
        Capabilities: [900] Secondary PCI Express <?>
        Kernel modules: nouveau

04:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
        Subsystem: Pegatron GK106 HDMI Audio Controller
        Flags: fast devsel
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00

06:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe
SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
        Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
6.0 Gb/s controller
        Flags: fast devsel, IRQ 232
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        Memory at <ignored> (32-bit, non-prefetchable)
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [70] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting

06:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE
Controller (rev 11) (prog-if 8f [PCI native mode controller, supports
both channels switched to ISA compatibility mode, supports bus
mastering])
        Subsystem: Marvell Technology Group Ltd. 88SE912x IDE Controller
        Flags: fast devsel
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        I/O ports at 0000
        Memory at <ignored> (32-bit, non-prefetchable)
        Expansion ROM at <unassigned>
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [70] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting

dmesg:
[   52.602805] rockchip-pcie f8000000.pcie: missing legacy phy; search
for per-lane PHY
[   52.633171] rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
[   52.636853] rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
[   52.665513] rockchip-pcie f8000000.pcie: current link width is x1
[   52.667675] rockchip-pcie f8000000.pcie: idling lane 1
[   52.670208] rockchip-pcie f8000000.pcie: idling lane 2
[   52.673324] rockchip-pcie f8000000.pcie: idling lane 3
[   52.678183] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
[   52.681530] rockchip-pcie f8000000.pcie: Parsing ranges property...
[   52.685148] rockchip-pcie f8000000.pcie:   MEM
0xfa000000..0xfbdfffff -> 0xfa000000
[   52.688901] rockchip-pcie f8000000.pcie:    IO
0xfbe00000..0xfbffffff -> 0xfbe00000
[   52.702615] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
[   52.705800] pci_bus 0000:00: root bus resource [bus 00-1f]
[   52.707798] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
[   52.710864] pci_bus 0000:00: root bus resource [io
0x0000-0x1fffff] (bus address [0xfbe00000-0xfbffffff])
[   52.714658] pci_bus 0000:00: scanning bus
[   52.717990] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
[   52.724951] pci 0000:00:00.0: supports D1
[   52.726604] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[   52.729216] pci 0000:00:00.0: PME# disabled
[   52.981495] pci_bus 0000:00: fixups for bus
[   52.983383] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[   52.986165] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   52.989673] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[   52.998243] pci_bus 0000:01: scanning bus
[   53.001836] pci 0000:01:00.0: [1b21:1184] type 01 class 0x060400
[   53.007401] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
[   53.010465] pci 0000:01:00.0: enabling Extended Tags
[   53.017344] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[   53.019592] pci 0000:01:00.0: PME# disabled
[   53.024976] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 4.000 Gb/s
with 5 GT/s x1 link)
[   53.041892] rockchip-pcie f8000000.pcie: phy link changes
[   53.271553] pci_bus 0000:01: fixups for bus
[   53.274158] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 0
[   53.276941] pci 0000:01:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   53.279910] pci 0000:01:00.0: scanning [bus 00-00] behind bridge, pass 1
[   53.290186] pci_bus 0000:02: scanning bus
[   53.291820] do_rockchip_pcie_serror:hey!
[   53.293905] do_rockchip_pcie_serror:hey!
[   53.305314] pci 0000:02:01.0: [1b21:1184] type 01 class 0x060400
[   53.310885] pci 0000:02:01.0: Max Payload Size set to 256 (was 128, max 256)
[   53.313851] pci 0000:02:01.0: enabling Extended Tags
[   53.319975] pci 0000:02:01.0: PME# supported from D0 D3hot D3cold
[   53.322901] pci 0000:02:01.0: PME# disabled
[   53.344870] do_rockchip_pcie_serror:hey!
[   53.346430] do_rockchip_pcie_serror:hey!
[   53.357285] pci 0000:02:03.0: [1b21:1184] type 01 class 0x060400
[   53.362818] pci 0000:02:03.0: Max Payload Size set to 256 (was 128, max 256)
[   53.365783] pci 0000:02:03.0: enabling Extended Tags
[   53.371869] pci 0000:02:03.0: PME# supported from D0 D3hot D3cold
[   53.374766] pci 0000:02:03.0: PME# disabled
[   53.396764] do_rockchip_pcie_serror:hey!
[   53.398333] do_rockchip_pcie_serror:hey!
[   53.409167] pci 0000:02:05.0: [1b21:1184] type 01 class 0x060400
[   53.414653] pci 0000:02:05.0: Max Payload Size set to 256 (was 128, max 256)
[   53.417619] pci 0000:02:05.0: enabling Extended Tags
[   53.423265] pci 0000:02:05.0: PME# supported from D0 D3hot D3cold
[   53.426049] pci 0000:02:05.0: PME# disabled
[   53.447276] do_rockchip_pcie_serror:hey!
[   53.449363] do_rockchip_pcie_serror:hey!
[   53.459200] pci 0000:02:07.0: [1b21:1184] type 01 class 0x060400
[   53.465029] pci 0000:02:07.0: Max Payload Size set to 256 (was 128, max 256)
[   53.467434] pci 0000:02:07.0: enabling Extended Tags
[   53.473584] pci 0000:02:07.0: PME# supported from D0 D3hot D3cold
[   53.475787] pci 0000:02:07.0: PME# disabled
[   53.498792] do_rockchip_pcie_serror:hey!
[   53.500963] do_rockchip_pcie_serror:hey!
[   53.509676] do_rockchip_pcie_serror:hey!
[   53.511148] do_rockchip_pcie_serror:hey!
[   53.521486] do_rockchip_pcie_serror:hey!
[   53.523028] do_rockchip_pcie_serror:hey!
[   53.532712] do_rockchip_pcie_serror:hey!
[   53.534233] do_rockchip_pcie_serror:hey!
[   53.542761] do_rockchip_pcie_serror:hey!
[   53.544797] do_rockchip_pcie_serror:hey!
[   53.553497] do_rockchip_pcie_serror:hey!
[   53.554955] do_rockchip_pcie_serror:hey!
[   53.565383] do_rockchip_pcie_serror:hey!
[   53.566939] do_rockchip_pcie_serror:hey!
[   53.576816] do_rockchip_pcie_serror:hey!
[   53.578358] do_rockchip_pcie_serror:hey!
[   53.586948] do_rockchip_pcie_serror:hey!
[   53.588974] do_rockchip_pcie_serror:hey!
[   53.597684] do_rockchip_pcie_serror:hey!
[   53.599158] do_rockchip_pcie_serror:hey!
[   53.608521] do_rockchip_pcie_serror:hey!
[   53.610013] do_rockchip_pcie_serror:hey!
[   53.618541] do_rockchip_pcie_serror:hey!
[   53.620010] do_rockchip_pcie_serror:hey!
[   53.629141] do_rockchip_pcie_serror:hey!
[   53.630609] do_rockchip_pcie_serror:hey!
[   53.639288] do_rockchip_pcie_serror:hey!
[   53.641271] do_rockchip_pcie_serror:hey!
[   53.649883] do_rockchip_pcie_serror:hey!
[   53.651344] do_rockchip_pcie_serror:hey!
[   53.660619] do_rockchip_pcie_serror:hey!
[   53.662088] do_rockchip_pcie_serror:hey!
[   53.670721] do_rockchip_pcie_serror:hey!
[   53.672701] do_rockchip_pcie_serror:hey!
[   53.681443] do_rockchip_pcie_serror:hey!
[   53.682922] do_rockchip_pcie_serror:hey!
[   53.692871] do_rockchip_pcie_serror:hey!
[   53.694405] do_rockchip_pcie_serror:hey!
[   53.702984] do_rockchip_pcie_serror:hey!
[   53.705005] do_rockchip_pcie_serror:hey!
[   53.713716] do_rockchip_pcie_serror:hey!
[   53.715179] do_rockchip_pcie_serror:hey!
[   53.725222] do_rockchip_pcie_serror:hey!
[   53.726755] do_rockchip_pcie_serror:hey!
[   53.736770] do_rockchip_pcie_serror:hey!
[   53.738283] do_rockchip_pcie_serror:hey!
[   53.746842] do_rockchip_pcie_serror:hey!
[   53.748855] do_rockchip_pcie_serror:hey!
[   53.757557] pci_bus 0000:02: fixups for bus
[   53.759331] pci 0000:02:01.0: scanning [bus 00-00] behind bridge, pass 0
[   53.762066] pci 0000:02:01.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   53.765616] pci 0000:02:03.0: scanning [bus 00-00] behind bridge, pass 0
[   53.767824] pci 0000:02:03.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   53.771232] pci 0000:02:05.0: scanning [bus 00-00] behind bridge, pass 0
[   53.774037] pci 0000:02:05.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   53.777467] pci 0000:02:07.0: scanning [bus 00-00] behind bridge, pass 0
[   53.779646] pci 0000:02:07.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[   53.783096] pci 0000:02:01.0: scanning [bus 00-00] behind bridge, pass 1
[   53.793143] pci_bus 0000:03: scanning bus
[   53.794767] do_rockchip_pcie_serror:hey!
[   53.798261] do_rockchip_pcie_serror:hey!
[   54.026694] pci_bus 0000:03: fixups for bus
[   54.028933] pci_bus 0000:03: bus scan returning with max=03
[   54.030975] pci_bus 0000:03: busn_res: [bus 03-1f] end is updated to 03
[   54.034051] pci 0000:02:03.0: scanning [bus 00-00] behind bridge, pass 1
[   54.043471] pci_bus 0000:04: scanning bus
[   54.047067] pci 0000:04:00.0: [10de:11c4] type 00 class 0x030000
[   54.051173] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[   54.054502] pci 0000:04:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
64bit pref]
[   54.058096] pci 0000:04:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
64bit pref]
[   54.061432] pci 0000:04:00.0: reg 0x24: initial BAR value 0x00000000 invalid
[   54.063740] pci 0000:04:00.0: reg 0x24: [io  size 0x0080]
[   54.066588] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   54.070087] pci 0000:04:00.0: Max Payload Size set to 256 (was 128, max 256)
[   54.079340] pci 0000:04:00.0: 2.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 126.016 Gb/s
with 8 GT/s x16 link)
[   54.090881] pci 0000:04:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[   54.098178] pci 0000:04:00.1: [10de:0e0b] type 00 class 0x040300
[   54.101950] pci 0000:04:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[   54.106759] pci 0000:04:00.1: Max Payload Size set to 256 (was 128, max 256)
[   54.121728] do_rockchip_pcie_serror:hey!
[   54.123286] do_rockchip_pcie_serror:hey!
[   54.125381] do_rockchip_pcie_serror:hey!
[   54.126852] do_rockchip_pcie_serror:hey!
[   54.128867] do_rockchip_pcie_serror:hey!
[   54.130331] do_rockchip_pcie_serror:hey!
[   54.131747] do_rockchip_pcie_serror:hey!
[   54.133663] do_rockchip_pcie_serror:hey!
[   54.135140] do_rockchip_pcie_serror:hey!
[   54.137111] do_rockchip_pcie_serror:hey!
[   54.138586] do_rockchip_pcie_serror:hey!
[   54.139984] do_rockchip_pcie_serror:hey!
[   54.371838] pci_bus 0000:04: fixups for bus
[   54.374185] pci_bus 0000:04: bus scan returning with max=04
[   54.377066] pci_bus 0000:04: busn_res: [bus 04-1f] end is updated to 04
[   54.379567] pci 0000:02:05.0: scanning [bus 00-00] behind bridge, pass 1
[   54.389098] pci_bus 0000:05: scanning bus
[   54.390718] do_rockchip_pcie_serror:hey!
[   54.392729] do_rockchip_pcie_serror:hey!
[   54.622660] pci_bus 0000:05: fixups for bus
[   54.624945] pci_bus 0000:05: bus scan returning with max=05
[   54.626966] pci_bus 0000:05: busn_res: [bus 05-1f] end is updated to 05
[   54.629967] pci 0000:02:07.0: scanning [bus 00-00] behind bridge, pass 1
[   54.639123] pci_bus 0000:06: scanning bus
[   54.642628] pci 0000:06:00.0: [1b4b:9123] type 00 class 0x010601
[   54.646519] pci 0000:06:00.0: reg 0x10: initial BAR value 0x00008000 invalid
[   54.649456] pci 0000:06:00.0: reg 0x10: [io  size 0x0008]
[   54.651786] pci 0000:06:00.0: reg 0x14: initial BAR value 0x00008040 invalid
[   54.654657] pci 0000:06:00.0: reg 0x14: [io  size 0x0004]
[   54.657499] pci 0000:06:00.0: reg 0x18: initial BAR value 0x00008100 invalid
[   54.659776] pci 0000:06:00.0: reg 0x18: [io  size 0x0008]
[   54.662646] pci 0000:06:00.0: reg 0x1c: initial BAR value 0x00008140 invalid
[   54.665553] pci 0000:06:00.0: reg 0x1c: [io  size 0x0004]
[   54.667861] pci 0000:06:00.0: reg 0x20: [io  0x800000-0x80000f]
[   54.670926] pci 0000:06:00.0: reg 0x24: [mem 0x00900000-0x009007ff]
[   54.673999] pci 0000:06:00.0: reg 0x30: [mem 0xd0000000-0xd000ffff pref]
[   54.677231] pci 0000:06:00.0: Max Payload Size set to 256 (was 128, max 512)
[   54.680024] pci 0000:06:00.0: Enabling fixed DMA alias to 00.1
[   54.685613] pci 0000:06:00.0: PME# supported from D3hot
[   54.687586] pci 0000:06:00.0: PME# disabled
[   54.691486] pci 0000:06:00.0: 2.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 4.000 Gb/s
with 5 GT/s x1 link)
[   54.707650] pci 0000:06:00.1: [1b4b:91a4] type 00 class 0x01018f
[   54.711661] pci 0000:06:00.1: reg 0x10: initial BAR value 0x00008200 invalid
[   54.714870] pci 0000:06:00.1: reg 0x10: [io  size 0x0008]
[   54.717850] pci 0000:06:00.1: reg 0x14: initial BAR value 0x00008240 invalid
[   54.720987] pci 0000:06:00.1: reg 0x14: [io  size 0x0004]
[   54.723317] pci 0000:06:00.1: reg 0x18: initial BAR value 0x00008300 invalid
[   54.726269] pci 0000:06:00.1: reg 0x18: [io  size 0x0008]
[   54.729128] pci 0000:06:00.1: reg 0x1c: initial BAR value 0x00008340 invalid
[   54.731406] pci 0000:06:00.1: reg 0x1c: [io  size 0x0004]
[   54.734321] pci 0000:06:00.1: reg 0x20: [io  0x820000-0x82000f]
[   54.737333] pci 0000:06:00.1: reg 0x24: [mem 0x00920000-0x0092000f]
[   54.739855] pci 0000:06:00.1: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[   54.743153] pci 0000:06:00.1: Max Payload Size set to 256 (was 128, max 512)
[   54.749557] pci 0000:06:00.1: PME# supported from D3hot
[   54.751556] pci 0000:06:00.1: PME# disabled
[   54.764125] do_rockchip_pcie_serror:hey!
[   54.766254] do_rockchip_pcie_serror:hey!
[   54.767753] do_rockchip_pcie_serror:hey!
[   54.769735] do_rockchip_pcie_serror:hey!
[   54.771227] do_rockchip_pcie_serror:hey!
[   54.773130] do_rockchip_pcie_serror:hey!
[   54.774626] do_rockchip_pcie_serror:hey!
[   54.776019] do_rockchip_pcie_serror:hey!
[   54.778028] do_rockchip_pcie_serror:hey!
[   54.779489] do_rockchip_pcie_serror:hey!
[   54.781441] do_rockchip_pcie_serror:hey!
[   54.782900] do_rockchip_pcie_serror:hey!
[   55.024561] pci_bus 0000:06: fixups for bus
[   55.026347] pci_bus 0000:06: bus scan returning with max=06
[   55.028883] pci_bus 0000:06: busn_res: [bus 06-1f] end is updated to 06
[   55.031193] pci_bus 0000:02: bus scan returning with max=06
[   55.033657] pci_bus 0000:02: busn_res: [bus 02-1f] end is updated to 06
[   55.035974] pci_bus 0000:01: bus scan returning with max=06
[   55.038913] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 06
[   55.041730] pci_bus 0000:00: bus scan returning with max=06
[   55.045837] pci 0000:00:00.0: BAR 14: no space for [mem size 0x10000000]
[   55.048083] pci 0000:00:00.0: BAR 14: failed to assign [mem size 0x10000000]
[   55.051414] pci 0000:01:00.0: BAR 15: no space for [mem size
0x0c000000 64bit pref]
[   55.054841] pci 0000:01:00.0: BAR 15: failed to assign [mem size
0x0c000000 64bit pref]
[   55.058132] pci 0000:01:00.0: BAR 14: no space for [mem size 0x01c00000]
[   55.061000] pci 0000:01:00.0: BAR 14: failed to assign [mem size 0x01c00000]
[   55.063297] pci 0000:01:00.0: BAR 13: no space for [io  size 0x2000]
[   55.065958] pci 0000:01:00.0: BAR 13: failed to assign [io  size 0x2000]
[   55.069058] pci 0000:02:03.0: BAR 15: no space for [mem size
0x0c000000 64bit pref]
[   55.071603] pci 0000:02:03.0: BAR 15: failed to assign [mem size
0x0c000000 64bit pref]
[   55.074791] pci 0000:02:03.0: BAR 14: no space for [mem size 0x01800000]
[   55.077569] pci 0000:02:03.0: BAR 14: failed to assign [mem size 0x01800000]
[   55.079880] pci 0000:02:07.0: BAR 14: no space for [mem size 0x00100000]
[   55.082650] pci 0000:02:07.0: BAR 14: failed to assign [mem size 0x00100000]
[   55.085505] pci 0000:02:03.0: BAR 13: no space for [io  size 0x1000]
[   55.087606] pci 0000:02:03.0: BAR 13: failed to assign [io  size 0x1000]
[   55.090372] pci 0000:02:07.0: BAR 13: no space for [io  size 0x1000]
[   55.093071] pci 0000:02:07.0: BAR 13: failed to assign [io  size 0x1000]
[   55.095351] pci 0000:02:01.0: PCI bridge to [bus 03]
[   55.098535] pci 0000:04:00.0: BAR 1: no space for [mem size
0x08000000 64bit pref]
[   55.101694] pci 0000:04:00.0: BAR 1: failed to assign [mem size
0x08000000 64bit pref]
[   55.104897] pci 0000:04:00.0: BAR 3: no space for [mem size
0x02000000 64bit pref]
[   55.107397] pci 0000:04:00.0: BAR 3: failed to assign [mem size
0x02000000 64bit pref]
[   55.110532] pci 0000:04:00.0: BAR 0: no space for [mem size 0x01000000]
[   55.113275] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   55.115578] pci 0000:04:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[   55.118419] pci 0000:04:00.0: BAR 6: failed to assign [mem size
0x00080000 pref]
[   55.121504] pci 0000:04:00.1: BAR 0: no space for [mem size 0x00004000]
[   55.123684] pci 0000:04:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   55.126517] pci 0000:04:00.0: BAR 5: no space for [io  size 0x0080]
[   55.129224] pci 0000:04:00.0: BAR 5: failed to assign [io  size 0x0080]
[   55.131475] pci 0000:02:03.0: PCI bridge to [bus 04]
[   55.134370] pci 0000:02:05.0: PCI bridge to [bus 05]
[   55.137676] pci 0000:06:00.0: BAR 6: no space for [mem size 0x00010000 pref]
[   55.139982] pci 0000:06:00.0: BAR 6: failed to assign [mem size
0x00010000 pref]
[   55.143057] pci 0000:06:00.1: BAR 6: no space for [mem size 0x00010000 pref]
[   55.145881] pci 0000:06:00.1: BAR 6: failed to assign [mem size
0x00010000 pref]
[   55.148931] pci 0000:06:00.0: BAR 5: no space for [mem size 0x00000800]
[   55.151104] pci 0000:06:00.0: BAR 5: failed to assign [mem size 0x00000800]
[   55.153922] pci 0000:06:00.0: BAR 4: no space for [io  size 0x0010]
[   55.156004] pci 0000:06:00.0: BAR 4: failed to assign [io  size 0x0010]
[   55.158777] pci 0000:06:00.1: BAR 4: no space for [io  size 0x0010]
[   55.161404] pci 0000:06:00.1: BAR 4: failed to assign [io  size 0x0010]
[   55.163607] pci 0000:06:00.1: BAR 5: no space for [mem size 0x00000010]
[   55.166364] pci 0000:06:00.1: BAR 5: failed to assign [mem size 0x00000010]
[   55.169246] pci 0000:06:00.0: BAR 0: no space for [io  size 0x0008]
[   55.171325] pci 0000:06:00.0: BAR 0: failed to assign [io  size 0x0008]
[   55.174077] pci 0000:06:00.0: BAR 2: no space for [io  size 0x0008]
[   55.176759] pci 0000:06:00.0: BAR 2: failed to assign [io  size 0x0008]
[   55.178950] pci 0000:06:00.1: BAR 0: no space for [io  size 0x0008]
[   55.181860] pci 0000:06:00.1: BAR 0: failed to assign [io  size 0x0008]
[   55.184698] pci 0000:06:00.1: BAR 2: no space for [io  size 0x0008]
[   55.186780] pci 0000:06:00.1: BAR 2: failed to assign [io  size 0x0008]
[   55.189510] pci 0000:06:00.0: BAR 1: no space for [io  size 0x0004]
[   55.191587] pci 0000:06:00.0: BAR 1: failed to assign [io  size 0x0004]
[   55.194302] pci 0000:06:00.0: BAR 3: no space for [io  size 0x0004]
[   55.196965] pci 0000:06:00.0: BAR 3: failed to assign [io  size 0x0004]
[   55.199148] pci 0000:06:00.1: BAR 1: no space for [io  size 0x0004]
[   55.201769] pci 0000:06:00.1: BAR 1: failed to assign [io  size 0x0004]
[   55.203952] pci 0000:06:00.1: BAR 3: no space for [io  size 0x0004]
[   55.206664] pci 0000:06:00.1: BAR 3: failed to assign [io  size 0x0004]
[   55.209535] pci 0000:02:07.0: PCI bridge to [bus 06]
[   55.211789] pci 0000:01:00.0: PCI bridge to [bus 02-06]
[   55.214636] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[   55.225253] pcieport 0000:00:00.0: assign IRQ: got 227
[   55.227779] pcieport 0000:00:00.0: enabling bus mastering
[   55.243313] pcieport 0000:00:00.0: PME: Signaling with IRQ 228
[   55.263779] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.263979] pcieport 0000:00:00.0: AER: enabled with IRQ 228
[   55.265375] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.268163] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.268891] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.269629] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.270355] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.271108] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.271846] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.272592] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.273320] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.274031] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.274750] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.275454] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.276187] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.276906] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.277664] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.278393] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.279110] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.279825] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.280551] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.281285] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.282016] pcieport 0000:01:00.0: assign IRQ: got 227
[   55.282597] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.282781] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.285874] rockchip-pcie f8000000.pcie: correctable error interrupt received
[   55.286630] pcieport 0000:01:00.0: enabling bus mastering
[   55.311138] pcieport 0000:02:01.0: assign IRQ: got 230
[   55.314740] pcieport 0000:02:01.0: enabling bus mastering
[   55.344990] pcieport 0000:02:03.0: assign IRQ: got 232
[   55.347735] pcieport 0000:02:03.0: enabling bus mastering
[   55.374090] pcieport 0000:02:05.0: assign IRQ: got 230
[   55.377646] pcieport 0000:02:05.0: enabling bus mastering
[   55.405254] pcieport 0000:02:07.0: assign IRQ: got 232
[   55.407938] pcieport 0000:02:07.0: enabling bus mastering
[   55.434151] pci 0000:04:00.1: D0 power state depends on 0000:04:00.0
<snip> when it tries to load nouveau
[   80.221309] nouveau 0000:04:00.0: assign IRQ: got 232
[   80.221367] do_rockchip_pcie_serror:hey!
[   80.221369] do_rockchip_pcie_serror:hey!
[   80.221375] nouveau 0000:04:00.0: unknown chipset (00000000)
[   80.221886] nouveau: probe of 0000:04:00.0 failed with error -12

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-21  2:03                   ` Shawn Lin
  2019-11-22  1:02                     ` Peter Geis
@ 2019-11-22 14:36                     ` Bjorn Helgaas
  2019-11-22 15:08                       ` Peter Geis
  1 sibling, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2019-11-22 14:36 UTC (permalink / raw)
  To: Peter Geis
  Cc: Robin Murphy, linux-pci, Heiko Stuebner, linux-rockchip, Shawn Lin

On Thu, Nov 21, 2019 at 10:03:27AM +0800, Shawn Lin wrote:
> From da9db487615c3687a0823c54d8d0790cbd4f79a2 Mon Sep 17 00:00:00 2001
> From: Shawn Lin <shawn.lin@rock-chips.com>
> Date: Thu, 21 Nov 2019 09:45:12 +0800
> Subject: [PATCH] WFT: PCI: rockchip: play game with unsupported request from
>  EP
> 
> Native defect prevents this RC far from supporting any response
> from EP which UR filed is set. Take a big hammer to take over
> Serror handler and work around it.

Peter, now that you have a way to at least boot, can you collect the
output of "sudo lspci -vvxxx"?

When we're enumerating devices, we don't know ahead of time which
devices exist, so we try to read the Vendor ID for each *possible*
device.  If the device doesn't exist, that config read should be
terminated with an Unsupported Request completion (see the
implementation note at the end of PCIe r5.0, sec 2.3.2).  So far this
is all standard PCIe hardware behavior (and sorry if this is all
obvious).

Sec 6.2.5 and 6.2.6 show several configuration bits that affect how
that UR is handled and ultimately reported, possibly via SERR#.  In
most Linux systems, a UR completion does not cause an SERR#, but the
PCI core does not manage those configuration bits directly (it
probably *should* be more proactive about this), so this is more a
result of how platform firmware set them than anything else.

I'm speculating that you may have PCI_COMMAND_SERR and similar bits
set, which will cause SERR# in cases where we don't see SERR# on other
platforms.  Your lspci output should show this.

Bjorn

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUG] rk3399-rockpro64 pcie synchronous external abort
  2019-11-22 14:36                     ` Bjorn Helgaas
@ 2019-11-22 15:08                       ` Peter Geis
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Geis @ 2019-11-22 15:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robin Murphy, linux-pci, Heiko Stuebner,
	open list:ARM/Rockchip SoC...,
	Shawn Lin

On Fri, Nov 22, 2019 at 9:36 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Nov 21, 2019 at 10:03:27AM +0800, Shawn Lin wrote:
> > From da9db487615c3687a0823c54d8d0790cbd4f79a2 Mon Sep 17 00:00:00 2001
> > From: Shawn Lin <shawn.lin@rock-chips.com>
> > Date: Thu, 21 Nov 2019 09:45:12 +0800
> > Subject: [PATCH] WFT: PCI: rockchip: play game with unsupported request from
> >  EP
> >
> > Native defect prevents this RC far from supporting any response
> > from EP which UR filed is set. Take a big hammer to take over
> > Serror handler and work around it.
>
> Peter, now that you have a way to at least boot, can you collect the
> output of "sudo lspci -vvxxx"?
>
> When we're enumerating devices, we don't know ahead of time which
> devices exist, so we try to read the Vendor ID for each *possible*
> device.  If the device doesn't exist, that config read should be
> terminated with an Unsupported Request completion (see the
> implementation note at the end of PCIe r5.0, sec 2.3.2).  So far this
> is all standard PCIe hardware behavior (and sorry if this is all
> obvious).

Everything I know about pcie I've learned from this event, so nothing
is obvious to me ;)

>
> Sec 6.2.5 and 6.2.6 show several configuration bits that affect how
> that UR is handled and ultimately reported, possibly via SERR#.  In
> most Linux systems, a UR completion does not cause an SERR#, but the
> PCI core does not manage those configuration bits directly (it
> probably *should* be more proactive about this), so this is more a
> result of how platform firmware set them than anything else.
>
> I'm speculating that you may have PCI_COMMAND_SERR and similar bits
> set, which will cause SERR# in cases where we don't see SERR# on other
> platforms.  Your lspci output should show this.
>
> Bjorn

As requested:
# lspci -vvxxx
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3399 PCI
Express Root Port (prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 228
        Bus: primary=00, secondary=01, subordinate=06, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: None
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
                Address: 00000000fee30040  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000008
        Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1,
Exit Latency L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
                LnkSta: Speed 2.5GT/s (ok), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd Off, Power+ Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt-
PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range B, TimeoutDis+,
LTR+, OBFF Via message ARIFwd+
                         AtomicOpsCap: Routing+ 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
                RootCmd: CERptEn+ NFERptEn+ FERptEn+
                RootSta: CERcvd+ MultCERcvd+ UERcvd- MultUERcvd-
                         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                ErrorSrc: ERR_COR: 0238 ERR_FATAL/NONFATAL: 0000
        Capabilities: [274 v1] Transaction Processing Hints
                Interrupt vector mode supported
                Device specific mode supported
                Steering table in TPH capability structure
        Kernel driver in use: pcieport
00: 87 1d 00 01 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 06 00 00 00 00 00
20: f0 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 e3 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 90 03 5a 08 00 00 00 00 00 00 00 00 00 00 00
90: 05 b0 81 01 40 00 e3 fe 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 c0 00 00 00 00 00 00 08 00 00 00 00 00 00 00
c0: 10 00 42 01 01 81 00 00 3f 28 00 00 41 a8 61 00
d0: 48 0c 11 10 00 00 00 00 c0 07 20 00 08 00 00 00
e0: 00 00 00 00 72 18 04 00 00 04 00 00 00 00 00 00
f0: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01:00.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 229
        Bus: primary=01, secondary=02, subordinate=06, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee30040  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Upstream Port, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+
SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s <2us, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR-, OBFF Not Supported
                         AtomicOpsCap: Routing-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 1f, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [300 v1] Vendor Specific Information: ID=0000
Rev=0 Len=c00 <?>
        Kernel driver in use: pcieport
00: 21 1b 84 11 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 01 02 06 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 e3 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 78 81 00 40 00 e3 fe 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 c8 08 00 00 00
80: 10 c0 52 00 21 80 00 00 30 29 10 00 12 dc 43 00
90: 40 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d 00 00 00 21 1b 8f 11 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:01.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 231
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee30040  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #1, PowerLimit 26.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 1f, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: pcieport
00: 21 1b 84 11 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 02 03 03 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 e6 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 78 81 00 40 00 e3 fe 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 c8 08 00 00 00
80: 10 c0 62 01 21 80 00 00 30 29 10 00 12 fc 73 01
90: 00 00 11 10 00 0d 08 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d 00 00 00 21 1b 8f 11 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:03.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 233
        Bus: primary=02, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee30040  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq-
AuxPwr+ TransPend-
                LnkCap: Port #3, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s <2us, L1 unlimited
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #3, PowerLimit 26.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState+
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 1f, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: pcieport
00: 21 1b 84 11 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 02 04 04 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 e8 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 78 81 00 40 00 e3 fe 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 c8 08 00 00 00
80: 10 c0 62 01 21 80 00 00 30 29 11 00 12 dc 73 03
90: 40 00 11 70 00 0d 18 00 00 00 00 01 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d 00 00 00 21 1b 8f 11 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:05.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 234
        Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee30040  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr+ TransPend-
                LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #5, PowerLimit 26.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending+ InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 1f, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: pcieport
00: 21 1b 84 11 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 02 05 05 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 e6 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 78 81 00 40 00 e3 fe 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 c8 08 00 00 00
80: 10 c0 62 01 21 80 00 00 30 29 10 00 12 fc 73 05
90: 00 00 11 10 00 0d 28 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d 00 00 00 21 1b 8f 11 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:07.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
(prog-if 00 [Normal decode])
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 235
        Bus: primary=02, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee30040  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq-
AuxPwr+ TransPend-
                LnkCap: Port #7, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s <2us, L1 unlimited
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #7, PowerLimit 26.000W; Interlock-
NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState+
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing-
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: EgressBlck-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [c0] Subsystem: ASMedia Technology Inc. ASM1184e
PCIe Switch Port
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout-
AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 1f, ECRCGenCap+
ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres-
HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: pcieport
00: 21 1b 84 11 04 04 10 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 02 06 06 00 f1 01 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 e8 01 02 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 78 81 00 40 00 e3 fe 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 c8 08 00 00 00
80: 10 c0 62 01 21 80 00 00 30 29 11 00 12 dc 73 07
90: 40 00 12 70 00 0d 38 00 00 00 00 01 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d 00 00 00 21 1b 8f 11 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

04:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce
GTX 645 OEM] (rev a1) (prog-if 00 [VGA controlle
r])
        Subsystem: Pegatron GK106 [GeForce GTX 645 OEM]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 232
        Region 1: Memory at <unassigned> (64-bit, prefetchable) [disabled]
        Region 3: Memory at <unassigned> (64-bit, prefetchable) [disabled]
        Region 5: I/O ports at <unassigned> [disabled]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+
FLReset- SlotPowerLimit 26.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
                LnkCap: Port #3, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+,
LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [600 v1] Vendor Specific Information: ID=0001
Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express <?>
        Kernel modules: nouveau
00: de 10 c4 11 00 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 00 0c 00 00 00 00 00 00 00 0c 00 00 00
20: 00 00 00 00 01 00 00 00 00 00 00 00 0a 1b cc 90
30: 00 00 00 00 60 00 00 00 00 00 00 00 e8 01 00 00
40: 0a 1b cc 90 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 b4 02 00 e1 8d 68 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 00 00
a0: 00 00 00 00 0e 00 00 00 03 00 00 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

04:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
        Subsystem: Pegatron GK106 HDMI Audio Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin B routed to IRQ 0
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+
FLReset- SlotPowerLimit 26.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
                LnkCap: Port #3, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot-
ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+,
LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
00: de 10 0b 0e 00 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 0a 1b cc 90
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 02 00 00
40: 0a 1b cc 90 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 68 00
80: 30 29 00 00 03 3d 45 03 40 00 11 10 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 00 00
a0: 00 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

06:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe
SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [A
HCI 1.0])
        Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
6.0 Gb/s controller
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 232
        Region 0: I/O ports at 0000
        Region 1: I/O ports at 0000
        Region 2: I/O ports at 0000
        Region 3: I/O ports at 0000
        Region 4: I/O ports at 0000
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s
<1us, L1 <8us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+
FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+
AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout-
AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap-
ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres-
HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
00: 4b 1b 23 91 03 00 10 00 11 01 06 01 00 00 80 00
10: 01 80 00 00 41 80 00 00 01 81 00 00 41 81 00 00
20: 01 00 80 00 00 00 90 00 00 00 00 00 4b 1b 23 91
30: 00 00 00 d0 40 00 00 00 00 00 00 00 e8 01 00 00
40: 01 50 03 40 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 12 00 02 87 68 00 30 20 09 00 12 3c 03 00
80: 40 00 12 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

06:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE
Controller (rev 11) (prog-if 8f [PCI native mode co
ntroller, supports both channels switched to ISA compatibility mode,
supports bus mastering])
        Subsystem: Marvell Technology Group Ltd. 88SE912x IDE Controller
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin B routed to IRQ 0
        Region 0: I/O ports at 0000
        Region 1: I/O ports at 0000
        Region 2: I/O ports at 0000
        Region 3: I/O ports at 0000
        Region 4: I/O ports at 0000
        Region 5: Memory at <ignored> (32-bit, non-prefetchable)
        Expansion ROM at <unassigned>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s
<1us, L1 <8us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+
AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1,
Exit Latency L0s <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover- Timeout-
AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap-
ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
00: 4b 1b a4 91 03 00 10 00 11 8f 01 01 00 00 80 00
10: 01 82 00 00 41 82 00 00 01 83 00 00 41 83 00 00
20: 01 00 82 00 00 00 92 00 00 00 00 00 4b 1b a4 91
30: 01 00 00 00 40 00 00 00 00 00 00 00 00 02 00 00
40: 01 50 03 40 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 12 00 02 87 68 00 30 20 09 00 12 3c 03 00
80: 40 00 12 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-11-22 15:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-04 18:55 [BUG] rk3399-rockpro64 pcie synchronous external abort Peter Geis
2019-11-09  1:08 ` Peter Geis
2019-11-09 16:37   ` Doug Anderson
2019-11-10 15:43     ` Peter Geis
2019-11-12  0:03       ` Bjorn Helgaas
2019-11-12  0:21         ` Peter Geis
2019-11-12  0:13 ` Bjorn Helgaas
2019-11-12  0:30   ` Peter Geis
2019-11-12  2:29     ` Bjorn Helgaas
2019-11-12 15:55       ` Peter Geis
2019-11-12 19:15         ` Robin Murphy
2019-11-12 19:41           ` Peter Geis
2019-11-13  1:06             ` Peter Geis
2019-11-13  1:19               ` Peter Geis
2019-11-21  0:36                 ` Peter Geis
2019-11-21  2:03                   ` Shawn Lin
2019-11-22  1:02                     ` Peter Geis
2019-11-22 14:11                       ` Peter Geis
2019-11-22 14:36                     ` Bjorn Helgaas
2019-11-22 15:08                       ` Peter Geis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).