linux-arm-kernel.lists.infradead.org archive mirror
* imx8mp pci hang during init
@ 2023-08-16 22:25 Tim Harvey
  2023-08-17 17:12 ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Harvey @ 2023-08-16 22:25 UTC (permalink / raw)
  To: linux-pci, Richard Zhu, Lucas Stach, Linux ARM Mailing List,
	Jingoo Han, Gustavo Pimentel, Manivannan Sadhasivam

Greetings,

I'm experiencing a hang during PCI init on approximately 60% of boots
with an imx8mp board connected to a Diodes Incorporated PI7C9X2G608G
Gen2 switch. When it does not hang, PCIe links at the expected gen2
(the limit of the switch) and appears to behave correctly. The PCI
clock to the imx8mp and the switch is provided by an AB55703HCHCF
2-output 100MHz LP-HCSL clock generator (one output to the SoC, the
other to the PI7C9X2G608G). I've found that if I set
'fsl,max-link-speed = <1>' to limit the link to gen1, I get the
expected gen1 link and never hang, but setting it to 2 or leaving it
at the default of 3 produces the issue.

A previous version of the same board which does not include an
on-board switch and instead runs the clk and pcie lane to a miniPCIe
socket properly links and operates with a variety of gen1/gen2/gen3
devices. Additionally with an imx8mm soc and the same switch I have
never experienced this hang.

I've reproduced this behavior on every kernel I've tried including
6.1, 6.5-rc6 and NXP's downstream lf-6.1.y.

A successful link at gen2 using Linux 6.5-rc6 looks like this:
...
[ 0.324855] Asymmetric key parser 'x509' registered
[ 0.335035] clk: failed to reparent gic to sys_pll2_500m: -16
[ 0.342998] SoC: i.MX8MP revision 1.1
[ 0.347029] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.352678] 30890000.serial: ttymxc1 at MMIO 0x30890000 (irq = 15, base_baud = 1500000) is a IMX
[ 0.359364] printk: console [ttymxc1] enabled
[ 0.359364] printk: console [ttymxc1] enabled
[ 0.368034] printk: bootconsole [ec_imx6q0] disabled
[ 0.368034] printk: bootconsole [ec_imx6q0] disabled
[ 0.380670] i2c_dev: i2c /dev entries driver
[ 0.386079] ledtrig-cpu: registered to indicate activity on CPUs
[ 0.392117] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
[ 0.399731] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
[ 0.412372] Loading compiled-in X.509 certificates
[ 0.431454] gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.441555] gpio gpiochip1: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.451671] gpio gpiochip2: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.461744] gpio gpiochip3: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.471859] gpio gpiochip4: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.495753] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
[ 0.496382] clk: Not disabling unused clocks
[ 0.503042] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff -> 0x0000000000
[ 0.515505] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff -> 0x0018000000
[ 0.739498] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib, align 64K, limit 16G
[ 0.847519] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
[ 0.953163] imx6q-pcie 33800000.pcie: PCIe Gen.2 x1 link up
[ 0.958773] imx6q-pcie 33800000.pcie: Link up, Gen2
[ 0.963663] imx6q-pcie 33800000.pcie: PCIe Gen.2 x1 link up
[ 0.969553] imx6q-pcie 33800000.pcie: PCI host bridge to bus 0000:00
[ 0.975928] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.981426] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.987618] pci_bus 0000:00: root bus resource [mem 0x18000000-0x1fefffff]
[ 0.994530] pci 0000:00:00.0: [16c3:abcd] type 01 class 0x060400
[ 1.000563] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
[ 1.006847] pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref]
[ 1.013600] pci 0000:00:00.0: supports D1
[ 1.017620] pci 0000:00:00.0: PME# supported from D0 D1 D3hot D3cold
[ 1.025843] pci 0000:01:00.0: [12d8:2608] type 01 class 0x060400
[ 1.032244] pci 0000:01:00.0: supports D1 D2
[ 1.036529] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.043510] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.051961] pci 0000:02:01.0: [12d8:2608] type 01 class 0x060400
[ 1.058401] pci 0000:02:01.0: supports D1 D2
[ 1.062695] pci 0000:02:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.069746] pci 0000:02:02.0: [12d8:2608] type 01 class 0x060400
[ 1.076151] pci 0000:02:02.0: supports D1 D2
[ 1.080435] pci 0000:02:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.087447] pci 0000:02:03.0: [12d8:2608] type 01 class 0x060400
[ 1.093834] pci 0000:02:03.0: supports D1 D2
[ 1.098111] pci 0000:02:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.105110] pci 0000:02:04.0: [12d8:2608] type 01 class 0x060400
[ 1.111504] pci 0000:02:04.0: supports D1 D2
[ 1.115781] pci 0000:02:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.124178] pci 0000:02:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.132218] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.140251] pci 0000:02:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.148282] pci 0000:02:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.156651] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 41
[ 1.163569] pci_bus 0000:42: busn_res: [bus 42-ff] end is updated to 80
[ 1.170472] pci_bus 0000:81: busn_res: [bus 81-ff] end is updated to bf
[ 1.177356] pci_bus 0000:c0: busn_res: [bus c0-ff] end is updated to fe
[ 1.183999] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to fe
[ 1.190666] pci 0000:00:00.0: BAR 0: assigned [mem 0x18000000-0x180fffff]
[ 1.197470] pci 0000:00:00.0: BAR 8: assigned [mem 0x18100000-0x188fffff]
[ 1.204277] pci 0000:00:00.0: BAR 9: assigned [mem 0x18900000-0x190fffff pref]
[ 1.211513] pci 0000:00:00.0: BAR 6: assigned [mem 0x19100000-0x1910ffff pref]
[ 1.218748] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x4fff]
[ 1.224857] pci 0000:01:00.0: BAR 8: assigned [mem 0x18100000-0x188fffff]
[ 1.231657] pci 0000:01:00.0: BAR 9: assigned [mem 0x18900000-0x190fffff 64bit pref]
[ 1.239410] pci 0000:01:00.0: BAR 7: assigned [io 0x1000-0x4fff]
[ 1.245524] pci 0000:02:01.0: BAR 8: assigned [mem 0x18100000-0x182fffff]
[ 1.252322] pci 0000:02:01.0: BAR 9: assigned [mem 0x18900000-0x18afffff 64bit pref]
[ 1.260077] pci 0000:02:02.0: BAR 8: assigned [mem 0x18300000-0x184fffff]
[ 1.266874] pci 0000:02:02.0: BAR 9: assigned [mem 0x18b00000-0x18cfffff 64bit pref]
[ 1.274632] pci 0000:02:03.0: BAR 8: assigned [mem 0x18500000-0x186fffff]
[ 1.281444] pci 0000:02:03.0: BAR 9: assigned [mem 0x18d00000-0x18efffff 64bit pref]
[ 1.289205] pci 0000:02:04.0: BAR 8: assigned [mem 0x18700000-0x188fffff]
[ 1.296003] pci 0000:02:04.0: BAR 9: assigned [mem 0x18f00000-0x190fffff 64bit pref]
[ 1.303757] pci 0000:02:01.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 1.309859] pci 0000:02:02.0: BAR 7: assigned [io 0x2000-0x2fff]
[ 1.315962] pci 0000:02:03.0: BAR 7: assigned [io 0x3000-0x3fff]
[ 1.322070] pci 0000:02:04.0: BAR 7: assigned [io 0x4000-0x4fff]
[ 1.328183] pci 0000:02:01.0: PCI bridge to [bus 03-41]
[ 1.333426] pci 0000:02:01.0: bridge window [io 0x1000-0x1fff]
[ 1.339543] pci 0000:02:01.0: bridge window [mem 0x18100000-0x182fffff]
[ 1.346347] pci 0000:02:01.0: bridge window [mem 0x18900000-0x18afffff 64bit pref]
[ 1.354114] pci 0000:02:02.0: PCI bridge to [bus 42-80]
[ 1.359352] pci 0000:02:02.0: bridge window [io 0x2000-0x2fff]
[ 1.365464] pci 0000:02:02.0: bridge window [mem 0x18300000-0x184fffff]
[ 1.372267] pci 0000:02:02.0: bridge window [mem 0x18b00000-0x18cfffff 64bit pref]
[ 1.380039] pci 0000:02:03.0: PCI bridge to [bus 81-bf]
[ 1.385282] pci 0000:02:03.0: bridge window [io 0x3000-0x3fff]
[ 1.391407] pci 0000:02:03.0: bridge window [mem 0x18500000-0x186fffff]
[ 1.398215] pci 0000:02:03.0: bridge window [mem 0x18d00000-0x18efffff 64bit pref]
[ 1.405988] pci 0000:02:04.0: PCI bridge to [bus c0-fe]
[ 1.411227] pci 0000:02:04.0: bridge window [io 0x4000-0x4fff]
[ 1.417341] pci 0000:02:04.0: bridge window [mem 0x18700000-0x188fffff]
[ 1.424144] pci 0000:02:04.0: bridge window [mem 0x18f00000-0x190fffff 64bit pref]
...

A hang looks like this:
...
[ 0.319551] Asymmetric key parser 'x509' registered
[ 0.329739] clk: failed to reparent gic to sys_pll2_500m: -16
[ 0.337699] SoC: i.MX8MP revision 1.1
[ 0.341761] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.347374] 30890000.serial: ttymxc1 at MMIO 0x30890000 (irq = 15, base_baud = 1500000) is a IMX
[ 0.354071] printk: console [ttymxc1] enabled
[ 0.354071] printk: console [ttymxc1] enabled
[ 0.362759] printk: bootconsole [ec_imx6q0] disabled
[ 0.362759] printk: bootconsole [ec_imx6q0] disabled
[ 0.384699] i2c_dev: i2c /dev entries driver
[ 0.390010] ledtrig-cpu: registered to indicate activity on CPUs
[ 0.396043] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
[ 0.403648] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
[ 0.416370] Loading compiled-in X.509 certificates
[ 0.435397] gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.445488] gpio gpiochip1: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.455627] gpio gpiochip2: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.465687] gpio gpiochip3: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.475771] gpio gpiochip4: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
[ 0.500276] clk: Not disabling unused clocks
[ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff -> 0x0000000000
[ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff -> 0x0018000000
[ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib, align 64K, limit 16G
[ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
^^^ hang at this point until watchdog resets

Note the 'clk: failed to reparent gic to sys_pll2_500m: -16' message
which I do not get on the same board with an imx8mm SoC. I'm not sure
if that is pointing to an issue at this point or not but I see it on
the imx8mp even when PCI init behaves normally. Also note above I have
kept the kernel from disabling unused clocks as a precaution.

I have verified with a scope that PERST# is low until the imx6 PCIe
driver drives it high and that the external PCI clock going to the
imx8mp as well as the switch looks proper.

The relevant imx8mp dt looks like this:

pcie0_refclk: pcie0-refclk {
  compatible = "fixed-clock";
  #clock-cells = <0>;
  clock-frequency = <100000000>;
};

&pcie_phy {
  fsl,refclk-pad-mode = <IMX8_PCIE_REFCLK_PAD_INPUT>;
  fsl,clkreq-unsupported;
  clocks = <&pcie0_refclk>;
  clock-names = "ref";
  status = "okay";
};

&pcie {
  pinctrl-names = "default";
  pinctrl-0 = <&pinctrl_pcie0>;
  reset-gpio = <&gpio2 17 GPIO_ACTIVE_LOW>;
  status = "okay";
};

If I add printk()s to imx8_pcie_phy_power_on I don't even make it to
the link check; it appears to lock up at the first DBI transaction,
which makes me think this is a clock or power domain issue.

I've also posted this question to the IMX Community [1].

Does anyone have any ideas what I should be looking for here?

Best regards,

Tim
[1] https://community.nxp.com/t5/i-MX-Processors/IMX8MP-hang-during-PCI-init/m-p/1706034#M210939

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: imx8mp pci hang during init
  2023-08-16 22:25 imx8mp pci hang during init Tim Harvey
@ 2023-08-17 17:12 ` Bjorn Helgaas
  2023-08-18  0:05   ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2023-08-17 17:12 UTC (permalink / raw)
  To: Tim Harvey
  Cc: linux-pci, Richard Zhu, Lucas Stach, Linux ARM Mailing List,
	Jingoo Han, Gustavo Pimentel, Manivannan Sadhasivam,
	Maciej W. Rozycki

[+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
link training failures") ]



* Re: imx8mp pci hang during init
  2023-08-17 17:12 ` Bjorn Helgaas
@ 2023-08-18  0:05   ` Maciej W. Rozycki
  2023-08-18 22:12     ` Tim Harvey
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2023-08-18  0:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Tim Harvey, linux-pci, Richard Zhu, Lucas Stach,
	Linux ARM Mailing List, Jingoo Han, Gustavo Pimentel,
	Manivannan Sadhasivam

On Thu, 17 Aug 2023, Bjorn Helgaas wrote:

> [+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
> link training failures") ]

 Quite so indeed.

> > [ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
> > [ 0.500276] clk: Not disabling unused clocks
> > [ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff ->
> > 0x0000000000
> > [ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff
> > -> 0x0018000000
> > [ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib,
> > align 64K, limit 16G
> > [ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
> > ^^^ hang at this point until watchdog resets

 So I think it's important to figure out where exactly in the kernel code 
the hang happens; this is presumably in host-bridge-specific link bring-up 
code polling link status, which may have to be updated according to or 
otherwise make use of a89c82249c37.  It may also be something completely 
different of course.

 Can you see if you can bump the link up beyond 2.5GT/s by poking at host
bridge registers by hand with `setpci' once the link has been successfully
established at 2.5GT/s?
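
 For reference, a manual retrain of this kind amounts to two
read-modify-write operations on standard PCIe capability registers: set
the Target Link Speed field in Link Control 2, then set Retrain Link in
Link Control. The sketch below only computes the values to write and
runs in user space. The offsets and masks follow the definitions in
Linux's pci_regs.h; the helper names are hypothetical and not part of
any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and masks within the PCI Express capability (Linux pci_regs.h). */
#define PCI_EXP_LNKCTL       0x10    /* Link Control */
#define PCI_EXP_LNKCTL_RL    0x0020  /* Retrain Link */
#define PCI_EXP_LNKCTL2      0x30    /* Link Control 2 */
#define PCI_EXP_LNKCTL2_TLS  0x000f  /* Target Link Speed */

/*
 * Hypothetical helper: Link Control 2 value with Target Link Speed set to
 * 'gen' (1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s). All other bits are preserved.
 */
uint16_t lnkctl2_set_target_speed(uint16_t lnkctl2, unsigned int gen)
{
	assert(gen >= 1 && gen <= 3);
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) | gen);
}

/* Hypothetical helper: Link Control value with a retrain requested. */
uint16_t lnkctl_request_retrain(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl | PCI_EXP_LNKCTL_RL);
}
```

 With setpci the two results would be written to CAP_EXP+0x30 and
CAP_EXP+0x10 of the root port (0000:00:00.0 in the logs above), after
which Link Status at CAP_EXP+0x12 shows the speed that was negotiated.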

  Maciej


* Re: imx8mp pci hang during init
  2023-08-18  0:05   ` Maciej W. Rozycki
@ 2023-08-18 22:12     ` Tim Harvey
  2023-08-28 18:51       ` Tim Harvey
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Harvey @ 2023-08-18 22:12 UTC (permalink / raw)
  To: Maciej W. Rozycki, Bjorn Helgaas, Marek Vasut
  Cc: linux-pci, Richard Zhu, Lucas Stach, Linux ARM Mailing List,
	Jingoo Han, Gustavo Pimentel, Manivannan Sadhasivam

On Thu, Aug 17, 2023 at 5:05 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Thu, 17 Aug 2023, Bjorn Helgaas wrote:
>
> > [+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
> > link training failures") ]
>
>  Quite so indeed.
>
> > > [ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
> > > [ 0.500276] clk: Not disabling unused clocks
> > > [ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff ->
> > > 0x0000000000
> > > [ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff
> > > -> 0x0018000000
> > > [ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib,
> > > align 64K, limit 16G
> > > [ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
> > > ^^^ hang at this point until watchdog resets
>

Maciej and Bjorn,

Thank you for the responses!

>  So I think it's important to figure out where exactly in the kernel code
> the hang happens; this is presumably in host-bridge-specific link bring-up
> code polling link status, which may have to be updated according to or
> otherwise make use of a89c82249c37.  It may also be something completely
> different of course.
>

It's hanging in imx6_pcie_start_link() after the PCI_EXP_LNKCAP
register is updated to allow gen2, during the subsequent
dw_pcie_wait_for_link(), specifically within the dw_pcie_read_dbi()
function, which does a memory read. Since the memory read itself hangs
the CPU, this tells me that the DWC core has crashed at this point.
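
To illustrate what the wait loop is looking at, here is a minimal
user-space sketch that decodes a Link Status value and polls it a
bounded number of times. The field masks follow Linux's pci_regs.h;
the struct, the function names and the simulated register reader are
hypothetical and are not the DWC driver's code. Note that a bounded
loop like this only helps when the register read returns. In the hang
described above the read itself never completes, so no software
timeout gets a chance to fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status fields (Linux pci_regs.h). */
#define PCI_EXP_LNKSTA_CLS        0x000f  /* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW        0x03f0  /* Negotiated Link Width */
#define PCI_EXP_LNKSTA_NLW_SHIFT  4
#define PCI_EXP_LNKSTA_LT         0x0800  /* Link Training in progress */
#define PCI_EXP_LNKSTA_DLLLA      0x2000  /* Data Link Layer Link Active */

struct link_state {
	unsigned int gen;    /* 1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s */
	unsigned int width;  /* number of lanes */
	bool training;
	bool dll_active;
};

/* Split a raw Link Status value into its fields. */
struct link_state decode_lnksta(uint16_t lnksta)
{
	struct link_state s;

	s.gen = lnksta & PCI_EXP_LNKSTA_CLS;
	s.width = (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;
	s.training = (lnksta & PCI_EXP_LNKSTA_LT) != 0;
	s.dll_active = (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
	return s;
}

/* Hypothetical accessor; on hardware this would be a DBI/config read. */
typedef uint16_t (*read_lnksta_fn)(void *ctx);

/* Simulated register for illustration: replays a list of sample values. */
struct fake_link {
	const uint16_t *samples;
	unsigned int count;
	unsigned int pos;
};

uint16_t fake_read_lnksta(void *ctx)
{
	struct fake_link *f = ctx;
	uint16_t v = f->samples[f->pos];

	/* Advance until the last sample, then keep returning it. */
	if (f->pos + 1 < f->count)
		f->pos++;
	return v;
}

/*
 * Poll at most max_tries times. Returns 0 once training has finished and
 * the link is active at want_gen or faster, -1 if that never happens.
 */
int wait_for_link_gen(read_lnksta_fn read, void *ctx,
		      unsigned int want_gen, unsigned int max_tries)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		struct link_state s = decode_lnksta(read(ctx));

		if (!s.training && s.dll_active && s.gen >= want_gen)
			return 0;
	}
	return -1;
}
```

A value such as 0x3012 decodes to gen 2, width 1, not training and
link active, which corresponds to the "PCIe Gen.2 x1 link up" message
in the successful boot log.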

What I found is that if I essentially revert the effect of commit
fa33a6d87eac ("PCI: imx6: Start link in Gen1 before negotiating for
Gen2 mode") to start linking at gen3 (or forced to gen2), it appears
to downgrade to gen2 (due to the PI7C9X2G608GPB being a gen2 switch)
and works fine:
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 27aaa2a6bf39..81caaef76e8a 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -876,6 +876,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        u32 tmp;
        int ret;

+#if 0
        /*
         * Force Gen1 operation when starting the link.  In case the link is
         * started in Gen2 mode, there is a possibility the devices on the
@@ -887,6 +888,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        tmp |= PCI_EXP_LNKCAP_SLS_2_5GB;
        dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, tmp);
        dw_pcie_dbi_ro_wr_dis(pci);
+#endif

        /* Start LTSSM. */
        imx6_pcie_ltssm_enable(dev);
@@ -895,6 +897,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        if (ret)
                goto err_reset_phy;

+#if 0
        if (pci->link_gen > 1) {
                /* Allow faster modes after the link is up */
                dw_pcie_dbi_ro_wr_en(pci);
@@ -937,6 +940,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        } else {
                dev_info(dev, "Link: Only Gen1 is enabled\n");
        }
+#endif

        imx6_pcie->link_is_up = true;
        tmp = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA);

So I think you are correct in that I need to do the same thing that
was done in a89c82249c37 for the imx6 dwc driver which essentially
forces it the other way going from gen1->gen2->gen3 (upward) instead
of gen3->gen2->gen1 (downward).
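
As a rough illustration of the "upward" sequence that the #if 0 blocks
above disable, the sketch below shows the read-modify-write on the
Supported Link Speeds field of Link Capabilities: clamp it to 2.5GT/s
before starting the link, then raise it once the link is up. The mask
and field values follow Linux's pci_regs.h; the helper name and the
sample register contents are hypothetical, and this is a user-space
model rather than the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Supported Link Speeds field of Link Capabilities (Linux pci_regs.h). */
#define PCI_EXP_LNKCAP_SLS        0x0000000f
#define PCI_EXP_LNKCAP_SLS_2_5GB  0x00000001
#define PCI_EXP_LNKCAP_SLS_5_0GB  0x00000002
#define PCI_EXP_LNKCAP_SLS_8_0GB  0x00000003

/*
 * Hypothetical helper: Link Capabilities value advertising 'gen' as the
 * maximum speed (1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s). The field encoding
 * for these three speeds equals the generation number, and all bits
 * outside the field are preserved.
 */
uint32_t lnkcap_set_max_speed(uint32_t lnkcap, unsigned int gen)
{
	assert(gen >= 1 && gen <= 3);
	return (lnkcap & ~(uint32_t)PCI_EXP_LNKCAP_SLS) | gen;
}
```

In the upward sequence the register would first be written with
lnkcap_set_max_speed(val, 1), the LTSSM started, and only after link-up
rewritten with the configured maximum followed by a speed change
request. In the downward sequence the configured maximum is advertised
from the start and the link partners negotiate down on their own.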

I've cc'd Marek, who authored commit fa33a6d87eac ("PCI: imx6: Start
link in Gen1 before negotiating for Gen2 mode"), in hopes that he might
remember what switch or switches he needed this change for. I'm not
even sure which IMX6 SoC from 10 years ago had gen2 capability.

I found that the PI7C9X2G608GPB used here has an erratum, "E11: GEN2
Change-Rate Issue with Certain Root Complex Platforms", that describes
an issue observed on certain PCIe gen3 platforms during a rate change
from 2.5Gbps to 5Gbps. It is caused by the switch entering a recovery
state that can time out, at which point, according to the erratum,
"After the link-down process, all the registers are reset to the
default values", which is likely what's causing the DWC controller to
hang.

My gut feel is that commit a89c82249c37 ("PCI: Work around PCIe link
training failures") likely would resolve the issues that Marek had
which prompted him to make the imx6 driver go from gen1 upward and
that if we changed the driver to go from gen3 downward it would
resolve my issue as well. However, I don't know what the 'correct'
link training sequence should really be (upward or downward) so it's
hard to say what the right workaround is. Is there a correct link
training sequence, and if so, how many controllers are using it vs
having to reverse it to work around hardware quirks?

>  Can you see if you can bump the link up beyond 2.5GT/s by poking at host
> bridge registers by hand with `setpci' once the link been successfully
> established at 2.5GT/s?
>

I'll have to try that. Instead of using PCI_EXP_LNKCTL like the
pcie_retrain_link() function does, the imx6 driver touches a DWC
register that I don't have documentation for, so essentially what
you're asking will test retraining the more standard way using the
config registers:
                /*
                 * Start Directed Speed Change so the best possible
                 * speed both link partners support can be negotiated.
                 */
                tmp = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
                tmp |= PORT_LOGIC_SPEED_CHANGE;
                dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, tmp);
                dw_pcie_dbi_ro_wr_dis(pci);

Best regards,

Tim

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: imx8mp pci hang during init
  2023-08-18 22:12     ` Tim Harvey
@ 2023-08-28 18:51       ` Tim Harvey
  2023-08-28 19:24         ` Marek Vasut
  2023-08-29 10:58         ` Maciej W. Rozycki
  0 siblings, 2 replies; 7+ messages in thread
From: Tim Harvey @ 2023-08-28 18:51 UTC (permalink / raw)
  To: Maciej W. Rozycki, Bjorn Helgaas, Marek Vasut
  Cc: linux-pci, Richard Zhu, Lucas Stach, Linux ARM Mailing List,
	Jingoo Han, Gustavo Pimentel, Manivannan Sadhasivam

On Fri, Aug 18, 2023 at 3:12 PM Tim Harvey <tharvey@gateworks.com> wrote:
>
> On Thu, Aug 17, 2023 at 5:05 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> >
> > On Thu, 17 Aug 2023, Bjorn Helgaas wrote:
> >
> > > [+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
> > > link training failures") ]
> >
> >  Quite so indeed.
> >
> > > > [ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
> > > > [ 0.500276] clk: Not disabling unused clocks
> > > > [ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff ->
> > > > 0x0000000000
> > > > [ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff
> > > > -> 0x0018000000
> > > > [ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib,
> > > > align 64K, limit 16G
> > > > [ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
> > > > ^^^ hang at this point until watchdog resets
> >
>
> Maciej and Bjorn,
>
> Thank you for the responses!
>
> >  So I think it's important to figure out where exactly in the kernel code
> > the hang happens; this is presumably in host-bridge-specific link bring-up
> > code polling link status, which may have to be updated according to or
> > otherwise make use of a89c82249c37.  It may also be something completely
> > different of course.
> >
>
> It's hanging in imx6_pcie_start_link() after the PCI_EXP_LNKCAP
> register is updated to allow gen2 during the subsequent
> dw_pcie_wait_for_link, specifically within the dw_pcie_read_dbi
> function that does a memory read. Due to the mem read hanging the CPU
> this tells me that the DWC core has crashed at this point.
>
> What I found is that if I essentially revert the effect of commit
> fa33a6d87eac ("PCI: imx6: Start link in Gen1 before negotiating for
> Gen2 mode") to start linking at gen3 (or forced to gen2) it appears to
> downgrade to gen2 (due to the PI7C9X2G608GPB being a gen2 switch) and
> work fine:
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c
> b/drivers/pci/controller/dwc/pci-imx6.c
> index 27aaa2a6bf39..81caaef76e8a 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -876,6 +876,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>         u32 tmp;
>         int ret;
>
> +#if 0
>         /*
>          * Force Gen1 operation when starting the link.  In case the link is
>          * started in Gen2 mode, there is a possibility the devices on the
> @@ -887,6 +888,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>         tmp |= PCI_EXP_LNKCAP_SLS_2_5GB;
>         dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, tmp);
>         dw_pcie_dbi_ro_wr_dis(pci);
> +#endif
>
>         /* Start LTSSM. */
>         imx6_pcie_ltssm_enable(dev);
> @@ -895,6 +897,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>         if (ret)
>                 goto err_reset_phy;
>
> +#if 0
>         if (pci->link_gen > 1) {
>                 /* Allow faster modes after the link is up */
>                 dw_pcie_dbi_ro_wr_en(pci);
> @@ -937,6 +940,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>         } else {
>                 dev_info(dev, "Link: Only Gen1 is enabled\n");
>         }
> +#endif
>
>         imx6_pcie->link_is_up = true;
>         tmp = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA);
>
> So I think you are correct in that I need to do the same thing that
> was done in a89c82249c37 for the imx6 dwc driver which essentially
> forces it the other way going from gen1->gen2->gen3 (upward) instead
> of gen3->gen2->gen1 (downard).
>
> I've cc'd Marek who authored commit fa33a6d87eac ("PCI: imx6: Start
> link in Gen1 before negotiating for Gen2 mode") in hopes that he might
> remember what switch or switches he needed this change for. I'm not
> even clear what IMX6 SoC 10 years ago even had gen2 capability.
>
> I found that the PI7C9X2G608GPB used here has an errata "E11: GEN2
> Change-Rate Issue with Certain Root Complex Platforms" that describes
> an issue observed in certain PCIe gen3 platforms during a rate change
> from 2.5Gbps to 5Gbps caused by the switch entering a recovery state
> that can timeout at which point according to the errata "After the
> link-down process, all the registers are reset to the default values"
> which is likely whats causing the DWC controller to hang.
>
> My gut feel is that commit a89c82249c37 ("PCI: Work around PCIe link
> training failures") likely would resolve the issues that Marek had
> which prompted him to make the imx6 driver go from gen1 upward and
> that if we changed the driver to go from gen3 downward it would
> resolve my issue as well. However, I don't know what the 'correct'
> link training sequence should really be (upward or downward) so it's
> hard to say what the right workaround is. Is there a correct link
> training sequence and if so how many controllers are using it vs
> having to reverse it to workaround hardware quirks?
>
> >  Can you see if you can bump the link up beyond 2.5GT/s by poking at host
> > bridge registers by hand with `setpci' once the link been successfully
> > established at 2.5GT/s?
> >
>
> I'll have to try that. Instead of using the PCI_EXP_LNKCTL like the
> pcie_retrain_link() function does the imx6 driver touches some DWC
> register that I don't have documentation for so essentially what your
> asking will test retraining the more standard way using the config
> registers:
>                 /*
>                  * Start Directed Speed Change so the best possible
>                  * speed both link partners support can be negotiated.
>                  */
>                 tmp = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
>                 tmp |= PORT_LOGIC_SPEED_CHANGE;
>                 dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, tmp);
>                 dw_pcie_dbi_ro_wr_dis(pci);
>
> Best regards,
>
> Tim

Maciej and Bjorn,

Seeing as Marek encountered a switch that had some issue starting at
Gen2, and I've encountered a switch that has an issue starting at Gen1
then moving to Gen2, how do you suggest dealing with this?

It seems to me that pci quirks require knowing the device so don't
help until you've established a link and can get to config space, or
perhaps this means the switch needs to be defined in DT so that a dt
compatible could be used for the quirk?

Does the PCIe specification specify that link training should start
with the highest possible speed then downgrade? I find that most of
the other PCI host controller drivers I've looked at work this
way. I have only found the force gen1 first behavior in pci-imx6.c and
pcie-fu740.c. Maybe a dt property to force gen2 first is needed to
resolve this.

Best regards,

Tim

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: imx8mp pci hang during init
  2023-08-28 18:51       ` Tim Harvey
@ 2023-08-28 19:24         ` Marek Vasut
  2023-08-29 10:58         ` Maciej W. Rozycki
  1 sibling, 0 replies; 7+ messages in thread
From: Marek Vasut @ 2023-08-28 19:24 UTC (permalink / raw)
  To: Tim Harvey, Maciej W. Rozycki, Bjorn Helgaas
  Cc: linux-pci, Richard Zhu, Lucas Stach, Linux ARM Mailing List,
	Jingoo Han, Gustavo Pimentel, Manivannan Sadhasivam

On 8/28/23 20:51, Tim Harvey wrote:
> On Fri, Aug 18, 2023 at 3:12 PM Tim Harvey <tharvey@gateworks.com> wrote:
>>
>> On Thu, Aug 17, 2023 at 5:05 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>>>
>>> On Thu, 17 Aug 2023, Bjorn Helgaas wrote:
>>>
>>>> [+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
>>>> link training failures") ]
>>>
>>>   Quite so indeed.
>>>
>>>>> [ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges:
>>>>> [ 0.500276] clk: Not disabling unused clocks
>>>>> [ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff ->
>>>>> 0x0000000000
>>>>> [ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff
>>>>> -> 0x0018000000
>>>>> [ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib,
>>>>> align 64K, limit 16G
>>>>> [ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
>>>>> ^^^ hang at this point until watchdog resets
>>>
>>
>> Maciej and Bjorn,
>>
>> Thank you for the responses!
>>
>>>   So I think it's important to figure out where exactly in the kernel code
>>> the hang happens; this is presumably in host-bridge-specific link bring-up
>>> code polling link status, which may have to be updated according to or
>>> otherwise make use of a89c82249c37.  It may also be something completely
>>> different of course.
>>>
>>
>> It's hanging in imx6_pcie_start_link() after the PCI_EXP_LNKCAP
>> register is updated to allow gen2 during the subsequent
>> dw_pcie_wait_for_link, specifically within the dw_pcie_read_dbi
>> function that does a memory read. Due to the mem read hanging the CPU
>> this tells me that the DWC core has crashed at this point.
>>
>> What I found is that if I essentially revert the effect of commit
>> fa33a6d87eac ("PCI: imx6: Start link in Gen1 before negotiating for
>> Gen2 mode") to start linking at gen3 (or forced to gen2) it appears to
>> downgrade to gen2 (due to the PI7C9X2G608GPB being a gen2 switch) and
>> work fine:
>> diff --git a/drivers/pci/controller/dwc/pci-imx6.c
>> b/drivers/pci/controller/dwc/pci-imx6.c
>> index 27aaa2a6bf39..81caaef76e8a 100644
>> --- a/drivers/pci/controller/dwc/pci-imx6.c
>> +++ b/drivers/pci/controller/dwc/pci-imx6.c
>> @@ -876,6 +876,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>>          u32 tmp;
>>          int ret;
>>
>> +#if 0
>>          /*
>>           * Force Gen1 operation when starting the link.  In case the link is
>>           * started in Gen2 mode, there is a possibility the devices on the
>> @@ -887,6 +888,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>>          tmp |= PCI_EXP_LNKCAP_SLS_2_5GB;
>>          dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, tmp);
>>          dw_pcie_dbi_ro_wr_dis(pci);
>> +#endif
>>
>>          /* Start LTSSM. */
>>          imx6_pcie_ltssm_enable(dev);
>> @@ -895,6 +897,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>>          if (ret)
>>                  goto err_reset_phy;
>>
>> +#if 0
>>          if (pci->link_gen > 1) {
>>                  /* Allow faster modes after the link is up */
>>                  dw_pcie_dbi_ro_wr_en(pci);
>> @@ -937,6 +940,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
>>          } else {
>>                  dev_info(dev, "Link: Only Gen1 is enabled\n");
>>          }
>> +#endif
>>
>>          imx6_pcie->link_is_up = true;
>>          tmp = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA);
>>
>> So I think you are correct in that I need to do the same thing that
>> was done in a89c82249c37 for the imx6 dwc driver which essentially
>> forces it the other way going from gen1->gen2->gen3 (upward) instead
>> of gen3->gen2->gen1 (downard).
>>
>> I've cc'd Marek who authored commit fa33a6d87eac ("PCI: imx6: Start
>> link in Gen1 before negotiating for Gen2 mode") in hopes that he might
>> remember what switch or switches he needed this change for. I'm not
>> even clear what IMX6 SoC 10 years ago even had gen2 capability.
>>
>> I found that the PI7C9X2G608GPB used here has an errata "E11: GEN2
>> Change-Rate Issue with Certain Root Complex Platforms" that describes
>> an issue observed in certain PCIe gen3 platforms during a rate change
>> from 2.5Gbps to 5Gbps caused by the switch entering a recovery state
>> that can timeout at which point according to the errata "After the
>> link-down process, all the registers are reset to the default values"
>> which is likely whats causing the DWC controller to hang.
>>
>> My gut feel is that commit a89c82249c37 ("PCI: Work around PCIe link
>> training failures") likely would resolve the issues that Marek had
>> which prompted him to make the imx6 driver go from gen1 upward and
>> that if we changed the driver to go from gen3 downward it would
>> resolve my issue as well. However, I don't know what the 'correct'
>> link training sequence should really be (upward or downward) so it's
>> hard to say what the right workaround is. Is there a correct link
>> training sequence and if so how many controllers are using it vs
>> having to reverse it to workaround hardware quirks?
>>
>>>   Can you see if you can bump the link up beyond 2.5GT/s by poking at host
>>> bridge registers by hand with `setpci' once the link been successfully
>>> established at 2.5GT/s?
>>>
>>
>> I'll have to try that. Instead of using the PCI_EXP_LNKCTL like the
>> pcie_retrain_link() function does the imx6 driver touches some DWC
>> register that I don't have documentation for so essentially what your
>> asking will test retraining the more standard way using the config
>> registers:
>>                  /*
>>                   * Start Directed Speed Change so the best possible
>>                   * speed both link partners support can be negotiated.
>>                   */
>>                  tmp = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
>>                  tmp |= PORT_LOGIC_SPEED_CHANGE;
>>                  dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, tmp);
>>                  dw_pcie_dbi_ro_wr_dis(pci);
>>
>> Best regards,
>>
>> Tim
> 
> Maciej and Bjorn,
> 
> Seeing as Marek encountered a switch that had some issue starting at
> Gen2 and I've encountered a switch that has an issue starting at Gen1
> then moving to Gen2 how do you suggest dealing with this?
> 
> It seems to me that pci quirks require knowing the device so don't
> help until you've established a link and can get to config space, or
> perhaps this means the switch needs to be defined in DT so that a dt
> compatible could be used for the quirk?
> 
> Does the PCIe specification specify that link training should start
> with the highest possible speed then downgrade? I find that most of
> the other PCI host controller drivers I've looked at all work this
> way. I have only found the force gen2 first behavior in pci-imx6.c and
> pcie-fu740.c. Maybe a dt property to force gen2 first is needed to
> resolve this.

One idea which came to mind just now -- maybe you can describe the PCIe 
device in DT:

arch/arm/boot/dts/nxp/imx/imx6q-utilite-pro.dts

&pcie {
        pcie@0,0 {
                reg = <0x000000 0 0 0 0>;
                #address-cells = <3>;
                #size-cells = <2>;

                /* non-removable i211 ethernet card */
                eth1: intel,i211@pcie0,0 {
...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: imx8mp pci hang during init
  2023-08-28 18:51       ` Tim Harvey
  2023-08-28 19:24         ` Marek Vasut
@ 2023-08-29 10:58         ` Maciej W. Rozycki
  1 sibling, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2023-08-29 10:58 UTC (permalink / raw)
  To: Tim Harvey
  Cc: Bjorn Helgaas, Marek Vasut, linux-pci, Richard Zhu, Lucas Stach,
	Linux ARM Mailing List, Jingoo Han, Gustavo Pimentel,
	Manivannan Sadhasivam

Tim,

> It seems to me that pci quirks require knowing the device so don't
> help until you've established a link and can get to config space, or
> perhaps this means the switch needs to be defined in DT so that a dt
> compatible could be used for the quirk?

 This is why I took a different approach with my a89c82249c37 ("PCI: Work
around PCIe link training failures").  Initially it was a regular quirk
applied to all devices (i.e. matching on PCI_ANY_ID:PCI_ANY_ID), and then,
following Bjorn's suggestion, it was invoked directly from `pci_device_add'
and `pcie_wait_for_link_delay'.

> Does the PCIe specification specify that link training should start
> with the highest possible speed then downgrade? I find that most of
> the other PCI host controller drivers I've looked at all work this
> way. I have only found the force gen2 first behavior in pci-imx6.c and
> pcie-fu740.c. Maybe a dt property to force gen2 first is needed to
> resolve this.

 It works the other way round.  Link is always established at 2.5GT/s and 
once successful the endpoints send each other information, the so-called
"training sets", about their capabilities, including speeds supported.  
Then they switch to the highest speed supported within the Target Link 
Speed (TLS) setting in the Link Control 2 register of both ends.  If there 
are reliability issues at the higher rate, the endpoints are supposed to 
reduce the link speed.  Reducing the speed, both by clamping with TLS and 
in the case of reliability issues, is always done by removing said speed 
from the list reported in the respective device's training set.

 I don't know what's causing some devices to fail to switch to the higher 
speed when unclamped with TLS and yet to switch successfully when first 
clamped with TLS and then the clamping removed.  In principle unclamping 
by hand should just mimic what happens in the unclamped case: the other 
endpoint sees a higher speed advertised, so both endpoints switch to it.  
I suppose the hardware state machine is just tough to get right and doing 
things by hand prevents the broken ones from getting into an odd state due 
to unfortunate timing or whatever.  Unfortunately the device manufacturers 
involved declined to comment.
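
 The negotiation described above can be sketched as a small model: each
end advertises the speeds it supports, clamped by its own TLS, and the
link ends up at the highest speed both ends advertise.  This is only an
illustration under simplifying assumptions (no reliability fallback, no
timing); the function names and the bit-per-generation encoding are
hypothetical and are not kernel API:

```c
#include <stdint.h>

/* Bit n set means generation n+1 is supported (bit 0 = 2.5GT/s). */
#define SPEED_2_5GT (1u << 0)
#define SPEED_5_0GT (1u << 1)
#define SPEED_8_0GT (1u << 2)

/*
 * Speeds one end puts in its training sets: everything it supports up
 * to its Target Link Speed (1 = 2.5GT/s, 2 = 5GT/s, ...).  2.5GT/s is
 * always advertised, since the link is always established there.
 */
static uint8_t advertised_speeds(uint8_t supported, unsigned int tls)
{
	uint8_t mask = (tls >= 8) ? 0xff : (uint8_t)((1u << tls) - 1);

	return (uint8_t)((supported & mask) | SPEED_2_5GT);
}

/* Generation the link settles at: highest speed advertised by both ends. */
static unsigned int negotiated_gen(uint8_t sup_a, unsigned int tls_a,
				   uint8_t sup_b, unsigned int tls_b)
{
	uint8_t common = advertised_speeds(sup_a, tls_a) &
			 advertised_speeds(sup_b, tls_b);
	unsigned int gen = 0;

	/* Position of the highest set bit, counted from 1. */
	while (common) {
		gen++;
		common >>= 1;
	}
	return gen;
}
```

 In this model, clamping either end to TLS = 1 keeps the link at
2.5GT/s, and raising TLS afterwards merely adds the higher speed back
to that end's advertised list, which is why unclamping by hand should
in principle be equivalent to never clamping at all.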

  Maciej

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-29 10:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-16 22:25 imx8mp pci hang during init Tim Harvey
2023-08-17 17:12 ` Bjorn Helgaas
2023-08-18  0:05   ` Maciej W. Rozycki
2023-08-18 22:12     ` Tim Harvey
2023-08-28 18:51       ` Tim Harvey
2023-08-28 19:24         ` Marek Vasut
2023-08-29 10:58         ` Maciej W. Rozycki
