On 29/10/2020 11:41, Toke Høiland-Jørgensen wrote:
> Bjorn Helgaas writes:
>
>> [+cc Pali, Marek, Thomas, Jason]
>>
>> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>>> On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>>>> Bjorn Helgaas writes:
>>>>> On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>> Toke Høiland-Jørgensen writes:
>>>>>>> Bjorn Helgaas writes:
>>>>>>>
>>>>>>>> [+cc vtolkm]
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Hi everyone
>>>>>>>>>
>>>>>>>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>>>>>>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>>>>>>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>>>>>>>> the resource request fix[0] applied on top.
>>>>>>>>>
>>>>>>>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>>>>>>>> up. But I'm still getting initialisation errors like these:
>>>>>>>>>
>>>>>>>>> [ 1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>>>>>>>> [ 1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>> [ 1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>>>>>>>> [ 1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>>
>>>>>>>>> and the WiFi drivers fail to initialise with what appears to me to be
>>>>>>>>> errors related to the bus rather than to the drivers themselves:
>>>>>>>>>
>>>>>>>>> [ 3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>>>>>>>> [ 3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>>>>>>>> [ 3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>>>>>>>> [ 3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>>>>>>>> [ 3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>>>>>>>> [ 3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>>>>>>>> [ 3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>>>>>>>> [ 3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>>>>>>>> [ 3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>>>>>>>>
>>>>>>>>> lspci looks OK, though:
>>>>>>>>>
>>>>>>>>> # lspci
>>>>>>>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>>>>>>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>>>>>>>>
>>>>>>>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>>>>>>>> did I miss something in my config or other initialisation? I've tried
>>>>>>>>> with both the stock u-boot distributed with the board, and with an
>>>>>>>>> upstream u-boot from latest master; doesn't seem to make any difference.
>>>>>>>> Can you try turning off CONFIG_PCIEASPM? We had a similar recent
>>>>>>>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>>>>>>>> don't think we have a fix yet.
>>>>>>> Yes! Turning that off does indeed help! Thanks a bunch :)
>>>>>>>
>>>>>>> You mention that bisecting this would be helpful - I can try that
>>>>>>> tomorrow; any idea when this was last working?
>>>>>> OK, so I tried to bisect this, but, erm, I couldn't find a working
>>>>>> revision to start from? I went all the way back to 4.10 (which is the
>>>>>> first version to include the device tree file for the Omnia), and even
>>>>>> on that, the wireless cards were failing to initialise with ASPM
>>>>>> enabled...
>>>>> I have no personal experience with this device; all I know is that the
>>>>> bugzilla suggests that it worked in v5.4, which isn't much help.
>>>>>
>>>>> Possibly the apparent regression was really a .config change, i.e.,
>>>>> CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
>>>>> "worked" but got enabled later and it started failing?
>>>> Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
>>>> default and only turns it on for specific targets. So I guess that it's
>>>> most likely that this has never worked...
>>>>
>>>>> Maybe the debug patch below would be worth trying to see if it makes
>>>>> any difference? If it *does* help, try omitting the first hunk to see
>>>>> if we just need to apply the quirk_enable_clear_retrain_link() quirk.
>>>> Tried, doesn't help...
>>>>
>>>> -Toke
>>> Found this patch
>>>
>>> https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
>>>
>>> that mentions the Compex WLE900VX card, which reading the lspci verbose
>>> output from the bugtracker seems to be the device being troubled.
>> Interesting. Indeed, the Compex WLE900VX card seems to have the
>> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
>> the same device in it.
>>
>> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
>> for aardvark, so of course doesn't help mvebu.
>>
>> PCIe hardware is supposed to automatically negotiate the highest link
>> speed supported by both ends. But software *is* allowed to set an
>> upper limit (the Target Link Speed in Link Control 2). If we initiate
>> a retrain and the link doesn't come back up, I wonder if we should try
>> to help the hardware out by using Target Link Speed to limit to a
>> lower speed and attempting another retrain, something like this hacky
>> patch: (please collect the dmesg log if you try this)
> Well, I tried it, but don't see any of the 'lnkcap2' output from that
> new function:
>
> [ 1.545853] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
> [ 1.545878] mvebu-pcie soc:pcie: MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
> [ 1.545894] mvebu-pcie soc:pcie: MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
> [ 1.545907] mvebu-pcie soc:pcie: MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
> [ 1.545920] mvebu-pcie soc:pcie: MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
> [ 1.545933] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> [ 1.545945] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> [ 1.545958] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> [ 1.545970] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> [ 1.545982] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> [ 1.545994] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> [ 1.546006] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> [ 1.546014] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> [ 1.546181] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> [ 1.546190] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 1.546197] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
> [ 1.546204] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
> [ 1.546210] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
> [ 1.546216] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
> [ 1.546220] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> [ 1.546225] pci_bus 0000:00: root bus resource [io 0x1000-0xeffff]
> [ 1.546294] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
> [ 1.546308] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [ 1.546482] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
> [ 1.546495] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [ 1.546643] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
> [ 1.546656] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [ 1.547379] PCI: bus0: Fast back to back transfers disabled
> [ 1.547387] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 1.547394] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 1.547402] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 1.547484] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
> [ 1.547507] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
> [ 1.547615] pci 0000:01:00.0: supports D1
> [ 1.547620] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
> [ 1.547730] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
> [ 1.631937] PCI: bus2: Fast back to back transfers enabled
> [ 1.631945] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
> [ 1.632655] PCI: bus3: Fast back to back transfers enabled
> [ 1.632662] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
> [ 1.632694] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> [ 1.632702] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> [ 1.632710] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
> [ 1.632718] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
> [ 1.632726] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
> [ 1.632734] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
> [ 1.632741] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> [ 1.632746] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [ 1.632752] pci 0000:00:01.0: PCI bridge to [bus 01]
> [ 1.632760] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
> [ 1.632769] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
> [ 1.632776] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> [ 1.632782] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [ 1.632788] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
> [ 1.632793] pci 0000:00:02.0: PCI bridge to [bus 02]
> [ 1.632800] pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
> [ 1.632807] pci 0000:00:03.0: PCI bridge to [bus 03]
>
> (and then later, still):
> [ 3.476364] pci 0000:00:01.0: enabling device (0140 -> 0142)
> [ 3.477542] ata1: SATA link down (SStatus 0 SControl 300)
> [ 3.482126] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
> [ 3.487487] ata2: SATA link down (SStatus 0 SControl 300)
> [ 3.493379] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> [ 3.505891] ath: phy0: Unable to initialize hardware; initialization status: -95
> [ 3.513325] ath9k 0000:01:00.0: Failed to initialize device
> [ 3.518933] ath9k: probe of 0000:01:00.0 failed with error -95
> [ 3.524862] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> [ 3.531904] pci 0000:00:02.0: enabling device (0140 -> 0142)
> [ 3.537590] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [ 3.577436] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [ 3.583948] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>
>
> -Toke
>

Same result my end - run tested with next-20201027

N.B. node does not boot anymore with next-20201028, but that is
independent of this patch and apparently another issue.
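
The Target Link Speed idea quoted above can be sketched as a small bit-manipulation helper. This is only an illustration, not the patch that was posted to the list (that patch is not reproduced in this message). The constant names and values follow Linux's include/uapi/linux/pci_regs.h; the function cap_target_link_speed() is hypothetical, and treating a field value of 0 as "no limit set" is an assumption of this sketch. Reading and writing the register, and triggering the retrain itself, are not modelled.

```python
# Field layout of the PCIe Link Control 2 register; names and values mirror
# Linux's include/uapi/linux/pci_regs.h.
PCI_EXP_LNKCTL2_TLS = 0x000F        # Target Link Speed, bits 3:0
PCI_EXP_LNKCTL2_TLS_2_5GT = 0x0001  # 2.5 GT/s
PCI_EXP_LNKCTL2_TLS_5_0GT = 0x0002  # 5.0 GT/s
PCI_EXP_LNKCTL2_TLS_8_0GT = 0x0003  # 8.0 GT/s


def cap_target_link_speed(lnkctl2: int, max_tls: int) -> int:
    """Hypothetical helper: return lnkctl2 with Target Link Speed capped.

    Only bits 3:0 are changed; all other bits of the 16-bit register
    value are preserved.
    """
    if not 1 <= max_tls <= PCI_EXP_LNKCTL2_TLS:
        raise ValueError("max_tls must be a non-zero 4-bit speed code")
    current = lnkctl2 & PCI_EXP_LNKCTL2_TLS
    # Assumption of this sketch: a field of 0 means no limit has been set,
    # so it is always replaced by the requested cap.
    if current == 0 or current > max_tls:
        return (lnkctl2 & ~PCI_EXP_LNKCTL2_TLS & 0xFFFF) | max_tls
    # Already at or below the cap: leave the value unchanged.
    return lnkctl2 & 0xFFFF


# Example: cap a port currently targeting 8.0 GT/s down to 2.5 GT/s.
new_value = cap_target_link_speed(0x0003, PCI_EXP_LNKCTL2_TLS_2_5GT)
print(hex(new_value))
```

In a retry scheme like the one suggested in the thread, the cap would be lowered one step after each failed retrain; whether that helps on mvebu is exactly what the test above was meant to find out.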