linux-pci.vger.kernel.org archive mirror
* PCI trouble on mvebu (Turris Omnia)
@ 2020-10-27 15:43 Toke Høiland-Jørgensen
  2020-10-27 17:20 ` Bjorn Helgaas
  2020-10-27 18:03 ` Marek Behun
  0 siblings, 2 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 15:43 UTC
  To: linux-pci, linux-arm-kernel, Rob Herring; +Cc: Ilias Apalodimas

Hi everyone

I'm trying to get a mainline kernel to run on my Turris Omnia, and am
having some trouble getting the PCI bus to work correctly. Specifically,
I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
the resource request fix[0] applied on top.

The kernel boots fine, and the patch in [0] makes the PCI devices show
up. But I'm still getting initialisation errors like these:

[    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)

and the WiFi drivers fail to initialise with what appears to me to be
errors related to the bus rather than to the drivers themselves:

[    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.524473] ath9k 0000:01:00.0: Failed to initialize device
[    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
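
A note on reading the BAR messages above: as far as I understand the format, the first value in the parentheses is what the kernel tried to write to the BAR and the second is what it read back. A read-back of 0xffffffff is what a PCI config read typically returns when the device does not respond, which would fit the suspicion that the bus rather than the drivers is at fault. The following Python sketch picks such lines out of a dmesg dump. The function name `all_ones_readbacks` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches "BAR N: error updating (expected != read-back)" lines as they
# appear in the log above. The pattern is an assumption based on those
# lines only, not on the kernel source.
BAR_ERR = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): error updating \((?:high )?"
    r"(?P<want>0x[0-9a-f]+) != (?P<got>0x[0-9a-f]+)\)"
)


def all_ones_readbacks(log_text):
    """Return (device, bar) pairs whose read-back value was 0xffffffff."""
    hits = []
    for m in BAR_ERR.finditer(log_text):
        if int(m.group("got"), 16) == 0xFFFFFFFF:
            pair = (m.group("dev"), int(m.group("bar")))
            if pair not in hits:  # low and high halves report the same BAR
                hits.append(pair)
    return hits


sample = """\
[    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
"""

print(all_ones_readbacks(sample))
```

For the four lines quoted above this prints [('0000:01:00.0', 0), ('0000:02:00.0', 0)], i.e. BAR 0 on both WiFi cards.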

lspci looks OK, though:

# lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)

Does anyone have any clue what could be going on here? Is this a bug, or
did I miss something in my config or other initialisation? I've tried
with both the stock u-boot distributed with the board, and with an
upstream u-boot from latest master; doesn't seem to make any difference.

Any pointers will be greatly appreciated!

Thanks,

-Toke


[0] https://lore.kernel.org/linux-pci/20201023145252.2691779-1-robh@kernel.org/

Full dmesg:

[    1.546457] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.546469] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546615] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.546627] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.547341] PCI: bus0: Fast back to back transfers disabled
[    1.547349] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547356] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547363] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547444] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.547466] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.547576] pci 0000:01:00.0: supports D1
[    1.547581] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.547692] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.601932] PCI: bus1: Fast back to back transfers enabled
[    1.601941] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.602039] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[    1.602063] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
[    1.602096] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
[    1.602174] pci 0000:02:00.0: supports D1 D2
[    1.602273] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.631918] PCI: bus2: Fast back to back transfers enabled
[    1.631926] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.632623] PCI: bus3: Fast back to back transfers enabled
[    1.632630] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.632663] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.632671] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.632679] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.632687] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.632694] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
[    1.632701] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632720] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.632728] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.632737] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632757] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.632762] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.632768] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.632774] pci 0000:00:03.0: PCI bridge to [bus 03]
[    1.633030] mv_xor f1060800.xor: Marvell shared XOR driver
[    1.691640] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
[    1.691756] mv_xor f1060900.xor: Marvell shared XOR driver
[    1.751635] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
[    1.769386] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.770240] printk: console [ttyS0] disabled
[    1.790351] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 30, base_baud = 15625000) is a 16550A
[    3.040783] printk: console [ttyS0] enabled
[    3.065621] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 31, base_baud = 15625000) is a 16550A
[    3.075329] ahci-mvebu f10a8000.sata: supply ahci not found, using dummy regulator
[    3.082990] ahci-mvebu f10a8000.sata: supply phy not found, using dummy regulator
[    3.090499] ahci-mvebu f10a8000.sata: supply target not found, using dummy regulator
[    3.098335] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    3.107411] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    3.116657] scsi host0: ahci-mvebu
[    3.120302] scsi host1: ahci-mvebu
[    3.123825] ata1: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x100 irq 53
[    3.131768] ata2: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x180 irq 53
[    3.140560] spi-nor spi0.0: s25fl164k (8192 Kbytes)
[    3.145494] 2 fixed-partitions partitions found on MTD device spi0.0
[    3.151868] Creating 2 MTD partitions on "spi0.0":
[    3.156671] 0x000000000000-0x000000100000 : "U-Boot"
[    3.171461] 0x000000100000-0x000000800000 : "Rescue system"
[    3.191747] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[    3.199597] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[    3.209859] libphy: Fixed MDIO Bus: probed
[    3.214141] tun: Universal TUN/TAP device driver, 1.6
[    3.219584] libphy: orion_mdio_bus: probed
[    3.224542] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    3.450274] libphy: mv88e6xxx SMI: probed
[    3.461815] mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:4e:98
[    3.470606] mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:4e:96
[    3.479356] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:4e:97
[    3.482630] ata1: SATA link down (SStatus 0 SControl 300)
[    3.487588] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.492831] ata2: SATA link down (SStatus 0 SControl 300)
[    3.498496] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
[    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.524473] ath9k 0000:01:00.0: Failed to initialize device
[    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
[    3.601529] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    3.608072] ehci-pci: EHCI PCI platform driver
[    3.612553] ehci-orion: EHCI orion driver
[    3.616675] orion-ehci f1058000.usb: EHCI Host Controller
[    3.622105] orion-ehci f1058000.usb: new USB bus registered, assigned bus number 1
[    3.629733] orion-ehci f1058000.usb: irq 49, io mem 0xf1058000
[    3.661261] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
[    3.667530] hub 1-0:1.0: USB hub found
[    3.671321] hub 1-0:1.0: 1 port detected
[    3.675700] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    3.681034] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 2
[    3.688599] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    3.697867] xhci-hcd f10f0000.usb3: irq 55, io mem 0xf10f0000
[    3.703905] hub 2-0:1.0: USB hub found
[    3.707678] hub 2-0:1.0: 1 port detected
[    3.711767] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    3.717096] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 3
[    3.724621] xhci-hcd f10f0000.usb3: Host supports USB 3.0 SuperSpeed
[    3.731026] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
[    3.739388] hub 3-0:1.0: USB hub found
[    3.743167] hub 3-0:1.0: 1 port detected
[    3.747339] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    3.752684] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 4
[    3.760230] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    3.769502] xhci-hcd f10f8000.usb3: irq 56, io mem 0xf10f8000
[    3.775527] hub 4-0:1.0: USB hub found
[    3.779298] hub 4-0:1.0: 1 port detected
[    3.783756] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    3.789086] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 5
[    3.796610] xhci-hcd f10f8000.usb3: Host supports USB 3.0 SuperSpeed
[    3.803012] usb usb5: We don't know the algorithms for LPM for this host, disabling LPM.
[    3.811375] hub 5-0:1.0: USB hub found
[    3.815147] hub 5-0:1.0: 1 port detected
[    3.819312] usbcore: registered new interface driver usb-storage
[    3.826044] armada38x-rtc f10a3800.rtc: registered as rtc0
[    3.831632] armada38x-rtc f10a3800.rtc: setting system clock to 2020-10-27T15:31:52 UTC (1603812712)
[    3.840905] i2c /dev entries driver
[    3.846565] orion_wdt: Initial timeout 171 sec
[    3.851350] sdhci: Secure Digital Host Controller Interface driver
[    3.857544] sdhci: Copyright(c) Pierre Ossman
[    3.862041] sdhci-pltfm: SDHCI platform and OF driver helper
[    3.868792] marvell-cesa f1090000.crypto: CESA device successfully registered
[    3.876106] usbcore: registered new interface driver usbhid
[    3.881715] usbhid: USB HID core driver
[    3.885678] GACT probability on
[    3.888837] Mirror/redirect action on
[    3.892589] Simple TC action Loaded
[    3.893793] mmc0: SDHCI controller on f10d8000.sdhci [f10d8000.sdhci] using ADMA
[    3.896117] u32 classifier
[    3.906258]     Performance counters on
[    3.910113]     input device check on
[    3.913812]     Actions configured
[    3.917606] NET: Registered protocol family 10
[    3.922867] Segment Routing with IPv6
[    3.926605] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    3.932956] NET: Registered protocol family 17
[    3.937500] 8021q: 802.1Q VLAN Support v1.8
[    3.941850] ThumbEE CPU extension supported.
[    3.946133] Registering SWP/SWPB emulation handler
[    3.951024] Loading compiled-in X.509 certificates
[    3.956916] Btrfs loaded, crc32c=crc32c-generic
[    3.961872] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    4.027817] mmc0: new high speed MMC card at address 0001
[    4.033650] mmcblk0: mmc0:0001 H8G4a\x92 7.28 GiB 
[    4.038323] mmcblk0boot0: mmc0:0001 H8G4a\x92 partition 1 4.00 MiB
[    4.044421] mmcblk0boot1: mmc0:0001 H8G4a\x92 partition 2 4.00 MiB
[    4.050457] mmcblk0rpmb: mmc0:0001 H8G4a\x92 partition 3 4.00 MiB, chardev (250:0)
[    4.059708]  mmcblk0: p1
[    4.081276] usb 2-1: new high-speed USB device number 2 using xhci-hcd
[    4.169488] libphy: mv88e6xxx SMI: probed
[    4.261911] usb-storage 2-1:1.0: USB Mass Storage device detected
[    4.268229] scsi host2: usb-storage 2-1:1.0
[    4.816096] mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540] (irq=70)
[    4.842702] mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540] (irq=71)
[    4.869246] mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540] (irq=72)
[    4.895772] mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540] (irq=73)
[    4.920733] mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540] (irq=74)
[    4.939701] mv88e6085 f1072004.mdio-mii:10: configuring for fixed/rgmii-id link mode
[    4.950089] mv88e6085 f1072004.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
[    4.958047] DSA: tree 0 setup
[    4.961339] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    4.970623] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    4.977231] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    4.985879] cfg80211: failed to load regulatory.db
[    4.990987] Waiting 2 sec before mounting root device...
[    5.351539] scsi 2:0:0:0: Direct-Access     General  UDisk            5.00 PQ: 0 ANSI: 2
[    5.360060] sd 2:0:0:0: [sda] 7987200 512-byte logical blocks: (4.09 GB/3.81 GiB)
[    5.367691] sd 2:0:0:0: [sda] Write Protect is off
[    5.372503] sd 2:0:0:0: [sda] Mode Sense: 0b 00 00 08
[    5.372605] sd 2:0:0:0: [sda] No Caching mode page found
[    5.377931] sd 2:0:0:0: [sda] Assuming drive cache: write through
[    5.435076]  sda: sda1
[    5.438130] sd 2:0:0:0: [sda] Attached SCSI removable disk
[    7.047873] BTRFS: device fsid 448334b8-1b27-4738-8118-9e70b56b1e58 devid 1 transid 680 /dev/root scanned by swapper/0 (1)
[    7.059562] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    7.066294] BTRFS info (device mmcblk0p1): has skinny extents
[    7.078585] BTRFS info (device mmcblk0p1): enabling ssd optimizations
[    7.087624] VFS: Mounted root (btrfs filesystem) on device 0:12.
[    7.094044] devtmpfs: mounted
[    7.097581] Freeing unused kernel memory: 1024K
[    7.131431] Run /sbin/init as init process
[    7.135536]   with arguments:
[    7.135539]     /sbin/init
[    7.135541]     earlyprintk
[    7.135543]   with environment:
[    7.135545]     HOME=/
[    7.135548]     TERM=linux
[    7.220335] random: fast init done
[    7.650974] systemd[1]: systemd 246.6-1.1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    7.674141] systemd[1]: Detected architecture arm.
[    7.752534] systemd[1]: Set hostname to <omnia-arch>.
[    7.938493] systemd[164]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
[    8.148416] systemd[1]: Queued start job for default target Graphical Interface.
[    8.156570] random: systemd: uninitialized urandom read (16 bytes read)
[    8.164923] systemd[1]: Created slice system-getty.slice.
[    8.201373] random: systemd: uninitialized urandom read (16 bytes read)
[    8.208682] systemd[1]: Created slice system-modprobe.slice.
[    8.241347] random: systemd: uninitialized urandom read (16 bytes read)
[    8.248610] systemd[1]: Created slice system-serial\x2dgetty.slice.
[    8.281970] systemd[1]: Created slice User and Session Slice.
[    8.321507] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    8.371436] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    8.421373] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[    8.433099] systemd[1]: Reached target Local Encrypted Volumes.
[    8.481453] systemd[1]: Reached target Paths.
[    8.521358] systemd[1]: Reached target Remote File Systems.
[    8.571330] systemd[1]: Reached target Slices.
[    8.611374] systemd[1]: Reached target Swap.
[    8.641568] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[    8.693521] systemd[1]: Listening on Process Core Dump Socket.
[    8.745061] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[    8.759882] systemd[1]: Listening on Journal Socket (/dev/log).
[    8.801664] systemd[1]: Listening on Journal Socket.
[    8.848051] systemd[1]: Listening on Network Service Netlink Socket.
[    8.892567] systemd[1]: Listening on udev Control Socket.
[    8.941553] systemd[1]: Listening on udev Kernel Socket.
[    8.981628] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
[    8.990034] systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
[    8.999279] systemd[1]: Condition check resulted in Kernel Debug File System being skipped.
[    9.010575] systemd[1]: Mounting Kernel Trace File System...
[    9.043918] systemd[1]: Mounting Temporary Directory (/tmp)...
[    9.081515] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    9.095686] systemd[1]: Starting Load Kernel Module drm...
[    9.138063] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[    9.148885] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[    9.157149] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[    9.165757] systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
[    9.177585] systemd[1]: Starting Remount Root and Kernel File Systems...
[    9.211505] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
[    9.222607] systemd[1]: Starting Apply Kernel Variables...
[    9.264359] systemd[1]: Starting Coldplug All udev Devices...
[    9.305343] systemd[1]: Mounted Kernel Trace File System.
[    9.352014] systemd[1]: Mounted Temporary Directory (/tmp).
[    9.391997] systemd[1]: modprobe@drm.service: Succeeded.
[    9.398342] systemd[1]: Finished Load Kernel Module drm.
[    9.436436] systemd[1]: Finished Remount Root and Kernel File Systems.
[    9.472894] systemd[1]: Finished Apply Kernel Variables.
[    9.514310] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
[    9.529349] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[    9.540514] systemd[1]: Starting Load/Save Random Seed...
[    9.561757] systemd[1]: Condition check resulted in Create System Users being skipped.
[    9.578340] systemd[1]: Starting Create Static Device Nodes in /dev...
[    9.639784] systemd[1]: Finished Create Static Device Nodes in /dev.
[    9.692025] systemd[1]: Reached target Local File Systems (Pre).
[    9.741485] systemd[1]: Condition check resulted in Virtual Machine and Container Storage (Compatibility) being skipped.
[    9.752637] systemd[1]: Reached target Local File Systems.
[    9.794672] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
[    9.831672] systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
[    9.844130] systemd[1]: Starting Journal Service...
[    9.861510] systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
[    9.885260] systemd[1]: Starting Rule-based Manager for Device Events and Files...
[    9.932999] systemd[1]: Finished Coldplug All udev Devices.
[   10.175983] systemd[1]: Started Journal Service.
[   11.579842] mvneta f1070000.ethernet eth0: configuring for fixed/rgmii link mode
[   11.607754] mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[   11.787479] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
[   11.817734] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
[   12.102369] BTRFS info (device mmcblk0p1): devid 1 device path /dev/root changed to /dev/mmcblk0p1 scanned by systemd-udevd (194)
[   13.131291] random: crng init done
[   13.134710] random: 7 urandom warning(s) missed due to ratelimiting
[   14.961639] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[   14.969684] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 15:43 PCI trouble on mvebu (Turris Omnia) Toke Høiland-Jørgensen
@ 2020-10-27 17:20 ` Bjorn Helgaas
  2020-10-27 17:44   ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-27 18:56   ` Toke Høiland-Jørgensen
  2020-10-27 18:03 ` Marek Behun
  1 sibling, 2 replies; 62+ messages in thread
From: Bjorn Helgaas @ 2020-10-27 17:20 UTC
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas, vtolkm

[+cc vtolkm]

On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> Hi everyone
> 
> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> having some trouble getting the PCI bus to work correctly. Specifically,
> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> the resource request fix[0] applied on top.
> 
> The kernel boots fine, and the patch in [0] makes the PCI devices show
> up. But I'm still getting initialisation errors like these:
> 
> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> 
> and the WiFi drivers fail to initialise with what appears to me to be
> errors related to the bus rather than to the drivers themselves:
> 
> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> 
> lspci looks OK, though:
> 
> # lspci
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> 
> Does anyone have any clue what could be going on here? Is this a bug, or
> did I miss something in my config or other initialisation? I've tried
> with both the stock u-boot distributed with the board, and with an
> upstream u-boot from latest master; doesn't seem to make any difference.

Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
don't think we have a fix yet.
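
Before rebuilding, it may help to confirm what the current kernel configuration says about CONFIG_PCIEASPM. The Python sketch below reads the text of a kernel .config (for instance the one in the build tree) and reports the state of a symbol. The helper name `kconfig_state` and the sample configuration are hypothetical and only illustrate the usual .config line formats.

```python
def kconfig_state(config_text, symbol):
    """Return the value assigned to symbol, or 'n' if it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled symbols are recorded as a comment in .config files.
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return "n"


# Hypothetical excerpt; a real .config has many more entries.
sample_config = """\
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
# CONFIG_PCIEASPM is not set
"""

print(kconfig_state(sample_config, "CONFIG_PCIEASPM"))
```

With the sample above this prints n, which is the state the suggested test kernel should be in; a result of y would mean ASPM support is still built in.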

> [0] https://lore.kernel.org/linux-pci/20201023145252.2691779-1-robh@kernel.org/
> 
> Full dmesg:
> 
> [    1.546457] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
> [    1.546469] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [    1.546615] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
> [    1.546627] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [    1.547341] PCI: bus0: Fast back to back transfers disabled
> [    1.547349] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547356] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547363] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547444] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
> [    1.547466] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
> [    1.547576] pci 0000:01:00.0: supports D1
> [    1.547581] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
> [    1.547692] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
> [    1.601932] PCI: bus1: Fast back to back transfers enabled
> [    1.601941] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> [    1.602039] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
> [    1.602063] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
> [    1.602096] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
> [    1.602174] pci 0000:02:00.0: supports D1 D2
> [    1.602273] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
> [    1.631918] PCI: bus2: Fast back to back transfers enabled
> [    1.631926] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
> [    1.632623] PCI: bus3: Fast back to back transfers enabled
> [    1.632630] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
> [    1.632663] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> [    1.632671] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> [    1.632679] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
> [    1.632687] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
> [    1.632694] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
> [    1.632701] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632720] pci 0000:00:01.0: PCI bridge to [bus 01]
> [    1.632728] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
> [    1.632737] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632757] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
> [    1.632762] pci 0000:00:02.0: PCI bridge to [bus 02]
> [    1.632768] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
> [    1.632774] pci 0000:00:03.0: PCI bridge to [bus 03]
> [    1.633030] mv_xor f1060800.xor: Marvell shared XOR driver
> [    1.691640] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
> [    1.691756] mv_xor f1060900.xor: Marvell shared XOR driver
> [    1.751635] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
> [    1.769386] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [    1.770240] printk: console [ttyS0] disabled
> [    1.790351] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 30, base_baud = 15625000) is a 16550A
> [    3.040783] printk: console [ttyS0] enabled
> [    3.065621] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 31, base_baud = 15625000) is a 16550A
> [    3.075329] ahci-mvebu f10a8000.sata: supply ahci not found, using dummy regulator
> [    3.082990] ahci-mvebu f10a8000.sata: supply phy not found, using dummy regulator
> [    3.090499] ahci-mvebu f10a8000.sata: supply target not found, using dummy regulator
> [    3.098335] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
> [    3.107411] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
> [    3.116657] scsi host0: ahci-mvebu
> [    3.120302] scsi host1: ahci-mvebu
> [    3.123825] ata1: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x100 irq 53
> [    3.131768] ata2: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x180 irq 53
> [    3.140560] spi-nor spi0.0: s25fl164k (8192 Kbytes)
> [    3.145494] 2 fixed-partitions partitions found on MTD device spi0.0
> [    3.151868] Creating 2 MTD partitions on "spi0.0":
> [    3.156671] 0x000000000000-0x000000100000 : "U-Boot"
> [    3.171461] 0x000000100000-0x000000800000 : "Rescue system"
> [    3.191747] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
> [    3.199597] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> [    3.209859] libphy: Fixed MDIO Bus: probed
> [    3.214141] tun: Universal TUN/TAP device driver, 1.6
> [    3.219584] libphy: orion_mdio_bus: probed
> [    3.224542] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
> [    3.450274] libphy: mv88e6xxx SMI: probed
> [    3.461815] mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:4e:98
> [    3.470606] mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:4e:96
> [    3.479356] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:4e:97
> [    3.482630] ata1: SATA link down (SStatus 0 SControl 300)
> [    3.487588] pci 0000:00:01.0: enabling device (0140 -> 0142)
> [    3.492831] ata2: SATA link down (SStatus 0 SControl 300)
> [    3.498496] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> [    3.601529] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [    3.608072] ehci-pci: EHCI PCI platform driver
> [    3.612553] ehci-orion: EHCI orion driver
> [    3.616675] orion-ehci f1058000.usb: EHCI Host Controller
> [    3.622105] orion-ehci f1058000.usb: new USB bus registered, assigned bus number 1
> [    3.629733] orion-ehci f1058000.usb: irq 49, io mem 0xf1058000
> [    3.661261] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
> [    3.667530] hub 1-0:1.0: USB hub found
> [    3.671321] hub 1-0:1.0: 1 port detected
> [    3.675700] xhci-hcd f10f0000.usb3: xHCI Host Controller
> [    3.681034] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 2
> [    3.688599] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
> [    3.697867] xhci-hcd f10f0000.usb3: irq 55, io mem 0xf10f0000
> [    3.703905] hub 2-0:1.0: USB hub found
> [    3.707678] hub 2-0:1.0: 1 port detected
> [    3.711767] xhci-hcd f10f0000.usb3: xHCI Host Controller
> [    3.717096] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 3
> [    3.724621] xhci-hcd f10f0000.usb3: Host supports USB 3.0 SuperSpeed
> [    3.731026] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
> [    3.739388] hub 3-0:1.0: USB hub found
> [    3.743167] hub 3-0:1.0: 1 port detected
> [    3.747339] xhci-hcd f10f8000.usb3: xHCI Host Controller
> [    3.752684] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 4
> [    3.760230] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
> [    3.769502] xhci-hcd f10f8000.usb3: irq 56, io mem 0xf10f8000
> [    3.775527] hub 4-0:1.0: USB hub found
> [    3.779298] hub 4-0:1.0: 1 port detected
> [    3.783756] xhci-hcd f10f8000.usb3: xHCI Host Controller
> [    3.789086] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 5
> [    3.796610] xhci-hcd f10f8000.usb3: Host supports USB 3.0 SuperSpeed
> [    3.803012] usb usb5: We don't know the algorithms for LPM for this host, disabling LPM.
> [    3.811375] hub 5-0:1.0: USB hub found
> [    3.815147] hub 5-0:1.0: 1 port detected
> [    3.819312] usbcore: registered new interface driver usb-storage
> [    3.826044] armada38x-rtc f10a3800.rtc: registered as rtc0
> [    3.831632] armada38x-rtc f10a3800.rtc: setting system clock to 2020-10-27T15:31:52 UTC (1603812712)
> [    3.840905] i2c /dev entries driver
> [    3.846565] orion_wdt: Initial timeout 171 sec
> [    3.851350] sdhci: Secure Digital Host Controller Interface driver
> [    3.857544] sdhci: Copyright(c) Pierre Ossman
> [    3.862041] sdhci-pltfm: SDHCI platform and OF driver helper
> [    3.868792] marvell-cesa f1090000.crypto: CESA device successfully registered
> [    3.876106] usbcore: registered new interface driver usbhid
> [    3.881715] usbhid: USB HID core driver
> [    3.885678] GACT probability on
> [    3.888837] Mirror/redirect action on
> [    3.892589] Simple TC action Loaded
> [    3.893793] mmc0: SDHCI controller on f10d8000.sdhci [f10d8000.sdhci] using ADMA
> [    3.896117] u32 classifier
> [    3.906258]     Performance counters on
> [    3.910113]     input device check on
> [    3.913812]     Actions configured
> [    3.917606] NET: Registered protocol family 10
> [    3.922867] Segment Routing with IPv6
> [    3.926605] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> [    3.932956] NET: Registered protocol family 17
> [    3.937500] 8021q: 802.1Q VLAN Support v1.8
> [    3.941850] ThumbEE CPU extension supported.
> [    3.946133] Registering SWP/SWPB emulation handler
> [    3.951024] Loading compiled-in X.509 certificates
> [    3.956916] Btrfs loaded, crc32c=crc32c-generic
> [    3.961872] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
> [    4.027817] mmc0: new high speed MMC card at address 0001
> [    4.033650] mmcblk0: mmc0:0001 H8G4a\x92 7.28 GiB 
> [    4.038323] mmcblk0boot0: mmc0:0001 H8G4a\x92 partition 1 4.00 MiB
> [    4.044421] mmcblk0boot1: mmc0:0001 H8G4a\x92 partition 2 4.00 MiB
> [    4.050457] mmcblk0rpmb: mmc0:0001 H8G4a\x92 partition 3 4.00 MiB, chardev (250:0)
> [    4.059708]  mmcblk0: p1
> [    4.081276] usb 2-1: new high-speed USB device number 2 using xhci-hcd
> [    4.169488] libphy: mv88e6xxx SMI: probed
> [    4.261911] usb-storage 2-1:1.0: USB Mass Storage device detected
> [    4.268229] scsi host2: usb-storage 2-1:1.0
> [    4.816096] mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540] (irq=70)
> [    4.842702] mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540] (irq=71)
> [    4.869246] mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540] (irq=72)
> [    4.895772] mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540] (irq=73)
> [    4.920733] mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540] (irq=74)
> [    4.939701] mv88e6085 f1072004.mdio-mii:10: configuring for fixed/rgmii-id link mode
> [    4.950089] mv88e6085 f1072004.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
> [    4.958047] DSA: tree 0 setup
> [    4.961339] cfg80211: Loading compiled-in X.509 certificates for regulatory database
> [    4.970623] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
> [    4.977231] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
> [    4.985879] cfg80211: failed to load regulatory.db
> [    4.990987] Waiting 2 sec before mounting root device...
> [    5.351539] scsi 2:0:0:0: Direct-Access     General  UDisk            5.00 PQ: 0 ANSI: 2
> [    5.360060] sd 2:0:0:0: [sda] 7987200 512-byte logical blocks: (4.09 GB/3.81 GiB)
> [    5.367691] sd 2:0:0:0: [sda] Write Protect is off
> [    5.372503] sd 2:0:0:0: [sda] Mode Sense: 0b 00 00 08
> [    5.372605] sd 2:0:0:0: [sda] No Caching mode page found
> [    5.377931] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [    5.435076]  sda: sda1
> [    5.438130] sd 2:0:0:0: [sda] Attached SCSI removable disk
> [    7.047873] BTRFS: device fsid 448334b8-1b27-4738-8118-9e70b56b1e58 devid 1 transid 680 /dev/root scanned by swapper/0 (1)
> [    7.059562] BTRFS info (device mmcblk0p1): disk space caching is enabled
> [    7.066294] BTRFS info (device mmcblk0p1): has skinny extents
> [    7.078585] BTRFS info (device mmcblk0p1): enabling ssd optimizations
> [    7.087624] VFS: Mounted root (btrfs filesystem) on device 0:12.
> [    7.094044] devtmpfs: mounted
> [    7.097581] Freeing unused kernel memory: 1024K
> [    7.131431] Run /sbin/init as init process
> [    7.135536]   with arguments:
> [    7.135539]     /sbin/init
> [    7.135541]     earlyprintk
> [    7.135543]   with environment:
> [    7.135545]     HOME=/
> [    7.135548]     TERM=linux
> [    7.220335] random: fast init done
> [    7.650974] systemd[1]: systemd 246.6-1.1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
> [    7.674141] systemd[1]: Detected architecture arm.
> [    7.752534] systemd[1]: Set hostname to <omnia-arch>.
> [    7.938493] systemd[164]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
> [    8.148416] systemd[1]: Queued start job for default target Graphical Interface.
> [    8.156570] random: systemd: uninitialized urandom read (16 bytes read)
> [    8.164923] systemd[1]: Created slice system-getty.slice.
> [    8.201373] random: systemd: uninitialized urandom read (16 bytes read)
> [    8.208682] systemd[1]: Created slice system-modprobe.slice.
> [    8.241347] random: systemd: uninitialized urandom read (16 bytes read)
> [    8.248610] systemd[1]: Created slice system-serial\x2dgetty.slice.
> [    8.281970] systemd[1]: Created slice User and Session Slice.
> [    8.321507] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
> [    8.371436] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
> [    8.421373] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
> [    8.433099] systemd[1]: Reached target Local Encrypted Volumes.
> [    8.481453] systemd[1]: Reached target Paths.
> [    8.521358] systemd[1]: Reached target Remote File Systems.
> [    8.571330] systemd[1]: Reached target Slices.
> [    8.611374] systemd[1]: Reached target Swap.
> [    8.641568] systemd[1]: Listening on Device-mapper event daemon FIFOs.
> [    8.693521] systemd[1]: Listening on Process Core Dump Socket.
> [    8.745061] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
> [    8.759882] systemd[1]: Listening on Journal Socket (/dev/log).
> [    8.801664] systemd[1]: Listening on Journal Socket.
> [    8.848051] systemd[1]: Listening on Network Service Netlink Socket.
> [    8.892567] systemd[1]: Listening on udev Control Socket.
> [    8.941553] systemd[1]: Listening on udev Kernel Socket.
> [    8.981628] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
> [    8.990034] systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
> [    8.999279] systemd[1]: Condition check resulted in Kernel Debug File System being skipped.
> [    9.010575] systemd[1]: Mounting Kernel Trace File System...
> [    9.043918] systemd[1]: Mounting Temporary Directory (/tmp)...
> [    9.081515] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
> [    9.095686] systemd[1]: Starting Load Kernel Module drm...
> [    9.138063] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
> [    9.148885] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
> [    9.157149] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
> [    9.165757] systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
> [    9.177585] systemd[1]: Starting Remount Root and Kernel File Systems...
> [    9.211505] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
> [    9.222607] systemd[1]: Starting Apply Kernel Variables...
> [    9.264359] systemd[1]: Starting Coldplug All udev Devices...
> [    9.305343] systemd[1]: Mounted Kernel Trace File System.
> [    9.352014] systemd[1]: Mounted Temporary Directory (/tmp).
> [    9.391997] systemd[1]: modprobe@drm.service: Succeeded.
> [    9.398342] systemd[1]: Finished Load Kernel Module drm.
> [    9.436436] systemd[1]: Finished Remount Root and Kernel File Systems.
> [    9.472894] systemd[1]: Finished Apply Kernel Variables.
> [    9.514310] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
> [    9.529349] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
> [    9.540514] systemd[1]: Starting Load/Save Random Seed...
> [    9.561757] systemd[1]: Condition check resulted in Create System Users being skipped.
> [    9.578340] systemd[1]: Starting Create Static Device Nodes in /dev...
> [    9.639784] systemd[1]: Finished Create Static Device Nodes in /dev.
> [    9.692025] systemd[1]: Reached target Local File Systems (Pre).
> [    9.741485] systemd[1]: Condition check resulted in Virtual Machine and Container Storage (Compatibility) being skipped.
> [    9.752637] systemd[1]: Reached target Local File Systems.
> [    9.794672] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
> [    9.831672] systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
> [    9.844130] systemd[1]: Starting Journal Service...
> [    9.861510] systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
> [    9.885260] systemd[1]: Starting Rule-based Manager for Device Events and Files...
> [    9.932999] systemd[1]: Finished Coldplug All udev Devices.
> [   10.175983] systemd[1]: Started Journal Service.
> [   11.579842] mvneta f1070000.ethernet eth0: configuring for fixed/rgmii link mode
> [   11.607754] mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> [   11.787479] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
> [   11.817734] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
> [   12.102369] BTRFS info (device mmcblk0p1): devid 1 device path /dev/root changed to /dev/mmcblk0p1 scanned by systemd-udevd (194)
> [   13.131291] random: crng init done
> [   13.134710] random: 7 urandom warning(s) missed due to ratelimiting
> [   14.961639] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
> [   14.969684] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 17:20 ` Bjorn Helgaas
@ 2020-10-27 17:44   ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-27 18:59     ` Toke Høiland-Jørgensen
  2020-10-27 18:56   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-27 17:44 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	vtolkm, Bjorn Helgaas


On 27/10/2020 18:20, Bjorn Helgaas wrote:
> [+cc vtolkm]
>
> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> Hi everyone
>>
>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> having some trouble getting the PCI bus to work correctly. Specifically,
>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> the resource request fix[0] applied on top.
>>
>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>> up. But I'm still getting initialisation errors like these:
>>
>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>
>> and the WiFi drivers fail to initialise with what appears to me to be
>> errors related to the bus rather than to the drivers themselves:
>>
>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>
>> lspci looks OK, though:
>>
>> # lspci
>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>
>> Does anyone have any clue what could be going on here? Is this a bug, or
>> did I miss something in my config or other initialisation? I've tried
>> with both the stock u-boot distributed with the board, and with an
>> upstream u-boot from latest master; doesn't seem to make any difference.
> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> don't think we have a fix yet.
>

I got the same device working on 5.10.0-rc1-next-20201027-to-dirty, but
only with ASPM turned off, as mentioned in the cited bug report.


  dmesg | grep ath

ath10k_pci 0000:02:00.0: enabling device (0140 -> 0142)
ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
ath9k 0000:03:00.0: enabling device (0140 -> 0142)
ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 0 tracing 1 dfs 0 testmode 0
ath10k_pci 0000:02:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
ath: EEPROM regdomain sanitized
ath: EEPROM regdomain: 0x64
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: 00
ath: Regpair used: 0x64
ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
ath: EEPROM regdomain sanitized
ath: EEPROM regdomain: 0x64
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: 00
ath: Regpair used: 0x64
ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware

----

Note: there are related issues; the workaround is to compile ath and
cfg80211 as modules:

(1) https://bugzilla.kernel.org/show_bug.cgi?id=209863
(2) https://bugzilla.kernel.org/show_bug.cgi?id=209855
(3) https://bugzilla.kernel.org/show_bug.cgi?id=209853







* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 15:43 PCI trouble on mvebu (Turris Omnia) Toke Høiland-Jørgensen
  2020-10-27 17:20 ` Bjorn Helgaas
@ 2020-10-27 18:03 ` Marek Behun
  2020-10-27 19:00   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 62+ messages in thread
From: Marek Behun @ 2020-10-27 18:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas

Are you using stock U-Boot in the Omnia?

Marek

On Tue, 27 Oct 2020 16:43:20 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Hi everyone
> 
> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> having some trouble getting the PCI bus to work correctly. Specifically,
> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> the resource request fix[0] applied on top.
> 
> The kernel boots fine, and the patch in [0] makes the PCI devices show
> up. But I'm still getting initialisation errors like these:
> 
> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> 
> and the WiFi drivers fail to initialise with what appears to me to be
> errors related to the bus rather than to the drivers themselves:
> 
> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> 
> lspci looks OK, though:
> 
> # lspci
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> 
> Does anyone have any clue what could be going on here? Is this a bug, or
> did I miss something in my config or other initialisation? I've tried
> with both the stock u-boot distributed with the board, and with an
> upstream u-boot from latest master; doesn't seem to make any difference.
> 
> Any pointers will be greatly appreciated!
> 
> Thanks,
> 
> -Toke
> 
> 
> [0] https://lore.kernel.org/linux-pci/20201023145252.2691779-1-robh@kernel.org/


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 17:20 ` Bjorn Helgaas
  2020-10-27 17:44   ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-27 18:56   ` Toke Høiland-Jørgensen
  2020-10-28 13:36     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 18:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas, vtolkm

Bjorn Helgaas <helgaas@kernel.org> writes:

> [+cc vtolkm]
>
> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> Hi everyone
>> 
>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> having some trouble getting the PCI bus to work correctly. Specifically,
>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> the resource request fix[0] applied on top.
>> 
>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>> up. But I'm still getting initialisation errors like these:
>> 
>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> 
>> and the WiFi drivers fail to initialise with what appears to me to be
>> errors related to the bus rather than to the drivers themselves:
>> 
>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>> 
>> lspci looks OK, though:
>> 
>> # lspci
>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>> 
>> Does anyone have any clue what could be going on here? Is this a bug, or
>> did I miss something in my config or other initialisation? I've tried
>> with both the stock u-boot distributed with the board, and with an
>> upstream u-boot from latest master; doesn't seem to make any difference.
>
> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> don't think we have a fix yet.

Yes! Turning that off does indeed help! Thanks a bunch :)
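
For anyone following along, the change amounts to the single line below
in the kernel .config. This is only a sketch of the relevant option; the
rest of the configuration is left as it was. As an assumption that was
not tried in this thread, kernels built with CONFIG_PCIEASPM=y also
accept pcie_aspm=off on the kernel command line, which may be a quicker
way to test the same thing without rebuilding.

```
# CONFIG_PCIEASPM is not set
```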

You mention that bisecting this would be helpful - I can try that
tomorrow; any idea when this was last working?

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 17:44   ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-27 18:59     ` Toke Høiland-Jørgensen
  2020-10-27 20:20       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 18:59 UTC (permalink / raw)
  To: vtolkm
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	vtolkm, Bjorn Helgaas

"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:

> On 27/10/2020 18:20, Bjorn Helgaas wrote:
>> [+cc vtolkm]
>>
>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>> Hi everyone
>>>
>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>> the resource request fix[0] applied on top.
>>>
>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>> up. But I'm still getting initialisation errors like these:
>>>
>>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>
>>> and the WiFi drivers fail to initialise with what appears to me to be
>>> errors related to the bus rather than to the drivers themselves:
>>>
>>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>>
>>> lspci looks OK, though:
>>>
>>> # lspci
>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>>
>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>> did I miss something in my config or other initialisation? I've tried
>>> with both the stock u-boot distributed with the board, and with an
>>> upstream u-boot from latest master; doesn't seem to make any difference.
>> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> don't think we have a fix yet.
>>
>
> Got the same device working with > 5.10.0-rc1-next-20201027-to-dirty < 
> but ASPM turned off, as mentioned in the cited bug report.

Yup, indeed that helped!

> Note: related issues - workaround compile ath and cfg80211 as modules
>
> (1) https://bugzilla.kernel.org/show_bug.cgi?id=209863
> (2) https://bugzilla.kernel.org/show_bug.cgi?id=209855
> (3) https://bugzilla.kernel.org/show_bug.cgi?id=209853

Yeah, I had noticed the regdb failure but put off debugging that until
the PCI issue was resolved. So I guess that's next on my list. Thanks for
the pointer, although I'd rather avoid the module approach, as booting
the kernel directly from my build box over tftp is quite convenient.
Let's see if there isn't another way to fix this.

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 18:03 ` Marek Behun
@ 2020-10-27 19:00   ` Toke Høiland-Jørgensen
  2020-10-27 20:19     ` Marek Behun
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 19:00 UTC (permalink / raw)
  To: Marek Behun; +Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas

Marek Behun <marek.behun@nic.cz> writes:

> Are you using stock U-Boot in the Omnia?

I've tried both that and the latest upstream - didn't make a difference
wrt the PCI issue. Only difference I've noticed other than that (apart
from being able to turn more things on when using upstream) is that the
upstream u-boot can't seem to find the eMMC chip on the Omnia. Any idea
why? It doesn't matter right now since I'm just tftp-booting, but it
would be kinda nice to get that fixed as well :)

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 19:00   ` Toke Høiland-Jørgensen
@ 2020-10-27 20:19     ` Marek Behun
  2020-10-27 20:49       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Marek Behun @ 2020-10-27 20:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas

On Tue, 27 Oct 2020 20:00:58 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Marek Behun <marek.behun@nic.cz> writes:
> 
> > Are you using stock U-Boot in the Omnia?  
> 
> I've tried both that and the latest upstream - didn't make a difference
> wrt the PCI issue. Only difference I've noticed other than that (apart
> from being able to turn more things on when using upstream) is that the
> upstream u-boot can't seem to find the eMMC chip on the Omnia. Any idea
> why? It doesn't matter right now since I'm just tftp-booting, but it
> would be kinda nice to get that fixed as well :)
> 
> -Toke
> 

No idea, I will have to look into that.


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 18:59     ` Toke Høiland-Jørgensen
@ 2020-10-27 20:20       ` Toke Høiland-Jørgensen
  2020-10-27 21:22         ` ™֟☻̭҇ Ѽ ҉ ®
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 20:20 UTC (permalink / raw)
  To: vtolkm
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	vtolkm, Bjorn Helgaas

Toke Høiland-Jørgensen <toke@redhat.com> writes:

>> Note: related issues - workaround compile ath and cfg80211 as modules
>>
>> (1) https://bugzilla.kernel.org/show_bug.cgi?id=209863
>> (2) https://bugzilla.kernel.org/show_bug.cgi?id=209855
>> (3) https://bugzilla.kernel.org/show_bug.cgi?id=209853
>
> Yeah, I had noticed the regdb failure but put off debugging that until
> the PCI issue was resolved. So guess that's next on my list - thanks for
> the pointer (although I'd rather avoid the module approach as booting
> the kernel directly from my build box over tftp is quite convenient...
> Let's see if there isn't another way to fix this)

To follow up on this, everything seems to work just fine (ath10k init at
boot + regulatory db load) if I simply set:

CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
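
Spelled out as a .config fragment, this might look as follows. Note that
CONFIG_EXTRA_FIRMWARE_DIR is an assumption here and was not stated
above: it should point at whichever directory on the build machine holds
the listed files, with /lib/firmware being the kernel's default.

```
CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
# Assumed location of the files above on the build machine
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
```

The listed paths are resolved relative to that directory at build time
and the files are linked into the kernel image, so built-in drivers can
load them before any root filesystem is mounted.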

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 20:19     ` Marek Behun
@ 2020-10-27 20:49       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 20:49 UTC (permalink / raw)
  To: Marek Behun; +Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas

Marek Behun <marek.behun@nic.cz> writes:

> On Tue, 27 Oct 2020 20:00:58 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Marek Behun <marek.behun@nic.cz> writes:
>> 
>> > Are you using stock U-Boot in the Omnia?  
>> 
>> I've tried both that and the latest upstream - didn't make a difference
>> wrt the PCI issue. Only difference I've noticed other than that (apart
>> from being able to turn more things on when using upstream) is that the
>> upstream u-boot can't seem to find the eMMC chip on the Omnia. Any idea
>> why? It doesn't matter right now since I'm just tftp-booting, but it
>> would be kinda nice to get that fixed as well :)
>> 
>> -Toke
>> 
>
> No idea, I will have to look into that.

Please do! Would be awesome to get it working :)

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 20:20       ` Toke Høiland-Jørgensen
@ 2020-10-27 21:22         ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-27 21:31           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-27 21:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Bjorn Helgaas


On 27/10/2020 21:20, Toke Høiland-Jørgensen wrote:
> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>
>>> Note: related issues - workaround compile ath and cfg80211 as modules
>>>
>>> (1) https://bugzilla.kernel.org/show_bug.cgi?id=209863
>>> (2) https://bugzilla.kernel.org/show_bug.cgi?id=209855
>>> (3) https://bugzilla.kernel.org/show_bug.cgi?id=209853
>> Yeah, I had noticed the regdb failure but put off debugging that until
>> the PCI issue was resolved. So guess that's next on my list - thanks for
>> the pointer (although I'd rather avoid the module approach as booting
>> the kernel directly from my build box over tftp is quite convenient...
>> Let's see if there isn't another way to fix this)
> To follow up on this, everything seems to work just fine (ath10k init at
> boot + regulatory db load) if I simply set:
>
> CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
>
> -Toke
>

That works on my node only for the regulatory files, not for the ath10k
firmware, with this kconfig:

  Symbol: EXTRA_FIRMWARE_DIR [=/srv/fw]
  Type  : string
  Defined at drivers/base/firmware_loader/Kconfig:63
    Prompt: Firmware blobs root directory
    Depends on: FW_LOADER [=y] && EXTRA_FIRMWARE [=regulatory.db regulatory.db.p7s board.bin firmware-5.bin]!=
    Location:
     -> Device Drivers
       -> Generic Driver Options
         -> Firmware loader
           -> Firmware loading facility (FW_LOADER [=y])
             -> Build named firmware blobs into the kernel binary (EXTRA_FIRMWARE [=regulatory.db regulatory.db.p7s board.bin firmware-5.bin])

But that is off-topic for this thread anyway, and a bug has been lodged:
https://bugzilla.kernel.org/show_bug.cgi?id=209855



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 21:22         ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-27 21:31           ` Toke Høiland-Jørgensen
  2020-10-27 22:01             ` ™֟☻̭҇ Ѽ ҉ ®
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 21:31 UTC (permalink / raw)
  To: vtolkm
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Bjorn Helgaas

"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:

> On 27/10/2020 21:20, Toke Høiland-Jørgensen wrote:
>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>
>>>> Note: related issues - workaround compile ath and cfg80211 as modules
>>>>
>>>> (1) https://bugzilla.kernel.org/show_bug.cgi?id=209863
>>>> (2) https://bugzilla.kernel.org/show_bug.cgi?id=209855
>>>> (3) https://bugzilla.kernel.org/show_bug.cgi?id=209853
>>> Yeah, I had noticed the regdb failure but put off debugging that until
>>> the PCI issue was resolved. So guess that's next on my list - thanks for
>>> the pointer (although I'd rather avoid the module approach as booting
>>> the kernel directly from my build box over tftp is quite convenient...
>>> Let's see if there isn't another way to fix this)
>> To follow up on this, everything seems to work just fine (ath10k init at
>> boot + regulatory db load) if I simply set:
>>
>> CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
>>
>> -Toke
>>
>
> That works on my node only for the regulatory files, not for the ath10k
> firmware, with this kconfig:
>
>   Symbol: EXTRA_FIRMWARE_DIR [=/srv/fw]
>   Type  : string
>   Defined at drivers/base/firmware_loader/Kconfig:63
>     Prompt: Firmware blobs root directory
>     Depends on: FW_LOADER [=y] && EXTRA_FIRMWARE [=regulatory.db 
> regulatory.db.p7s board.bin firmware-5.bin]!=
>     Location:
>      -> Device Drivers
>        -> Generic Driver Options
>          -> Firmware loader
>            -> Firmware loading facility (FW_LOADER [=y])
>              -> Build named firmware blobs into the kernel binary 
> (EXTRA_FIRMWARE [=regulatory.db regulatory.db.p7s board.bin 
> firmware-5.bin])

I think that's because you're missing the path prefix
(ath10k/QCA988X/hw2.0/) from board.bin and firmware-5.bin?
request_firmware() uses the full path...
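
To illustrate the point, here is a minimal sketch (not kernel code) that
models the built-in lookup as an exact string match on the whole firmware
name, which is how I understand the built-in case to behave. The function
name builtin_fw_has is hypothetical; the firmware names are the ones from
the configs quoted above.

```shell
#!/bin/sh
# Hypothetical helper: model the built-in firmware lookup as an exact
# string match against the names listed in CONFIG_EXTRA_FIRMWARE.
# Usage: builtin_fw_has "<requested name>" "<space-separated built-in names>"
builtin_fw_has() {
    requested=$1
    for name in $2; do
        # Whole-name comparison: "board.bin" does not match
        # "ath10k/QCA988X/hw2.0/board.bin".
        [ "$name" = "$requested" ] && return 0
    done
    return 1
}
```

With the prefix-less list, a request for ath10k/QCA988X/hw2.0/board.bin
finds nothing, while regulatory.db (requested without any prefix) still
matches, which would be consistent with what you are seeing.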

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 21:31           ` Toke Høiland-Jørgensen
@ 2020-10-27 22:01             ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-27 22:12               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-27 22:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Bjorn Helgaas


On 27/10/2020 22:31, Toke Høiland-Jørgensen wrote:
>>> To follow up on this, everything seems to work just fine (ath10k init at
>>> boot + regulatory db load) if I simply set:
>>>
>>> CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
>>>
>>> -Toke
>>>
>> That works on my node only for the regulatory files but not the ath10
>> firmware with kconfig:
>>
>>    Symbol: EXTRA_FIRMWARE_DIR [=/srv/fw]
>>    Type  : string
>>    Defined at drivers/base/firmware_loader/Kconfig:63
>>      Prompt: Firmware blobs root directory
>>      Depends on: FW_LOADER [=y] && EXTRA_FIRMWARE [=regulatory.db
>> regulatory.db.p7s board.bin firmware-5.bin]!=
>>      Location:
>>       -> Device Drivers
>>         -> Generic Driver Options
>>           -> Firmware loader
>>             -> Firmware loading facility (FW_LOADER [=y])
>>               -> Build named firmware blobs into the kernel binary
>> (EXTRA_FIRMWARE [=regulatory.db regulatory.db.p7s board.bin
>> firmware-5.bin])
> I think that's because you're missing the path prefix
> (ath10k/QCA988X/hw2.0/) from board.bin and firmware-5.bin?
> request_firmware() uses the full path...
>
> -Toke

Well, it would be strange to have to specify the path prefix for
built-in firmware, considering:

  CONFIG_FW_LOADER:

  This enables the firmware loading facility in the kernel. The kernel
  will first look for built-in firmware, if it has any. Next, it will
  look for the requested firmware in a series of filesystem paths:

        o firmware_class path module parameter or kernel boot param
        o /lib/firmware/updates/UTS_RELEASE
        o /lib/firmware/updates
        o /lib/firmware/UTS_RELEASE
        o /lib/firmware

----

Nevertheless, I tried the same path prefix as in your kconfig, but the
compilation fails, which does not surprise me since the ath10k blobs are
not located at that path:

  UPD     drivers/base/firmware_loader/builtin/regulatory.db.gen.S
  UPD     drivers/base/firmware_loader/builtin/regulatory.db.p7s.gen.S
make[4]: *** No rule to make target '/srv/fw/ath10k/QCA988X/hw2.0/board.bin', needed by 'drivers/base/firmware_loader/builtin/ath10k/QCA988X/hw2.0/board.bin.gen.o'. Stop.
make[4]: *** Waiting for unfinished jobs....
  UPD     drivers/base/firmware_loader/builtin/ath10k/QCA988X/hw2.0/board.bin.gen.S
make[3]: *** [scripts/Makefile.build:500: drivers/base/firmware_loader/builtin] Error 2
make[2]: *** [scripts/Makefile.build:500: drivers/base/firmware_loader] Error 2
make[1]: *** [scripts/Makefile.build:500: drivers/base] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1799: drivers] Error 2
make: *** Waiting for unfinished jobs....

I suspect that, since you are booting the kernel directly from your build
box over tftp, it accesses the ath10k firmware blobs on the build box.





^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 22:01             ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-27 22:12               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-27 22:12 UTC (permalink / raw)
  To: vtolkm
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Bjorn Helgaas

"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:

> On 27/10/2020 22:31, Toke Høiland-Jørgensen wrote:
>>>> To follow up on this, everything seems to work just fine (ath10k init at
>>>> boot + regulatory db load) if I simply set:
>>>>
>>>> CONFIG_EXTRA_FIRMWARE="ath10k/QCA988X/hw2.0/board.bin ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s"
>>>>
>>>> -Toke
>>>>
>>> That works on my node only for the regulatory files but not the ath10
>>> firmware with kconfig:
>>>
>>>    Symbol: EXTRA_FIRMWARE_DIR [=/srv/fw]
>>>    Type  : string
>>>    Defined at drivers/base/firmware_loader/Kconfig:63
>>>      Prompt: Firmware blobs root directory
>>>      Depends on: FW_LOADER [=y] && EXTRA_FIRMWARE [=regulatory.db
>>> regulatory.db.p7s board.bin firmware-5.bin]!=
>>>      Location:
>>>       -> Device Drivers
>>>         -> Generic Driver Options
>>>           -> Firmware loader
>>>             -> Firmware loading facility (FW_LOADER [=y])
>>>               -> Build named firmware blobs into the kernel binary
>>> (EXTRA_FIRMWARE [=regulatory.db regulatory.db.p7s board.bin
>>> firmware-5.bin])
>> I think that's because you're missing the path prefix
>> (ath10k/QCA988X/hw2.0/) from board.bin and firmware-5.bin?
>> request_firmware() uses the full path...
>>
>> -Toke
>
> Well, it would be strange to have to specify the path prefix for
> built-in firmware, considering:
>
>   CONFIG_FW_LOADER:
>
>   This enables the firmware loading facility in the kernel. The kernel
>   will first look for built-in firmware, if it has any. Next, it will
>   look for the requested firmware in a series of filesystem paths:
>
>         o firmware_class path module parameter or kernel boot param
>         o /lib/firmware/updates/UTS_RELEASE
>         o /lib/firmware/updates
>         o /lib/firmware/UTS_RELEASE
>         o /lib/firmware

Why would that be weird? The driver is requesting firmware with a path
prefix, so the firmware location has to match... Doesn't matter if it's
in the filesystem or builtin.

> ----
>
> Nevertheless, I tried the same path prefix as in your kconfig, but the
> compilation fails, which does not surprise me since the ath10k blobs are
> not located at that path:

Well you'd need to fix that :)

>    UPD     drivers/base/firmware_loader/builtin/regulatory.db.gen.S
>    UPD drivers/base/firmware_loader/builtin/regulatory.db.p7s.gen.S
> make[4]: *** No rule to make target 
> '/srv/fw/ath10k/QCA988X/hw2.0/board.bin', needed by

Based on that error message, you'd need to do something like:

mkdir -p /srv/fw/ath10k/QCA988X/hw2.0
mv /srv/fw/{board.bin,firmware-5.bin} /srv/fw/ath10k/QCA988X/hw2.0
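
A quick way to double-check the layout before rebuilding could look like
the sketch below. It assumes only that every CONFIG_EXTRA_FIRMWARE entry
is resolved relative to CONFIG_EXTRA_FIRMWARE_DIR, as the make error
suggests; the helper name missing_extra_firmware is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list CONFIG_EXTRA_FIRMWARE entries that are missing
# below the directory configured as CONFIG_EXTRA_FIRMWARE_DIR.
# Usage: missing_extra_firmware <firmware dir> <name>...
missing_extra_firmware() {
    fw_dir=$1
    shift
    status=0
    for name in "$@"; do
        # Each entry must exist at <firmware dir>/<name>, prefix included.
        if [ ! -f "$fw_dir/$name" ]; then
            echo "missing: $fw_dir/$name"
            status=1
        fi
    done
    return $status
}
```

For the setup above that would be something like
"missing_extra_firmware /srv/fw ath10k/QCA988X/hw2.0/board.bin
ath10k/QCA988X/hw2.0/firmware-5.bin regulatory.db regulatory.db.p7s";
it prints nothing and returns 0 once all four files are in place.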

> 'drivers/base/firmware_loader/builtin/ath10k/QCA988X/hw2.0/board.bin.gen.o'. 
> Stop.
> make[4]: *** Waiting for unfinished jobs....
>    UPD 
> drivers/base/firmware_loader/builtin/ath10k/QCA988X/hw2.0/board.bin.gen.S
> make[3]: *** [scripts/Makefile.build:500: 
> drivers/base/firmware_loader/builtin] Error 2
> make[2]: *** [scripts/Makefile.build:500: drivers/base/firmware_loader] 
> Error 2
> make[1]: *** [scripts/Makefile.build:500: drivers/base] Error 2
> make[1]: *** Waiting for unfinished jobs....
> make: *** [Makefile:1799: drivers] Error 2
> make: *** Waiting for unfinished jobs....
>
> I suspect that, since you are booting the kernel directly from your build
> box over tftp, it accesses the ath10k firmware blobs on the build box.

Yes, obviously it's reading the firmware blobs at build time from the
location on the build box, then embedding them in the kernel image,
which is then served over tftp to the Omnia. It's not loading anything
from the build box after that (how would that work?)

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-27 18:56   ` Toke Høiland-Jørgensen
@ 2020-10-28 13:36     ` Toke Høiland-Jørgensen
  2020-10-28 14:42       ` Bjorn Helgaas
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-28 13:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas, vtolkm

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Bjorn Helgaas <helgaas@kernel.org> writes:
>
>> [+cc vtolkm]
>>
>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>> Hi everyone
>>> 
>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>> the resource request fix[0] applied on top.
>>> 
>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>> up. But I'm still getting initialisation errors like these:
>>> 
>>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>> 
>>> and the WiFi drivers fail to initialise with what appears to me to be
>>> errors related to the bus rather than to the drivers themselves:
>>> 
>>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>> 
>>> lspci looks OK, though:
>>> 
>>> # lspci
>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>> 
>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>> did I miss something in my config or other initialisation? I've tried
>>> with both the stock u-boot distributed with the board, and with an
>>> upstream u-boot from latest master; doesn't seem to make any different.
>>
>> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> don't think we have a fix yet.
>
> Yes! Turning that off does indeed help! Thanks a bunch :)
>
> You mention that bisecting this would be helpful - I can try that
> tomorrow; any idea when this was last working?

OK, so I tried to bisect this, but, erm, I couldn't find a working
revision to start from? I went all the way back to 4.10 (which is the
first version to include the device tree file for the Omnia), and even
on that, the wireless cards were failing to initialise with ASPM
enabled...

Happy to run other tests, but I think I'm going to need some pointers -
the PCI subsystem is not my home turf :)

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 13:36     ` Toke Høiland-Jørgensen
@ 2020-10-28 14:42       ` Bjorn Helgaas
  2020-10-28 15:08         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Bjorn Helgaas @ 2020-10-28 14:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas, vtolkm

On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> Toke Høiland-Jørgensen <toke@redhat.com> writes:
> 
> > Bjorn Helgaas <helgaas@kernel.org> writes:
> >
> >> [+cc vtolkm]
> >>
> >> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> >>> Hi everyone
> >>> 
> >>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> >>> having some trouble getting the PCI bus to work correctly. Specifically,
> >>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> >>> the resource request fix[0] applied on top.
> >>> 
> >>> The kernel boots fine, and the patch in [0] makes the PCI devices show
> >>> up. But I'm still getting initialisation errors like these:
> >>> 
> >>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> >>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> >>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >>> 
> >>> and the WiFi drivers fail to initialise with what appears to me to be
> >>> errors related to the bus rather than to the drivers themselves:
> >>> 
> >>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> >>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> >>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> >>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> >>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> >>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> >>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> >>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> >>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> >>> 
> >>> lspci looks OK, though:
> >>> 
> >>> # lspci
> >>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> >>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> >>> 
> >>> Does anyone have any clue what could be going on here? Is this a bug, or
> >>> did I miss something in my config or other initialisation? I've tried
> >>> with both the stock u-boot distributed with the board, and with an
> >>> upstream u-boot from latest master; doesn't seem to make any different.
> >>
> >> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> >> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> >> don't think we have a fix yet.
> >
> > Yes! Turning that off does indeed help! Thanks a bunch :)
> >
> > You mention that bisecting this would be helpful - I can try that
> > tomorrow; any idea when this was last working?
> 
> OK, so I tried to bisect this, but, erm, I couldn't find a working
> revision to start from? I went all the way back to 4.10 (which is the
> first version to include the device tree file for the Omnia), and even
> on that, the wireless cards were failing to initialise with ASPM
> enabled...

I have no personal experience with this device; all I know is that the
bugzilla suggests that it worked in v5.4, which isn't much help.

Possibly the apparent regression was really a .config change, i.e.,
CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
"worked" but got enabled later and it started failing?

Maybe the debug patch below would be worth trying to see if it makes
any difference?  If it *does* help, try omitting the first hunk to see
if we just need to apply the quirk_enable_clear_retrain_link() quirk.

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..afe7fa1d54d6 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -103,7 +103,7 @@ static const char *policy_str[] = {
 	[POLICY_POWER_SUPERSAVE] = "powersupersave"
 };
 
-#define LINK_RETRAIN_TIMEOUT HZ
+#define LINK_RETRAIN_TIMEOUT (10*HZ)
 
 static int policy_to_aspm_state(struct pcie_link_state *link)
 {
@@ -201,7 +201,7 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
 	reg16 |= PCI_EXP_LNKCTL_RL;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-	if (parent->clear_retrain_link) {
+	if (1 || parent->clear_retrain_link) {
 		/*
 		 * Due to an erratum in some devices the Retrain Link bit
 		 * needs to be cleared again manually to allow the link
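
To compare link state with and without the patch, something like the
following sketch might help. It only filters "lspci -vv" output down to
the device headers and the LnkCap/LnkCtl lines that mention ASPM; the
helper name aspm_summary is hypothetical, and the exact lspci output
format is an assumption that may differ between pciutils versions.

```shell
#!/bin/sh
# Hypothetical helper: reduce `lspci -vv` output (read from stdin) to the
# device header lines and the link capability/control lines mentioning ASPM.
aspm_summary() {
    # Device headers start in column 0 with an optional domain and a
    # bus:dev.fn address; the LnkCap/LnkCtl lines are indented below them.
    grep -E '^([0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] |Lnk(Cap|Ctl):.*ASPM'
}
```

Usage would be "lspci -vv | aspm_summary", run as root so the capability
list is readable. Booting with pcie_aspm=off on the kernel command line
should also give a comparison point without rebuilding.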

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 14:42       ` Bjorn Helgaas
@ 2020-10-28 15:08         ` Toke Høiland-Jørgensen
  2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-29 15:12           ` Rob Herring
  0 siblings, 2 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-28 15:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas, vtolkm

Bjorn Helgaas <helgaas@kernel.org> writes:

> On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> 
>> > Bjorn Helgaas <helgaas@kernel.org> writes:
>> >
>> >> [+cc vtolkm]
>> >>
>> >> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> >>> Hi everyone
>> >>> 
>> >>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> >>> having some trouble getting the PCI bus to work correctly. Specifically,
>> >>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> >>> the resource request fix[0] applied on top.
>> >>> 
>> >>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>> >>> up. But I'm still getting initialisation errors like these:
>> >>> 
>> >>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> >>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> >>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> >>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> >>> 
>> >>> and the WiFi drivers fail to initialise with what appears to me to be
>> >>> errors related to the bus rather than to the drivers themselves:
>> >>> 
>> >>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> >>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> >>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> >>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> >>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> >>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> >>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> >>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> >>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>> >>> 
>> >>> lspci looks OK, though:
>> >>> 
>> >>> # lspci
>> >>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> >>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> >>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> >>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> >>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>> >>> 
>> >>> Does anyone have any clue what could be going on here? Is this a bug, or
>> >>> did I miss something in my config or other initialisation? I've tried
>> >>> with both the stock u-boot distributed with the board, and with an
>> >>> upstream u-boot from latest master; doesn't seem to make any different.
>> >>
>> >> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>> >> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> >> don't think we have a fix yet.
>> >
>> > Yes! Turning that off does indeed help! Thanks a bunch :)
>> >
>> > You mention that bisecting this would be helpful - I can try that
>> > tomorrow; any idea when this was last working?
>> 
>> OK, so I tried to bisect this, but, erm, I couldn't find a working
>> revision to start from? I went all the way back to 4.10 (which is the
>> first version to include the device tree file for the Omnia), and even
>> on that, the wireless cards were failing to initialise with ASPM
>> enabled...
>
> I have no personal experience with this device; all I know is that the
> bugzilla suggests that it worked in v5.4, which isn't much help.
>
> Possibly the apparent regression was really a .config change, i.e.,
> CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
> "worked" but got enabled later and it started failing?

Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
default and only turns it on for specific targets. So I guess that it's
most likely that this has never worked...
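
For anyone wanting to check which way a given build was configured, a
minimal sketch, assuming the .config of that build is at hand (the helper
name pcieaspm_setting is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report how CONFIG_PCIEASPM is set in a kernel .config.
# Prints "y" if it is enabled, "n" if it is disabled or not mentioned.
pcieaspm_setting() {
    # Anchor the match so CONFIG_PCIEASPM_DEFAULT etc. are not counted.
    if grep -q '^CONFIG_PCIEASPM=y$' "$1"; then
        echo y
    else
        echo n
    fi
}
```

Running "pcieaspm_setting .config" against the config of the v5.4 build
from the bugzilla report and against a failing build would show whether
the two actually differ in this option.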

> Maybe the debug patch below would be worth trying to see if it makes
> any difference?  If it *does* help, try omitting the first hunk to see
> if we just need to apply the quirk_enable_clear_retrain_link() quirk.

Tried, doesn't help...

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 15:08         ` Toke Høiland-Jørgensen
@ 2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-28 23:16             ` Bjorn Helgaas
  2020-10-29  1:21             ` Marek Behun
  2020-10-29 15:12           ` Rob Herring
  1 sibling, 2 replies; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-28 16:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Bjorn Helgaas
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas



On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
>
>> On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>>
>>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>>
>>>>> [+cc vtolkm]
>>>>>
>>>>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>> Hi everyone
>>>>>>
>>>>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>>>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>>>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>>>>> the resource request fix[0] applied on top.
>>>>>>
>>>>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>>>>> up. But I'm still getting initialisation errors like these:
>>>>>>
>>>>>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>>>>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>>>>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>
>>>>>> and the WiFi drivers fail to initialise with what appears to me to be
>>>>>> errors related to the bus rather than to the drivers themselves:
>>>>>>
>>>>>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>>>>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>>>>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>>>>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>>>>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>>>>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>>>>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>>>>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>>>>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>>>>>
>>>>>> lspci looks OK, though:
>>>>>>
>>>>>> # lspci
>>>>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>>>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>>>>>
>>>>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>>>>> did I miss something in my config or other initialisation? I've tried
>>>>>> with both the stock u-boot distributed with the board, and with an
>>>>>> upstream u-boot from latest master; doesn't seem to make any different.
>>>>> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>>>>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>>>>> don't think we have a fix yet.
>>>> Yes! Turning that off does indeed help! Thanks a bunch :)
>>>>
>>>> You mention that bisecting this would be helpful - I can try that
>>>> tomorrow; any idea when this was last working?
>>> OK, so I tried to bisect this, but, erm, I couldn't find a working
>>> revision to start from? I went all the way back to 4.10 (which is the
>>> first version to include the device tree file for the Omnia), and even
>>> on that, the wireless cards were failing to initialise with ASPM
>>> enabled...
>> I have no personal experience with this device; all I know is that the
>> bugzilla suggests that it worked in v5.4, which isn't much help.
>>
>> Possibly the apparent regression was really a .config change, i.e.,
>> CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
>> "worked" but got enabled later and it started failing?
> Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
> default and only turns it on for specific targets. So I guess that it's
> most likely that this has never worked...
>
>> Maybe the debug patch below would be worth trying to see if it makes
>> any difference?  If it *does* help, try omitting the first hunk to see
>> if we just need to apply the quirk_enable_clear_retrain_link() quirk.
> Tried, doesn't help...
>
> -Toke
>

Found this patch

https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch 


that mentions the Compex WLE900VX card, which, judging by the verbose lspci
output in the bug tracker, seems to be the affected device.






^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-28 23:16             ` Bjorn Helgaas
  2020-10-29 10:09               ` Pali Rohár
                                 ` (2 more replies)
  2020-10-29  1:21             ` Marek Behun
  1 sibling, 3 replies; 62+ messages in thread
From: Bjorn Helgaas @ 2020-10-28 23:16 UTC (permalink / raw)
  To: vtolkm, Toke Høiland-Jørgensen
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Pali Rohár, Marek Behún, Thomas Petazzoni,
	Jason Cooper

[+cc Pali, Marek, Thomas, Jason]

On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
> On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
> > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
> > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > > 
> > > > > > [+cc vtolkm]
> > > > > > 
> > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> > > > > > > Hi everyone
> > > > > > > 
> > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
> > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> > > > > > > the resource request fix[0] applied on top.
> > > > > > > 
> > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
> > > > > > > up. But I'm still getting initialisation errors like these:
> > > > > > > 
> > > > > > > [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> > > > > > > [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> > > > > > > [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > 
> > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
> > > > > > > errors related to the bus rather than to the drivers themselves:
> > > > > > > 
> > > > > > > [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> > > > > > > [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> > > > > > > [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> > > > > > > [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> > > > > > > [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> > > > > > > [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> > > > > > > [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > > > > > > [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > > > > > [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > > > > > 
> > > > > > > lspci looks OK, though:
> > > > > > > 
> > > > > > > # lspci
> > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> > > > > > > 
> > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
> > > > > > > did I miss something in my config or other initialisation? I've tried
> > > > > > > with both the stock u-boot distributed with the board, and with an
> > > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
> > > > > > Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> > > > > > don't think we have a fix yet.
> > > > > Yes! Turning that off does indeed help! Thanks a bunch :)
> > > > > 
> > > > > You mention that bisecting this would be helpful - I can try that
> > > > > tomorrow; any idea when this was last working?
> > > > OK, so I tried to bisect this, but, erm, I couldn't find a working
> > > > revision to start from? I went all the way back to 4.10 (which is the
> > > > first version to include the device tree file for the Omnia), and even
> > > > on that, the wireless cards were failing to initialise with ASPM
> > > > enabled...
> > > I have no personal experience with this device; all I know is that the
> > > bugzilla suggests that it worked in v5.4, which isn't much help.
> > > 
> > > Possibly the apparent regression was really a .config change, i.e.,
> > > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
> > > "worked" but got enabled later and it started failing?
> > Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
> > default and only turns it on for specific targets. So I guess that it's
> > most likely that this has never worked...
> > 
> > > Maybe the debug patch below would be worth trying to see if it makes
> > > any difference?  If it *does* help, try omitting the first hunk to see
> > > if we just need to apply the quirk_enable_clear_retrain_link() quirk.
> > Tried, doesn't help...
> > 
> > -Toke
> 
> Found this patch
> 
> https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
> 
> that mentions the Compex WLE900VX card, which, judging by the verbose lspci
> output in the bug tracker, seems to be the affected device.

Interesting.  Indeed, the Compex WLE900VX card seems to have the
Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
the same device in it.

The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
for aardvark, so of course doesn't help mvebu.

PCIe hardware is supposed to automatically negotiate the highest link
speed supported by both ends.  But software *is* allowed to set an
upper limit (the Target Link Speed in Link Control 2).  If we initiate
a retrain and the link doesn't come back up, I wonder if we should try
to help the hardware out by using Target Link Speed to limit to a
lower speed and attempting another retrain, something like this hacky
patch: (please collect the dmesg log if you try this)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..fb6e13532a2c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -192,12 +192,42 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
 	link->clkpm_disable = blacklist ? 1 : 0;
 }
 
+#define PCI_EXP_LNKCAP2_SLS	0x000000fe
+
+static int decrease_tls(struct pci_dev *pdev)
+{
+	u32 lnkcap2;
+	u16 lnkctl2, tls;
+
+	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP2, &lnkcap2);
+
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
+	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
+
+	pci_info(pdev, "lnkcap2 %#010x sls %#04x lnkctl2 %#06x tls %#03x\n",
+		 lnkcap2, (lnkcap2 & PCI_EXP_LNKCAP2_SLS) >> 1,
+		 lnkctl2, tls);
+
+	if (tls < 2)
+		return -EINVAL;
+
+	tls--;
+	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tls);
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
+	pci_info(pdev, "lnkctl2 %#010x new tls %#03x\n",
+		 lnkctl2, tls);
+
+	return 0;
+}
+
 static bool pcie_retrain_link(struct pcie_link_state *link)
 {
 	struct pci_dev *parent = link->pdev;
 	unsigned long end_jiffies;
 	u16 reg16;
 
+top:
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
 	reg16 |= PCI_EXP_LNKCTL_RL;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
@@ -216,10 +246,14 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
 	do {
 		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
-			break;
+			return true;	/* success */
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
-	return !(reg16 & PCI_EXP_LNKSTA_LT);
+
+	if (decrease_tls(parent))
+		return false;	/* can't decrease any more */
+
+	goto top;
 }
 
 /*

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-28 23:16             ` Bjorn Helgaas
@ 2020-10-29  1:21             ` Marek Behun
  1 sibling, 0 replies; 62+ messages in thread
From: Marek Behun @ 2020-10-29  1:21 UTC (permalink / raw)
  To: ™֟☻̭҇ Ѽ ҉ ®
  Cc: vtolkm, Toke Høiland-Jørgensen, Bjorn Helgaas,
	linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas

On Wed, 28 Oct 2020 16:40:00 +0000
"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> wrote:

> Found this patch
> 
> https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch 
> 
> 
> that mentions the Compex WLE900VX card, which, judging by the verbose lspci
> output in the bug tracker, seems to be the affected device.

It seems the mvebu driver in combination with the Compex card is broken in
a similar way to aardvark... :) Hopefully Pali will want to look into this.

Marek

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 23:16             ` Bjorn Helgaas
@ 2020-10-29 10:09               ` Pali Rohár
  2020-10-29 10:56                 ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-29 11:12                 ` Toke Høiland-Jørgensen
  2020-10-29 10:41               ` Toke Høiland-Jørgensen
  2020-10-30 11:23               ` Pali Rohár
  2 siblings, 2 replies; 62+ messages in thread
From: Pali Rohár @ 2020-10-29 10:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: vtolkm, Toke Høiland-Jørgensen, linux-pci,
	linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Marek Behún, Thomas Petazzoni, Jason Cooper

Hello!

On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
> [+cc Pali, Marek, Thomas, Jason]
> 
> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
> > On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
> > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> > > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
> > > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > > > 
> > > > > > > [+cc vtolkm]
> > > > > > > 
> > > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> > > > > > > > Hi everyone
> > > > > > > > 
> > > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> > > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
> > > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> > > > > > > > the resource request fix[0] applied on top.
> > > > > > > > 
> > > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
> > > > > > > > up. But I'm still getting initialisation errors like these:
> > > > > > > > 
> > > > > > > > [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> > > > > > > > [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > > [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> > > > > > > > [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > > 
> > > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
> > > > > > > > errors related to the bus rather than to the drivers themselves:
> > > > > > > > 
> > > > > > > > [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> > > > > > > > [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> > > > > > > > [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> > > > > > > > [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> > > > > > > > [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> > > > > > > > [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> > > > > > > > [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > > > > > > > [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > > > > > > [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > > > > > > 
> > > > > > > > lspci looks OK, though:
> > > > > > > > 
> > > > > > > > # lspci
> > > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> > > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> > > > > > > > 
> > > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
> > > > > > > > did I miss something in my config or other initialisation? I've tried
> > > > > > > > with both the stock u-boot distributed with the board, and with an
> > > > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
> > > > > > > Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> > > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> > > > > > > don't think we have a fix yet.
> > > > > > Yes! Turning that off does indeed help! Thanks a bunch :)

I have been testing the mainline kernel on Turris Omnia with the two
default PCIe cards (WLE200 and WLE900) and it worked fine. But I do not
know whether I had ASPM enabled or not.

So it is working fine for you when CONFIG_PCIEASPM is disabled, and the
whole issue occurs only when CONFIG_PCIEASPM is enabled?

> > > > > > 
> > > > > > You mention that bisecting this would be helpful - I can try that
> > > > > > tomorrow; any idea when this was last working?
> > > > > OK, so I tried to bisect this, but, erm, I couldn't find a working
> > > > > revision to start from? I went all the way back to 4.10 (which is the
> > > > > first version to include the device tree file for the Omnia), and even
> > > > > on that, the wireless cards were failing to initialise with ASPM
> > > > > enabled...
> > > > I have no personal experience with this device; all I know is that the
> > > > bugzilla suggests that it worked in v5.4, which isn't much help.
> > > > 
> > > > Possibly the apparent regression was really a .config change, i.e.,
> > > > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
> > > > "worked" but got enabled later and it started failing?
> > > Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
> > > default and only turns it on for specific targets. So I guess that it's
> > > most likely that this has never worked...
> > > 
> > > > Maybe the debug patch below would be worth trying to see if it makes
> > > > any difference?  If it *does* help, try omitting the first hunk to see
> > > > if we just need to apply the quirk_enable_clear_retrain_link() quirk.
> > > Tried, doesn't help...
> > > 
> > > -Toke
> > 
> > Found this patch
> > 
> > https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
> > 
> > that mentions the Compex WLE900VX card, which, judging by the verbose lspci
> > output in the bug tracker, seems to be the affected device.
> 
> Interesting.  Indeed, the Compex WLE900VX card seems to have the
> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
> the same device in it.
> 
> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
> for aardvark, so of course doesn't help mvebu.

That patch is for the aardvark driver, the PCI controller on the Armada
3720 SoC. We found that a lot of people were patching the aardvark driver
to explicitly set only PCIe gen 1 mode in an internal aardvark register,
as the default value (gen 2) did not work correctly with several Compex
cards. We then created the above patch, which forces PCIe gen 1 mode only
for gen 1 cards, and it stabilized the Compex cards. I think there is a HW
bug in that SoC which causes the PCI controller to not work correctly.

This patch is needed for Espressobin and Turris MOX. I have been testing
it with CONFIG_PCIEASPM=y on both devices and basically all tested cards
worked fine.

> PCIe hardware is supposed to automatically negotiate the highest link
> speed supported by both ends.  But software *is* allowed to set an
> upper limit (the Target Link Speed in Link Control 2).  If we initiate
> a retrain and the link doesn't come back up, I wonder if we should try
> to help the hardware out by using Target Link Speed to limit to a
> lower speed and attempting another retrain, something like this hacky
> patch: (please collect the dmesg log if you try this)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index ac0557a305af..fb6e13532a2c 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -192,12 +192,42 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>  	link->clkpm_disable = blacklist ? 1 : 0;
>  }
>  
> +#define PCI_EXP_LNKCAP2_SLS	0x000000fe
> +
> +static int decrease_tls(struct pci_dev *pdev)
> +{
> +	u32 lnkcap2;
> +	u16 lnkctl2, tls;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP2, &lnkcap2);
> +
> +	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> +
> +	pci_info(pdev, "lnkcap2 %#010x sls %#04x lnkctl2 %#06x tls %#03x\n",
> +		 lnkcap2, (lnkcap2 & PCI_EXP_LNKCAP2_SLS) >> 1,
> +		 lnkctl2, tls);
> +
> +	if (tls < 2)
> +		return -EINVAL;
> +
> +	tls--;
> +	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL2,
> +					   PCI_EXP_LNKCTL2_TLS, tls);
> +	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
> +	pci_info(pdev, "lnkctl2 %#010x new tls %#03x\n",
> +		 lnkctl2, tls);
> +
> +	return 0;
> +}
> +
>  static bool pcie_retrain_link(struct pcie_link_state *link)
>  {
>  	struct pci_dev *parent = link->pdev;
>  	unsigned long end_jiffies;
>  	u16 reg16;
>  
> +top:
>  	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>  	reg16 |= PCI_EXP_LNKCTL_RL;
>  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> @@ -216,10 +246,14 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>  	do {
>  		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
>  		if (!(reg16 & PCI_EXP_LNKSTA_LT))
> -			break;
> +			return true;	/* success */
>  		msleep(1);
>  	} while (time_before(jiffies, end_jiffies));
> -	return !(reg16 & PCI_EXP_LNKSTA_LT);
> +
> +	if (decrease_tls(parent))
> +		return false;	/* can't decrease any more */
> +
> +	goto top;
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 23:16             ` Bjorn Helgaas
  2020-10-29 10:09               ` Pali Rohár
@ 2020-10-29 10:41               ` Toke Høiland-Jørgensen
  2020-10-29 11:18                 ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-30 11:23               ` Pali Rohár
  2 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-29 10:41 UTC (permalink / raw)
  To: Bjorn Helgaas, vtolkm
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Pali Rohár, Marek Behún, Thomas Petazzoni,
	Jason Cooper

Bjorn Helgaas <helgaas@kernel.org> writes:

> [+cc Pali, Marek, Thomas, Jason]
>
> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>> On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>> > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > > > 
>> > > > > > [+cc vtolkm]
>> > > > > > 
>> > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > > > > Hi everyone
>> > > > > > > 
>> > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
>> > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> > > > > > > the resource request fix[0] applied on top.
>> > > > > > > 
>> > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
>> > > > > > > up. But I'm still getting initialisation errors like these:
>> > > > > > > 
>> > > > > > > [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> > > > > > > [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> > > > > > > [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > 
>> > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
>> > > > > > > errors related to the bus rather than to the drivers themselves:
>> > > > > > > 
>> > > > > > > [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> > > > > > > [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> > > > > > > [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> > > > > > > [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> > > > > > > [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> > > > > > > [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> > > > > > > [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> > > > > > > [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> > > > > > > [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>> > > > > > > 
>> > > > > > > lspci looks OK, though:
>> > > > > > > 
>> > > > > > > # lspci
>> > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>> > > > > > > 
>> > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
>> > > > > > > did I miss something in my config or other initialisation? I've tried
>> > > > > > > with both the stock u-boot distributed with the board, and with an
>> > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
>> > > > > > Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>> > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> > > > > > don't think we have a fix yet.
>> > > > > Yes! Turning that off does indeed help! Thanks a bunch :)
>> > > > > 
>> > > > > You mention that bisecting this would be helpful - I can try that
>> > > > > tomorrow; any idea when this was last working?
>> > > > OK, so I tried to bisect this, but, erm, I couldn't find a working
>> > > > revision to start from? I went all the way back to 4.10 (which is the
>> > > > first version to include the device tree file for the Omnia), and even
>> > > > on that, the wireless cards were failing to initialise with ASPM
>> > > > enabled...
>> > > I have no personal experience with this device; all I know is that the
>> > > bugzilla suggests that it worked in v5.4, which isn't much help.
>> > > 
>> > > Possibly the apparent regression was really a .config change, i.e.,
>> > > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
>> > > "worked" but got enabled later and it started failing?
>> > Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
>> > default and only turns it on for specific targets. So I guess that it's
>> > most likely that this has never worked...
>> > 
>> > > Maybe the debug patch below would be worth trying to see if it makes
>> > > any difference?  If it *does* help, try omitting the first hunk to see
>> > > if we just need to apply the quirk_enable_clear_retrain_link() quirk.
>> > Tried, doesn't help...
>> > 
>> > -Toke
>> 
>> Found this patch
>> 
>> https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
>> 
>> that mentions the Compex WLE900VX card, which, judging by the verbose lspci
>> output in the bug tracker, seems to be the affected device.
>
> Interesting.  Indeed, the Compex WLE900VX card seems to have the
> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
> the same device in it.
>
> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
> for aardvark, so of course doesn't help mvebu.
>
> PCIe hardware is supposed to automatically negotiate the highest link
> speed supported by both ends.  But software *is* allowed to set an
> upper limit (the Target Link Speed in Link Control 2).  If we initiate
> a retrain and the link doesn't come back up, I wonder if we should try
> to help the hardware out by using Target Link Speed to limit to a
> lower speed and attempting another retrain, something like this hacky
> patch: (please collect the dmesg log if you try this)

Well, I tried it, but don't see any of the 'lnkcap2' output from that
new function:

[    1.545853] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    1.545878] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    1.545894] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    1.545907] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    1.545920] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    1.545933] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.545945] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.545958] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.545970] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.545982] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.545994] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.546006] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.546014] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.546181] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    1.546190] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.546197] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    1.546204] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    1.546210] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    1.546216] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    1.546220] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    1.546225] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    1.546294] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    1.546308] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546482] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.546495] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546643] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.546656] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.547379] PCI: bus0: Fast back to back transfers disabled
[    1.547387] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547394] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547402] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547484] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.547507] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.547615] pci 0000:01:00.0: supports D1
[    1.547620] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.547730] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.631937] PCI: bus2: Fast back to back transfers enabled
[    1.631945] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.632655] PCI: bus3: Fast back to back transfers enabled
[    1.632662] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.632694] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.632702] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.632710] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.632718] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.632726] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
[    1.632734] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.632741] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632746] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632752] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.632760] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.632769] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.632776] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632782] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632788] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.632793] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.632800] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.632807] pci 0000:00:03.0: PCI bridge to [bus 03]

(and then later, still):
[    3.476364] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.477542] ata1: SATA link down (SStatus 0 SControl 300)
[    3.482126] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
[    3.487487] ata2: SATA link down (SStatus 0 SControl 300)
[    3.493379] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.505891] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.513325] ath9k 0000:01:00.0: Failed to initialize device
[    3.518933] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.524862] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.531904] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.537590] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.577436] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.583948] ath10k_pci: probe of 0000:02:00.0 failed with error -110
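
For anyone reading the log above: a readback of 0xffffffff is what a config
read typically returns when the device does not answer, which fits the later
"config space inaccessible" message. Below is a minimal Python sketch of how
the two numbers in an "error updating" line can be interpreted. The helper
names are made up for illustration and this is not the kernel's code.

```python
# Illustrative helpers (hypothetical names, not kernel code) for reading
# "BAR n: error updating (expected != actual)" log lines.
ALL_ONES = 0xFFFFFFFF


def describe_bar_readback(expected: int, actual: int) -> str:
    """Classify the value read back after writing a BAR."""
    if actual == ALL_ONES:
        # A config read that gets no response typically completes as all-ones.
        return "no response from device (all-ones read)"
    if actual == expected:
        return "BAR updated correctly"
    return "device responded with an unexpected value"


def decode_mem_bar(value: int) -> dict:
    """Split the low dword of a memory BAR into address and type bits."""
    return {
        "is_io": bool(value & 0x1),                # bit 0: 0 = memory, 1 = I/O
        "is_64bit": ((value >> 1) & 0x3) == 0x2,   # bits 2:1 == 10b means 64-bit
        "prefetchable": bool(value & 0x8),         # bit 3
        "address": value & 0xFFFFFFF0,             # bits 31:4
    }
```

With the values from the log, 0xe0000004 decodes to a 64-bit,
non-prefetchable memory BAR at 0xe0000000, and the 0xffffffff readback falls
into the "no response" case.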


-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 10:09               ` Pali Rohár
@ 2020-10-29 10:56                 ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-29 11:12                 ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-29 10:56 UTC (permalink / raw)
  To: Pali Rohár, Bjorn Helgaas
  Cc: Toke Høiland-Jørgensen, linux-pci, linux-arm-kernel,
	Rob Herring, Ilias Apalodimas, Marek Behún,
	Thomas Petazzoni, Jason Cooper

On 29/10/2020 11:09, Pali Rohár wrote:
> Hello!
>
> On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
>> [+cc Pali, Marek, Thomas, Jason]
>>
>> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>>> On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>>> On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>>>>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>>>>>
>>>>>>>> [+cc vtolkm]
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Hi everyone
>>>>>>>>>
>>>>>>>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>>>>>>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>>>>>>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>>>>>>>> the resource request fix[0] applied on top.
>>>>>>>>>
>>>>>>>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>>>>>>>> up. But I'm still getting initialisation errors like these:
>>>>>>>>>
>>>>>>>>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>>>>>>>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>>>>>>>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>>
>>>>>>>>> and the WiFi drivers fail to initialise with what appears to me to be
>>>>>>>>> errors related to the bus rather than to the drivers themselves:
>>>>>>>>>
>>>>>>>>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>>>>>>>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>>>>>>>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>>>>>>>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>>>>>>>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>>>>>>>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>>>>>>>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>>>>>>>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>>>>>>>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>>>>>>>>
>>>>>>>>> lspci looks OK, though:
>>>>>>>>>
>>>>>>>>> # lspci
>>>>>>>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>>>>>>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>>>>>>>>
>>>>>>>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>>>>>>>> did I miss something in my config or other initialisation? I've tried
>>>>>>>>> with both the stock u-boot distributed with the board, and with an
>>>>>>>>> upstream u-boot from latest master; doesn't seem to make any difference.
>>>>>>>> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>>>>>>>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>>>>>>>> don't think we have a fix yet.
>>>>>>> Yes! Turning that off does indeed help! Thanks a bunch :)
> I have been testing mainline kernel on Turris Omnia with two PCIe
> default cards (WLE200 and WLE900) and it worked fine. But I do not know
> if I had ASPM enabled or not.
>
> So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
> issue is only when CONFIG_PCIEASPM is enabled?

Yes, that is the gist of it.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 10:09               ` Pali Rohár
  2020-10-29 10:56                 ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-29 11:12                 ` Toke Høiland-Jørgensen
  2020-10-29 19:30                   ` Bjorn Helgaas
  1 sibling, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-29 11:12 UTC (permalink / raw)
  To: Pali Rohár, Bjorn Helgaas
  Cc: vtolkm, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

Pali Rohár <pali@kernel.org> writes:

> Hello!
>
> On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
>> [+cc Pali, Marek, Thomas, Jason]
>> 
>> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>> > On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>> > > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> > > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
>> > > > > > 
>> > > > > > > [+cc vtolkm]
>> > > > > > > 
>> > > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>> > > > > > > > Hi everyone
>> > > > > > > > 
>> > > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>> > > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
>> > > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>> > > > > > > > the resource request fix[0] applied on top.
>> > > > > > > > 
>> > > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
>> > > > > > > > up. But I'm still getting initialisation errors like these:
>> > > > > > > > 
>> > > > > > > > [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>> > > > > > > > [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > > [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>> > > > > > > > [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> > > > > > > > 
>> > > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
>> > > > > > > > errors related to the bus rather than to the drivers themselves:
>> > > > > > > > 
>> > > > > > > > [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> > > > > > > > [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>> > > > > > > > [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>> > > > > > > > [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>> > > > > > > > [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>> > > > > > > > [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>> > > > > > > > [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>> > > > > > > > [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>> > > > > > > > [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>> > > > > > > > 
>> > > > > > > > lspci looks OK, though:
>> > > > > > > > 
>> > > > > > > > # lspci
>> > > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>> > > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>> > > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>> > > > > > > > 
>> > > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
>> > > > > > > > did I miss something in my config or other initialisation? I've tried
>> > > > > > > > with both the stock u-boot distributed with the board, and with an
>> > > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
>> > > > > > > Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>> > > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>> > > > > > > don't think we have a fix yet.
>> > > > > > Yes! Turning that off does indeed help! Thanks a bunch :)
>
> I have been testing mainline kernel on Turris Omnia with two PCIe
> default cards (WLE200 and WLE900) and it worked fine. But I do not know
> if I had ASPM enabled or not.
>
> So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
> issue is only when CONFIG_PCIEASPM is enabled?

Yup, exactly. And I'm also currently testing with the default WLE200/900
cards... I just tried sticking an MT76-based WiFi card into the third
PCI slot, and that doesn't come up either when I enable PCIEASPM.

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 10:41               ` Toke Høiland-Jørgensen
@ 2020-10-29 11:18                 ` ™֟☻̭҇ Ѽ ҉ ®
  0 siblings, 0 replies; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-29 11:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Bjorn Helgaas
  Cc: linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Pali Rohár, Marek Behún, Thomas Petazzoni,
	Jason Cooper


On 29/10/2020 11:41, Toke Høiland-Jørgensen wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
>
>> [+cc Pali, Marek, Thomas, Jason]
>>
>> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
>>> On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
>>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>>> On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>>>>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>>>>>
>>>>>>>> [+cc vtolkm]
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Hi everyone
>>>>>>>>>
>>>>>>>>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
>>>>>>>>> having some trouble getting the PCI bus to work correctly. Specifically,
>>>>>>>>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
>>>>>>>>> the resource request fix[0] applied on top.
>>>>>>>>>
>>>>>>>>> The kernel boots fine, and the patch in [0] makes the PCI devices show
>>>>>>>>> up. But I'm still getting initialisation errors like these:
>>>>>>>>>
>>>>>>>>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>>>>>>>>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
>>>>>>>>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>>>>>>>>>
>>>>>>>>> and the WiFi drivers fail to initialise with what appears to me to be
>>>>>>>>> errors related to the bus rather than to the drivers themselves:
>>>>>>>>>
>>>>>>>>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>>>>>>>>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
>>>>>>>>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
>>>>>>>>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
>>>>>>>>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
>>>>>>>>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
>>>>>>>>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>>>>>>>>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
>>>>>>>>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>>>>>>>>>
>>>>>>>>> lspci looks OK, though:
>>>>>>>>>
>>>>>>>>> # lspci
>>>>>>>>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
>>>>>>>>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
>>>>>>>>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
>>>>>>>>>
>>>>>>>>> Does anyone have any clue what could be going on here? Is this a bug, or
>>>>>>>>> did I miss something in my config or other initialisation? I've tried
>>>>>>>>> with both the stock u-boot distributed with the board, and with an
>>>>>>>>> upstream u-boot from latest master; doesn't seem to make any difference.
>>>>>>>> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
>>>>>>>> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
>>>>>>>> don't think we have a fix yet.
>>>>>>> Yes! Turning that off does indeed help! Thanks a bunch :)
>>>>>>>
>>>>>>> You mention that bisecting this would be helpful - I can try that
>>>>>>> tomorrow; any idea when this was last working?
>>>>>> OK, so I tried to bisect this, but, erm, I couldn't find a working
>>>>>> revision to start from? I went all the way back to 4.10 (which is the
>>>>>> first version to include the device tree file for the Omnia), and even
>>>>>> on that, the wireless cards were failing to initialise with ASPM
>>>>>> enabled...
>>>>> I have no personal experience with this device; all I know is that the
>>>>> bugzilla suggests that it worked in v5.4, which isn't much help.
>>>>>
>>>>> Possibly the apparent regression was really a .config change, i.e.,
>>>>> CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
>>>>> "worked" but got enabled later and it started failing?
>>>> Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
>>>> default and only turns it on for specific targets. So I guess that it's
>>>> most likely that this has never worked...
>>>>
>>>>> Maybe the debug patch below would be worth trying to see if it makes
>>>>> any difference?  If it *does* help, try omitting the first hunk to see
>>>>> if we just need to apply the quirk_enable_clear_retrain_link() quirk.
>>>> Tried, doesn't help...
>>>>
>>>> -Toke
>>> Found this patch
>>>
>>> https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
>>>
>>> that mentions the Compex WLE900VX card, which, reading the lspci verbose
>>> output from the bugtracker, seems to be the device being troubled.
>> Interesting.  Indeed, the Compex WLE900VX card seems to have the
>> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
>> the same device in it.
>>
>> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
>> for aardvark, so of course doesn't help mvebu.
>>
>> PCIe hardware is supposed to automatically negotiate the highest link
>> speed supported by both ends.  But software *is* allowed to set an
>> upper limit (the Target Link Speed in Link Control 2).  If we initiate
>> a retrain and the link doesn't come back up, I wonder if we should try
>> to help the hardware out by using Target Link Speed to limit to a
>> lower speed and attempting another retrain, something like this hacky
>> patch: (please collect the dmesg log if you try this)
> Well, I tried it, but don't see any of the 'lnkcap2' output from that
> new function:
>
> [    1.545853] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
> [    1.545878] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
> [    1.545894] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
> [    1.545907] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
> [    1.545920] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
> [    1.545933] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> [    1.545945] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> [    1.545958] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> [    1.545970] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> [    1.545982] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> [    1.545994] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> [    1.546006] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> [    1.546014] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> [    1.546181] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> [    1.546190] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    1.546197] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
> [    1.546204] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
> [    1.546210] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
> [    1.546216] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
> [    1.546220] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> [    1.546225] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
> [    1.546294] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
> [    1.546308] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [    1.546482] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
> [    1.546495] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [    1.546643] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
> [    1.546656] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> [    1.547379] PCI: bus0: Fast back to back transfers disabled
> [    1.547387] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547394] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547402] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [    1.547484] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
> [    1.547507] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
> [    1.547615] pci 0000:01:00.0: supports D1
> [    1.547620] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
> [    1.547730] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
> [    1.631937] PCI: bus2: Fast back to back transfers enabled
> [    1.631945] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
> [    1.632655] PCI: bus3: Fast back to back transfers enabled
> [    1.632662] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
> [    1.632694] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> [    1.632702] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> [    1.632710] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
> [    1.632718] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
> [    1.632726] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
> [    1.632734] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
> [    1.632741] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> [    1.632746] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632752] pci 0000:00:01.0: PCI bridge to [bus 01]
> [    1.632760] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
> [    1.632769] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
> [    1.632776] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> [    1.632782] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> [    1.632788] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
> [    1.632793] pci 0000:00:02.0: PCI bridge to [bus 02]
> [    1.632800] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
> [    1.632807] pci 0000:00:03.0: PCI bridge to [bus 03]
>
> (and then later, still):
> [    3.476364] pci 0000:00:01.0: enabling device (0140 -> 0142)
> [    3.477542] ata1: SATA link down (SStatus 0 SControl 300)
> [    3.482126] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
> [    3.487487] ata2: SATA link down (SStatus 0 SControl 300)
> [    3.493379] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> [    3.505891] ath: phy0: Unable to initialize hardware; initialization status: -95
> [    3.513325] ath9k 0000:01:00.0: Failed to initialize device
> [    3.518933] ath9k: probe of 0000:01:00.0 failed with error -95
> [    3.524862] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> [    3.531904] pci 0000:00:02.0: enabling device (0140 -> 0142)
> [    3.537590] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> [    3.577436] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    3.583948] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>
>
> -Toke
>

Same result on my end; run-tested with next-20201027.

N.B. the node does not boot anymore with next-20201028, but that is
independent of this patch and apparently a separate issue.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 15:08         ` Toke Høiland-Jørgensen
  2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-29 15:12           ` Rob Herring
  1 sibling, 0 replies; 62+ messages in thread
From: Rob Herring @ 2020-10-29 15:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Bjorn Helgaas, PCI, linux-arm-kernel, Ilias Apalodimas, vtolkm

On Wed, Oct 28, 2020 at 10:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Bjorn Helgaas <helgaas@kernel.org> writes:
>
> > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> >> Toke Høiland-Jørgensen <toke@redhat.com> writes:
> >>
> >> > Bjorn Helgaas <helgaas@kernel.org> writes:
> >> >
> >> >> [+cc vtolkm]
> >> >>
> >> >> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> >> >>> Hi everyone
> >> >>>
> >> >>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> >> >>> having some trouble getting the PCI bus to work correctly. Specifically,
> >> >>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> >> >>> the resource request fix[0] applied on top.
> >> >>>
> >> >>> The kernel boots fine, and the patch in [0] makes the PCI devices show
> >> >>> up. But I'm still getting initialisation errors like these:
> >> >>>
> >> >>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> >> >>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >> >>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> >> >>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >> >>>
> >> >>> and the WiFi drivers fail to initialise with what appears to me to be
> >> >>> errors related to the bus rather than to the drivers themselves:
> >> >>>
> >> >>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> >> >>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> >> >>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> >> >>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> >> >>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> >> >>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> >> >>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> >> >>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> >> >>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> >> >>>
> >> >>> lspci looks OK, though:
> >> >>>
> >> >>> # lspci
> >> >>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >> >>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >> >>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >> >>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> >> >>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> >> >>>
> >> >>> Does anyone have any clue what could be going on here? Is this a bug, or
> >> >>> did I miss something in my config or other initialisation? I've tried
> >> >>> with both the stock u-boot distributed with the board, and with an
> >> >>> upstream u-boot from latest master; doesn't seem to make any difference.
> >> >>
> >> >> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> >> >> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> >> >> don't think we have a fix yet.
> >> >
> >> > Yes! Turning that off does indeed help! Thanks a bunch :)
> >> >
> >> > You mention that bisecting this would be helpful - I can try that
> >> > tomorrow; any idea when this was last working?
> >>
> >> OK, so I tried to bisect this, but, erm, I couldn't find a working
> >> revision to start from? I went all the way back to 4.10 (which is the
> >> first version to include the device tree file for the Omnia), and even
> >> on that, the wireless cards were failing to initialise with ASPM
> >> enabled...
> >
> > I have no personal experience with this device; all I know is that the
> > bugzilla suggests that it worked in v5.4, which isn't much help.
> >
> > Possibly the apparent regression was really a .config change, i.e.,
> > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
> > "worked" but got enabled later and it started failing?
>
> Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
> default and only turns it on for specific targets. So I guess that it's
> most likely that this has never worked...

FYI, there's a bugzilla for this:

https://bugzilla.kernel.org/show_bug.cgi?id=209833

Rob

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 11:12                 ` Toke Høiland-Jørgensen
@ 2020-10-29 19:30                   ` Bjorn Helgaas
  2020-10-29 19:56                     ` ™֟☻̭҇ Ѽ ҉ ®
                                       ` (4 more replies)
  0 siblings, 5 replies; 62+ messages in thread
From: Bjorn Helgaas @ 2020-10-29 19:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Pali Rohár, vtolkm, linux-pci, linux-arm-kernel,
	Rob Herring, Ilias Apalodimas, Marek Behún,
	Thomas Petazzoni, Jason Cooper

On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:

> > I have been testing mainline kernel on Turris Omnia with two PCIe
> > default cards (WLE200 and WLE900) and it worked fine. But I do not know
> > if I had ASPM enabled or not.
> >
> > So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
> > issue is only when CONFIG_PCIEASPM is enabled?
> 
> Yup, exactly. And I'm also currently testing with the default WLE200/900
> cards... I just tried sticking an MT76-based WiFi card into the third
> PCI slot, and that doesn't come up either when I enable PCIEASPM.

Huh.  So IIUC, the following cases all try to retrain the link and it
fails to come up again:

  - aardvark + WLE900VX (see commit 43fc679ced18)
  - mvebu + WLE200
  - mvebu + WLE900
  - mvebu + MT76

In all these cases, Linux was able to enumerate the NIC, which means
the link was up when firmware handed it off.

I think Linux decided the Common Clock Configuration was wrong, so it
tried to fix it and retrain the link, and the link didn't come back
up.

I don't have "lspci -vv" output from all of them, but in vtolkm's
case, the firmware handed off with:

  00:02.0 Root Port to [bus 02]  SlotClk+ CommClk+
  02:00.0 QCA986x/988x NIC       SlotClk+ CommClk-

Per spec (PCIe r5, sec 7.5.3.7), SlotClk is HwInit and CommClk is RW
and should power up as 0.  If I'm reading the implementation note
correctly, if SlotClk is set on both ends of the link, software should
set CommClk, so the config above *does* look wrong, and CommClk+ on
the Root Port suggests that firmware set it.
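
To make that rule concrete, here is a rough Python sketch of the consistency
check as I read it: if Slot Clock Configuration is set in Link Status on both
ends, Common Clock Configuration should be set in Link Control on both ends.
The function names are hypothetical, the register words in the usage note are
constructed only from the flags quoted above (not read from real hardware),
and this is not a copy of what the kernel's ASPM code does.

```python
# Bit masks from the PCIe Link Control / Link Status register layout.
LNKCTL_CCC = 0x0040  # Link Control: Common Clock Configuration (CommClk)
LNKSTA_SLC = 0x1000  # Link Status: Slot Clock Configuration (SlotClk)


def common_clock_expected(root_lnksta: int, nic_lnksta: int) -> bool:
    """CommClk should be set only if both ends report SlotClk."""
    return bool(root_lnksta & LNKSTA_SLC) and bool(nic_lnksta & LNKSTA_SLC)


def needs_reconfig(root_lnkctl: int, root_lnksta: int,
                   nic_lnkctl: int, nic_lnksta: int) -> bool:
    """True if either end's CommClk bit disagrees with the expected setting."""
    want = common_clock_expected(root_lnksta, nic_lnksta)
    have_root = bool(root_lnkctl & LNKCTL_CCC)
    have_nic = bool(nic_lnkctl & LNKCTL_CCC)
    return have_root != want or have_nic != want
```

For the handoff state above (Root Port SlotClk+ CommClk+, NIC SlotClk+
CommClk-), needs_reconfig(0x0040, 0x1000, 0x0000, 0x1000) is True, which
matches the "inconsistent, reconfiguring" message in the dmesg log.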

I think both the aardvark and mvebu systems probably use U-Boot.  I
don't know U-Boot at all, but I don't see anything in it that touches
Link Control.  I'm curious what happens if you put one of these cards
in a PC.  If anybody tries it, please collect the "sudo lspci -vv" and
dmesg output.

We could quirk these NICs to avoid the retrain, but since aardvark and
mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
obvious connection, I doubt there's a simple hardware defect that
explains all these.  

Maybe we're doing something wrong in the retrain, but obviously the
link came up in the first place.  AFAIK the only thing we're changing
is the CommClk setting, and that looks legitimate per spec.

Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
and $NIC appropriately, and try the following:

  # Set $ROOT and $NIC (update to match your system):

    # ROOT=00:02.0
    # NIC=02:00.0

  # Dump the Root Port and NIC Link registers:

    # setpci -s$ROOT CAP_EXP+0xc.l              # Link Capabilities
    # setpci -s$ROOT CAP_EXP+0x10.w             # Link Control
    # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status

    # setpci -s$NIC  CAP_EXP+0xc.l              # Link Capabilities
    # setpci -s$NIC  CAP_EXP+0x10.w             # Link Control
    # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

  # Retrain the link:

    # setpci -s$ROOT CAP_EXP+0x10.w=0x0020      # Link Control Retrain Link
    # sleep 1
    # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
    # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

  # Set CommClk+ and retrain the link:

    # setpci -s$NIC  CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
    # setpci -s$ROOT CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
    # setpci -s$ROOT CAP_EXP+0x10.w=0x0060      # Link Control RL + CC
    # sleep 1
    # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
    # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:30                   ` Bjorn Helgaas
@ 2020-10-29 19:56                     ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-29 19:57                     ` Andrew Lunn
                                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-29 19:56 UTC (permalink / raw)
  To: Bjorn Helgaas, Toke Høiland-Jørgensen
  Cc: Pali Rohár, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper


On 29/10/2020 20:30, Bjorn Helgaas wrote:
> On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>>> I have been testing mainline kernel on Turris Omnia with two PCIe
>>> default cards (WLE200 and WLE900) and it worked fine. But I do not know
>>> if I had ASPM enabled or not.
>>>
>>> So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
>>> issue is only when CONFIG_PCIEASPM is enabled?
>> Yup, exactly. And I'm also currently testing with the default WLE200/900
>> cards... I just tried sticking an MT76-based WiFi card into the third
>> PCI slot, and that doesn't come up either when I enable PCIEASPM.
> Huh.  So IIUC, the following cases all try to retrain the link and it
> fails to come up again:
>
>    - aardvark + WLE900VX (see commit 43fc679ced18)
>    - mvebu + WLE200
>    - mvebu + WLE900
>    - mvebu + MT76
>
> In all these cases, Linux was able to enumerate the NIC, which means
> the link was up when firmware handed it off.
>
> I think Linux decided the Common Clock Configuration was wrong, so it
> tried to fix it and retrain the link, and the link didn't come back
> up.
>
> I don't have "lspci -vv" output from all of them, but in vtolkm's
> case, the firmware handed off with:
>
>    00:02.0 Root Port to [bus 02]  SlotClk+ CommClk+
>    02:00.0 QCA986x/988x NIC       SlotClk+ CommClk-
>
> Per spec (PCIe r5, sec 7.5.3.7), SlotClk is HwInit and CommClk is RW
> and should power up as 0.  If I'm reading the implementation note
> correctly, if SlotClk is set on both ends of the link, software should
> set CommClk, so the config above *does* look wrong, and CommClk+ on
> the Root Port suggests that firmware set it.
>
> I think both the aardvark and mvebu systems probably use U-Boot.  I
> don't know U-Boot at all, but I don't see anything in it that touches
> Link Control.  I'm curious what happens if you put one of these cards
> in a PC.  If anybody tries it, please collect the "sudo lspci -vv" and
> dmesg output.
>
> We could quirk these NICs to avoid the retrain, but since aardvark and
> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
> obvious connection, I doubt there's a simple hardware defect that
> explains all these.
>
> Maybe we're doing something wrong in the retrain, but obviously the
> link came up in the first place.  AFAIK the only thing we're changing
> is the CommClk setting, and that looks legitimate per spec.
>
> Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
> and $NIC appropriately, and try the following:
>
>    # Set $ROOT and $NIC (update to match your system):
>
>      # ROOT=00:02.0
>      # NIC=02:00.0
>
>    # Dump the Root Port and NIC Link registers:
>
>      # setpci -s$ROOT CAP_EXP+0xc.l              # Link Capabilities
>      # setpci -s$ROOT CAP_EXP+0x10.w             # Link Control
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>
>      # setpci -s$NIC  CAP_EXP+0xc.l              # Link Capabilities
>      # setpci -s$NIC  CAP_EXP+0x10.w             # Link Control
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status
>
>    # Retrain the link:
>
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0020      # Link Control Retrain Link
>      # sleep 1
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status
>
>    # Set CommClk+ and retrain the link:
>
>      # setpci -s$NIC  CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0060      # Link Control RL + CC
>      # sleep 1
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

ROOT=00:02.0
NIC=02:00.0
setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
setpci -s$ROOT CAP_EXP+0x10.w
0040
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0xc.l
00036c11
setpci -s$NIC  CAP_EXP+0x10.w
0000
setpci -s$NIC  CAP_EXP+0x12.w
1011
setpci -s$ROOT CAP_EXP+0x10.w=0x0020
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$NIC  CAP_EXP+0x10.w=0x0040
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$ROOT CAP_EXP+0x10.w=0x0040
setpci -s$ROOT CAP_EXP+0x10.w=0x0060
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1811
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
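
For reading the Link Status values in this transcript, a small decoding sketch may help. It assumes the standard Link Status layout; decode_lnksta is a hypothetical helper name, not a pciutils command.

```shell
#!/bin/sh
# Sketch only: decode_lnksta is a hypothetical helper name.
# Link Status layout:
#   bits 3:0 = current link speed, bits 9:4 = negotiated link width,
#   bit 11   = Link Training,      bit 12   = Slot Clock Configuration.
decode_lnksta() {
    v=$(( $1 ))
    echo "speed=$(( v & 0xf )) width=$(( (v >> 4) & 0x3f )) training=$(( (v >> 11) & 1 )) slotclk=$(( (v >> 12) & 1 ))"
}

decode_lnksta 0x1011   # Root Port before and after the plain retrain
decode_lnksta 0x1811   # Root Port after setting CommClk+ and retraining
```

The only difference between 1011 and 1811 is bit 11, Link Training. If the Root Port reports that bit faithfully, the port was still training one second after the CommClk+ retrain was requested.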




* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:30                   ` Bjorn Helgaas
  2020-10-29 19:56                     ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-29 19:57                     ` Andrew Lunn
  2020-10-29 21:55                       ` Thomas Petazzoni
  2020-10-29 20:18                     ` Toke Høiland-Jørgensen
                                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 62+ messages in thread
From: Andrew Lunn @ 2020-10-29 19:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Toke Høiland-Jørgensen, Rob Herring, Jason Cooper,
	Pali Rohár, Ilias Apalodimas, Marek Behún,
	Thomas Petazzoni, linux-pci, vtolkm, linux-arm-kernel

> We could quirk these NICs to avoid the retrain, but since aardvark and
> mvebu have no obvious connection

Both are Marvell. There could be some shared IP.

     Andrew


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:30                   ` Bjorn Helgaas
  2020-10-29 19:56                     ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-29 19:57                     ` Andrew Lunn
@ 2020-10-29 20:18                     ` Toke Høiland-Jørgensen
  2020-10-29 22:09                       ` Toke Høiland-Jørgensen
  2020-10-29 20:58                     ` Marek Behun
  2020-10-29 21:54                     ` Thomas Petazzoni
  4 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-29 20:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, vtolkm, linux-pci, linux-arm-kernel,
	Rob Herring, Ilias Apalodimas, Marek Behún,
	Thomas Petazzoni, Jason Cooper

Bjorn Helgaas <helgaas@kernel.org> writes:

> Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
> and $NIC appropriately, and try the following:
>
>   # Set $ROOT and $NIC (update to match your system):
>
>     # ROOT=00:02.0
>     # NIC=02:00.0

(these matched the ath10k card, so just went with that)

>   # Dump the Root Port and NIC Link registers:
>
>     # setpci -s$ROOT CAP_EXP+0xc.l              # Link Capabilities
>     # setpci -s$ROOT CAP_EXP+0x10.w             # Link Control
>     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status

# setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
# setpci -s$ROOT CAP_EXP+0x10.w
0040
# setpci -s$ROOT CAP_EXP+0x12.w
1011

>     # setpci -s$NIC  CAP_EXP+0xc.l              # Link Capabilities
>     # setpci -s$NIC  CAP_EXP+0x10.w             # Link Control
>     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

# setpci -s$NIC CAP_EXP+0xc.l
00036c11
# setpci -s$NIC CAP_EXP+0x10.w
0000
# setpci -s$NIC CAP_EXP+0x12.w
1011

>   # Retrain the link:
>
>     # setpci -s$ROOT CAP_EXP+0x10.w=0x0020      # Link Control Retrain Link
>     # sleep 1
>     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

# setpci -s$ROOT CAP_EXP+0x10.w=0x0020
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
# setpci -s$ROOT CAP_EXP+0x10.w
0000

(nothing in the dmesg either) - rebooted before trying the below:

>   # Set CommClk+ and retrain the link:
>
>     # setpci -s$NIC  CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>     # setpci -s$ROOT CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>     # setpci -s$ROOT CAP_EXP+0x10.w=0x0060      # Link Control RL + CC
>     # sleep 1
>     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

# setpci -s$NIC CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0060
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.

# lspci -v
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/soc/pcie/pcie@1,0
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: e0000000-e00fffff [size=1M]
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Expansion ROM at e0100000 [virtual] [disabled] [size=2K]
        Capabilities: [40] Express Root Port (Slot+), MSI 00
lspci: Unable to load libkmod resources: error -12

00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/soc/pcie/pcie@2,0
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: e0200000-e04fffff [size=3M]
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Expansion ROM at e0500000 [virtual] [disabled] [size=2K]
        Capabilities: [40] Express Root Port (Slot+), MSI 00

00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/soc/pcie/pcie@3,0
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: e0600000-e07fffff [size=2M]
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Expansion ROM at e0800000 [virtual] [disabled] [size=2K]
        Capabilities: [40] Express Root Port (Slot+), MSI 00

01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express)
        Flags: bus master, fast devsel, latency 0, IRQ 60
        Memory at e0000000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [60] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-15-17-ff-ff-24-14-12
        Capabilities: [170] Power Budgeting <?>
        Kernel driver in use: ath9k

02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: ath10k_pci

03:00.0 Network controller: MEDIATEK Corp. Device 7612
        Subsystem: MEDIATEK Corp. Device 7612
        Flags: bus master, fast devsel, latency 0, IRQ 63
        Memory at e0600000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at e0700000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158] Latency Tolerance Reporting
        Capabilities: [160] L1 PM Substates
        Kernel driver in use: mt76x2e
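
The "rev ff", "prog-if ff" and "Unknown header type 7f" lines for 02:00.0 above are consistent with config reads returning all ones: every byte comes back as 0xff, and masking the multi-function bit out of the Header Type byte leaves 0x7f. A minimal sketch of that masking, with header_layout as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: header_layout is a hypothetical helper name.
# The Header Type register is the byte at config offset 0x0e. Bit 7 is the
# multi-function flag; bits 6:0 select the layout (0 = endpoint, 1 = bridge).
header_layout() {
    printf '%02x\n' $(( $1 & 0x7f ))
}

header_layout 0x00   # a normal endpoint
header_layout 0x01   # a PCI-to-PCI bridge
header_layout 0xff   # an all-ones read from a device that does not respond
```

Only the last call produces a layout value (7f) that matches no defined header type.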


-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:30                   ` Bjorn Helgaas
                                       ` (2 preceding siblings ...)
  2020-10-29 20:18                     ` Toke Høiland-Jørgensen
@ 2020-10-29 20:58                     ` Marek Behun
  2020-10-30 10:08                       ` Pali Rohár
  2020-10-29 21:54                     ` Thomas Petazzoni
  4 siblings, 1 reply; 62+ messages in thread
From: Marek Behun @ 2020-10-29 20:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Toke Høiland-Jørgensen, Pali Rohár, vtolkm,
	linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Thomas Petazzoni, Jason Cooper

On Thu, 29 Oct 2020 14:30:22 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
> > Pali Rohár <pali@kernel.org> writes:  
> 
> > > I have been testing mainline kernel on Turris Omnia with two PCIe
> > > default cards (WLE200 and WLE900) and it worked fine. But I do not know
> > > if I had ASPM enabled or not.
> > >
> > > So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
> > > issue is only when CONFIG_PCIEASPM is enabled?  
> > 
> > Yup, exactly. And I'm also currently testing with the default WLE200/900
> > cards... I just tried sticking an MT76-based WiFi card into the third
> > PCI slot, and that doesn't come up either when I enable PCIEASPM.  
> 
> Huh.  So IIUC, the following cases all try to retrain the link and it
> fails to come up again:
> 
>   - aardvark + WLE900VX (see commit 43fc679ced18)
>   - mvebu + WLE200
>   - mvebu + WLE900
>   - mvebu + MT76

Bjorn, IIRC Pali's patches fix the WLE900VX card for Aardvark (both in
kernel and in U-Boot).
IMO mvebu has similar issues. Both these drivers handle the PCIe reset
signal incorrectly (or at least Aardvark did before Pali's work).

mvebu is used on Turris Omnia, and our HW guys first solved the WLE900VX
not working issue by using different capacitors for the SerDeses (this
was 5 years ago). But after Pali's work on Aardvark I think this could
also be solved for mvebu driver in software.

BTW the WLE900VX card has problems on many systems, it won't work for
example on Thinkpad X230. There is a bug on kernel bugzilla reported
for this.

My opinion is that many drivers do not fully follow the PCIe
specification for reset and link training (Pali was talking about
this when he was looking at Aardvark) and that WLE900VX has a bug that
in combination with those drivers causes the failure. If you look at the
drivers, they are incompatible in how they handle the reset signal and
link training.

I am curious what Pali will tell us, he said that he will look into the
mvebu driver.

Marek


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:30                   ` Bjorn Helgaas
                                       ` (3 preceding siblings ...)
  2020-10-29 20:58                     ` Marek Behun
@ 2020-10-29 21:54                     ` Thomas Petazzoni
  2020-10-29 23:15                       ` Toke Høiland-Jørgensen
  4 siblings, 1 reply; 62+ messages in thread
From: Thomas Petazzoni @ 2020-10-29 21:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Toke Høiland-Jørgensen, Pali Rohár, vtolkm,
	linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Marek Behún, Jason Cooper

Hello,

On Thu, 29 Oct 2020 14:30:22 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> We could quirk these NICs to avoid the retrain, but since aardvark and
> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
> obvious connection, I doubt there's a simple hardware defect that
> explains all these.  

aardvark and mvebu have one very strong connection: they are the only
two drivers making use of the PCI Bridge emulation logic in
drivers/pci/pci-bridge-emul.c:

drivers/pci$ git grep pci-bridge-emul
Makefile:obj-$(CONFIG_PCI_BRIDGE_EMUL)  += pci-bridge-emul.o
controller/pci-aardvark.c:#include "../pci-bridge-emul.h"
controller/pci-mvebu.c:#include "../pci-bridge-emul.h"
pci-bridge-emul.c:#include "pci-bridge-emul.h"

I haven't read the whole thread, but it is important to keep in mind
that on those two platforms, the PCI Bridge seen by Linux is *not* a
real HW bridge. It is faked by the pci-bridge-emul code. So if this
code has defects/bugs in how it emulates a PCI Bridge behavior, you
might see weird things.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 19:57                     ` Andrew Lunn
@ 2020-10-29 21:55                       ` Thomas Petazzoni
  0 siblings, 0 replies; 62+ messages in thread
From: Thomas Petazzoni @ 2020-10-29 21:55 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Bjorn Helgaas, Toke Høiland-Jørgensen, Rob Herring,
	Jason Cooper, Pali Rohár, Ilias Apalodimas,
	Marek Behún, linux-pci, vtolkm, linux-arm-kernel

On Thu, 29 Oct 2020 20:57:31 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > We could quirk these NICs to avoid the retrain, but since aardvark and
> > mvebu have no obvious connection  
> 
> Both are Marvell. There could be some shared IP.

From my experience, even though both are from Marvell, they are really
different IP blocks, made by different teams, used in different SoCs.

However, as I replied to Bjorn, both use the PCI Bridge emulation logic.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 20:18                     ` Toke Høiland-Jørgensen
@ 2020-10-29 22:09                       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-29 22:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, vtolkm, linux-pci, linux-arm-kernel,
	Rob Herring, Ilias Apalodimas, Marek Behún,
	Thomas Petazzoni, Jason Cooper

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Bjorn Helgaas <helgaas@kernel.org> writes:
>
>> Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
>> and $NIC appropriately, and try the following:
>>
>>   # Set $ROOT and $NIC (update to match your system):
>>
>>     # ROOT=00:02.0
>>     # NIC=02:00.0
>
> (these matched the ath10k card, so just went with that)

And since Marek's latest email mentioned that the WLE900 is especially
problematic, I also tried with the other slot that has the mt76 in it:

# ROOT=00:03.0
# NIC=03:00.0
# setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
# setpci -s$ROOT CAP_EXP+0x10.w
0040
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0xc.l
0047dc11
# setpci -s$NIC CAP_EXP+0x10.w
0000
# setpci -s$NIC CAP_EXP+0x12.w
1011

# setpci -s$ROOT CAP_EXP+0x10.w=0x0020
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
1011

# setpci -s$NIC CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0060
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
1011
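
To compare the Link Capabilities values reported so far, here is a small decoding sketch. It assumes the standard Link Capabilities layout; decode_lnkcap is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: decode_lnkcap is a hypothetical helper name.
# Link Capabilities layout:
#   bits 3:0   = max link speed (1 = 2.5 GT/s, 2 = 5 GT/s)
#   bits 9:4   = max link width
#   bits 11:10 = ASPM support (1 = L0s, 2 = L1, 3 = L0s and L1)
decode_lnkcap() {
    v=$(( $1 ))
    echo "maxspeed=$(( v & 0xf )) maxwidth=$(( (v >> 4) & 0x3f )) aspm=$(( (v >> 10) & 3 ))"
}

decode_lnkcap 0x0003ac12   # Root Ports 00:02.0 and 00:03.0
decode_lnkcap 0x00036c11   # 02:00.0 (ath10k)
decode_lnkcap 0x0047dc11   # 03:00.0 (mt76)
```

All three advertise x1 links with both L0s and L1 support; the Root Ports advertise 5 GT/s while both NICs advertise 2.5 GT/s, which fits the speed field of 1 in the 1011 Link Status values.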

And based on this I went back and rebuilt the kernel with PCIEASPM
enabled, and now both the WLE200 and the MT76 work with this output:

[    1.544429] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    1.544455] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    1.544471] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    1.544485] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    1.544500] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    1.544513] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.544527] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.544540] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.544552] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.544565] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.544577] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.544590] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.544599] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.544768] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    1.544776] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.544783] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    1.544789] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    1.544795] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    1.544801] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    1.544806] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    1.544811] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    1.544882] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    1.544896] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.545073] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.545085] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.545237] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.545250] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546030] PCI: bus0: Fast back to back transfers disabled
[    1.546037] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.546045] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.546052] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.546132] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.546154] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.546263] pci 0000:01:00.0: supports D1
[    1.546268] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.546377] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.602042] PCI: bus1: Fast back to back transfers enabled
[    1.602052] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.602146] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[    1.602169] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
[    1.602201] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
[    1.602280] pci 0000:02:00.0: supports D1 D2
[    1.602377] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.632025] PCI: bus2: Fast back to back transfers enabled
[    1.632033] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.632117] pci 0000:03:00.0: [14c3:7612] type 00 class 0x028000
[    1.632141] pci 0000:03:00.0: reg 0x10: [mem 0xec000000-0xec0fffff 64bit]
[    1.632175] pci 0000:03:00.0: reg 0x30: [mem 0xec100000-0xec10ffff pref]
[    1.632262] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
[    1.632373] pci 0000:00:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.662037] PCI: bus3: Fast back to back transfers disabled
[    1.662045] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.662078] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.662086] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.662093] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0600000-0xe07fffff]
[    1.662101] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.662109] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.662116] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0800000-0xe08007ff pref]
[    1.662124] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.662135] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.662142] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.662151] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.662158] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.662164] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.662170] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.662176] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.662182] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.662190] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0600000-0xe06fffff 64bit]
[    1.662202] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0700000-0xe070ffff pref]
[    1.662207] pci 0000:00:03.0: PCI bridge to [bus 03]
[    1.662212] pci 0000:00:03.0:   bridge window [mem 0xe0600000-0xe07fffff]
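
In the "BAR 0: error updating" lines for 02:00.0, the first number is the value written to the BAR and the second is what was read back. A minimal sketch of how the written value breaks down, assuming the standard memory BAR layout; decode_mem_bar is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: decode_mem_bar is a hypothetical helper name.
# Low dword of a memory BAR:
#   bit 0     = 0 for memory space
#   bits 2:1  = type (0 = 32-bit, 2 = 64-bit)
#   bit 3     = prefetchable
#   bits 31:4 = address bits
decode_mem_bar() {
    v=$(( $1 ))
    printf 'addr=0x%08x type=%d prefetch=%d\n' \
        $(( v & 0xfffffff0 )) $(( (v >> 1) & 3 )) $(( (v >> 3) & 1 ))
}

decode_mem_bar 0xe0200004   # value written to BAR 0 of 02:00.0
```

The written value is a 64-bit, non-prefetchable BAR at 0xe0200000, matching the "assigned" line just before it. The 0xffffffff read back is the all-ones pattern, which suggests the device was no longer answering config accesses at that point.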


This has me somewhat puzzled. Investigating further, it turns out that
if I *remove* the MT76 card, the WLE200 starts failing again. So with
just the WLE* cards plugged in, I went back and tried the setpci
sequence again with the WLE200 (with PCIEASPM disabled):

# ROOT=00:01.0
# NIC=01:00.0
# setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
# setpci -s$ROOT CAP_EXP+0x10.w
0040
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0xc.l
00033c11
# setpci -s$NIC CAP_EXP+0x10.w
0000
# setpci -s$NIC CAP_EXP+0x12.w
1011
# setpci -s$ROOT CAP_EXP+0x10.w=0x0020
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0040
# setpci -s$ROOT CAP_EXP+0x10.w=0x0060
# sleep 1
# setpci -s$ROOT CAP_EXP+0x12.w
1011
# setpci -s$NIC CAP_EXP+0x12.w
1011

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 21:54                     ` Thomas Petazzoni
@ 2020-10-29 23:15                       ` Toke Høiland-Jørgensen
  2020-10-30  8:23                         ` Thomas Petazzoni
  2020-10-30 10:15                         ` Pali Rohár
  0 siblings, 2 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-29 23:15 UTC (permalink / raw)
  To: Thomas Petazzoni, Bjorn Helgaas
  Cc: Pali Rohár, vtolkm, linux-pci, linux-arm-kernel,
	Rob Herring, Ilias Apalodimas, Marek Behún, Jason Cooper

Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:

> Hello,
>
> On Thu, 29 Oct 2020 14:30:22 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> We could quirk these NICs to avoid the retrain, but since aardvark and
>> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
>> obvious connection, I doubt there's a simple hardware defect that
>> explains all these.  
>
> aardvark and mvebu have one very strong connection: they are the only
> two drivers making use of the PCI Bridge emulation logic in
> drivers/pci/pci-bridge-emul.c:
>
> drivers/pci$ git grep pci-bridge-emul
> Makefile:obj-$(CONFIG_PCI_BRIDGE_EMUL)  += pci-bridge-emul.o
> controller/pci-aardvark.c:#include "../pci-bridge-emul.h"
> controller/pci-mvebu.c:#include "../pci-bridge-emul.h"
> pci-bridge-emul.c:#include "pci-bridge-emul.h"
>
> I haven't read the whole thread, but it is important to keep in mind
> that on those two platforms, the PCI Bridge seen by Linux is *not* a
> real HW bridge. It is faked by the pci-bridge-emul code. So if this
> code has defects/bugs in how it emulates a PCI Bridge behavior, you
> might see weird things.

Ohh, that's interesting. Why does it need to emulate it?

And could this cause weird interactions like what I'm seeing,
where a somewhat buggy device in slot 2 affects the ability to retrain
the link also in slot 1, but only if there's no device in slot 3?

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 23:15                       ` Toke Høiland-Jørgensen
@ 2020-10-30  8:23                         ` Thomas Petazzoni
  2020-10-30 10:15                         ` Pali Rohár
  1 sibling, 0 replies; 62+ messages in thread
From: Thomas Petazzoni @ 2020-10-30  8:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Bjorn Helgaas, Pali Rohár, vtolkm, linux-pci,
	linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Marek Behún, Jason Cooper

On Fri, 30 Oct 2020 00:15:57 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> > I haven't read the whole thread, but it is important to keep in mind
> > that on those two platforms, the PCI Bridge seen by Linux is *not* a
> > real HW bridge. It is faked by the pci-bridge-emul code. So if this
> > code has defects/bugs in how it emulates a PCI Bridge behavior, you
> > might see weird things.  
> 
> Ohh, that's interesting. Why does it need to emulate it?

Because the HW doesn't expose a standard PCI Bridge. On mvebu, the main
initial motivation was to be able to configure MBus windows dynamically
depending on PCI endpoints that are connected.

For Aardvark, the rationale is documented in commit
8a3ebd8de328301aacbe328650a59253be2ac82c:

commit 8a3ebd8de328301aacbe328650a59253be2ac82c
Author: Zachary Zhang <zhangzg@marvell.com>
Date:   Thu Oct 18 17:37:19 2018 +0200

    PCI: aardvark: Implement emulated root PCI bridge config space
    
    The PCI controller in the Marvell Armada 3720 does not implement a
    software-accessible root port PCI bridge configuration space. This
    causes a number of problems when using PCIe switches or when the Max
    Payload size needs to be aligned between the root complex and the
    endpoint.
    
    Implementing an emulated root PCI bridge, like is already done in the
    pci-mvebu driver for older Marvell platforms allows to solve those
    issues, and also to support features such as ASR, PME, VC, HP.
    
    Signed-off-by: Zachary Zhang <zhangzg@marvell.com>
    [Thomas: convert to the common emulated PCI bridge logic.]
    Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
    Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 20:58                     ` Marek Behun
@ 2020-10-30 10:08                       ` Pali Rohár
  2020-10-30 10:45                         ` Marek Behun
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2020-10-30 10:08 UTC (permalink / raw)
  To: Marek Behun
  Cc: Bjorn Helgaas, Toke Høiland-Jørgensen, vtolkm,
	linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Thomas Petazzoni, Jason Cooper

Hello!

On Thursday 29 October 2020 21:58:53 Marek Behun wrote:
> On Thu, 29 Oct 2020 14:30:22 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
> > > Pali Rohár <pali@kernel.org> writes:  
> > 
> > > > I have been testing mainline kernel on Turris Omnia with two PCIe
> > > > default cards (WLE200 and WLE900) and it worked fine. But I do not know
> > > > if I had ASPM enabled or not.
> > > >
> > > > So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
> > > > issue is only when CONFIG_PCIEASPM is enabled?  
> > > 
> > > Yup, exactly. And I'm also currently testing with the default WLE200/900
> > > cards... I just tried sticking an MT76-based WiFi card into the third
> > > PCI slot, and that doesn't come up either when I enable PCIEASPM.  
> > 
> > Huh.  So IIUC, the following cases all try to retrain the link and it
> > fails to come up again:
> > 
> >   - aardvark + WLE900VX (see commit 43fc679ced18)

Just to note: aardvark + WLE200 worked fine whatever I did. No
workaround and no patch was needed.

> >   - mvebu + WLE200
> >   - mvebu + WLE900
> >   - mvebu + MT76
> 
> Bjorn, IIRC Pali's patches fix the WLE900VX card for Aardvark (both in
> kernel and in U-Boot).
> IMO mvebu has similar issues. Both these drivers handle the PCIe reset
> signal incorrectly (or at least Aardvark did before Pali's work).
> 
> mvebu is used on Turris Omnia, and our HW guys first solved the WLE900VX
> not working issue by using different capacitors for the SerDeses (this
> was 5 years ago). But after Pali's work on Aardvark I think this could
> also be solved for mvebu driver in software.

Apparently not :-( See below, we cannot control PERST# pin from software
on Turris Omnia.

> BTW the WLE900VX card has problems on many systems, it won't work for
> example on Thinkpad X230. There is a bug on kernel bugzilla reported
> for this.

WLE900VX is a really buggy card. During its initialization/reset,
W_DISABLE# (pin 20) must be in the correct state, otherwise the system
will never see this card. This is the reason why it does not work in
laptops; sometimes a double reboot and playing with the rfkill state
prior to reboot can help. See the reported issue:

https://bugzilla.kernel.org/show_bug.cgi?id=84821#c53

> My opinion is that many drivers do not respect the PCIe specification
> for reset and link training totally correctly (Pali was talking about
> this when he was looking at Aardvark) and that WLE900VX has a bug that
> in combination with those drivers causes the fail. If you look at the
> drivers, they are incompatible in how they handle the reset signal and
> link training.

It seems that aardvark or the WLE900VX card (not only this one, but
basically every ath10k card I tested, also non-Compex) has the problem
that when booting the Linux kernel the cards are in some totally strange
state, and whatever I did I was not able to detect them and make link
training succeed. The only thing which helped was to issue a card reset
via the out-of-band PERST# signal.

And here is the main issue with the PERST# signal in the Linux kernel.
Basically every driver asserts PERST# for a different amount of time
when resetting the card. This is something which must be driver and
card independent, and is probably already documented in the PCIe
specification. See my email:

https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/

I was trying to find that minimal reset timeout in the specifications,
but I was not able to understand all those details and timeouts defined
in different diagrams. I'm not a HW guy. See what I was able to find
out:

https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/

And my conclusion is here:

https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/

So to finally fix the issues with card reset we need somebody who
understands hardware documents and PCIe specifications and can figure
out the correct minimal delay needed for a proper card reset via the
PERST# signal. And then fix all PCI controller drivers to use this
value.

In aardvark we have a timeout which was enough for the cards I tested
on Espressobin and Turris MOX.


The second issue is with link training. What helped me to finally fix
link training for PCIe cards on A3720 with the aardvark driver in both
U-Boot and the Linux kernel was a comment in the following commit:

https://git.kernel.org/linus/f4c7d053d7f7

    As required by PCI Express spec a delay for at least 100ms after
    such a reset [fundamental reset by asserted PERST# signal] before
    link training is needed.

In the aardvark control register I forcibly disabled the link training
bit prior to issuing a reset via the PERST# signal, and then re-enabled
it 100ms after the reset was completed.
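
As a rough illustration of the ordering described above (this is not
the actual aardvark code), the sequence can be sketched in userspace C.
Every name here (reset_then_train, the trace array, the event names) is
hypothetical, and real register and GPIO accesses are replaced by an
event trace. The PERST# hold time is left as a parameter because its
correct minimum value is exactly what is still unknown; only the 100 ms
wait after reset comes from the commit message quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events standing in for real register/GPIO accesses. */
enum ev {
	EV_LT_OFF,		/* disable link training in the controller */
	EV_PERST_ASSERT,	/* drive PERST# active */
	EV_PERST_DEASSERT,	/* release PERST# */
	EV_DELAY,		/* sleep for .ms milliseconds */
	EV_LT_ON		/* re-enable link training */
};

struct step {
	enum ev ev;
	unsigned int ms;
};

#define MAX_STEPS 8
struct step trace[MAX_STEPS];
size_t trace_len;

static void record(enum ev ev, unsigned int ms)
{
	assert(trace_len < MAX_STEPS);
	trace[trace_len++] = (struct step){ ev, ms };
}

/* Wait after a fundamental reset, per the quoted commit message. */
#define POST_RESET_DELAY_MS 100

/* perst_assert_ms is an assumption supplied by the caller. */
void reset_then_train(unsigned int perst_assert_ms)
{
	trace_len = 0;
	record(EV_LT_OFF, 0);		/* keep the link from training early */
	record(EV_PERST_ASSERT, 0);
	record(EV_DELAY, perst_assert_ms);
	record(EV_PERST_DEASSERT, 0);
	record(EV_DELAY, POST_RESET_DELAY_MS);
	record(EV_LT_ON, 0);		/* only now let the link train */
}
```

The point of the sketch is only the ordering: link training stays
disabled for the whole reset and for the 100 ms that follow it.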

I have sent an aardvark patch which updates the comment for the above requirement:
https://lore.kernel.org/linux-pci/20200924084618.12442-1-pali@kernel.org/

> I am curious what Pali will tell us, he said that he will look into the
> mvebu driver.

If the same problem with WLE900 cards also exists on the A38x SOC (with
the pci-mvebu driver) then it would be hard to fix on Turris Omnia.

On Turris MOX (with aardvark) the PERST# pin from the card is connected
to an MPP pin on the A3720 SOC, which we can control via GPIO. In DTS we
have configured it as "reset-gpios" and therefore the aardvark driver
can assert/deassert PERST# for the card when needed.

On Turris Omnia (with pci-mvebu) the PERST# pin from the wifi card is
connected to the MCU, which asserts/deasserts this pin only after board
reset. It is also a line shared across all mPCIe slots and with other
peripherals.

So we cannot issue a reset via the PERST# signal on Turris Omnia. But
there are other ways to issue a fundamental reset, via in-band
signaling.

But IIRC a fundamental reset via the in-band PCIe bus is issued by the
PCIe bridge to which the card is connected. So the second problem: we
do not have a PCIe bridge on mvebu platforms, it is just emulated by the
kernel. Unless there is some "special" register for issuing a
fundamental reset, we would not be able to emulate this reset.

Aardvark does not have a PCIe bridge either, but its internal registers
have bits for different types of reset. When I tried to use them,
nothing happened, nothing helped. Only an external reset via the PERST#
signal was able to initialize the card.

I will look into the A38x PCI registers to see if there is something
which could help us. But without access to the PERST# pin I'm sceptical
that we can do anything... I'm just hoping that there is a bug in the
PCIe ASPM retraining code which can be fixed...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-29 23:15                       ` Toke Høiland-Jørgensen
  2020-10-30  8:23                         ` Thomas Petazzoni
@ 2020-10-30 10:15                         ` Pali Rohár
  1 sibling, 0 replies; 62+ messages in thread
From: Pali Rohár @ 2020-10-30 10:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Thomas Petazzoni, Bjorn Helgaas, vtolkm, linux-pci,
	linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Marek Behún, Jason Cooper

On Friday 30 October 2020 00:15:57 Toke Høiland-Jørgensen wrote:
> Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:
> 
> > Hello,
> >
> > On Thu, 29 Oct 2020 14:30:22 -0500
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> >> We could quirk these NICs to avoid the retrain, but since aardvark and
> >> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
> >> obvious connection, I doubt there's a simple hardware defect that
> >> explains all these.  
> >
> > aardvark and mvebu have one very strong connection: they are the only
> > two drivers making use of the PCI Bridge emulation logic in
> > drivers/pci/pci-bridge-emul.c:
> >
> > drivers/pci$ git grep pci-bridge-emul
> > > Makefile:obj-$(CONFIG_PCI_BRIDGE_EMUL)  += pci-bridge-emul.o
> > controller/pci-aardvark.c:#include "../pci-bridge-emul.h"
> > controller/pci-mvebu.c:#include "../pci-bridge-emul.h"
> > pci-bridge-emul.c:#include "pci-bridge-emul.h"
> >
> > I haven't read the whole thread, but it is important to keep in mind
> > that on those two platforms, the PCI Bridge seen by Linux is *not* a
> > > real HW bridge. It is faked by the pci-bridge-emul code. So if this
> > code has defects/bugs in how it emulates a PCI Bridge behavior, you
> > might see weird things.
> 
> Ohh, that's interesting. Why does it need to emulate it?

I can only speculate: they wanted to decrease the cost of the hw, so
they did not include a bridge in hw and left it to the user to emulate
it (if needed).

> And could this cause weird interactions like what I'm seeing,
> where a somewhat buggy device in slot 2 affects the ability to retrain
> the link also in slot 1, but only if there's no device in slot 3?

I doubt it; slots and registers are independent. Every slot/card has
its own (emulated) bridge.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-30 10:08                       ` Pali Rohár
@ 2020-10-30 10:45                         ` Marek Behun
  0 siblings, 0 replies; 62+ messages in thread
From: Marek Behun @ 2020-10-30 10:45 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, Toke Høiland-Jørgensen, vtolkm,
	linux-pci, linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Thomas Petazzoni, Jason Cooper

On Fri, 30 Oct 2020 11:08:07 +0100
Pali Rohár <pali@kernel.org> wrote:

> On Turris Omnia (with pci-mvebu) PERST# pin from wifi card is connected
> to MCU and it asserts/deasserts this pin only after board reset. Also it
> is shared line across all mPCIe slots and also with other peripherals.
> 
> So we cannot issue reset via PERST# signal on Turris Omnia. But there
> are other ways how to issue fundamental reset, via in band signaling.

We can code this into MCU code, AFAIK it is upgradable from main CPU
via I2C :) I wanted to try this because of LEDs anyway...

But I think that all 3 PCIe slots have their PERST# signal connected to
just one GPIO on the MCU...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-28 23:16             ` Bjorn Helgaas
  2020-10-29 10:09               ` Pali Rohár
  2020-10-29 10:41               ` Toke Høiland-Jørgensen
@ 2020-10-30 11:23               ` Pali Rohár
  2020-10-30 13:02                 ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2020-10-30 11:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: vtolkm, Toke Høiland-Jørgensen, linux-pci,
	linux-arm-kernel, Rob Herring, Ilias Apalodimas,
	Marek Behún, Thomas Petazzoni, Jason Cooper

On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
> [+cc Pali, Marek, Thomas, Jason]
> 
> On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
> > On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
> > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> > > > > Toke Høiland-Jørgensen <toke@redhat.com> writes:
> > > > > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > > > 
> > > > > > > [+cc vtolkm]
> > > > > > > 
> > > > > > > On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> > > > > > > > Hi everyone
> > > > > > > > 
> > > > > > > > I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> > > > > > > > having some trouble getting the PCI bus to work correctly. Specifically,
> > > > > > > > I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> > > > > > > > the resource request fix[0] applied on top.
> > > > > > > > 
> > > > > > > > The kernel boots fine, and the patch in [0] makes the PCI devices show
> > > > > > > > up. But I'm still getting initialisation errors like these:
> > > > > > > > 
> > > > > > > > [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> > > > > > > > [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > > [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> > > > > > > > [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> > > > > > > > 
> > > > > > > > and the WiFi drivers fail to initialise with what appears to me to be
> > > > > > > > errors related to the bus rather than to the drivers themselves:
> > > > > > > > 
> > > > > > > > [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> > > > > > > > [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> > > > > > > > [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> > > > > > > > [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> > > > > > > > [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> > > > > > > > [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> > > > > > > > [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > > > > > > > [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > > > > > > [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > > > > > > 
> > > > > > > > lspci looks OK, though:
> > > > > > > > 
> > > > > > > > # lspci
> > > > > > > > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> > > > > > > > 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> > > > > > > > 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> > > > > > > > 
> > > > > > > > Does anyone have any clue what could be going on here? Is this a bug, or
> > > > > > > > did I miss something in my config or other initialisation? I've tried
> > > > > > > > with both the stock u-boot distributed with the board, and with an
> > > > > > > > > upstream u-boot from latest master; doesn't seem to make any difference.
> > > > > > > Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> > > > > > > report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> > > > > > > don't think we have a fix yet.
> > > > > > Yes! Turning that off does indeed help! Thanks a bunch :)
> > > > > > 
> > > > > > You mention that bisecting this would be helpful - I can try that
> > > > > > tomorrow; any idea when this was last working?
> > > > > OK, so I tried to bisect this, but, erm, I couldn't find a working
> > > > > revision to start from? I went all the way back to 4.10 (which is the
> > > > > first version to include the device tree file for the Omnia), and even
> > > > > on that, the wireless cards were failing to initialise with ASPM
> > > > > enabled...
> > > > I have no personal experience with this device; all I know is that the
> > > > bugzilla suggests that it worked in v5.4, which isn't much help.
> > > > 
> > > > Possibly the apparent regression was really a .config change, i.e.,
> > > > CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
> > > > "worked" but got enabled later and it started failing?
> > > Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
> > > default and only turns it on for specific targets. So I guess that it's
> > > most likely that this has never worked...
> > > 
> > > > Maybe the debug patch below would be worth trying to see if it makes
> > > > any difference?  If it *does* help, try omitting the first hunk to see
> > > > if we just need to apply the quirk_enable_clear_retrain_link() quirk.
> > > Tried, doesn't help...
> > > 
> > > -Toke
> > 
> > Found this patch
> > 
> > https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch
> > 
> > that mentions the Compex WLE900VX card, which reading the lspci verbose
> > output from the bugtracker seems to be the device being troubled.
> 
> Interesting.  Indeed, the Compex WLE900VX card seems to have the
> Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
> the same device in it.
> 
> The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
> for aardvark, so of course doesn't help mvebu.
> 
> PCIe hardware is supposed to automatically negotiate the highest link
> speed supported by both ends.  But software *is* allowed to set an
> upper limit (the Target Link Speed in Link Control 2).  If we initiate
> a retrain and the link doesn't come back up, I wonder if we should try
> to help the hardware out by using Target Link Speed to limit to a
> lower speed and attempting another retrain, something like this hacky
> patch: (please collect the dmesg log if you try this)

My experience with that WLE900VX card, aardvark driver and aspm code:

Link training in GEN2 mode for this card succeeds only once after reset.
Repeated link retraining fails, and it fails even when aardvark is
reconfigured to GEN1 mode. A reset via the PERST# signal is required to
get link training working again.

What I did in the aardvark driver: set mode to GEN2, do link training.
On success, read "negotiated link speed" from the "Link Control Status
Register" (for WLE900VX it is 0x1 - GEN1) and set it in aardvark. Then
retrain the link again (for WLE900VX it would now be at GEN1). After
that the card is stable and all future retraining (e.g. from aspm.c)
also passes.

If I do not change the aardvark mode from GEN2 to GEN1, the second link
training fails. And if I change the mode to GEN1 after this failed link
training, nothing happens; link training does not succeed.
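
The "read the negotiated speed and pin it before the second training"
step can be sketched as below. This is only a sketch under assumptions:
it is written in terms of the standard PCIe Link Status (Current Link
Speed, bits 3:0) and Link Control 2 (Target Link Speed, bits 3:0)
fields, whereas aardvark uses its own control register for the speed,
and the function name pin_tls_to_negotiated is made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

#define LNKSTA_CLS	0x000f	/* Link Status: Current Link Speed */
#define LNKCTL2_TLS	0x000f	/* Link Control 2: Target Link Speed */

/*
 * Given the Link Status value read after the first (GEN2) training,
 * return the Link Control 2 value to program before retraining, so
 * that the retrain targets the speed the link actually came up at.
 */
uint16_t pin_tls_to_negotiated(uint16_t lnksta, uint16_t lnkctl2)
{
	uint16_t cls = lnksta & LNKSTA_CLS;

	assert(cls != 0);	/* 0 is not a valid speed encoding */
	return (uint16_t)((lnkctl2 & ~LNKCTL2_TLS) | cls);
}
```

For a card like WLE900VX that comes up at GEN1 (cls 0x1) while the
target was GEN2 (tls 0x2), this turns a Link Control 2 value of 0x0002
into 0x0001 and leaves all other bits untouched.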

So just speculation now... In the current setup, initialization of the
card does one link training at GEN2. Then aspm.c is called, which does
a second link retraining at GEN2. If that fails, the patch below issues
a third link retraining at GEN1. If A38x/pci-mvebu has the same problem
as aardvark, then the second link retraining must be at GEN1 (not GEN2)
to work around this issue.

Bjorn, Toke: what about trying to hack the aspm.c code to never do link
retraining at GEN2 speed, and always force GEN1 speed prior to link
training?

> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index ac0557a305af..fb6e13532a2c 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -192,12 +192,42 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>  	link->clkpm_disable = blacklist ? 1 : 0;
>  }
>  
> +#define PCI_EXP_LNKCAP2_SLS	0x000000fe
> +
> +static int decrease_tls(struct pci_dev *pdev)
> +{
> +	u32 lnkcap2;
> +	u16 lnkctl2, tls;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP2, &lnkcap2);
> +
> +	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> +
> +	pci_info(pdev, "lnkcap2 %#010x sls %#04x lnkctl2 %#06x tls %#03x\n",
> +		 lnkcap2, (lnkcap2 & PCI_EXP_LNKCAP2_SLS) >> 1,
> +		 lnkctl2, tls);
> +
> +	if (tls < 2)
> +		return -EINVAL;
> +
> +	tls--;
> +	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL2,
> +					   PCI_EXP_LNKCTL2_TLS, tls);
> +	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &lnkctl2);
> +	pci_info(pdev, "lnkctl2 %#010x new tls %#03x\n",
> +		 lnkctl2, tls);
> +
> +	return 0;
> +}
> +
>  static bool pcie_retrain_link(struct pcie_link_state *link)
>  {
>  	struct pci_dev *parent = link->pdev;
>  	unsigned long end_jiffies;
>  	u16 reg16;
>  
> +top:
>  	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>  	reg16 |= PCI_EXP_LNKCTL_RL;
>  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> @@ -216,10 +246,14 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>  	do {
>  		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
>  		if (!(reg16 & PCI_EXP_LNKSTA_LT))
> -			break;
> +			return true;	/* success */
>  		msleep(1);
>  	} while (time_before(jiffies, end_jiffies));
> -	return !(reg16 & PCI_EXP_LNKSTA_LT);
> +
> +	if (decrease_tls(parent))
> +		return false;	/* can't decrease any more */
> +
> +	goto top;
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-30 11:23               ` Pali Rohár
@ 2020-10-30 13:02                 ` Toke Høiland-Jørgensen
  2020-10-30 14:23                   ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-30 13:02 UTC (permalink / raw)
  To: Pali Rohár, Bjorn Helgaas
  Cc: vtolkm, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

Pali Rohár <pali@kernel.org> writes:

> On Wednesday 28 October 2020 18:16:26 Bjorn Helgaas wrote:
>> [+cc Pali, Marek, Thomas, Jason]
>> [...]
>
> My experience with that WLE900VX card, aardvark driver and aspm code:
>
> Link training in GEN2 mode for this card succeeds only once after reset.
> Repeated link retraining fails, and it fails even when aardvark is
> reconfigured to GEN1 mode. A reset via the PERST# signal is required to
> get link training working again.
>
> What I did in the aardvark driver: set mode to GEN2, do link training.
> On success, read "negotiated link speed" from the "Link Control Status
> Register" (for WLE900VX it is 0x1 - GEN1) and set it in aardvark. Then
> retrain the link again (for WLE900VX it would now be at GEN1). After
> that the card is stable and all future retraining (e.g. from aspm.c)
> also passes.
>
> If I do not change the aardvark mode from GEN2 to GEN1, the second link
> training fails. And if I change the mode to GEN1 after this failed link
> training, nothing happens; link training does not succeed.
>
> So just speculation now... In the current setup, initialization of the
> card does one link training at GEN2. Then aspm.c is called, which does
> a second link retraining at GEN2. If that fails, the patch below issues
> a third link retraining at GEN1. If A38x/pci-mvebu has the same problem
> as aardvark, then the second link retraining must be at GEN1 (not GEN2)
> to work around this issue.
>
> Bjorn, Toke: what about trying to hack the aspm.c code to never do link
> retraining at GEN2 speed, and always force GEN1 speed prior to link
> training?

Sounds like a plan. I poked around in aspm.c and must confess to being a
bit lost in the soup of registers ;)

So if one of you can cook up a patch, that would be most helpful!

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-30 13:02                 ` Toke Høiland-Jørgensen
@ 2020-10-30 14:23                   ` Pali Rohár
  2020-10-30 14:54                     ` ™֟☻̭҇ Ѽ ҉ ®
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2020-10-30 14:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Bjorn Helgaas, vtolkm, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> > My experience with that WLE900VX card, aardvark driver and aspm code:
> >
> > Link training in GEN2 mode for this card succeeds only once after reset.
> > Repeated link retraining fails, and it fails even when aardvark is
> > reconfigured to GEN1 mode. A reset via the PERST# signal is required to
> > get link training working again.
> >
> > What I did in the aardvark driver: set mode to GEN2, do link training.
> > On success, read "negotiated link speed" from the "Link Control Status
> > Register" (for WLE900VX it is 0x1 - GEN1) and set it in aardvark. Then
> > retrain the link again (for WLE900VX it would now be at GEN1). After
> > that the card is stable and all future retraining (e.g. from aspm.c)
> > also passes.
> >
> > If I do not change the aardvark mode from GEN2 to GEN1, the second link
> > training fails. And if I change the mode to GEN1 after this failed link
> > training, nothing happens; link training does not succeed.
> >
> > So just speculation now... In the current setup, initialization of the
> > card does one link training at GEN2. Then aspm.c is called, which does
> > a second link retraining at GEN2. If that fails, the patch below issues
> > a third link retraining at GEN1. If A38x/pci-mvebu has the same problem
> > as aardvark, then the second link retraining must be at GEN1 (not GEN2)
> > to work around this issue.
> >
> > Bjorn, Toke: what about trying to hack the aspm.c code to never do link
> > retraining at GEN2 speed, and always force GEN1 speed prior to link
> > training?
> 
> Sounds like a plan. I poked around in aspm.c and must confess to being a
> bit lost in the soup of registers ;)
> 
> So if one of you can cook up a patch, that would be most helpful!

I modified Bjorn's patch to explicitly set tls to 1 and added debug info
about cls (current link speed, which is what aardvark uses). It is
untested, I only compiled it.

Can you try it?

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..f934c0b52f41 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
 	unsigned long end_jiffies;
 	u16 reg16;
 
+	u32 lnkcap2;
+	u16 lnksta, lnkctl2, cls, tls;
+
+	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
+	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
+	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
+	cls = lnksta & PCI_EXP_LNKSTA_CLS;
+	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
+
+	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
+		lnkcap2, (lnkcap2 & 0x3F) >> 1,
+		lnksta, cls,
+		lnkctl2, tls);
+
+	tls = 1;
+	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+					PCI_EXP_LNKCTL2_TLS, tls);
+	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
+	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
+		lnkctl2, tls);
+
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
 	reg16 |= PCI_EXP_LNKCTL_RL;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
@@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
 			break;
 		msleep(1);
 	} while (time_before(jiffies, end_jiffies));
+	pci_info(parent, "lnksta %#06x new cls %#03x\n",
+		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
 	return !(reg16 & PCI_EXP_LNKSTA_LT);
 }
 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-30 14:23                   ` Pali Rohár
@ 2020-10-30 14:54                     ` ™֟☻̭҇ Ѽ ҉ ®
  2020-10-31 12:49                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-10-30 14:54 UTC (permalink / raw)
  To: Pali Rohár, Toke Høiland-Jørgensen
  Cc: Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper


On 30/10/2020 15:23, Pali Rohár wrote:
> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> about cls (current link speed, that what is used by aardvark). It is
> untested, I just tried to compile it.
>
> Can try it?

Still exhibiting the BAR update error; run-tested with next-20201030.
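
For anyone decoding that message: as far as I can tell, the "error updating" line comes from the kernel writing the new BAR value and reading it back, so 0xe0000004 is the value written and 0xffffffff is what came back. All-ones is what a failed config read usually returns, which would fit the "config space inaccessible" error further down. The userspace C sketch below decodes the written value and models the comparison. The helper names are made up for illustration and are not kernel API; only the bit masks follow the PCI register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BAR bit layout (same values as in include/uapi/linux/pci_regs.h). */
#define BAR_SPACE_IO       0x01u     /* bit 0: 1 = I/O space, 0 = memory */
#define BAR_MEM_TYPE_MASK  0x06u     /* bits 2:1: memory BAR type */
#define BAR_MEM_TYPE_64    0x04u     /* 64-bit memory BAR */
#define BAR_MEM_ADDR_MASK  (~0x0fu)  /* address bits of a memory BAR */

/* True if the value describes a memory BAR with 64-bit decoding. */
static bool bar_is_mem64(uint32_t bar)
{
	return !(bar & BAR_SPACE_IO) &&
	       (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

/* Base address held in the low dword of a memory BAR. */
static uint32_t bar_mem_base(uint32_t bar)
{
	assert(!(bar & BAR_SPACE_IO));  /* only meaningful for memory BARs */
	return bar & BAR_MEM_ADDR_MASK;
}

/* Hypothetical model of the write-then-read-back check: address bits must match. */
static bool bar_update_ok(uint32_t written, uint32_t readback)
{
	return ((written ^ readback) & BAR_MEM_ADDR_MASK) == 0;
}

/* An all-ones dword is the usual result of a config read that failed. */
static bool looks_like_failed_read(uint32_t value)
{
	return value == 0xffffffffu;
}
```

With the values from the log, `bar_is_mem64(0xe0000004)` holds, the base is 0xe0000000, and `bar_update_ok(0xe0000004, 0xffffffff)` is false, i.e. the device did not answer the read-back at all rather than rejecting this particular address.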


[    0.396182] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    0.396205] mvebu-pcie soc:pcie: Parsing ranges property...
[    0.396222] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    0.396251] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    0.396278] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    0.396303] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    0.396329] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    0.396340] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    0.396351] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    0.396361] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    0.396372] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    0.396382] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    0.396393] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    0.396400] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    0.397280] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    0.397299] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.397314] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    0.397327] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    0.397348] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    0.397360] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    0.397371] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    0.397383] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    0.397388] pci_bus 0000:00: scanning bus
[    0.397495] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    0.397509] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    0.398052] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    0.398064] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    0.398585] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    0.398597] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    0.399755] pci_bus 0000:00: fixups for bus
[    0.399773] pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 0
[    0.399777] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.399784] pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 0
[    0.399787] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.399794] pci 0000:00:03.0: scanning [bus 00-00] behind bridge, pass 0
[    0.399797] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.399803] pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 1
[    0.400032] pci_bus 0000:01: scanning bus
[    0.400784] pci_bus 0000:01: fixups for bus
[    0.400794] pci_bus 0000:01: bus scan returning with max=01
[    0.400800] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    0.400808] pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 1
[    0.401032] pci_bus 0000:02: scanning bus
[    0.401078] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[    0.401098] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x001fffff 64bit]
[    0.401125] pci 0000:02:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[    0.401217] pci 0000:02:00.0: supports D1 D2
[    0.401614] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    0.401626] pci 0000:00:02.0: lnkcap2 0x00000000 sls 0x00 lnksta 0x1011 cls 0x1 lnkctl2 0x0000 tls 0x0
[    0.401632] pci 0000:00:02.0: lnkctl2 0x00000000 new tls 0x1
[    0.428701] pci 0000:00:02.0: lnksta 0x1011 new cls 0x1
[    0.429486] pci_bus 0000:02: fixups for bus
[    0.429498] pci_bus 0000:02: bus scan returning with max=02
[    0.429504] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    0.429514] pci 0000:00:03.0: scanning [bus 00-00] behind bridge, pass 1
[    0.429778] pci_bus 0000:03: scanning bus
[    0.429831] pci 0000:03:00.0: [168c:002e] type 00 class 0x028000
[    0.429854] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit]
[    0.429978] pci 0000:03:00.0: supports D1
[    0.429985] pci 0000:03:00.0: PME# supported from D0 D1 D3hot
[    0.429992] pci 0000:03:00.0: PME# disabled
[    0.430403] pci 0000:00:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    0.430416] pci 0000:00:03.0: lnkcap2 0x00000000 sls 0x00 lnksta 0x1011 cls 0x1 lnkctl2 0x0000 tls 0x0
[    0.430421] pci 0000:00:03.0: lnkctl2 0x00000000 new tls 0x1
[    0.460692] pci 0000:00:03.0: lnksta 0x1011 new cls 0x1
[    0.461459] pci_bus 0000:03: fixups for bus
[    0.461470] pci_bus 0000:03: bus scan returning with max=03
[    0.461476] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    0.461482] pci_bus 0000:00: bus scan returning with max=03
[    0.461552] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0000000-0xe02fffff]
[    0.461561] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0300000-0xe03fffff]
[    0.461568] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0400000-0xe04007ff pref]
[    0.461576] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    0.461583] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
[    0.461593] pci 0000:00:01.0: PCI bridge to [bus 01]
[    0.461620] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0000000-0xe01fffff 64bit]
[    0.461627] pci 0000:02:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    0.461633] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    0.461639] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0200000-0xe020ffff pref]
[    0.461645] pci 0000:00:02.0: PCI bridge to [bus 02]
[    0.461651] pci 0000:00:02.0:   bridge window [mem 0xe0000000-0xe02fffff]
[    0.461666] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0300000-0xe030ffff 64bit]
[    0.461673] pci 0000:03:00.0: BAR 0: error updating (0xe0300004 != 0xffffffff)
[    0.461678] pci 0000:03:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    0.461683] pci 0000:00:03.0: PCI bridge to [bus 03]
[    0.461689] pci 0000:00:03.0:   bridge window [mem 0xe0300000-0xe03fffff]
[    0.461701] pci 0000:00:01.0: Max Payload Size set to  128/ 128 (was 128), Max Read Rq  128
[    0.461710] pci 0000:00:02.0: Max Payload Size set to  128/ 128 (was 128), Max Read Rq  128
[    0.461715] pci 0000:02:00.0: Failed attempting to set the MPS
[    0.461721] pci 0000:02:00.0: Max Payload Size set to  128/ 256 (was 128), Max Read Rq  128
[    0.461729] pci 0000:00:03.0: Max Payload Size set to  128/ 128 (was 128), Max Read Rq  128
[    0.461734] pci 0000:03:00.0: Failed attempting to set the MPS
[    0.461740] pci 0000:03:00.0: Max Payload Size set to  128/ 128 (was 128), Max Read Rq  128
[    0.461855] pcieport 0000:00:01.0: assign IRQ: got 0
[    0.461866] pcieport 0000:00:01.0: enabling bus mastering
[    0.461959] pcieport 0000:00:02.0: assign IRQ: got 0
[    0.461966] pcieport 0000:00:02.0: enabling device (0140 -> 0142)
[    0.461980] pcieport 0000:00:02.0: enabling bus mastering
[    0.462065] pcieport 0000:00:03.0: assign IRQ: got 0
[    0.462070] pcieport 0000:00:03.0: enabling device (0140 -> 0142)
[    0.462080] pcieport 0000:00:03.0: enabling bus mastering
[    2.467153] pci 0000:00:03.0: enabling bus mastering
[    2.519024] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    2.531459] ath10k_pci 0000:02:00.0: assign IRQ: got 0
[    2.536915] pci 0000:00:02.0: enabling bus mastering
[    2.540553] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    2.580450] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    2.586973] ath10k_pci 0000:02:00.0: disabling bus mastering
[    2.587220] ath10k_pci: probe of 0000:02:00.0 failed with error -110
[    2.605598] ehci-pci: EHCI PCI platform driver
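
To help read the debug lines in this log, here is a small userspace C sketch that splits a Link Status value into its fields. The masks follow the PCIe register layout used by the patch; the struct and function names are hypothetical and exist only for this example. Two caveats when interpreting the output: the final debug line of the patch as posted prints the `lnksta` and `cls` variables that were read before the retrain (the wait loop reads into `reg16`), so identical before/after values are expected either way; and `lnkctl2` reading back as 0 after the write suggests, though I have not verified it, that the target link speed write has no effect on this bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status register fields (same values as in pci_regs.h). */
#define LNKSTA_CLS        0x000fu  /* Current Link Speed */
#define LNKSTA_NLW        0x03f0u  /* Negotiated Link Width */
#define LNKSTA_NLW_SHIFT  4
#define LNKSTA_LT         0x0800u  /* Link Training in progress */
#define LNKSTA_SLC        0x1000u  /* Slot Clock Configuration */

/* Hypothetical container for the decoded fields. */
struct lnksta_fields {
	unsigned int speed;  /* 1 = 2.5 GT/s (GEN1), 2 = 5 GT/s (GEN2) */
	unsigned int width;  /* number of lanes */
	bool training;       /* link training still in progress */
	bool slot_clock;     /* slot clock configuration bit */
};

static struct lnksta_fields decode_lnksta(uint16_t lnksta)
{
	struct lnksta_fields f = {
		.speed = lnksta & LNKSTA_CLS,
		.width = (lnksta & LNKSTA_NLW) >> LNKSTA_NLW_SHIFT,
		.training = (lnksta & LNKSTA_LT) != 0,
		.slot_clock = (lnksta & LNKSTA_SLC) != 0,
	};
	return f;
}

/* "sls" exactly as the debug patch computes it from Link Capabilities 2. */
static unsigned int decode_sls(uint32_t lnkcap2)
{
	return (lnkcap2 & 0x3f) >> 1;
}

/* Values taken from the log above. */
static void check_log_values(void)
{
	struct lnksta_fields f = decode_lnksta(0x1011);

	assert(f.speed == 1 && f.width == 1);
	assert(!f.training && f.slot_clock);
	assert(decode_sls(0x00000000) == 0);
}
```

So 0x1011 reads as a x1 link at GEN1 speed with training finished, and the all-zero `lnkcap2` means the bridge advertises no supported-speeds vector at all.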




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-30 14:54                     ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-10-31 12:49                       ` Toke Høiland-Jørgensen
  2020-11-02 15:24                         ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-31 12:49 UTC (permalink / raw)
  To: vtolkm, Pali Rohár
  Cc: Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:

> Still exhibiting the BAR update error, run tested with next--20201030

Yup, same for me :(

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-10-31 12:49                       ` Toke Høiland-Jørgensen
@ 2020-11-02 15:24                         ` Pali Rohár
  2020-11-02 15:54                           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2020-11-02 15:24 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> 
> > Still exhibiting the BAR update error, run tested with next--20201030
> 
> Yup, same for me :(

So then it is a different issue and not similar to the aardvark one.

Anyway, was ASPM working on some previous kernel version? Or was it
always broken on Turris Omnia?

And does somebody have another Armada 385 device with mPCIe slots to
test whether ASPM is working? Or any other 32-bit Marvell Armada SoC?

I would like to know whether this is an issue only on Turris Omnia, or
also on other Armada 385 SoC devices, or even on any other device which
uses the pci-mvebu.c driver.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-11-02 15:24                         ` Pali Rohár
@ 2020-11-02 15:54                           ` Toke Høiland-Jørgensen
  2020-11-02 16:18                             ` ™֟☻̭҇ Ѽ ҉ ®
  2021-03-15 19:58                             ` Pali Rohár
  0 siblings, 2 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-11-02 15:54 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

Pali Rohár <pali@kernel.org> writes:

> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>> 
>> > Still exhibiting the BAR update error, run tested with next--20201030
>> 
>> Yup, same for me :(
>
> So then it is different issue and not similar to aardvark one.
>
> Anyway, was ASPM working on some previous kernel version? Or was it
> always broken on Turris Omnia?

I tried bisecting and couldn't find a commit that worked. And OpenWrt by
default builds with ASPM off, so my best guess is that it was always
broken.

However, the two other PCI slots *do* work with ASPM on, as long as
they're both occupied when booting. If I only have one card installed
apart from the dodgy WLE900, both of them fail...

> And has somebody other Armada 385 device with mPCIe slots to test if
> ASPM is working? Or any other 32bit Marvell Armada SOC?
>
> I would like to know if this is issue only on Turris Omnia or also on
> other Armada 385 SOC device or even on any other device which uses
> pci-mvebu.c driver.

See above: It does partly work on my Omnia. Is it possible to define a
quirk to just disable it on a per-slot basis for the WLE900 card? Maybe
just doing that and calling it a day would be enough...
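
To make the quirk idea concrete, here is a rough userspace C model of an ID-keyed "no ASPM" table. It is only a sketch: the struct and function names are invented for this example, and a real fix would presumably go through the kernel's PCI fixup machinery (drivers/pci/quirks.c already carries device-ID-keyed ASPM quirks) rather than a hand-rolled table. The ID 168c:003c is the ath10k device on bus 02 in the log posted earlier in this thread. Note that this matches on the card's ID, not on the slot it sits in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry keyed on PCI vendor and device ID. */
struct aspm_quirk {
	uint16_t vendor;
	uint16_t device;
};

/* Devices for which ASPM would be left disabled (assumption for this sketch). */
static const struct aspm_quirk no_aspm_devices[] = {
	{ 0x168c, 0x003c },  /* ath10k card seen at 0000:02:00.0 in the log */
};

/* Returns true if ASPM should be skipped for this vendor/device pair. */
static bool aspm_quirk_applies(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(no_aspm_devices) / sizeof(no_aspm_devices[0]);

	for (size_t i = 0; i < n; i++) {
		if (no_aspm_devices[i].vendor == vendor &&
		    no_aspm_devices[i].device == device)
			return true;
	}
	return false;
}

/* The ath10k card matches; the ath9k card (168c:002e) does not. */
static void check_quirk_table(void)
{
	assert(aspm_quirk_applies(0x168c, 0x003c));
	assert(!aspm_quirk_applies(0x168c, 0x002e));
}
```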

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-11-02 15:54                           ` Toke Høiland-Jørgensen
@ 2020-11-02 16:18                             ` ™֟☻̭҇ Ѽ ҉ ®
  2020-11-02 16:33                               ` Toke Høiland-Jørgensen
  2021-03-15 19:58                             ` Pali Rohár
  1 sibling, 1 reply; 62+ messages in thread
From: ™֟☻̭҇ Ѽ ҉ ® @ 2020-11-02 16:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Pali Rohár
  Cc: Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper



On 02/11/2020 16:54, Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
>
>> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>>> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>>>
>>>> On 30/10/2020 15:23, Pali Rohár wrote:
>>>>> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>>>>>> Pali Rohár <pali@kernel.org> writes:
>>>>>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>>>>>>>
>>>>>>> Link training in GEN2 mode for this card succeed only once after reset.
>>>>>>> Repeated link retraining fails and it fails even when aardvark is
>>>>>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>>>>>>> working link training.
>>>>>>>
>>>>>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>>>>>>> success read "negotiated link speed" from "Link Control Status Register"
>>>>>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>>>>>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>>>>>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>>>>>>>
>>>>>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>>>>>>> training fails. And if I change mode to GEN1 after this failed link
>>>>>>> training then nothing happen, link training do not success.
>>>>>>>
>>>>>>> So just speculation now... In current setup initialization of card does
>>>>>>> one link training at GEN2. Then aspm.c is called which is doing second
>>>>>>> link retraining at GEN2. And if it fails then below patch issue third
>>>>>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>>>>>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>>>>>>> this issue.
>>>>>>>
>>>>>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>>>>>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>>>>>>> training?
>>>>>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>>>>>> bit lost in the soup of registers ;)
>>>>>>
>>>>>> So if one of you can cook up a patch, that would be most helpful!
>>>>> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>>>>> about cls (current link speed, that what is used by aardvark). It is
>>>>> untested, I just tried to compile it.
>>>>>
>>>>> Can try it?
>>>>>
>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>> index 253c30cc1967..f934c0b52f41 100644
>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>    	unsigned long end_jiffies;
>>>>>    	u16 reg16;
>>>>>    
>>>>> +	u32 lnkcap2;
>>>>> +	u16 lnksta, lnkctl2, cls, tls;
>>>>> +
>>>>> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>>>>> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>>>>> +
>>>>> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>>>>> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>>>>> +		lnksta, cls,
>>>>> +		lnkctl2, tls);
>>>>> +
>>>>> +	tls = 1;
>>>>> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>>>>> +					PCI_EXP_LNKCTL2_TLS, tls);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>>>>> +		lnkctl2, tls);
>>>>> +
>>>>>    	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>>>>>    	reg16 |= PCI_EXP_LNKCTL_RL;
>>>>>    	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>>>>> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>    			break;
>>>>>    		msleep(1);
>>>>>    	} while (time_before(jiffies, end_jiffies));
>>>>> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>>>>> +		reg16, (reg16 & PCI_EXP_LNKSTA_CLS));
>>>>>    	return !(reg16 & PCI_EXP_LNKSTA_LT);
>>>>>    }
>>>>>    
>>>> Still exhibiting the BAR update error, run tested with next--20201030
>>> Yup, same for me :(
>> So then it is different issue and not similar to aardvark one.
>>
>> Anyway, was ASPM working on some previous kernel version? Or was it
>> always broken on Turris Omnia?
> I tried bisecting and couldn't find a commit that worked. And OpenWrt by
> default builds with ASPM off, so my best guess is that it was always
> broken.
>
> However, the two other PCI slots *do* work with ASPM on, as long as
> they're both occupied when booting. If I only have one card installed
> apart from the dodgy WLE900, both of them fail...

Just to be sure it is not a (particular) mPCIe slot issue on the TO - 
did you change the device order in the mPCIe slots?

On my node:

- right slot (next to the CPU) hosts an SSD
- centre slot hosts WLE900VX
- left slot (over the SIM card slot) hosts the WLE200N2


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-11-02 16:18                             ` ™֟☻̭҇ Ѽ ҉ ®
@ 2020-11-02 16:33                               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-11-02 16:33 UTC (permalink / raw)
  To: vtolkm, Pali Rohár
  Cc: Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:

> On 02/11/2020 16:54, Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>>
>>> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>>>> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>>>>
>>>>> On 30/10/2020 15:23, Pali Rohár wrote:
>>>>>> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>>>>>>> Pali Rohár <pali@kernel.org> writes:
>>>>>>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>>>>>>>>
>>>>>>>> Link training in GEN2 mode for this card succeed only once after reset.
>>>>>>>> Repeated link retraining fails and it fails even when aardvark is
>>>>>>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>>>>>>>> working link training.
>>>>>>>>
>>>>>>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>>>>>>>> success read "negotiated link speed" from "Link Control Status Register"
>>>>>>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>>>>>>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>>>>>>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>>>>>>>>
>>>>>>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>>>>>>>> training fails. And if I change mode to GEN1 after this failed link
>>>>>>>> training then nothing happen, link training do not success.
>>>>>>>>
>>>>>>>> So just speculation now... In current setup initialization of card does
>>>>>>>> one link training at GEN2. Then aspm.c is called which is doing second
>>>>>>>> link retraining at GEN2. And if it fails then below patch issue third
>>>>>>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>>>>>>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>>>>>>>> this issue.
>>>>>>>>
>>>>>>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>>>>>>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>>>>>>>> training?
>>>>>>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>>>>>>> bit lost in the soup of registers ;)
>>>>>>>
>>>>>>> So if one of you can cook up a patch, that would be most helpful!
>>>>>> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>>>>>> about cls (current link speed, that what is used by aardvark). It is
>>>>>> untested, I just tried to compile it.
>>>>>>
>>>>>> Can try it?
>>>>>>
>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>> index 253c30cc1967..f934c0b52f41 100644
>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>>    	unsigned long end_jiffies;
>>>>>>    	u16 reg16;
>>>>>>    
>>>>>> +	u32 lnkcap2;
>>>>>> +	u16 lnksta, lnkctl2, cls, tls;
>>>>>> +
>>>>>> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>>> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>>>>>> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>>>>>> +
>>>>>> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>>>>>> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>>>>>> +		lnksta, cls,
>>>>>> +		lnkctl2, tls);
>>>>>> +
>>>>>> +	tls = 1;
>>>>>> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>>>>>> +					PCI_EXP_LNKCTL2_TLS, tls);
>>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>>> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>>>>>> +		lnkctl2, tls);
>>>>>> +
>>>>>>    	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>>>>>>    	reg16 |= PCI_EXP_LNKCTL_RL;
>>>>>>    	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>>>>>> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>>    			break;
>>>>>>    		msleep(1);
>>>>>>    	} while (time_before(jiffies, end_jiffies));
>>>>>> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>>>>>> +		reg16, (reg16 & PCI_EXP_LNKSTA_CLS));
>>>>>>    	return !(reg16 & PCI_EXP_LNKSTA_LT);
>>>>>>    }
>>>>>>    
>>>>> Still exhibiting the BAR update error, run tested with next--20201030
>>>> Yup, same for me :(
>>> So then it is different issue and not similar to aardvark one.
>>>
>>> Anyway, was ASPM working on some previous kernel version? Or was it
>>> always broken on Turris Omnia?
>> I tried bisecting and couldn't find a commit that worked. And OpenWrt by
>> default builds with ASPM off, so my best guess is that it was always
>> broken.
>>
>> However, the two other PCI slots *do* work with ASPM on, as long as
>> they're both occupied when booting. If I only have one card installed
>> apart from the dodgy WLE900, both of them fail...
>
> Just to be sure it is not a (particular) mPCIe slot issue on the TO - 
> did you change the device order in the mPCIe slots?

No, I didn't.

> On my node:
>
> - right slot (next to the CPU) hosts an SSD
> - centre slot hosts WLE900VX
> - left slot (over the SIM card slot) hosts the WLE200N2

That's the same order in which the PCI subsystem enumerates the slots (on
my machine at least). I have WLE200/WLE900/MT76 in those three slots, which
makes slots 1 and 3 work, while slot 2 craps out. If I remove the MT76
card (as it was originally), neither slot 1 nor slot 2 works...

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2020-11-02 15:54                           ` Toke Høiland-Jørgensen
  2020-11-02 16:18                             ` ™֟☻̭҇ Ѽ ҉ ®
@ 2021-03-15 19:58                             ` Pali Rohár
  2021-03-16  9:25                               ` Pali Rohár
  1 sibling, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-15 19:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> 
> > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> >> 
> >> > On 30/10/2020 15:23, Pali Rohár wrote:
> >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> >> >>> Pali Rohár <pali@kernel.org> writes:
> >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
> >> >>>>
> >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
> >> >>>> Repeated link retraining fails and it fails even when aardvark is
> >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> >> >>>> working link training.
> >> >>>>
> >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
> >> >>>> success read "negotiated link speed" from "Link Control Status Register"
> >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
> >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
> >> >>>>
> >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
> >> >>>> training fails. And if I change mode to GEN1 after this failed link
> >> >>>> training then nothing happen, link training do not success.
> >> >>>>
> >> >>>> So just speculation now... In current setup initialization of card does
> >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
> >> >>>> link retraining at GEN2. And if it fails then below patch issue third
> >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
> >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
> >> >>>> this issue.
> >> >>>>
> >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
> >> >>>> training?
> >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
> >> >>> bit lost in the soup of registers ;)
> >> >>>
> >> >>> So if one of you can cook up a patch, that would be most helpful!
> >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> >> >> about cls (current link speed, that what is used by aardvark). It is
> >> >> untested, I just tried to compile it.
> >> >>
> >> >> Can try it?
> >> >>
> >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >> >> index 253c30cc1967..f934c0b52f41 100644
> >> >> --- a/drivers/pci/pcie/aspm.c
> >> >> +++ b/drivers/pci/pcie/aspm.c
> >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> >>   	unsigned long end_jiffies;
> >> >>   	u16 reg16;
> >> >>   
> >> >> +	u32 lnkcap2;
> >> >> +	u16 lnksta, lnkctl2, cls, tls;
> >> >> +
> >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
> >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
> >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
> >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> >> >> +
> >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
> >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
> >> >> +		lnksta, cls,
> >> >> +		lnkctl2, tls);
> >> >> +
> >> >> +	tls = 1;
> >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
> >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
> >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
> >> >> +		lnkctl2, tls);
> >> >> +
> >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
> >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
> >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> >>   			break;
> >> >>   		msleep(1);
> >> >>   	} while (time_before(jiffies, end_jiffies));
> >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
> >> >> +		reg16, (reg16 & PCI_EXP_LNKSTA_CLS));
> >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
> >> >>   }
> >> >>   
> >> >
> >> > Still exhibiting the BAR update error, run tested with next--20201030
> >> 
> >> Yup, same for me :(

I'm answering my own question. This code does not work on Omnia because
the A38x pci-mvebu.c driver uses an emulator for the PCIe root bridge, and
that emulator does not implement the PCI_EXP_LNKCTL2 register. So the
code for forcing link speed has no effect on Omnia...
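
To illustrate why the write gets lost, here is a toy model of an emulated
bridge that only backs some registers. It is a sketch, not the kernel's
emulator code: struct toy_bridge, toy_write_word() and toy_read_word() are
made-up names, and the assumption is that writes to unbacked registers are
silently dropped and reads of them return zero. Only the two register
offsets are the real ones from pci_regs.h.

```c
#include <stdint.h>

/* Offsets inside the PCIe capability, as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL   0x10 /* Link Control */
#define PCI_EXP_LNKCTL2  0x30 /* Link Control 2 */

/* Hypothetical toy bridge: Link Control is the only register it backs. */
struct toy_bridge {
	uint16_t lnkctl;
};

/* Store the value if the register is backed, otherwise drop the write. */
static void toy_write_word(struct toy_bridge *b, int pos, uint16_t val)
{
	if (pos == PCI_EXP_LNKCTL)
		b->lnkctl = val;
	/* Assumption: any other register is not implemented, write is ignored. */
}

/* Return the stored value for a backed register, zero for anything else. */
static uint16_t toy_read_word(const struct toy_bridge *b, int pos)
{
	return pos == PCI_EXP_LNKCTL ? b->lnkctl : 0;
}
```

In this model a target link speed written to PCI_EXP_LNKCTL2 reads back as
zero, which is what a caller forcing gen1 through the emulated bridge
would see.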

> > So then it is different issue and not similar to aardvark one.

... and therefore it can still be the same issue that I debugged on
aardvark.

> > Anyway, was ASPM working on some previous kernel version? Or was it
> > always broken on Turris Omnia?
> 
> I tried bisecting and couldn't find a commit that worked. And OpenWrt by
> default builds with ASPM off, so my best guess is that it was always
> broken.

I see, and it makes sense that it does not work in any version.

> However, the two other PCI slots *do* work with ASPM on, as long as
> they're both occupied when booting. If I only have one card installed
> apart from the dodgy WLE900, both of them fail...
> 
> > And has somebody other Armada 385 device with mPCIe slots to test if
> > ASPM is working? Or any other 32bit Marvell Armada SOC?
> >
> > I would like to know if this is issue only on Turris Omnia or also on
> > other Armada 385 SOC device or even on any other device which uses
> > pci-mvebu.c driver.
> 
> See above: It does partly work on my Omnia. Is it possible to define a
> quirk to just disable it on a per-slot basis for the WLE900 card? Maybe
> just doing that and calling it a day would be enough...
> 
> -Toke
> 

Toke, can you try putting this WLE900 card into some x86 computer and
check whether it works there, both with ASPM enabled and with ASPM disabled?
Or into any other device which does not have a Marvell PCIe controller?

I need to know whether the problem is with this WLE900 card or with
Marvell's PCIe controllers. Based on that I can prepare a quirk / hook for
either the WLE900 card or the Marvell PCIe drivers (or both, depending on
how it is broken).


PS for all: Please do not put fancy Unicode characters into the email From:
header, as such emails would be marked as spam and automatically filtered.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-15 19:58                             ` Pali Rohár
@ 2021-03-16  9:25                               ` Pali Rohár
  2021-03-18 22:43                                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-16  9:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
> > Pali Rohár <pali@kernel.org> writes:
> > 
> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> > >> 
> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> > >> >>> Pali Rohár <pali@kernel.org> writes:
> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
> > >> >>>>
> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> > >> >>>> working link training.
> > >> >>>>
> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
> > >> >>>>
> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
> > >> >>>> training then nothing happen, link training do not success.
> > >> >>>>
> > >> >>>> So just speculation now... In current setup initialization of card does
> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
> > >> >>>> this issue.
> > >> >>>>
> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
> > >> >>>> training?
> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
> > >> >>> bit lost in the soup of registers ;)
> > >> >>>
> > >> >>> So if one of you can cook up a patch, that would be most helpful!
> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> > >> >> about cls (current link speed, that what is used by aardvark). It is
> > >> >> untested, I just tried to compile it.
> > >> >>
> > >> >> Can try it?
> > >> >>
> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > >> >> index 253c30cc1967..f934c0b52f41 100644
> > >> >> --- a/drivers/pci/pcie/aspm.c
> > >> >> +++ b/drivers/pci/pcie/aspm.c
> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> > >> >>   	unsigned long end_jiffies;
> > >> >>   	u16 reg16;
> > >> >>   
> > >> >> +	u32 lnkcap2;
> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
> > >> >> +
> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> > >> >> +
> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
> > >> >> +		lnksta, cls,
> > >> >> +		lnkctl2, tls);
> > >> >> +
> > >> >> +	tls = 1;
> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
> > >> >> +		lnkctl2, tls);
> > >> >> +
> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> > >> >>   			break;
> > >> >>   		msleep(1);
> > >> >>   	} while (time_before(jiffies, end_jiffies));
> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
> > >> >> +		reg16, (reg16 & PCI_EXP_LNKSTA_CLS));
> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
> > >> >>   }
> > >> >>   
> > >> >
> > >> > Still exhibiting the BAR update error, run tested with next--20201030
> > >> 
> > >> Yup, same for me :(
> 
> I'm answering my own question. This code does not work on Omnia because
> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
> code for forcing link speed has no effect on Omnia...

Toke, on the A38x PCIe controller it is possible to access the
PCI_EXP_LNKCTL2 register. The access is just not exported via the emulated
root bridge.

Documentation for this PCIe controller is public, so anybody can look at
the register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers

http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf

In drivers/pci/controller/pci-mvebu.c you can set a new value for this
register via a function call:

    mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);

So, could you try to set the PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
init function, e.g. mvebu_pcie_setup_hw()?

    u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
    val &= ~PCI_EXP_LNKCTL2_TLS;
    val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
    mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
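
To make the bit manipulation above easy to check outside the kernel, here
is a minimal standalone sketch of the same read-modify-write on a plain
integer. The helper name force_tls_gen1() is made up for illustration; the
mask values are the ones from include/uapi/linux/pci_regs.h. It only models
the arithmetic and does not touch hardware.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL2_TLS        0x000f /* Target Link Speed field */
#define PCI_EXP_LNKCTL2_TLS_2_5GT  0x0001 /* 2.5 GT/s, i.e. gen1 */
#define PCI_EXP_LNKCTL2_TLS_5_0GT  0x0002 /* 5.0 GT/s, i.e. gen2 */

/*
 * Hypothetical helper: take a register value as read from the hardware and
 * return it with the Target Link Speed field forced to gen1. All bits
 * outside the TLS field are left unchanged.
 */
static uint32_t force_tls_gen1(uint32_t val)
{
	val &= ~(uint32_t)PCI_EXP_LNKCTL2_TLS; /* clear the old target speed */
	val |= PCI_EXP_LNKCTL2_TLS_2_5GT;      /* select 2.5 GT/s */
	return val;
}
```

For a made-up input of 0x00010042 (TLS set to gen2) the helper gives
0x00010041: only the low four bits change.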

> > > So then it is different issue and not similar to aardvark one.
> 
> ... and therefore it can be still same issue which I have debugged on
> aardvark.
> 
> > > Anyway, was ASPM working on some previous kernel version? Or was it
> > > always broken on Turris Omnia?
> > 
> > I tried bisecting and couldn't find a commit that worked. And OpenWrt by
> > default builds with ASPM off, so my best guess is that it was always
> > broken.
> 
> I see and it makes sense that it does not work in any version.
> 
> > However, the two other PCI slots *do* work with ASPM on, as long as
> > they're both occupied when booting. If I only have one card installed
> > apart from the dodgy WLE900, both of them fail...
> > 
> > > And has somebody other Armada 385 device with mPCIe slots to test if
> > > ASPM is working? Or any other 32bit Marvell Armada SOC?
> > >
> > > I would like to know if this is issue only on Turris Omnia or also on
> > > other Armada 385 SOC device or even on any other device which uses
> > > pci-mvebu.c driver.
> > 
> > See above: It does partly work on my Omnia. Is it possible to define a
> > quirk to just disable it on a per-slot basis for the WLE900 card? Maybe
> > just doing that and calling it a day would be enough...
> > 
> > -Toke
> > 
> 
> Toke, can you try to put this WLE900 card into some x86 computer and
> check if this card works? With ASPM enabled and also with ASPM disabled?
> Or into any other device which does not have Marvell PCIe controller?
> 
> I need to know if problem is with this WLE900 card or with Marvell's
> PCIe controllers. And based on it I can prepare quirk / hook for either
> WLE900 card or for Marvell PCIe drivers (or both, based on how it is
> broken).
> 
> 
> PS for all: Please do not put fancy unicode characters into email From:
> header as such email would be marked as spam and automatically filtered.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-16  9:25                               ` Pali Rohár
@ 2021-03-18 22:43                                 ` Toke Høiland-Jørgensen
  2021-03-18 23:16                                   ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-18 22:43 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

Pali Rohár <pali@kernel.org> writes:

> On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
>> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
>> > Pali Rohár <pali@kernel.org> writes:
>> > 
>> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>> > >> 
>> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
>> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>> > >> >>> Pali Rohár <pali@kernel.org> writes:
>> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>> > >> >>>>
>> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
>> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
>> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>> > >> >>>> working link training.
>> > >> >>>>
>> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
>> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>> > >> >>>>
>> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
>> > >> >>>> training then nothing happen, link training do not success.
>> > >> >>>>
>> > >> >>>> So just speculation now... In current setup initialization of card does
>> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
>> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
>> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>> > >> >>>> this issue.
>> > >> >>>>
>> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>> > >> >>>> training?
>> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>> > >> >>> bit lost in the soup of registers ;)
>> > >> >>>
>> > >> >>> So if one of you can cook up a patch, that would be most helpful!
>> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>> > >> >> about cls (current link speed, that what is used by aardvark). It is
>> > >> >> untested, I just tried to compile it.
>> > >> >>
>> > >> >> Can try it?
>> > >> >>
>> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> > >> >> index 253c30cc1967..f934c0b52f41 100644
>> > >> >> --- a/drivers/pci/pcie/aspm.c
>> > >> >> +++ b/drivers/pci/pcie/aspm.c
>> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> > >> >>   	unsigned long end_jiffies;
>> > >> >>   	u16 reg16;
>> > >> >>   
>> > >> >> +	u32 lnkcap2;
>> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
>> > >> >> +
>> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>> > >> >> +
>> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>> > >> >> +		lnksta, cls,
>> > >> >> +		lnkctl2, tls);
>> > >> >> +
>> > >> >> +	tls = 1;
>> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
>> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>> > >> >> +		lnkctl2, tls);
>> > >> >> +
>> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
>> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> > >> >>   			break;
>> > >> >>   		msleep(1);
>> > >> >>   	} while (time_before(jiffies, end_jiffies));
>> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>> > >> >> +		reg16, (reg16 & PCI_EXP_LNKSTA_CLS));
>> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
>> > >> >>   }
>> > >> >>   
>> > >> >
>> > >> > Still exhibiting the BAR update error, run tested with next--20201030
>> > >> 
>> > >> Yup, same for me :(
>> 
>> I'm answering my own question. This code does not work on Omnia because
>> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
>> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
>> code for forcing link speed has no effect on Omnia...
>
> Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
> register. Just access is not exported via emulated root bridge.
>
> Documentation for this PCIe controller is public, so anybody can look at
> register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
>
> http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
>
> In drivers/pci/controller/pci-mvebu.c you can set a new value for this
> register via function call:
>
>     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>
> So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
> init function, e.g. mvebu_pcie_setup_hw()?
>
>     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>     val &= ~PCI_EXP_LNKCTL2_TLS;
>     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
>     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);

I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
seems to fix things so that all three PCIe devices work even with ASPM
turned on! :)

Do you still need me to test the card on a different machine? I'm not sure
I have an x86 machine with a mini-PCIe slot handy, but I can go hunting if
needed...

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-18 22:43                                 ` Toke Høiland-Jørgensen
@ 2021-03-18 23:16                                   ` Pali Rohár
  2021-03-26 12:50                                     ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-18 23:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni,
	Jason Cooper

On Thursday 18 March 2021 23:43:58 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> 
> > On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
> >> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
> >> > Pali Rohár <pali@kernel.org> writes:
> >> > 
> >> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> >> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> >> > >> 
> >> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
> >> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> >> > >> >>> Pali Rohár <pali@kernel.org> writes:
> >> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
> >> > >> >>>>
> >> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
> >> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
> >> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> >> > >> >>>> working link training.
> >> > >> >>>>
> >> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
> >> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
> >> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> >> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
> >> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
> >> > >> >>>>
> >> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
> >> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
> >> > >> >>>> training then nothing happen, link training do not success.
> >> > >> >>>>
> >> > >> >>>> So just speculation now... In current setup initialization of card does
> >> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
> >> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
> >> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
> >> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
> >> > >> >>>> this issue.
> >> > >> >>>>
> >> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> >> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
> >> > >> >>>> training?
> >> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
> >> > >> >>> bit lost in the soup of registers ;)
> >> > >> >>>
> >> > >> >>> So if one of you can cook up a patch, that would be most helpful!
> >> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> >> > >> >> about cls (current link speed, that what is used by aardvark). It is
> >> > >> >> untested, I just tried to compile it.
> >> > >> >>
> >> > >> >> Can try it?
> >> > >> >>
> >> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >> > >> >> index 253c30cc1967..f934c0b52f41 100644
> >> > >> >> --- a/drivers/pci/pcie/aspm.c
> >> > >> >> +++ b/drivers/pci/pcie/aspm.c
> >> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> > >> >>   	unsigned long end_jiffies;
> >> > >> >>   	u16 reg16;
> >> > >> >>   
> >> > >> >> +	u32 lnkcap2;
> >> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
> >> > >> >> +
> >> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
> >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
> >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
> >> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> >> > >> >> +
> >> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
> >> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
> >> > >> >> +		lnksta, cls,
> >> > >> >> +		lnkctl2, tls);
> >> > >> >> +
> >> > >> >> +	tls = 1;
> >> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
> >> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
> >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
> >> > >> >> +		lnkctl2, tls);
> >> > >> >> +
> >> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
> >> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
> >> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> > >> >>   			break;
> >> > >> >>   		msleep(1);
> >> > >> >>   	} while (time_before(jiffies, end_jiffies));
> >> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
> >> > >> >> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
> >> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
> >> > >> >>   }
> >> > >> >>   
> >> > >> >
> >> > >> > Still exhibiting the BAR update error, run tested with next--20201030
> >> > >> 
> >> > >> Yup, same for me :(
> >> 
> >> I'm answering my own question. This code does not work on Omnia because
> >> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
> >> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
> >> code for forcing link speed has no effect on Omnia...
> >
> > Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
> > register. Just access is not exported via emulated root bridge.
> >
> > Documentation for this PCIe controller is public, so anybody can look at
> > register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
> >
> > http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
> >
> > In drivers/pci/controller/pci-mvebu.c you can set a new value for this
> > register via function call:
> >
> >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> >
> > So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
> > init function, e.g. mvebu_pcie_setup_hw()?
> >
> >     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> >     val &= ~PCI_EXP_LNKCTL2_TLS;
> >     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> 
> I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
> seems to fix things so that all three PCIE devices work even with ASPM
> turned on! :)

Perfect! Now I'm sure that it is the same issue as in the aardvark driver.

I will prepare patches for both pci-aardvark.c and pci-mvebu.c to export
the PCI_EXP_LNKCTL2 register via the emulated bridge. Then the aspm.c code
would be able to use Bjorn's patch or mine, which I sent last year.
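
For reference, a minimal userspace sketch of the read-modify-write that
forces the Target Link Speed field to GEN1. The constant values are
assumed to match include/uapi/linux/pci_regs.h, force_tls_gen1() is a
hypothetical helper name (not an existing kernel function), and the 32-bit
argument stands in for the value that mvebu_readl() would return:

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h; please double check */
#define PCI_EXP_LNKCTL2_TLS		0x000f	/* Target Link Speed field */
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001	/* GEN1, 2.5 GT/s */
#define PCI_EXP_LNKCTL2_TLS_5_0GT	0x0002	/* GEN2, 5.0 GT/s */

/*
 * Hypothetical helper: takes the 32-bit word read at the LNKCTL2 offset
 * (LNKCTL2 in the low 16 bits, LNKSTA2 in the high 16 bits) and returns
 * it with only the Target Link Speed field changed to GEN1.
 */
static uint32_t force_tls_gen1(uint32_t val)
{
	val &= ~(uint32_t)PCI_EXP_LNKCTL2_TLS;	/* clear the old target speed */
	val |= PCI_EXP_LNKCTL2_TLS_2_5GT;	/* request 2.5 GT/s */
	return val;
}
```

All bits outside the TLS field are preserved, so the upper half of the
word is written back unchanged.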

The question remains whether this is an issue with the QCA wifi chip on
that Compex card or an issue with the PCIe controllers, now on both the
A38x and A3720 SoCs. Note that both A38x and A3720 platforms are from
Marvell, but they have different PCIe controllers (so it does not mean
that both must have the same hw bugs).

> Do you still need me to test the card on a different machine? Not sure I
> have an x86 machine with a mini-PCIe slot handy, but I can go hunting if
> needed...

Yes, we now need to find out whether there is any other PCIe controller
on which this card does not work when ASPM is enabled and the above
"retrain link" kernel code is executed.
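
When comparing results from another machine, the two Link Status fields
of interest after a retrain are the current link speed and the "link
training in progress" bit. A minimal sketch of decoding them from a raw
LNKSTA value (e.g. taken from an lspci dump) follows. The constant values
are assumed to match include/uapi/linux/pci_regs.h, and lnksta_speed()
and lnksta_training() are hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h; please double check */
#define PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
#define PCI_EXP_LNKSTA_LT	0x0800	/* Link Training in progress */

/* Hypothetical helper: 1 means GEN1 (2.5 GT/s), 2 means GEN2 (5.0 GT/s) */
static unsigned int lnksta_speed(uint16_t lnksta)
{
	return lnksta & PCI_EXP_LNKSTA_CLS;
}

/* Hypothetical helper: true while the link is still training */
static bool lnksta_training(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_LT) != 0;
}
```

A retrain that never completes would show up as lnksta_training() still
returning true after the timeout.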

It does not have to be x86. But because of how UEFI, ACPI and other
firmware touch PCIe, there is a high chance that this bug appears on
some x86 machine too. More firmware = more problems.

On ARM platforms with native controller drivers there does not have to
be any firmware involved (as in the Marvell case), so only the kernel
touches the PCIe HW.

Note that on x86, ASPM may be disabled (if firmware indicates it), so a
command line argument like "pcie_aspm=force" is probably required for
tests.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-18 23:16                                   ` Pali Rohár
@ 2021-03-26 12:50                                     ` Pali Rohár
  2021-03-26 15:25                                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-26 12:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

On Friday 19 March 2021 00:16:29 Pali Rohár wrote:
> On Thursday 18 March 2021 23:43:58 Toke Høiland-Jørgensen wrote:
> > Pali Rohár <pali@kernel.org> writes:
> > 
> > > On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
> > >> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
> > >> > Pali Rohár <pali@kernel.org> writes:
> > >> > 
> > >> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> > >> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> > >> > >> 
> > >> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
> > >> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> > >> > >> >>> Pali Rohár <pali@kernel.org> writes:
> > >> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
> > >> > >> >>>>
> > >> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
> > >> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
> > >> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> > >> > >> >>>> working link training.
> > >> > >> >>>>
> > >> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
> > >> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
> > >> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> > >> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
> > >> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
> > >> > >> >>>>
> > >> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
> > >> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
> > >> > >> >>>> training then nothing happen, link training do not success.
> > >> > >> >>>>
> > >> > >> >>>> So just speculation now... In current setup initialization of card does
> > >> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
> > >> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
> > >> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
> > >> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
> > >> > >> >>>> this issue.
> > >> > >> >>>>
> > >> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> > >> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
> > >> > >> >>>> training?
> > >> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
> > >> > >> >>> bit lost in the soup of registers ;)
> > >> > >> >>>
> > >> > >> >>> So if one of you can cook up a patch, that would be most helpful!
> > >> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> > >> > >> >> about cls (current link speed, that what is used by aardvark). It is
> > >> > >> >> untested, I just tried to compile it.
> > >> > >> >>
> > >> > >> >> Can try it?
> > >> > >> >>
> > >> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > >> > >> >> index 253c30cc1967..f934c0b52f41 100644
> > >> > >> >> --- a/drivers/pci/pcie/aspm.c
> > >> > >> >> +++ b/drivers/pci/pcie/aspm.c
> > >> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> > >> > >> >>   	unsigned long end_jiffies;
> > >> > >> >>   	u16 reg16;
> > >> > >> >>   
> > >> > >> >> +	u32 lnkcap2;
> > >> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
> > >> > >> >> +
> > >> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> > >> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
> > >> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> > >> > >> >> +
> > >> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
> > >> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
> > >> > >> >> +		lnksta, cls,
> > >> > >> >> +		lnkctl2, tls);
> > >> > >> >> +
> > >> > >> >> +	tls = 1;
> > >> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
> > >> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> > >> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
> > >> > >> >> +		lnkctl2, tls);
> > >> > >> >> +
> > >> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
> > >> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
> > >> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> > >> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> > >> > >> >>   			break;
> > >> > >> >>   		msleep(1);
> > >> > >> >>   	} while (time_before(jiffies, end_jiffies));
> > >> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
> > >> > >> >> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
> > >> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
> > >> > >> >>   }
> > >> > >> >>   
> > >> > >> >
> > >> > >> > Still exhibiting the BAR update error, run tested with next--20201030
> > >> > >> 
> > >> > >> Yup, same for me :(
> > >> 
> > >> I'm answering my own question. This code does not work on Omnia because
> > >> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
> > >> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
> > >> code for forcing link speed has no effect on Omnia...
> > >
> > > Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
> > > register. Just access is not exported via emulated root bridge.
> > >
> > > Documentation for this PCIe controller is public, so anybody can look at
> > > register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
> > >
> > > http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
> > >
> > > In drivers/pci/controller/pci-mvebu.c you can set a new value for this
> > > register via function call:
> > >
> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> > >
> > > So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
> > > init function, e.g. mvebu_pcie_setup_hw()?
> > >
> > >     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> > >     val &= ~PCI_EXP_LNKCTL2_TLS;
> > >     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> > 
> > I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
> > seems to fix things so that all three PCIE devices work even with ASPM
> > turned on! :)
> 
> Perfect! Now I'm sure that it is the same issue as in the aardvark driver.
> 
> I will prepare patches for both pci-aardvark.c and pci-mvebu.c to export
> the PCI_EXP_LNKCTL2 register via the emulated bridge. Then the aspm.c code
> would be able to use Bjorn's patch or mine, which I sent last year.
> 
> The question remains whether this is an issue with the QCA wifi chip on
> that Compex card or an issue with the PCIe controllers, now on both the
> A38x and A3720 SoCs. Note that both A38x and A3720 platforms are from
> Marvell, but they have different PCIe controllers (so it does not mean
> that both must have the same hw bugs).

It seems that this really is an issue in the QCA98xx chips. I have sent
a patch which adds a quirk for these wifi chips:

https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/

> > Do you still need me to test the card on a different machine? Not sure I
> > have an x86 machine with a mini-PCIe slot handy, but I can go hunting if
> > needed...
> 
> Yes, we now need to find out whether there is any other PCIe controller
> on which this card does not work when ASPM is enabled and the above
> "retrain link" kernel code is executed.
> 
> It does not have to be x86. But because of how UEFI, ACPI and other
> firmware touch PCIe, there is a high chance that this bug appears on
> some x86 machine too. More firmware = more problems.
> 
> On ARM platforms with native controller drivers there does not have to
> be any firmware involved (as in the Marvell case), so only the kernel
> touches the PCIe HW.
> 
> Note that on x86, ASPM may be disabled (if firmware indicates it), so a
> command line argument like "pcie_aspm=force" is probably required for
> tests.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 12:50                                     ` Pali Rohár
@ 2021-03-26 15:25                                       ` Toke Høiland-Jørgensen
  2021-03-26 15:34                                         ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-26 15:25 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

Pali Rohár <pali@kernel.org> writes:

> On Friday 19 March 2021 00:16:29 Pali Rohár wrote:
>> On Thursday 18 March 2021 23:43:58 Toke Høiland-Jørgensen wrote:
>> > Pali Rohár <pali@kernel.org> writes:
>> > 
>> > > On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
>> > >> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
>> > >> > Pali Rohár <pali@kernel.org> writes:
>> > >> > 
>> > >> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>> > >> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>> > >> > >> 
>> > >> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
>> > >> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>> > >> > >> >>> Pali Rohár <pali@kernel.org> writes:
>> > >> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>> > >> > >> >>>>
>> > >> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
>> > >> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
>> > >> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>> > >> > >> >>>> working link training.
>> > >> > >> >>>>
>> > >> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>> > >> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
>> > >> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>> > >> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>> > >> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>> > >> > >> >>>>
>> > >> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>> > >> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
>> > >> > >> >>>> training then nothing happen, link training do not success.
>> > >> > >> >>>>
>> > >> > >> >>>> So just speculation now... In current setup initialization of card does
>> > >> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
>> > >> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
>> > >> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>> > >> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>> > >> > >> >>>> this issue.
>> > >> > >> >>>>
>> > >> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>> > >> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>> > >> > >> >>>> training?
>> > >> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>> > >> > >> >>> bit lost in the soup of registers ;)
>> > >> > >> >>>
>> > >> > >> >>> So if one of you can cook up a patch, that would be most helpful!
>> > >> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>> > >> > >> >> about cls (current link speed, that what is used by aardvark). It is
>> > >> > >> >> untested, I just tried to compile it.
>> > >> > >> >>
>> > >> > >> >> Can try it?
>> > >> > >> >>
>> > >> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> > >> > >> >> index 253c30cc1967..f934c0b52f41 100644
>> > >> > >> >> --- a/drivers/pci/pcie/aspm.c
>> > >> > >> >> +++ b/drivers/pci/pcie/aspm.c
>> > >> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> > >> > >> >>   	unsigned long end_jiffies;
>> > >> > >> >>   	u16 reg16;
>> > >> > >> >>   
>> > >> > >> >> +	u32 lnkcap2;
>> > >> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
>> > >> > >> >> +
>> > >> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> > >> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>> > >> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>> > >> > >> >> +
>> > >> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>> > >> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>> > >> > >> >> +		lnksta, cls,
>> > >> > >> >> +		lnkctl2, tls);
>> > >> > >> >> +
>> > >> > >> >> +	tls = 1;
>> > >> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>> > >> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
>> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> > >> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>> > >> > >> >> +		lnkctl2, tls);
>> > >> > >> >> +
>> > >> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>> > >> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
>> > >> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>> > >> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> > >> > >> >>   			break;
>> > >> > >> >>   		msleep(1);
>> > >> > >> >>   	} while (time_before(jiffies, end_jiffies));
>> > >> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>> > >> > >> >> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
>> > >> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
>> > >> > >> >>   }
>> > >> > >> >>   
>> > >> > >> >
>> > >> > >> > Still exhibiting the BAR update error, run tested with next--20201030
>> > >> > >> 
>> > >> > >> Yup, same for me :(
>> > >> 
>> > >> I'm answering my own question. This code does not work on Omnia because
>> > >> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
>> > >> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
>> > >> code for forcing link speed has no effect on Omnia...
>> > >
>> > > Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
>> > > register. Just access is not exported via emulated root bridge.
>> > >
>> > > Documentation for this PCIe controller is public, so anybody can look at
>> > > register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
>> > >
>> > > http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
>> > >
>> > > In drivers/pci/controller/pci-mvebu.c you can set a new value for this
>> > > register via function call:
>> > >
>> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> > >
>> > > So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
>> > > init function, e.g. mvebu_pcie_setup_hw()?
>> > >
>> > >     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> > >     val &= ~PCI_EXP_LNKCTL2_TLS;
>> > >     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
>> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> > 
>> > I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
>> > seems to fix things so that all three PCIE devices work even with ASPM
>> > turned on! :)
>> 
>> Perfect! Now I'm sure that it is the same issue as in the aardvark driver.
>> 
>> I will prepare patches for both pci-aardvark.c and pci-mvebu.c to export
>> the PCI_EXP_LNKCTL2 register via the emulated bridge. Then the aspm.c code
>> would be able to use Bjorn's patch or mine, which I sent last year.
>> 
>> The question remains whether this is an issue with the QCA wifi chip on
>> that Compex card or an issue with the PCIe controllers, now on both the
>> A38x and A3720 SoCs. Note that both A38x and A3720 platforms are from
>> Marvell, but they have different PCIe controllers (so it does not mean
>> that both must have the same hw bugs).
>
> It seems that this really is an issue in the QCA98xx chips. I have sent
> a patch which adds a quirk for these wifi chips:
>
> https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/

I tried applying that, and while it does fix the ath10k card, it seems
to break the ath9k card in the slot next to it. When booting with the
patch applied, I get this in dmesg:

[    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver

Could there be some kind of data corruption in play here making the
driver think the chip revision is wrong, or something like that? If I
boot the same kernel without the patch applied, the ath9k initialisation
works fine, but obviously the ath10k is then still broken...

-Toke


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 15:25                                       ` Toke Høiland-Jørgensen
@ 2021-03-26 15:34                                         ` Pali Rohár
  2021-03-26 16:54                                           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-26 15:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> 
> > On Friday 19 March 2021 00:16:29 Pali Rohár wrote:
> >> On Thursday 18 March 2021 23:43:58 Toke Høiland-Jørgensen wrote:
> >> > Pali Rohár <pali@kernel.org> writes:
> >> > 
> >> > > On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
> >> > >> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
> >> > >> > Pali Rohár <pali@kernel.org> writes:
> >> > >> > 
> >> > >> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
> >> > >> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
> >> > >> > >> 
> >> > >> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
> >> > >> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
> >> > >> > >> >>> Pali Rohár <pali@kernel.org> writes:
> >> > >> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
> >> > >> > >> >>>>
> >> > >> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
> >> > >> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
> >> > >> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
> >> > >> > >> >>>> working link training.
> >> > >> > >> >>>>
> >> > >> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
> >> > >> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
> >> > >> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
> >> > >> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
> >> > >> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
> >> > >> > >> >>>>
> >> > >> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
> >> > >> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
> >> > >> > >> >>>> training then nothing happen, link training do not success.
> >> > >> > >> >>>>
> >> > >> > >> >>>> So just speculation now... In current setup initialization of card does
> >> > >> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
> >> > >> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
> >> > >> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
> >> > >> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
> >> > >> > >> >>>> this issue.
> >> > >> > >> >>>>
> >> > >> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
> >> > >> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
> >> > >> > >> >>>> training?
> >> > >> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
> >> > >> > >> >>> bit lost in the soup of registers ;)
> >> > >> > >> >>>
> >> > >> > >> >>> So if one of you can cook up a patch, that would be most helpful!
> >> > >> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
> >> > >> > >> >> about cls (current link speed, that what is used by aardvark). It is
> >> > >> > >> >> untested, I just tried to compile it.
> >> > >> > >> >>
> >> > >> > >> >> Can try it?
> >> > >> > >> >>
> >> > >> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >> > >> > >> >> index 253c30cc1967..f934c0b52f41 100644
> >> > >> > >> >> --- a/drivers/pci/pcie/aspm.c
> >> > >> > >> >> +++ b/drivers/pci/pcie/aspm.c
> >> > >> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> > >> > >> >>   	unsigned long end_jiffies;
> >> > >> > >> >>   	u16 reg16;
> >> > >> > >> >>   
> >> > >> > >> >> +	u32 lnkcap2;
> >> > >> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
> >> > >> > >> >> +
> >> > >> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> > >> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
> >> > >> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> >> > >> > >> >> +
> >> > >> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
> >> > >> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
> >> > >> > >> >> +		lnksta, cls,
> >> > >> > >> >> +		lnkctl2, tls);
> >> > >> > >> >> +
> >> > >> > >> >> +	tls = 1;
> >> > >> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
> >> > >> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
> >> > >> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
> >> > >> > >> >> +		lnkctl2, tls);
> >> > >> > >> >> +
> >> > >> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
> >> > >> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
> >> > >> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >> > >> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
> >> > >> > >> >>   			break;
> >> > >> > >> >>   		msleep(1);
> >> > >> > >> >>   	} while (time_before(jiffies, end_jiffies));
> >> > >> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
> >> > >> > >> >> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
> >> > >> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
> >> > >> > >> >>   }
> >> > >> > >> >>   
> >> > >> > >> >
> >> > >> > >> > Still exhibiting the BAR update error, run tested with next--20201030
> >> > >> > >> 
> >> > >> > >> Yup, same for me :(
> >> > >> 
> >> > >> I'm answering my own question. This code does not work on Omnia because
> >> > >> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
> >> > >> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
> >> > >> code for forcing link speed has no effect on Omnia...
> >> > >
> >> > > Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
> >> > > register. Just access is not exported via emulated root bridge.
> >> > >
> >> > > Documentation for this PCIe controller is public, so anybody can look at
> >> > > register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
> >> > >
> >> > > http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
> >> > >
> >> > > In drivers/pci/controller/pci-mvebu.c you can set a new value for this
> >> > > register via function call:
> >> > >
> >> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> >> > >
> >> > > So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
> >> > > init function, e.g. mvebu_pcie_setup_hw()?
> >> > >
> >> > >     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> >> > >     val &= ~PCI_EXP_LNKCTL2_TLS;
> >> > >     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> >> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
> >> > 
> >> > I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
> >> > seems to fix things so that all three PCIE devices work even with ASPM
> >> > turned on! :)
> >> 
> >> Perfect! Now I'm sure that it is the same issue as in the aardvark driver.
> >> 
> >> I will prepare patches for both pci-aardvark.c and pci-mvebu.c to export
> >> the PCI_EXP_LNKCTL2 register via the emulated bridge. The aspm.c code would
> >> then be able to use Bjorn's patch or the one I sent last year.
> >> 
> >> The question remains whether this is an issue with the QCA wifi chip on that
> >> Compex card or an issue with the PCIe controllers, now on both the A38x and
> >> A3720 SoCs. Note that both platforms are from Marvell, but they have
> >> different PCIe controllers (so it does not follow that both must have the
> >> same hw bugs).
> >
> > It seems that this really is an issue in QCA98xx chips. I have sent a patch
> > which adds a quirk for these wifi chips:
> >
> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
> 
> I tried applying that, and while it does fix the ath10k card, it seems
> to break the ath9k card in the slot next to it.

Ehm, what? The patch I sent to the mailing list today calls the quirk
code only for the PCI device IDs used by QCA98xx cards. For all other
cards it is a no-op.

Can you send the PCI device ID of your ath9k card (lspci -nn)? All the
ath9k cards I have tested have a different PCI device ID.

> When booting with the
> patch applied, I get this in dmesg:
> 
> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver

Can you send the whole dmesg log, so I can see which new err/info lines
are printed?

> Could there be some kind of data corruption in play here making the
> driver think the chip revision is wrong, or something like that? If I
> boot the same kernel without the patch applied, the ath9k initialisation
> works fine, but obviously the ath10k is then still broken...

Something really strange is going on here.

Can you add a debug log to the pcie_change_tls_to_gen1() function to
check which card it is called for?

Are you testing this new patch with or without the changes to the
mvebu_pcie_setup_hw() function?
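
For reference, the mvebu_pcie_setup_hw() change discussed earlier in the
thread is a read-modify-write of the Target Link Speed field. The sketch
below reproduces only the bit manipulation in user space; it does not touch
hardware. The constants are the values from include/uapi/linux/pci_regs.h,
while force_tls_gen1() is a hypothetical helper name and the input value in
the usage note is made up for illustration.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL2_TLS        0x000f  /* Target Link Speed field */
#define PCI_EXP_LNKCTL2_TLS_2_5GT  0x0001  /* 2.5 GT/s, i.e. gen1 */

/*
 * Hypothetical helper: return val with the Target Link Speed field
 * forced to gen1, leaving all other bits untouched.
 */
static uint32_t force_tls_gen1(uint32_t val)
{
	val &= ~(uint32_t)PCI_EXP_LNKCTL2_TLS;  /* clear the 4-bit TLS field */
	val |= PCI_EXP_LNKCTL2_TLS_2_5GT;       /* select 2.5 GT/s */
	return val;
}
```

With a made-up register value of 0x00010042 (TLS = 2, gen2), the helper
returns 0x00010041: only the low four bits change.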


* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 15:34                                         ` Pali Rohár
@ 2021-03-26 16:54                                           ` Toke Høiland-Jørgensen
  2021-03-26 17:11                                             ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-26 16:54 UTC
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

Pali Rohár <pali@kernel.org> writes:

> On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>> 
>> > On Friday 19 March 2021 00:16:29 Pali Rohár wrote:
>> >> On Thursday 18 March 2021 23:43:58 Toke Høiland-Jørgensen wrote:
>> >> > Pali Rohár <pali@kernel.org> writes:
>> >> > 
>> >> > > On Monday 15 March 2021 20:58:06 Pali Rohár wrote:
>> >> > >> On Monday 02 November 2020 16:54:35 Toke Høiland-Jørgensen wrote:
>> >> > >> > Pali Rohár <pali@kernel.org> writes:
>> >> > >> > 
>> >> > >> > > On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>> >> > >> > >> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@googlemail.com> writes:
>> >> > >> > >> 
>> >> > >> > >> > On 30/10/2020 15:23, Pali Rohár wrote:
>> >> > >> > >> >> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>> >> > >> > >> >>> Pali Rohár <pali@kernel.org> writes:
>> >> > >> > >> >>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>> >> > >> > >> >>>>
>> >> > >> > >> >>>> Link training in GEN2 mode for this card succeed only once after reset.
>> >> > >> > >> >>>> Repeated link retraining fails and it fails even when aardvark is
>> >> > >> > >> >>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>> >> > >> > >> >>>> working link training.
>> >> > >> > >> >>>>
>> >> > >> > >> >>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>> >> > >> > >> >>>> success read "negotiated link speed" from "Link Control Status Register"
>> >> > >> > >> >>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>> >> > >> > >> >>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>> >> > >> > >> >>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>> >> > >> > >> >>>>
>> >> > >> > >> >>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>> >> > >> > >> >>>> training fails. And if I change mode to GEN1 after this failed link
>> >> > >> > >> >>>> training then nothing happen, link training do not success.
>> >> > >> > >> >>>>
>> >> > >> > >> >>>> So just speculation now... In current setup initialization of card does
>> >> > >> > >> >>>> one link training at GEN2. Then aspm.c is called which is doing second
>> >> > >> > >> >>>> link retraining at GEN2. And if it fails then below patch issue third
>> >> > >> > >> >>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>> >> > >> > >> >>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>> >> > >> > >> >>>> this issue.
>> >> > >> > >> >>>>
>> >> > >> > >> >>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>> >> > >> > >> >>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>> >> > >> > >> >>>> training?
>> >> > >> > >> >>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>> >> > >> > >> >>> bit lost in the soup of registers ;)
>> >> > >> > >> >>>
>> >> > >> > >> >>> So if one of you can cook up a patch, that would be most helpful!
>> >> > >> > >> >> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>> >> > >> > >> >> about cls (current link speed, that what is used by aardvark). It is
>> >> > >> > >> >> untested, I just tried to compile it.
>> >> > >> > >> >>
>> >> > >> > >> >> Can try it?
>> >> > >> > >> >>
>> >> > >> > >> >> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> >> > >> > >> >> index 253c30cc1967..f934c0b52f41 100644
>> >> > >> > >> >> --- a/drivers/pci/pcie/aspm.c
>> >> > >> > >> >> +++ b/drivers/pci/pcie/aspm.c
>> >> > >> > >> >> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> >> > >> > >> >>   	unsigned long end_jiffies;
>> >> > >> > >> >>   	u16 reg16;
>> >> > >> > >> >>   
>> >> > >> > >> >> +	u32 lnkcap2;
>> >> > >> > >> >> +	u16 lnksta, lnkctl2, cls, tls;
>> >> > >> > >> >> +
>> >> > >> > >> >> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> >> > >> > >> >> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>> >> > >> > >> >> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>> >> > >> > >> >> +
>> >> > >> > >> >> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>> >> > >> > >> >> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>> >> > >> > >> >> +		lnksta, cls,
>> >> > >> > >> >> +		lnkctl2, tls);
>> >> > >> > >> >> +
>> >> > >> > >> >> +	tls = 1;
>> >> > >> > >> >> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>> >> > >> > >> >> +					PCI_EXP_LNKCTL2_TLS, tls);
>> >> > >> > >> >> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>> >> > >> > >> >> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>> >> > >> > >> >> +		lnkctl2, tls);
>> >> > >> > >> >> +
>> >> > >> > >> >>   	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>> >> > >> > >> >>   	reg16 |= PCI_EXP_LNKCTL_RL;
>> >> > >> > >> >>   	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>> >> > >> > >> >> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>> >> > >> > >> >>   			break;
>> >> > >> > >> >>   		msleep(1);
>> >> > >> > >> >>   	} while (time_before(jiffies, end_jiffies));
>> >> > >> > >> >> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>> >> > >> > >> >> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
>> >> > >> > >> >>   	return !(reg16 & PCI_EXP_LNKSTA_LT);
>> >> > >> > >> >>   }
>> >> > >> > >> >>   
>> >> > >> > >> >
>> >> > >> > >> > Still exhibiting the BAR update error, run tested with next--20201030
>> >> > >> > >> 
>> >> > >> > >> Yup, same for me :(
>> >> > >> 
>> >> > >> I'm answering my own question. This code does not work on Omnia because
>> >> > >> A38x pci-mvebu.c driver is using emulator for PCIe root bridge and it
>> >> > >> does not implement PCI_EXP_LNKCTL2 and PCI_EXP_LNKCTL2 registers. So
>> >> > >> code for forcing link speed has no effect on Omnia...
>> >> > >
>> >> > > Toke, on A38x PCIe controller it is possible to access PCI_EXP_LNKCTL2
>> >> > > register. Just access is not exported via emulated root bridge.
>> >> > >
>> >> > > Documentation for this PCIe controller is public, so anybody can look at
>> >> > > register description. See page 571, A.7 PCI Express 2.0 Port 0 Registers
>> >> > >
>> >> > > http://web.archive.org/web/20200420191927/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-embedded-processors-armada-38x-functional-specifications-2015-11.pdf
>> >> > >
>> >> > > In drivers/pci/controller/pci-mvebu.c you can set a new value for this
>> >> > > register via function call:
>> >> > >
>> >> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> >> > >
>> >> > > So, could you try to set PCI_EXP_LNKCTL2_TLS bits to gen1 in some hw
>> >> > > init function, e.g. mvebu_pcie_setup_hw()?
>> >> > >
>> >> > >     u32 val = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> >> > >     val &= ~PCI_EXP_LNKCTL2_TLS;
>> >> > >     val |= PCI_EXP_LNKCTL2_TLS_2_5GT;
>> >> > >     mvebu_writel(port, val, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
>> >> > 
>> >> > I pasted this into the top of mvebu_pcie_setup_hw(), and that indeed
>> >> > seems to fix things so that all three PCIE devices work even with ASPM
>> >> > turned on! :)
>> >> 
>> >> Perfect! Now I'm sure that it is the same issue as in the aardvark driver.
>> >> 
>> >> I will prepare patches for both pci-aardvark.c and pci-mvebu.c to export
>> >> the PCI_EXP_LNKCTL2 register via the emulated bridge. The aspm.c code would
>> >> then be able to use Bjorn's patch or the one I sent last year.
>> >> 
>> >> The question remains whether this is an issue with the QCA wifi chip on that
>> >> Compex card or an issue with the PCIe controllers, now on both the A38x and
>> >> A3720 SoCs. Note that both platforms are from Marvell, but they have
>> >> different PCIe controllers (so it does not follow that both must have the
>> >> same hw bugs).
>> >
>> > It seems that this really is an issue in QCA98xx chips. I have sent a patch
>> > which adds a quirk for these wifi chips:
>> >
>> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
>> 
>> I tried applying that, and while it does fix the ath10k card, it seems
>> to break the ath9k card in the slot next to it.
>
> Ehm, what?

I know, right?! :/

> The patch I sent to the mailing list today calls the quirk code only
> for the PCI device IDs used by QCA98xx cards. For all other cards it
> is a no-op.

So upon further investigation, this seems to be unrelated to the patch:
I can't reliably get the ath9k device to work again by reverting it.
And the patch does seem to fix the ath10k device, so I think that's
probably good.

However, the issue with ath9k does seem to be related to ASPM; if I turn
that off in .config, I get the ath9k device back. So we have these
cases:

ASPM disabled:          ath9k, ath10k and mt76 cards all work
ASPM enabled, no patch: only mt76 card works
ASPM enabled + patch:   ath10k and mt76 cards work

So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
is just generally flaky?
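
One thing the failing cases share is that reads from the device come back as
all-ones, which is what the "BAR 0: error updating (0xe0200004 != 0xffffffff)"
lines in the "ASPM enabled, no patch" log below report. The following is a
minimal user-space sketch of that kind of write-then-read-back comparison. It
is only modelled on the log message, not copied from the kernel; the helper
names are hypothetical, and the mask that ignores the low four flag bits of a
memory BAR is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low 4 bits of a memory BAR are flag bits, not address. */
#define MEM_BAR_ADDR_MASK 0xfffffff0u

/* Hypothetical helper: does a read look like it came from a dead link? */
static bool read_is_all_ones(uint32_t value)
{
	return value == 0xffffffffu;
}

/*
 * Hypothetical helper: compare the value written to a BAR with the value
 * read back, looking only at the address bits.
 */
static bool bar_update_ok(uint32_t written, uint32_t readback)
{
	return ((written ^ readback) & MEM_BAR_ADDR_MASK) == 0;
}
```

With the values from the log, bar_update_ok(0xe0200004, 0xffffffff) is false
and read_is_all_ones(0xffffffff) is true, which points at the link rather
than at the BAR assignment itself.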

> Can you send the PCI device ID of your ath9k card (lspci -nn)? All the
> ath9k cards I have tested have a different PCI device ID.

[root@omnia-arch ~]# lspci -nn
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
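
To make the ID comparison concrete, here is a small user-space sketch of
vendor:device matching. It is not the quirk code from the patch: the table
contents are an assumption (just the 168c:003c ID that lspci reports for the
QCA986x/988x card above), and struct pci_id, quirk_ids and quirk_matches()
are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Assumption: the quirk is keyed on the QCA986x/988x ID shown by lspci -nn. */
static const struct pci_id quirk_ids[] = {
	{ 0x168c, 0x003c },
};

/* Return true only if vendor:device appears in the quirk table. */
static bool quirk_matches(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(quirk_ids) / sizeof(quirk_ids[0]); i++) {
		if (quirk_ids[i].vendor == vendor &&
		    quirk_ids[i].device == device)
			return true;
	}
	return false;
}
```

Under that assumption, quirk_matches(0x168c, 0x003c) is true for the ath10k
card and quirk_matches(0x168c, 0x002e) is false for the AR9287, so an
ID-keyed quirk would not run for the ath9k device.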

>> When booting with the
>> patch applied, I get this in dmesg:
>> 
>> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>
> Can you send the whole dmesg log, so I can see which new err/info lines
> are printed?

Pasting all three cases below:

ASPM disabled in kernel:
[    2.976258] ahci-mvebu f10a8000.sata: supply ahci not found, using dummy regulator
[    2.983948] ahci-mvebu f10a8000.sata: supply phy not found, using dummy regulator
[    2.991502] ahci-mvebu f10a8000.sata: supply target not found, using dummy regulator
[    2.999337] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    3.008418] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    3.017677] scsi host0: ahci-mvebu
[    3.021317] scsi host1: ahci-mvebu
[    3.024837] ata1: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x100 irq 53
[    3.032784] ata2: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x180 irq 53
[    3.041624] spi-nor spi0.0: s25fl164k (8192 Kbytes)
[    3.046534] 2 fixed-partitions partitions found on MTD device spi0.0
[    3.052918] Creating 2 MTD partitions on "spi0.0":
[    3.057723] 0x000000000000-0x000000100000 : "U-Boot"
[    3.071739] 0x000000100000-0x000000800000 : "Rescue system"
[    3.092049] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[    3.099901] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[    3.110165] libphy: Fixed MDIO Bus: probed
[    3.114489] tun: Universal TUN/TAP device driver, 1.6
[    3.119943] libphy: orion_mdio_bus: probed
[    3.125168] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    3.337489] libphy: mv88e6xxx SMI: probed
[    3.348427] mvneta_bm f10c8000.bm: failed to allocate internal memory
[    3.354912] mvneta_bm: probe of f10c8000.bm failed with error -12
[    3.361844] mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:4e:98
[    3.370661] mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:4e:96
[    3.379452] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:4e:97
[    3.382747] ata1: SATA link down (SStatus 0 SControl 300)
[    3.387737] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.392932] ata2: SATA link down (SStatus 0 SControl 300)
[    3.485413] ath: EEPROM regdomain sanitized
[    3.485417] ath: EEPROM regdomain: 0x64
[    3.485421] ath: EEPROM indicates we should expect a direct regpair map
[    3.485427] ath: Country alpha2 being used: 00
[    3.485431] ath: Regpair used: 0x64
[    3.487037] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[    3.487723] ieee80211 phy0: Atheros AR9287 Rev:2 mem=0xf08c0000, irq=61
[    3.494787] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.500670] ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[    3.611778] pci 0000:00:03.0: enabling device (0140 -> 0142)
[    3.617534] mt76x2e 0000:03:00.0: ASIC revision: 76120044
[    3.736545] ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[    3.745816] ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 0
[    3.754631] ath10k_pci 0000:02:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
[    3.799430] ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[    4.272133] mt76x2e 0000:03:00.0: ROM patch build: 20141115060606a
[    4.279423] mt76x2e 0000:03:00.0: Firmware Version: 0.0.00
[    4.284936] mt76x2e 0000:03:00.0: Build: 1
[    4.289043] mt76x2e 0000:03:00.0: Build Time: 201507311614____
[    4.311382] mt76x2e 0000:03:00.0: Firmware running!
[    4.316666] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[    4.317581] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    4.324153] ehci-pci: EHCI PCI platform driver
[    4.328626] ehci-orion: EHCI orion driver
[    4.332765] orion-ehci f1058000.usb: EHCI Host Controller
[    4.338189] orion-ehci f1058000.usb: new USB bus registered, assigned bus number 1
[    4.345840] orion-ehci f1058000.usb: irq 49, io mem 0xf1058000
[    4.381383] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
[    4.387686] hub 1-0:1.0: USB hub found
[    4.391487] hub 1-0:1.0: 1 port detected
[    4.395906] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.401243] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 2
[    4.408813] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.418108] xhci-hcd f10f0000.usb3: irq 55, io mem 0xf10f0000
[    4.424246] hub 2-0:1.0: USB hub found
[    4.428022] hub 2-0:1.0: 1 port detected
[    4.432125] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.437457] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 3
[    4.444981] xhci-hcd f10f0000.usb3: Host supports USB 3.0 SuperSpeed
[    4.451399] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.459764] hub 3-0:1.0: USB hub found
[    4.463554] hub 3-0:1.0: 1 port detected
[    4.467745] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.473091] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 4
[    4.480645] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.489931] xhci-hcd f10f8000.usb3: irq 56, io mem 0xf10f8000
[    4.496068] hub 4-0:1.0: USB hub found
[    4.499841] hub 4-0:1.0: 1 port detected
[    4.504872] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.510202] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 5
[    4.517734] xhci-hcd f10f8000.usb3: Host supports USB 3.0 SuperSpeed
[    4.524138] usb usb5: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.532517] hub 5-0:1.0: USB hub found
[    4.536289] hub 5-0:1.0: 1 port detected
[    4.540478] usbcore: registered new interface driver usb-storage
[    4.547239] armada38x-rtc f10a3800.rtc: registered as rtc0
[    4.552835] armada38x-rtc f10a3800.rtc: setting system clock to 2021-03-26T16:20:15 UTC (1616775615)
[    4.562130] i2c /dev entries driver
[    4.565923] i2c i2c-0: Not using recovery: no recover_bus() found
[    4.573058] at24 1-0054: supply vcc not found, using dummy regulator
[    4.580309] at24 1-0054: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
[    4.587074] i2c i2c-0: Added multiplexed i2c bus 1
[    4.592013] i2c i2c-0: Added multiplexed i2c bus 2
[    4.596920] i2c i2c-0: Added multiplexed i2c bus 3
[    4.601835] i2c i2c-0: Added multiplexed i2c bus 4
[    4.606742] i2c i2c-0: Added multiplexed i2c bus 5
[    4.611719] i2c i2c-0: Added multiplexed i2c bus 6
[    4.616636] i2c i2c-0: Added multiplexed i2c bus 7
[    4.621758] pca953x 8-0071: supply vcc not found, using dummy regulator
[    4.628452] pca953x 8-0071: using no AI
[    4.632847] pca953x 8-0071: interrupt support not compiled in
[    4.639217] i2c i2c-0: Added multiplexed i2c bus 8
[    4.644095] pca954x 0-0070: registered 8 multiplexed busses for I2C mux pca9547
[    4.653257] orion_wdt: Initial timeout 171 sec
[    4.657949] sdhci: Secure Digital Host Controller Interface driver
[    4.664154] sdhci: Copyright(c) Pierre Ossman
[    4.668629] sdhci-pltfm: SDHCI platform and OF driver helper
[    4.674605] ledtrig-cpu: registered to indicate activity on CPUs
[    4.681575] marvell-cesa f1090000.crypto: CESA device successfully registered
[    4.688898] usbcore: registered new interface driver usbhid
[    4.694525] usbhid: USB HID core driver
[    4.698475] GACT probability on
[    4.701661] Mirror/redirect action on
[    4.705344] Simple TC action Loaded
[    4.708868] u32 classifier
[    4.709904] mmc0: SDHCI controller on f10d8000.sdhci [f10d8000.sdhci] using ADMA
[    4.711587]     Performance counters on
[    4.711589]     input device check on
[    4.726537]     Actions configured
[    4.730425] NET: Registered protocol family 10
[    4.735700] Segment Routing with IPv6
[    4.739449] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    4.745712] NET: Registered protocol family 17
[    4.750262] 8021q: 802.1Q VLAN Support v1.8
[    4.754568] ThumbEE CPU extension supported.
[    4.758868] Registering SWP/SWPB emulation handler
[    4.763814] Loading compiled-in X.509 certificates
[    4.769890] Btrfs loaded, crc32c=crc32c-generic, zoned=no
[    4.776962] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    4.857571] mmc0: new high speed MMC card at address 0001
[    4.863325] mmcblk0: mmc0:0001 H8G4a\x92 7.28 GiB 
[    4.867990] mmcblk0boot0: mmc0:0001 H8G4a\x92 partition 1 4.00 MiB
[    4.884409] mmcblk0boot1: mmc0:0001 H8G4a\x92 partition 2 4.00 MiB
[    4.896614] mmcblk0rpmb: mmc0:0001 H8G4a\x92 partition 3 4.00 MiB, chardev (250:0)
[    4.905592]  mmcblk0: p1
[    4.962991] libphy: mv88e6xxx SMI: probed
[    4.967796] ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[    5.082952] ath: EEPROM regdomain sanitized
[    5.082960] ath: EEPROM regdomain: 0x64
[    5.082964] ath: EEPROM indicates we should expect a direct regpair map
[    5.082970] ath: Country alpha2 being used: 00
[    5.082974] ath: Regpair used: 0x64
[    5.616015] mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540] (irq=75)
[    5.651333] mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540] (irq=76)
[    5.679855] mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540] (irq=77)
[    5.715061] mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540] (irq=78)
[    5.745795] mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540] (irq=79)
[    5.762566] mv88e6085 f1072004.mdio-mii:10: configuring for fixed/rgmii-id link mode
[    5.772960] mv88e6085 f1072004.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
[    5.780968] DSA: tree 0 setup
[    5.784683] Waiting 2 sec before mounting root device...
[    5.790133] ath: EEPROM regdomain: 0x80d0
[    5.790138] ath: EEPROM indicates we should expect a country code
[    5.790141] ath: doing EEPROM country->regdmn map search
[    5.790143] ath: country maps to regdmn code: 0x37
[    5.790147] ath: Country alpha2 being used: DK
[    5.790150] ath: Regpair used: 0x37
[    5.790156] ath: regdomain 0x80d0 dynamically updated by user
[    5.790193] ath: EEPROM regdomain: 0x80d0
[    5.790196] ath: EEPROM indicates we should expect a country code
[    5.790199] ath: doing EEPROM country->regdmn map search
[    5.790201] ath: country maps to regdmn code: 0x37
[    5.790204] ath: Country alpha2 being used: DK
[    5.790207] ath: Regpair used: 0x37
[    5.790211] ath: regdomain 0x80d0 dynamically updated by user
[    7.837897] BTRFS: device fsid 448334b8-1b27-4738-8118-9e70b56b1e58 devid 1 transid 13774 /dev/root scanned by swapper/0 (1)
[    7.849813] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    7.856549] BTRFS info (device mmcblk0p1): has skinny extents
[    7.868764] BTRFS info (device mmcblk0p1): enabling ssd optimizations
[    7.877839] VFS: Mounted root (btrfs filesystem) on device 0:13.
[    7.884300] devtmpfs: mounted
[    7.887886] Freeing unused kernel memory: 1024K
[    7.931610] Run /sbin/init as init process
[    7.935718]   with arguments:
[    7.935722]     /sbin/init
[    7.935726]     earlyprintk
[    7.935729]   with environment:
[    7.935731]     HOME=/
[    7.935734]     TERM=linux
[    8.001203] random: fast init done
[    8.361921] systemd[1]: systemd 247.3-1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    8.384851] systemd[1]: Detected architecture arm.
[    8.512663] systemd[1]: Set hostname to <omnia-arch>.
[    8.701050] systemd-gpt-auto-generator[173]: File system behind root file system is reported by btrfs to be backed by pseudo-device /dev/root, which is not a valid userspace accessible device node. Cannot determine correct backing block device.
[    8.724725] systemd[167]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
[    8.940665] systemd[1]: Queued start job for default target Graphical Interface.
[    8.948762] random: systemd: uninitialized urandom read (16 bytes read)
[    8.976192] systemd[1]: Created slice system-getty.slice.
[    9.011489] random: systemd: uninitialized urandom read (16 bytes read)
[    9.019036] systemd[1]: Created slice system-modprobe.slice.
[    9.051479] random: systemd: uninitialized urandom read (16 bytes read)
[    9.058989] systemd[1]: Created slice system-serial\x2dgetty.slice.
[    9.102304] systemd[1]: Created slice User and Session Slice.
[    9.141626] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    9.181591] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    9.221501] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[    9.233245] systemd[1]: Reached target Local Encrypted Volumes.
[    9.281608] systemd[1]: Reached target Paths.
[    9.311499] systemd[1]: Reached target Remote File Systems.
[    9.351458] systemd[1]: Reached target Slices.
[    9.381494] systemd[1]: Reached target Swap.
[    9.411697] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[    9.463054] systemd[1]: Listening on Process Core Dump Socket.
[    9.505727] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[    9.515082] systemd[1]: Listening on Journal Socket (/dev/log).
[    9.561786] systemd[1]: Listening on Journal Socket.
[    9.608194] systemd[1]: Listening on Network Service Netlink Socket.
[    9.653001] systemd[1]: Listening on udev Control Socket.
[    9.701707] systemd[1]: Listening on udev Kernel Socket.
[    9.751738] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
[    9.760158] systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
[    9.771842] systemd[1]: Mounting Kernel Debug File System...
[    9.824052] systemd[1]: Mounting Kernel Trace File System...
[    9.864055] systemd[1]: Mounting Temporary Directory (/tmp)...
[    9.901704] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    9.915840] systemd[1]: Starting Load Kernel Module configfs...
[    9.954174] systemd[1]: Starting Load Kernel Module drm...
[    9.994448] systemd[1]: Starting Load Kernel Module fuse...
[   10.038218] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[   10.048741] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[   10.059545] systemd[1]: Starting Remount Root and Kernel File Systems...
[   10.101616] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
[   10.112555] systemd[1]: Starting Apply Kernel Variables...
[   10.154279] systemd[1]: Starting Coldplug All udev Devices...
[   10.196174] systemd[1]: Mounted Kernel Debug File System.
[   10.232075] systemd[1]: Mounted Kernel Trace File System.
[   10.271750] systemd[1]: Mounted Temporary Directory (/tmp).
[   10.311990] systemd[1]: modprobe@configfs.service: Succeeded.
[   10.318833] systemd[1]: Finished Load Kernel Module configfs.
[   10.356178] systemd[1]: modprobe@drm.service: Succeeded.
[   10.362801] systemd[1]: Finished Load Kernel Module drm.
[   10.402063] systemd[1]: modprobe@fuse.service: Succeeded.
[   10.408508] systemd[1]: Finished Load Kernel Module fuse.
[   10.442754] systemd[1]: Finished Remount Root and Kernel File Systems.
[   10.482774] systemd[1]: Finished Apply Kernel Variables.
[   10.524656] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[   10.533471] systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
[   10.542901] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
[   10.558481] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[   10.569814] systemd[1]: Starting Load/Save Random Seed...
[   10.591724] systemd[1]: Condition check resulted in Create System Users being skipped.
[   10.604115] systemd[1]: Starting Create Static Device Nodes in /dev...
[   10.713433] systemd[1]: Finished Create Static Device Nodes in /dev.
[   10.731783] systemd[1]: Reached target Local File Systems (Pre).
[   10.751621] systemd[1]: Condition check resulted in Virtual Machine and Container Storage (Compatibility) being skipped.
[   10.762698] systemd[1]: Reached target Local File Systems.
[   10.804744] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
[   10.851807] systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
[   10.864572] systemd[1]: Starting Journal Service...
[   10.885604] systemd[1]: Starting Rule-based Manager for Device Events and Files...
[   10.933455] systemd[1]: Finished Coldplug All udev Devices.
[   11.003259] systemd[1]: Started Journal Service.
[   11.107515] systemd-journald[193]: Received client request to flush runtime journal.
[   12.370305] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
[   12.402376] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
[   12.717844] mvneta f1070000.ethernet eth0: configuring for fixed/rgmii link mode
[   12.728688] mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[   12.923038] ath9k 0000:01:00.0 wlp1s0: renamed from wlan0
[   13.032064] random: crng init done
[   13.035500] random: 7 urandom warning(s) missed due to ratelimiting
[   13.047961] mt76x2e 0000:03:00.0 wlp3s0: renamed from wlan1
[   13.210519] ath10k_pci 0000:02:00.0 wlp2s0: renamed from wlan2
[   13.259848] BTRFS info (device mmcblk0p1): devid 1 device path /dev/root changed to /dev/mmcblk0p1 scanned by systemd-udevd (200)
[   15.521757] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[   15.626452] ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware
[   15.648452] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

ASPM enabled, no patch:
[    1.592272] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.592280] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.592290] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.592298] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.592305] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.592313] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.592320] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.592326] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.592336] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0600000-0xe06fffff 64bit]
[    1.592349] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0700000-0xe070ffff pref]
[    1.592357] pci 0000:00:03.0: PCI bridge to [bus 03]
[    1.592363] pci 0000:00:03.0:   bridge window [mem 0xe0600000-0xe07fffff]
[    1.592639] mv_xor f1060800.xor: Marvell shared XOR driver
[    1.651773] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
[    1.651912] mv_xor f1060900.xor: Marvell shared XOR driver
[    1.711771] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
[    1.730234] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.731099] printk: console [ttyS0] disabled
[    1.751190] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 30, base_baud = 15625000) is a 16550A
[    3.098634] printk: console [ttyS0] enabled
[    3.123524] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 31, base_baud = 15625000) is a 16550A
[    3.133234] ahci-mvebu f10a8000.sata: supply ahci not found, using dummy regulator
[    3.140900] ahci-mvebu f10a8000.sata: supply phy not found, using dummy regulator
[    3.148455] ahci-mvebu f10a8000.sata: supply target not found, using dummy regulator
[    3.156311] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    3.165396] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    3.174645] scsi host0: ahci-mvebu
[    3.178287] scsi host1: ahci-mvebu
[    3.181806] ata1: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x100 irq 53
[    3.189747] ata2: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x180 irq 53
[    3.198555] spi-nor spi0.0: s25fl164k (8192 Kbytes)
[    3.203487] 2 fixed-partitions partitions found on MTD device spi0.0
[    3.209858] Creating 2 MTD partitions on "spi0.0":
[    3.214668] 0x000000000000-0x000000100000 : "U-Boot"
[    3.231750] 0x000000100000-0x000000800000 : "Rescue system"
[    3.238228] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[    3.246104] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[    3.256368] libphy: Fixed MDIO Bus: probed
[    3.260622] tun: Universal TUN/TAP device driver, 1.6
[    3.266077] libphy: orion_mdio_bus: probed
[    3.271350] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    3.496234] libphy: mv88e6xxx SMI: probed
[    3.507137] mvneta_bm f10c8000.bm: failed to allocate internal memory
[    3.513632] mvneta_bm: probe of f10c8000.bm failed with error -12
[    3.520579] mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:4e:98
[    3.529438] mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:4e:96
[    3.532721] ata2: SATA link down (SStatus 0 SControl 300)
[    3.543677] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:4e:97
[    3.551400] ata1: SATA link down (SStatus 0 SControl 300)
[    3.551984] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.562825] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
[    3.568745] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.575912] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.583348] ath9k 0000:01:00.0: Failed to initialize device
[    3.588955] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.594889] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.601924] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.607610] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.647457] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.653973] ath10k_pci: probe of 0000:02:00.0 failed with error -110
[    3.660490] pci 0000:00:03.0: enabling device (0140 -> 0142)
[    3.666248] mt76x2e 0000:03:00.0: ASIC revision: 76120044
[    4.322137] mt76x2e 0000:03:00.0: ROM patch build: 20141115060606a
[    4.329426] mt76x2e 0000:03:00.0: Firmware Version: 0.0.00
[    4.334938] mt76x2e 0000:03:00.0: Build: 1
[    4.339044] mt76x2e 0000:03:00.0: Build Time: 201507311614____
[    4.361396] mt76x2e 0000:03:00.0: Firmware running!
[    4.366676] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[    4.367557] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    4.374129] ehci-pci: EHCI PCI platform driver
[    4.378601] ehci-orion: EHCI orion driver
[    4.382735] orion-ehci f1058000.usb: EHCI Host Controller
[    4.388159] orion-ehci f1058000.usb: new USB bus registered, assigned bus number 1
[    4.395807] orion-ehci f1058000.usb: irq 49, io mem 0xf1058000
[    4.431395] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
[    4.437694] hub 1-0:1.0: USB hub found
[    4.441482] hub 1-0:1.0: 1 port detected
[    4.445898] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.451233] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 2
[    4.458801] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.468077] xhci-hcd f10f0000.usb3: irq 55, io mem 0xf10f0000
[    4.474214] hub 2-0:1.0: USB hub found
[    4.477988] hub 2-0:1.0: 1 port detected
[    4.482079] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.487408] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 3
[    4.494934] xhci-hcd f10f0000.usb3: Host supports USB 3.0 SuperSpeed
[    4.501331] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.509702] hub 3-0:1.0: USB hub found
[    4.513483] hub 3-0:1.0: 1 port detected
[    4.517673] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.523018] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 4
[    4.530572] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.539846] xhci-hcd f10f8000.usb3: irq 56, io mem 0xf10f8000
[    4.545966] hub 4-0:1.0: USB hub found
[    4.549738] hub 4-0:1.0: 1 port detected
[    4.553885] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.559216] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 5
[    4.566739] xhci-hcd f10f8000.usb3: Host supports USB 3.0 SuperSpeed
[    4.573144] usb usb5: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.581515] hub 5-0:1.0: USB hub found
[    4.585287] hub 5-0:1.0: 1 port detected
[    4.589465] usbcore: registered new interface driver usb-storage
[    4.596214] armada38x-rtc f10a3800.rtc: registered as rtc0
[    4.601799] armada38x-rtc f10a3800.rtc: setting system clock to 2021-03-26T16:11:35 UTC (1616775095)
[    4.611086] i2c /dev entries driver
[    4.614887] i2c i2c-0: Not using recovery: no recover_bus() found
[    4.622023] at24 1-0054: supply vcc not found, using dummy regulator
[    4.629281] at24 1-0054: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
[    4.636062] i2c i2c-0: Added multiplexed i2c bus 1
[    4.640975] i2c i2c-0: Added multiplexed i2c bus 2
[    4.645896] i2c i2c-0: Added multiplexed i2c bus 3
[    4.650800] i2c i2c-0: Added multiplexed i2c bus 4
[    4.655728] i2c i2c-0: Added multiplexed i2c bus 5
[    4.660632] i2c i2c-0: Added multiplexed i2c bus 6
[    4.665602] i2c i2c-0: Added multiplexed i2c bus 7
[    4.670712] pca953x 8-0071: supply vcc not found, using dummy regulator
[    4.677408] pca953x 8-0071: using no AI
[    4.681786] pca953x 8-0071: interrupt support not compiled in
[    4.688149] i2c i2c-0: Added multiplexed i2c bus 8
[    4.693024] pca954x 0-0070: registered 8 multiplexed busses for I2C mux pca9547
[    4.701771] orion_wdt: Initial timeout 171 sec
[    4.706487] sdhci: Secure Digital Host Controller Interface driver
[    4.712694] sdhci: Copyright(c) Pierre Ossman
[    4.717166] sdhci-pltfm: SDHCI platform and OF driver helper
[    4.723128] ledtrig-cpu: registered to indicate activity on CPUs
[    4.730073] marvell-cesa f1090000.crypto: CESA device successfully registered
[    4.737410] usbcore: registered new interface driver usbhid
[    4.743005] usbhid: USB HID core driver
[    4.746954] GACT probability on
[    4.748973] mmc0: SDHCI controller on f10d8000.sdhci [f10d8000.sdhci] using ADMA
[    4.750110] Mirror/redirect action on
[    4.761224] Simple TC action Loaded
[    4.764778] u32 classifier
[    4.767497]     Performance counters on
[    4.771352]     input device check on
[    4.775050]     Actions configured
[    4.778936] NET: Registered protocol family 10
[    4.784230] Segment Routing with IPv6
[    4.787967] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    4.794228] NET: Registered protocol family 17
[    4.798762] 8021q: 802.1Q VLAN Support v1.8
[    4.803057] ThumbEE CPU extension supported.
[    4.807340] Registering SWP/SWPB emulation handler
[    4.812276] Loading compiled-in X.509 certificates
[    4.818281] Btrfs loaded, crc32c=crc32c-generic, zoned=no
[    4.825371] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    4.987606] mmc0: new high speed MMC card at address 0001
[    4.992837] libphy: mv88e6xxx SMI: probed
[    4.997259] mmcblk0: mmc0:0001 H8G4a\x92 7.28 GiB 
[    5.002056] mmcblk0boot0: mmc0:0001 H8G4a\x92 partition 1 4.00 MiB
[    5.008124] mmcblk0boot1: mmc0:0001 H8G4a\x92 partition 2 4.00 MiB
[    5.014160] mmcblk0rpmb: mmc0:0001 H8G4a\x92 partition 3 4.00 MiB, chardev (250:0)
[    5.022894]  mmcblk0: p1
[    5.641653] mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540] (irq=73)
[    5.672563] mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540] (irq=74)
[    5.705082] mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540] (irq=75)
[    5.731373] mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540] (irq=76)
[    5.766642] mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540] (irq=77)
[    5.783423] mv88e6085 f1072004.mdio-mii:10: configuring for fixed/rgmii-id link mode
[    5.793831] mv88e6085 f1072004.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
[    5.801848] DSA: tree 0 setup
[    5.805559] Waiting 2 sec before mounting root device...
[    7.837895] BTRFS: device fsid 448334b8-1b27-4738-8118-9e70b56b1e58 devid 1 transid 13732 /dev/root scanned by swapper/0 (1)
[    7.849816] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    7.856552] BTRFS info (device mmcblk0p1): has skinny extents
[    7.868426] BTRFS info (device mmcblk0p1): enabling ssd optimizations
[    7.877500] VFS: Mounted root (btrfs filesystem) on device 0:13.
[    7.883966] devtmpfs: mounted
[    7.887547] Freeing unused kernel memory: 1024K
[    7.931625] Run /sbin/init as init process
[    7.935733]   with arguments:
[    7.935737]     /sbin/init
[    7.935740]     earlyprintk
[    7.935743]   with environment:
[    7.935746]     HOME=/
[    7.935749]     TERM=linux
[    8.048502] random: fast init done
[    8.365030] systemd[1]: systemd 247.3-1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    8.388024] systemd[1]: Detected architecture arm.
[    8.462658] systemd[1]: Set hostname to <omnia-arch>.
[    8.627786] systemd-gpt-auto-generator[172]: File system behind root file system is reported by btrfs to be backed by pseudo-device /dev/root, which is not a valid userspace accessible device node. Cannot determine correct backing block device.
[    8.655604] systemd[166]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
[    8.881598] systemd[1]: Queued start job for default target Graphical Interface.
[    8.889617] random: systemd: uninitialized urandom read (16 bytes read)
[    8.916194] systemd[1]: Created slice system-getty.slice.
[    8.951618] random: systemd: uninitialized urandom read (16 bytes read)
[    8.959166] systemd[1]: Created slice system-modprobe.slice.
[    8.991497] random: systemd: uninitialized urandom read (16 bytes read)
[    8.998988] systemd[1]: Created slice system-serial\x2dgetty.slice.
[    9.032326] systemd[1]: Created slice User and Session Slice.
[    9.071643] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    9.111679] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    9.151521] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[    9.163271] systemd[1]: Reached target Local Encrypted Volumes.
[    9.201592] systemd[1]: Reached target Paths.
[    9.231508] systemd[1]: Reached target Remote File Systems.
[    9.271473] systemd[1]: Reached target Slices.
[    9.301510] systemd[1]: Reached target Swap.
[    9.331712] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[    9.382980] systemd[1]: Listening on Process Core Dump Socket.
[    9.425820] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[    9.435158] systemd[1]: Listening on Journal Socket (/dev/log).
[    9.481817] systemd[1]: Listening on Journal Socket.
[    9.518192] systemd[1]: Listening on Network Service Netlink Socket.
[    9.563010] systemd[1]: Listening on udev Control Socket.
[    9.611715] systemd[1]: Listening on udev Kernel Socket.
[    9.661746] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
[    9.670165] systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
[    9.681829] systemd[1]: Mounting Kernel Debug File System...
[    9.724106] systemd[1]: Mounting Kernel Trace File System...
[    9.764065] systemd[1]: Mounting Temporary Directory (/tmp)...
[    9.801730] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    9.815900] systemd[1]: Starting Load Kernel Module configfs...
[    9.854292] systemd[1]: Starting Load Kernel Module drm...
[    9.894496] systemd[1]: Starting Load Kernel Module fuse...
[    9.938207] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[    9.948767] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[    9.959564] systemd[1]: Starting Remount Root and Kernel File Systems...
[   10.001625] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
[   10.012531] systemd[1]: Starting Apply Kernel Variables...
[   10.054299] systemd[1]: Starting Coldplug All udev Devices...
[   10.106173] systemd[1]: Mounted Kernel Debug File System.
[   10.151975] systemd[1]: Mounted Kernel Trace File System.
[   10.201691] systemd[1]: Mounted Temporary Directory (/tmp).
[   10.242004] systemd[1]: modprobe@configfs.service: Succeeded.
[   10.248810] systemd[1]: Finished Load Kernel Module configfs.
[   10.286193] systemd[1]: modprobe@drm.service: Succeeded.
[   10.292770] systemd[1]: Finished Load Kernel Module drm.
[   10.332207] systemd[1]: modprobe@fuse.service: Succeeded.
[   10.338606] systemd[1]: Finished Load Kernel Module fuse.
[   10.372731] systemd[1]: Finished Remount Root and Kernel File Systems.
[   10.412748] systemd[1]: Finished Apply Kernel Variables.
[   10.464629] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[   10.473431] systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
[   10.482779] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
[   10.498346] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[   10.509666] systemd[1]: Starting Load/Save Random Seed...
[   10.531781] systemd[1]: Condition check resulted in Create System Users being skipped.
[   10.543704] systemd[1]: Starting Create Static Device Nodes in /dev...
[   10.722733] systemd[1]: Finished Create Static Device Nodes in /dev.
[   10.773072] systemd[1]: Finished Coldplug All udev Devices.
[   10.811698] systemd[1]: Reached target Local File Systems (Pre).
[   10.851561] systemd[1]: Condition check resulted in Virtual Machine and Container Storage (Compatibility) being skipped.
[   10.862560] systemd[1]: Reached target Local File Systems.
[   10.904738] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
[   10.941790] systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
[   10.954403] systemd[1]: Starting Journal Service...
[   11.002108] systemd[1]: Starting Rule-based Manager for Device Events and Files...
[   11.104142] systemd[1]: Started Journal Service.
[   11.205698] systemd-journald[193]: Received client request to flush runtime journal.
[   12.715234] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
[   12.742129] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
[   12.867939] mvneta f1070000.ethernet eth0: configuring for fixed/rgmii link mode
[   12.888463] mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[   13.003326] mt76x2e 0000:03:00.0 wlp3s0: renamed from wlan0
[   13.110923] random: crng init done
[   13.141526] random: 7 urandom warning(s) missed due to ratelimiting
[   13.320567] BTRFS info (device mmcblk0p1): devid 1 device path /dev/root changed to /dev/mmcblk0p1 scanned by systemd-udevd (199)
[   15.911774] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[   15.919818] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
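
When comparing boots like the ones in this mail, the lines worth pulling out are the "BAR N: error updating" messages. These print the value written to the BAR followed by the value read back, and a read-back of 0xffffffff is typically what a config read returns when the device is not responding. This matches the "config space inaccessible" message from ath10k_pci above. A minimal sketch for filtering a saved dmesg for that signature follows, assuming Python 3. The function name and the regular expression are made up for illustration and are not from any kernel tooling:

```python
import re

# Matches the kernel's "BAR N: error updating (WRITTEN != READBACK)" lines,
# including the "(high ...)" variant printed for the upper half of 64-bit BARs.
BAR_ERR = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): error updating "
    r"\((?:high )?(?P<written>0x[0-9a-f]+) != (?P<readback>0x[0-9a-f]+)\)"
)


def all_ones_bar_errors(log_text):
    """Return sorted (device, BAR) pairs whose read-back was 0xffffffff."""
    hits = set()
    for m in BAR_ERR.finditer(log_text):
        if int(m.group("readback"), 16) == 0xFFFFFFFF:
            hits.add((m.group("dev"), int(m.group("bar"))))
    return sorted(hits)


# Three lines copied from the "ASPM enabled, no patch" log above.
sample = """\
[    1.592298] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.592305] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.592313] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
"""

print(all_ones_bar_errors(sample))  # [('0000:02:00.0', 0)]
```

Running the same function over the complete log of each boot gives a quick list of which devices had unreadable config space at BAR assignment time.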

ASPM enabled, with patch:
[    1.631901] mv_xor f1060900.xor: Marvell shared XOR driver
[    1.691759] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): ( xor cpy intr )
[    1.710225] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.711090] printk: console [ttyS0] disabled
[    1.731185] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 30, base_baud = 15625000) is a 16550A
[    3.086738] printk: console [ttyS0] enabled
[    3.111636] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 31, base_baud = 15625000) is a 16550A
[    3.121337] ahci-mvebu f10a8000.sata: supply ahci not found, using dummy regulator
[    3.129018] ahci-mvebu f10a8000.sata: supply phy not found, using dummy regulator
[    3.136573] ahci-mvebu f10a8000.sata: supply target not found, using dummy regulator
[    3.144419] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    3.153514] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    3.162766] scsi host0: ahci-mvebu
[    3.166400] scsi host1: ahci-mvebu
[    3.169909] ata1: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x100 irq 53
[    3.177861] ata2: SATA max UDMA/133 mmio [mem 0xf10a8000-0xf10a9fff] port 0x180 irq 53
[    3.186676] spi-nor spi0.0: s25fl164k (8192 Kbytes)
[    3.191598] 2 fixed-partitions partitions found on MTD device spi0.0
[    3.197969] Creating 2 MTD partitions on "spi0.0":
[    3.202779] 0x000000000000-0x000000100000 : "U-Boot"
[    3.221737] 0x000000100000-0x000000800000 : "Rescue system"
[    3.228225] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[    3.236100] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[    3.246357] libphy: Fixed MDIO Bus: probed
[    3.250614] tun: Universal TUN/TAP device driver, 1.6
[    3.256068] libphy: orion_mdio_bus: probed
[    3.261289] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    3.483904] libphy: mv88e6xxx SMI: probed
[    3.494715] mvneta_bm f10c8000.bm: failed to allocate internal memory
[    3.501206] mvneta_bm: probe of f10c8000.bm failed with error -12
[    3.508159] mvneta f1070000.ethernet eth0: Using hardware mac address d8:58:d7:00:4e:98
[    3.516220] ata2: SATA link down (SStatus 0 SControl 300)
[    3.521683] ata1: SATA link down (SStatus 0 SControl 300)
[    3.527904] mvneta f1030000.ethernet eth1: Using hardware mac address d8:58:d7:00:4e:96
[    3.536693] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:00:4e:97
[    3.544979] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.550664] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
[    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.563783] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.571200] ath9k 0000:01:00.0: Failed to initialize device
[    3.576817] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.583038] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.588904] ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[    3.701778] pci 0000:00:03.0: enabling device (0140 -> 0142)
[    3.707530] mt76x2e 0000:03:00.0: ASIC revision: 76120044
[    3.836545] ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[    3.845813] ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 0
[    3.854625] ath10k_pci 0000:02:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
[    3.899415] ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[    4.362131] mt76x2e 0000:03:00.0: ROM patch build: 20141115060606a
[    4.369421] mt76x2e 0000:03:00.0: Firmware Version: 0.0.00
[    4.374934] mt76x2e 0000:03:00.0: Build: 1
[    4.379041] mt76x2e 0000:03:00.0: Build Time: 201507311614____
[    4.401383] mt76x2e 0000:03:00.0: Firmware running!
[    4.406664] ieee80211 phy2: Selected rate control algorithm 'minstrel_ht'
[    4.407567] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    4.414141] ehci-pci: EHCI PCI platform driver
[    4.418614] ehci-orion: EHCI orion driver
[    4.422749] orion-ehci f1058000.usb: EHCI Host Controller
[    4.428172] orion-ehci f1058000.usb: new USB bus registered, assigned bus number 1
[    4.435825] orion-ehci f1058000.usb: irq 49, io mem 0xf1058000
[    4.471384] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
[    4.477701] hub 1-0:1.0: USB hub found
[    4.481498] hub 1-0:1.0: 1 port detected
[    4.485916] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.491253] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 2
[    4.498822] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.508116] xhci-hcd f10f0000.usb3: irq 55, io mem 0xf10f0000
[    4.514262] hub 2-0:1.0: USB hub found
[    4.518035] hub 2-0:1.0: 1 port detected
[    4.522138] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.527468] xhci-hcd f10f0000.usb3: new USB bus registered, assigned bus number 3
[    4.534993] xhci-hcd f10f0000.usb3: Host supports USB 3.0 SuperSpeed
[    4.541411] usb usb3: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.549785] hub 3-0:1.0: USB hub found
[    4.553574] hub 3-0:1.0: 1 port detected
[    4.557768] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.563115] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 4
[    4.570665] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    4.579950] xhci-hcd f10f8000.usb3: irq 56, io mem 0xf10f8000
[    4.586077] hub 4-0:1.0: USB hub found
[    4.589849] hub 4-0:1.0: 1 port detected
[    4.594849] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.600178] xhci-hcd f10f8000.usb3: new USB bus registered, assigned bus number 5
[    4.607711] xhci-hcd f10f8000.usb3: Host supports USB 3.0 SuperSpeed
[    4.614117] usb usb5: We don't know the algorithms for LPM for this host, disabling LPM.
[    4.622494] hub 5-0:1.0: USB hub found
[    4.626265] hub 5-0:1.0: 1 port detected
[    4.630439] usbcore: registered new interface driver usb-storage
[    4.637200] armada38x-rtc f10a3800.rtc: registered as rtc0
[    4.642796] armada38x-rtc f10a3800.rtc: setting system clock to 2021-03-26T15:21:33 UTC (1616772093)
[    4.652088] i2c /dev entries driver
[    4.655879] i2c i2c-0: Not using recovery: no recover_bus() found
[    4.663003] at24 1-0054: supply vcc not found, using dummy regulator
[    4.670261] at24 1-0054: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
[    4.677027] i2c i2c-0: Added multiplexed i2c bus 1
[    4.681962] i2c i2c-0: Added multiplexed i2c bus 2
[    4.686871] i2c i2c-0: Added multiplexed i2c bus 3
[    4.691781] i2c i2c-0: Added multiplexed i2c bus 4
[    4.696685] i2c i2c-0: Added multiplexed i2c bus 5
[    4.701657] i2c i2c-0: Added multiplexed i2c bus 6
[    4.706568] i2c i2c-0: Added multiplexed i2c bus 7
[    4.711692] pca953x 8-0071: supply vcc not found, using dummy regulator
[    4.718379] pca953x 8-0071: using no AI
[    4.722770] pca953x 8-0071: interrupt support not compiled in
[    4.729132] i2c i2c-0: Added multiplexed i2c bus 8
[    4.734009] pca954x 0-0070: registered 8 multiplexed busses for I2C mux pca9547
[    4.743152] orion_wdt: Initial timeout 171 sec
[    4.747871] sdhci: Secure Digital Host Controller Interface driver
[    4.754077] sdhci: Copyright(c) Pierre Ossman
[    4.758547] sdhci-pltfm: SDHCI platform and OF driver helper
[    4.764523] ledtrig-cpu: registered to indicate activity on CPUs
[    4.771498] marvell-cesa f1090000.crypto: CESA device successfully registered
[    4.778822] usbcore: registered new interface driver usbhid
[    4.784448] usbhid: USB HID core driver
[    4.788400] GACT probability on
[    4.791591] Mirror/redirect action on
[    4.795273] Simple TC action Loaded
[    4.798799] u32 classifier
[    4.799815] mmc0: SDHCI controller on f10d8000.sdhci [f10d8000.sdhci] using ADMA
[    4.801518]     Performance counters on
[    4.801520]     input device check on
[    4.801521]     Actions configured
[    4.801981] NET: Registered protocol family 10
[    4.825094] Segment Routing with IPv6
[    4.828820] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    4.835086] NET: Registered protocol family 17
[    4.839638] 8021q: 802.1Q VLAN Support v1.8
[    4.843942] ThumbEE CPU extension supported.
[    4.848240] Registering SWP/SWPB emulation handler
[    4.853185] Loading compiled-in X.509 certificates
[    4.859289] Btrfs loaded, crc32c=crc32c-generic, zoned=no
[    4.866364] mv88e6085 f1072004.mdio-mii:10: switch 0x1760 detected: Marvell 88E6176, revision 1
[    4.947964] mmc0: new high speed MMC card at address 0001
[    4.953701] mmcblk0: mmc0:0001 H8G4a\x92 7.28 GiB 
[    4.958373] mmcblk0boot0: mmc0:0001 H8G4a\x92 partition 1 4.00 MiB
[    4.974792] mmcblk0boot1: mmc0:0001 H8G4a\x92 partition 2 4.00 MiB
[    4.981468] mmcblk0rpmb: mmc0:0001 H8G4a\x92 partition 3 4.00 MiB, chardev (250:0)
[    4.990436]  mmcblk0: p1
[    5.045869] libphy: mv88e6xxx SMI: probed
[    5.065394] ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[    5.182884] ath: EEPROM regdomain sanitized
[    5.182892] ath: EEPROM regdomain: 0x64
[    5.182897] ath: EEPROM indicates we should expect a direct regpair map
[    5.182903] ath: Country alpha2 being used: 00
[    5.182907] ath: Regpair used: 0x64
[    5.697833] mv88e6085 f1072004.mdio-mii:10 lan0 (uninitialized): PHY [mv88e6xxx-1:00] driver [Marvell 88E1540] (irq=75)
[    5.730170] mv88e6085 f1072004.mdio-mii:10 lan1 (uninitialized): PHY [mv88e6xxx-1:01] driver [Marvell 88E1540] (irq=76)
[    5.765363] mv88e6085 f1072004.mdio-mii:10 lan2 (uninitialized): PHY [mv88e6xxx-1:02] driver [Marvell 88E1540] (irq=77)
[    5.798260] mv88e6085 f1072004.mdio-mii:10 lan3 (uninitialized): PHY [mv88e6xxx-1:03] driver [Marvell 88E1540] (irq=78)
[    5.828978] mv88e6085 f1072004.mdio-mii:10 lan4 (uninitialized): PHY [mv88e6xxx-1:04] driver [Marvell 88E1540] (irq=79)
[    5.847992] mv88e6085 f1072004.mdio-mii:10: configuring for fixed/rgmii-id link mode
[    5.858403] mv88e6085 f1072004.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
[    5.866420] DSA: tree 0 setup
[    5.870132] Waiting 2 sec before mounting root device...
[    5.875609] ath: EEPROM regdomain: 0x80d0
[    5.875614] ath: EEPROM indicates we should expect a country code
[    5.875617] ath: doing EEPROM country->regdmn map search
[    5.875620] ath: country maps to regdmn code: 0x37
[    5.875624] ath: Country alpha2 being used: DK
[    5.875627] ath: Regpair used: 0x37
[    5.875633] ath: regdomain 0x80d0 dynamically updated by user
[    7.917893] BTRFS: device fsid 448334b8-1b27-4738-8118-9e70b56b1e58 devid 1 transid 13610 /dev/root scanned by swapper/0 (1)
[    7.929810] BTRFS info (device mmcblk0p1): disk space caching is enabled
[    7.936547] BTRFS info (device mmcblk0p1): has skinny extents
[    7.948767] BTRFS info (device mmcblk0p1): enabling ssd optimizations
[    7.957822] VFS: Mounted root (btrfs filesystem) on device 0:13.
[    7.964279] devtmpfs: mounted
[    7.967862] Freeing unused kernel memory: 1024K
[    8.011610] Run /sbin/init as init process
[    8.015718]   with arguments:
[    8.015722]     /sbin/init
[    8.015725]     earlyprintk
[    8.015728]   with environment:
[    8.015731]     HOME=/
[    8.015734]     TERM=linux
[    8.092110] random: fast init done
[    8.441825] systemd[1]: systemd 247.3-1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    8.464757] systemd[1]: Detected architecture arm.
[    8.592661] systemd[1]: Set hostname to <omnia-arch>.
[    8.786365] systemd-gpt-auto-generator[173]: File system behind root file system is reported by btrfs to be backed by pseudo-device /dev/root, which is not a valid userspace accessible device node. Cannot determine correct backing block device.
[    8.810036] systemd[167]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
[    9.029889] systemd[1]: Queued start job for default target Graphical Interface.
[    9.038220] random: systemd: uninitialized urandom read (16 bytes read)
[    9.065479] systemd[1]: Created slice system-getty.slice.
[    9.101527] random: systemd: uninitialized urandom read (16 bytes read)
[    9.109083] systemd[1]: Created slice system-modprobe.slice.
[    9.141479] random: systemd: uninitialized urandom read (16 bytes read)
[    9.148986] systemd[1]: Created slice system-serial\x2dgetty.slice.
[    9.182292] systemd[1]: Created slice User and Session Slice.
[    9.221625] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    9.261586] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    9.301489] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[    9.313240] systemd[1]: Reached target Local Encrypted Volumes.
[    9.351600] systemd[1]: Reached target Paths.
[    9.381488] systemd[1]: Reached target Remote File Systems.
[    9.421453] systemd[1]: Reached target Slices.
[    9.451485] systemd[1]: Reached target Swap.
[    9.481688] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[    9.532988] systemd[1]: Listening on Process Core Dump Socket.
[    9.575779] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[    9.585118] systemd[1]: Listening on Journal Socket (/dev/log).
[    9.621803] systemd[1]: Listening on Journal Socket.
[    9.658182] systemd[1]: Listening on Network Service Netlink Socket.
[    9.692994] systemd[1]: Listening on udev Control Socket.
[    9.731720] systemd[1]: Listening on udev Kernel Socket.
[    9.771747] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
[    9.780163] systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
[    9.791835] systemd[1]: Mounting Kernel Debug File System...
[    9.834096] systemd[1]: Mounting Kernel Trace File System...
[    9.873970] systemd[1]: Mounting Temporary Directory (/tmp)...
[    9.911710] systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
[    9.925842] systemd[1]: Starting Load Kernel Module configfs...
[    9.964225] systemd[1]: Starting Load Kernel Module drm...
[   10.004524] systemd[1]: Starting Load Kernel Module fuse...
[   10.048219] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[   10.058755] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[   10.069469] systemd[1]: Starting Remount Root and Kernel File Systems...
[   10.111602] systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
[   10.122503] systemd[1]: Starting Apply Kernel Variables...
[   10.164228] systemd[1]: Starting Coldplug All udev Devices...
[   10.206094] systemd[1]: Mounted Kernel Debug File System.
[   10.241980] systemd[1]: Mounted Kernel Trace File System.
[   10.281734] systemd[1]: Mounted Temporary Directory (/tmp).
[   10.321996] systemd[1]: modprobe@configfs.service: Succeeded.
[   10.328795] systemd[1]: Finished Load Kernel Module configfs.
[   10.366145] systemd[1]: modprobe@drm.service: Succeeded.
[   10.372733] systemd[1]: Finished Load Kernel Module drm.
[   10.412110] systemd[1]: modprobe@fuse.service: Succeeded.
[   10.418547] systemd[1]: Finished Load Kernel Module fuse.
[   10.452851] systemd[1]: Finished Remount Root and Kernel File Systems.
[   10.492903] systemd[1]: Finished Apply Kernel Variables.
[   10.534693] systemd[1]: Condition check resulted in FUSE Control File System being skipped.
[   10.543506] systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
[   10.552850] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
[   10.568407] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[   10.579654] systemd[1]: Starting Load/Save Random Seed...
[   10.601715] systemd[1]: Condition check resulted in Create System Users being skipped.
[   10.613507] systemd[1]: Starting Create Static Device Nodes in /dev...
[   10.743498] systemd[1]: Finished Create Static Device Nodes in /dev.
[   10.762086] systemd[1]: Reached target Local File Systems (Pre).
[   10.801636] systemd[1]: Condition check resulted in Virtual Machine and Container Storage (Compatibility) being skipped.
[   10.812817] systemd[1]: Reached target Local File Systems.
[   10.854718] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
[   10.891763] systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
[   10.904350] systemd[1]: Starting Journal Service...
[   10.945329] systemd[1]: Starting Rule-based Manager for Device Events and Files...
[   10.993388] systemd[1]: Finished Coldplug All udev Devices.
[   11.039498] systemd[1]: Started Journal Service.
[   11.155201] systemd-journald[193]: Received client request to flush runtime journal.
[   12.440807] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
[   12.457437] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
[   12.536401] mvneta f1070000.ethernet eth0: configuring for fixed/rgmii link mode
[   12.551575] mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[   12.731311] ath10k_pci 0000:02:00.0 wlp2s0: renamed from wlan1
[   12.893504] BTRFS info (device mmcblk0p1): devid 1 device path /dev/root changed to /dev/mmcblk0p1 scanned by systemd-udevd (202)
[   12.922368] mt76x2e 0000:03:00.0 wlp3s0: renamed from wlan0
[   13.451476] random: crng init done
[   13.454898] random: 7 urandom warning(s) missed due to ratelimiting
[   15.550016] ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware
[   15.591776] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[   15.599825] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

>> Could there be some kind of data corruption in play here making the
>> driver think the chip revision is wrong, or something like that? If I
>> boot the same kernel without the patch applied, the ath9k initialisation
>> works fine, but obviously the ath10k is then still broken...
>
> There is something really strange.
>
> Can you add debug log into pcie_change_tls_to_gen1() function to check
> for which card is this function called?

Erm, it looks like it's never called? I added this:

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ea5bdf6107f6..794c682d4bd3 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -198,6 +198,9 @@ static int pcie_change_tls_to_gen1(struct pci_dev *parent)
        u32 reg32;
        int ret;
 
+       printk("pcie_change_tls_to_getn1() called for device %x:%x:%x\n",
+              parent->device, parent->subsystem_vendor, parent->subsystem_device);
+
        /* Check if link speed can be forced to 2.5 GT/s */
        pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &reg32);
        if (!(reg32 & PCI_EXP_LNKCAP2_SLS_2_5GB)) {

But 'dmesg | grep called' returns nothing...

> Are you testing this new patch with or without changes to
> mvebu_pcie_setup_hw() function?

I applied your patch on top of latest mac80211-next, which right now is
this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git/commit/?id=4b837ad53be2ab100dfaa99dc73a9443a8a2392d

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 16:54                                           ` Toke Høiland-Jørgensen
@ 2021-03-26 17:11                                             ` Pali Rohár
  2021-03-26 17:51                                               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-26 17:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> > On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
> >> Pali Rohár <pali@kernel.org> writes:
> >> > It seems that this really is an issue in QCA98xx chips. I have sent
> >> > a patch which adds a quirk for these wifi chips:
> >> >
> >> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
> >> 
> >> I tried applying that, and while it does fix the ath10k card, it seems
> >> to break the ath9k card in the slot next to it.
> >
> > Ehm, what?
> 
> I know, right?! :/
> 
> > The patch which I sent to the mailing list today calls the quirk code
> > only for the PCI device ID used by QCA98xx cards. For all other cards
> > it is a no-op.
> 
> So upon further investigation this seems to be unrelated to the patch.
> Meaning that I can't reliably get the ath9k device to work again by
> reverting it. And the patch does seem to fix the ath10k device, so I
> think that's probably good.
> 
> However, the issue with ath9k does seem to be related to ASPM; if I turn
> that off in .config, I get the ath9k device back.

Ok, perfect. So my patch does not break ath9k.

> So we have these
> cases:
> 
> ASPM disabled:          ath9k, ath10k and mt76 cards all work
> ASPM enabled, no patch: only mt76 card works
> ASPM enabled + patch:   ath10k and mt76 cards work
> 
> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
> is just generally flaky?

I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
handling. But the issue is not in PCI config space, as the ath9k driver
does start initialization of this card. Some debugging in the ath9k
driver is also needed, since it prints that strange "mac chip rev" error.

I think this issue should be handled separately. Could you report it
also to ath9k mailing list (and CC me)? Maybe other ath developers would
know some more details.

> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
> > my tested ath9k cards have different PCI device id.
> 
> [root@omnia-arch ~]# lspci -nn
> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]

That is fine. All of my ath9k test cards also have ID 0x002e.

> >> When booting with the
> >> patch applied, I get this in dmesg:
> >> 
> >> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> >
> > Can you send whole dmesg log? So I can see which new err/info lines are
> > printed.
> 
> Pasting all three cases below:
...

It seems that there is no ASPM-related line... But your logs are not
complete; the beginning is missing. So important lines may be trimmed.

> >> Could there be some kind of data corruption in play here making the
> >> driver think the chip revision is wrong, or something like that? If I
> >> boot the same kernel without the patch applied, the ath9k initialisation
> >> works fine, but obviously the ath10k is then still broken...
> >
> > There is something really strange.
> >
> > Can you add debug log into pcie_change_tls_to_gen1() function to check
> > for which card is this function called?
> 
> Erm, it looks like it's never called? I added this:

Ehm? With the patch it must be called, otherwise the ath10k card would
not be detected on the PCIe bus. And you tested that the patch fixes it...

> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index ea5bdf6107f6..794c682d4bd3 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -198,6 +198,9 @@ static int pcie_change_tls_to_gen1(struct pci_dev *parent)
>         u32 reg32;
>         int ret;
>  
> +       printk("pcie_change_tls_to_getn1() called for device %x:%x:%x\n",
> +              parent->device, parent->subsystem_vendor, parent->subsystem_device);
> +

Try pci_err(parent, "message...\n"); to see if something changes?

>         /* Check if link speed can be forced to 2.5 GT/s */
>         pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &reg32);
>         if (!(reg32 & PCI_EXP_LNKCAP2_SLS_2_5GB)) {
> 
> But 'dmesg | grep called' returns nothing...
> 
> > Are you testing this new patch with or without changes to
> > mvebu_pcie_setup_hw() function?
> 
> I applied your patch on top of latest mac80211-next, which right now is
> this commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git/commit/?id=4b837ad53be2ab100dfaa99dc73a9443a8a2392d

Just to make sure that you are _not_ using the hack for the
mvebu_pcie_setup_hw() function in pci-mvebu.c (which I sent a few days ago).


* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 17:11                                             ` Pali Rohár
@ 2021-03-26 17:51                                               ` Toke Høiland-Jørgensen
  2021-03-29 17:09                                                 ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-26 17:51 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

Pali Rohár <pali@kernel.org> writes:

> On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>> > On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
>> >> Pali Rohár <pali@kernel.org> writes:
>> >> > It seems that this really is an issue in QCA98xx chips. I have sent
>> >> > a patch which adds a quirk for these wifi chips:
>> >> >
>> >> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
>> >> 
>> >> I tried applying that, and while it does fix the ath10k card, it seems
>> >> to break the ath9k card in the slot next to it.
>> >
>> > Ehm, what?
>> 
>> I know, right?! :/
>> 
>> > The patch which I sent to the mailing list today calls the quirk code
>> > only for the PCI device ID used by QCA98xx cards. For all other cards
>> > it is a no-op.
>> 
>> So upon further investigation this seems to be unrelated to the patch.
>> Meaning that I can't reliably get the ath9k device to work again by
>> reverting it. And the patch does seem to fix the ath10k device, so I
>> think that's probably good.
>> 
>> However, the issue with ath9k does seem to be related to ASPM; if I turn
>> that off in .config, I get the ath9k device back.
>
> Ok, perfect. So my patch does not break ath9k.

No, doesn't seem like it!

>> So we have these
>> cases:
>> 
>> ASPM disabled:          ath9k, ath10k and mt76 cards all work
>> ASPM enabled, no patch: only mt76 card works
>> ASPM enabled + patch:   ath10k and mt76 cards work
>> 
>> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
>> is just generally flaky?
>
> I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
> handling. But the issue is not in PCI config space, as the ath9k driver
> does start initialization of this card. Some debugging in the ath9k
> driver is also needed, since it prints that strange "mac chip rev" error.

Well that's just being output because it gets a revision that it doesn't
recognise - which it seems to be just reading from a register:

https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255

The value printed is consistent with the register read just returning
0xffffffff, which from looking at ioread32() is the value returned on a
failed read. So there's a driver bug there - the check against -EIO
here is obviously nonsensical:

https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290

But the underlying cause appears to be that the read from the register
fails, which I suppose is related to something the PCI bus does?
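
As a rough illustration (plain Python, not kernel code), the decoding
can be replayed on an all-ones value. The mask and shift constants below
are recalled from the AR_SREV_* definitions in ath9k's reg.h and should
be treated as assumptions to check against the tree; decode_srev() and
as_u32() are hypothetical helper names used only for this sketch.

```python
# Assumed values of the AR_SREV_* masks/shifts from ath9k's reg.h;
# verify against the kernel source before relying on them.
AR_SREV_VERSION2 = 0xFFFC0000
AR_SREV_TYPE2_S = 12
AR_SREV_REVISION2 = 0x00000F00
AR_SREV_REVISION2_S = 8
EIO = 5


def decode_srev(val):
    """Split a raw 32-bit SREV value into (mac_version, mac_rev)."""
    mac_version = (val & AR_SREV_VERSION2) >> AR_SREV_TYPE2_S
    mac_rev = (val & AR_SREV_REVISION2) >> AR_SREV_REVISION2_S
    return mac_version, mac_rev


def as_u32(x):
    """Reinterpret a Python int as an unsigned 32-bit value."""
    return x & 0xFFFFFFFF


failed_read = 0xFFFFFFFF  # what an inaccessible register is assumed to return
version, rev = decode_srev(failed_read)
print("Mac Chip Rev 0x%02x.%x" % (version, rev))
# -EIO stored in a u32 is 0xfffffffb, so it can never match an all-ones read.
print("matches (u32)-EIO:", failed_read == as_u32(-EIO))
```

Under those assumptions an all-ones read decodes to version 0xfffc0 and
revision 0xf, which is the string in the dmesg line, and the comparison
against -EIO is false.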

> I think this issue should be handled separately. Could you report it
> also to ath9k mailing list (and CC me)? Maybe other ath developers would
> know some more details.

I'll send a patch for the nonsensical check above, but other than that I
think we're still in PCI land here, or?

>> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
>> > my tested ath9k cards have different PCI device id.
>> 
>> [root@omnia-arch ~]# lspci -nn
>> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>
> That is fine. All of my ath9k test cards also have ID 0x002e.
>
>> >> When booting with the
>> >> patch applied, I get this in dmesg:
>> >> 
>> >> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> >
>> > Can you send whole dmesg log? So I can see which new err/info lines are
>> > printed.
>> 
>> Pasting all three cases below:
> ...
>
> It seems that there is no ASPM-related line... But your logs are not
> complete; the beginning is missing. So important lines may be trimmed.

Ah! Of course - sorry for not noticing that!

Here are the missing bits related to PCIE (pulled off the serial console
- with the patch applied):

[    1.493064] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    1.493094] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    1.493113] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    1.493129] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    1.493144] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    1.493159] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.493174] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.493189] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.493203] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.493217] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.493231] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.493245] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.493255] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.493426] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    1.493435] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.493443] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    1.493451] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    1.493458] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    1.493465] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    1.493472] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    1.493478] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    1.493548] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    1.493564] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.493719] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.493734] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.493868] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.493882] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.494660] PCI: bus0: Fast back to back transfers disabled
[    1.494668] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494677] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494685] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494765] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.494788] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.494901] pci 0000:01:00.0: supports D1
[    1.494907] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.495020] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.522129] PCI: bus1: Fast back to back transfers enabled
[    1.522137] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.522226] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[    1.522249] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
[    1.522283] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
[    1.522362] pci 0000:02:00.0: supports D1 D2
[    1.522457] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.522466] pcie_change_tls_to_getn1() called for device 6820:0:0
[    1.522472] pci 0000:00:02.0: ASPM: Bridge does not support changing Link Speed to 2.5 GT/s
[    1.522477] pci 0000:00:02.0: ASPM: Retrain Link at higher speed is disallowed by quirk
[    1.522482] pci 0000:00:02.0: ASPM: Could not configure common clock
[    1.523241] PCI: bus2: Fast back to back transfers disabled
[    1.523247] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.523332] pci 0000:03:00.0: [14c3:7612] type 00 class 0x028000
[    1.523357] pci 0000:03:00.0: reg 0x10: [mem 0xec000000-0xec0fffff 64bit]
[    1.523393] pci 0000:03:00.0: reg 0x30: [mem 0xec100000-0xec10ffff pref]
[    1.523481] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
[    1.523601] pci 0000:00:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.552139] PCI: bus3: Fast back to back transfers disabled
[    1.552147] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.552183] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.552193] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.552202] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0600000-0xe07fffff]
[    1.552211] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.552221] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.552229] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0800000-0xe08007ff pref]
[    1.552238] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.552247] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.552254] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.552261] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.552269] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.552279] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.552293] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.552300] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.552306] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.552315] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0600000-0xe06fffff 64bit]
[    1.552329] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0700000-0xe070ffff pref]
[    1.552335] pci 0000:00:03.0: PCI bridge to [bus 03]
[    1.552342] pci 0000:00:03.0:   bridge window [mem 0xe0600000-0xe07fffff]


>> >> Could there be some kind of data corruption in play here making the
>> >> driver think the chip revision is wrong, or something like that? If I
>> >> boot the same kernel without the patch applied, the ath9k initialisation
>> >> works fine, but obviously the ath10k is then still broken...
>> >
>> > There is something really strange.
>> >
>> > Can you add debug log into pcie_change_tls_to_gen1() function to check
>> > for which card is this function called?
>> 
>> Erm, it looks like it's never called? I added this:
>
> Ehm? With the patch it must be called, otherwise the ath10k card would
> not be detected on the PCIe bus. And you tested that the patch fixes it...

Yeah, that was due to the missing log lines; it's in the output above.

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-26 17:51                                               ` Toke Høiland-Jørgensen
@ 2021-03-29 17:09                                                 ` Pali Rohár
  2021-03-31 14:02                                                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-29 17:09 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

On Friday 26 March 2021 18:51:42 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> > On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
> >> So we have these
> >> cases:
> >> 
> >> ASPM disabled:          ath9k, ath10k and mt76 cards all work
> >> ASPM enabled, no patch: only mt76 card works
> >> ASPM enabled + patch:   ath10k and mt76 cards work
> >> 
> >> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
> >> is just generally flaky?
> >
> > I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
> > handling. But the issue is not in PCI config space, as the ath9k driver
> > does start initialization of this card. Some debugging in the ath9k
> > driver is also needed, since it prints that strange "mac chip rev" error.
> 
> Well that's just being output because it gets a revision that it doesn't
> recognise - which it seems to be just reading from a register:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255
> 
> The value printed is consistent with the register read just returning
> 0xffffffff, which from looking at ioread32() is the value returned on a
> failed read. So there's a driver bug there - the check against -EIO
> here is obviously nonsensical:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290
> 
> But the underlying cause appears to be that the read from the register
> fails, which I suppose is related to something the PCI bus does?
> 
> > I think this issue should be handled separately. Could you report it
> > also to ath9k mailing list (and CC me)? Maybe other ath developers would
> > know some more details.
> 
> I'll send a patch for the nonsensical check above, but other than that I
> think we're still in PCI land here, or?

First, can you try to enable my quirk also for this ath9k card with ASPM
enabled?

I have here another ath9k card which, after toggling link retraining,
changes its PCI device ID (really!) to 0xABCD. But lspci ...

There is a long story about broken ath9k cards that report the 0xABCD
ID on x86 machines with specific BIOS versions. It can be found in the
ath9k-devel mailing list archive:

https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html

Maybe we have now found the root cause of this ABCD? If yes, then it also
explains why the above ath9k driver check fails (the device ID was changed)
and also why the kernel sees the correct ID (the kernel reads the ID before
configuring ASPM and therefore before triggering link retraining).

> >> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
> >> > my tested ath9k cards have different PCI device id.
> >> 
> >> [root@omnia-arch ~]# lspci -nn
> >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
> >> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> >
> > That is fine. All of my ath9k test cards also have ID 0x002e.

Today I found out that lspci -nn may lie! Please send the output of the
command lspci -nn -x, because the real PCI device ID can be read only
from the -x hexdump output.
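
For reference, a minimal sketch of reading the IDs out of such a
hexdump, in plain Python. The helper name ids_from_hexdump() is
hypothetical, and the sample row is only illustrative: it contains just
the first four bytes, built from the 168c:002e IDs in the lspci -nn
output quoted in this thread. Vendor and device ID are the first two
little-endian 16-bit words of config space.

```python
def ids_from_hexdump(row):
    """Return (vendor_id, device_id) from the offset-00 row of an lspci -x dump."""
    offset, _, data = row.partition(":")
    if int(offset, 16) != 0:
        raise ValueError("expected the row that starts at offset 00")
    raw = bytes(int(tok, 16) for tok in data.split())
    # Config space is little-endian: bytes 0-1 are the vendor ID,
    # bytes 2-3 are the device ID.
    vendor = int.from_bytes(raw[0:2], "little")
    device = int.from_bytes(raw[2:4], "little")
    return vendor, device


# Illustrative row only, not captured output.
row = "00: 8c 16 2e 00"
vendor, device = ids_from_hexdump(row)
print("%04x:%04x" % (vendor, device))
```

A card that had switched to the 0xABCD ID would show "cd ab" in the
third and fourth bytes of that row.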

> >> >> When booting with the
> >> >> patch applied, I get this in dmesg:
> >> >> 
> >> >> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver


* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-29 17:09                                                 ` Pali Rohár
@ 2021-03-31 14:02                                                   ` Toke Høiland-Jørgensen
  2021-03-31 16:15                                                     ` Pali Rohár
  0 siblings, 1 reply; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-31 14:02 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

Pali Rohár <pali@kernel.org> writes:

> On Friday 26 March 2021 18:51:42 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>> > On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
>> >> So we have these
>> >> cases:
>> >> 
>> >> ASPM disabled:          ath9k, ath10k and mt76 cards all work
>> >> ASPM enabled, no patch: only mt76 card works
>> >> ASPM enabled + patch:   ath10k and mt76 cards work
>> >> 
>> >> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
>> >> is just generally flaky?
>> >
>> > I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
>> > handling. But the issue is not in PCI config space, as the ath9k driver
>> > does start initialization of this card. Some debugging in the ath9k
>> > driver is also needed, since it prints that strange "mac chip rev" error.
>> 
>> Well that's just being output because it gets a revision that it doesn't
>> recognise - which it seems to be just reading from a register:
>> 
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255
>> 
>> The value printed is consistent with the register read just returning
>> 0xffffffff, which from looking at ioread32() is the value returned on a
>> failed read. So there's a driver bug there - the check against -EIO
>> here is obviously nonsensical:
>> 
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290
>> 
>> But the underlying cause appears to be that the read from the register
>> fails, which I suppose is related to something the PCI bus does?
>> 
>> > I think this issue should be handled separately. Could you report it
>> > also to ath9k mailing list (and CC me)? Maybe other ath developers would
>> > know some more details.
>> 
>> I'll send a patch for the nonsensical check above, but other than that I
>> think we're still in PCI land here, or?
>
> First, can you try to enable my quirk also for this ath9k card with ASPM
> enabled?

Yup, with this I get both devices working:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8ff690c7679d..7e2f9c69f6b2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3583,6 +3583,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
  * PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
  */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002e, quirk_no_bus_reset_and_no_retrain_link);
 
 /*
  * Root port on some Cavium CN8xxx chips do not successfully complete a bus

>
> I have here another ath9k card which, after toggling link retraining,
> changes its PCI device ID (really!) to 0xABCD. But lspci ...
>
> There is a long story about broken ath9k cards that report the 0xABCD
> ID on x86 machines with specific BIOS versions. It can be found in the
> ath9k-devel mailing list archive:
>
> https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
>
> Maybe we have now found the root cause of this ABCD? If yes, then it also
> explains why the above ath9k driver check fails (the device ID was changed)
> and also why the kernel sees the correct ID (the kernel reads the ID before
> configuring ASPM and therefore before triggering link retraining).
>
>> >> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
>> >> > my tested ath9k cards have different PCI device id.
>> >> 
>> >> [root@omnia-arch ~]# lspci -nn
>> >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> >> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>> >
>> > That is fine. All of my ath9k test cards also have ID 0x002e.
>
> Today I found out that lspci -nn may lie! Please send output from
> command: lspci -nn -x because real PCI device id can read only from -x
> hexdump output.

Without the quirk added to the ath9k:

01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
00: 8c 16 2e 00 02 00 10 00 01 00 80 02 10 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00

02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00

And with:

01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
00: 8c 16 2e 00 46 01 10 00 01 00 80 02 10 00 00 00
10: 04 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00

02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00



Is that change in bytes 5 and 6 significant?

-Toke



* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-31 14:02                                                   ` Toke Høiland-Jørgensen
@ 2021-03-31 16:15                                                     ` Pali Rohár
  2021-03-31 16:53                                                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 62+ messages in thread
From: Pali Rohár @ 2021-03-31 16:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

On Wednesday 31 March 2021 16:02:42 Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali@kernel.org> writes:
> 
> > On Friday 26 March 2021 18:51:42 Toke Høiland-Jørgensen wrote:
> >> Pali Rohár <pali@kernel.org> writes:
> >> > On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
> >> >> So we have these
> >> >> cases:
> >> >> 
> >> >> ASPM disabled:          ath9k, ath10k and mt76 cards all work
> >> >> ASPM enabled, no patch: only mt76 card works
> >> >> ASPM enabled + patch:   ath10k and mt76 cards work
> >> >> 
> >> >> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
> >> >> is just generally flaky?
> >> >
> >> > I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
> >> > handling. But issue is not at PCI config space as ath9k driver start
> >> > initialization of this card. Needs also some debugging in ath9k driver
> >> > if it prints that strange "mac chip rev" error.
> >> 
> >> Well that's just being output because it gets a revision that it doesn't
> >> recognise - which it seems to be just reading from a register:
> >> 
> >> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255
> >> 
> >> The value returned is consistent with the value returned just being
> >> 0xffffffff. Which from looking at ioread32() is the value being returned
> >> on a failed read. So there's a driver bug there - the check against -EIO
> >> here is obviously nonsensical:
> >> 
> >> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290
> >> 
> >> But the underlying cause appears to be that the read from the register
> >> fails, which I suppose is related to something the PCI bus does?
> >> 
> >> > I think this issue should be handled separately. Could you report it
> >> > also to ath9k mailing list (and CC me)? Maybe other ath developers would
> >> > know some more details.
> >> 
> >> I'll send a patch for the nonsensical check above, but other than that I
> >> think we're still in PCI land here, or?
> >
> > First, can you try to enable my quirk also for this ath9k card with ASPM
> > enabled?
> 
> Yup, with this I get both devices working:
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 8ff690c7679d..7e2f9c69f6b2 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3583,6 +3583,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
>   * PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
>   */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset_and_no_retrain_link);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002e, quirk_no_bus_reset_and_no_retrain_link);
>  
>  /*
>   * Root port on some Cavium CN8xxx chips do not successfully complete a bus

Ok, thank you for testing!

I see that my test unit with ID 0x0030 (AR93xx) also needs this quirk, so
I will apply the no-retrain-link quirk to all Atheros chips in the
no-bus-reset list above.

> >
> > I have there another ath9k card which after toggling link retraining
> > changes PCI device ID (really!) to 0xABCD. But lspci ...
> >
> > There is long story about broken ath9k cards that are reporting 0xABCD
> > id on x86 machines with specific BIOS versions. It can be find in
> > ath9k-devel mailing list archive:
> >
> > https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> >
> > Maybe we now found root cause of this ABCD? If yes, then it also answers
> > why above ath9k driver check fails (device id was changed) and also
> > because kernel see correct id (kernel reads id before configuring ASPM
> > and therefore before triggering link retraining).
> >
> >> >> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
> >> >> > my tested ath9k cards have different PCI device id.
> >> >> 
> >> >> [root@omnia-arch ~]# lspci -nn
> >> >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> >> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> >> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
> >> >> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
> >> >> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> >> >
> >> > That is fine. Also all ath9k testing cards have id 0x002e.
> >
> > Today I found out that lspci -nn may lie! Please send output from
> > command: lspci -nn -x because real PCI device id can read only from -x
> > hexdump output.
> 
> Without the quirk added to the ath9k:
> 
> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
> 00: 8c 16 2e 00 02 00 10 00 01 00 80 02 10 00 00 00
> 10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00
> 
> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> 00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
> 10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00
> 
> And with:
> 
> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
> 00: 8c 16 2e 00 46 01 10 00 01 00 80 02 10 00 00 00
> 10: 04 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00
> 
> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> 00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
> 10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00
> 

Yesterday both MJ and Bjorn told me to use the lspci '-b' switch, which
instructs lspci to parse capabilities from config space (instead of the
kernel cache).

Could you run 'lspci -nn -vv' and 'lspci -nn -vv -b' and compare the
results? Does anything change?

Anyway, I had a discussion with Adrian Chadd about the 0xABCD issue and
these Qualcomm/Atheros cards. When a post-AR9300 card is not initialized,
it reports PCI device id 0xABCD. Pre-AR9300 cards should report the
correct PCI device id even when not initialized. WLE200 is AR9287-based,
so it always reports the correct id and should not change it during use.

But it seems that this AR9287 also has an issue with EEPROM/OTP, as you
figured out that the ath9k driver is not able to read some device id
from an internal register. So please prepare a patch fixing the -EIO
check in ath9k.
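
For reference, the "Mac Chip Rev 0xfffc0.f" line from the start of this
thread can be reproduced by hand from an all-ones register read. The
sketch below is only an illustration: the mask and shift values are
quoted from memory of ath9k's reg.h and should be treated as
assumptions, and decode_srev() is a hypothetical helper, not a driver
function.

```python
# Assumed values, recalled from drivers/net/wireless/ath/ath9k/reg.h.
AR_SREV_ID = 0x000000FF
AR_SREV_VERSION2 = 0xFFFC0000
AR_SREV_TYPE2_S = 12
AR_SREV_REVISION2 = 0x00000F00
AR_SREV_REVISION2_S = 8
EIO = 5  # errno value of EIO on Linux


def as_u32(value):
    """Reinterpret a Python int as an unsigned 32-bit register value."""
    return value & 0xFFFFFFFF


def decode_srev(val):
    """Decode (mac_version, mac_rev) the way the extended-ID path does."""
    if (val & AR_SREV_ID) != 0xFF:
        raise ValueError("only the extended-ID layout is sketched here")
    mac_version = (val & AR_SREV_VERSION2) >> AR_SREV_TYPE2_S
    mac_rev = (val & AR_SREV_REVISION2) >> AR_SREV_REVISION2_S
    return mac_version, mac_rev


failed_read = 0xFFFFFFFF  # what a failed MMIO read returns
version, rev = decode_srev(failed_read)
print("Mac Chip Rev 0x%02x.%x" % (version, rev))

# A check against -EIO never matches an all-ones read:
print(hex(as_u32(-EIO)), as_u32(-EIO) == failed_read)
```

Under those assumptions, an all-ones read decodes to exactly the
revision string in the log, and -EIO as a 32-bit value is 0xfffffffb,
so a comparison against it cannot catch the failed read.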

The PCI vendor & device id is in the first 4 bytes, and as you can see it
is correct and was not changed.

So I guess lspci output would not change for this card.
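
As a small illustration of reading the ids straight from the hexdump,
here is a sketch that parses the "00:" row of 'lspci -x' output. Both
fields are 16-bit little-endian values; parse_ids() is a hypothetical
helper name used only for this example.

```python
def parse_ids(hex_row):
    """Return (vendor_id, device_id) from the '00:' row of `lspci -x`."""
    offset, _, data = hex_row.partition(":")
    if int(offset, 16) != 0:
        raise ValueError("expected the row that starts at offset 0x00")
    raw = bytes.fromhex(data.strip())
    vendor = int.from_bytes(raw[0:2], "little")  # bytes 0-1
    device = int.from_bytes(raw[2:4], "little")  # bytes 2-3
    return vendor, device


# Row taken from the ath9k dump quoted above.
row = "00: 8c 16 2e 00 02 00 10 00 01 00 80 02 10 00 00 00"
vendor, device = parse_ids(row)
print("%04x:%04x" % (vendor, device))  # 168c:002e
```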

> Is that change in bytes 5 and 6 significant?

At offset 0x04 is the 16-bit PCI Command Register.

In the second (with) output, bit 2 is set, which means that Bus Mastering
is enabled. This is normal and required when the card communicates with
the system. Bit 6 (Parity Error Response) and bit 8 (SERR# Enable) are
also set, both for error reporting. This is normal when the device is
active.

So nothing suspicious here.
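
To make the bit decoding concrete, here is a minimal sketch that
extracts the Command Register from the "00:" row of each dump and lists
the bits that are set. The helper names are made up for this example;
the bit names follow the standard PCI Command Register layout, and only
the bits relevant here are listed.

```python
# Subset of PCI Command Register bits (bit number -> name).
COMMAND_BITS = {
    0: "I/O Space",
    1: "Memory Space",
    2: "Bus Master",
    6: "Parity Error Response",
    8: "SERR# Enable",
}


def command_register(hex_row):
    """Return the 16-bit little-endian value at offset 0x04 of the row."""
    raw = bytes.fromhex(hex_row.partition(":")[2].strip())
    return int.from_bytes(raw[4:6], "little")


def decode_command(value):
    """List the names of the known bits set in a Command Register value."""
    return [name for bit, name in sorted(COMMAND_BITS.items())
            if value & (1 << bit)]


without_quirk = "00: 8c 16 2e 00 02 00 10 00 01 00 80 02 10 00 00 00"
with_quirk = "00: 8c 16 2e 00 46 01 10 00 01 00 80 02 10 00 00 00"

for label, row in (("without", without_quirk), ("with", with_quirk)):
    value = command_register(row)
    print("%s: 0x%04x %s" % (label, value, decode_command(value)))
```

The "without" dump decodes to 0x0002 (only Memory Space set), and the
"with" dump to 0x0146 (Memory Space, Bus Master, Parity Error Response
and SERR# Enable).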

> -Toke
> 


* Re: PCI trouble on mvebu (Turris Omnia)
  2021-03-31 16:15                                                     ` Pali Rohár
@ 2021-03-31 16:53                                                       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 62+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-31 16:53 UTC (permalink / raw)
  To: Pali Rohár
  Cc: vtolkm, Bjorn Helgaas, linux-pci, linux-arm-kernel, Rob Herring,
	Ilias Apalodimas, Marek Behún, Thomas Petazzoni

Pali Rohár <pali@kernel.org> writes:

> On Wednesday 31 March 2021 16:02:42 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali@kernel.org> writes:
>> 
>> > On Friday 26 March 2021 18:51:42 Toke Høiland-Jørgensen wrote:
>> >> Pali Rohár <pali@kernel.org> writes:
>> >> > On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
>> >> >> So we have these
>> >> >> cases:
>> >> >> 
>> >> >> ASPM disabled:          ath9k, ath10k and mt76 cards all work
>> >> >> ASPM enabled, no patch: only mt76 card works
>> >> >> ASPM enabled + patch:   ath10k and mt76 cards work
>> >> >> 
>> >> >> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
>> >> >> is just generally flaky?
>> >> >
>> >> > I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
>> >> > handling. But issue is not at PCI config space as ath9k driver start
>> >> > initialization of this card. Needs also some debugging in ath9k driver
>> >> > if it prints that strange "mac chip rev" error.
>> >> 
>> >> Well that's just being output because it gets a revision that it doesn't
>> >> recognise - which it seems to be just reading from a register:
>> >> 
>> >> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255
>> >> 
>> >> The value returned is consistent with the value returned just being
>> >> 0xffffffff. Which from looking at ioread32() is the value being returned
>> >> on a failed read. So there's a driver bug there - the check against -EIO
>> >> here is obviously nonsensical:
>> >> 
>> >> https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290
>> >> 
>> >> But the underlying cause appears to be that the read from the register
>> >> fails, which I suppose is related to something the PCI bus does?
>> >> 
>> >> > I think this issue should be handled separately. Could you report it
>> >> > also to ath9k mailing list (and CC me)? Maybe other ath developers would
>> >> > know some more details.
>> >> 
>> >> I'll send a patch for the nonsensical check above, but other than that I
>> >> think we're still in PCI land here, or?
>> >
>> > First, can you try to enable my quirk also for this ath9k card with ASPM
>> > enabled?
>> 
>> Yup, with this I get both devices working:
>> 
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 8ff690c7679d..7e2f9c69f6b2 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -3583,6 +3583,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
>>   * PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
>>   */
>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset_and_no_retrain_link);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002e, quirk_no_bus_reset_and_no_retrain_link);
>>  
>>  /*
>>   * Root port on some Cavium CN8xxx chips do not successfully complete a bus
>
> Ok, thank you for testing!
>
> I see that my test unit with ID 0x0030 (AR93xx) also needs this quirk, so
> I will apply the no-retrain-link quirk to all Atheros chips in the
> no-bus-reset list above.

SGTM.

>> >
>> > I have there another ath9k card which after toggling link retraining
>> > changes PCI device ID (really!) to 0xABCD. But lspci ...
>> >
>> > There is long story about broken ath9k cards that are reporting 0xABCD
>> > id on x86 machines with specific BIOS versions. It can be find in
>> > ath9k-devel mailing list archive:
>> >
>> > https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
>> >
>> > Maybe we now found root cause of this ABCD? If yes, then it also answers
>> > why above ath9k driver check fails (device id was changed) and also
>> > because kernel see correct id (kernel reads id before configuring ASPM
>> > and therefore before triggering link retraining).
>> >
>> >> >> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
>> >> >> > my tested ath9k cards have different PCI device id.
>> >> >> 
>> >> >> [root@omnia-arch ~]# lspci -nn
>> >> >> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> >> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> >> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> >> >> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> >> >> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>> >> >
>> >> > That is fine. Also all ath9k testing cards have id 0x002e.
>> >
>> > Today I found out that lspci -nn may lie! Please send output from
>> > command: lspci -nn -x because real PCI device id can read only from -x
>> > hexdump output.
>> 
>> Without the quirk added to the ath9k:
>> 
>> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> 00: 8c 16 2e 00 02 00 10 00 01 00 80 02 10 00 00 00
>> 10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
>> 30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00
>> 
>> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>> 00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
>> 10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00
>> 
>> And with:
>> 
>> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> 00: 8c 16 2e 00 46 01 10 00 01 00 80 02 10 00 00 00
>> 10: 04 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 a4 30
>> 30: 00 00 00 00 40 00 00 00 00 00 00 00 3d 01 00 00
>> 
>> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>> 00: 8c 16 3c 00 46 05 10 00 00 00 80 02 10 00 00 00
>> 10: 04 00 20 e0 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 30: 00 00 20 ea 40 00 00 00 00 00 00 00 3e 01 00 00
>> 
>
> Yesterday both MJ and Bjorn told me to use the lspci '-b' switch, which
> instructs lspci to parse capabilities from config space (instead of the
> kernel cache).
>
> Could you run 'lspci -nn -vv' and 'lspci -nn -vv -b' and compare the
> results? Does anything change?

Without -b there seem to be some [size=XX] suffixes on some lines, and
there are some AtomicOpsCap lines that are not there with -b. Also, the
IRQ number and Expansion ROM address for ath10k changed like this:

-	Interrupt: pin A routed to IRQ 63
-	Region 0: Memory at e0200000 (64-bit, non-prefetchable) [size=2M]
-	Expansion ROM at e0400000 [disabled] [size=64K]
+	Interrupt: pin A routed to IRQ 62
+	Region 0: Memory at e0200000 (64-bit, non-prefetchable)
+	Expansion ROM at ea200000 [disabled]
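
A comparison like the one above can also be scripted. This is only a
sketch: run_lspci() and changed_lines() are hypothetical helper names,
and the only lspci options used are the ones mentioned in this thread
(-nn, -vv, -b).

```python
import difflib
import subprocess


def run_lspci(*extra_args):
    """Run `lspci -nn -vv` plus any extra options and return its output."""
    result = subprocess.run(["lspci", "-nn", "-vv", *extra_args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def changed_lines(plain, bus_centric):
    """Return only the added/removed lines between two lspci outputs."""
    diff = list(difflib.unified_diff(plain.splitlines(),
                                     bus_centric.splitlines(),
                                     lineterm="", n=0))
    # The first two entries are the ---/+++ file headers; skip them,
    # then drop the @@ hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# On the target machine (lspci must be installed):
#   for line in changed_lines(run_lspci(), run_lspci("-b")):
#       print(line)
```

changed_lines() works on plain strings, so two saved outputs can be
compared the same way without running lspci again.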

> Anyway, I had a discussion with Adrian Chadd about the 0xABCD issue and
> these Qualcomm/Atheros cards. When a post-AR9300 card is not initialized,
> it reports PCI device id 0xABCD. Pre-AR9300 cards should report the
> correct PCI device id even when not initialized. WLE200 is AR9287-based,
> so it always reports the correct id and should not change it during use.

Right, makes sense.

> But it seems that this AR9287 also has an issue with EEPROM/OTP, as you
> figured out that the ath9k driver is not able to read some device id
> from an internal register. So please prepare a patch fixing the -EIO
> check in ath9k.

Yup, already did, just forgot to Cc you (sorry about that):
https://patchwork.kernel.org/project/linux-wireless/patch/20210326180819.142480-1-toke@redhat.com/

> The PCI vendor & device id is in the first 4 bytes, and as you can see it
> is correct and was not changed.
>
> So I guess lspci output would not change for this card.
>
>> Is that change in bytes 5 and 6 significant?
>
> At offset 0x04 is the 16-bit PCI Command Register.
>
> In the second (with) output, bit 2 is set, which means that Bus Mastering
> is enabled. This is normal and required when the card communicates with
> the system. Bit 6 (Parity Error Response) and bit 8 (SERR# Enable) are
> also set, both for error reporting. This is normal when the device is
> active.
>
> So nothing suspicious here.

Alright, cool. Thanks a lot for your help with this :)

-Toke



end of thread, other threads:[~2021-03-31 16:54 UTC | newest]

Thread overview: 62+ messages
2020-10-27 15:43 PCI trouble on mvebu (Turris Omnia) Toke Høiland-Jørgensen
2020-10-27 17:20 ` Bjorn Helgaas
2020-10-27 17:44   ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-27 18:59     ` Toke Høiland-Jørgensen
2020-10-27 20:20       ` Toke Høiland-Jørgensen
2020-10-27 21:22         ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-27 21:31           ` Toke Høiland-Jørgensen
2020-10-27 22:01             ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-27 22:12               ` Toke Høiland-Jørgensen
2020-10-27 18:56   ` Toke Høiland-Jørgensen
2020-10-28 13:36     ` Toke Høiland-Jørgensen
2020-10-28 14:42       ` Bjorn Helgaas
2020-10-28 15:08         ` Toke Høiland-Jørgensen
2020-10-28 16:40           ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-28 23:16             ` Bjorn Helgaas
2020-10-29 10:09               ` Pali Rohár
2020-10-29 10:56                 ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-29 11:12                 ` Toke Høiland-Jørgensen
2020-10-29 19:30                   ` Bjorn Helgaas
2020-10-29 19:56                     ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-29 19:57                     ` Andrew Lunn
2020-10-29 21:55                       ` Thomas Petazzoni
2020-10-29 20:18                     ` Toke Høiland-Jørgensen
2020-10-29 22:09                       ` Toke Høiland-Jørgensen
2020-10-29 20:58                     ` Marek Behun
2020-10-30 10:08                       ` Pali Rohár
2020-10-30 10:45                         ` Marek Behun
2020-10-29 21:54                     ` Thomas Petazzoni
2020-10-29 23:15                       ` Toke Høiland-Jørgensen
2020-10-30  8:23                         ` Thomas Petazzoni
2020-10-30 10:15                         ` Pali Rohár
2020-10-29 10:41               ` Toke Høiland-Jørgensen
2020-10-29 11:18                 ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-30 11:23               ` Pali Rohár
2020-10-30 13:02                 ` Toke Høiland-Jørgensen
2020-10-30 14:23                   ` Pali Rohár
2020-10-30 14:54                     ` ™֟☻̭҇ Ѽ ҉ ®
2020-10-31 12:49                       ` Toke Høiland-Jørgensen
2020-11-02 15:24                         ` Pali Rohár
2020-11-02 15:54                           ` Toke Høiland-Jørgensen
2020-11-02 16:18                             ` ™֟☻̭҇ Ѽ ҉ ®
2020-11-02 16:33                               ` Toke Høiland-Jørgensen
2021-03-15 19:58                             ` Pali Rohár
2021-03-16  9:25                               ` Pali Rohár
2021-03-18 22:43                                 ` Toke Høiland-Jørgensen
2021-03-18 23:16                                   ` Pali Rohár
2021-03-26 12:50                                     ` Pali Rohár
2021-03-26 15:25                                       ` Toke Høiland-Jørgensen
2021-03-26 15:34                                         ` Pali Rohár
2021-03-26 16:54                                           ` Toke Høiland-Jørgensen
2021-03-26 17:11                                             ` Pali Rohár
2021-03-26 17:51                                               ` Toke Høiland-Jørgensen
2021-03-29 17:09                                                 ` Pali Rohár
2021-03-31 14:02                                                   ` Toke Høiland-Jørgensen
2021-03-31 16:15                                                     ` Pali Rohár
2021-03-31 16:53                                                       ` Toke Høiland-Jørgensen
2020-10-29  1:21             ` Marek Behun
2020-10-29 15:12           ` Rob Herring
2020-10-27 18:03 ` Marek Behun
2020-10-27 19:00   ` Toke Høiland-Jørgensen
2020-10-27 20:19     ` Marek Behun
2020-10-27 20:49       ` Toke Høiland-Jørgensen
