ath11k.lists.infradead.org archive mirror
* ath11k-qca6390-bringup-202011191920: new suspend implementation
@ 2020-11-19 19:48 Kalle Valo
  2020-11-19 19:52 ` Kalle Valo
  2020-12-02 18:51 ` Pavel Procopiuc
  0 siblings, 2 replies; 19+ messages in thread
From: Kalle Valo @ 2020-11-19 19:48 UTC
  To: ath11k

(Bcc: people reporting qca6390 problems)

Hi,

I collected all the important QCA6390 fixes into the ath11k-qca6390-bringup
branch so that there's a good baseline for all testing:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup

At the moment it's based on v5.10-rc4 and I will try to update it to a
recent -rc release every few weeks or so. Every time I update the branch
I create a new tag, and the latest tag is now:

ath11k-qca6390-bringup-202011191920
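
For anyone who only wants this snapshot, fetching just the tag could look
roughly like the sketch below. The clone URL is an assumption derived from
the cgit link above, and the helper name and target directory are made up
for illustration. The helper only prints the commands, so nothing touches
the network until they are piped to a shell.

```shell
#!/bin/sh
# Sketch only: REPO is assumed from the cgit link; print_fetch_cmds is a made-up helper.
REPO="https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git"
TAG="ath11k-qca6390-bringup-202011191920"

# Print (not run) the commands for a shallow clone of one tag into a directory.
print_fetch_cmds() (
    repo="$1"; tag="$2"; dir="${3:-ath}"
    printf 'git clone --depth 1 --branch %s %s %s\n' "$tag" "$repo" "$dir"
    printf 'git -C %s log -1 --oneline\n' "$dir"
)

print_fetch_cmds "$REPO" "$TAG"
```

Running `print_fetch_cmds "$REPO" "$TAG" | sh` executes the printed commands.
A shallow clone keeps the download small, at the cost of not having the
history needed for bisecting.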

In this tag there's now a brand new implementation for suspend, which
relies on the platform providing power to QCA6390 during suspend. Not
all platforms do, but most of them should. ath11k also prints a
warning whenever it notices that the firmware has crashed, but I'm not
sure yet if it (the MHI subsystem, to be exact) can detect every case.

The MSI patch is mostly the same; it just had some refactoring since the
last version. Unfortunately there's still no solution for the weird
crashes some people are seeing.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

-- 
ath11k mailing list
ath11k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath11k

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 19:48 ath11k-qca6390-bringup-202011191920: new suspend implementation Kalle Valo
@ 2020-11-19 19:52 ` Kalle Valo
  2020-11-19 22:00   ` Pavel Procopiuc
  2020-11-21 23:44   ` Mitchell Nordine
  2020-12-02 18:51 ` Pavel Procopiuc
  1 sibling, 2 replies; 19+ messages in thread
From: Kalle Valo @ 2020-11-19 19:52 UTC
  To: ath11k

Kalle Valo <kvalo@codeaurora.org> writes:

> (Bcc: people reporting qca6390 problems)
>
> Hi,
>
> I collected all the important QCA6390 fixes into the ath11k-qca6390-bringup
> branch so that there's a good baseline for all testing:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
>
> At the moment it's based on v5.10-rc4 and I will try to update it to a
> recent -rc release every few weeks or so. Every time I update the branch
> I create a new tag, and the latest tag is now:
>
> ath11k-qca6390-bringup-202011191920
>
> In this tag there's now a brand new implementation for suspend, which
> relies on the platform providing power to QCA6390 during suspend. Not
> all platforms do, but most of them should. ath11k also prints a
> warning whenever it notices that the firmware has crashed, but I'm not
> sure yet if it (the MHI subsystem, to be exact) can detect every case.
>
> The MSI patch is mostly the same; it just had some refactoring since the
> last version. Unfortunately there's still no solution for the weird
> crashes some people are seeing.

I forgot to mention that when debugging ath11k PCI issues it's a good idea
to enable MHI debug messages. To do that, enable CONFIG_MHI_BUS_DEBUG and
CONFIG_DYNAMIC_DEBUG and run:

sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
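
The same step with a couple of sanity checks might look like the sketch
below. The function name and the checks are illustrative additions; only
the control path and the 'module mhi +p' query come from the command
above. The path is a parameter so the function can be tried against a
scratch file first.

```shell
#!/bin/sh
# Sketch: enable MHI dynamic debug, failing early with a hint if the
# control file is missing or not writable. enable_mhi_debug is a made-up name.
enable_mhi_debug() (
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"
    if [ ! -e "$control" ]; then
        echo "missing $control: is debugfs mounted and CONFIG_DYNAMIC_DEBUG enabled?" >&2
        exit 1
    fi
    if [ ! -w "$control" ]; then
        echo "cannot write $control: run as root" >&2
        exit 1
    fi
    # Same query as the one-liner: turn on printing (+p) for the mhi module.
    printf '%s' 'module mhi +p' > "$control"
)
```

Called as root with no argument it writes to the real control file, after
which the MHI messages show up in dmesg. Writing 'module mhi -p' to the
same file turns them off again.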


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 19:52 ` Kalle Valo
@ 2020-11-19 22:00   ` Pavel Procopiuc
  2020-11-19 22:11     ` wi nk
  2020-11-20  9:40     ` Kalle Valo
  2020-11-21 23:44   ` Mitchell Nordine
  1 sibling, 2 replies; 19+ messages in thread
From: Pavel Procopiuc @ 2020-11-19 22:00 UTC
  To: Kalle Valo; +Cc: ath11k

On 19.11.2020 at 20:52, Kalle Valo wrote:
> Kalle Valo <kvalo@codeaurora.org> writes:
> 
>> (Bcc: people reporting qca6390 problems)
>>
>> Hi,
>>
>> I collected all the important QCA6390 fixes into the ath11k-qca6390-bringup
>> branch so that there's a good baseline for all testing:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
>>
>> At the moment it's based on v5.10-rc4 and I will try to update it to a
>> recent -rc release every few weeks or so. Every time I update the branch
>> I create a new tag, and the latest tag is now:
>>
>> ath11k-qca6390-bringup-202011191920
>>
>> In this tag there's now a brand new implementation for suspend, which
>> relies on the platform providing power to QCA6390 during suspend. Not
>> all platforms do, but most of them should. ath11k also prints a
>> warning whenever it notices that the firmware has crashed, but I'm not
>> sure yet if it (the MHI subsystem, to be exact) can detect every case.
>>
>> The MSI patch is mostly the same; it just had some refactoring since the
>> last version. Unfortunately there's still no solution for the weird
>> crashes some people are seeing.
> 
> I forgot to mention that when debugging ath11k PCI issues it's a good idea
> to enable MHI debug messages. To do that, enable CONFIG_MHI_BUS_DEBUG and
> CONFIG_DYNAMIC_DEBUG and run:
> 
> sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"

Thanks! I gave it a spin. Regarding the problems loading the driver, nothing seems to have changed: without
memmap=20M$12M I'm seeing the same kind of issue as before, a failure to load the firmware.

Log with the module autoload at boot:
Nov 19 22:08:15 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
Nov 19 22:08:15 razor kernel:   DMA zone: 64 pages used for memmap
Nov 19 22:08:15 razor kernel:   DMA32 zone: 5213 pages used for memmap
Nov 19 22:08:15 razor kernel:   Normal zone: 255840 pages used for memmap
Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x1800000 3522560 1
Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x1500000 884736 4
Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
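
For reading such logs: on Linux, error -110 is -ETIMEDOUT, which fits the
five-second gap between the memory request at 22:08:16 and the failure at
22:08:21. A small sketch for pulling the codes out of a log and naming them
follows. The function name is made up, and only a handful of errno values
are mapped.

```shell
#!/bin/sh
# Sketch: read a kernel log on stdin, extract the numeric code at the end
# of each "qmi failed" line, and print it with its Linux errno name.
qmi_errors() {
    grep 'qmi failed' |
    sed -n 's/.*[=:] *-\([0-9][0-9]*\)[[:space:]]*$/\1/p' |
    sort -u |
    while read -r code; do
        case "$code" in
            12)  name=ENOMEM ;;
            22)  name=EINVAL ;;
            110) name=ETIMEDOUT ;;
            *)   name=unknown ;;
        esac
        echo "-$code $name"
    done
}
```

Typical use would be `dmesg | qmi_errors`. For the two failure lines above
it prints a single line, "-110 ETIMEDOUT", because both lines carry the same
code.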

Log with manual load with "options ath11k debug_mask=0xffffffff" and after doing "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control":
Nov 19 22:34:07 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
Nov 19 22:34:07 razor kernel:   DMA zone: 64 pages used for memmap
Nov 19 22:34:07 razor kernel:   DMA32 zone: 5213 pages used for memmap
Nov 19 22:34:07 razor kernel:   Normal zone: 255840 pages used for memmap
Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 19 22:34:42 razor sudo[2247]:      pro : TTY=pts/1 ; PWD=/home/pro ; USER=root ; COMMAND=/sbin/modprobe ath11k_pci
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: boot pci_mem 0x000000003c58b991
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: pci tcsr_soc_hw_version major 2 minor 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Hardware name qca6390 hw2.0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: MHI, num_vectors: 3, user_base_data: 0, base_vector: 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Number of assigned MSI for MHI is 3, base vector is 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b034, shadow reg 0x8fc shadow_idx 0x0, ring_type 0, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b03c, shadow reg 0x900 shadow_idx 0x1, ring_type 0, ring num 1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b044, shadow reg 0x904 shadow_idx 0x2, ring_type 0, ring num 2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b04c, shadow reg 0x908 shadow_idx 0x3, ring_type 0, ring num 3
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b054, shadow reg 0x90c shadow_idx 0x4, ring_type 1, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b028, shadow reg 0x910 shadow_idx 0x5, ring_type 2, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b020, shadow reg 0x914 shadow_idx 0x6, ring_type 3, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b06c, shadow reg 0x918 shadow_idx 0x7, ring_type 4, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46000, shadow reg 0x91c shadow_idx 0x8, ring_type 5, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46008, shadow reg 0x920 shadow_idx 0x9, ring_type 5, ring num 1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46010, shadow reg 0x924 shadow_idx 0xa, ring_type 5, ring num 2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46018, shadow reg 0x928 shadow_idx 0xb, ring_type 6, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46034, shadow reg 0x92c shadow_idx 0xc, ring_type 7, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370b0, shadow reg 0x930 shadow_idx 0xd, ring_type 11, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a37018, shadow reg 0x934 shadow_idx 0xe, ring_type 12, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370c4, shadow reg 0x938 shadow_idx 0xf, ring_type 13, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370cc, shadow reg 0x93c shadow_idx 0x10, ring_type 13, ring num 1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370d4, shadow reg 0x940 shadow_idx 0x11, ring_type 13, ring num 2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370dc, shadow reg 0x944 shadow_idx 0x12, ring_type 13, ring num 3
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a00400, shadow reg 0x948 shadow_idx 0x13, ring_type 8, ring num 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a03400, shadow reg 0x94c shadow_idx 0x14, ring_type 9, ring num 1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0340c, shadow reg 0x950 shadow_idx 0x15, ring_type 10, ring num 1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a05400, shadow reg 0x954 shadow_idx 0x16, ring_type 9, ring num 2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0540c, shadow reg 0x958 shadow_idx 0x17, ring_type 10, ring num 2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a06400, shadow reg 0x95c shadow_idx 0x18, ring_type 8, ring num 3
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a08400, shadow reg 0x960 shadow_idx 0x19, ring_type 8, ring num 4
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b400, shadow reg 0x964 shadow_idx 0x1a, ring_type 9, ring num 5
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b40c, shadow reg 0x968 shadow_idx 0x1b, ring_type 10, ring num 5
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0e400, shadow reg 0x96c shadow_idx 0x1c, ring_type 8, ring num 7
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: CE, num_vectors: 10, user_base_data: 3, base_vector: 3
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: DP, num_vectors: 18, user_base_data: 14, base_vector: 14
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:229 group:0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:230 group:1
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:231 group:2
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:233 group:4
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:234 group:5
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:235 group:6
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:236 group:7
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:237 group:8
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:238 group:9
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:239 group:10
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MHISTATUS 0xff04
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: cookie:0x0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: soc reset cause:0
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: INIT(0)
Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: POWER_ON(2)
Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Power on setup success


Suspend seems to work. The log after booting with memmap=20M$12M and doing suspend:
Nov 19 22:47:59 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #14 SMP Thu Nov 19 22:44:32 CET 2020
Nov 19 22:47:59 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
Nov 19 22:47:59 razor kernel:   DMA zone: 47 pages used for memmap
Nov 19 22:47:59 razor kernel:   DMA32 zone: 5149 pages used for memmap
Nov 19 22:47:59 razor kernel:   Normal zone: 255840 pages used for memmap
Nov 19 22:47:59 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x2800000 3522560 1
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x2500000 884736 4
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
Nov 19 22:48:02 razor NetworkManager[793]: <info>  [1605822482.1378] rfkill1: found Wi-Fi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
Nov 19 22:48:04 razor ModemManager[725]: <info>  Couldn't check support for device '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin

... suspend here ...

Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state: M3, MHI state: M3
Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check support for device '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
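
As a cross-check of the two boots: memmap=20M$12M marks 20 MiB starting at
physical address 12 MiB as reserved, that is 0xc00000 up to (but not
including) 0x2000000. The sketch below tests whether the "qmi req mem_seg"
entries from the logs fall inside that range. It assumes the hex value in
those lines is the segment's start address and the following number its
size in bytes. The function name is made up, and the result is plain
arithmetic on the logged numbers, not a statement about the root cause.

```shell
#!/bin/sh
# Sketch: is [start, start+size) fully inside the range reserved by memmap=20M$12M?
RES_START=$((12 * 1024 * 1024))              # 0xc00000
RES_END=$((RES_START + 20 * 1024 * 1024))    # 0x2000000, exclusive

in_reserved() (
    seg_start=$(($1))                        # accepts hex such as 0x1800000
    seg_end=$((seg_start + $2))
    if [ "$seg_start" -ge "$RES_START" ] && [ "$seg_end" -le "$RES_END" ]; then
        echo inside
    else
        echo outside
    fi
)

in_reserved 0x1800000 3522560   # mem_seg[0], boot without memmap
in_reserved 0x1500000 884736    # mem_seg[1], boot without memmap
in_reserved 0x2800000 3522560   # mem_seg[0], boot with memmap
in_reserved 0x2500000 884736    # mem_seg[1], boot with memmap
```

This prints inside, inside, outside, outside: both segments of the failing
boot sit within the 12 MiB to 32 MiB window, and with the reservation in
place they land above it.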


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 22:00   ` Pavel Procopiuc
@ 2020-11-19 22:11     ` wi nk
  2020-11-19 23:08       ` wi nk
  2020-11-20  8:16       ` Pavel Procopiuc
  2020-11-20  9:40     ` Kalle Valo
  1 sibling, 2 replies; 19+ messages in thread
From: wi nk @ 2020-11-19 22:11 UTC
  To: Pavel Procopiuc; +Cc: ath11k, Kalle Valo

On Thu, Nov 19, 2020 at 11:00 PM Pavel Procopiuc
<pavel.procopiuc@gmail.com> wrote:
>
> Op 19.11.2020 om 20:52 schreef Kalle Valo:
> > Kalle Valo <kvalo@codeaurora.org> writes:
> >
> >> (Bcc: people reporting qca6390 problems)
> >>
> >> Hi,
> >>
> >> I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> >> there's a good baseline for all testing:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> >>
> >> At the moment it's based on v5.10-rc4 and I will try to update it to a
> >> recent -rc release every few weeks or so. Everytime I update the branch
> >> I create a new tag and the latest tag is now:
> >>
> >> ath11k-qca6390-bringup-202011191920
> >>
> >> In this tag there's now a brand new implementation for suspend, which
> >> relies that the platform provides power to QCA6390 during suspend. Not
> >> all platforms do, but most of them should do that. ath11k also prints a
> >> warning whenever it notices that the firmware has crashed, but I'm not
> >> sure yet if it (the MHI subsystem to be exact) can detect every case.
> >>
> >> The MSI patch is mostly the same, it had just some refactoring since the
> >> last version. Unfortunately there's no solution still for the weird
> >> crashes some people are seeing.
> >
> > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > CONFIG_DYNAMIC_DEBUG and run:
> >
> > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
>
> Thanks! I gave it a spin. Regarding problems loading the driver, there doesn't seem to be any changes, without the
> memmap=20M$12M I'm seeing similar issues as before: inability to load firmware.
>
> Log with the module autoload at boot:
> Nov 19 22:08:15 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
> Nov 19 22:08:15 razor kernel:   DMA zone: 64 pages used for memmap
> Nov 19 22:08:15 razor kernel:   DMA32 zone: 5213 pages used for memmap
> Nov 19 22:08:15 razor kernel:   Normal zone: 255840 pages used for memmap
> Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x1800000 3522560 1
> Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x1500000 884736 4
> Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>
> Log with manual load with "options ath11k debug_mask=0xffffffff" and after doing "echo -n 'module mhi +p' >
> /sys/kernel/debug/dynamic_debug/control":
> Nov 19 22:34:07 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
> Nov 19 22:34:07 razor kernel:   DMA zone: 64 pages used for memmap
> Nov 19 22:34:07 razor kernel:   DMA32 zone: 5213 pages used for memmap
> Nov 19 22:34:07 razor kernel:   Normal zone: 255840 pages used for memmap
> Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 19 22:34:42 razor sudo[2247]:      pro : TTY=pts/1 ; PWD=/home/pro ; USER=root ; COMMAND=/sbin/modprobe ath11k_pci
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: boot pci_mem 0x000000003c58b991
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: pci tcsr_soc_hw_version major 2 minor 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Hardware name qca6390 hw2.0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: MHI, num_vectors: 3, user_base_data: 0,
> base_vector: 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Number of assigned MSI for MHI is 3, base vector is 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b034, shadow reg 0x8fc shadow_idx 0x0, ring_type 0,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b03c, shadow reg 0x900 shadow_idx 0x1, ring_type 0,
> ring num 1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b044, shadow reg 0x904 shadow_idx 0x2, ring_type 0,
> ring num 2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b04c, shadow reg 0x908 shadow_idx 0x3, ring_type 0,
> ring num 3
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b054, shadow reg 0x90c shadow_idx 0x4, ring_type 1,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b028, shadow reg 0x910 shadow_idx 0x5, ring_type 2,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b020, shadow reg 0x914 shadow_idx 0x6, ring_type 3,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b06c, shadow reg 0x918 shadow_idx 0x7, ring_type 4,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46000, shadow reg 0x91c shadow_idx 0x8, ring_type 5,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46008, shadow reg 0x920 shadow_idx 0x9, ring_type 5,
> ring num 1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46010, shadow reg 0x924 shadow_idx 0xa, ring_type 5,
> ring num 2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46018, shadow reg 0x928 shadow_idx 0xb, ring_type 6,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46034, shadow reg 0x92c shadow_idx 0xc, ring_type 7,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370b0, shadow reg 0x930 shadow_idx 0xd, ring_type 11,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a37018, shadow reg 0x934 shadow_idx 0xe, ring_type 12,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370c4, shadow reg 0x938 shadow_idx 0xf, ring_type 13,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370cc, shadow reg 0x93c shadow_idx 0x10, ring_type
> 13, ring num 1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370d4, shadow reg 0x940 shadow_idx 0x11, ring_type
> 13, ring num 2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370dc, shadow reg 0x944 shadow_idx 0x12, ring_type
> 13, ring num 3
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a00400, shadow reg 0x948 shadow_idx 0x13, ring_type 8,
> ring num 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a03400, shadow reg 0x94c shadow_idx 0x14, ring_type 9,
> ring num 1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0340c, shadow reg 0x950 shadow_idx 0x15, ring_type
> 10, ring num 1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a05400, shadow reg 0x954 shadow_idx 0x16, ring_type 9,
> ring num 2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0540c, shadow reg 0x958 shadow_idx 0x17, ring_type
> 10, ring num 2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a06400, shadow reg 0x95c shadow_idx 0x18, ring_type 8,
> ring num 3
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a08400, shadow reg 0x960 shadow_idx 0x19, ring_type 8,
> ring num 4
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b400, shadow reg 0x964 shadow_idx 0x1a, ring_type 9,
> ring num 5
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b40c, shadow reg 0x968 shadow_idx 0x1b, ring_type
> 10, ring num 5
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0e400, shadow reg 0x96c shadow_idx 0x1c, ring_type 8,
> ring num 7
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: CE, num_vectors: 10, user_base_data: 3,
> base_vector: 3
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: DP, num_vectors: 18, user_base_data: 14,
> base_vector: 14
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:229 group:0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:230 group:1
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:231 group:2
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:233 group:4
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:234 group:5
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:235 group:6
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:236 group:7
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:237 group:8
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:238 group:9
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:239 group:10
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MHISTATUS 0xff04
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: cookie:0x0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: soc reset cause:0
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: INIT(0)
> Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: POWER_ON(2)
> Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Power on setup success
>
>
> Suspend seems to work. The log after booting with memmap=20M$12M and doing suspend:
> Nov 19 22:47:59 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> p6) 2.34.0) #14 SMP Thu Nov 19 22:44:32 CET 2020
> Nov 19 22:47:59 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> memmap=20M$12M quiet
> Nov 19 22:47:59 razor kernel:   DMA zone: 47 pages used for memmap
> Nov 19 22:47:59 razor kernel:   DMA32 zone: 5149 pages used for memmap
> Nov 19 22:47:59 razor kernel:   Normal zone: 255840 pages used for memmap
> Nov 19 22:47:59 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> memmap=20M$12M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x2800000 3522560 1
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x2500000 884736 4
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
> Nov 19 22:48:02 razor NetworkManager[793]: <info>  [1605822482.1378] rfkill1: found Wi-Fi radio killswitch (at
> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> Nov 19 22:48:04 razor ModemManager[725]: <info>  Couldn't check support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
>
> ... suspend here ...
>
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state: M3, MHI state: M3
> Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin

Hi Pavel,

  I'm compiling it now as well.  For your testing did you revert
7fef431be9c9ac255838a9578331567b9dba4477 again?  The memmap
reservation never functioned for me.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 22:11     ` wi nk
@ 2020-11-19 23:08       ` wi nk
  2020-11-20 16:01         ` Kalle Valo
  2020-11-20  8:16       ` Pavel Procopiuc
  1 sibling, 1 reply; 19+ messages in thread
From: wi nk @ 2020-11-19 23:08 UTC (permalink / raw)
  To: Pavel Procopiuc; +Cc: ath11k, Kalle Valo

On Thu, Nov 19, 2020 at 11:11 PM wi nk <wink@technolu.st> wrote:
>
> On Thu, Nov 19, 2020 at 11:00 PM Pavel Procopiuc
> <pavel.procopiuc@gmail.com> wrote:
> >
> > Op 19.11.2020 om 20:52 schreef Kalle Valo:
> > > Kalle Valo <kvalo@codeaurora.org> writes:
> > >
> > >> (Bcc: people reporting qca6390 problems)
> > >>
> > >> Hi,
> > >>
> > >> I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > >> there's a good baseline for all testing:
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > >>
> > >> At the moment it's based on v5.10-rc4 and I will try to update it to a
> > >> recent -rc release every few weeks or so. Everytime I update the branch
> > >> I create a new tag and the latest tag is now:
> > >>
> > >> ath11k-qca6390-bringup-202011191920
> > >>
> > >> In this tag there's now a brand new implementation for suspend, which
> > >> relies that the platform provides power to QCA6390 during suspend. Not
> > >> all platforms do, but most of them should do that. ath11k also prints a
> > >> warning whenever it notices that the firmware has crashed, but I'm not
> > >> sure yet if it (the MHI subsystem to be exact) can detect every case.
> > >>
> > >> The MSI patch is mostly the same, it had just some refactoring since the
> > >> last version. Unfortunately there's no solution still for the weird
> > >> crashes some people are seeing.
> > >
> > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > CONFIG_DYNAMIC_DEBUG and run:
> > >
> > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> >
> > Thanks! I gave it a spin. Regarding problems loading the driver, there doesn't seem to be any changes, without the
> > memmap=20M$12M I'm seeing similar issues as before: inability to load firmware.
> >
> > Log with the module autoload at boot:
> > Nov 19 22:08:15 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> > p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
> > Nov 19 22:08:15 razor kernel:   DMA zone: 64 pages used for memmap
> > Nov 19 22:08:15 razor kernel:   DMA32 zone: 5213 pages used for memmap
> > Nov 19 22:08:15 razor kernel:   Normal zone: 255840 pages used for memmap
> > Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> > Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> > Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> > 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> > Nov 19 22:08:15 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> > Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Requested to power ON
> > Nov 19 22:08:16 razor kernel: mhi 0000:05:00.0: Power on setup success
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x1800000 3522560 1
> > Nov 19 22:08:16 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x1500000 884736 4
> > Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > Nov 19 22:08:21 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> >
> > Log with manual load with "options ath11k debug_mask=0xffffffff" and after doing "echo -n 'module mhi +p' >
> > /sys/kernel/debug/dynamic_debug/control":
> > Nov 19 22:34:07 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> > p6) 2.34.0) #12 SMP Thu Nov 19 22:03:06 CET 2020
> > Nov 19 22:34:07 razor kernel:   DMA zone: 64 pages used for memmap
> > Nov 19 22:34:07 razor kernel:   DMA32 zone: 5213 pages used for memmap
> > Nov 19 22:34:07 razor kernel:   Normal zone: 255840 pages used for memmap
> > Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> > Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> > Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> > 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> > Nov 19 22:34:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> > Nov 19 22:34:42 razor sudo[2247]:      pro : TTY=pts/1 ; PWD=/home/pro ; USER=root ; COMMAND=/sbin/modprobe ath11k_pci
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: boot pci_mem 0x000000003c58b991
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: pci tcsr_soc_hw_version major 2 minor 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Hardware name qca6390 hw2.0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: MHI, num_vectors: 3, user_base_data: 0,
> > base_vector: 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Number of assigned MSI for MHI is 3, base vector is 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b034, shadow reg 0x8fc shadow_idx 0x0, ring_type 0,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b03c, shadow reg 0x900 shadow_idx 0x1, ring_type 0,
> > ring num 1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b044, shadow reg 0x904 shadow_idx 0x2, ring_type 0,
> > ring num 2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b04c, shadow reg 0x908 shadow_idx 0x3, ring_type 0,
> > ring num 3
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b054, shadow reg 0x90c shadow_idx 0x4, ring_type 1,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b028, shadow reg 0x910 shadow_idx 0x5, ring_type 2,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b020, shadow reg 0x914 shadow_idx 0x6, ring_type 3,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a3b06c, shadow reg 0x918 shadow_idx 0x7, ring_type 4,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46000, shadow reg 0x91c shadow_idx 0x8, ring_type 5,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46008, shadow reg 0x920 shadow_idx 0x9, ring_type 5,
> > ring num 1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46010, shadow reg 0x924 shadow_idx 0xa, ring_type 5,
> > ring num 2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46018, shadow reg 0x928 shadow_idx 0xb, ring_type 6,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a46034, shadow reg 0x92c shadow_idx 0xc, ring_type 7,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370b0, shadow reg 0x930 shadow_idx 0xd, ring_type 11,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a37018, shadow reg 0x934 shadow_idx 0xe, ring_type 12,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370c4, shadow reg 0x938 shadow_idx 0xf, ring_type 13,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370cc, shadow reg 0x93c shadow_idx 0x10, ring_type
> > 13, ring num 1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370d4, shadow reg 0x940 shadow_idx 0x11, ring_type
> > 13, ring num 2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a370dc, shadow reg 0x944 shadow_idx 0x12, ring_type
> > 13, ring num 3
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a00400, shadow reg 0x948 shadow_idx 0x13, ring_type 8,
> > ring num 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a03400, shadow reg 0x94c shadow_idx 0x14, ring_type 9,
> > ring num 1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0340c, shadow reg 0x950 shadow_idx 0x15, ring_type
> > 10, ring num 1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a05400, shadow reg 0x954 shadow_idx 0x16, ring_type 9,
> > ring num 2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0540c, shadow reg 0x958 shadow_idx 0x17, ring_type
> > 10, ring num 2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a06400, shadow reg 0x95c shadow_idx 0x18, ring_type 8,
> > ring num 3
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a08400, shadow reg 0x960 shadow_idx 0x19, ring_type 8,
> > ring num 4
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b400, shadow reg 0x964 shadow_idx 0x1a, ring_type 9,
> > ring num 5
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0b40c, shadow reg 0x968 shadow_idx 0x1b, ring_type
> > 10, ring num 5
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: target_reg a0e400, shadow reg 0x96c shadow_idx 0x1c, ring_type 8,
> > ring num 7
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: CE, num_vectors: 10, user_base_data: 3,
> > base_vector: 3
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: Assign MSI to user: DP, num_vectors: 18, user_base_data: 14,
> > base_vector: 14
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:229 group:0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:230 group:1
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:231 group:2
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:233 group:4
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:234 group:5
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:235 group:6
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:236 group:7
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:237 group:8
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:238 group:9
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: irq:239 group:10
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: msi base data is 0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: MHISTATUS 0xff04
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: cookie:0x0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: WLAON_WARM_SW_ENTRY 0x0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: soc reset cause:0
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: INIT(0)
> > Nov 19 22:34:42 razor kernel: ath11k_pci 0000:05:00.0: setting mhi state: POWER_ON(2)
> > Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
> > Nov 19 22:34:42 razor kernel: mhi 0000:05:00.0: Power on setup success
> >
> >
> > Suspend seems to work. The log after booting with memmap=20M$12M and doing suspend:
> > Nov 19 22:47:59 razor kernel: Linux version 5.10.0-rc4 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34
> > p6) 2.34.0) #14 SMP Thu Nov 19 22:44:32 CET 2020
> > Nov 19 22:47:59 razor kernel: Command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> > memmap=20M$12M quiet
> > Nov 19 22:47:59 razor kernel:   DMA zone: 47 pages used for memmap
> > Nov 19 22:47:59 razor kernel:   DMA32 zone: 5149 pages used for memmap
> > Nov 19 22:47:59 razor kernel:   Normal zone: 255840 pages used for memmap
> > Nov 19 22:47:59 razor kernel: Kernel command line: ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> > memmap=20M$12M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
> > Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> > Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> > Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
> > 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> > Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> > Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Requested to power ON
> > Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Power on setup success
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x2800000 3522560 1
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x2500000 884736 4
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
> > Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > Nov 19 22:48:02 razor NetworkManager[793]: <info>  [1605822482.1378] rfkill1: found Wi-Fi radio killswitch (at
> > /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1) (driver ath11k_pci)
> > Nov 19 22:48:04 razor ModemManager[725]: <info>  Couldn't check support for device
> > '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
> >
> > ... suspend here ...
> >
> > Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
> > Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
> > Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state: M3, MHI state: M3
> > Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check support for device
> > '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by any plugin
>
> Hi Pavel,
>
>   I'm compiling it now as well.  For your testing did you revert
> 7fef431be9c9ac255838a9578331567b9dba4477 again?  The memmap
> reservation never functioned for me.

Ok, so I can answer my own question: no, I didn't need to revert that
commit.  That said, I seem to be triggering the RT throttling message
way more frequently (4/5 boots, this fifth one was successful).  Kalle
- following the thought that something is going out of control in the
irq tasklet stuff, earlier today I was playing with the MSI patch that
introduces the irq_enable_flag and the functions to set/unset it and
noticed that in the ath11k_pci_ce_* functions that enable/disable
IRQs, if I switched the order of the flag assignment and the irq
enable/disable function call, I saw this behavior more frequently as
well.  I haven't fully grokked the re-entrancy model of these
functions, but there's definitely a race occurring somehow.  It seems
to occur mostly during some of the actual 802.11 association:

[   26.945028] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
experimental!
[   26.945102] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
0x8e300000-0x8e3fffff 64bit]
[   26.945120] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[   26.945207] ath11k_pci 0000:55:00.0: MSI vectors: 1
[   26.949329] NET: Registered protocol family 42
[   26.999257] mhi 0000:55:00.0: Requested to power ON
[   26.999419] mhi 0000:55:00.0: Power on setup success
[   27.171994] ath11k_pci 0000:55:00.0: qmi req mem_seg[0] 0x27800000 3522560 1
[   27.171999] ath11k_pci 0000:55:00.0: qmi req mem_seg[1] 0x27d00000 884736 4
[   27.183341] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb
board_id 0xff soc_id 0xffffffff
[   27.183345] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc
fw_build_timestamp 2020-06-24 19:50 fw_build_id
[   27.387420] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0

<snip>  Some time during the following pile of messages (after some
seconds) is when I usually experience the machine spinning out and
freezing.

[   34.843605] wlp85s0: authenticate with ec:08:6b:27:01:ea
[   34.990949] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[   35.094334] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
[   35.096624] wlp85s0: authenticated
[   35.102421] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[   35.105012] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
(capab=0x411 status=0 aid=6)
[   35.116898] wlp85s0: associated
[   35.154059] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

If the machine/adapter survives about 10 seconds beyond this, it will
stay up indefinitely.
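
One way to picture why the ordering of the flag assignment and the irq
enable call could matter is the toy model below. It is only a sketch,
assuming the interrupt handler checks the flag before doing any work; the
class and function names are made up for illustration and none of this is
the actual ath11k or kernel code.

```python
class ToyIrqLine:
    """Toy stand-in for one interrupt line; not a real kernel or ath11k API."""

    def __init__(self):
        self.enabled = False
        self.pending = True   # assume the device already has an interrupt latched
        self.flag = False     # plays the role of irq_enable_flag
        self.handled = 0
        self.dropped = 0

    def handler(self):
        # Hypothetical handler: only does work when the flag says IRQs are on.
        if self.flag:
            self.handled += 1
        else:
            self.dropped += 1

    def enable_irq(self):
        self.enabled = True
        if self.pending:      # a latched interrupt fires as soon as the line opens
            self.pending = False
            self.handler()


def flag_then_enable(line):
    line.flag = True
    line.enable_irq()


def enable_then_flag(line):
    line.enable_irq()         # the handler can run here, before the flag is set
    line.flag = True


results = {}
for order in (flag_then_enable, enable_then_flag):
    line = ToyIrqLine()
    order(line)
    results[order.__name__] = (line.handled, line.dropped)
    print(order.__name__, "handled:", line.handled, "dropped:", line.dropped)
```

In this model the interrupt is handled when the flag is set first and
silently dropped when the line is enabled first. Whether the real handlers
behave like this is exactly the part that still needs to be checked.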

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 22:11     ` wi nk
  2020-11-19 23:08       ` wi nk
@ 2020-11-20  8:16       ` Pavel Procopiuc
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Procopiuc @ 2020-11-20  8:16 UTC (permalink / raw)
  To: wi nk; +Cc: ath11k, Kalle Valo

Op 19.11.2020 om 23:11 schreef wi nk:
> Hi Pavel,
> 
>    I'm compiling it now as well.  For your testing did you revert
> 7fef431be9c9ac255838a9578331567b9dba4477 again?  The memmap
> reservation never functioned for me.

No, I did not. I'm using vanilla 5.10-rc4 + the commits from the ath11k-qca6390-bringup branch + the Bluetooth LE fix [1]
(not that it should matter in this case). The memmap=20M$12M option completely fixes the problem (at least for the last 10
or so boots). I've tried reducing the non-allocatable window, and this is the minimum that works reliably for me
so far.

[1]: https://patchwork.kernel.org/project/bluetooth/patch/20201022082304.31757-1-sathish.narasimman@intel.com/
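
For reference, memmap=nn$ss marks nn bytes starting at physical address ss
as reserved, so memmap=20M$12M covers 12 MiB up to (but not including)
32 MiB. Below is a small sketch of that arithmetic, checked against the
"qmi req mem_seg" addresses from the logs earlier in this thread.
parse_size, parse_memmap and in_window are hypothetical helper names, and
reading the first number on the mem_seg line as a physical address is an
assumption.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert '20M'-style sizes (K/M/G suffix, or plain bytes) to an int."""
    if text[-1].upper() in UNITS:
        return int(text[:-1], 0) * UNITS[text[-1].upper()]
    return int(text, 0)


def parse_memmap(option):
    """Split 'memmap=nn$ss' into (start, end) of the reserved range, end exclusive."""
    match = re.fullmatch(r"memmap=(\w+)\$(\w+)", option)
    if match is None:
        raise ValueError(f"not a memmap=nn$ss option: {option!r}")
    size, start = parse_size(match.group(1)), parse_size(match.group(2))
    return start, start + size


def in_window(addr, window):
    start, end = window
    return start <= addr < end


window = parse_memmap("memmap=20M$12M")
print(f"reserved: {window[0]:#x}-{window[1] - 1:#x}")

# mem_seg addresses copied from the logs in this thread
without_memmap = [0x1800000, 0x1500000]  # boot where the firmware failed to load
with_memmap = [0x2800000, 0x2500000]     # boot with memmap=20M$12M

for addr in without_memmap + with_memmap:
    print(f"{addr:#x}: {'inside' if in_window(addr, window) else 'outside'}")
```

In the failing boot both segments fall inside the 12M-32M range, and with
the option they land above it. That fits the option steering the allocation
elsewhere, but it doesn't by itself say what is wrong with that range.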

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 22:00   ` Pavel Procopiuc
  2020-11-19 22:11     ` wi nk
@ 2020-11-20  9:40     ` Kalle Valo
  2020-11-20 10:24       ` Pavel Procopiuc
  1 sibling, 1 reply; 19+ messages in thread
From: Kalle Valo @ 2020-11-20  9:40 UTC (permalink / raw)
  To: Pavel Procopiuc; +Cc: ath11k

Pavel Procopiuc <pavel.procopiuc@gmail.com> writes:

> Suspend seems to work. The log after booting with memmap=20M$12M and doing suspend:
> Nov 19 22:47:59 razor kernel: Linux version 5.10.0-rc4 (root@razor)
> (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #14
> SMP Thu Nov 19 22:44:32 CET 2020
> Nov 19 22:47:59 razor kernel: Command line: ro root=/dev/nvme0n1p2
> resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
> Nov 19 22:47:59 razor kernel:   DMA zone: 47 pages used for memmap
> Nov 19 22:47:59 razor kernel:   DMA32 zone: 5149 pages used for memmap
> Nov 19 22:47:59 razor kernel:   Normal zone: 255840 pages used for memmap
> Nov 19 22:47:59 razor kernel: Kernel command line: ro
> root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
> memmap=20M$12M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1
> zram.num_devices=2 memmap=20M$12M quiet
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: reg 0x10: [mem
> 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available
> PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1
> (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k
> PCI support is experimental!
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned
> [mem 0xd2100000-0xd21fffff 64bit]
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req
> mem_seg[0] 0x2800000 3522560 1
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req
> mem_seg[1] 0x2500000 884736 4
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0
> chip_family 0xb board_id 0xff soc_id 0xffffffff
> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: fw_version
> 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
> Nov 19 22:48:02 razor NetworkManager[793]: <info>  [1605822482.1378]
> rfkill1: found Wi-Fi radio killswitch (at
> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1)
> (driver ath11k_pci)
> Nov 19 22:48:04 razor ModemManager[725]: <info>  Couldn't check
> support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by
> any plugin
>
> ... suspend here ...
>
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state:
> M3, MHI state: M3
> Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check
> support for device
> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by
> any plugin

And I guess resume also worked? :) Do you have logs about resume?

You had Dell XPS 17, right?

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-20  9:40     ` Kalle Valo
@ 2020-11-20 10:24       ` Pavel Procopiuc
  2020-11-20 15:56         ` Kalle Valo
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Procopiuc @ 2020-11-20 10:24 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath11k

Op 20.11.2020 om 10:40 schreef Kalle Valo:
> Pavel Procopiuc <pavel.procopiuc@gmail.com> writes:
> 
>> Suspend seems to work. The log after booting with memmap=20M$12M and doing suspend:
>> Nov 19 22:47:59 razor kernel: Linux version 5.10.0-rc4 (root@razor)
>> (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #14
>> SMP Thu Nov 19 22:44:32 CET 2020
>> Nov 19 22:47:59 razor kernel: Command line: ro root=/dev/nvme0n1p2
>> resume=/dev/nvme1n1p1 zram.num_devices=2 memmap=20M$12M quiet
>> Nov 19 22:47:59 razor kernel:   DMA zone: 47 pages used for memmap
>> Nov 19 22:47:59 razor kernel:   DMA32 zone: 5149 pages used for memmap
>> Nov 19 22:47:59 razor kernel:   Normal zone: 255840 pages used for memmap
>> Nov 19 22:47:59 razor kernel: Kernel command line: ro
>> root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1 zram.num_devices=2
>> memmap=20M$12M quiet ro root=/dev/nvme0n1p2 resume=/dev/nvme1n1p1
>> zram.num_devices=2 memmap=20M$12M quiet
>> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
>> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: reg 0x10: [mem
>> 0xd2100000-0xd21fffff 64bit]
>> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
>> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available
>> PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1
>> (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
>> Nov 19 22:47:59 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k
>> PCI support is experimental!
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned
>> [mem 0xd2100000-0xd21fffff 64bit]
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
>> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Requested to power ON
>> Nov 19 22:48:00 razor kernel: mhi 0000:05:00.0: Power on setup success
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req
>> mem_seg[0] 0x2800000 3522560 1
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: qmi req
>> mem_seg[1] 0x2500000 884736 4
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: chip_id 0x0
>> chip_family 0xb board_id 0xff soc_id 0xffffffff
>> Nov 19 22:48:00 razor kernel: ath11k_pci 0000:05:00.0: fw_version
>> 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
>> Nov 19 22:48:02 razor NetworkManager[793]: <info>  [1605822482.1378]
>> rfkill1: found Wi-Fi radio killswitch (at
>> /sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0/ieee80211/phy0/rfkill1)
>> (driver ath11k_pci)
>> Nov 19 22:48:04 razor ModemManager[725]: <info>  Couldn't check
>> support for device
>> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by
>> any plugin
>>
>> ... suspend here ...
>>
>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state:
>> M3, MHI state: M3
>> Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check
>> support for device
>> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by
>> any plugin
> 
> And I guess resume also worked? :) Do you have logs about resume?

Yes, resume was successful and I could use wireless network thereafter :)) The ath11k module didn't produce any 
additional log lines after resume, only those "kernel: mhi ..." lines were visible.

> You had Dell XPS 17, right?

Yes, I've got the XPS 17 model with the i9 processor (the i7 and lower are outfitted with the Killer AX1650 wireless card).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-20 10:24       ` Pavel Procopiuc
@ 2020-11-20 15:56         ` Kalle Valo
  0 siblings, 0 replies; 19+ messages in thread
From: Kalle Valo @ 2020-11-20 15:56 UTC (permalink / raw)
  To: Pavel Procopiuc; +Cc: ath11k

Pavel Procopiuc <pavel.procopiuc@gmail.com> writes:

>>> ... suspend here ...
>>>
>>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Allowing M3 transition
>>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Wait for M3 completion
>>> Nov 19 22:49:30 razor kernel: mhi 0000:05:00.0: Entered with PM state:
>>> M3, MHI state: M3
>>> Nov 19 22:49:33 razor ModemManager[725]: <info>  Couldn't check
>>> support for device
>>> '/sys/devices/pci0000:00/0000:00:1c.1/0000:05:00.0': not supported by
>>> any plugin
>>
>> And I guess resume also worked? :) Do you have logs about resume?
>
> Yes, resume was successful and I could use wireless network thereafter
> :)) The ath11k module didn't produce any additional log lines after
> resume, only those "kernel: mhi ..." lines were visible.
>
>> You had Dell XPS 17, right?
>
> Yes, I've got the XPS 17 model with the i9 processor (the i7 and lower
> are outfitted with the Killer AX1650 wireless card).

Thanks. I got a report that on XPS 15 this suspend implementation
doesn't work at all; I haven't figured out why yet.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 23:08       ` wi nk
@ 2020-11-20 16:01         ` Kalle Valo
  2020-11-20 16:59           ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: Kalle Valo @ 2020-11-20 16:01 UTC (permalink / raw)
  To: wi nk; +Cc: Pavel Procopiuc, ath11k

wi nk <wink@technolu.st> writes:

> Ok, so I can answer my own question, no I didn't need to revert that
> commit.  That said I seem to be activating the RT throttling message
> way more frequently (4/5 boots, this fifth one was successful).  Kalle
> - following the thought that something is going out of control in the
> irq tasklet stuff, earlier today I was playing with the MSI patch that
> introduces the irq_enable_flag and the functions to set/unset it and
> noticed that in the ath11k_pci_ce_* functions that enable / disable
> IRQs, if I switched the order of the flag assignment and the irq
> enable/disable function call, I saw this behavior more frequently as
> well.  I haven't fully grokked the re-entrancy model of these
> functions, but there's definitely a race occurring somehow.  It seems
> to occur mostly during some of the actual 802.11 association:
>
> [   26.945028] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> experimental!
> [   26.945102] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> 0x8e300000-0x8e3fffff 64bit]
> [   26.945120] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> [   26.945207] ath11k_pci 0000:55:00.0: MSI vectors: 1
> [   26.949329] NET: Registered protocol family 42
> [   26.999257] mhi 0000:55:00.0: Requested to power ON
> [   26.999419] mhi 0000:55:00.0: Power on setup success
> [   27.171994] ath11k_pci 0000:55:00.0: qmi req mem_seg[0] 0x27800000 3522560 1
> [   27.171999] ath11k_pci 0000:55:00.0: qmi req mem_seg[1] 0x27d00000 884736 4
> [   27.183341] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb
> board_id 0xff soc_id 0xffffffff
> [   27.183345] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc
> fw_build_timestamp 2020-06-24 19:50 fw_build_id
> [   27.387420] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
>
> <snip>  Some time during the following pile of messages (after some
> seconds) is when I usually experience the machine spinning out and
> freezing.
>
> [   34.843605] wlp85s0: authenticate with ec:08:6b:27:01:ea
> [   34.990949] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [   35.094334] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> [   35.096624] wlp85s0: authenticated
> [   35.102421] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [   35.105012] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> (capab=0x411 status=0 aid=6)
> [   35.116898] wlp85s0: associated
> [   35.154059] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>
> If the machine/adapter survives about 10 seconds beyond this, it will
> stay up indefinitely..

Yeah, there's something strange happening which is causing different
symptoms, and some people don't see it at all. We are still
investigating it, but if you find any possible ideas please let me (and
the list) know.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-20 16:01         ` Kalle Valo
@ 2020-11-20 16:59           ` wi nk
  0 siblings, 0 replies; 19+ messages in thread
From: wi nk @ 2020-11-20 16:59 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Pavel Procopiuc, ath11k

On Fri, Nov 20, 2020 at 5:02 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> wi nk <wink@technolu.st> writes:
>
> > Ok, so I can answer my own question, no I didn't need to revert that
> > commit.  That said I seem to be activating the RT throttling message
> > way more frequently (4/5 boots, this fifth one was successful).  Kalle
> > - following the thought that something is going out of control in the
> > irq tasklet stuff, earlier today I was playing with the MSI patch that
> > introduces the irq_enable_flag and the functions to set/unset it and
> > noticed that in the ath11k_pci_ce_* functions that enable / disable
> > IRQs, if I switched the order of the flag assignment and the irq
> > enable/disable function call, I saw this behavior more frequently as
> > well.  I haven't fully grokked the re-entrancy model of these
> > functions, but there's definitely a race occurring somehow.  It seems
> > to occur mostly during some of the actual 802.11 association:
> >
> > [   26.945028] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> > experimental!
> > [   26.945102] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> > 0x8e300000-0x8e3fffff 64bit]
> > [   26.945120] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> > [   26.945207] ath11k_pci 0000:55:00.0: MSI vectors: 1
> > [   26.949329] NET: Registered protocol family 42
> > [   26.999257] mhi 0000:55:00.0: Requested to power ON
> > [   26.999419] mhi 0000:55:00.0: Power on setup success
> > [   27.171994] ath11k_pci 0000:55:00.0: qmi req mem_seg[0] 0x27800000 3522560 1
> > [   27.171999] ath11k_pci 0000:55:00.0: qmi req mem_seg[1] 0x27d00000 884736 4
> > [   27.183341] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb
> > board_id 0xff soc_id 0xffffffff
> > [   27.183345] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc
> > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > [   27.387420] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
> >
> > <snip>  Some time during the following pile of messages (after some
> > seconds) is when I usually experience the machine spinning out and
> > freezing.
> >
> > [   34.843605] wlp85s0: authenticate with ec:08:6b:27:01:ea
> > [   34.990949] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > [   35.094334] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> > [   35.096624] wlp85s0: authenticated
> > [   35.102421] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > [   35.105012] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > (capab=0x411 status=0 aid=6)
> > [   35.116898] wlp85s0: associated
> > [   35.154059] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
> >
> > If the machine/adapter survives about 10 seconds beyond this, it will
> > stay up indefinitely..
>
> Yeah, there's something strange happening which is causing different
> symptoms, and some people don't see it at all. We are still
> investigating it, but if you find any possible ideas please let me (and
> the list) know.
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

I think one of the large differences is that Pavel's XPS is exposing /
allowing the 32 MSI vectors, whereas the 13 inch XPS with the Killer
1650 provides only one, forcing this new code path that handles
multiplexing / demultiplexing them all.  Am I understanding that
difference correctly?  I'm still spinning up on my knowledge of these
internals, but one of the big changes in that difference is that it
introduces a new set of flags that control enabling/disabling the irqs
based on index.  Does reading/writing to that array need any
synchronization?  I see the disable_irq(_nosync) calls are issued in a
way that intentionally won't block, but the enable_irq calls are not
(and don't seem to be able to be).  Is there some kind of deadlocking
occurring there as a result?  Here are the changes I was referring to
in my previous email.  This is a piece from the single MSI vector patch:

+       if (vecs_32_cap)
+               enable_irq(ab->irq_num[irq_idx]);
+       ath11k_pci_set_irq_enable_flag(ab, irq_idx, 1);

I swapped the ordering of the conditional enable_irq call and the
setting of the flag in the array, like so:

+       ath11k_pci_set_irq_enable_flag(ab, irq_idx, 1);
+       if (vecs_32_cap)
+               enable_irq(ab->irq_num[irq_idx]);

If re-entrancy weren't an issue, I wouldn't expect any difference
between these pieces of code, however there seem to be changes in
behavior when I play with the ordering across the places where the
irqs/flags are enabled/disabled.  As with any of these races, this
could be a red herring that just changes the timing of things
slightly, but given that the XPS 17 works without the single MSI
path while this version goes nuts and causes the RT throttling
or freezes entirely, this seems to be a reasonable suspect.
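
To make the synchronization question above concrete, here is a small
user-space model of a per-index enable flag array read by one shared
handler.  It is only a sketch under assumptions: the names
(irq_enable_flag, set_irq_enable_flag, demux_shared_irq), the source
count and the use of C11 atomics are all hypothetical and are not taken
from the MSI patch, which may handle this quite differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical number of interrupt sources sharing the single MSI vector. */
#define NUM_SHARED_IRQS 8

/* One software enable flag per source; zero-initialized, so all start disabled. */
static atomic_bool irq_enable_flag[NUM_SHARED_IRQS];

/* Writer side: publish a flag change so a concurrent reader sees it coherently. */
void set_irq_enable_flag(int idx, bool enabled)
{
    atomic_store_explicit(&irq_enable_flag[idx], enabled,
                          memory_order_release);
}

/*
 * Reader side (model of the shared handler): 'pending' has one bit per
 * source that raised an event.  Only sources whose flag is set are
 * dispatched; the returned mask says which ones were.
 */
unsigned int demux_shared_irq(unsigned int pending)
{
    unsigned int dispatched = 0;

    for (int idx = 0; idx < NUM_SHARED_IRQS; idx++) {
        bool enabled = atomic_load_explicit(&irq_enable_flag[idx],
                                            memory_order_acquire);

        if ((pending & (1u << idx)) && enabled)
            dispatched |= 1u << idx;
    }
    return dispatched;
}
```

In this model an event that arrives while its flag is still clear is
simply not dispatched, so the moment at which the flag is written
relative to other work changes what the handler does.  Whether the real
driver has such a window is exactly the open question.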


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 19:52 ` Kalle Valo
  2020-11-19 22:00   ` Pavel Procopiuc
@ 2020-11-21 23:44   ` Mitchell Nordine
  2020-11-22 13:15     ` Mitchell Nordine
  1 sibling, 1 reply; 19+ messages in thread
From: Mitchell Nordine @ 2020-11-21 23:44 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath11k

Thanks for the update!

I no longer notice any errors related to ath11k during boot of NixOS
on my XPS 13 9310 with these patches:

[mindtree@mindtree:~]$ dmesg | grep -e ath11
[    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
experimental!
[    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
0x8c300000-0x8c3fffff 64bit]
[    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
[    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
[    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
[    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
[    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
board_id 0xff soc_id 0xffffffff
[    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
fw_build_timestamp 2020-06-24 19:50 fw_build_id
[    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0

Everything appears to run smoothly for the first 5-10 minutes, then
the firmware appears to crash and the internet drops out:

[  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
MHI_CB_SYS_ERROR
[  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state

I haven't yet been able to identify an action that consistently causes
the crash.
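
For correlating events like the two above, the bracketed dmesg
timestamps (seconds since boot) can be compared directly; here the
reset failure is logged about 92 seconds after the crash message.  A
minimal sketch of that arithmetic, with hypothetical helper names
(dmesg_timestamp, dmesg_elapsed) and assuming each line starts with
the usual "[ seconds.micros]" prefix:

```c
#include <stdio.h>

/*
 * Return the seconds-since-boot timestamp at the start of a dmesg line,
 * or -1.0 if the line does not begin with a "[<number>" prefix.
 */
double dmesg_timestamp(const char *line)
{
    double seconds;

    /* %lf skips the padding spaces that dmesg puts after '['. */
    if (sscanf(line, "[%lf]", &seconds) != 1)
        return -1.0;
    return seconds;
}

/* Seconds elapsed between two dmesg lines, or -1.0 if either lacks a timestamp. */
double dmesg_elapsed(const char *earlier, const char *later)
{
    double start = dmesg_timestamp(earlier);
    double end = dmesg_timestamp(later);

    if (start < 0.0 || end < 0.0)
        return -1.0;
    return end - start;
}
```

Passing the "firmware crashed" line and the "Device failed to exit MHI
Reset state" line to dmesg_elapsed() gives the gap between them.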

Following the crash, GNOME Shell still appears to believe that the
connection is up.  However, upon clicking on the wifi entry in the
top-right drop-down menu and choosing the "Turn Off" option, the shell
freezes for a few seconds and a few more errors show up in dmesg:

[  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
local choice (Reason: 3=DEAUTH_LEAVING)
[  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
0
[  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
[  642.159388] ath11k_pci 0000:56:00.0: failed to send
WMI_PEER_REORDER_QUEUE_SETUP
[  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
[  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
7a:8a:20:d5:98:d7 tid 0
[  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
[  645.168072] ath11k_pci 0000:56:00.0: failed to send
WMI_PEER_REORDER_QUEUE_SETUP
[  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
[  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
7a:8a:20:d5:98:d7 tid 1
[  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
[  648.174965] ath11k_pci 0000:56:00.0: failed to send
WMI_PEER_REORDER_QUEUE_SETUP
[  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
[  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
7a:8a:20:d5:98:d7 tid 6
[  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
[  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
[  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
[  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
from hardware (-11)
[  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
[  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
[  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
0 addr 7a:8a:20:d5:98:d7 ret -11
[  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
7a:8a:20:d5:98:d7 for VDEV: 0
[  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
[  654.190574] ------------[ cut here ]------------
[  654.190594] WARNING: CPU: 5 PID: 1208 at
net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
[mac80211]
[  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
typec_displayport uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
videodev mc hid_sensor_als hid_sensor_trigger
industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common
industrialio hid_sensor_hub intel_ishtp_loader joydev
mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
qrtr_mhi iTCO_wdt intel_pmc_bxt
8250_dw watchdog mei_hdcp i2c_designware_platform
i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
snd_sof_intel_ipc qrtr dell_wmi wmi_bmof
ns snd_sof_intel_hda_common dell_laptop ath11k_pci
snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
snd_sof_intel_hda dell_wmi_descriptor
dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
ghash_clmulni_intel snd_soc_acpi_intel_match
aesni_intel snd_soc_acpi
[  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
snd_soc_core intel_uncore
snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
snd_intel_dspcfg cfg80211
intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
tpm_crb fat libarc4 sch_fq_codel
intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
virt_dma tpm typec_ucsi
intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
snd_timer snd i2c_hid soundcore
hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
pinctrl_tigerlake intel_pmc_core acpi_tad
ac acpi_pad loop cpufreq_powersave tun tap
[  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
jbd2 xhci_pci xhci_pci_renesas
rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
crct10dif_generic crct10dif_pclmul
usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops drm
i2c_core backlight agpgart
[  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
W I       5.10.0-rc4 #1-NixOS
[  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
10/05/2020
[  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
[  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
ff ff ff <0f> 0b e9 5d
ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
[  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
[  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
0000000000000000
[  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
ffff9d54d629b5d8
[  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
ffffffffc1245800
[  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
ffff9d54d6298800
[  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
ffff9d54d6298de0
[  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
knlGS:0000000000000000
[  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
0000000000770ee0
[  654.190847] PKRU: 55555554
[  654.190848] Call Trace:
[  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
[  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
[  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
[  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
[  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
[  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
[  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
[  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
[  654.190984]  raw_notifier_call_chain+0x44/0x60
[  654.190991]  __dev_close_many+0x5f/0x110
[  654.190995]  dev_close_many+0x81/0x130
[  654.190999]  dev_close.part.0+0x3e/0x70
[  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
[  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
[  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
[  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
[  654.191032]  vfs_write+0xc7/0x280
[  654.191035]  ksys_write+0xa7/0xe0
[  654.191041]  do_syscall_64+0x33/0x40
[  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  654.191048] RIP: 0033:0x7f1bf93906f7
[  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
00 0f 05 <48> 3d 00 f0
ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
[  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
[  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
[  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
[  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
[  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
[  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
[  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
[  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
[  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
[  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
[  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
[  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
[  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
[  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
[  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
[  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
[  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
[  669.229412] ath11k_pci 0000:56:00.0: failed to send
WMI_VDEV_SET_WMM_PARAMS_CMDID
[  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
[  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
[  672.237198] ath11k_pci 0000:56:00.0: failed to send
WMI_VDEV_SET_WMM_PARAMS_CMDID
[  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
[  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
[  675.244968] ath11k_pci 0000:56:00.0: failed to send
WMI_VDEV_SET_WMM_PARAMS_CMDID
[  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
[  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
[  678.252689] ath11k_pci 0000:56:00.0: failed to send
WMI_VDEV_SET_WMM_PARAMS_CMDID
[  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
[  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
[  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
cmd
[  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
[  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
[  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
0
[  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
[  689.771897] ath11k_pci 0000:56:00.0: failed to submit
WMI_VDEV_DELETE_CMDID
[  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
-11
[  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
[  719.529740] ath11k_pci 0000:56:00.0: failed to send
WMI_PDEV_SET_PARAM cmd
[  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
[  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
[  722.793517] ath11k_pci 0000:56:00.0: failed to send
WMI_PDEV_SET_PARAM cmd
[  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11

Apologies for the long output, hopefully something here is useful.
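
Two patterns in the output above may help when skimming it.  The
return code -11 on every failure is -EAGAIN on Linux, and the decimal
WMI command IDs that share their upper bits also share a name prefix
in this log (24595 and 24578 are WMI_PEER_*, 20482 through 20493 are
WMI_VDEV_*, 16387 is WMI_PDEV_*).  The sketch below splits an ID at
bit 12 on that basis.  The split point and the helper names are
assumptions inferred from this log, not a documented layout:

```c
#include <errno.h>

/*
 * Assumption inferred from the log: bits 12 and up select the command
 * family (4 for the PDEV commands seen here, 5 for VDEV, 6 for PEER).
 */
unsigned int wmi_cmd_group(unsigned int cmd_id)
{
    return cmd_id >> 12;
}

/* Remaining low 12 bits: the command's position within its family. */
unsigned int wmi_cmd_index(unsigned int cmd_id)
{
    return cmd_id & 0xfffu;
}

/* Name a few negative kernel-style return codes; -11 is -EAGAIN on Linux. */
const char *kernel_ret_name(int ret)
{
    switch (-ret) {
    case EAGAIN:
        return "EAGAIN";
    case ETIMEDOUT:
        return "ETIMEDOUT";
    case EIO:
        return "EIO";
    case ENODEV:
        return "ENODEV";
    default:
        return "unknown";
    }
}
```

With these, "wmi command 24595 timeout" reads as family 6, index 0x13,
and "failed to down vdev 0: -11" as an EAGAIN failure.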

I haven't had my whole system freeze yet like I did prior to these
patches, however I've only been running these patches for a few hours
so far, currently on my third boot.

You can find the nix configuration I'm working on for the xps 9310
that includes the new patches here:

https://github.com/NixOS/nixos-hardware/pull/207

On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Kalle Valo <kvalo@codeaurora.org> writes:
>
> > (Bcc: people reporting qca6390 problems)
> >
> > Hi,
> >
> > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > there's a good baseline for all testing:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> >
> > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > recent -rc release every few weeks or so. Everytime I update the branch
> > I create a new tag and the latest tag is now:
> >
> > ath11k-qca6390-bringup-202011191920
> >
> > In this tag there's now a brand new implementation for suspend, which
> > relies that the platform provides power to QCA6390 during suspend. Not
> > all platforms do, but most of them should do that. ath11k also prints a
> > warning whenever it notices that the firmware has crashed, but I'm not
> > sure yet if it (the MHI subsystem to be exact) can detect every case.
> >
> > The MSI patch is mostly the same, it had just some refactoring since the
> > last version. Unfortunately there's no solution still for the weird
> > crashes some people are seeing.
>
> Forgot to mention when debugging ath11k PCI issues it's a good idea to
> enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> CONFIG_DYNAMIC_DEBUG and run:
>
> sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-21 23:44   ` Mitchell Nordine
@ 2020-11-22 13:15     ` Mitchell Nordine
  2020-11-22 15:07       ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: Mitchell Nordine @ 2020-11-22 13:15 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath11k

> Unfortunately there's no solution still for the weird
> crashes some people are seeing.

Can confirm, the spurious system freezes still continue. This time it
happened while typing my password into the gdm UI for login.

On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
<mitchell.nordine@gmail.com> wrote:
>
> Thanks for the update!
>
> I no longer notice any errors related to ath11k during boot of NixOS
> on my XPS 13 9310 with these patches:
>
> [mindtree@mindtree:~]$ dmesg | grep -e ath11
> [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> experimental!
> [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> 0x8c300000-0x8c3fffff 64bit]
> [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> board_id 0xff soc_id 0xffffffff
> [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> fw_build_timestamp 2020-06-24 19:50 fw_build_id
> [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
>
> Everything appears to run smoothly for the first 5-10 minutes, then
> the firmware appears to crash and the internet drops out:
>
> [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> MHI_CB_SYS_ERROR
> [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
>
> I haven't yet been able to identify an action that consistently causes
> the crash.
>
> Following the crash, the gnome shell appears to still believe that the
> connection is up, however upon clicking on the wifi in the top-right
> drop-down menu and clicking the "Turn Off" option, the shell freezes
> for a few seconds and a few more errors show up in dmesg:
>
> [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> local choice (Reason: 3=DEAUTH_LEAVING)
> [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> 0
> [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> WMI_PEER_REORDER_QUEUE_SETUP
> [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> 7a:8a:20:d5:98:d7 tid 0
> [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> WMI_PEER_REORDER_QUEUE_SETUP
> [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> 7a:8a:20:d5:98:d7 tid 1
> [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> WMI_PEER_REORDER_QUEUE_SETUP
> [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> 7a:8a:20:d5:98:d7 tid 6
> [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> from hardware (-11)
> [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> 0 addr 7a:8a:20:d5:98:d7 ret -11
> [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> 7a:8a:20:d5:98:d7 for VDEV: 0
> [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> [  654.190574] ------------[ cut here ]------------
> [  654.190594] WARNING: CPU: 5 PID: 1208 at
> net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> [mac80211]
> [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> typec_displayport uvcvideo
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> videodev mc hid_sensor_als hid_sensor_trigger
> industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common
> industrialio hid_sensor_hub intel_ishtp_loader joydev
> mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> qrtr_mhi iTCO_wdt intel_pmc_bxt
> 8250_dw watchdog mei_hdcp i2c_designware_platform
> i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> snd_sof_intel_ipc qrtr dell_wmi wmi_bmof
> ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> snd_sof_intel_hda dell_wmi_descriptor
> dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> ghash_clmulni_intel snd_soc_acpi_intel_match
> aesni_intel snd_soc_acpi
> [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> snd_soc_core intel_uncore
> snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> snd_intel_dspcfg cfg80211
> intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> tpm_crb fat libarc4 sch_fq_codel
> intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> virt_dma tpm typec_ucsi
> intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> snd_timer snd i2c_hid soundcore
> hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> pinctrl_tigerlake intel_pmc_core acpi_tad
> ac acpi_pad loop cpufreq_powersave tun tap
> [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> jbd2 xhci_pci xhci_pci_renesas
> rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> crct10dif_generic crct10dif_pclmul
> usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> sysimgblt fb_sys_fops drm
> i2c_core backlight agpgart
> [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> W I       5.10.0-rc4 #1-NixOS
> [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> 10/05/2020
> [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> ff ff ff <0f> 0b e9 5d
> ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> 0000000000000000
> [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> ffff9d54d629b5d8
> [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> ffffffffc1245800
> [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> ffff9d54d6298800
> [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> ffff9d54d6298de0
> [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> knlGS:0000000000000000
> [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> 0000000000770ee0
> [  654.190847] PKRU: 55555554
> [  654.190848] Call Trace:
> [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> [  654.190984]  raw_notifier_call_chain+0x44/0x60
> [  654.190991]  __dev_close_many+0x5f/0x110
> [  654.190995]  dev_close_many+0x81/0x130
> [  654.190999]  dev_close.part.0+0x3e/0x70
> [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> [  654.191032]  vfs_write+0xc7/0x280
> [  654.191035]  ksys_write+0xa7/0xe0
> [  654.191041]  do_syscall_64+0x33/0x40
> [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  654.191048] RIP: 0033:0x7f1bf93906f7
> [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> 00 0f 05 <48> 3d 00
> f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000001
> [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> WMI_VDEV_SET_WMM_PARAMS_CMDID
> [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> WMI_VDEV_SET_WMM_PARAMS_CMDID
> [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> WMI_VDEV_SET_WMM_PARAMS_CMDID
> [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> WMI_VDEV_SET_WMM_PARAMS_CMDID
> [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> cmd
> [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> 0
> [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> WMI_VDEV_DELETE_CMDID
> [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> -11
> [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> WMI_PDEV_SET_PARAM cmd
> [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> WMI_PDEV_SET_PARAM cmd
> [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
>
> Apologies for the long output, hopefully something here is useful.
>
> I haven't had my whole system freeze yet like I did prior to these
> patches, however I've only been running these patches for a few hours
> so far, currently on my third boot.
>
> You can find the nix configuration I'm working on for the xps 9310
> that includes the new patches here:
>
> https://github.com/NixOS/nixos-hardware/pull/207
>
> On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > Kalle Valo <kvalo@codeaurora.org> writes:
> >
> > > (Bcc: people reporting qca6390 problems)
> > >
> > > Hi,
> > >
> > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > there's a good baseline for all testing:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > >
> > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > recent -rc release every few weeks or so. Everytime I update the branch
> > > I create a new tag and the latest tag is now:
> > >
> > > ath11k-qca6390-bringup-202011191920
> > >
> > > In this tag there's now a brand new implementation for suspend, which
> > > relies that the platform provides power to QCA6390 during suspend. Not
> > > all platforms do, but most of them should do that. ath11k also prints a
> > > warning whenever it notices that the firmware has crashed, but I'm not
> > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > >
> > > The MSI patch is mostly the same, it had just some refactoring since the
> > > last version. Unfortunately there's no solution still for the weird
> > > crashes some people are seeing.
> >
> > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > CONFIG_DYNAMIC_DEBUG and run:
> >
> > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-22 13:15     ` Mitchell Nordine
@ 2020-11-22 15:07       ` wi nk
  2020-11-23  3:14         ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: wi nk @ 2020-11-22 15:07 UTC (permalink / raw)
  To: Mitchell Nordine; +Cc: ath11k, Kalle Valo

On Sun, Nov 22, 2020 at 2:15 PM Mitchell Nordine
<mitchell.nordine@gmail.com> wrote:
>
> > Unfortunately there's no solution still for the weird
> > crashes some people are seeing.
>
> Can confirm, the spurious system freezing still continues. This time
> while typing my password into the gdm UI for login.
>
> On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
> <mitchell.nordine@gmail.com> wrote:
> >
> > Thanks for the update!
> >
> > I no longer notice any errors related to ath11k during boot of NixOS
> > on my XPS 13 9310 with these patches:
> >
> > [mindtree@mindtree:~]$ dmesg | grep -e ath11
> > [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> > experimental!
> > [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> > 0x8c300000-0x8c3fffff 64bit]
> > [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> > [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> > [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> > [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> > [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> > board_id 0xff soc_id 0xffffffff
> > [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
> >
> > Everything appears to run smoothly for the first 5-10 minutes, then
> > the firmware appears to crash and the internet drops out:
> >
> > [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> > MHI_CB_SYS_ERROR
> > [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
> >
> > I haven't yet been able to identify an action that consistently causes
> > the crash.
> >
> > Following the crash, the gnome shell appears to still believe that the
> > connection is up, however upon clicking on the wifi in the top-right
> > drop-down menu and clicking the "Turn Off" option, the shell freezes
> > for a few seconds and a few more errors show up in dmesg:
> >
> > [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> > local choice (Reason: 3=DEAUTH_LEAVING)
> > [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > 0
> > [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> > WMI_PEER_REORDER_QUEUE_SETUP
> > [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> > 7a:8a:20:d5:98:d7 tid 0
> > [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> > WMI_PEER_REORDER_QUEUE_SETUP
> > [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> > 7a:8a:20:d5:98:d7 tid 1
> > [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> > WMI_PEER_REORDER_QUEUE_SETUP
> > [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> > 7a:8a:20:d5:98:d7 tid 6
> > [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> > [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> > [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> > [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> > from hardware (-11)
> > [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> > [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> > [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> > 0 addr 7a:8a:20:d5:98:d7 ret -11
> > [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> > 7a:8a:20:d5:98:d7 for VDEV: 0
> > [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> > 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> > [  654.190574] ------------[ cut here ]------------
> > [  654.190594] WARNING: CPU: 5 PID: 1208 at
> > net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> > [mac80211]
> > [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> > fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> > typec_displayport uvcvideo
> > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> > videodev mc hid_sensor_als hid_sensor_trigger
> > industrialio_triggered_buffer kfifo_buf
> > hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_loader joydev
> > mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> > qrtr_mhi iTCO_wdt
> > intel_pmc_bxt 8250_dw watchdog mei_hdcp i2c_designware_platform
> > i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> > snd_sof_intel_ipc qrtr dell_wmi
> > wmi_bmof ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> > snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> > snd_sof_intel_hda
> > dell_wmi_descriptor dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> > dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> > ghash_clmulni_intel
> > snd_soc_acpi_intel_match aesni_intel snd_soc_acpi
> > [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> > cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> > snd_soc_core intel_uncore
> > snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> > input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> > snd_intel_dspcfg
> > cfg80211 intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> > nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> > tpm_crb fat libarc4
> > sch_fq_codel intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> > intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> > virt_dma tpm typec_ucsi
> > intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> > rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> > snd_timer snd i2c_hid
> > soundcore hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> > int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> > pinctrl_tigerlake intel_pmc_core
> > acpi_tad ac acpi_pad loop cpufreq_powersave tun tap
> > [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> > efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> > jbd2 xhci_pci
> > xhci_pci_renesas rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> > nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> > crct10dif_generic
> > crct10dif_pclmul usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> > intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> > sysimgblt fb_sys_fops
> > drm i2c_core backlight agpgart
> > [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> > W I       5.10.0-rc4 #1-NixOS
> > [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> > 10/05/2020
> > [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> > [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> > 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> > ff ff ff <0f> 0b e9
> > 5d ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> > [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> > [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> > 0000000000000000
> > [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> > ffff9d54d629b5d8
> > [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> > ffffffffc1245800
> > [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> > ffff9d54d6298800
> > [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> > ffff9d54d6298de0
> > [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> > knlGS:0000000000000000
> > [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> > 0000000000770ee0
> > [  654.190847] PKRU: 55555554
> > [  654.190848] Call Trace:
> > [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> > [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> > [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> > [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> > [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> > [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> > [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> > [  654.190984]  raw_notifier_call_chain+0x44/0x60
> > [  654.190991]  __dev_close_many+0x5f/0x110
> > [  654.190995]  dev_close_many+0x81/0x130
> > [  654.190999]  dev_close.part.0+0x3e/0x70
> > [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> > [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > [  654.191032]  vfs_write+0xc7/0x280
> > [  654.191035]  ksys_write+0xa7/0xe0
> > [  654.191041]  do_syscall_64+0x33/0x40
> > [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  654.191048] RIP: 0033:0x7f1bf93906f7
> > [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> > 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> > 00 0f 05 <48> 3d 00
> > f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> > [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> > 0000000000000001
> > [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> > [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> > [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> > [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> > [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> > [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> > [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> > [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> > [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> > [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> > [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> > [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> > [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> > cmd
> > [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> > [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> > [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > 0
> > [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> > [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> > WMI_VDEV_DELETE_CMDID
> > [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> > -11
> > [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> > WMI_PDEV_SET_PARAM cmd
> > [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> > WMI_PDEV_SET_PARAM cmd
> > [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> >
> > Apologies for the long output, hopefully something here is useful.
> >
> > I haven't had my whole system freeze yet like I did prior to these
> > patches, however I've only been running these patches for a few hours
> > so far, currently on my third boot.
> >
> > You can find the nix configuration I'm working on for the xps 9310
> > that includes the new patches here:
> >
> > https://github.com/NixOS/nixos-hardware/pull/207
> >
> > On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > >
> > > Kalle Valo <kvalo@codeaurora.org> writes:
> > >
> > > > (Bcc: people reporting qca6390 problems)
> > > >
> > > > Hi,
> > > >
> > > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > > there's a good baseline for all testing:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > > >
> > > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > > recent -rc release every few weeks or so. Everytime I update the branch
> > > > I create a new tag and the latest tag is now:
> > > >
> > > > ath11k-qca6390-bringup-202011191920
> > > >
> > > > In this tag there's now a brand new implementation for suspend, which
> > > > relies that the platform provides power to QCA6390 during suspend. Not
> > > > all platforms do, but most of them should do that. ath11k also prints a
> > > > warning whenever it notices that the firmware has crashed, but I'm not
> > > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > > >
> > > > The MSI patch is mostly the same, it had just some refactoring since the
> > > > last version. Unfortunately there's no solution still for the weird
> > > > crashes some people are seeing.
> > >
> > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > CONFIG_DYNAMIC_DEBUG and run:
> > >
> > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> > >
> > > --
> > > https://patchwork.kernel.org/project/linux-wireless/list/
> > >
> > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

So after your message I was racking my brain to sort out any
differences between our configs.  I did disable VT / VT-d and that
seems to have made things somewhat more stable, but I still see
occasional hangs on initialization / association.
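
For anyone comparing runs across machines, here is a minimal sketch (not
part of the ath11k tree) that tallies the "wmi command N timeout" lines
from dmesg output like the log quoted above and notes when the first
"firmware crashed" message appeared. The function name
summarize_wmi_timeouts is made up for illustration, and the regex only
assumes the line format seen in that log.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+ath11k_pci\s+\S+\s+wmi command (\d+) timeout"
)
TIMESTAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def summarize_wmi_timeouts(lines):
    """Return (first crash timestamp or None, Counter of timeouts per command id)."""
    counts = Counter()
    crash_time = None
    for raw in lines:
        # Strip mail quoting ("> > ") so quoted logs can be pasted as-is.
        line = re.sub(r"^(?:>\s?)+", "", raw)
        if crash_time is None and "firmware crashed" in line:
            m = TIMESTAMP_RE.search(line)
            if m:
                crash_time = float(m.group(1))
        m = TIMEOUT_RE.search(line)
        if m:
            counts[int(m.group(2))] += 1
    return crash_time, counts


if __name__ == "__main__":
    crash_time, counts = summarize_wmi_timeouts(sys.stdin)
    if crash_time is not None:
        print(f"first 'firmware crashed' message at {crash_time:.6f}s")
    for cmd_id, n in sorted(counts.items()):
        # Print the id in decimal (as dmesg shows it) and in hex.
        print(f"wmi command {cmd_id} (0x{cmd_id:04x}): {n} timeout(s)")
```

Saved under a name of your choosing (wmi_timeouts.py here is just a
placeholder), it can be fed directly from the kernel log:
dmesg | python3 wmi_timeouts.py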


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-22 15:07       ` wi nk
@ 2020-11-23  3:14         ` wi nk
  2020-11-23 23:30           ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: wi nk @ 2020-11-23  3:14 UTC (permalink / raw)
  To: Mitchell Nordine, Carl Huang; +Cc: ath11k, Kalle Valo

On Sun, Nov 22, 2020 at 4:07 PM wi nk <wink@technolu.st> wrote:
>
> On Sun, Nov 22, 2020 at 2:15 PM Mitchell Nordine
> <mitchell.nordine@gmail.com> wrote:
> >
> > > Unfortunately there's no solution still for the weird
> > > crashes some people are seeing.
> >
> > Can confirm, the spurious system freezing still continues. This time
> > while typing my password into the gdm UI for login.
> >
> > On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
> > <mitchell.nordine@gmail.com> wrote:
> > >
> > > Thanks for the update!
> > >
> > > I no longer notice any errors related to ath11k during boot of NixOS
> > > on my XPS 13 9310 with these patches:
> > >
> > > [mindtree@mindtree:~]$ dmesg | grep -e ath11
> > > [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> > > experimental!
> > > [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> > > 0x8c300000-0x8c3fffff 64bit]
> > > [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> > > [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> > > [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> > > [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> > > [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> > > board_id 0xff soc_id 0xffffffff
> > > [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> > > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > > [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
> > >
> > > Everything appears to run smoothly for the first 5-10 minutes, then
> > > the firmware appears to crash and the internet drops out:
> > >
> > > [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> > > MHI_CB_SYS_ERROR
> > > [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
> > >
> > > I haven't yet been able to identify an action that consistently causes
> > > the crash.
> > >
> > > Following the crash, the gnome shell appears to still believe that the
> > > connection is up, however upon clicking on the wifi in the top-right
> > > drop-down menu and clicking the "Turn Off" option, the shell freezes
> > > for a few seconds and a few more errors show up in dmesg:
> > >
> > > [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> > > local choice (Reason: 3=DEAUTH_LEAVING)
> > > [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > 0
> > > [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_PEER_REORDER_QUEUE_SETUP
> > > [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> > > 7a:8a:20:d5:98:d7 tid 0
> > > [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_PEER_REORDER_QUEUE_SETUP
> > > [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> > > 7a:8a:20:d5:98:d7 tid 1
> > > [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_PEER_REORDER_QUEUE_SETUP
> > > [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> > > 7a:8a:20:d5:98:d7 tid 6
> > > [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> > > [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> > > [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> > > [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> > > from hardware (-11)
> > > [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> > > [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> > > [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> > > 0 addr 7a:8a:20:d5:98:d7 ret -11
> > > [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> > > 7a:8a:20:d5:98:d7 for VDEV: 0
> > > [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> > > 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> > > [  654.190574] ------------[ cut here ]------------
> > > [  654.190594] WARNING: CPU: 5 PID: 1208 at
> > > net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> > > [mac80211]
> > > [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> > > fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> > > typec_displayport uvcvideo
> > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> > > videodev mc hid_sensor_als hid_sensor_trigger
> > > industrialio_triggered_buffer kfifo_buf
> > > hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_loader joydev
> > > mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> > > qrtr_mhi iTCO_wdt
> > > intel_pmc_bxt 8250_dw watchdog mei_hdcp i2c_designware_platform
> > > i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> > > snd_sof_intel_ipc qrtr dell_wmi
> > > wmi_bmof ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> > > snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> > > snd_sof_intel_hda
> > > dell_wmi_descriptor dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> > > dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> > > ghash_clmulni_intel
> > > snd_soc_acpi_intel_match aesni_intel snd_soc_acpi
> > > [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> > > cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> > > snd_soc_core intel_uncore
> > > snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> > > input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> > > snd_intel_dspcfg
> > > cfg80211 intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> > > nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> > > tpm_crb fat libarc4
> > > sch_fq_codel intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> > > intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> > > virt_dma tpm typec_ucsi
> > > intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> > > rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> > > snd_timer snd i2c_hid
> > > soundcore hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> > > int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> > > pinctrl_tigerlake intel_pmc_core
> > > acpi_tad ac acpi_pad loop cpufreq_powersave tun tap
> > > [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> > > efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> > > jbd2 xhci_pci
> > > xhci_pci_renesas rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> > > nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> > > crct10dif_generic
> > > crct10dif_pclmul usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> > > intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> > > sysimgblt fb_sys_fops
> > > drm i2c_core backlight agpgart
> > > [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> > > W I       5.10.0-rc4 #1-NixOS
> > > [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> > > 10/05/2020
> > > [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> > > [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> > > 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> > > ff ff ff <0f> 0b e9
> > > 5d ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> > > [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> > > [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> > > 0000000000000000
> > > [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> > > ffff9d54d629b5d8
> > > [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> > > ffffffffc1245800
> > > [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> > > ffff9d54d6298800
> > > [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> > > ffff9d54d6298de0
> > > [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> > > knlGS:0000000000000000
> > > [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> > > 0000000000770ee0
> > > [  654.190847] PKRU: 55555554
> > > [  654.190848] Call Trace:
> > > [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> > > [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> > > [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> > > [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> > > [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > > [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> > > [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> > > [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> > > [  654.190984]  raw_notifier_call_chain+0x44/0x60
> > > [  654.190991]  __dev_close_many+0x5f/0x110
> > > [  654.190995]  dev_close_many+0x81/0x130
> > > [  654.190999]  dev_close.part.0+0x3e/0x70
> > > [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > > [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > > [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> > > [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > > [  654.191032]  vfs_write+0xc7/0x280
> > > [  654.191035]  ksys_write+0xa7/0xe0
> > > [  654.191041]  do_syscall_64+0x33/0x40
> > > [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [  654.191048] RIP: 0033:0x7f1bf93906f7
> > > [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> > > 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> > > 00 0f 05 <48> 3d 00 f
> > > 0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> > > [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> > > 0000000000000001
> > > [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> > > [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> > > [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> > > [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > > [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> > > [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> > > [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> > > [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> > > [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> > > [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> > > [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> > > [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> > > [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> > > [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> > > cmd
> > > [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> > > [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> > > [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > 0
> > > [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> > > [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> > > WMI_VDEV_DELETE_CMDID
> > > [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> > > -11
> > > [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_PDEV_SET_PARAM cmd
> > > [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> > > WMI_PDEV_SET_PARAM cmd
> > > [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > >
> > > Apologies for the long output, hopefully something here is useful.
> > >
> > > I haven't had my whole system freeze yet like I did prior to these
> > > patches, however I've only been running these patches for a few hours
> > > so far, currently on my third boot.
> > >
> > > You can find the nix configuration I'm working on for the xps 9310
> > > that includes the new patches here:
> > >
> > > https://github.com/NixOS/nixos-hardware/pull/207
> > >
> > > On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > > >
> > > > Kalle Valo <kvalo@codeaurora.org> writes:
> > > >
> > > > > (Bcc: people reporting qca6390 problems)
> > > > >
> > > > > Hi,
> > > > >
> > > > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > > > there's a good baseline for all testing:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > > > >
> > > > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > > > recent -rc release every few weeks or so. Everytime I update the branch
> > > > > I create a new tag and the latest tag is now:
> > > > >
> > > > > ath11k-qca6390-bringup-202011191920
> > > > >
> > > > > In this tag there's now a brand new implementation for suspend, which
> > > > > relies that the platform provides power to QCA6390 during suspend. Not
> > > > > all platforms do, but most of them should do that. ath11k also prints a
> > > > > warning whenever it notices that the firmware has crashed, but I'm not
> > > > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > > > >
> > > > > The MSI patch is mostly the same, it had just some refactoring since the
> > > > > last version. Unfortunately there's no solution still for the weird
> > > > > crashes some people are seeing.
> > > >
> > > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > > CONFIG_DYNAMIC_DEBUG and run:
> > > >
> > > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> > > >
> > > > --
> > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > >
> > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > --
> > ath11k mailing list
> > ath11k@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/ath11k
>
> So after your message I was wracking my brain to sort out any
> differences between our configs.  I did disable vt / vt-d and that
> seems to have increased the stability of things some, but I still see
> occasional hangs on initialization / association.

Good morning,

  As I've been reading up on the current kernel internals and the
single MSI patch, trying to get to a point where I can dive into this
deeply, I think I may have found part of the race.  When I check
/proc/interrupts for the driver I see:

194:          0          0          0          0          0          0
      7111          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3,
ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
DP_EXT_IRQ, bhi, mhi, mhi

Looking at the patch, there are 4 places where IRQs are being
requested: 2 in the MHI code (bhi/mhi) and 2 in the ath11k PCI code
(ce* and DP_EXT_IRQ).  The patch changes the calls to
request_(threaded)_irq and adds IRQF_SHARED to the flags, which is
what allows all of these irq handlers to attach to the single
available IRQ.  Each handler accepts the dev_id parameter as a void *,
which it then casts to the relevant data structure and accesses.  My
understanding from the reading I did is that since the IRQ is now
shared, each of these handlers needs to check that the dev_id is
actually relevant for it to handle, and if not return IRQ_NONE.  Is it
possible that the wrong handlers are occasionally being invoked and
casting/accessing things incorrectly, or did I misunderstand how IRQ
sharing works?  If I am reading that correctly, does that also have
implications for the disabling/enabling of the IRQ everywhere?
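
For reference, a rough user-space model of dispatch on a shared line.
This is not kernel code: Device, device_handler and SharedIrqLine are
hypothetical names made up for illustration, and the "pending" flag
stands in for whatever status the real hardware exposes.  It assumes
the usual generic IRQ behaviour, where every handler registered on the
line is called in turn with the dev_id it registered, and a handler
whose device did not raise the interrupt returns IRQ_NONE:

```python
from dataclasses import dataclass

# Return codes mirroring the kernel's irqreturn_t values.
IRQ_NONE = 0
IRQ_HANDLED = 1


@dataclass
class Device:
    """Hypothetical per-handler context, playing the role of dev_id."""
    name: str
    pending: bool = False  # stand-in for a hardware interrupt-status bit


def device_handler(irq, dev_id):
    """Handle the interrupt only if this handler's own device raised it."""
    if not dev_id.pending:
        return IRQ_NONE       # not ours, let the other handlers look
    dev_id.pending = False    # acknowledge the interrupt
    return IRQ_HANDLED


class SharedIrqLine:
    """Toy model of a single IRQ number with several registered handlers."""

    def __init__(self, irq):
        self.irq = irq
        self.actions = []     # list of (handler, dev_id) pairs

    def request_irq(self, handler, dev_id):
        self.actions.append((handler, dev_id))

    def fire(self):
        # Every registered handler runs, each with its *own* dev_id.
        return [(dev.name, h(self.irq, dev)) for h, dev in self.actions]


if __name__ == "__main__":
    line = SharedIrqLine(194)
    ce0, mhi = Device("ce0"), Device("mhi")
    line.request_irq(device_handler, ce0)
    line.request_irq(device_handler, mhi)
    mhi.pending = True
    print(line.fire())
```

In this model a handler never sees another handler's dev_id; what it
cannot know without checking is whether its own device was the source
of the interrupt.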


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-23  3:14         ` wi nk
@ 2020-11-23 23:30           ` wi nk
  2020-11-23 23:38             ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: wi nk @ 2020-11-23 23:30 UTC (permalink / raw)
  To: Mitchell Nordine, Carl Huang; +Cc: ath11k, Kalle Valo

On Mon, Nov 23, 2020 at 4:14 AM wi nk <wink@technolu.st> wrote:
>
> On Sun, Nov 22, 2020 at 4:07 PM wi nk <wink@technolu.st> wrote:
> >
> > On Sun, Nov 22, 2020 at 2:15 PM Mitchell Nordine
> > <mitchell.nordine@gmail.com> wrote:
> > >
> > > > Unfortunately there's no solution still for the weird
> > > crashes some people are seeing.
> > >
> > > Can confirm, the spurious system freezing still continues. This time
> > > while typing my password into the gdm UI for login.
> > >
> > > On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
> > > <mitchell.nordine@gmail.com> wrote:
> > > >
> > > > Thanks for the update!
> > > >
> > > > I no longer notice any errors related to ath11k during boot of NixOS
> > > > on my XPS 13 9310 with these patches:
> > > >
> > > > [mindtree@mindtree:~]$ dmesg | grep -e ath11
> > > > [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> > > > experimental!
> > > > [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> > > > 0x8c300000-0x8c3fffff 64bit]
> > > > [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> > > > [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> > > > [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> > > > [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> > > > [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> > > > board_id 0xff soc_id 0xffffffff
> > > > [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> > > > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > > > [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
> > > >
> > > > Everything appears to run smoothly for the first 5-10 minutes, then
> > > > the firmware appears to crash and the internet drops out:
> > > >
> > > > [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> > > > MHI_CB_SYS_ERROR
> > > > [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
> > > >
> > > > I haven't yet been able to identify an action that consistently causes
> > > > the crash.
> > > >
> > > > Following the crash, the gnome shell appears to still believe that the
> > > > connection is up, however upon clicking on the wifi in the top-right
> > > > drop-down menu and clicking the "Turn Off" option, the shell freezes
> > > > for a few seconds and a few more errors show up in dmesg:
> > > >
> > > > [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> > > > local choice (Reason: 3=DEAUTH_LEAVING)
> > > > [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > 0
> > > > [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> > > > 7a:8a:20:d5:98:d7 tid 0
> > > > [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> > > > 7a:8a:20:d5:98:d7 tid 1
> > > > [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> > > > 7a:8a:20:d5:98:d7 tid 6
> > > > [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> > > > [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> > > > [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> > > > [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> > > > from hardware (-11)
> > > > [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> > > > [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> > > > [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> > > > 0 addr 7a:8a:20:d5:98:d7 ret -11
> > > > [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> > > > 7a:8a:20:d5:98:d7 for VDEV: 0
> > > > [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> > > > 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> > > > [  654.190574] ------------[ cut here ]------------
> > > > [  654.190594] WARNING: CPU: 5 PID: 1208 at
> > > > net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> > > > [mac80211]
> > > > [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> > > > fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> > > > typec_displayport uvcvideo
> > > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> > > > videodev mc hid_sensor_als hid_sensor_trigger
> > > > industrialio_triggered_buffer kfifo_buf hid_se
> > > > nsor_iio_common industrialio hid_sensor_hub intel_ishtp_loader joydev
> > > > mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> > > > qrtr_mhi iTCO_wdt intel_
> > > > pmc_bxt 8250_dw watchdog mei_hdcp i2c_designware_platform
> > > > i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> > > > snd_sof_intel_ipc qrtr dell_wmi wmi_
> > > > bmof ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> > > > snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> > > > snd_sof_intel_hda dell_wmi_descr
> > > > iptor dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> > > > dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> > > > ghash_clmulni_intel snd_soc
> > > > _acpi_intel_match aesni_intel snd_soc_acpi
> > > > [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> > > > cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> > > > snd_soc_core intel_uncore
> > > >  snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> > > > input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> > > > snd_intel_dspcfg cfg80
> > > > 211 intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> > > > nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> > > > tpm_crb fat libarc4 sch_
> > > > fq_codel intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> > > > intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> > > > virt_dma tpm typec_ucsi
> > > > intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> > > > rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> > > > snd_timer snd i2c_hid sound
> > > > core hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> > > > int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> > > > pinctrl_tigerlake intel_pmc_core acpi_
> > > > tad ac acpi_pad loop cpufreq_powersave tun tap
> > > > [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> > > > efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> > > > jbd2 xhci_pci xhci_pci_ren
> > > > esas rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> > > > nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> > > > crct10dif_generic crct10dif_pclmu
> > > > l usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> > > > intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> > > > sysimgblt fb_sys_fops dr
> > > > m i2c_core backlight agpgart
> > > > [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> > > > W I       5.10.0-rc4 #1-NixOS
> > > > [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> > > > 10/05/2020
> > > > [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> > > > [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> > > > 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> > > > ff ff ff <0f> 0b e9 5
> > > > d ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> > > > [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> > > > [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> > > > 0000000000000000
> > > > [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> > > > ffff9d54d629b5d8
> > > > [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> > > > ffffffffc1245800
> > > > [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> > > > ffff9d54d6298800
> > > > [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> > > > ffff9d54d6298de0
> > > > [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> > > > knlGS:0000000000000000
> > > > [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> > > > 0000000000770ee0
> > > > [  654.190847] PKRU: 55555554
> > > > [  654.190848] Call Trace:
> > > > [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> > > > [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> > > > [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> > > > [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> > > > [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > > > [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> > > > [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> > > > [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> > > > [  654.190984]  raw_notifier_call_chain+0x44/0x60
> > > > [  654.190991]  __dev_close_many+0x5f/0x110
> > > > [  654.190995]  dev_close_many+0x81/0x130
> > > > [  654.190999]  dev_close.part.0+0x3e/0x70
> > > > [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > > > [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > > > [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> > > > [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > > > [  654.191032]  vfs_write+0xc7/0x280
> > > > [  654.191035]  ksys_write+0xa7/0xe0
> > > > [  654.191041]  do_syscall_64+0x33/0x40
> > > > [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > [  654.191048] RIP: 0033:0x7f1bf93906f7
> > > > [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> > > > 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> > > > 00 0f 05 <48> 3d 00 f
> > > > 0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> > > > [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> > > > 0000000000000001
> > > > [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> > > > [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> > > > [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> > > > [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > > > [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> > > > [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> > > > [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> > > > [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> > > > [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> > > > [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> > > > [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> > > > [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> > > > [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> > > > [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> > > > cmd
> > > > [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> > > > [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> > > > [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > 0
> > > > [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> > > > [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> > > > WMI_VDEV_DELETE_CMDID
> > > > [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> > > > -11
> > > > [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_PDEV_SET_PARAM cmd
> > > > [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > > [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> > > > WMI_PDEV_SET_PARAM cmd
> > > > [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > >
> > > > Apologies for the long output, hopefully something here is useful.
> > > >
> > > > I haven't had my whole system freeze yet like I did prior to these
> > > > patches, however I've only been running these patches for a few hours
> > > > so far, currently on my third boot.
> > > >
> > > > You can find the nix configuration I'm working on for the xps 9310
> > > > that includes the new patches here:
> > > >
> > > > https://github.com/NixOS/nixos-hardware/pull/207
> > > >
> > > > On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > > > >
> > > > > Kalle Valo <kvalo@codeaurora.org> writes:
> > > > >
> > > > > > (Bcc: people reporting qca6390 problems)
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > > > > there's a good baseline for all testing:
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > > > > >
> > > > > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > > > > recent -rc release every few weeks or so. Everytime I update the branch
> > > > > > I create a new tag and the latest tag is now:
> > > > > >
> > > > > > ath11k-qca6390-bringup-202011191920
> > > > > >
> > > > > > In this tag there's now a brand new implementation for suspend, which
> > > > > > relies that the platform provides power to QCA6390 during suspend. Not
> > > > > > all platforms do, but most of them should do that. ath11k also prints a
> > > > > > warning whenever it notices that the firmware has crashed, but I'm not
> > > > > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > > > > >
> > > > > > The MSI patch is mostly the same, it had just some refactoring since the
> > > > > > last version. Unfortunately there's no solution still for the weird
> > > > > > crashes some people are seeing.
> > > > >
> > > > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > > > CONFIG_DYNAMIC_DEBUG and run:
> > > > >
> > > > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> > > > >
> > > > > --
> > > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > > >
> > > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >
> > > --
> > > ath11k mailing list
> > > ath11k@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/ath11k
> >
> > So after your message I was wracking my brain to sort out any
> > differences between our configs.  I did disable vt / vt-d and that
> > seems to have increased the stability of things some, but I still see
> > occasional hangs on initialization / association.
>
> Good morning,
>
>   As I've been reading up on the current kernel internals and the
> single MSI patch, trying to get to a point where I can dive into this
> deeply, I think I may have found part of the race.  When I check
> /proc/interrupts for the driver I see:
>
> 194:          0          0          0          0          0          0
>       7111          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3,
> ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> DP_EXT_IRQ, bhi, mhi, mhi
>
> Looking at the patch, there are 4 places where IRQs are being
> requested: 2 in the MHI code (bhi/mhi) and 2 in the ath11k PCI code
> (ce* and DP_EXT_IRQ).  The patch changes the calls to
> request_(threaded)_irq and adds IRQF_SHARED to the flags, which is
> what allows all of these irq handlers to attach to the single
> available IRQ.  Each handler accepts the dev_id parameter as a void *,
> which it then casts to the relevant data structure and accesses.  My
> understanding from the reading I did is that since the IRQ is now
> shared, each of these handlers needs to check that the dev_id is
> actually relevant for it to handle, and if not return IRQ_NONE.  Is it
> possible that the wrong handlers are occasionally being invoked and
> casting/accessing things incorrectly, or did I misunderstand how IRQ
> sharing works?  If I am reading that correctly, does that also have
> implications for the disabling/enabling of the IRQ everywhere?

I went ahead and hacked a quick patch together that implements the
dev_id checking in each interrupt handler, and that seems to have
fixed the freezes that happened without any indication.  Now,
reliably, if things are going to crash, I'll receive the RT throttling
message from the scheduler and then things will completely hang a few
seconds later.  I added the instrumentation to enable the verbose MHI
printing, and it seems mhi_intvec_threaded_handler is printing some
additional information.  First, if things are behaving nominally, I
see the state transition from M0 -> M1 -> M2 and then things stay
mostly in M2 (I can't say with 100% certainty, it's quite fast).
However, when things are crashing, the prints show it:

[  312.xxx] mhi 0000:55:00.0: local ee:AMSS device ee:AMSS dev_state:M2
[  313.024033] mhi 0000:55:00.0: local ee:INVALID_EE device
ee:INVALID_EE dev_state:SYS_ERR
[  313.024033] mhi 0000:55:00.0: System error detected

I'll see the last 2 prints repeat 5-6 times, then comes the throttling:

[  313.124033] sched: RT throttling activated

then a couple more attempts to reset the state of things, and then
the machine hangs with the fans spinning at full speed.
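
To pull these state transitions out of a longer log, something along
these lines might help.  It is only a sketch: it assumes the log is
plain dmesg text without mail quoting, the regular expression is
modelled on the three mhi lines above rather than on any documented
format, and mhi_states / first_sys_err are made-up helper names:

```python
import re

# Pattern modelled on lines such as:
#   [  313.024033] mhi 0000:55:00.0: local ee:INVALID_EE device
#   ee:INVALID_EE dev_state:SYS_ERR
# \s+ is used between fields so a wrapped line still matches.
MHI_STATE_RE = re.compile(
    r"\[\s*(?P<ts>[^\]]+?)\s*\]\s+mhi\s+(?P<dev>\S+):\s+"
    r"local\s+ee:(?P<local_ee>\S+)\s+device\s+ee:(?P<device_ee>\S+)\s+"
    r"dev_state:(?P<state>\S+)"
)


def mhi_states(log_text):
    """Return (timestamp, device, dev_state) for every MHI state line."""
    return [(m["ts"], m["dev"], m["state"])
            for m in MHI_STATE_RE.finditer(log_text)]


def first_sys_err(log_text):
    """Return (timestamp, device) of the first SYS_ERR state, or None."""
    for ts, dev, state in mhi_states(log_text):
        if state == "SYS_ERR":
            return ts, dev
    return None


if __name__ == "__main__":
    import sys
    # Example use: dmesg | python3 mhi_states.py
    text = sys.stdin.read()
    for ts, dev, state in mhi_states(text):
        print(ts, dev, state)
    print("first SYS_ERR:", first_sys_err(text))
```

The timestamp is kept as a string on purpose, so that hand-edited
values such as "312.xxx" do not break the parsing.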


* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-23 23:30           ` wi nk
@ 2020-11-23 23:38             ` wi nk
  2020-11-26 22:45               ` wi nk
  0 siblings, 1 reply; 19+ messages in thread
From: wi nk @ 2020-11-23 23:38 UTC (permalink / raw)
  To: Mitchell Nordine, Carl Huang; +Cc: ath11k, Kalle Valo

On Tue, Nov 24, 2020 at 12:30 AM wi nk <wink@technolu.st> wrote:
>
> On Mon, Nov 23, 2020 at 4:14 AM wi nk <wink@technolu.st> wrote:
> >
> > On Sun, Nov 22, 2020 at 4:07 PM wi nk <wink@technolu.st> wrote:
> > >
> > > On Sun, Nov 22, 2020 at 2:15 PM Mitchell Nordine
> > > <mitchell.nordine@gmail.com> wrote:
> > > >
> > > > > Unfortunately there's no solution still for the weird
> > > > crashes some people are seeing.
> > > >
> > > > Can confirm, the spurious system freezing still continues. This time
> > > > while typing my password into the gdm UI for login.
> > > >
> > > > On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
> > > > <mitchell.nordine@gmail.com> wrote:
> > > > >
> > > > > Thanks for the update!
> > > > >
> > > > > I no longer notice any errors related to ath11k during boot of NixOS
> > > > > on my XPS 13 9310 with these patches:
> > > > >
> > > > > [mindtree@mindtree:~]$ dmesg | grep -e ath11
> > > > > [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> > > > > experimental!
> > > > > [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> > > > > 0x8c300000-0x8c3fffff 64bit]
> > > > > [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> > > > > [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> > > > > [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> > > > > [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> > > > > [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> > > > > board_id 0xff soc_id 0xffffffff
> > > > > [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> > > > > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > > > > [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
> > > > >
> > > > > Everything appears to run smoothly for the first 5-10 minutes, then
> > > > > the firmware appears to crash and the internet drops out:
> > > > >
> > > > > [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> > > > > MHI_CB_SYS_ERROR
> > > > > [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
> > > > >
> > > > > I haven't yet been able to identify an action that consistently causes
> > > > > the crash.
> > > > >
> > > > > Following the crash, the gnome shell appears to still believe that the
> > > > > connection is up, however upon clicking on the wifi in the top-right
> > > > > drop-down menu and clicking the "Turn Off" option, the shell freezes
> > > > > for a few seconds and a few more errors show up in dmesg:
> > > > >
> > > > > [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> > > > > local choice (Reason: 3=DEAUTH_LEAVING)
> > > > > [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > > 0
> > > > > [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > 7a:8a:20:d5:98:d7 tid 0
> > > > > [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > 7a:8a:20:d5:98:d7 tid 1
> > > > > [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > 7a:8a:20:d5:98:d7 tid 6
> > > > > [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> > > > > [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> > > > > [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> > > > > [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> > > > > from hardware (-11)
> > > > > [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> > > > > [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> > > > > [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> > > > > 0 addr 7a:8a:20:d5:98:d7 ret -11
> > > > > [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> > > > > 7a:8a:20:d5:98:d7 for VDEV: 0
> > > > > [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> > > > > 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> > > > > [  654.190574] ------------[ cut here ]------------
> > > > > [  654.190594] WARNING: CPU: 5 PID: 1208 at
> > > > > net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> > > > > [mac80211]
> > > > > [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> > > > > fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> > > > > typec_displayport uvcvideo
> > > > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> > > > > videodev mc hid_sensor_als hid_sensor_trigger
> > > > > industrialio_triggered_buffer kfifo_buf hid_se
> > > > > nsor_iio_common industrialio hid_sensor_hub intel_ishtp_loader joydev
> > > > > mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> > > > > qrtr_mhi iTCO_wdt intel_
> > > > > pmc_bxt 8250_dw watchdog mei_hdcp i2c_designware_platform
> > > > > i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> > > > > snd_sof_intel_ipc qrtr dell_wmi wmi_
> > > > > bmof ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> > > > > snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> > > > > snd_sof_intel_hda dell_wmi_descr
> > > > > iptor dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> > > > > dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> > > > > ghash_clmulni_intel snd_soc
> > > > > _acpi_intel_match aesni_intel snd_soc_acpi
> > > > > [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> > > > > cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> > > > > snd_soc_core intel_uncore
> > > > >  snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> > > > > input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> > > > > snd_intel_dspcfg cfg80
> > > > > 211 intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> > > > > nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> > > > > tpm_crb fat libarc4 sch_
> > > > > fq_codel intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> > > > > intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> > > > > virt_dma tpm typec_ucsi
> > > > > intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> > > > > rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> > > > > snd_timer snd i2c_hid sound
> > > > > core hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> > > > > int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> > > > > pinctrl_tigerlake intel_pmc_core acpi_
> > > > > tad ac acpi_pad loop cpufreq_powersave tun tap
> > > > > [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> > > > > efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> > > > > jbd2 xhci_pci xhci_pci_ren
> > > > > esas rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> > > > > nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> > > > > crct10dif_generic crct10dif_pclmu
> > > > > l usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> > > > > intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> > > > > sysimgblt fb_sys_fops dr
> > > > > m i2c_core backlight agpgart
> > > > > [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> > > > > W I       5.10.0-rc4 #1-NixOS
> > > > > [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> > > > > 10/05/2020
> > > > > [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> > > > > [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> > > > > 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> > > > > ff ff ff <0f> 0b e9 5
> > > > > d ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> > > > > [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> > > > > [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> > > > > 0000000000000000
> > > > > [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> > > > > ffff9d54d629b5d8
> > > > > [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> > > > > ffffffffc1245800
> > > > > [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> > > > > ffff9d54d6298800
> > > > > [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> > > > > ffff9d54d6298de0
> > > > > [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> > > > > 0000000000770ee0
> > > > > [  654.190847] PKRU: 55555554
> > > > > [  654.190848] Call Trace:
> > > > > [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> > > > > [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> > > > > [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> > > > > [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> > > > > [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > > > > [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> > > > > [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> > > > > [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> > > > > [  654.190984]  raw_notifier_call_chain+0x44/0x60
> > > > > [  654.190991]  __dev_close_many+0x5f/0x110
> > > > > [  654.190995]  dev_close_many+0x81/0x130
> > > > > [  654.190999]  dev_close.part.0+0x3e/0x70
> > > > > [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > > > > [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > > > > [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> > > > > [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > > > > [  654.191032]  vfs_write+0xc7/0x280
> > > > > [  654.191035]  ksys_write+0xa7/0xe0
> > > > > [  654.191041]  do_syscall_64+0x33/0x40
> > > > > [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > [  654.191048] RIP: 0033:0x7f1bf93906f7
> > > > > [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> > > > > 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> > > > > 00 0f 05 <48> 3d 00 f
> > > > > 0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> > > > > [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> > > > > 0000000000000001
> > > > > [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> > > > > [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> > > > > [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> > > > > [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > > > > [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> > > > > [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> > > > > [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> > > > > [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> > > > > [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> > > > > [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> > > > > [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> > > > > [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> > > > > [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> > > > > [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> > > > > cmd
> > > > > [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> > > > > [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> > > > > [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > > 0
> > > > > [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> > > > > [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> > > > > WMI_VDEV_DELETE_CMDID
> > > > > [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> > > > > -11
> > > > > [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > > [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_PDEV_SET_PARAM cmd
> > > > > [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > > > [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > > [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> > > > > WMI_PDEV_SET_PARAM cmd
> > > > > [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > > >
> > > > > Apologies for the long output, hopefully something here is useful.
> > > > >
> > > > > I haven't had my whole system freeze yet like I did prior to these
> > > > > patches, however I've only been running these patches for a few hours
> > > > > so far, currently on my third boot.
> > > > >
> > > > > You can find the nix configuration I'm working on for the xps 9310
> > > > > that includes the new patches here:
> > > > >
> > > > > https://github.com/NixOS/nixos-hardware/pull/207
> > > > >
> > > > > On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > > > > >
> > > > > > Kalle Valo <kvalo@codeaurora.org> writes:
> > > > > >
> > > > > > > (Bcc: people reporting qca6390 problems)
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > > > > > there's a good baseline for all testing:
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > > > > > >
> > > > > > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > > > > > recent -rc release every few weeks or so. Everytime I update the branch
> > > > > > > I create a new tag and the latest tag is now:
> > > > > > >
> > > > > > > ath11k-qca6390-bringup-202011191920
> > > > > > >
> > > > > > > In this tag there's now a brand new implementation for suspend, which
> > > > > > > relies that the platform provides power to QCA6390 during suspend. Not
> > > > > > > all platforms do, but most of them should do that. ath11k also prints a
> > > > > > > warning whenever it notices that the firmware has crashed, but I'm not
> > > > > > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > > > > > >
> > > > > > > The MSI patch is mostly the same, it had just some refactoring since the
> > > > > > > last version. Unfortunately there's no solution still for the weird
> > > > > > > crashes some people are seeing.
> > > > > >
> > > > > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > > > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > > > > CONFIG_DYNAMIC_DEBUG and run:
> > > > > >
> > > > > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> > > > > >
> > > > > > --
> > > > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > > > >
> > > > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > > >
> > > > --
> > > > ath11k mailing list
> > > > ath11k@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/ath11k
> > >
> > > So after your message I was wracking my brain to sort out any
> > > differences between our configs.  I did disable vt / vt-d and that
> > > seems to have increased the stability of things some, but I still see
> > > occasional hangs on initialization / association.
> >
> > Good morning,
> >
> >   As I've been bouncing around reading up on the current kernel
> > internals + the single MSI patch trying to get to a point where I can
> > dive into this deeply, I think I may have found part of the racing.
> > When I check /proc/interrupts to find the driver I see:
> >
> > 194:          0          0          0          0          0          0
> >       7111          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3,
> > ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> > DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> > DP_EXT_IRQ, bhi, mhi, mhi
> >
> > Looking at the patch, there are 4 places where IRQs are being
> > requested, 2 in the MHI code (bhi/mhi), and 2 in the ath11k PCI code
> > (ce* and DP_EXT_IRQ).  The patch changes the calls to
> > request_(threaded)_irq and modifies the flags to add IRQF_SHARED which
> > is allowing these irq handlers to mount this single available IRQ.
> > Each handler is accepting the dev_id parameter as a void * which then
> > gets cast into a relevant data structure for the handler and used /
> > accessed.  My understanding from the reading I did is that since the
> > IRQ is now shared, each of these handlers needs to ensure/detect that
> > the dev_id is actually relevant for it to handle it, and if not return
> > IRQ_NONE.  Is it possible the wrong handlers are being
> > invoked/executing occasionally and casting/accessing things
> > incorrectly, or did I misunderstand how the IRQ sharing works?  If I
> > am reading that correctly, does that also have implications for the
> > disabling/enabling of the IRQ everywhere?
>
> I went ahead and hacked a quick patch together that implements the
> dev_id checking per interrupt handler and that seems to have fixed the
> freezes without any indication.  Now reliably if things are going to
> crash, I'll receive the RT throttling message from the scheduler and
> then things will completely hang about a number of seconds later.  I
> added the instrumentation to enable the verbose MHI printing, and it
> seems the mhi_intvec_threaded_handler is printing some additional
> information.  First, if things are behaving nominally, I see the state
> transitions from m0 -> m1 -> m2 and then things stay mostly in m2 (I
> can't say for 100%, it's quite fast).  However when things are
> crashing, this printing is showing it.
>
> [  312.xxx] mhi 0000:55:00.0: local ee:AMSS device ee:AMSS dev_state:M2
> [  313.024033] mhi 0000:55:00.0: local ee:INVALID_EE device
> ee:INVALID_EE dev_state:SYS_ERR
> [  313.024033] mhi 0000:55:00.0: System error detected
>
> I'll see the last 2 prints repeat a 5-6 times, then comes the throttling:
>
> [  313.124033] sched: RT throttling activated
>
> then a couple more attempts to reset the state of things, then the
> machine will hang with the fans spinning fully.

Sorry, I found one more thing in my notes that I wanted to mention.  In
mhi_process_ctrl_ev_ring() in drivers/bus/mhi/core/main.c there is a
switch handling the different event types, one of those being
MHI_PKT_TYPE_STATE_CHANGE_EVENT.  When that occurs, this prints:

dev_dbg(dev, "State change event to state: %s\n",
    TO_MHI_STATE_STR(new_state));

I never see a transition from M1 -> M2 printed here; the ee just
updates from under the mhi_intvec_threaded_handler printing.  I'm not
sure if that means this event is somehow being missed, or if it simply
doesn't fire for this transition?



* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-23 23:38             ` wi nk
@ 2020-11-26 22:45               ` wi nk
  0 siblings, 0 replies; 19+ messages in thread
From: wi nk @ 2020-11-26 22:45 UTC (permalink / raw)
  To: Mitchell Nordine, Carl Huang; +Cc: ath11k, Kalle Valo

Good evening all,

  I've had a bit more time to hack at this to try to sort out what's
going on.  I've narrowed down the racing a bit: it begins when
ath11k_pci_ext_irq_enable is called and NAPI is enabled.  If I
forcibly prevent it from being enabled, the adapter never fully
associates (as expected), but nothing crashes and the CE interrupt
handler still attempts to do its work.  I haven't sorted out much
beyond that as I'm still reading docs, but from what I can find,
having the CE and EXT handlers share this IRQ and enable / disable it
out from under each other seems like it could cause issues if it
happens at the wrong moment.  The AHB implementation (which I guess
the PCI version was ported from?) handles the IRQ management quite
differently.  Is that due to a difference in the hardware, or is it a
result of the porting?
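
To make the concern about the shared line concrete, here is a minimal
userspace sketch.  It is a toy model and not ath11k or kernel code:
struct shared_irq_line, line_disable, line_enable, line_delivers and
demo_sequence are all hypothetical names.  The only kernel behaviour it
assumes is that disable_irq()/enable_irq() calls nest, and that
disabling a shared line masks it for every handler registered on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of ONE interrupt line shared by several handlers
 * (think CE and DP_EXT_IRQ on the single MSI vector).
 * All names here are hypothetical, for illustration only. */
struct shared_irq_line {
    int disable_depth;  /* 0 means the line is enabled */
};

/* Models a nested disable: every call must be balanced by an enable. */
static void line_disable(struct shared_irq_line *line)
{
    line->disable_depth++;
}

/* Models the matching enable; an unbalanced enable is ignored here. */
static void line_enable(struct shared_irq_line *line)
{
    if (line->disable_depth > 0)
        line->disable_depth--;
}

/* An interrupt from ANY source on the line is only delivered when
 * no user of the line currently has it disabled. */
static bool line_delivers(const struct shared_irq_line *line)
{
    return line->disable_depth == 0;
}

/* Walks through one interleaving of two users of the same line and
 * returns the final disable depth. */
static int demo_sequence(void)
{
    struct shared_irq_line line = { 0 };

    line_disable(&line);            /* CE path masks the line */
    assert(!line_delivers(&line));  /* EXT interrupts are masked too */

    line_disable(&line);            /* EXT path masks it as well */
    line_enable(&line);             /* CE path is done and unmasks */
    assert(!line_delivers(&line));  /* still masked because of EXT */

    line_enable(&line);             /* EXT path unmasks */
    assert(line_delivers(&line));   /* only now is the line live again */

    return line.disable_depth;
}
```

Calling demo_sequence() from a small main() runs the interleaving.  The
point of the model is only that with one line, a disable from either
path holds off the other path until every disable has been balanced.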

Thanks!

On Tue, Nov 24, 2020 at 12:38 AM wi nk <wink@technolu.st> wrote:
>
> On Tue, Nov 24, 2020 at 12:30 AM wi nk <wink@technolu.st> wrote:
> >
> > On Mon, Nov 23, 2020 at 4:14 AM wi nk <wink@technolu.st> wrote:
> > >
> > > On Sun, Nov 22, 2020 at 4:07 PM wi nk <wink@technolu.st> wrote:
> > > >
> > > > On Sun, Nov 22, 2020 at 2:15 PM Mitchell Nordine
> > > > <mitchell.nordine@gmail.com> wrote:
> > > > >
> > > > > > Unfortunately there's no solution still for the weird
> > > > > crashes some people are seeing.
> > > > >
> > > > > Can confirm, the spurious system freezing still continues. This time
> > > > > while typing my password into the gdm UI for login.
> > > > >
> > > > > On Sun, Nov 22, 2020 at 12:44 AM Mitchell Nordine
> > > > > <mitchell.nordine@gmail.com> wrote:
> > > > > >
> > > > > > Thanks for the update!
> > > > > >
> > > > > > I no longer notice any errors related to ath11k during boot of NixOS
> > > > > > on my XPS 13 9310 with these patches:
> > > > > >
> > > > > > [mindtree@mindtree:~]$ dmesg | grep -e ath11
> > > > > > [    4.084314] ath11k_pci 0000:56:00.0: WARNING: ath11k PCI support is
> > > > > > experimental!
> > > > > > [    4.084358] ath11k_pci 0000:56:00.0: BAR 0: assigned [mem
> > > > > > 0x8c300000-0x8c3fffff 64bit]
> > > > > > [    4.084377] ath11k_pci 0000:56:00.0: enabling device (0000 -> 0002)
> > > > > > [    4.084442] ath11k_pci 0000:56:00.0: MSI vectors: 1
> > > > > > [    4.320847] ath11k_pci 0000:56:00.0: qmi req mem_seg[0] 0x59c00000 3522560 1
> > > > > > [    4.320849] ath11k_pci 0000:56:00.0: qmi req mem_seg[1] 0x5a200000 884736 4
> > > > > > [    4.330816] ath11k_pci 0000:56:00.0: chip_id 0x0 chip_family 0xb
> > > > > > board_id 0xff soc_id 0xffffffff
> > > > > > [    4.330818] ath11k_pci 0000:56:00.0: fw_version 0x101c06cc
> > > > > > fw_build_timestamp 2020-06-24 19:50 fw_build_id
> > > > > > [    4.521522] ath11k_pci 0000:56:00.0 wlp86s0: renamed from wlan0
> > > > > >
> > > > > > Everything appears to run smoothly for the first 5-10 minutes, then
> > > > > > the firmware appears to crash and the internet drops out:
> > > > > >
> > > > > > [  293.677300] ath11k_pci 0000:56:00.0: firmware crashed:
> > > > > > MHI_CB_SYS_ERROR
> > > > > > [  385.774509] mhi 0000:56:00.0: Device failed to exit MHI Reset state
> > > > > >
> > > > > > I haven't yet been able to identify an action that consistently causes
> > > > > > the crash.
> > > > > >
> > > > > > Following the crash, the gnome shell appears to still believe that the
> > > > > > connection is up, however upon clicking on the wifi in the top-right
> > > > > > drop-down menu and clicking the "Turn Off" option, the shell freezes
> > > > > > for a few seconds and a few more errors show up in dmesg:
> > > > > >
> > > > > > [  634.018718] wlp86s0: deauthenticating from 7a:8a:20:d5:98:d7 by
> > > > > > local choice (Reason: 3=DEAUTH_LEAVING)
> > > > > > [  639.151611] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > > > 0
> > > > > > [  642.159384] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > > [  642.159388] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > > [  642.159394] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > > [  642.159400] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > > 7a:8a:20:d5:98:d7 tid 0
> > > > > > [  645.168070] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > > [  645.168072] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > > [  645.168074] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > > [  645.168077] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > > 7a:8a:20:d5:98:d7 tid 1
> > > > > > [  648.174960] ath11k_pci 0000:56:00.0: wmi command 24595 timeout
> > > > > > [  648.174965] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_PEER_REORDER_QUEUE_SETUP
> > > > > > [  648.174971] ath11k_pci 0000:56:00.0: failed to send wmi to delete rx tid -11
> > > > > > [  648.174976] wlp86s0: HW problem - can not stop rx aggregation for
> > > > > > 7a:8a:20:d5:98:d7 tid 6
> > > > > > [  651.183596] ath11k_pci 0000:56:00.0: wmi command 20489 timeout
> > > > > > [  651.183601] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_INSTALL_KEY cmd
> > > > > > [  651.183606] ath11k_pci 0000:56:00.0: ath11k_install_key failed (-11)
> > > > > > [  651.183610] wlp86s0: failed to remove key (0, 7a:8a:20:d5:98:d7)
> > > > > > from hardware (-11)
> > > > > > [  654.190511] ath11k_pci 0000:56:00.0: wmi command 24578 timeout
> > > > > > [  654.190516] ath11k_pci 0000:56:00.0: failed to send WMI_PEER_DELETE cmd
> > > > > > [  654.190523] ath11k_pci 0000:56:00.0: failed to delete peer vdev_id
> > > > > > 0 addr 7a:8a:20:d5:98:d7 ret -11
> > > > > > [  654.190526] ath11k_pci 0000:56:00.0: Failed to delete peer:
> > > > > > 7a:8a:20:d5:98:d7 for VDEV: 0
> > > > > > [  654.190528] ath11k_pci 0000:56:00.0: Found peer entry
> > > > > > 9c:b6:d0:3e:43:4a n vdev 0 after it was supposedly removed
> > > > > > [  654.190574] ------------[ cut here ]------------
> > > > > > [  654.190594] WARNING: CPU: 5 PID: 1208 at
> > > > > > net/mac80211/sta_info.c:1098 __sta_info_destroy_part2+0x11c/0x140
> > > > > > [mac80211]
> > > > > > [  654.190595] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath
> > > > > > fuse ctr ccm michael_mic af_packet cdc_ether usbnet r8152 mii
> > > > > > typec_displayport uvcvideo
> > > > > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common
> > > > > > videodev mc hid_sensor_als hid_sensor_trigger
> > > > > > industrialio_triggered_buffer kfifo_buf hid_se
> > > > > > nsor_iio_common industrialio hid_sensor_hub intel_ishtp_loader joydev
> > > > > > mousedev intel_ishtp_hid wacom usbhid hid_multitouch hid_generic
> > > > > > qrtr_mhi iTCO_wdt intel_
> > > > > > pmc_bxt 8250_dw watchdog mei_hdcp i2c_designware_platform
> > > > > > i2c_designware_core intel_rapl_msr snd_sof_pci snd_sof_intel_byt
> > > > > > snd_sof_intel_ipc qrtr dell_wmi wmi_
> > > > > > bmof ns snd_sof_intel_hda_common dell_laptop ath11k_pci
> > > > > > snd_soc_hdac_hda dell_smbios snd_sof_xtensa_dsp snd_hda_codec_hdmi mhi
> > > > > > snd_sof_intel_hda dell_wmi_descr
> > > > > > iptor dcdbas snd_sof ath11k x86_pkg_temp_thermal intel_powerclamp
> > > > > > dell_smm_hwmon qmi_helpers snd_hda_ext_core coretemp crc32_pclmul
> > > > > > ghash_clmulni_intel snd_soc
> > > > > > _acpi_intel_match aesni_intel snd_soc_acpi
> > > > > > [  654.190666]  snd_hda_codec_realtek libaes mac80211 crypto_simd
> > > > > > cryptd glue_helper snd_hda_codec_generic ledtrig_audio intel_cstate
> > > > > > snd_soc_core intel_uncore
> > > > > >  snd_compress sha256_ssse3 ac97_bus snd_pcm_dmaengine sha256_generic
> > > > > > input_leds led_class deflate snd_hda_intel intel_spi_pci efi_pstore
> > > > > > snd_intel_dspcfg cfg80
> > > > > > 211 intel_spi serio_raw pstore spi_nor snd_hda_codec mtd nls_iso8859_1
> > > > > > nls_cp437 snd_hda_core vfat i2c_i801 snd_hwdep i2c_smbus rfkill
> > > > > > tpm_crb fat libarc4 sch_
> > > > > > fq_codel intel_ish_ipc mei_me intel_lpss_pci tpm_tis intel_ishtp
> > > > > > intel_lpss tpm_tis_core mei ucsi_acpi idma64 processor_thermal_device
> > > > > > virt_dma tpm typec_ucsi
> > > > > > intel_rapl_common 8250_pci intel_soc_dts_iosf typec snd_pcm_oss
> > > > > > rng_core snd_mixer_oss tiny_power_button snd_pcm wmi battery button
> > > > > > snd_timer snd i2c_hid sound
> > > > > > core hid msr int3403_thermal evdev int340x_thermal_zone mac_hid
> > > > > > int3400_thermal acpi_thermal_rel intel_hid sparse_keymap
> > > > > > pinctrl_tigerlake intel_pmc_core acpi_
> > > > > > tad ac acpi_pad loop cpufreq_powersave tun tap
> > > > > > [  654.190754]  macvlan bridge stp llc kvm_intel kvm irqbypass
> > > > > > efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
> > > > > > jbd2 xhci_pci xhci_pci_ren
> > > > > > esas rtsx_pci_sdmmc xhci_hcd mmc_core atkbd libps2 usbcore thunderbolt
> > > > > > nvme nvme_core rtsx_pci crc32c_intel t10_pi crc_t10dif
> > > > > > crct10dif_generic crct10dif_pclmu
> > > > > > l usb_common crct10dif_common i8042 rtc_cmos serio dm_mod i915 video
> > > > > > intel_gtt i2c_algo_bit cec drm_kms_helper syscopyarea sysfillrect
> > > > > > sysimgblt fb_sys_fops dr
> > > > > > m i2c_core backlight agpgart
> > > > > > [  654.190811] CPU: 5 PID: 1208 Comm: NetworkManager Tainted: G
> > > > > > W I       5.10.0-rc4 #1-NixOS
> > > > > > [  654.190813] Hardware name: Dell Inc. XPS 13 9310/0F7M4C, BIOS 1.1.1
> > > > > > 10/05/2020
> > > > > > [  654.190825] RIP: 0010:__sta_info_destroy_part2+0x11c/0x140 [mac80211]
> > > > > > [  654.190829] Code: ff 0f 0b 80 bd 14 01 00 00 00 74 82 45 31 c0 b9
> > > > > > 01 00 00 00 48 89 ea 48 89 de 4c 89 e7 e8 ac ad ff ff 85 c0 0f 84 64
> > > > > > ff ff ff <0f> 0b e9 5
> > > > > > d ff ff ff be 03 00 00 00 48 89 ef e8 10 ea ff ff 85 c0
> > > > > > [  654.190831] RSP: 0018:ffffac81c0897b80 EFLAGS: 00010286
> > > > > > [  654.190834] RAX: 00000000fffffff5 RBX: ffff9d54d5800900 RCX:
> > > > > > 0000000000000000
> > > > > > [  654.190836] RDX: ffff9d54c3d0bf00 RSI: 000000000020001a RDI:
> > > > > > ffff9d54d629b5d8
> > > > > > [  654.190837] RBP: ffff9d54c778f000 R08: 0000000000000000 R09:
> > > > > > ffffffffc1245800
> > > > > > [  654.190838] R10: ffff9d54cde07800 R11: 0000000000000001 R12:
> > > > > > ffff9d54d6298800
> > > > > > [  654.190840] R13: ffff9d54d5800900 R14: 0000000000000001 R15:
> > > > > > ffff9d54d6298de0
> > > > > > [  654.190842] FS:  00007f1bf8509040(0000) GS:ffff9d5c2f740000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [  654.190844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [  654.190845] CR2: 00007f6d3cb34000 CR3: 0000000118d9a006 CR4:
> > > > > > 0000000000770ee0
> > > > > > [  654.190847] PKRU: 55555554
> > > > > > [  654.190848] Call Trace:
> > > > > > [  654.190866]  __sta_info_flush+0x123/0x180 [mac80211]
> > > > > > [  654.190885]  ieee80211_set_disassoc+0xba/0x5d0 [mac80211]
> > > > > > [  654.190902]  ieee80211_mgd_deauth.cold+0x49/0x1bf [mac80211]
> > > > > > [  654.190923]  cfg80211_mlme_deauth+0xb1/0x1b0 [cfg80211]
> > > > > > [  654.190939]  cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > > > > > [  654.190955]  cfg80211_disconnect+0x128/0x1b0 [cfg80211]
> > > > > > [  654.190967]  cfg80211_leave+0x27/0x40 [cfg80211]
> > > > > > [  654.190977]  cfg80211_netdev_notifier_call+0xec/0x440 [cfg80211]
> > > > > > [  654.190984]  raw_notifier_call_chain+0x44/0x60
> > > > > > [  654.190991]  __dev_close_many+0x5f/0x110
> > > > > > [  654.190995]  dev_close_many+0x81/0x130
> > > > > > [  654.190999]  dev_close.part.0+0x3e/0x70
> > > > > > [  654.191008]  cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > > > > > [  654.191017]  cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > > > > > [  654.191022]  rfkill_set_block+0x92/0x140 [rfkill]
> > > > > > [  654.191026]  rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > > > > > [  654.191032]  vfs_write+0xc7/0x280
> > > > > > [  654.191035]  ksys_write+0xa7/0xe0
> > > > > > [  654.191041]  do_syscall_64+0x33/0x40
> > > > > > [  654.191045]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > [  654.191048] RIP: 0033:0x7f1bf93906f7
> > > > > > [  654.191052] Code: 1f 40 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48
> > > > > > 83 ec 10 e8 fb fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00
> > > > > > 00 0f 05 <48> 3d 00 f
> > > > > > 0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
> > > > > > [  654.191053] RSP: 002b:00007ffc79f67e10 EFLAGS: 00000293 ORIG_RAX:
> > > > > > 0000000000000001
> > > > > > [  654.191056] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f1bf93906f7
> > > > > > [  654.191057] RDX: 0000000000000008 RSI: 00007ffc79f67e48 RDI: 000000000000001d
> > > > > > [  654.191059] RBP: 00007ffc79f67e48 R08: 0000000000000000 R09: 0000000000000001
> > > > > > [  654.191060] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> > > > > > [  654.191061] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000001b10c20
> > > > > > [  654.191075] ---[ end trace 4fd47da3698c4a9f ]---
> > > > > > [  657.198288] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > > [  657.198293] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > > [  657.198299] ath11k_pci 0000:56:00.0: Failed to set CTS prot for VDEV: 0
> > > > > > [  660.205991] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > > [  660.205995] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > > [  660.206000] ath11k_pci 0000:56:00.0: Failed to set erp slot for VDEV: 0
> > > > > > [  663.213835] ath11k_pci 0000:56:00.0: wmi command 20488 timeout
> > > > > > [  663.213840] ath11k_pci 0000:56:00.0: failed to send WMI_VDEV_SET_PARAM_CMDID
> > > > > > [  663.213846] ath11k_pci 0000:56:00.0: Failed to set preamble for VDEV: 0
> > > > > > [  666.221628] ath11k_pci 0000:56:00.0: wmi command 20487 timeout
> > > > > > [  666.221633] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_DOWN cmd
> > > > > > [  666.221639] ath11k_pci 0000:56:00.0: failed to down vdev 0: -11
> > > > > > [  669.229407] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > > [  669.229412] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > > [  669.229417] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > > [  672.237193] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > > [  672.237198] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > > [  672.237203] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > > [  675.244963] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > > [  675.244968] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > > [  675.244971] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > > [  678.252682] ath11k_pci 0000:56:00.0: wmi command 20493 timeout
> > > > > > [  678.252689] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_VDEV_SET_WMM_PARAMS_CMDID
> > > > > > [  678.252695] ath11k_pci 0000:56:00.0: failed to set wmm params: -11
> > > > > > [  681.260582] ath11k_pci 0000:56:00.0: wmi command 20486 timeout
> > > > > > [  681.260587] ath11k_pci 0000:56:00.0: failed to submit WMI_VDEV_STOP
> > > > > > cmd
> > > > > > [  681.260594] ath11k_pci 0000:56:00.0: failed to stop WMI vdev 0: -11
> > > > > > [  681.260596] ath11k_pci 0000:56:00.0: failed to stop vdev 0: -11
> > > > > > [  686.764099] ath11k_pci 0000:56:00.0: failed to flush transmit queue
> > > > > > 0
> > > > > > [  689.771891] ath11k_pci 0000:56:00.0: wmi command 20482 timeout
> > > > > > [  689.771897] ath11k_pci 0000:56:00.0: failed to submit
> > > > > > WMI_VDEV_DELETE_CMDID
> > > > > > [  689.771904] ath11k_pci 0000:56:00.0: failed to delete WMI vdev 0:
> > > > > > -11
> > > > > > [  719.529733] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > > > [  719.529740] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_PDEV_SET_PARAM cmd
> > > > > > [  719.529748] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > > > > [  722.793499] ath11k_pci 0000:56:00.0: wmi command 16387 timeout
> > > > > > [  722.793517] ath11k_pci 0000:56:00.0: failed to send
> > > > > > WMI_PDEV_SET_PARAM cmd
> > > > > > [  722.793524] ath11k_pci 0000:56:00.0: failed to enable PMF QOS: (-11
> > > > > >
> > > > > > Apologies for the long output, hopefully something here is useful.
> > > > > >
> > > > > > I haven't had my whole system freeze yet like I did prior to these
> > > > > > patches, however I've only been running these patches for a few hours
> > > > > > so far, currently on my third boot.
> > > > > >
> > > > > > You can find the nix configuration I'm working on for the xps 9310
> > > > > > that includes the new patches here:
> > > > > >
> > > > > > https://github.com/NixOS/nixos-hardware/pull/207
> > > > > >
> > > > > > On Thu, Nov 19, 2020 at 8:52 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > > > > > >
> > > > > > > Kalle Valo <kvalo@codeaurora.org> writes:
> > > > > > >
> > > > > > > > (Bcc: people reporting qca6390 problems)
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I collected all important QCA6390 fixes to ath11k-qca6390 branch so that
> > > > > > > > there's a good baseline for all testing:
> > > > > > > >
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=ath11k-qca6390-bringup
> > > > > > > >
> > > > > > > > At the moment it's based on v5.10-rc4 and I will try to update it to a
> > > > > > > > recent -rc release every few weeks or so. Everytime I update the branch
> > > > > > > > I create a new tag and the latest tag is now:
> > > > > > > >
> > > > > > > > ath11k-qca6390-bringup-202011191920
> > > > > > > >
> > > > > > > > In this tag there's now a brand new implementation for suspend, which
> > > > > > > > relies that the platform provides power to QCA6390 during suspend. Not
> > > > > > > > all platforms do, but most of them should do that. ath11k also prints a
> > > > > > > > warning whenever it notices that the firmware has crashed, but I'm not
> > > > > > > > sure yet if it (the MHI subsystem to be exact) can detect every case.
> > > > > > > >
> > > > > > > > The MSI patch is mostly the same, it had just some refactoring since the
> > > > > > > > last version. Unfortunately there's no solution still for the weird
> > > > > > > > crashes some people are seeing.
> > > > > > >
> > > > > > > Forgot to mention when debugging ath11k PCI issues it's a good idea to
> > > > > > > enable MHI debug messages. To do that enable CONFIG_MHI_BUS_DEBUG and
> > > > > > > CONFIG_DYNAMIC_DEBUG and run:
> > > > > > >
> > > > > > > sudo sh -c "echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control"
> > > > > > >
> > > > > > > --
> > > > > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > > > > >
> > > > > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > > > >
> > > > > --
> > > > > ath11k mailing list
> > > > > ath11k@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/ath11k
> > > >
> > > > So after your message I was wracking my brain to sort out any
> > > > differences between our configs.  I did disable vt / vt-d and that
> > > > seems to have increased the stability of things some, but I still see
> > > > occasional hangs on initialization / association.
> > >
> > > Good morning,
> > >
> > >   As I've been bouncing around reading up on the current kernel
> > > internals + the single MSI patch trying to get to a point where I can
> > > dive into this deeply, I think I may have found part of the racing.
> > > When I check /proc/interrupts to find the driver I see:
> > >
> > > 194:          0          0          0          0          0          0
> > >       7111          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3,
> > > ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> > > DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
> > > DP_EXT_IRQ, bhi, mhi, mhi
> > >
> > > Looking at the patch, there are 4 places where IRQs are being
> > > requested, 2 in the MHI code (bhi/mhi), and 2 in the ath11k PCI code
> > > (ce* and DP_EXT_IRQ).  The patch changes the calls to
> > > request_(threaded)_irq and modifies the flags to add IRQF_SHARED, which
> > > allows these IRQ handlers to attach to the single available IRQ.
> > > Each handler is accepting the dev_id parameter as a void * which then
> > > gets cast into a relevant data structure for the handler and used /
> > > accessed.  My understanding from the reading I did is that since the
> > > IRQ is now shared, each of these handlers needs to ensure/detect that
> > > the dev_id is actually relevant for it to handle it, and if not return
> > > IRQ_NONE.  Is it possible the wrong handlers are being
> > > invoked/executing occasionally and casting/accessing things
> > > incorrectly, or did I misunderstand how the IRQ sharing works?  If I
> > > am reading that correctly, does that also have implications for the
> > > disabling/enabling of the IRQ everywhere?
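
For reference, here is a toy user-space model (Python, not kernel code) of how
the kernel's generic IRQ layer is generally understood to treat a shared line:
every handler registered on the line is called, each with the dev_id cookie it
passed at registration, and a handler returns IRQ_NONE when its own device has
nothing pending. FakeDevice, IrqAction and dispatch are hypothetical names used
only for illustration; real handlers would read a hardware status register
instead of a Python flag.

```python
from dataclasses import dataclass
from typing import Callable, List

# Same numeric values as the kernel's irqreturn_t constants.
IRQ_NONE, IRQ_HANDLED = 0, 1


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device with one interrupt-pending bit."""
    name: str
    pending: bool = False


@dataclass
class IrqAction:
    """One registration on the shared line: a handler plus its own dev_id."""
    handler: Callable[[int, FakeDevice], int]
    dev_id: FakeDevice


def handler(irq: int, dev_id: FakeDevice) -> int:
    # A shared handler inspects its *own* device and declines the interrupt
    # if that device did not raise it.
    if not dev_id.pending:
        return IRQ_NONE
    dev_id.pending = False  # acknowledge the interrupt
    return IRQ_HANDLED


def dispatch(irq: int, actions: List[IrqAction]) -> List[str]:
    """Call every action on the line, each with the dev_id it registered."""
    handled = []
    for action in actions:
        if action.handler(irq, action.dev_id) == IRQ_HANDLED:
            handled.append(action.dev_id.name)
    return handled


if __name__ == "__main__":
    ce0, mhi = FakeDevice("ce0"), FakeDevice("mhi")
    line = [IrqAction(handler, ce0), IrqAction(handler, mhi)]
    mhi.pending = True
    print(dispatch(194, line))
```

In this model a handler never receives another handler's dev_id; what it has to
decide is whether its own device is the source of the interrupt.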
> >
> > I went ahead and hacked a quick patch together that implements the
> > dev_id checking per interrupt handler, and that seems to have fixed the
> > freezes that came without any indication.  Now, reliably, if things are
> > going to crash, I'll receive the RT throttling message from the scheduler
> > and then things will completely hang a few seconds later.  I added the
> > instrumentation to enable the verbose MHI printing, and it seems the
> > mhi_intvec_threaded_handler is printing some additional information.
> > First, if things are behaving nominally, I see the state transitions from
> > M0 -> M1 -> M2 and then things stay mostly in M2 (I can't say for sure,
> > it's quite fast).  However, when things are crashing, this printing shows
> > it:
> >
> > [  312.xxx] mhi 0000:55:00.0: local ee:AMSS device ee:AMSS dev_state:M2
> > [  313.024033] mhi 0000:55:00.0: local ee:INVALID_EE device
> > ee:INVALID_EE dev_state:SYS_ERR
> > [  313.024033] mhi 0000:55:00.0: System error detected
> >
> > I'll see the last 2 prints repeat 5-6 times, then comes the throttling:
> >
> > [  313.124033] sched: RT throttling activated
> >
> > then a couple more attempts to reset the state of things, then the
> > machine will hang with the fans spinning fully.
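
A small, hypothetical helper for spotting this pattern in a saved log might
look like the sketch below. It only assumes the "local ee:... device ee:...
dev_state:..." layout shown in the lines above; the function names are made up
for illustration, and the sample text is an unquoted copy of those lines,
including the mail client's line wrap.

```python
import re
from typing import List, Optional, Tuple

# Matches the MHI debug lines quoted above, e.g.
#   mhi 0000:55:00.0: local ee:AMSS device ee:AMSS dev_state:M2
STATE_RE = re.compile(
    r"local ee:(?P<local>\S+)\s+device\s+ee:(?P<device>\S+)"
    r"\s+dev_state:(?P<state>\S+)"
)
QUOTE_RE = re.compile(r"^(?:>\s?)+")  # leading mail quote markers


def mhi_states(log: str) -> List[Tuple[str, str, str]]:
    """Return (local ee, device ee, dev_state) tuples in log order."""
    # Drop quote markers and re-join lines that were wrapped in transit.
    flat = " ".join(QUOTE_RE.sub("", line).strip() for line in log.splitlines())
    return [(m["local"], m["device"], m["state"]) for m in STATE_RE.finditer(flat)]


def first_sys_err(states: List[Tuple[str, str, str]]) -> Optional[int]:
    """Index of the first SYS_ERR entry, or None if there is none."""
    for i, (_, _, state) in enumerate(states):
        if state == "SYS_ERR":
            return i
    return None


if __name__ == "__main__":
    sample = (
        "[  312.xxx] mhi 0000:55:00.0: local ee:AMSS device ee:AMSS dev_state:M2\n"
        "[  313.024033] mhi 0000:55:00.0: local ee:INVALID_EE device\n"
        "ee:INVALID_EE dev_state:SYS_ERR\n"
        "[  313.024033] mhi 0000:55:00.0: System error detected\n"
    )
    states = mhi_states(sample)
    print(states, first_sys_err(states))
```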
>
> Sorry I found one more thing in my notes I wanted to mention.  In
> drivers/bus/mhi/core/main.c , mhi_process_ctrl_ev_ring , there is a
> switch handling different event types, one of those being
> MHI_PKT_TYPE_STATE_CHANGE_EVENT.  When that occurs this prints:
>
> dev_dbg(dev, "State change event to state: %s\n",
>     TO_MHI_STATE_STR(new_state));
>
> I never see a printed transition here from M1 -> M2 , the ee just
> updates from under the mhi_intvec_threaded_handler printing.  I'm not
> sure if that means somehow this event is being missed or it doesn't
> fire for this transition?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: ath11k-qca6390-bringup-202011191920: new suspend implementation
  2020-11-19 19:48 ath11k-qca6390-bringup-202011191920: new suspend implementation Kalle Valo
  2020-11-19 19:52 ` Kalle Valo
@ 2020-12-02 18:51 ` Pavel Procopiuc
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Procopiuc @ 2020-12-02 18:51 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath11k

> All feedback very welcome, both positive and negative. In the reports
> try to include at least:
> 
> * name of the tag

ath11k-qca6390-bringup-202011301608 cleanly rebased on top of the vanilla 5.10.0-rc6

> * if there are other changes in the kernel, reverts etc

Only the kernel argument memmap=20M$12M is used.

> * make and model of the laptop/computer (eg. Dell XPS 9310)

Dell XPS 17 (9700) with i9 CPU (only those come with the QCA6390 wifi module), BIOS version: 1.5.0

Everything works perfectly, suspend/resume works too.

Without the aforementioned memmap kernel argument the wireless card firmware doesn't load:

Dec 02 19:37:35 razor kernel: Linux version 5.10.0-rc6 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #6 SMP Wed Dec 2 19:33:41 CET 2020
Dec 02 19:37:35 razor kernel:   DMA zone: 64 pages used for memmap
Dec 02 19:37:35 razor kernel:   DMA32 zone: 5213 pages used for memmap
Dec 02 19:37:35 razor kernel:   Normal zone: 255840 pages used for memmap
Dec 02 19:37:35 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Dec 02 19:37:35 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Dec 02 19:37:35 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Dec 02 19:37:35 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Dec 02 19:37:35 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: MSI vectors: 32
Dec 02 19:37:36 razor kernel: mhi 0000:05:00.0: Requested to power ON
Dec 02 19:37:36 razor kernel: mhi 0000:05:00.0: Power on setup success
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[0] 0x1800000 3522560 1
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: qmi req mem_seg[1] 0x1500000 884736 4
Dec 02 19:37:36 razor kernel: ath11k_pci 0000:05:00.0: firmware crashed: MHI_CB_SYS_ERROR
Dec 02 19:37:42 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
Dec 02 19:37:42 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
Dec 02 19:39:07 razor kernel: mhi 0000:05:00.0: Device failed to exit MHI Reset state
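
For readers unfamiliar with the argument: the kernel's documented
memmap=nn[KMG]$ss[KMG] form marks the region from ss to ss+nn as reserved. The
sketch below works that arithmetic out for the value used here. It is a
minimal user-space illustration with a hypothetical function name, and it only
handles decimal sizes with an optional K/M/G suffix, which is narrower than
what the kernel's own parser accepts.

```python
import re
from typing import Tuple

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _size(text: str) -> int:
    """Parse a decimal size with an optional K/M/G suffix (sketch only)."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if m is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def reserved_range(arg: str) -> Tuple[int, int]:
    """Map 'memmap=nn$ss' to (start, end), with end exclusive."""
    prefix = "memmap="
    if not arg.startswith(prefix) or "$" not in arg:
        raise ValueError(f"not a memmap=nn$ss argument: {arg!r}")
    size, start = arg[len(prefix):].split("$", 1)
    base = _size(start)
    return base, base + _size(size)


if __name__ == "__main__":
    start, end = reserved_range("memmap=20M$12M")
    # 20 MiB reserved starting at 12 MiB: 0xc00000 up to 0x2000000.
    print(hex(start), hex(end))
```

When the argument is passed through a boot loader configuration file, the
dollar sign may need escaping so that it reaches the kernel unchanged.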

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-12-02 18:51 UTC | newest]

Thread overview: 19+ messages
2020-11-19 19:48 ath11k-qca6390-bringup-202011191920: new suspend implementation Kalle Valo
2020-11-19 19:52 ` Kalle Valo
2020-11-19 22:00   ` Pavel Procopiuc
2020-11-19 22:11     ` wi nk
2020-11-19 23:08       ` wi nk
2020-11-20 16:01         ` Kalle Valo
2020-11-20 16:59           ` wi nk
2020-11-20  8:16       ` Pavel Procopiuc
2020-11-20  9:40     ` Kalle Valo
2020-11-20 10:24       ` Pavel Procopiuc
2020-11-20 15:56         ` Kalle Valo
2020-11-21 23:44   ` Mitchell Nordine
2020-11-22 13:15     ` Mitchell Nordine
2020-11-22 15:07       ` wi nk
2020-11-23  3:14         ` wi nk
2020-11-23 23:30           ` wi nk
2020-11-23 23:38             ` wi nk
2020-11-26 22:45               ` wi nk
2020-12-02 18:51 ` Pavel Procopiuc
