regressions.lists.linux.dev archive mirror
* [regression] mhi: ath11k resume fails on some devices
@ 2021-09-16  8:00 Kalle Valo
  2021-09-16 10:18 ` Loic Poulain
  2021-09-25 17:40 ` Thorsten Leemhuis
  0 siblings, 2 replies; 24+ messages in thread
From: Kalle Valo @ 2021-09-16  8:00 UTC
  To: Loic Poulain, Manivannan Sadhasivam
  Cc: ath11k, linux-wireless, linux-arm-msm, regressions

Hi Loic and Mani,

I hate to be the bearer of bad news again :)

I noticed a while ago that commit 020d3b26c07a ("bus: mhi: Early
MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough,
the Dell XPS 13 9310 laptop (also with QCA6390) does not have this
problem; I only see it on the NUC. I do not know what is causing
this difference.

At the moment I'm running my tests with commit 020d3b26c07a reverted and
everything works without problems. Is there a simple way to fix this? Or
maybe we should just revert the commit? Commit log and kernel logs from
a failing case below.

Kalle

commit 020d3b26c07abe274ac17f64999bbd3bf3342195
Author:     Loic Poulain <loic.poulain@linaro.org>
AuthorDate: Fri Mar 5 17:14:01 2021 +0100
Commit:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
CommitDate: Wed Mar 10 20:11:22 2021 +0530

    bus: mhi: Early MHI resume failure in non M3 state
    
    MHI suspend/resume are symmetric and balanced procedures. If device is
    not in M3 state on a resume, that means something happened behind our
    back. In this case resume is aborted and error reported, to let the
    controller handle the situation.
    
    This is mainly requested for system wide suspend-resume operation in
    PCI context which may lead to power-down/reset of the controller which
    will then lose its MHI context. In such cases, PCI driver is supposed
    to recover and reinitialize the device.
    
    Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
    Reviewed-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
    Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
    Link: https://lore.kernel.org/r/1614960841-20233-1-git-send-email-loic.poulain@linaro.org
    Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
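
For readers skimming the thread, the behaviour this commit message describes
can be modelled in a few lines. The following is only a toy userspace sketch
in Python, not the kernel code: `MhiState`, `mhi_resume` and the `-EINVAL`
return value are assumptions made for illustration.

```python
import errno
from enum import Enum


class MhiState(Enum):
    """Tiny subset of device states, named after the M-states in the commit text."""
    M0 = "M0"        # device active
    M3 = "M3"        # device suspended
    RESET = "RESET"  # device lost its MHI context (e.g. after a power cut)


def mhi_resume(state_at_resume: MhiState) -> int:
    """Hypothetical model of the early check: resume only proceeds from M3.

    Returns 0 on success, or a negative errno (assumed here to be -EINVAL)
    so that the controller driver can decide how to recover.
    """
    if state_at_resume is not MhiState.M3:
        # Something happened behind our back: abort and report.
        return -errno.EINVAL
    return 0


if __name__ == "__main__":
    print(mhi_resume(MhiState.M3))     # balanced suspend/resume
    print(mhi_resume(MhiState.RESET))  # context lost during suspend
```

In this model a device that kept its context resumes normally, while a device
that was power-cycled during suspend makes the resume call fail early instead
of proceeding on stale state.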

[  267.182376] ACPI: PM: Waking up from system sleep state S3
[  268.192783] ACPI: EC: interrupt unblocked
[  268.193023] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
[  268.204389] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
[  268.204391] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
[  268.205227] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
[  269.360336] ACPI: EC: event unblocked
[  269.367187] usb usb3: root hub lost power or was reset
[  269.367215] usb usb4: root hub lost power or was reset
[  269.368584] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
[  269.455966] nvme nvme0: 8/0/0 default/read/poll queues
[  272.289737] igb 0000:05:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  272.424084] ath11k_pci 0000:06:00.0: timed out while waiting for wow wakeup completion
[  272.424091] ath11k_pci 0000:06:00.0: failed to wakeup wow during resume: -110
[  272.424096] ath11k_pci 0000:06:00.0: failed to resume core: -110
[  272.424101] PM: dpm_run_callback(): pci_pm_resume+0x0/0x2d0 returns -110
[  272.424119] ath11k_pci 0000:06:00.0: PM: failed to resume async: error -110
[  275.432003] ath11k_pci 0000:06:00.0: wmi command 16387 timeout
[  275.432034] ath11k_pci 0000:06:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[  275.432088] ath11k_pci 0000:06:00.0: failed to enable PMF QOS: (-11
[  275.432094] ------------[ cut here ]------------
[  275.432114] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[  275.432144] WARNING: CPU: 3 PID: 3164 at net/mac80211/util.c:2361 ieee80211_reconfig+0x216/0x22a0 [mac80211]
[  275.432225] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[  275.432287] CPU: 3 PID: 3164 Comm: kworker/u16:20 Not tainted 5.15.0-rc1 #483
[  275.432293] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[  275.432298] Workqueue: events_unbound async_run_entry_fn
[  275.432307] RIP: 0010:ieee80211_reconfig+0x216/0x22a0 [mac80211]
[  275.432381] Code: c0 0f 85 4b 1f 00 00 41 c6 87 7c 08 00 00 00 4c 89 ff e8 ed 41 f1 ff 41 89 c5 85 c0 74 13 48 c7 c7 40 bc 7e c0 e8 ef 63 07 e4 <0f> 0b e9 12 ff ff ff 88 5c 24 37 49 8d 47 40 48 89 c2 48 89 44 24
[  275.432386] RSP: 0000:ffffc90002bc7ab0 EFLAGS: 00010286
[  275.432394] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  275.432399] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578f48
[  275.432403] RBP: ffff88810890169a R08: 0000000000000001 R09: ffff888234fe581b
[  275.432408] R10: ffffed10469fcb03 R11: 0000000000000001 R12: ffff88810890169e
[  275.432412] R13: 00000000fffffff5 R14: 0000000000000000 R15: ffff888108900e20
[  275.432417] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[  275.432421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  275.432426] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[  275.432430] Call Trace:
[  275.432443]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[  275.432515]  wiphy_resume+0x190/0x370 [cfg80211]
[  275.432574]  ? trace_device_pm_callback_start+0x123/0x1b0
[  275.432584]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[  275.432642]  dpm_run_callback+0xf4/0x1b0
[  275.432650]  ? trace_device_pm_callback_end+0x1a0/0x1a0
[  275.432658]  ? device_links_read_unlock+0x1b/0x30
[  275.432665]  ? dpm_wait_for_superior+0x256/0x430
[  275.432679]  device_resume+0x3d5/0x980
[  275.432688]  ? dpm_run_callback+0x1b0/0x1b0
[  275.432693]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[  275.432701]  ? ktime_get+0x214/0x2f0
[  275.432707]  ? trace_hardirqs_on+0x1c/0x120
[  275.432715]  ? recalibrate_cpu_khz+0x10/0x10
[  275.432726]  ? device_resume+0x980/0x980
[  275.432732]  async_resume+0x14/0x30
[  275.432738]  async_run_entry_fn+0x90/0x4f0
[  275.432750]  process_one_work+0x866/0x1460
[  275.432768]  ? pwq_dec_nr_in_flight+0x230/0x230
[  275.432787]  ? worker_thread+0x152/0x1010
[  275.432798]  worker_thread+0x596/0x1010
[  275.432818]  ? process_one_work+0x1460/0x1460
[  275.432828]  kthread+0x322/0x3e0
[  275.432833]  ? _raw_spin_unlock_irq+0x1f/0x30
[  275.432838]  ? set_kthread_struct+0x100/0x100
[  275.432848]  ret_from_fork+0x22/0x30
[  275.432872] irq event stamp: 977
[  275.432876] hardirqs last  enabled at (985): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[  275.432882] hardirqs last disabled at (992): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[  275.432888] softirqs last  enabled at (402): [<ffffffffc095f878>] ath11k_htc_send+0x668/0xc10 [ath11k]
[  275.432914] softirqs last disabled at (400): [<ffffffffc095f797>] ath11k_htc_send+0x587/0xc10 [ath11k]
[  275.432937] ---[ end trace 88fd8120acef327c ]---
[  275.433884] ------------[ cut here ]------------
[  275.433888] wlan0: Failed check-sdata-in-driver check, flags: 0x4
[  275.433917] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:97 drv_remove_interface+0x2cb/0x330 [mac80211]
[  275.434008] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[  275.434068] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G        W         5.15.0-rc1 #483
[  275.434074] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[  275.434079] Workqueue: events_unbound async_run_entry_fn
[  275.434087] RIP: 0010:drv_remove_interface+0x2cb/0x330 [mac80211]
[  275.434154] Code: c1 e9 03 80 3c 01 00 75 72 48 8b 83 88 06 00 00 48 8d b3 a8 06 00 00 48 c7 c7 60 2a 7e c0 48 85 c0 48 0f 45 f0 e8 8a 12 16 e4 <0f> 0b eb 90 e8 6c a8 23 e2 e9 e9 fd ff ff e8 62 a8 23 e2 e9 06 fe
[  275.434159] RSP: 0000:ffffc90002bc7788 EFLAGS: 00010282
[  275.434167] RAX: 0000000000000000 RBX: ffff8881735a0c40 RCX: 0000000000000000
[  275.434171] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578ee3
[  275.434176] RBP: ffff888108900e20 R08: 0000000000000001 R09: ffff888234fe581b
[  275.434180] R10: ffffed10469fcb03 R11: 0000000000000001 R12: dffffc0000000000
[  275.434184] R13: ffff8881735a12d8 R14: ffff888108901568 R15: 000000000000000f
[  275.434189] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[  275.434194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  275.434198] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[  275.434203] Call Trace:
[  275.434212]  ieee80211_do_stop+0xe27/0x1a20 [mac80211]
[  275.434291]  ? mutex_lock_io_nested+0x1490/0x1490
[  275.434303]  ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
[  275.434370]  ? mark_held_locks+0xa5/0xe0
[  275.434382]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[  275.434389]  ? __local_bh_enable_ip+0x9d/0xf0
[  275.434394]  ? trace_hardirqs_on+0x1c/0x120
[  275.434410]  ieee80211_stop+0xb2/0x230 [mac80211]
[  275.434484]  __dev_close_many+0x191/0x2a0
[  275.434491]  ? netif_tx_stop_all_queues+0xf0/0xf0
[  275.434496]  ? find_held_lock+0x33/0x110
[  275.434507]  ? __lock_release+0x494/0xa40
[  275.434518]  dev_close_many+0x1c5/0x540
[  275.434527]  ? wait_for_completion_io+0x280/0x280
[  275.434535]  ? dev_get_by_napi_id+0x110/0x110
[  275.434544]  ? wiphy_resume+0x1a5/0x370 [cfg80211]
[  275.434610]  dev_close+0x132/0x1d0
[  275.434617]  ? dev_xdp_attach.constprop.0+0x750/0x750
[  275.434633]  cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
[  275.434697]  wiphy_resume+0x1b2/0x370 [cfg80211]
[  275.434755]  ? trace_device_pm_callback_start+0x123/0x1b0
[  275.434765]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[  275.434822]  dpm_run_callback+0xf4/0x1b0
[  275.434830]  ? trace_device_pm_callback_end+0x1a0/0x1a0
[  275.434839]  ? device_links_read_unlock+0x1b/0x30
[  275.434845]  ? dpm_wait_for_superior+0x256/0x430
[  275.434859]  device_resume+0x3d5/0x980
[  275.434868]  ? dpm_run_callback+0x1b0/0x1b0
[  275.434873]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[  275.434880]  ? ktime_get+0x214/0x2f0
[  275.434886]  ? trace_hardirqs_on+0x1c/0x120
[  275.434893]  ? recalibrate_cpu_khz+0x10/0x10
[  275.434904]  ? device_resume+0x980/0x980
[  275.434910]  async_resume+0x14/0x30
[  275.434916]  async_run_entry_fn+0x90/0x4f0
[  275.434928]  process_one_work+0x866/0x1460
[  275.434946]  ? pwq_dec_nr_in_flight+0x230/0x230
[  275.434965]  ? worker_thread+0x152/0x1010
[  275.434992]  worker_thread+0x596/0x1010
[  275.435013]  ? process_one_work+0x1460/0x1460
[  275.435022]  kthread+0x322/0x3e0
[  275.435027]  ? _raw_spin_unlock_irq+0x1f/0x30
[  275.435032]  ? set_kthread_struct+0x100/0x100
[  275.435042]  ret_from_fork+0x22/0x30
[  275.435065] irq event stamp: 1923
[  275.435069] hardirqs last  enabled at (1931): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[  275.435076] hardirqs last disabled at (1938): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[  275.435082] softirqs last  enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
[  275.435087] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
[  275.435093] ---[ end trace 88fd8120acef327d ]---
[  275.435126] ------------[ cut here ]------------
[  275.435130] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:36 drv_stop+0x290/0x310 [mac80211]
[  275.435197] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[  275.435256] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G        W         5.15.0-rc1 #483
[  275.435261] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[  275.435265] Workqueue: events_unbound async_run_entry_fn
[  275.435274] RIP: 0010:drv_stop+0x290/0x310 [mac80211]
[  275.435339] Code: 80 3d 5f f1 29 00 00 75 e2 48 c7 c2 c0 29 7e c0 be 34 01 00 00 48 c7 c7 20 2a 7e c0 c6 05 43 f1 29 00 01 e8 af 64 16 e4 eb c1 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 0b e9 d3 fd ff ff 48 89 ef e8 18 b2
[  275.435344] RSP: 0000:ffffc90002bc7790 EFLAGS: 00010246
[  275.435352] RAX: 0000000000000000 RBX: ffff888108900e20 RCX: 0000000000000001
[  275.435356] RDX: 0000000000000004 RSI: ffffffffa5a021a0 RDI: ffff888145778920
[  275.435360] RBP: ffff88810890169c R08: 0000000000000001 R09: ffffc90002bc757f
[  275.435365] R10: ffffc90002bc77a8 R11: 0000000000000001 R12: dffffc0000000000
[  275.435369] R13: ffff888108900e20 R14: ffff888108901568 R15: 000000000000000f
[  275.435373] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[  275.435378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  275.435382] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[  275.435387] Call Trace:
[  275.435394]  ieee80211_do_stop+0x11dd/0x1a20 [mac80211]
[  275.435472]  ? mutex_lock_io_nested+0x1490/0x1490
[  275.435484]  ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
[  275.435551]  ? mark_held_locks+0xa5/0xe0
[  275.435562]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[  275.435569]  ? __local_bh_enable_ip+0x9d/0xf0
[  275.435574]  ? trace_hardirqs_on+0x1c/0x120
[  275.435590]  ieee80211_stop+0xb2/0x230 [mac80211]
[  275.435663]  __dev_close_many+0x191/0x2a0
[  275.435670]  ? netif_tx_stop_all_queues+0xf0/0xf0
[  275.435675]  ? find_held_lock+0x33/0x110
[  275.435686]  ? __lock_release+0x494/0xa40
[  275.435697]  dev_close_many+0x1c5/0x540
[  275.435706]  ? wait_for_completion_io+0x280/0x280
[  275.435713]  ? dev_get_by_napi_id+0x110/0x110
[  275.435723]  ? wiphy_resume+0x1a5/0x370 [cfg80211]
[  275.435790]  dev_close+0x132/0x1d0
[  275.435797]  ? dev_xdp_attach.constprop.0+0x750/0x750
[  275.435813]  cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
[  275.435876]  wiphy_resume+0x1b2/0x370 [cfg80211]
[  275.435935]  ? trace_device_pm_callback_start+0x123/0x1b0
[  275.435944]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[  275.436018]  dpm_run_callback+0xf4/0x1b0
[  275.436026]  ? trace_device_pm_callback_end+0x1a0/0x1a0
[  275.436035]  ? device_links_read_unlock+0x1b/0x30
[  275.436041]  ? dpm_wait_for_superior+0x256/0x430
[  275.436055]  device_resume+0x3d5/0x980
[  275.436064]  ? dpm_run_callback+0x1b0/0x1b0
[  275.436069]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[  275.436076]  ? ktime_get+0x214/0x2f0
[  275.436082]  ? trace_hardirqs_on+0x1c/0x120
[  275.436089]  ? recalibrate_cpu_khz+0x10/0x10
[  275.436100]  ? device_resume+0x980/0x980
[  275.436106]  async_resume+0x14/0x30
[  275.436112]  async_run_entry_fn+0x90/0x4f0
[  275.436124]  process_one_work+0x866/0x1460
[  275.436142]  ? pwq_dec_nr_in_flight+0x230/0x230
[  275.436161]  ? worker_thread+0x152/0x1010
[  275.436172]  worker_thread+0x596/0x1010
[  275.436191]  ? process_one_work+0x1460/0x1460
[  275.436201]  kthread+0x322/0x3e0
[  275.436206]  ? _raw_spin_unlock_irq+0x1f/0x30
[  275.436211]  ? set_kthread_struct+0x100/0x100
[  275.436221]  ret_from_fork+0x22/0x30
[  275.436244] irq event stamp: 2619
[  275.436248] hardirqs last  enabled at (2627): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[  275.436254] hardirqs last disabled at (2634): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[  275.436260] softirqs last  enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
[  275.436266] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
[  275.436271] ---[ end trace 88fd8120acef327e ]---
[  275.438124] PM: dpm_run_callback(): wiphy_resume+0x0/0x370 [cfg80211] returns -11
[  275.438194] ieee80211 phy0: PM: failed to resume async: error -11
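
The numeric error codes in the log above are negative Linux errno values. As a
reading aid, here is a small Python sketch that maps the two codes seen in this
log to their symbolic names. The `ERRNO_NAMES` table and the `decode` helper
are hypothetical; the table is hard-coded with the generic Linux values rather
than taken from the host system, and the sample lines are copied from the log
with their timestamps removed.

```python
import re

# Linux generic errno numbers relevant to the log above (asm-generic values).
ERRNO_NAMES = {11: "EAGAIN", 110: "ETIMEDOUT"}

# Matches "returns -110", "error -11" and "...: -110" at the end of a line.
ERR_RE = re.compile(r"(?:returns|error|:)\s*-(\d+)\s*$")


def decode(line: str):
    """Return the symbolic errno name found at the end of a log line, or None."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    return ERRNO_NAMES.get(int(m.group(1)), "unknown")


SAMPLE = [
    "ath11k_pci 0000:06:00.0: failed to resume core: -110",
    "PM: dpm_run_callback(): pci_pm_resume+0x0/0x2d0 returns -110",
    "ieee80211 phy0: PM: failed to resume async: error -11",
    "ACPI: EC: event unblocked",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(decode(line), "<-", line)
```

Read this way, the ath11k_pci resume fails with a timeout (-110, ETIMEDOUT),
and the later wiphy_resume failure reports -11 (EAGAIN).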

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16  8:00 [regression] mhi: ath11k resume fails on some devices Kalle Valo
@ 2021-09-16 10:18 ` Loic Poulain
  2021-09-16 11:12   ` Manivannan Sadhasivam
  2021-09-24  8:36   ` Kalle Valo
  2021-09-25 17:40 ` Thorsten Leemhuis
  1 sibling, 2 replies; 24+ messages in thread
From: Loic Poulain @ 2021-09-16 10:18 UTC
  To: Kalle Valo
  Cc: Manivannan Sadhasivam, ath11k, linux-wireless, linux-arm-msm,
	regressions

Hi Kalle,

On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Hi Loic and Mani,
>
> I hate to be the bearer of bad news again :)
>
> I noticed a while ago that commit 020d3b26c07a ("bus: mhi: Early
> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough,
> the Dell XPS 13 9310 laptop (also with QCA6390) does not have this
> problem; I only see it on the NUC. I do not know what is causing
> this difference.

I suppose the NUC is cutting PCI-Express power during suspend, while
the laptop maintains PCIe/M.2 power.

>
> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> everything works without problems. Is there a simple way to fix this? Or
> maybe we should just revert the commit? Commit log and kernel logs from
> a failing case below.

Do you have a log of the success case?

To me, it looks like the device loses power, which is why the MHI resume
fails. Normally the device should then be properly recovered/reinitialized.
Before that patch, the power loss was simply not detected (or was handled
at a higher level of the stack).

Regards,
Loic
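
The recovery flow described above could look roughly like the sketch below. It
is a toy Python model under stated assumptions, not ath11k or MHI code:
`FakeDevice`, `try_resume`, `full_reinit` and `controller_resume` are all
hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a PCI device behind an MHI controller."""
    kept_power: bool                      # did the slot stay powered in suspend?
    events: list = field(default_factory=list)

    def try_resume(self) -> bool:
        """Fast path: only works if the device context survived suspend."""
        self.events.append("resume")
        return self.kept_power

    def full_reinit(self) -> None:
        """Slow path: power up and reinitialise from scratch."""
        self.events.append("reinit")
        self.kept_power = True


def controller_resume(dev: FakeDevice) -> str:
    """Try the normal resume first; fall back to reinitialisation on failure."""
    if dev.try_resume():
        return "resumed"
    dev.full_reinit()
    return "recovered"


if __name__ == "__main__":
    print(controller_resume(FakeDevice(kept_power=True)))
    print(controller_resume(FakeDevice(kept_power=False)))
```

The point of the model is only the ordering: the resume failure is detected
first, and the fallback to a full reinitialisation is the controller's job.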



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16 10:18 ` Loic Poulain
@ 2021-09-16 11:12   ` Manivannan Sadhasivam
       [not found]     ` <CAMZdPi94607mZorp+Zmkw3seWXak6p9Jr05CQ5hhfgKQoG8n7Q@mail.gmail.com>
  2021-09-24  8:36   ` Kalle Valo
  1 sibling, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-16 11:12 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Kalle Valo, ath11k, linux-wireless, linux-arm-msm, regressions

On Thu, Sep 16, 2021 at 12:18:10PM +0200, Loic Poulain wrote:
> Hi Kalle,
> 
> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > Hi Loic and Mani,
> >
> > I hate to be the bearer of bad news again :)
> >
> > I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> > MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> > ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> > Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> > problem, I only see the problem on the NUC. I do not know what's causing
> > this difference.
> 
> I suppose the NUC is cutting PCI-Express power during suspend while
> the laptop maintains PCIe/M2 power.
> 

Yes, that could be the case here.

> >
> > At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > everything works without problems. Is there a simple way to fix this? Or
> > maybe we should just revert the commit? Commit log and kernel logs from
> > a failing case below.
> 
> Do you have log of success case?
> 
> To me, the device loses power, that is why MHI resuming is failing.
> Normally the device should be properly recovered/reinitialized. Before
> that patch the power loss was simply not detected (or handled at
> higher stack level).
> 

If things seem to work fine without that patch, then it implies that setting M0
state works during resume. I think we should just revert that patch.

Loic, did that patch fix any issue for you, or was it a cosmetic fix only?

Thanks,
Mani

> Regards,
> Loic
> 
> 
> >
> > Kalle
> >
> > commit 020d3b26c07abe274ac17f64999bbd3bf3342195
> > Author:     Loic Poulain <loic.poulain@linaro.org>
> > AuthorDate: Fri Mar 5 17:14:01 2021 +0100
> > Commit:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > CommitDate: Wed Mar 10 20:11:22 2021 +0530
> >
> >     bus: mhi: Early MHI resume failure in non M3 state
> >
> >     MHI suspend/resume are symmetric and balanced procedures. If device is
> >     not in M3 state on a resume, that means something happened behind our
> >     back. In this case resume is aborted and error reported, to let the
> >     controller handle the situation.
> >
> >     This is mainly requested for system wide suspend-resume operation in
> >     PCI context which may lead to power-down/reset of the controller which
> >     will then lose its MHI context. In such cases, PCI driver is supposed
> >     to recover and reinitialize the device.
> >
> >     Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> >     Reviewed-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> >     Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> >     Link: https://lore.kernel.org/r/1614960841-20233-1-git-send-email-loic.poulain@linaro.org
> >     Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> >
> > [  267.182376] ACPI: PM: Waking up from system sleep state S3
> > [  268.192783] ACPI: EC: interrupt unblocked
> > [  268.193023] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
> > [  268.204389] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
> > [  268.204391] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
> > [  268.205227] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
> > [  269.360336] ACPI: EC: event unblocked
> > [  269.367187] usb usb3: root hub lost power or was reset
> > [  269.367215] usb usb4: root hub lost power or was reset
> > [  269.368584] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
> > [  269.455966] nvme nvme0: 8/0/0 default/read/poll queues
> > [  272.289737] igb 0000:05:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> > [  272.424084] ath11k_pci 0000:06:00.0: timed out while waiting for wow wakeup completion
> > [  272.424091] ath11k_pci 0000:06:00.0: failed to wakeup wow during resume: -110
> > [  272.424096] ath11k_pci 0000:06:00.0: failed to resume core: -110
> > [  272.424101] PM: dpm_run_callback(): pci_pm_resume+0x0/0x2d0 returns -110
> > [  272.424119] ath11k_pci 0000:06:00.0: PM: failed to resume async: error -110
> > [  275.432003] ath11k_pci 0000:06:00.0: wmi command 16387 timeout
> > [  275.432034] ath11k_pci 0000:06:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > [  275.432088] ath11k_pci 0000:06:00.0: failed to enable PMF QOS: (-11
> > [  275.432094] ------------[ cut here ]------------
> > [  275.432114] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> > [  275.432144] WARNING: CPU: 3 PID: 3164 at net/mac80211/util.c:2361 ieee80211_reconfig+0x216/0x22a0 [mac80211]
> > [  275.432225] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [  275.432287] CPU: 3 PID: 3164 Comm: kworker/u16:20 Not tainted 5.15.0-rc1 #483
> > [  275.432293] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [  275.432298] Workqueue: events_unbound async_run_entry_fn
> > [  275.432307] RIP: 0010:ieee80211_reconfig+0x216/0x22a0 [mac80211]
> > [  275.432381] Code: c0 0f 85 4b 1f 00 00 41 c6 87 7c 08 00 00 00 4c 89 ff e8 ed 41 f1 ff 41 89 c5 85 c0 74 13 48 c7 c7 40 bc 7e c0 e8 ef 63 07 e4 <0f> 0b e9 12 ff ff ff 88 5c 24 37 49 8d 47 40 48 89 c2 48 89 44 24
> > [  275.432386] RSP: 0000:ffffc90002bc7ab0 EFLAGS: 00010286
> > [  275.432394] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [  275.432399] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578f48
> > [  275.432403] RBP: ffff88810890169a R08: 0000000000000001 R09: ffff888234fe581b
> > [  275.432408] R10: ffffed10469fcb03 R11: 0000000000000001 R12: ffff88810890169e
> > [  275.432412] R13: 00000000fffffff5 R14: 0000000000000000 R15: ffff888108900e20
> > [  275.432417] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [  275.432421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  275.432426] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [  275.432430] Call Trace:
> > [  275.432443]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [  275.432515]  wiphy_resume+0x190/0x370 [cfg80211]
> > [  275.432574]  ? trace_device_pm_callback_start+0x123/0x1b0
> > [  275.432584]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [  275.432642]  dpm_run_callback+0xf4/0x1b0
> > [  275.432650]  ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [  275.432658]  ? device_links_read_unlock+0x1b/0x30
> > [  275.432665]  ? dpm_wait_for_superior+0x256/0x430
> > [  275.432679]  device_resume+0x3d5/0x980
> > [  275.432688]  ? dpm_run_callback+0x1b0/0x1b0
> > [  275.432693]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [  275.432701]  ? ktime_get+0x214/0x2f0
> > [  275.432707]  ? trace_hardirqs_on+0x1c/0x120
> > [  275.432715]  ? recalibrate_cpu_khz+0x10/0x10
> > [  275.432726]  ? device_resume+0x980/0x980
> > [  275.432732]  async_resume+0x14/0x30
> > [  275.432738]  async_run_entry_fn+0x90/0x4f0
> > [  275.432750]  process_one_work+0x866/0x1460
> > [  275.432768]  ? pwq_dec_nr_in_flight+0x230/0x230
> > [  275.432787]  ? worker_thread+0x152/0x1010
> > [  275.432798]  worker_thread+0x596/0x1010
> > [  275.432818]  ? process_one_work+0x1460/0x1460
> > [  275.432828]  kthread+0x322/0x3e0
> > [  275.432833]  ? _raw_spin_unlock_irq+0x1f/0x30
> > [  275.432838]  ? set_kthread_struct+0x100/0x100
> > [  275.432848]  ret_from_fork+0x22/0x30
> > [  275.432872] irq event stamp: 977
> > [  275.432876] hardirqs last  enabled at (985): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [  275.432882] hardirqs last disabled at (992): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [  275.432888] softirqs last  enabled at (402): [<ffffffffc095f878>] ath11k_htc_send+0x668/0xc10 [ath11k]
> > [  275.432914] softirqs last disabled at (400): [<ffffffffc095f797>] ath11k_htc_send+0x587/0xc10 [ath11k]
> > [  275.432937] ---[ end trace 88fd8120acef327c ]---
> > [  275.433884] ------------[ cut here ]------------
> > [  275.433888] wlan0: Failed check-sdata-in-driver check, flags: 0x4
> > [  275.433917] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:97 drv_remove_interface+0x2cb/0x330 [mac80211]
> > [  275.434008] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [  275.434068] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G        W         5.15.0-rc1 #483
> > [  275.434074] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [  275.434079] Workqueue: events_unbound async_run_entry_fn
> > [  275.434087] RIP: 0010:drv_remove_interface+0x2cb/0x330 [mac80211]
> > [  275.434154] Code: c1 e9 03 80 3c 01 00 75 72 48 8b 83 88 06 00 00 48 8d b3 a8 06 00 00 48 c7 c7 60 2a 7e c0 48 85 c0 48 0f 45 f0 e8 8a 12 16 e4 <0f> 0b eb 90 e8 6c a8 23 e2 e9 e9 fd ff ff e8 62 a8 23 e2 e9 06 fe
> > [  275.434159] RSP: 0000:ffffc90002bc7788 EFLAGS: 00010282
> > [  275.434167] RAX: 0000000000000000 RBX: ffff8881735a0c40 RCX: 0000000000000000
> > [  275.434171] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578ee3
> > [  275.434176] RBP: ffff888108900e20 R08: 0000000000000001 R09: ffff888234fe581b
> > [  275.434180] R10: ffffed10469fcb03 R11: 0000000000000001 R12: dffffc0000000000
> > [  275.434184] R13: ffff8881735a12d8 R14: ffff888108901568 R15: 000000000000000f
> > [  275.434189] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [  275.434194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  275.434198] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [  275.434203] Call Trace:
> > [  275.434212]  ieee80211_do_stop+0xe27/0x1a20 [mac80211]
> > [  275.434291]  ? mutex_lock_io_nested+0x1490/0x1490
> > [  275.434303]  ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
> > [  275.434370]  ? mark_held_locks+0xa5/0xe0
> > [  275.434382]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [  275.434389]  ? __local_bh_enable_ip+0x9d/0xf0
> > [  275.434394]  ? trace_hardirqs_on+0x1c/0x120
> > [  275.434410]  ieee80211_stop+0xb2/0x230 [mac80211]
> > [  275.434484]  __dev_close_many+0x191/0x2a0
> > [  275.434491]  ? netif_tx_stop_all_queues+0xf0/0xf0
> > [  275.434496]  ? find_held_lock+0x33/0x110
> > [  275.434507]  ? __lock_release+0x494/0xa40
> > [  275.434518]  dev_close_many+0x1c5/0x540
> > [  275.434527]  ? wait_for_completion_io+0x280/0x280
> > [  275.434535]  ? dev_get_by_napi_id+0x110/0x110
> > [  275.434544]  ? wiphy_resume+0x1a5/0x370 [cfg80211]
> > [  275.434610]  dev_close+0x132/0x1d0
> > [  275.434617]  ? dev_xdp_attach.constprop.0+0x750/0x750
> > [  275.434633]  cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
> > [  275.434697]  wiphy_resume+0x1b2/0x370 [cfg80211]
> > [  275.434755]  ? trace_device_pm_callback_start+0x123/0x1b0
> > [  275.434765]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [  275.434822]  dpm_run_callback+0xf4/0x1b0
> > [  275.434830]  ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [  275.434839]  ? device_links_read_unlock+0x1b/0x30
> > [  275.434845]  ? dpm_wait_for_superior+0x256/0x430
> > [  275.434859]  device_resume+0x3d5/0x980
> > [  275.434868]  ? dpm_run_callback+0x1b0/0x1b0
> > [  275.434873]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [  275.434880]  ? ktime_get+0x214/0x2f0
> > [  275.434886]  ? trace_hardirqs_on+0x1c/0x120
> > [  275.434893]  ? recalibrate_cpu_khz+0x10/0x10
> > [  275.434904]  ? device_resume+0x980/0x980
> > [  275.434910]  async_resume+0x14/0x30
> > [  275.434916]  async_run_entry_fn+0x90/0x4f0
> > [  275.434928]  process_one_work+0x866/0x1460
> > [  275.434946]  ? pwq_dec_nr_in_flight+0x230/0x230
> > [  275.434965]  ? worker_thread+0x152/0x1010
> > [  275.434992]  worker_thread+0x596/0x1010
> > [  275.435013]  ? process_one_work+0x1460/0x1460
> > [  275.435022]  kthread+0x322/0x3e0
> > [  275.435027]  ? _raw_spin_unlock_irq+0x1f/0x30
> > [  275.435032]  ? set_kthread_struct+0x100/0x100
> > [  275.435042]  ret_from_fork+0x22/0x30
> > [  275.435065] irq event stamp: 1923
> > [  275.435069] hardirqs last  enabled at (1931): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [  275.435076] hardirqs last disabled at (1938): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [  275.435082] softirqs last  enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
> > [  275.435087] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
> > [  275.435093] ---[ end trace 88fd8120acef327d ]---
> > [  275.435126] ------------[ cut here ]------------
> > [  275.435130] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:36 drv_stop+0x290/0x310 [mac80211]
> > [  275.435197] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [  275.435256] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G        W         5.15.0-rc1 #483
> > [  275.435261] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [  275.435265] Workqueue: events_unbound async_run_entry_fn
> > [  275.435274] RIP: 0010:drv_stop+0x290/0x310 [mac80211]
> > [  275.435339] Code: 80 3d 5f f1 29 00 00 75 e2 48 c7 c2 c0 29 7e c0 be 34 01 00 00 48 c7 c7 20 2a 7e c0 c6 05 43 f1 29 00 01 e8 af 64 16 e4 eb c1 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 0b e9 d3 fd ff ff 48 89 ef e8 18 b2
> > [  275.435344] RSP: 0000:ffffc90002bc7790 EFLAGS: 00010246
> > [  275.435352] RAX: 0000000000000000 RBX: ffff888108900e20 RCX: 0000000000000001
> > [  275.435356] RDX: 0000000000000004 RSI: ffffffffa5a021a0 RDI: ffff888145778920
> > [  275.435360] RBP: ffff88810890169c R08: 0000000000000001 R09: ffffc90002bc757f
> > [  275.435365] R10: ffffc90002bc77a8 R11: 0000000000000001 R12: dffffc0000000000
> > [  275.435369] R13: ffff888108900e20 R14: ffff888108901568 R15: 000000000000000f
> > [  275.435373] FS:  0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [  275.435378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  275.435382] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [  275.435387] Call Trace:
> > [  275.435394]  ieee80211_do_stop+0x11dd/0x1a20 [mac80211]
> > [  275.435472]  ? mutex_lock_io_nested+0x1490/0x1490
> > [  275.435484]  ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
> > [  275.435551]  ? mark_held_locks+0xa5/0xe0
> > [  275.435562]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [  275.435569]  ? __local_bh_enable_ip+0x9d/0xf0
> > [  275.435574]  ? trace_hardirqs_on+0x1c/0x120
> > [  275.435590]  ieee80211_stop+0xb2/0x230 [mac80211]
> > [  275.435663]  __dev_close_many+0x191/0x2a0
> > [  275.435670]  ? netif_tx_stop_all_queues+0xf0/0xf0
> > [  275.435675]  ? find_held_lock+0x33/0x110
> > [  275.435686]  ? __lock_release+0x494/0xa40
> > [  275.435697]  dev_close_many+0x1c5/0x540
> > [  275.435706]  ? wait_for_completion_io+0x280/0x280
> > [  275.435713]  ? dev_get_by_napi_id+0x110/0x110
> > [  275.435723]  ? wiphy_resume+0x1a5/0x370 [cfg80211]
> > [  275.435790]  dev_close+0x132/0x1d0
> > [  275.435797]  ? dev_xdp_attach.constprop.0+0x750/0x750
> > [  275.435813]  cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
> > [  275.435876]  wiphy_resume+0x1b2/0x370 [cfg80211]
> > [  275.435935]  ? trace_device_pm_callback_start+0x123/0x1b0
> > [  275.435944]  ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [  275.436018]  dpm_run_callback+0xf4/0x1b0
> > [  275.436026]  ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [  275.436035]  ? device_links_read_unlock+0x1b/0x30
> > [  275.436041]  ? dpm_wait_for_superior+0x256/0x430
> > [  275.436055]  device_resume+0x3d5/0x980
> > [  275.436064]  ? dpm_run_callback+0x1b0/0x1b0
> > [  275.436069]  ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [  275.436076]  ? ktime_get+0x214/0x2f0
> > [  275.436082]  ? trace_hardirqs_on+0x1c/0x120
> > [  275.436089]  ? recalibrate_cpu_khz+0x10/0x10
> > [  275.436100]  ? device_resume+0x980/0x980
> > [  275.436106]  async_resume+0x14/0x30
> > [  275.436112]  async_run_entry_fn+0x90/0x4f0
> > [  275.436124]  process_one_work+0x866/0x1460
> > [  275.436142]  ? pwq_dec_nr_in_flight+0x230/0x230
> > [  275.436161]  ? worker_thread+0x152/0x1010
> > [  275.436172]  worker_thread+0x596/0x1010
> > [  275.436191]  ? process_one_work+0x1460/0x1460
> > [  275.436201]  kthread+0x322/0x3e0
> > [  275.436206]  ? _raw_spin_unlock_irq+0x1f/0x30
> > [  275.436211]  ? set_kthread_struct+0x100/0x100
> > [  275.436221]  ret_from_fork+0x22/0x30
> > [  275.436244] irq event stamp: 2619
> > [  275.436248] hardirqs last  enabled at (2627): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [  275.436254] hardirqs last disabled at (2634): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [  275.436260] softirqs last  enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
> > [  275.436266] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
> > [  275.436271] ---[ end trace 88fd8120acef327e ]---
> > [  275.438124] PM: dpm_run_callback(): wiphy_resume+0x0/0x370 [cfg80211] returns -11
> > [  275.438194] ieee80211 phy0: PM: failed to resume async: error -11
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
       [not found]     ` <CAMZdPi94607mZorp+Zmkw3seWXak6p9Jr05CQ5hhfgKQoG8n7Q@mail.gmail.com>
@ 2021-09-16 16:35       ` Manivannan Sadhasivam
  2021-09-16 16:42         ` Kalle Valo
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-16 16:35 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Kalle Valo, ath11k, linux-arm-msm, linux-wireless, regressions

On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
> manivannan.sadhasivam@linaro.org> a écrit :
> 

[...]

> > If things seems to work fine without that patch, then it implies that
> > setting M0
> > state works during resume. I think we should just revert that patch.
> >
> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> 
> 
> It fixes sdx modem resuming issue, without that we don’t know modem needs
> to be reinitialized.
> 

Okay. Then in that case, the recovery mechanism has to be added to the ath11k
MHI controller.

If that's too much work for Kalle, then I'll look into it. But I might only get
time after Plumbers.

Thanks,
Mani

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16 16:35       ` Manivannan Sadhasivam
@ 2021-09-16 16:42         ` Kalle Valo
  2021-09-16 17:19           ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-09-16 16:42 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Loic Poulain, ath11k, linux-arm-msm, linux-wireless, regressions

Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
>> manivannan.sadhasivam@linaro.org> a écrit :
>> 
>
> [...]
>
>> > If things seems to work fine without that patch, then it implies that
>> > setting M0
>> > state works during resume. I think we should just revert that patch.
>> >
>> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>> 
>> 
>> It fixes sdx modem resuming issue, without that we don’t know modem needs
>> to be reinitialized.
>> 
>
> Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> MHI controller.

What does that mean in practice, do you have any pointers or examples? I
have no clue what you are proposing :)

> If that's too much of work for Kalle, then I'll look into it. But I might get
> time only after Plumbers.

I'm busy, as always, so not sure when I'm able to do it either. I think
we should seriously consider reverting 020d3b26c07a and adding it back
after ath11k is able to handle this new situation.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16 16:42         ` Kalle Valo
@ 2021-09-16 17:19           ` Manivannan Sadhasivam
  2021-09-23  8:34             ` Carl Huang
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-16 17:19 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Loic Poulain, ath11k, linux-arm-msm, linux-wireless, regressions

On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> 
> > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> >> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
> >> manivannan.sadhasivam@linaro.org> a écrit :
> >> 
> >
> > [...]
> >
> >> > If things seems to work fine without that patch, then it implies that
> >> > setting M0
> >> > state works during resume. I think we should just revert that patch.
> >> >
> >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> >> 
> >> 
> >> It fixes sdx modem resuming issue, without that we don’t know modem needs
> >> to be reinitialized.
> >> 
> >
> > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> > MHI controller.
> 
> What does that mean in practise, do you have any pointers or examples? I
> have no clue what you are proposing :)
> 

Take a look at the mhi_pci_recovery_work() function below:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610

You need to implement something similar that basically powers up the MHI
endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to call the
functions below:

# Check if the device is powered on. If yes, then power it down to bring it back
mhi_power_down()
mhi_unprepare_after_power_down()

# Power up the device
mhi_prepare_for_power_up()
mhi_sync_power_up()

This implies that the WLAN device has been powered off during suspend, so the
resume fails and we are bringing the device back to working state.
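
As a rough illustration of the ordering described above, here is a small
user-space C sketch. It does not use the real MHI API: struct fake_mhi and
every fake_*() function are hypothetical stand-ins that only mirror the names
of the calls listed above, and the real kernel signatures differ. It shows only
the control flow: try the normal resume first, and on failure power the device
down (if it was up) and bring it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mhi_controller; tracks only sketch state. */
struct fake_mhi {
    bool prepared;     /* prepare-for-power-up has been done */
    bool powered;      /* host believes the device is powered up */
    int  resume_err;   /* value the simulated resume returns */
    int  power_cycles; /* how many times recovery powered the device up */
};

/* Stubs named after the MHI calls listed above; not the kernel functions. */
static void fake_power_down(struct fake_mhi *m) { m->powered = false; }
static void fake_unprepare_after_power_down(struct fake_mhi *m) { m->prepared = false; }
static int fake_prepare_for_power_up(struct fake_mhi *m) { m->prepared = true; return 0; }

static int fake_sync_power_up(struct fake_mhi *m)
{
    if (!m->prepared)
        return -1; /* power-up without prepare is treated as an error */
    m->powered = true;
    m->power_cycles++;
    return 0;
}

static int fake_pm_resume(struct fake_mhi *m) { return m->resume_err; }

/* Try a normal resume; fall back to a full power cycle if it fails. */
static int fake_resume_with_recovery(struct fake_mhi *m)
{
    int ret;

    assert(m != NULL);

    ret = fake_pm_resume(m);
    if (ret == 0)
        return 0; /* device kept its context, nothing to recover */

    /* Check if the device is powered on. If yes, power it down first. */
    if (m->powered) {
        fake_power_down(m);
        fake_unprepare_after_power_down(m);
    }

    /* Power up the device again. */
    ret = fake_prepare_for_power_up(m);
    if (ret)
        return ret;
    return fake_sync_power_up(m);
}
```

In this model a failed resume leads to exactly one power cycle, while a
successful resume leaves the device untouched. A real driver would also have to
restore everything above MHI (firmware, WMI and so on), which the sketch
ignores.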

> > If that's too much of work for Kalle, then I'll look into it. But I might get
> > time only after Plumbers.
> 
> I'm busy, as always, so not sure when I'm able to do it either. I think
> we should seriously consider reverting 020d3b26c07a and adding it back
> after ath11k is able to handle this new situation.
> 

Since Loic said that reverting would cause his modem (SDX device) to fail during
resume, this is not possible.

Thanks,
Mani

> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16 17:19           ` Manivannan Sadhasivam
@ 2021-09-23  8:34             ` Carl Huang
  2021-09-23  8:59               ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Carl Huang @ 2021-09-23  8:34 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Kalle Valo, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions

On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
>> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>> 
>> > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> >> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
>> >> manivannan.sadhasivam@linaro.org> a écrit :
>> >>
>> >
>> > [...]
>> >
>> >> > If things seems to work fine without that patch, then it implies that
>> >> > setting M0
>> >> > state works during resume. I think we should just revert that patch.
>> >> >
>> >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>> >>
>> >>
>> >> It fixes sdx modem resuming issue, without that we don’t know modem needs
>> >> to be reinitialized.
>> >>
>> >
>> > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
>> > MHI controller.
>> 
>> What does that mean in practise, do you have any pointers or examples? 
>> I
>> have no clue what you are proposing :)
>> 
> 
> Take a look at the mhi_pci_recovery_work() function below:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
> 
> You need to implement something similar that basically powers up the 
> MHI
> endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to 
> call
> below functions:
> 
> # Check if the device is powered on. If yes, then power it down to 
> bring it back
> mhi_power_down()
> mhi_unprepare_after_power_down()
> 
> # Power up the device
> mhi_prepare_for_power_up()
> mhi_sync_power_up()
> 
> This implies that the WLAN device has been powered off during suspend, 
> so the
> resume fails and we are bringing the device back to working state.
> 
This is fine for a platform that doesn't supply power during suspend.
But the NUC keeps the power supply on in the suspend state. That QCA6390
on the NUC works after just reverting this commit also shows that the NUC
has power in the suspend state.

The reason is that the MHI-STATUS register somehow can't be read in M3
state on the NUC. Does the MHI spec state that the MHI-STATUS register can
be read in M3 state?

>> > If that's too much of work for Kalle, then I'll look into it. But I might get
>> > time only after Plumbers.
>> 
>> I'm busy, as always, so not sure when I'm able to do it either. I 
>> think
>> we should seriously consider reverting 020d3b26c07a and adding it back
>> after ath11k is able to handle this new situation.
>> 
> 
> Since Loic said that reverting would cause his modem (SDX device) to 
> fail during
> resume, this is not possible.
> 
> Thanks,
> Mani
> 
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>> 
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-23  8:34             ` Carl Huang
@ 2021-09-23  8:59               ` Manivannan Sadhasivam
  2021-09-23  9:26                 ` Carl Huang
  2021-09-24  9:07                 ` Kalle Valo
  0 siblings, 2 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-23  8:59 UTC (permalink / raw)
  To: Carl Huang
  Cc: Kalle Valo, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions

On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> > > 
> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> > > >> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
> > > >> manivannan.sadhasivam@linaro.org> a écrit :
> > > >>
> > > >
> > > > [...]
> > > >
> > > >> > If things seems to work fine without that patch, then it implies that
> > > >> > setting M0
> > > >> > state works during resume. I think we should just revert that patch.
> > > >> >
> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> > > >>
> > > >>
> > > >> It fixes sdx modem resuming issue, without that we don’t know modem needs
> > > >> to be reinitialized.
> > > >>
> > > >
> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> > > > MHI controller.
> > > 
> > > What does that mean in practise, do you have any pointers or
> > > examples? I
> > > have no clue what you are proposing :)
> > > 
> > 
> > Take a look at the mhi_pci_recovery_work() function below:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
> > 
> > You need to implement something similar that basically powers up the MHI
> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
> > call
> > below functions:
> > 
> > # Check if the device is powered on. If yes, then power it down to bring
> > it back
> > mhi_power_down()
> > mhi_unprepare_after_power_down()
> > 
> > # Power up the device
> > mhi_prepare_for_power_up()
> > mhi_sync_power_up()
> > 
> > This implies that the WLAN device has been powered off during suspend,
> > so the
> > resume fails and we are bringing the device back to working state.
> > 
> This is fine for platform which doesn't provide power supply during suspend.
> But NUC has power supply in suspend state.

If the NUC retains power during suspend, then it should work with that commit.
During resume, the device is expected to be in M3 state, and that's what the
commit verifies.

If the device is in a different state, then most likely the device has been
power cycled.

> QCA6390 on NUC works after just reverting this commit also proves NUC has
> power supply in
> suspend state.
> 

That's because, before that commit, we allowed the device to be in any state
during resume, and as long as it responded to the M0 transition, resume worked.

> The reason is MHI-STATUS register can't be read somehow in M3 state on NUC.

No, that's not correct.

> Does the MHI spec state that MHI-STATUS register can be read in M3 state?
> 

Yes, all the MHI registers are accessible in all states. During M3, both MHI
host and device (if supported) will transition to D3 Cold. Then during resume,
host will switch to D0 link state and will also notify the device to enter D0.

To aid debugging, please check which state the device is in during
mhi_pm_resume(). You can use the diff below:

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index fb99e3727155..482d55dd209e 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
        if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
                return -EIO;
 
+       dev_info(dev, "Device state: %s\n",
+                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
+
        if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
                return -EINVAL;


Thanks,
Mani


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-23  8:59               ` Manivannan Sadhasivam
@ 2021-09-23  9:26                 ` Carl Huang
  2021-09-23 10:50                   ` Loic Poulain
  2021-09-24  9:07                 ` Kalle Valo
  1 sibling, 1 reply; 24+ messages in thread
From: Carl Huang @ 2021-09-23  9:26 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Kalle Valo, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions

On 2021-09-23 16:59, Manivannan Sadhasivam wrote:
> On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
>> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
>> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
>> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>> > >
>> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> > > >> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
>> > > >> manivannan.sadhasivam@linaro.org> a écrit :
>> > > >>
>> > > >
>> > > > [...]
>> > > >
>> > > >> > If things seems to work fine without that patch, then it implies that
>> > > >> > setting M0
>> > > >> > state works during resume. I think we should just revert that patch.
>> > > >> >
>> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>> > > >>
>> > > >>
>> > > >> It fixes sdx modem resuming issue, without that we don’t know modem needs
>> > > >> to be reinitialized.
>> > > >>
>> > > >
>> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
>> > > > MHI controller.
>> > >
>> > > What does that mean in practise, do you have any pointers or
>> > > examples? I
>> > > have no clue what you are proposing :)
>> > >
>> >
>> > Take a look at the mhi_pci_recovery_work() function below:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
>> >
>> > You need to implement something similar that basically powers up the MHI
>> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
>> > call
>> > below functions:
>> >
>> > # Check if the device is powered on. If yes, then power it down to bring
>> > it back
>> > mhi_power_down()
>> > mhi_unprepare_after_power_down()
>> >
>> > # Power up the device
>> > mhi_prepare_for_power_up()
>> > mhi_sync_power_up()
>> >
>> > This implies that the WLAN device has been powered off during suspend,
>> > so the
>> > resume fails and we are bringing the device back to working state.
>> >
>> This is fine for platform which doesn't provide power supply during 
>> suspend.
>> But NUC has power supply in suspend state.
> 
> If NUC retains power supply during suspend then it should work with 
> that commit.
> During resume, the device is expected to be in M3 state and that's what 
> the
> commit verifies.
> 
> If the device is in a different state, then most likely the device have 
> power
> cycled.
> 
But the tricky thing here is that upstream QCA6390 doesn't have a recovery
mechanism to download firmware again, so QCA6390 has no way to work after a
power cycle.

>> QCA6390 on NUC works after just reverting this commit also proves NUC 
>> has
>> power supply in
>> suspend state.
>> 
> 
> That's because we allowed the device to be in any state during resume 
> and if it
> responds to the M0 transition it worked.
> 
>> The reason is MHI-STATUS register can't be read somehow in M3 state on 
>> NUC.
> 
> No, that's not correct.
> 
>> Does the MHI spec state that MHI-STATUS register can be read in M3 
>> state?
>> 
> 
> Yes, all the MHI registers are accessible in all states. During M3, 
> both MHI
> host and device (if supported) will transition to D3 Cold. Then during 
> resume,
> host will switch to D0 link state and will also notify the device to 
> enter D0.
> 
> For aid debugging, please see the state the device is in during 
> mhi_pm_resume().
> You can use below diff:
> 
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index fb99e3727155..482d55dd209e 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>         if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>                 return -EIO;
> 
> +       dev_info(dev, "Device state: %s\n",
> +                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> +
>         if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>                 return -EINVAL;
> 
> 
> Thanks,
> Mani


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-23  9:26                 ` Carl Huang
@ 2021-09-23 10:50                   ` Loic Poulain
  0 siblings, 0 replies; 24+ messages in thread
From: Loic Poulain @ 2021-09-23 10:50 UTC (permalink / raw)
  To: Carl Huang, Kalle Valo
  Cc: Manivannan Sadhasivam, ath11k, linux-arm-msm, linux-wireless,
	regressions

Hi Carl and Kalle,

On Thu, 23 Sept 2021 at 11:26, Carl Huang <cjhuang@codeaurora.org> wrote:
>
> On 2021-09-23 16:59, Manivannan Sadhasivam wrote:
> > On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
> >> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> >> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> >> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> >> > >
> >> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> >> > > >> Le jeu. 16 sept. 2021 à 13:12, Manivannan Sadhasivam <
> >> > > >> manivannan.sadhasivam@linaro.org> a écrit :
> >> > > >>
> >> > > >
> >> > > > [...]
> >> > > >
> >> > > >> > If things seems to work fine without that patch, then it implies that
> >> > > >> > setting M0
> >> > > >> > state works during resume. I think we should just revert that patch.
> >> > > >> >
> >> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> >> > > >>
> >> > > >>
> >> > > >> It fixes sdx modem resuming issue, without that we don’t know modem needs
> >> > > >> to be reinitialized.
> >> > > >>
> >> > > >
> >> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> >> > > > MHI controller.
> >> > >
> >> > > What does that mean in practise, do you have any pointers or
> >> > > examples? I
> >> > > have no clue what you are proposing :)
> >> > >
> >> >
> >> > Take a look at the mhi_pci_recovery_work() function below:
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
> >> >
> >> > You need to implement something similar that basically powers up the MHI
> >> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
> >> > call
> >> > below functions:
> >> >
> >> > # Check if the device is powered on. If yes, then power it down to bring
> >> > it back
> >> > mhi_power_down()
> >> > mhi_unprepare_after_power_down()
> >> >
> >> > # Power up the device
> >> > mhi_prepare_for_power_up()
> >> > mhi_sync_power_up()
> >> >
> >> > This implies that the WLAN device has been powered off during suspend,
> >> > so the
> >> > resume fails and we are bringing the device back to working state.
> >> >
> >> This is fine for platform which doesn't provide power supply during
> >> suspend.
> >> But NUC has power supply in suspend state.
> >
> > If NUC retains power supply during suspend then it should work with
> > that commit.
> > During resume, the device is expected to be in M3 state and that's what
> > the
> > commit verifies.
> >
> > If the device is in a different state, then most likely the device have
> > power
> > cycled.
> >
> But the tricky thing here is that upstream QCA6390 doesn't have recovery
> mechanism to download
> firmware again, so QCA6390 has no way to work after a power cycle.

Maybe a simple quick fix would be to add a 'force' parameter to the
mhi resume function and skip the state check when it is forced;
that would allow both ath11k and the modem to work for now. Then we can
investigate what happens on the ath11k side.
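
For illustration, the 'force' idea could look roughly like the sketch
below. This is a standalone userspace model, not the actual MHI core
code: model_ctrl, model_resume() and the MODEL_* states are made-up
names, and the real resume path does much more than flip a state.

```c
/*
 * Userspace model only, NOT the real MHI core code. All names here
 * (model_ctrl, model_resume, MODEL_*) are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_RESET, MODEL_M0, MODEL_M3 };

struct model_ctrl {
	enum model_state dev_state;	/* state reported by the device */
	bool in_error;			/* stands in for the PM error-state check */
};

/* Resume; when 'force' is set, skip the "device must be in M3" check. */
int model_resume(struct model_ctrl *ctrl, bool force)
{
	assert(ctrl != NULL);

	if (ctrl->in_error)
		return -EIO;

	/* The check added by commit 020d3b26c07a, bypassed when forced. */
	if (!force && ctrl->dev_state != MODEL_M3)
		return -EINVAL;

	/* Model the M0 transition that a resume would request. */
	ctrl->dev_state = MODEL_M0;
	return 0;
}
```

In this model a modem-style caller would pass force=false and keep
detecting a lost MHI context, while ath11k would pass force=true.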

Thoughts?

Regards,
Loic


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16 10:18 ` Loic Poulain
  2021-09-16 11:12   ` Manivannan Sadhasivam
@ 2021-09-24  8:36   ` Kalle Valo
  2021-09-24  9:43     ` Loic Poulain
  1 sibling, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-09-24  8:36 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Manivannan Sadhasivam, ath11k, linux-wireless, linux-arm-msm,
	regressions

Loic Poulain <loic.poulain@linaro.org> writes:

> Hi Kalle,
>
> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>>
>> Hi Loic and Mani,
>>
>> I hate to be the bearer of bad news again :)
>>
>> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
>> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
>> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
>> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
>> problem, I only see the problem on the NUC. I do not know what's causing
>> this difference.
>
> I suppose the NUC is current PCI-Express power during suspend while
> the laptop maintains PCIe/M2 power.

Sorry, I'm not able to parse that sentence. Can you elaborate more?

>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>> everything works without problems. Is there a simple way to fix this? Or
>> maybe we should just revert the commit? Commit log and kernel logs from
>> a failing case below.
>
> Do you have log of success case?

A log from a successful case is at the end of this email, using v5.15-rc1
plus a revert of commit 020d3b26c07abe27.

> To me, the device loses power, that is why MHI resuming is failing.
> Normally the device should be properly recovered/reinitialized. Before
> that patch the power loss was simply not detected (or handled at
> higher stack level).

Currently in ath11k we always keep the firmware running when in suspend;
this is a workaround due to problems between mac80211 and the MHI stack.
IIRC the problem was something related to MHI creating a struct device
during resume or something like that.

[  164.088772] PM: suspend entry (deep)
[  164.089867] Filesystems sync: 0.000 seconds
[  164.140383] Freezing user space processes ... (elapsed 0.004 seconds) done.
[  164.146245] OOM killer disabled.
[  164.148024] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  164.151767] printk: Suspending console(s) (use no_console_suspend to debug)
[  164.155767] wlan0: deauthenticating from <SENSORED> by local choice (Reason: 3=DEAUTH_LEAVING)
[  164.197460] e1000e: EEE TX LPI TIMER: 00000011
[  164.787849] ACPI: EC: interrupt blocked
[  164.863887] ACPI: PM: Preparing to enter system sleep state S3
[  164.898479] ACPI: EC: event blocked
[  164.898483] ACPI: EC: EC stopped
[  164.898487] ACPI: PM: Saving platform NVS memory
[  164.898496] Disabling non-boot CPUs ...
[  164.910527] numa_remove_cpu cpu 1 node 0: mask now 0,2-7
[  164.911609] smpboot: CPU 1 is now offline
[  164.929506] numa_remove_cpu cpu 2 node 0: mask now 0,3-7
[  164.930593] smpboot: CPU 2 is now offline
[  164.947111] numa_remove_cpu cpu 3 node 0: mask now 0,4-7
[  164.948192] smpboot: CPU 3 is now offline
[  164.965687] numa_remove_cpu cpu 4 node 0: mask now 0,5-7
[  164.967133] smpboot: CPU 4 is now offline
[  164.983150] numa_remove_cpu cpu 5 node 0: mask now 0,6-7
[  164.984211] smpboot: CPU 5 is now offline
[  164.992047] numa_remove_cpu cpu 6 node 0: mask now 0,7
[  164.993549] smpboot: CPU 6 is now offline
[  165.004382] numa_remove_cpu cpu 7 node 0: mask now 0
[  165.005456] smpboot: CPU 7 is now offline
[  165.009866] ACPI: PM: Low-level resume complete
[  165.010106] ACPI: EC: EC started
[  165.010109] ACPI: PM: Restoring platform NVS memory
[  165.012344] Enabling non-boot CPUs ...
[  165.012978] x86: Booting SMP configuration:
[  165.012984] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  165.014850] numa_add_cpu cpu 1 node 0: mask now 0-1
[  165.023818] CPU1 is up
[  165.024455] smpboot: Booting Node 0 Processor 2 APIC 0x4
[  165.026190] numa_add_cpu cpu 2 node 0: mask now 0-2
[  165.034904] CPU2 is up
[  165.035479] smpboot: Booting Node 0 Processor 3 APIC 0x6
[  165.037193] numa_add_cpu cpu 3 node 0: mask now 0-3
[  165.046102] CPU3 is up
[  165.046639] smpboot: Booting Node 0 Processor 4 APIC 0x1
[  165.047005] numa_add_cpu cpu 4 node 0: mask now 0-4
[  165.058328] CPU4 is up
[  165.058976] smpboot: Booting Node 0 Processor 5 APIC 0x3
[  165.059342] numa_add_cpu cpu 5 node 0: mask now 0-5
[  165.070520] CPU5 is up
[  165.071192] smpboot: Booting Node 0 Processor 6 APIC 0x5
[  165.071574] numa_add_cpu cpu 6 node 0: mask now 0-6
[  165.082952] CPU6 is up
[  165.083609] smpboot: Booting Node 0 Processor 7 APIC 0x7
[  165.083980] numa_add_cpu cpu 7 node 0: mask now 0-7
[  165.095544] CPU7 is up
[  165.099137] ACPI: PM: Waking up from system sleep state S3
[  166.045084] ACPI: EC: interrupt unblocked
[  166.045242] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
[  166.056234] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
[  166.057410] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
[  166.057413] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
[  167.210794] ACPI: EC: event unblocked
[  167.258815] nvme nvme0: 8/0/0 default/read/poll queues
[  167.694965] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[  167.695953] OOM killer enabled.
[  167.697336] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[  167.750241] Restarting tasks ... done.
[  167.770450] PM: suspend exit

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-23  8:59               ` Manivannan Sadhasivam
  2021-09-23  9:26                 ` Carl Huang
@ 2021-09-24  9:07                 ` Kalle Valo
  2021-09-24  9:57                   ` Manivannan Sadhasivam
  1 sibling, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-09-24  9:07 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Carl Huang, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions

Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> For aid debugging, please see the state the device is in during mhi_pm_resume().
> You can use below diff:
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index fb99e3727155..482d55dd209e 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>         if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>                 return -EIO;
>  
> +       dev_info(dev, "Device state: %s\n",
> +                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> +
>         if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>                 return -EINVAL;

This is what I get with my NUC testbox:

[  970.488202] ACPI: EC: event unblocked
[  970.492484] hpet: Lost 1587 RTC interrupts
[  970.492749] mhi mhi0: Device state: RESET
[  970.492805] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-24  8:36   ` Kalle Valo
@ 2021-09-24  9:43     ` Loic Poulain
  2021-09-24 10:00       ` Manivannan Sadhasivam
  2021-10-07  9:48       ` Kalle Valo
  0 siblings, 2 replies; 24+ messages in thread
From: Loic Poulain @ 2021-09-24  9:43 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Manivannan Sadhasivam, ath11k, linux-wireless, linux-arm-msm,
	regressions

[-- Attachment #1: Type: text/plain, Size: 2165 bytes --]

Hi Kalle,

On Fri, 24 Sept 2021 at 10:36, Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Loic Poulain <loic.poulain@linaro.org> writes:
>
> > Hi Kalle,
> >
> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
> >>
> >> Hi Loic and Mani,
> >>
> >> I hate to be the bearer of bad news again :)
> >>
> >> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> >> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> >> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> >> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> >> problem, I only see the problem on the NUC. I do not know what's causing
> >> this difference.
> >
> > I suppose the NUC is current PCI-Express power during suspend while
> > the laptop maintains PCIe/M2 power.
>
> Sorry, I'm not able to parse that sentence. Can you elaborate more?

Ouch, yes, I wanted to say that the NUC does not maintain PCI Express
power during suspend (leading to PCI D3cold state), whereas the
laptop maintains power to the M.2 card... well, I'm not so sure now that
I see your logs.

>
> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> >> everything works without problems. Is there a simple way to fix this? Or
> >> maybe we should just revert the commit? Commit log and kernel logs from
> >> a failing case below.
> >
> > Do you have log of success case?
>
> A log from a successful case in the end of email, using v5.15-rc1 plus
> revert of commit 020d3b26c07abe27.
>
> > To me, the device loses power, that is why MHI resuming is failing.
> > Normally the device should be properly recovered/reinitialized. Before
> > that patch the power loss was simply not detected (or handled at
> > higher stack level).
>
> Currently in ath11k we always keep the firmware running when in suspend,
> this is a workaround due to problems between mac80211 and MHI stack.
> IIRC the problem was something related MHI creating struct device during
> resume or something like that.

Could you give a try with the attached patch? It should solve your
issue without breaking modem support.

Regards,
Loic

[-- Attachment #2: 0001-bus-mhi-Add-support-for-forced-resume.patch --]
[-- Type: application/x-patch, Size: 3661 bytes --]


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-24  9:07                 ` Kalle Valo
@ 2021-09-24  9:57                   ` Manivannan Sadhasivam
  2021-10-07  9:55                     ` Kalle Valo
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-24  9:57 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Carl Huang, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions

On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> 
> > For aid debugging, please see the state the device is in during mhi_pm_resume().
> > You can use below diff:
> >
> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> > index fb99e3727155..482d55dd209e 100644
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
> >         if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
> >                 return -EIO;
> >  
> > +       dev_info(dev, "Device state: %s\n",
> > +                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> > +
> >         if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
> >                 return -EINVAL;
> 
> This is what I get with my NUC testbox:
> 
> [  970.488202] ACPI: EC: event unblocked
> [  970.492484] hpet: Lost 1587 RTC interrupts
> [  970.492749] mhi mhi0: Device state: RESET

Looks like the MHI device went into RESET state! It also looks to be a
firmware thing. But let's nail this down before adding any workaround in
the MHI stack.

Can you also rebuild the kernel with MHI debug enabled and capture the
logs in the failure case? Sorry if it is too much work for you!

Thanks,
Mani

> [  970.492805] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
> 
> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-24  9:43     ` Loic Poulain
@ 2021-09-24 10:00       ` Manivannan Sadhasivam
  2021-10-07  9:48       ` Kalle Valo
  1 sibling, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-09-24 10:00 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Kalle Valo, ath11k, linux-wireless, linux-arm-msm, regressions

On Fri, Sep 24, 2021 at 11:43:55AM +0200, Loic Poulain wrote:
> Hi Kalle,
> 
> On Fri, 24 Sept 2021 at 10:36, Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > Loic Poulain <loic.poulain@linaro.org> writes:
> >
> > > Hi Kalle,
> > >
> > > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
> > >>
> > >> Hi Loic and Mani,
> > >>
> > >> I hate to be the bearer of bad news again :)
> > >>
> > >> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> > >> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> > >> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> > >> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> > >> problem, I only see the problem on the NUC. I do not know what's causing
> > >> this difference.
> > >
> > > I suppose the NUC is current PCI-Express power during suspend while
> > > the laptop maintains PCIe/M2 power.
> >
> > Sorry, I'm not able to parse that sentence. Can you elaborate more?
> 
> Ouch, yes, I wanted to say that the NUC does not maintain the power of
> PCI express during suspend (leading to PCI D3cold state), whereas the
> laptop maintains the power of the M2 card... well, not sure now I see
> your logs.
> 
> >
> > >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > >> everything works without problems. Is there a simple way to fix this? Or
> > >> maybe we should just revert the commit? Commit log and kernel logs from
> > >> a failing case below.
> > >
> > > Do you have log of success case?
> >
> > A log from a successful case in the end of email, using v5.15-rc1 plus
> > revert of commit 020d3b26c07abe27.
> >
> > > To me, the device loses power, that is why MHI resuming is failing.
> > > Normally the device should be properly recovered/reinitialized. Before
> > > that patch the power loss was simply not detected (or handled at
> > > higher stack level).
> >
> > Currently in ath11k we always keep the firmware running when in suspend,
> > this is a workaround due to problems between mac80211 and MHI stack.
> > IIRC the problem was something related MHI creating struct device during
> > resume or something like that.
> 
> Could you give a try with the attached patch? It should solve your
> issue without breaking modem support.
> 

It will... But we should first try to see what is causing the device to
be in MHI RESET state. We can't add a force resume case without knowing
the root cause.

And as a workaround, we can proceed with the resume if the device is in
RESET state, adding a comment on why. But let's first get the MHI debug logs.
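
To make the workaround idea concrete, here is a rough sketch of the
relaxed check. It is a standalone userspace model, not the MHI core:
the sketch_* names are hypothetical and only the accept/reject decision
is modelled.

```c
/*
 * Userspace model only, not the MHI core. All sketch_* names are
 * hypothetical and exist purely for illustration.
 */
#include <errno.h>

enum sketch_state { SKETCH_RESET, SKETCH_M0, SKETCH_M3 };

/*
 * Return 0 if resume may proceed from 'state', -EINVAL otherwise.
 *
 * M3 is the expected state after a balanced suspend. RESET is accepted
 * only as a workaround for devices (QCA6390 on the NUC in this thread)
 * that report RESET on resume for a reason not yet understood.
 */
int sketch_resume_allowed(enum sketch_state state)
{
	switch (state) {
	case SKETCH_M3:
	case SKETCH_RESET:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Unlike a 'force' flag, this keeps rejecting every other unexpected
state, so only the one observed case is let through.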

> Regards,
> Loic




* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-16  8:00 [regression] mhi: ath11k resume fails on some devices Kalle Valo
  2021-09-16 10:18 ` Loic Poulain
@ 2021-09-25 17:40 ` Thorsten Leemhuis
  1 sibling, 0 replies; 24+ messages in thread
From: Thorsten Leemhuis @ 2021-09-25 17:40 UTC (permalink / raw)
  To: regressions

On 16.09.21 10:00, Kalle Valo wrote:
> 
> I hate to be the bearer of bad news again :)
> 
> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> problem, I only see the problem on the NUC. I do not know what's causing
> this difference.
> 
> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> everything works without problems. Is there a simple way to fix this? Or
> maybe we should just revert the commit? Commit log and kernel logs from
> a failing case below.
> 
> Kalle
> 
> commit 020d3b26c07abe274ac17f64999bbd3bf3342195

Feel free to ignore this message. I am writing it to make regzbot track this
issue. Regzbot is the regression tracking bot I'm working on. It's still
in the early stages and this is one of the first few regressions I
make it track to get started and get things tested in the field. That's also
why I'm sending the mail just to the regressions list (it will do its
magic nevertheless). For details see:
https://linux-regtracking.leemhuis.info/post/inital-regzbot-running/
https://linux-regtracking.leemhuis.info/post/regzbot-approach/

#regzbot ^introduced 020d3b26c07abe274ac17f64999bbd3bf3342195

Ciao, Thorsten


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-24  9:43     ` Loic Poulain
  2021-09-24 10:00       ` Manivannan Sadhasivam
@ 2021-10-07  9:48       ` Kalle Valo
  2021-10-19 12:12         ` Kalle Valo
  1 sibling, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-10-07  9:48 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Manivannan Sadhasivam, ath11k, linux-wireless, linux-arm-msm,
	regressions, mhi

(adding the new mhi list, yay)

Hi Loic,

Loic Poulain <loic.poulain@linaro.org> writes:

>> Loic Poulain <loic.poulain@linaro.org> writes:
>>
>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>>
>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>> >> everything works without problems. Is there a simple way to fix this? Or
>> >> maybe we should just revert the commit? Commit log and kernel logs from
>> >> a failing case below.
>> >
>> > Do you have log of success case?
>>
>> A log from a successful case in the end of email, using v5.15-rc1 plus
>> revert of commit 020d3b26c07abe27.
>>
>> > To me, the device loses power, that is why MHI resuming is failing.
>> > Normally the device should be properly recovered/reinitialized. Before
>> > that patch the power loss was simply not detected (or handled at
>> > higher stack level).
>>
>> Currently in ath11k we always keep the firmware running when in suspend,
>> this is a workaround due to problems between mac80211 and MHI stack.
>> IIRC the problem was something related MHI creating struct device during
>> resume or something like that.
>
> Could you give a try with the attached patch? It should solve your
> issue without breaking modem support.

Sorry for taking so long, but I now tested your patch on top of
v5.15-rc3 and, as expected, everything works as before with QCA6390 on
NUC x86 testbox.

Tested-by: Kalle Valo <kvalo@codeaurora.org>

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-09-24  9:57                   ` Manivannan Sadhasivam
@ 2021-10-07  9:55                     ` Kalle Valo
  2021-10-21 10:01                       ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-10-07  9:55 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Carl Huang, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions, mhi

(adding also mhi list)

Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
>> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>> 
>> > For aid debugging, please see the state the device is in during mhi_pm_resume().
>> > You can use below diff:
>> >
>> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
>> > index fb99e3727155..482d55dd209e 100644
>> > --- a/drivers/bus/mhi/core/pm.c
>> > +++ b/drivers/bus/mhi/core/pm.c
>> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>> >         if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>> >                 return -EIO;
>> >  
>> > +       dev_info(dev, "Device state: %s\n",
>> > +                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
>> > +
>> >         if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>> >                 return -EINVAL;
>> 
>> This is what I get with my NUC testbox:
>> 
>> [  970.488202] ACPI: EC: event unblocked
>> [  970.492484] hpet: Lost 1587 RTC interrupts
>> [  970.492749] mhi mhi0: Device state: RESET
>
> Looks like the MHI device went into RESET state! It also looks to be a
> firmware thing. But let's nail this down before adding any workaround in
> the MHI stack.
>
> Can you also rebuild the kernel with MHI debug enabled and capture the
> logs in faliure case?

So what exactly should I do to enable debug messages?

I have this in my Kconfig:

CONFIG_MHI_BUS=m
# CONFIG_MHI_BUS_DEBUG is not set
# CONFIG_MHI_BUS_PCI_GENERIC is not set

And AFAICS CONFIG_MHI_BUS_DEBUG only enables the debugfs interface, I
doubt you meant that.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [regression] mhi: ath11k resume fails on some devices
  2021-10-07  9:48       ` Kalle Valo
@ 2021-10-19 12:12         ` Kalle Valo
  2021-10-21 10:03           ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2021-10-19 12:12 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Manivannan Sadhasivam, ath11k, linux-wireless, linux-arm-msm,
	regressions, mhi

Kalle Valo <kvalo@codeaurora.org> writes:

> (adding the new mhi list, yay)
>
> Hi Loic,
>
> Loic Poulain <loic.poulain@linaro.org> writes:
>
>>> Loic Poulain <loic.poulain@linaro.org> writes:
>>>
>>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>>>
>>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>> >> everything works without problems. Is there a simple way to fix this? Or
>>> >> maybe we should just revert the commit? Commit log and kernel logs from
>>> >> a failing case below.
>>> >
>>> > Do you have log of success case?
>>>
>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>> revert of commit 020d3b26c07abe27.
>>>
>>> > To me, the device loses power, that is why MHI resuming is failing.
>>> > Normally the device should be properly recovered/reinitialized. Before
>>> > that patch the power loss was simply not detected (or handled at
>>> > higher stack level).
>>>
>>> Currently in ath11k we always keep the firmware running when in suspend,
>>> this is a workaround due to problems between mac80211 and MHI stack.
>>> IIRC the problem was something related to MHI creating struct device during
>>> resume or something like that.
>>
>> Could you give a try with the attached patch? It should solve your
>> issue without breaking modem support.
>
> Sorry for taking so long, but I now tested your patch on top of
> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> NUC x86 testbox.
>
> Tested-by: Kalle Valo <kvalo@codeaurora.org>

I doubt we will find enough time to fully debug this mhi issue anytime
soon. Can we commit Loic's patch so that this regression is resolved?

At the moment I'm doing all my regression testing with commit
020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
testing without any hacks.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-10-07  9:55                     ` Kalle Valo
@ 2021-10-21 10:01                       ` Manivannan Sadhasivam
  0 siblings, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-10-21 10:01 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Carl Huang, Loic Poulain, ath11k, linux-arm-msm, linux-wireless,
	regressions, mhi

On Thu, Oct 07, 2021 at 12:55:52PM +0300, Kalle Valo wrote:
> (adding also mhi list)
> 
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> 
> > On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
> >> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> >> 
> >> > To aid debugging, please check the state the device is in during mhi_pm_resume().
> >> > You can use below diff:
> >> >
> >> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> >> > index fb99e3727155..482d55dd209e 100644
> >> > --- a/drivers/bus/mhi/core/pm.c
> >> > +++ b/drivers/bus/mhi/core/pm.c
> >> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
> >> >         if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
> >> >                 return -EIO;
> >> >  
> >> > +       dev_info(dev, "Device state: %s\n",
> >> > +                TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> >> > +
> >> >         if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
> >> >                 return -EINVAL;
> >> 
> >> This is what I get with my NUC testbox:
> >> 
> >> [  970.488202] ACPI: EC: event unblocked
> >> [  970.492484] hpet: Lost 1587 RTC interrupts
> >> [  970.492749] mhi mhi0: Device state: RESET
> >
> > Looks like the MHI device went into RESET state! It also looks to be a
> > firmware thing. But let's nail this down before adding any workaround in
> > the MHI stack.
> >
> > Can you also rebuild the kernel with MHI debug enabled and capture the
> > logs in failure case?
> 
> So what exactly should I do to enable debug messages?
> 
> I have this in my Kconfig:
> 
> CONFIG_MHI_BUS=m
> # CONFIG_MHI_BUS_DEBUG is not set
> # CONFIG_MHI_BUS_PCI_GENERIC is not set
> 
> And AFAICS CONFIG_MHI_BUS_DEBUG only enables the debugfs interface, I
> doubt you meant that.
> 

No. You should enable the dev_dbg messages in the MHI core, either by adding
the -DDEBUG flag to the Makefile or by using CONFIG_DYNAMIC_DEBUG.
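
As a rough illustration of both options (a sketch, not part of the original
mail): for the first one, a line such as "ccflags-y += -DDEBUG" in
drivers/bus/mhi/core/Makefile is the usual kbuild way to define DEBUG for a
directory; the path is assumed from the diff quoted above. For the second one,
assuming the kernel is built with CONFIG_DYNAMIC_DEBUG=y, debugfs is mounted at
/sys/kernel/debug, and the MHI core is built as a module named "mhi" (also an
assumption), the dev_dbg messages could be switched on at runtime roughly like
this. The helper names dyndbg_query and enable_module_debug are hypothetical.

```shell
#!/bin/sh
# Sketch only: enable dev_dbg() output for one module via dynamic debug.
# Assumes CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug.

CONTROL=/sys/kernel/debug/dynamic_debug/control

# Build a dynamic debug query that turns on printing (+p) for one module.
dyndbg_query() {
    printf 'module %s +p\n' "$1"
}

# Write the query to the control file, if it is writable (normally root only).
enable_module_debug() {
    if [ -w "$CONTROL" ]; then
        dyndbg_query "$1" > "$CONTROL"
    else
        echo "cannot write $CONTROL (not root, or dynamic debug disabled?)" >&2
        return 1
    fi
}

# Usage, as root, before triggering suspend/resume:
#   enable_module_debug mhi    # "mhi" is the assumed MHI core module name
#   dmesg -w
```

If the messages are needed from module load onwards, the dynamic debug
documentation also describes a per-module "dyndbg=+p" parameter that may be
more convenient than the debugfs write.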

Thanks,
Mani

> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-10-19 12:12         ` Kalle Valo
@ 2021-10-21 10:03           ` Manivannan Sadhasivam
  2021-11-12 11:36             ` Thorsten Leemhuis
  2021-11-18 17:41             ` Manivannan Sadhasivam
  0 siblings, 2 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-10-21 10:03 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Loic Poulain, ath11k, linux-wireless, linux-arm-msm, regressions, mhi

On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
> Kalle Valo <kvalo@codeaurora.org> writes:
> 
> > (adding the new mhi list, yay)
> >
> > Hi Loic,
> >
> > Loic Poulain <loic.poulain@linaro.org> writes:
> >
> >>> Loic Poulain <loic.poulain@linaro.org> writes:
> >>>
> >>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
> >>>
> >>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> >>> >> everything works without problems. Is there a simple way to fix this? Or
> >>> >> maybe we should just revert the commit? Commit log and kernel logs from
> >>> >> a failing case below.
> >>> >
> >>> > Do you have log of success case?
> >>>
> >>> A log from a successful case in the end of email, using v5.15-rc1 plus
> >>> revert of commit 020d3b26c07abe27.
> >>>
> >>> > To me, the device loses power, that is why MHI resuming is failing.
> >>> > Normally the device should be properly recovered/reinitialized. Before
> >>> > that patch the power loss was simply not detected (or handled at
> >>> > higher stack level).
> >>>
> >>> Currently in ath11k we always keep the firmware running when in suspend,
> >>> this is a workaround due to problems between mac80211 and MHI stack.
> >>> IIRC the problem was something related to MHI creating struct device during
> >>> resume or something like that.
> >>
> >> Could you give a try with the attached patch? It should solve your
> >> issue without breaking modem support.
> >
> > Sorry for taking so long, but I now tested your patch on top of
> > v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> > NUC x86 testbox.
> >
> > Tested-by: Kalle Valo <kvalo@codeaurora.org>
> 
> I doubt we will find enough time to fully debug this mhi issue anytime
> soon. Can we commit Loic's patch so that this regression is resolved?
> 

Sorry no :( Even though Loic's patch is working, I want to understand the
issue properly so that we could add a proper fix or patch the firmware
if possible.

Let's try to get the debug logs as I requested.

Thanks,
Mani

> At the moment I'm doing all my regression testing with commit
> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
> testing without any hacks.
> 
> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-10-21 10:03           ` Manivannan Sadhasivam
@ 2021-11-12 11:36             ` Thorsten Leemhuis
  2021-11-18 17:41             ` Manivannan Sadhasivam
  1 sibling, 0 replies; 24+ messages in thread
From: Thorsten Leemhuis @ 2021-11-12 11:36 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Kalle Valo
  Cc: Loic Poulain, ath11k, linux-wireless, linux-arm-msm, regressions, mhi

On 21.10.21 12:03, Manivannan Sadhasivam wrote:
> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>> (adding the new mhi list, yay)
>>> Loic Poulain <loic.poulain@linaro.org> writes:
>>>>> Loic Poulain <loic.poulain@linaro.org> writes:
>>>>>> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>>>>>>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>>>>>> everything works without problems. Is there a simple way to fix this? Or
>>>>>>> maybe we should just revert the commit? Commit log and kernel logs from
>>>>>>> a failing case below.
>>>>>>
>>>>>> Do you have log of success case?
>>>>>
>>>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>>>> revert of commit 020d3b26c07abe27.
>>>>>
>>>>>> To me, the device loses power, that is why MHI resuming is failing.
>>>>>> Normally the device should be properly recovered/reinitialized. Before
>>>>>> that patch the power loss was simply not detected (or handled at
>>>>>> higher stack level).
>>>>>
>>>>> Currently in ath11k we always keep the firmware running when in suspend,
>>>>> this is a workaround due to problems between mac80211 and MHI stack.
>>>>> IIRC the problem was something related to MHI creating struct device during
>>>>> resume or something like that.
>>>>
>>>> Could you give a try with the attached patch? It should solve your
>>>> issue without breaking modem support.
>>>
>>> Sorry for taking so long, but I now tested your patch on top of
>>> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
>>> NUC x86 testbox.
>>>
>>> Tested-by: Kalle Valo <kvalo@codeaurora.org>
>>
>> I doubt we will find enough time to fully debug this mhi issue anytime
>> soon. Can we commit Loic's patch so that this regression is resolved?
> 
> Sorry no :( Even though Loic's patch is working, I want to understand the
> issue properly so that we could add a proper fix or patch the firmware
> if possible.

Lo, this is your Linux kernel regression tracker speaking!

> Let's try to get the debug logs as I requested.

That was 3 weeks ago. Afaics nothing happened since then (except the
other mail about this on the same day in this thread). Or did I miss
anything? And if not: How can we get the ball rolling somehow again to
get this regression finally fixed?

Ciao, Thorsten (carrying his Linux kernel regression tracker hat)

P.S.: I have no personal interest in this issue and watch it using
regzbot. Hence feel free to exclude me on further messages in this
thread, as I'm only posting this mail to hopefully get a status update
and things rolling again.

#regzbot poke

>> At the moment I'm doing all my regression testing with commit
>> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
>> testing without any hacks.
>>
>> -- 
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> 
> 

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-10-21 10:03           ` Manivannan Sadhasivam
  2021-11-12 11:36             ` Thorsten Leemhuis
@ 2021-11-18 17:41             ` Manivannan Sadhasivam
  2021-12-01  7:34               ` Thorsten Leemhuis
  1 sibling, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2021-11-18 17:41 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Loic Poulain, ath11k, linux-wireless, linux-arm-msm, regressions, mhi

On Thu, Oct 21, 2021 at 03:33:05PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
> > Kalle Valo <kvalo@codeaurora.org> writes:
> > 
> > > (adding the new mhi list, yay)
> > >
> > > Hi Loic,
> > >
> > > Loic Poulain <loic.poulain@linaro.org> writes:
> > >
> > >>> Loic Poulain <loic.poulain@linaro.org> writes:
> > >>>
> > >>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
> > >>>
> > >>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > >>> >> everything works without problems. Is there a simple way to fix this? Or
> > >>> >> maybe we should just revert the commit? Commit log and kernel logs from
> > >>> >> a failing case below.
> > >>> >
> > >>> > Do you have log of success case?
> > >>>
> > >>> A log from a successful case in the end of email, using v5.15-rc1 plus
> > >>> revert of commit 020d3b26c07abe27.
> > >>>
> > >>> > To me, the device loses power, that is why MHI resuming is failing.
> > >>> > Normally the device should be properly recovered/reinitialized. Before
> > >>> > that patch the power loss was simply not detected (or handled at
> > >>> > higher stack level).
> > >>>
> > >>> Currently in ath11k we always keep the firmware running when in suspend,
> > >>> this is a workaround due to problems between mac80211 and MHI stack.
> > >>> IIRC the problem was something related to MHI creating struct device during
> > >>> resume or something like that.
> > >>
> > >> Could you give a try with the attached patch? It should solve your
> > >> issue without breaking modem support.
> > >
> > > Sorry for taking so long, but I now tested your patch on top of
> > > v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> > > NUC x86 testbox.
> > >
> > > Tested-by: Kalle Valo <kvalo@codeaurora.org>
> > 
> > I doubt we will find enough time to fully debug this mhi issue anytime
> > soon. Can we commit Loic's patch so that this regression is resolved?
> > 
> 
> Sorry no :( Even though Loic's patch is working, I want to understand the
> issue properly so that we could add a proper fix or patch the firmware
> if possible.
> 
> Let's try to get the debug logs as I requested.
> 

I'm able to reproduce the issue on my NUC. I'm still investigating how to
properly fix this issue. Expect a patch soon.

Thanks,
Mani 

> Thanks,
> Mani
> 
> > At the moment I'm doing all my regression testing with commit
> > 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
> > testing without any hacks.
> > 
> > -- 
> > https://patchwork.kernel.org/project/linux-wireless/list/
> > 
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [regression] mhi: ath11k resume fails on some devices
  2021-11-18 17:41             ` Manivannan Sadhasivam
@ 2021-12-01  7:34               ` Thorsten Leemhuis
  0 siblings, 0 replies; 24+ messages in thread
From: Thorsten Leemhuis @ 2021-12-01  7:34 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Kalle Valo
  Cc: Loic Poulain, ath11k, linux-wireless, linux-arm-msm, regressions, mhi

Hi, this is your Linux kernel regression tracker speaking, this time
looking for a status update.

On 18.11.21 18:41, Manivannan Sadhasivam wrote:
> On Thu, Oct 21, 2021 at 03:33:05PM +0530, Manivannan Sadhasivam wrote:
>> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
>>> Kalle Valo <kvalo@codeaurora.org> writes:
>>>
>>>> (adding the new mhi list, yay)
>>>>
>>>> Hi Loic,
>>>>
>>>> Loic Poulain <loic.poulain@linaro.org> writes:
>>>>
>>>>>> Loic Poulain <loic.poulain@linaro.org> writes:
>>>>>>
>>>>>>> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <kvalo@codeaurora.org> wrote:
>>>>>>
>>>>>>>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>>>>>>> everything works without problems. Is there a simple way to fix this? Or
>>>>>>>> maybe we should just revert the commit? Commit log and kernel logs from
>>>>>>>> a failing case below.
>>>>>>>
>>>>>>> Do you have log of success case?
>>>>>>
>>>>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>>>>> revert of commit 020d3b26c07abe27.
>>>>>>
>>>>>>> To me, the device loses power, that is why MHI resuming is failing.
>>>>>>> Normally the device should be properly recovered/reinitialized. Before
>>>>>>> that patch the power loss was simply not detected (or handled at
>>>>>>> higher stack level).
>>>>>>
>>>>>> Currently in ath11k we always keep the firmware running when in suspend,
>>>>>> this is a workaround due to problems between mac80211 and MHI stack.
>>>>>> IIRC the problem was something related to MHI creating struct device during
>>>>>> resume or something like that.
>>>>>
>>>>> Could you give a try with the attached patch? It should solve your
>>>>> issue without breaking modem support.
>>>>
>>>> Sorry for taking so long, but I now tested your patch on top of
>>>> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
>>>> NUC x86 testbox.
>>>>
>>>> Tested-by: Kalle Valo <kvalo@codeaurora.org>
>>>
>>> I doubt we will find enough time to fully debug this mhi issue anytime
>>> soon. Can we commit Loic's patch so that this regression is resolved?
>>>
>>
>> Sorry no :( Even though Loic's patch is working, I want to understand the
>> issue properly so that we could add a proper fix or patch the firmware
>> if possible.
>>
>> Let's try to get the debug logs as I requested.
> 
> I'm able to reproduce the issue on my NUC. I'm still investigating how to
> properly fix this issue. Expect a patch soon.

Was there some progress? This issue was reported 75 days ago and still
is not fixed. From the point of view of the Linux kernel regression
tracker I'd say: it should not take this long. Looking back at it I
wonder if 'revert the culprit and reapply later together with a proper
fix' would have been the better strategy. I wonder if that still would
be the best way forward if no patch is forthcoming soon.

Ciao, Thorsten

#regzbot poke

>>> At the moment I'm doing all my regression testing with commit
>>> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
>>> testing without any hacks.
>>>
>>> -- 
>>> https://patchwork.kernel.org/project/linux-wireless/list/
>>>
>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the
wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.

Thread overview: 24+ messages
2021-09-16  8:00 [regression] mhi: ath11k resume fails on some devices Kalle Valo
2021-09-16 10:18 ` Loic Poulain
2021-09-16 11:12   ` Manivannan Sadhasivam
     [not found]     ` <CAMZdPi94607mZorp+Zmkw3seWXak6p9Jr05CQ5hhfgKQoG8n7Q@mail.gmail.com>
2021-09-16 16:35       ` Manivannan Sadhasivam
2021-09-16 16:42         ` Kalle Valo
2021-09-16 17:19           ` Manivannan Sadhasivam
2021-09-23  8:34             ` Carl Huang
2021-09-23  8:59               ` Manivannan Sadhasivam
2021-09-23  9:26                 ` Carl Huang
2021-09-23 10:50                   ` Loic Poulain
2021-09-24  9:07                 ` Kalle Valo
2021-09-24  9:57                   ` Manivannan Sadhasivam
2021-10-07  9:55                     ` Kalle Valo
2021-10-21 10:01                       ` Manivannan Sadhasivam
2021-09-24  8:36   ` Kalle Valo
2021-09-24  9:43     ` Loic Poulain
2021-09-24 10:00       ` Manivannan Sadhasivam
2021-10-07  9:48       ` Kalle Valo
2021-10-19 12:12         ` Kalle Valo
2021-10-21 10:03           ` Manivannan Sadhasivam
2021-11-12 11:36             ` Thorsten Leemhuis
2021-11-18 17:41             ` Manivannan Sadhasivam
2021-12-01  7:34               ` Thorsten Leemhuis
2021-09-25 17:40 ` Thorsten Leemhuis
