* [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
@ 2021-03-04 14:59 Kalle Valo
2021-03-04 15:10 ` Manivannan Sadhasivam
0 siblings, 1 reply; 5+ messages in thread
From: Kalle Valo @ 2021-03-04 14:59 UTC (permalink / raw)
To: manivannan.sadhasivam, Hemant Kumar, Bhaumik Bhatt; +Cc: linux-arm-msm, ath11k
Hi MHI folks,
I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
kernel crashes when testing ath11k. I don't recall seeing this on v5.11
so it looks like a new problem, but I cannot be 100% sure. Netconsole
output is below. I have most of the kernel debug functionality enabled
(KASAN etc).
I can fairly easy reproduce this by looping insmod and rmmod of mhi,
wireless and ath11k modules. It does not happen every time, but I would
say I can reproduce the problem within 10 test loops or so.
Any ideas what could cause this? I have not bisected this due to lack of
time, but I can test patches etc.
Kalle
[ 385.427842] ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
[ 385.513269] mhi mhi0: Requested to power ON
[ 385.519621] mhi mhi0: Power on setup success
[ 386.754217] rmmod ath11k_pci
[ 554.714680] INFO: task kworker/u17:0:94 blocked for more than 61 seconds.
[ 554.714745] Tainted: G B 5.12.0-rc1+ #412
[ 554.714785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 554.714824] task:kworker/u17:0 state:D stack:23960 pid: 94 ppid: 2 flags:0x00004000
[ 554.714870] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
[ 554.714923] Call Trace:
[ 554.714963] __schedule+0x555/0x1250
[ 554.715005] ? io_schedule_timeout+0x140/0x140
[ 554.715045] ? trace_hardirqs_on+0x1c/0x120
[ 554.715086] schedule+0xcb/0x270
[ 554.715124] schedule_timeout+0x116/0x200
[ 554.715162] ? usleep_range+0x140/0x140
[ 554.715200] ? __next_timer_interrupt+0x160/0x160
[ 554.715239] ? __wake_up_locked_sync_key+0x20/0x20
[ 554.715279] ? trace_hardirqs_on+0x1c/0x120
[ 554.715319] mhi_fw_load_bhie+0x4bd/0x5c0 [mhi]
[ 554.715363] ? __mhi_download_rddm_in_panic+0x560/0x560 [mhi]
[ 554.715405] ? finish_wait+0xb2/0x260
[ 554.715444] ? finish_wait+0x260/0x260
[ 554.715483] ? _raw_spin_lock_irqsave+0x55/0x80
[ 554.715525] mhi_fw_load_handler+0x67a/0xd90 [mhi]
[ 554.715569] ? mhi_alloc_bhie_table+0x4c0/0x4c0 [mhi]
[ 554.715613] ? finish_wait+0x260/0x260
[ 554.715652] ? mhi_register_controller+0x11f0/0x11f0 [mhi]
[ 554.715709] mhi_pm_st_worker+0x425/0x720 [mhi]
[ 554.715758] ? mhi_pm_mission_mode_transition.isra.0+0x710/0x710 [mhi]
[ 554.715803] ? lock_acquire+0x19d/0x200
[ 554.715842] ? process_one_work+0x77f/0x1370
[ 554.715885] process_one_work+0x833/0x1370
[ 554.715925] ? pwq_dec_nr_in_flight+0x260/0x260
[ 554.715964] ? lock_acquire+0x19d/0x200
[ 554.716004] ? worker_thread+0x23d/0xc90
[ 554.716044] worker_thread+0xd5/0xc90
[ 554.716083] ? process_one_work+0x1370/0x1370
[ 554.716123] kthread+0x361/0x430
[ 554.716162] ? _raw_spin_unlock_irq+0x1f/0x30
[ 554.716199] ? __kthread_bind_mask+0x90/0x90
[ 554.716241] ret_from_fork+0x22/0x30
[ 554.716337] INFO: lockdep is turned off.
[ 554.716374] Kernel panic - not syncing: hung_task: blocked tasks
[ 554.716411] CPU: 0 PID: 53 Comm: khungtaskd Tainted: G B 5.12.0-rc1+ #412
[ 554.716451] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[ 554.716488] Call Trace:
[ 554.716524] dump_stack+0xa5/0xe6
[ 554.716564] panic+0x211/0x459
[ 554.716601] ? __warn_printk+0xee/0xee
[ 554.716639] ? memcpy+0x39/0x60
[ 554.716680] check_hung_uninterruptible_tasks.cold+0x11/0x11
[ 554.716719] watchdog+0xef/0x120
[ 554.716757] ? check_hung_uninterruptible_tasks+0x490/0x490
[ 554.716795] kthread+0x361/0x430
[ 554.716832] ? _raw_spin_unlock_irq+0x1f/0x30
[ 554.716868] ? __kthread_bind_mask+0x90/0x90
[ 554.716906] ret_from_fork+0x22/0x30
[ 554.717950] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 554.717993] Rebooting in 10 seconds..
[ 564.791281] list_add double add: new=ffffffffb26a8520, prev=ffffffffb2684f88, next=ffffffffb26a8520.
[ 564.791331] ------------[ cut here ]------------
[ 564.791365] kernel BUG at lib/list_debug.c:29!
[ 564.791403] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 564.791439] CPU: 0 PID: 53 Comm: khungtaskd Tainted: G B 5.12.0-rc1+ #412
[ 564.791476] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[ 564.791511] RIP: 0010:__list_add_valid.cold+0x26/0x3c
[ 564.791547] Code: fb ed 8e fe 4c 89 e1 48 c7 c7 a0 ec db b1 e8 aa 5b fe ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 e0 ed db b1 e8 93 5b fe ff <0f> 0b 48 89 f1 48 c7 c7 60 ed db b1 4c 89 e6 e8 7f 5b fe ff 0f 0b
[ 564.791794] RSP: 0018:ffffc90000477c98 EFLAGS: 00010082
[ 564.791831] RAX: 0000000000000058 RBX: ffffffffb26a8520 RCX: 0000000000000000
[ 564.791865] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200008ef86
[ 564.791899] RBP: ffffffffb26a8520 R08: 0000000000000058 R09: ffff88822dde57bb
[ 564.791951] R10: ffffed1045bbcaf7 R11: 0000000000000001 R12: ffffffffb26a8520
[ 564.792010] R13: 0000000000000048 R14: 0000000000000046 R15: ffffffffb2684f40
[ 564.792061] FS: 0000000000000000(0000) GS:ffff88822dc00000(0000) knlGS:0000000000000000
[ 564.792112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.792163] CR2: 00005645180f8d58 CR3: 000000011e83a006 CR4: 00000000003706f0
[ 564.792399] Call Trace:
[ 564.792450] __register_nmi_handler+0x1f4/0x370
[ 564.792506] nmi_shootdown_cpus+0x91/0x100
[ 564.792558] native_machine_emergency_restart+0x405/0x4f0
[ 564.792609] ? nmi_shootdown_cpus+0x100/0x100
[ 564.792657] ? kmsg_dump+0x3d/0x350
[ 564.792707] ? kmsg_dump+0x250/0x350
[ 564.792756] panic+0x3bc/0x459
[ 564.792804] ? __warn_printk+0xee/0xee
[ 564.792853] ? memcpy+0x39/0x60
[ 564.792903] check_hung_uninterruptible_tasks.cold+0x11/0x11
[ 564.792953] watchdog+0xef/0x120
[ 564.793001] ? check_hung_uninterruptible_tasks+0x490/0x490
[ 564.793050] kthread+0x361/0x430
[ 564.793099] ? _raw_spin_unlock_irq+0x1f/0x30
[ 564.793149] ? __kthread_bind_mask+0x90/0x90
[ 564.793199] ret_from_fork+0x22/0x30
[ 564.793250] Modules linked in: ath11k_pci(-) ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[ 564.793326] ---[ end trace 71395487ba9b6a13 ]---
[ 564.793376] RIP: 0010:__list_add_valid.cold+0x26/0x3c
[ 564.793427] Code: fb ed 8e fe 4c 89 e1 48 c7 c7 a0 ec db b1 e8 aa 5b fe ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 e0 ed db b1 e8 93 5b fe ff <0f> 0b 48 89 f1 48 c7 c7 60 ed db b1 4c 89 e6 e8 7f 5b fe ff 0f 0b
[ 564.793490] RSP: 0018:ffffc90000477c98 EFLAGS: 00010082
[ 564.793540] RAX: 0000000000000058 RBX: ffffffffb26a8520 RCX: 0000000000000000
[ 564.793589] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200008ef86
[ 564.794009] RBP: ffffffffb26a8520 R08: 0000000000000058 R09: ffff88822dde57bb
[ 564.794061] R10: ffffed1045bbcaf7 R11: 0000000000000001 R12: ffffffffb26a8520
[ 564.794111] R13: 0000000000000048 R14: 0000000000000046 R15: ffffffffb2684f40
[ 564.794159] FS: 0000000000000000(0000) GS:ffff88822dc00000(0000) knlGS:0000000000000000
[ 564.794208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.794257] CR2: 00005645180f8d58 CR3: 000000011e83a006 CR4: 00000000003706f0
[ 564.794306] Kernel panic - not syncing: Fatal exception
[ 564.794356] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 564.794412] Rebooting in 10 seconds..
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
2021-03-04 14:59 [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds Kalle Valo
@ 2021-03-04 15:10 ` Manivannan Sadhasivam
2021-03-04 17:21 ` Kalle Valo
0 siblings, 1 reply; 5+ messages in thread
From: Manivannan Sadhasivam @ 2021-03-04 15:10 UTC (permalink / raw)
To: Kalle Valo; +Cc: Hemant Kumar, Bhaumik Bhatt, linux-arm-msm, ath11k
Hi Kalle,
On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
> Hi MHI folks,
>
> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
> so it looks like a new problem, but I cannot be 100% sure. Netconsole
> output is below. I have most of the kernel debug functionality enabled
> (KASAN etc).
>
> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
> wireless and ath11k modules. It does not happen every time, but I would
> say I can reproduce the problem within 10 test loops or so.
>
> Any ideas what could cause this? I have not bisected this due to lack of
> time, but I can test patches etc.
>
Not sure if this is related, Loic sent a patch which fixes an issue with
"mhi_pm_state_worker":
https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
Can you please test see if it fixes your issue also?
Thanks,
Mani
> Kalle
>
> [ 385.427842] ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
> [ 385.513269] mhi mhi0: Requested to power ON
> [ 385.519621] mhi mhi0: Power on setup success
> [ 386.754217] rmmod ath11k_pci
> [ 554.714680] INFO: task kworker/u17:0:94 blocked for more than 61 seconds.
> [ 554.714745] Tainted: G B 5.12.0-rc1+ #412
> [ 554.714785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 554.714824] task:kworker/u17:0 state:D stack:23960 pid: 94 ppid: 2 flags:0x00004000
> [ 554.714870] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> [ 554.714923] Call Trace:
> [ 554.714963] __schedule+0x555/0x1250
> [ 554.715005] ? io_schedule_timeout+0x140/0x140
> [ 554.715045] ? trace_hardirqs_on+0x1c/0x120
> [ 554.715086] schedule+0xcb/0x270
> [ 554.715124] schedule_timeout+0x116/0x200
> [ 554.715162] ? usleep_range+0x140/0x140
> [ 554.715200] ? __next_timer_interrupt+0x160/0x160
> [ 554.715239] ? __wake_up_locked_sync_key+0x20/0x20
> [ 554.715279] ? trace_hardirqs_on+0x1c/0x120
> [ 554.715319] mhi_fw_load_bhie+0x4bd/0x5c0 [mhi]
> [ 554.715363] ? __mhi_download_rddm_in_panic+0x560/0x560 [mhi]
> [ 554.715405] ? finish_wait+0xb2/0x260
> [ 554.715444] ? finish_wait+0x260/0x260
> [ 554.715483] ? _raw_spin_lock_irqsave+0x55/0x80
> [ 554.715525] mhi_fw_load_handler+0x67a/0xd90 [mhi]
> [ 554.715569] ? mhi_alloc_bhie_table+0x4c0/0x4c0 [mhi]
> [ 554.715613] ? finish_wait+0x260/0x260
> [ 554.715652] ? mhi_register_controller+0x11f0/0x11f0 [mhi]
> [ 554.715709] mhi_pm_st_worker+0x425/0x720 [mhi]
> [ 554.715758] ? mhi_pm_mission_mode_transition.isra.0+0x710/0x710 [mhi]
> [ 554.715803] ? lock_acquire+0x19d/0x200
> [ 554.715842] ? process_one_work+0x77f/0x1370
> [ 554.715885] process_one_work+0x833/0x1370
> [ 554.715925] ? pwq_dec_nr_in_flight+0x260/0x260
> [ 554.715964] ? lock_acquire+0x19d/0x200
> [ 554.716004] ? worker_thread+0x23d/0xc90
> [ 554.716044] worker_thread+0xd5/0xc90
> [ 554.716083] ? process_one_work+0x1370/0x1370
> [ 554.716123] kthread+0x361/0x430
> [ 554.716162] ? _raw_spin_unlock_irq+0x1f/0x30
> [ 554.716199] ? __kthread_bind_mask+0x90/0x90
> [ 554.716241] ret_from_fork+0x22/0x30
> [ 554.716337] INFO: lockdep is turned off.
> [ 554.716374] Kernel panic - not syncing: hung_task: blocked tasks
> [ 554.716411] CPU: 0 PID: 53 Comm: khungtaskd Tainted: G B 5.12.0-rc1+ #412
> [ 554.716451] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> [ 554.716488] Call Trace:
> [ 554.716524] dump_stack+0xa5/0xe6
> [ 554.716564] panic+0x211/0x459
> [ 554.716601] ? __warn_printk+0xee/0xee
> [ 554.716639] ? memcpy+0x39/0x60
> [ 554.716680] check_hung_uninterruptible_tasks.cold+0x11/0x11
> [ 554.716719] watchdog+0xef/0x120
> [ 554.716757] ? check_hung_uninterruptible_tasks+0x490/0x490
> [ 554.716795] kthread+0x361/0x430
> [ 554.716832] ? _raw_spin_unlock_irq+0x1f/0x30
> [ 554.716868] ? __kthread_bind_mask+0x90/0x90
> [ 554.716906] ret_from_fork+0x22/0x30
> [ 554.717950] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 554.717993] Rebooting in 10 seconds..
> [ 564.791281] list_add double add: new=ffffffffb26a8520, prev=ffffffffb2684f88, next=ffffffffb26a8520.
> [ 564.791331] ------------[ cut here ]------------
> [ 564.791365] kernel BUG at lib/list_debug.c:29!
> [ 564.791403] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [ 564.791439] CPU: 0 PID: 53 Comm: khungtaskd Tainted: G B 5.12.0-rc1+ #412
> [ 564.791476] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> [ 564.791511] RIP: 0010:__list_add_valid.cold+0x26/0x3c
> [ 564.791547] Code: fb ed 8e fe 4c 89 e1 48 c7 c7 a0 ec db b1 e8 aa 5b fe ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 e0 ed db b1 e8 93 5b fe ff <0f> 0b 48 89 f1 48 c7 c7 60 ed db b1 4c 89 e6 e8 7f 5b fe ff 0f 0b
> [ 564.791794] RSP: 0018:ffffc90000477c98 EFLAGS: 00010082
> [ 564.791831] RAX: 0000000000000058 RBX: ffffffffb26a8520 RCX: 0000000000000000
> [ 564.791865] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200008ef86
> [ 564.791899] RBP: ffffffffb26a8520 R08: 0000000000000058 R09: ffff88822dde57bb
> [ 564.791951] R10: ffffed1045bbcaf7 R11: 0000000000000001 R12: ffffffffb26a8520
> [ 564.792010] R13: 0000000000000048 R14: 0000000000000046 R15: ffffffffb2684f40
> [ 564.792061] FS: 0000000000000000(0000) GS:ffff88822dc00000(0000) knlGS:0000000000000000
> [ 564.792112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 564.792163] CR2: 00005645180f8d58 CR3: 000000011e83a006 CR4: 00000000003706f0
> [ 564.792399] Call Trace:
> [ 564.792450] __register_nmi_handler+0x1f4/0x370
> [ 564.792506] nmi_shootdown_cpus+0x91/0x100
> [ 564.792558] native_machine_emergency_restart+0x405/0x4f0
> [ 564.792609] ? nmi_shootdown_cpus+0x100/0x100
> [ 564.792657] ? kmsg_dump+0x3d/0x350
> [ 564.792707] ? kmsg_dump+0x250/0x350
> [ 564.792756] panic+0x3bc/0x459
> [ 564.792804] ? __warn_printk+0xee/0xee
> [ 564.792853] ? memcpy+0x39/0x60
> [ 564.792903] check_hung_uninterruptible_tasks.cold+0x11/0x11
> [ 564.792953] watchdog+0xef/0x120
> [ 564.793001] ? check_hung_uninterruptible_tasks+0x490/0x490
> [ 564.793050] kthread+0x361/0x430
> [ 564.793099] ? _raw_spin_unlock_irq+0x1f/0x30
> [ 564.793149] ? __kthread_bind_mask+0x90/0x90
> [ 564.793199] ret_from_fork+0x22/0x30
> [ 564.793250] Modules linked in: ath11k_pci(-) ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> [ 564.793326] ---[ end trace 71395487ba9b6a13 ]---
> [ 564.793376] RIP: 0010:__list_add_valid.cold+0x26/0x3c
> [ 564.793427] Code: fb ed 8e fe 4c 89 e1 48 c7 c7 a0 ec db b1 e8 aa 5b fe ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 e0 ed db b1 e8 93 5b fe ff <0f> 0b 48 89 f1 48 c7 c7 60 ed db b1 4c 89 e6 e8 7f 5b fe ff 0f 0b
> [ 564.793490] RSP: 0018:ffffc90000477c98 EFLAGS: 00010082
> [ 564.793540] RAX: 0000000000000058 RBX: ffffffffb26a8520 RCX: 0000000000000000
> [ 564.793589] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200008ef86
> [ 564.794009] RBP: ffffffffb26a8520 R08: 0000000000000058 R09: ffff88822dde57bb
> [ 564.794061] R10: ffffed1045bbcaf7 R11: 0000000000000001 R12: ffffffffb26a8520
> [ 564.794111] R13: 0000000000000048 R14: 0000000000000046 R15: ffffffffb2684f40
> [ 564.794159] FS: 0000000000000000(0000) GS:ffff88822dc00000(0000) knlGS:0000000000000000
> [ 564.794208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 564.794257] CR2: 00005645180f8d58 CR3: 000000011e83a006 CR4: 00000000003706f0
> [ 564.794306] Kernel panic - not syncing: Fatal exception
> [ 564.794356] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 564.794412] Rebooting in 10 seconds..
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
2021-03-04 15:10 ` Manivannan Sadhasivam
@ 2021-03-04 17:21 ` Kalle Valo
2021-03-09 16:44 ` Kalle Valo
0 siblings, 1 reply; 5+ messages in thread
From: Kalle Valo @ 2021-03-04 17:21 UTC (permalink / raw)
To: Manivannan Sadhasivam; +Cc: Hemant Kumar, Bhaumik Bhatt, linux-arm-msm, ath11k
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
> Hi Kalle,
>
> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>> Hi MHI folks,
>>
>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
>> so it looks like a new problem, but I cannot be 100% sure. Netconsole
>> output is below. I have most of the kernel debug functionality enabled
>> (KASAN etc).
>>
>> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
>> wireless and ath11k modules. It does not happen every time, but I would
>> say I can reproduce the problem within 10 test loops or so.
>>
>> Any ideas what could cause this? I have not bisected this due to lack of
>> time, but I can test patches etc.
>>
>
> Not sure if this is related, Loic sent a patch which fixes an issue with
> "mhi_pm_state_worker":
>
> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>
> Can you please test see if it fixes your issue also?
Thanks for the link, but unfortunately not :( I was able to reproduce
the issue just after 3 insmod/rmmod loops.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
2021-03-04 17:21 ` Kalle Valo
@ 2021-03-09 16:44 ` Kalle Valo
2021-03-09 18:07 ` Bhaumik Bhatt
0 siblings, 1 reply; 5+ messages in thread
From: Kalle Valo @ 2021-03-09 16:44 UTC (permalink / raw)
To: Manivannan Sadhasivam; +Cc: Hemant Kumar, Bhaumik Bhatt, linux-arm-msm, ath11k
Kalle Valo <kvalo@codeaurora.org> writes:
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>
>> Hi Kalle,
>>
>> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>>> Hi MHI folks,
>>>
>>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>>> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
>>> so it looks like a new problem, but I cannot be 100% sure. Netconsole
>>> output is below. I have most of the kernel debug functionality enabled
>>> (KASAN etc).
>>>
>>> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
>>> wireless and ath11k modules. It does not happen every time, but I would
>>> say I can reproduce the problem within 10 test loops or so.
>>>
>>> Any ideas what could cause this? I have not bisected this due to lack of
>>> time, but I can test patches etc.
>>>
>>
>> Not sure if this is related, Loic sent a patch which fixes an issue with
>> "mhi_pm_state_worker":
>>
>> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>>
>> Can you please test see if it fixes your issue also?
>
> Thanks for the link, but unfortunately not :( I was able to reproduce
> the issue just after 3 insmod/rmmod loops.
I investigated this a bit more, I was actually able to reproduce this in
v5.11 as well. So this is not a new regression. The reason why I started
seeing this until now is that I enable more debug options in the kernel,
the diff below. Without those changes I don't see the problem.
I also found a workround, if I add sleep(1) after insmod ath11k_pci in
my test script I see 200 loops without crashes. But when I removed the
sleep the test script crashed only after 19 loops. So there definitely
is a race condition somewhere, just don't know where. I don't have time
to investigate this more, so I'll just use the workaround for the time
being.
--- ../configs/nuc-debug-5.11 2021-02-21 08:55:53.836061988 +0200
+++ .config 2021-03-09 16:22:53.598684524 +0200
@@ -12,6 +12,7 @@
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y
@@ -280,6 +281,7 @@
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
+CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -748,8 +750,7 @@
# CONFIG_MODULE_SIG is not set
# CONFIG_MODULE_COMPRESS is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
-# CONFIG_UNUSED_SYMBOLS is not set
-# CONFIG_TRIM_UNUSED_KSYMS is not set
+CONFIG_UNUSED_SYMBOLS=y
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_SCSI_REQUEST=y
@@ -1164,7 +1165,6 @@
# CONFIG_NET_NCSI is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
-CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set
@@ -1685,7 +1685,6 @@
#
# Distributed Switch Architecture drivers
#
-# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
# end of Distributed Switch Architecture drivers
CONFIG_ETHERNET=y
@@ -1700,6 +1699,7 @@
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
+# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
@@ -1914,7 +1914,6 @@
# CONFIG_MT7615E is not set
# CONFIG_MT7663U is not set
# CONFIG_MT7915E is not set
-# CONFIG_MT7921E is not set
# CONFIG_WLAN_VENDOR_MICROCHIP is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
@@ -4500,7 +4499,7 @@
CONFIG_DEBUG_INFO_COMPRESSED=y
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
-# CONFIG_GDB_SCRIPTS is not set
+CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
@@ -4540,13 +4539,13 @@
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_PAGE_OWNER=y
CONFIG_PAGE_POISONING=y
-# CONFIG_DEBUG_PAGE_REF is not set
+CONFIG_DEBUG_PAGE_REF=y
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_DEBUG_WX=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
-# CONFIG_PTDUMP_DEBUGFS is not set
+CONFIG_PTDUMP_DEBUGFS=y
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
@@ -4568,8 +4567,8 @@
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
-# CONFIG_DEBUG_VM_RB is not set
-# CONFIG_DEBUG_VM_PGFLAGS is not set
+CONFIG_DEBUG_VM_RB=y
+CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_VIRTUAL=y
@@ -4581,7 +4580,13 @@
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+CONFIG_KASAN_GENERIC=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_STACK=1
+CONFIG_KASAN_VMALLOC=y
+# CONFIG_TEST_KASAN_MODULE is not set
# end of Memory Debugging
CONFIG_DEBUG_SHIRQ=y
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
2021-03-09 16:44 ` Kalle Valo
@ 2021-03-09 18:07 ` Bhaumik Bhatt
0 siblings, 0 replies; 5+ messages in thread
From: Bhaumik Bhatt @ 2021-03-09 18:07 UTC (permalink / raw)
To: Kalle Valo
Cc: Manivannan Sadhasivam, Hemant Kumar, linux-arm-msm, ath11k,
kvalo=codeaurora.org
On 2021-03-09 08:44 AM, Kalle Valo wrote:
> Kalle Valo <kvalo@codeaurora.org> writes:
>
>> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>>
>>> Hi Kalle,
>>>
>>> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>>>> Hi MHI folks,
>>>>
>>>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>>>> kernel crashes when testing ath11k. I don't recall seeing this on
>>>> v5.11
>>>> so it looks like a new problem, but I cannot be 100% sure.
>>>> Netconsole
>>>> output is below. I have most of the kernel debug functionality
>>>> enabled
>>>> (KASAN etc).
>>>>
>>>> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
>>>> wireless and ath11k modules. It does not happen every time, but I
>>>> would
>>>> say I can reproduce the problem within 10 test loops or so.
>>>>
>>>> Any ideas what could cause this? I have not bisected this due to
>>>> lack of
>>>> time, but I can test patches etc.
>>>>
>>>
>>> Not sure if this is related, Loic sent a patch which fixes an issue
>>> with
>>> "mhi_pm_state_worker":
>>>
>>> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>>>
>>> Can you please test see if it fixes your issue also?
>>
>> Thanks for the link, but unfortunately not :( I was able to reproduce
>> the issue just after 3 insmod/rmmod loops.
>
> I investigated this a bit more, I was actually able to reproduce this
> in
> v5.11 as well. So this is not a new regression. The reason why I
> started
> seeing this until now is that I enable more debug options in the
> kernel,
> the diff below. Without those changes I don't see the problem.
>
> I also found a workround, if I add sleep(1) after insmod ath11k_pci in
> my test script I see 200 loops without crashes. But when I removed the
> sleep the test script crashed only after 19 loops. So there definitely
> is a race condition somewhere, just don't know where. I don't have time
> to investigate this more, so I'll just use the workaround for the time
> being.
>
> --- ../configs/nuc-debug-5.11 2021-02-21 08:55:53.836061988 +0200
> +++ .config 2021-03-09 16:22:53.598684524 +0200
> @@ -12,6 +12,7 @@
> CONFIG_CC_CAN_LINK_STATIC=y
> CONFIG_CC_HAS_ASM_GOTO=y
> CONFIG_CC_HAS_ASM_INLINE=y
> +CONFIG_CONSTRUCTORS=y
> CONFIG_IRQ_WORK=y
> CONFIG_BUILDTIME_TABLE_SORT=y
> CONFIG_THREAD_INFO_IN_TASK=y
> @@ -280,6 +281,7 @@
> CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
> CONFIG_ZONE_DMA32=y
> CONFIG_AUDIT_ARCH=y
> +CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> CONFIG_HAVE_INTEL_TXT=y
> CONFIG_X86_64_SMP=y
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> @@ -748,8 +750,7 @@
> # CONFIG_MODULE_SIG is not set
> # CONFIG_MODULE_COMPRESS is not set
> # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
> -# CONFIG_UNUSED_SYMBOLS is not set
> -# CONFIG_TRIM_UNUSED_KSYMS is not set
> +CONFIG_UNUSED_SYMBOLS=y
> CONFIG_MODULES_TREE_LOOKUP=y
> CONFIG_BLOCK=y
> CONFIG_BLK_SCSI_REQUEST=y
> @@ -1164,7 +1165,6 @@
> # CONFIG_NET_NCSI is not set
> CONFIG_RPS=y
> CONFIG_RFS_ACCEL=y
> -CONFIG_SOCK_RX_QUEUE_MAPPING=y
> CONFIG_XPS=y
> # CONFIG_CGROUP_NET_PRIO is not set
> # CONFIG_CGROUP_NET_CLASSID is not set
> @@ -1685,7 +1685,6 @@
> #
> # Distributed Switch Architecture drivers
> #
> -# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
> # end of Distributed Switch Architecture drivers
>
> CONFIG_ETHERNET=y
> @@ -1700,6 +1699,7 @@
> # CONFIG_NET_VENDOR_AQUANTIA is not set
> # CONFIG_NET_VENDOR_ARC is not set
> # CONFIG_NET_VENDOR_ATHEROS is not set
> +# CONFIG_NET_VENDOR_AURORA is not set
> # CONFIG_NET_VENDOR_BROADCOM is not set
> CONFIG_NET_VENDOR_BROCADE=y
> # CONFIG_BNA is not set
> @@ -1914,7 +1914,6 @@
> # CONFIG_MT7615E is not set
> # CONFIG_MT7663U is not set
> # CONFIG_MT7915E is not set
> -# CONFIG_MT7921E is not set
> # CONFIG_WLAN_VENDOR_MICROCHIP is not set
> CONFIG_WLAN_VENDOR_RALINK=y
> # CONFIG_RT2X00 is not set
> @@ -4500,7 +4499,7 @@
> CONFIG_DEBUG_INFO_COMPRESSED=y
> # CONFIG_DEBUG_INFO_SPLIT is not set
> # CONFIG_DEBUG_INFO_DWARF4 is not set
> -# CONFIG_GDB_SCRIPTS is not set
> +CONFIG_GDB_SCRIPTS=y
> CONFIG_FRAME_WARN=2048
> # CONFIG_STRIP_ASM_SYMS is not set
> # CONFIG_READABLE_ASM is not set
> @@ -4540,13 +4539,13 @@
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
> CONFIG_PAGE_OWNER=y
> CONFIG_PAGE_POISONING=y
> -# CONFIG_DEBUG_PAGE_REF is not set
> +CONFIG_DEBUG_PAGE_REF=y
> # CONFIG_DEBUG_RODATA_TEST is not set
> CONFIG_ARCH_HAS_DEBUG_WX=y
> CONFIG_DEBUG_WX=y
> CONFIG_GENERIC_PTDUMP=y
> CONFIG_PTDUMP_CORE=y
> -# CONFIG_PTDUMP_DEBUGFS is not set
> +CONFIG_PTDUMP_DEBUGFS=y
> CONFIG_DEBUG_OBJECTS=y
> # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
> CONFIG_DEBUG_OBJECTS_FREE=y
> @@ -4568,8 +4567,8 @@
> CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
> CONFIG_DEBUG_VM=y
> CONFIG_DEBUG_VM_VMACACHE=y
> -# CONFIG_DEBUG_VM_RB is not set
> -# CONFIG_DEBUG_VM_PGFLAGS is not set
> +CONFIG_DEBUG_VM_RB=y
> +CONFIG_DEBUG_VM_PGFLAGS=y
> CONFIG_DEBUG_VM_PGTABLE=y
> CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
> CONFIG_DEBUG_VIRTUAL=y
> @@ -4581,7 +4580,13 @@
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> -# CONFIG_KASAN is not set
> +CONFIG_KASAN=y
> +CONFIG_KASAN_GENERIC=y
> +# CONFIG_KASAN_OUTLINE is not set
> +CONFIG_KASAN_INLINE=y
> +CONFIG_KASAN_STACK=1
> +CONFIG_KASAN_VMALLOC=y
> +# CONFIG_TEST_KASAN_MODULE is not set
> # end of Memory Debugging
>
> CONFIG_DEBUG_SHIRQ=y
Thanks Kalle. We will have to try to reproduce and investigate this on a
different controller as well to dive in and fix any race if seen.
Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-03-09 18:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-04 14:59 [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds Kalle Valo
2021-03-04 15:10 ` Manivannan Sadhasivam
2021-03-04 17:21 ` Kalle Valo
2021-03-09 16:44 ` Kalle Valo
2021-03-09 18:07 ` Bhaumik Bhatt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).