All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kalle Valo <kvalo@codeaurora.org>
To: Wen Gong <wgong@codeaurora.org>
Cc: linux-wireless@vger.kernel.org, ath10k@lists.infradead.org,
	wgong@codeaurora.org
Subject: Re: [PATCH v3] ath10k: add flag to protect napi operation to avoid dead loop hang
Date: Thu, 17 Dec 2020 06:52:46 +0000 (UTC)	[thread overview]
Message-ID: <20201217065246.24296C43462@smtp.codeaurora.org> (raw)
In-Reply-To: <1598617348-2325-1-git-send-email-wgong@codeaurora.org>

Wen Gong <wgong@codeaurora.org> wrote:

> It happened "Kernel panic - not syncing: hung_task: blocked tasks" when
> test simulate crash and ifconfig down/rmmod meanwhile.
> 
> Test steps:
> 
> 1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC.
> echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down
> echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio
> echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci
> 
> 2. dmesg:
> [ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash
> [ 5622.655995] ieee80211 phy0: Hardware restart was requested
> [ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds.
> [ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds.
> [ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks
> [ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G        W         4.19.86 #137
> [ 5776.359846] Hardware name: MediaTek krane sku176 board (DT)
> [ 5776.359855] Call trace:
> [ 5776.359868]  dump_backtrace+0x0/0x170
> [ 5776.359881]  show_stack+0x20/0x2c
> [ 5776.359896]  dump_stack+0xd4/0x10c
> [ 5776.359916]  panic+0x12c/0x29c
> [ 5776.359937]  hung_task_panic+0x0/0x50
> [ 5776.359953]  kthread+0x120/0x130
> [ 5776.359965]  ret_from_fork+0x10/0x18
> [ 5776.359986] SMP: stopping secondary CPUs
> [ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000
> [ 5776.360026] CPU features: 0x0,2188200c
> [ 5776.360035] Memory Limit: none
> 
> command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked
> callstack of ifconfig:
> [<0>] __switch_to+0x120/0x13c
> [<0>] msleep+0x28/0x38
> [<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio]
> [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
> [<0>] ath10k_halt+0x120/0x178 [ath10k_core]
> [<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
> [<0>] drv_stop+0xe0/0x1e4 [mac80211]
> [<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
> [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
> [<0>] ieee80211_stop+0x20/0x30 [mac80211]
> [<0>] __dev_close_many+0xb8/0x11c
> [<0>] __dev_change_flags+0xe0/0x1d0
> [<0>] dev_change_flags+0x30/0x6c
> [<0>] devinet_ioctl+0x370/0x564
> [<0>] inet_ioctl+0xdc/0x304
> [<0>] sock_do_ioctl+0x50/0x288
> [<0>] compat_sock_ioctl+0x1b4/0x1aac
> [<0>] __se_compat_sys_ioctl+0x100/0x26fc
> [<0>] __arm64_compat_sys_ioctl+0x20/0x2c
> [<0>] el0_svc_common+0xa4/0x154
> [<0>] el0_svc_compat_handler+0x2c/0x38
> [<0>] el0_svc_compat+0x8/0x18
> [<0>] 0xffffffffffffffff
> 
> callstack of rmmod:
> [<0>] __switch_to+0x120/0x13c
> [<0>] msleep+0x28/0x38
> [<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio]
> [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core]
> [<0>] ath10k_halt+0x120/0x178 [ath10k_core]
> [<0>] ath10k_stop+0x4c/0x8c [ath10k_core]
> [<0>] drv_stop+0xe0/0x1e4 [mac80211]
> [<0>] ieee80211_stop_device+0x48/0x54 [mac80211]
> [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211]
> [<0>] ieee80211_stop+0x20/0x30 [mac80211]
> [<0>] __dev_close_many+0xb8/0x11c
> [<0>] dev_close_many+0x70/0x100
> [<0>] dev_close+0x4c/0x80
> [<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211]
> [<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211]
> [<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211]
> [<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
> [<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core]
> [<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio]
> [<0>] sdio_bus_remove+0x48/0x108
> [<0>] device_release_driver_internal+0x138/0x1ec
> [<0>] driver_detach+0x6c/0xa8
> [<0>] bus_remove_driver+0x78/0xa8
> [<0>] driver_unregister+0x30/0x50
> [<0>] sdio_unregister_driver+0x28/0x34
> [<0>] cleanup_module+0x14/0x6bc [ath10k_sdio]
> [<0>] __arm64_sys_delete_module+0x1e0/0x22c
> [<0>] el0_svc_common+0xa4/0x154
> [<0>] el0_svc_compat_handler+0x2c/0x38
> [<0>] el0_svc_compat+0x8/0x18
> [<0>] 0xffffffffffffffff
> 
> SNOC:
> [  647.156863] Call trace:
> [  647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c
> [  647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798
> [  647.170062] [<ffffff800899dad8>] schedule+0x74/0x94
> [  647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c
> [  647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40
> [  647.185780] [<ffffff80082a494>] msleep+0x28/0x38
> [  647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc]
> [  647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core]
> [  647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core]
> [  647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core]
> [  647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211]
> [  647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211]
> [  647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211]
> [  647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211]
> [  647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc
> [  647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100
> [  647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80
> [  647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211]
> [  647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211]
> [  647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211]
> [  647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core]
> [  647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core]
> [  647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc]
> [  647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50
> [  647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8
> [  647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8
> [  647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8
> [  647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50
> [  647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28
> [  647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc]
> [  647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c
> 
> PCIe:
> [  615.392770] rmmod           D    0  3523   3458 0x00000080
> [  615.392777] Call Trace:
> [  615.392784]  __schedule+0x617/0x7d3
> [  615.392791]  ? __mod_timer+0x263/0x35c
> [  615.392797]  schedule+0x62/0x72
> [  615.392803]  schedule_timeout+0x8d/0xf3
> [  615.392809]  ? run_local_timers+0x6b/0x6b
> [  615.392814]  msleep+0x1b/0x22
> [  615.392824]  ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci]
> [  615.392844]  ath10k_core_stop+0x44/0x67 [ath10k_core]
> [  615.392859]  ath10k_halt+0x102/0x153 [ath10k_core]
> [  615.392873]  ath10k_stop+0x38/0x75 [ath10k_core]
> [  615.392893]  drv_stop+0x9a/0x13c [mac80211]
> [  615.392915]  ieee80211_do_stop+0x772/0x7cd [mac80211]
> [  615.392937]  ieee80211_stop+0x1a/0x1e [mac80211]
> [  615.392945]  __dev_close_many+0x9e/0xf0
> [  615.392952]  dev_close_many+0x62/0xe8
> [  615.392958]  dev_close+0x54/0x7d
> [  615.392975]  cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211]
> [  615.393021]  ieee80211_remove_interfaces+0x52/0x1aa [mac80211]
> [  615.393049]  ieee80211_unregister_hw+0x54/0x136 [mac80211]
> [  615.393068]  ath10k_mac_unregister+0x19/0x4a [ath10k_core]
> [  615.393091]  ath10k_core_unregister+0x39/0x7e [ath10k_core]
> [  615.393104]  ath10k_pci_remove+0x3d/0x7f [ath10k_pci]
> [  615.393117]  pci_device_remove+0x41/0xa6
> [  615.393129]  device_release_driver_internal+0x123/0x1ec
> [  615.393140]  driver_detach+0x60/0x90
> [  615.393152]  bus_remove_driver+0x72/0x9f
> [  615.393164]  pci_unregister_driver+0x1e/0x87
> [  615.393177]  SyS_delete_module+0x1d7/0x277
> [  615.393188]  do_syscall_64+0x6b/0xf7
> [  615.393199]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
> 
> The test command run simulate_fw_crash firstly and it call into
> ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable
> is called and bit NAPI_STATE_SCHED is set. After that, function
> ath10k_sdio_hif_stop is called again from ath10k_stop by command
> "ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked.
> 
> It is blocked by napi_synchronize, napi_disable will set bit with
> NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop
> becuase bit NAPI_STATE_SCHED is set by napi_disable.
> 
> function of napi_synchronize
> static inline void napi_synchronize(const struct napi_struct *n)
> {
> 	if (IS_ENABLED(CONFIG_SMP))
> 		while (test_bit(NAPI_STATE_SCHED, &n->state))
> 			msleep(1);
> 	else
> 		barrier();
> }
> 
> function of napi_disable
> void napi_disable(struct napi_struct *n)
> {
> 	might_sleep();
> 	set_bit(NAPI_STATE_DISABLE, &n->state);
> 
> 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
> 		msleep(1);
> 	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
> 		msleep(1);
> 
> 	hrtimer_cancel(&n->timer);
> 
> 	clear_bit(NAPI_STATE_DISABLE, &n->state);
> }
> 
> Add flag for it avoid the hang and crash.
> 
> Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
> Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
> Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2
> 
> Signed-off-by: Wen Gong <wgong@codeaurora.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

e2f8b74e58cb ath10k: prevent deinitializing NAPI twice

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/1598617348-2325-1-git-send-email-wgong@codeaurora.org/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

      parent reply	other threads:[~2020-12-17  6:53 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 12:22 [PATCH v3] ath10k: add flag to protect napi operation to avoid dead loop hang Wen Gong
2020-08-28 12:22 ` Wen Gong
2020-08-28 14:09 ` Krishna Chaitanya
2020-08-28 14:09   ` Krishna Chaitanya
2020-12-09  9:11   ` Kalle Valo
2020-12-09  9:11     ` Kalle Valo
2020-09-07 16:23 ` Kalle Valo
2020-09-07 16:23 ` Kalle Valo
     [not found] ` <87d02x1rqb.fsf@codeaurora.org>
2020-09-08  3:45   ` Wen Gong
2020-12-09  9:24     ` Kalle Valo
2020-12-09  9:24       ` Kalle Valo
2020-12-09 15:00       ` Ben Greear
2020-12-09 15:00         ` Ben Greear
2020-12-10  2:29         ` Wen Gong
2020-12-10  2:29           ` Wen Gong
2020-12-15  8:05           ` Kalle Valo
2020-12-15  8:05             ` Kalle Valo
2020-12-15  7:56         ` Kalle Valo
2020-12-15  7:56           ` Kalle Valo
2020-12-10  2:27       ` Wen Gong
2020-12-10  2:27         ` Wen Gong
2020-09-08  3:45   ` Wen Gong
2020-12-17  6:52 ` Kalle Valo
2020-12-17  6:52 ` Kalle Valo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201217065246.24296C43462@smtp.codeaurora.org \
    --to=kvalo@codeaurora.org \
    --cc=ath10k@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=wgong@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.