All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kalle Valo <kvalo@codeaurora.org>
To: Tsuchiya Yuto <kitakar@gmail.com>
Cc: Amitkumar Karwar <amitkarwar@gmail.com>,
	Ganapathi Bhat <ganapathi.bhat@nxp.com>,
	Xinming Hu <huxinming820@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Maximilian Luz <luzmaximilian@gmail.com>,
	Andy Shevchenko <andriy.shevchenko@intel.com>,
	verdre@v0yd.nl, Tsuchiya Yuto <kitakar@gmail.com>
Subject: Re: [PATCH] mwifiex: pcie: skip cancel_work_sync() on reset failure path
Date: Tue, 10 Nov 2020 18:51:39 +0000 (UTC)	[thread overview]
Message-ID: <20201110185139.D143DC433C6@smtp.codeaurora.org> (raw)
In-Reply-To: <20201028142346.18355-1-kitakar@gmail.com>

Tsuchiya Yuto <kitakar@gmail.com> wrote:

> If a reset is performed, but even the reset fails for some reasons (e.g.,
> on Surface devices, the fw reset requires another quirks),
> cancel_work_sync() hangs in mwifiex_cleanup_pcie().
> 
>     # firmware went into a bad state
>     [...]
>     [ 1608.281690] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
>     [ 1608.282724] mwifiex_pcie 0000:03:00.0: rx_pending=0, tx_pending=1,	cmd_pending=0
>     [ 1608.292400] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
>     [ 1608.292405] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
>     # reset performed after firmware went into a bad state
>     [ 1609.394320] mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
>     [ 1609.394335] mwifiex_pcie 0000:03:00.0: WLAN FW is active
>     # but even the reset failed
>     [ 1619.499049] mwifiex_pcie 0000:03:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0xe000
>     [ 1619.499094] mwifiex_pcie 0000:03:00.0: num_data_h2c_failure = 0
>     [ 1619.499103] mwifiex_pcie 0000:03:00.0: num_cmd_h2c_failure = 0
>     [ 1619.499110] mwifiex_pcie 0000:03:00.0: is_cmd_timedout = 1
>     [ 1619.499117] mwifiex_pcie 0000:03:00.0: num_tx_timeout = 0
>     [ 1619.499124] mwifiex_pcie 0000:03:00.0: last_cmd_index = 0
>     [ 1619.499133] mwifiex_pcie 0000:03:00.0: last_cmd_id: fa 00 07 01 07 01 07 01 07 01
>     [ 1619.499140] mwifiex_pcie 0000:03:00.0: last_cmd_act: 00 e0 00 00 00 00 00 00 00 00
>     [ 1619.499147] mwifiex_pcie 0000:03:00.0: last_cmd_resp_index = 3
>     [ 1619.499155] mwifiex_pcie 0000:03:00.0: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
>     [ 1619.499162] mwifiex_pcie 0000:03:00.0: last_event_index = 2
>     [ 1619.499169] mwifiex_pcie 0000:03:00.0: last_event: 58 00 58 00 58 00 58 00 58 00
>     [ 1619.499177] mwifiex_pcie 0000:03:00.0: data_sent=0 cmd_sent=1
>     [ 1619.499185] mwifiex_pcie 0000:03:00.0: ps_mode=0 ps_state=0
>     [ 1619.499215] mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
>     # mwifiex_pcie_work hang happening
>     [ 1823.233923] INFO: task kworker/3:1:44 blocked for more than 122 seconds.
>     [ 1823.233932]       Tainted: G        WC OE     5.10.0-rc1-1-mainline #1
>     [ 1823.233935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     [ 1823.233940] task:kworker/3:1     state:D stack:    0 pid:   44 ppid:     2 flags:0x00004000
>     [ 1823.233960] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
>     [ 1823.233965] Call Trace:
>     [ 1823.233981]  __schedule+0x292/0x820
>     [ 1823.233990]  schedule+0x45/0xe0
>     [ 1823.233995]  schedule_timeout+0x11c/0x160
>     [ 1823.234003]  wait_for_completion+0x9e/0x100
>     [ 1823.234012]  __flush_work.isra.0+0x156/0x210
>     [ 1823.234018]  ? flush_workqueue_prep_pwqs+0x130/0x130
>     [ 1823.234026]  __cancel_work_timer+0x11e/0x1a0
>     [ 1823.234035]  mwifiex_cleanup_pcie+0x28/0xd0 [mwifiex_pcie]
>     [ 1823.234049]  mwifiex_free_adapter+0x24/0xe0 [mwifiex]
>     [ 1823.234060]  _mwifiex_fw_dpc+0x294/0x560 [mwifiex]
>     [ 1823.234074]  mwifiex_reinit_sw+0x15d/0x300 [mwifiex]
>     [ 1823.234080]  mwifiex_pcie_reset_done+0x50/0x80 [mwifiex_pcie]
>     [ 1823.234087]  pci_try_reset_function+0x5c/0x90
>     [ 1823.234094]  process_one_work+0x1d6/0x3a0
>     [ 1823.234100]  worker_thread+0x4d/0x3d0
>     [ 1823.234107]  ? rescuer_thread+0x410/0x410
>     [ 1823.234112]  kthread+0x142/0x160
>     [ 1823.234117]  ? __kthread_bind_mask+0x60/0x60
>     [ 1823.234124]  ret_from_fork+0x22/0x30
>     [...]
> 
> This is a deadlock caused by calling cancel_work_sync() in
> mwifiex_cleanup_pcie():
> 
> - Device resets are done via mwifiex_pcie_card_reset()
> - which schedules card->work to call mwifiex_pcie_card_reset_work()
> - which calls pci_try_reset_function().
> - This leads to mwifiex_pcie_reset_done() be called on the same workqueue,
>   which in turn calls
> - mwifiex_reinit_sw() and that calls
> - _mwifiex_fw_dpc().
> 
> The problem is now that _mwifiex_fw_dpc() calls mwifiex_free_adapter()
> in case firmware initialization fails. That ends up calling
> mwifiex_cleanup_pcie().
> 
> Note that all those calls are still running on the workqueue. So when
> mwifiex_cleanup_pcie() now calls cancel_work_sync(), it's really waiting
> on itself to complete, causing a deadlock.
> 
> This commit fixes the deadlock by skipping cancel_work_sync() on a reset
> failure path.
> 
> After this commit, when reset fails, the following output is
> expected to be shown:
> 
>     kernel: mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
>     kernel: mwifiex: Failed to bring up adapter: -5
>     kernel: mwifiex_pcie 0000:03:00.0: reinit failed: -5
> 
> To reproduce this issue, for example, try putting the root port of wifi
> into D3 (replace "00:1d.3" with your setup).
> 
>     # put into D3 (root port)
>     sudo setpci -v -s 00:1d.3 CAP_PM+4.b=0b
> 
> Cc: Maximilian Luz <luzmaximilian@gmail.com>
> Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

Patch applied to wireless-drivers-next.git, thanks.

4add4d988f95 mwifiex: pcie: skip cancel_work_sync() on reset failure path

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20201028142346.18355-1-kitakar@gmail.com/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


  reply	other threads:[~2020-11-10 18:51 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-28 14:23 [PATCH] mwifiex: pcie: skip cancel_work_sync() on reset failure path Tsuchiya Yuto
2020-11-10 18:51 ` Kalle Valo [this message]
     [not found] ` <20201110185139.A1541C433C9@smtp.codeaurora.org>
2020-11-11  8:53   ` Tsuchiya Yuto
2020-11-11  9:06     ` Kalle Valo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201110185139.D143DC433C6@smtp.codeaurora.org \
    --to=kvalo@codeaurora.org \
    --cc=amitkarwar@gmail.com \
    --cc=andriy.shevchenko@intel.com \
    --cc=davem@davemloft.net \
    --cc=ganapathi.bhat@nxp.com \
    --cc=huxinming820@gmail.com \
    --cc=kitakar@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=luzmaximilian@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=verdre@v0yd.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.