linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maximilian Luz <luzmaximilian@gmail.com>
To: Johannes Berg <johannes@sipsolutions.net>,
	linux-wireless@vger.kernel.org
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, Amitkumar Karwar <amitkarwar@gmail.com>,
	Ganapathi Bhat <ganapathi.bhat@nxp.com>,
	Xinming Hu <huxinming820@gmail.com>
Subject: Re: [BUG] Deadlock in _cfg80211_unregister_wdev()
Date: Fri, 14 May 2021 13:40:56 +0200	[thread overview]
Message-ID: <522833b9-08c1-f470-a328-0e7419e86617@gmail.com> (raw)
In-Reply-To: <57d41364f14ea660915b7afeebaa5912c4300541.camel@sipsolutions.net>

On 5/14/21 10:26 AM, Johannes Berg wrote:
> On Fri, 2021-05-14 at 01:07 +0200, Maximilian Luz wrote:
>> Following commit a05829a7222e ("cfg80211: avoid holding the RTNL when
>> calling the driver"), the mwifiex_pcie module fails to unload. This also
>> prevents the device from rebooting / shutting down.
>>
>> Attempting to unload the module
>>
> 
> I'm *guessing* that you're attempting to unload the module while the
> interface is still up, i.e. you didn't "ip link set wlan0 down" first?

I did not. Doing so indeed allows unloading of the module.

> If so, that is likely fixed by this commit as well:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ea6b2098dd02789f68770fd3d5a373732207be2f

Thanks for pointing this out. That's backported to 5.12.2, right?
Unfortunately the error still persists there (as on all other post 5.12
kernels, as mentioned below).

> However, your log says:
> 
>> [  245.504764]       Tainted: G         C OE     5.11.0-1-surface-dev #2
> 
> so I have no idea what kernel you're using, because 5.11 did *not*
> contain commit a05829a7222e ("cfg80211: avoid holding the RTNL when
> calling the driver"). If you backported the bug you get to be
> responsible for backporting the fixes too?

This is a log from a05829a7222e checked out, which was pre 5.12-rc1, so the
last tag is still 5.11 which sets the package version in my packaging scripts
to that as well. It's not an actual 5.11 kernel, sorry for the confusion.
There is no backporting going on, just bisecting.

> If that's all not solving the issue then please try to resolve with gdb
> what line of code "cfg80211_netdev_notifier_call+0x12a" is, and please
> also clarify exactly what (upstream!) kernel you're using.

The issue exists on 5.12 through 5.12.4, as well as 5.13-rc1 (which is
why I didn't bother specifying a version, sorry again).

If you really want me to, I'll try to find some time to learn GDB kernel
debugging (never done that before, so might take a bit) however, I think
it's fairly clear what's going wrong and why the fix you've linked below
doesn't apply in this case:

Your fix is fixing

     cfg80211_destroy_iface_wk() takes wiphy_lock
      -> cfg80211_destroy_ifaces()
       -> ieee80211_del_iface
        -> ieeee80211_if_remove
         -> cfg80211_unregister_wdev
          -> unregister_netdevice_queue
           -> dev_close_many
            -> __dev_close_many
             -> raw_notifier_call_chain
              -> cfg80211_netdev_notifier_call

by addressing this in cfg80211_destroy_iface{s,_wk}(). The trace from my
log shows

     mwifiex_uninit_sw() takes wiphy_lock
      -> mwifiex_del_virtual_intf
       -> cfg80211_unregister_netdevice()
        -> cfg80211_unregister_wdev()
         -> _cfg80211_unregister_wdev() has lockdep_assert_held(&rdev->wiphy.mtx)
          -> unregister_netdevice_queue
           -> dev_close_many
            -> __dev_close_many
             -> raw_notifier_call_chain
              -> cfg80211_netdev_notifier_call attempts to take wiphy_lock again

So your fix does not address this particular issue. It doesn't even
touch any of the affected code path. I believe it is instead fixing one
symptom of the same underlying problem.

While the last parts of the trace are the same (specifically following
cfg80211_unregister_wdev()), the lock is initially taken in different
functions. Your fix addresses this by changing cfg80211_destroy_ifaces(),
and cfg80211_destroy_iface_wk() which, however, were never called on the
path that's causing _this_ issue.

Furthermore, if you go through that trace, there's only one notifier
call in __dev_close_many(), which is

   call_netdevice_notifiers(NETDEV_GOING_DOWN, dev)

which I believe I have linked in my previous mail. Since the state value
is NETDEV_GOING_DOWN, this has to be case [3] (unless I'm missing
something).

Regards,
Max

[3]: https://elixir.bootlin.com/linux/v5.13-rc1/source/net/wireless/core.c#L1428

  reply	other threads:[~2021-05-14 11:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-13 23:07 [BUG] Deadlock in _cfg80211_unregister_wdev() Maximilian Luz
2021-05-14  8:26 ` Johannes Berg
2021-05-14 11:40   ` Maximilian Luz [this message]
2021-05-15  2:44   ` Brian Norris
2021-05-15 11:24     ` Maximilian Luz
2021-05-14 13:46 ` Maximilian Luz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=522833b9-08c1-f470-a328-0e7419e86617@gmail.com \
    --to=luzmaximilian@gmail.com \
    --cc=amitkarwar@gmail.com \
    --cc=davem@davemloft.net \
    --cc=ganapathi.bhat@nxp.com \
    --cc=huxinming820@gmail.com \
    --cc=johannes@sipsolutions.net \
    --cc=kuba@kernel.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).