linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kalle Valo <kvalo@codeaurora.org>
To: Brian Norris <briannorris@chromium.org>
Cc: Ganapathi Bhat <gbhat@marvell.com>,
	Nishant Sarmukadam <nishants@marvell.com>,
	linux-kernel@vger.kernel.org,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Amitkumar Karwar <amitkarwar@gmail.com>,
	linux-wireless@vger.kernel.org,
	Johannes Berg <johannes@sipsolutions.net>
Subject: Re: [PATCH 05/14] mwifiex: re-register wiphy across reset
Date: Mon, 05 Jun 2017 18:54:18 +0300	[thread overview]
Message-ID: <87inka77md.fsf@codeaurora.org> (raw)
In-Reply-To: <20170601173954.GA138807@google.com> (Brian Norris's message of "Thu, 1 Jun 2017 10:39:56 -0700")

Brian Norris <briannorris@chromium.org> writes:

> On Thu, Jun 01, 2017 at 12:15:45PM +0300, Kalle Valo wrote:
>> Brian Norris <briannorris@chromium.org> writes:
>> 
>> > In general, it's helpful to use the same code for device removal as for
>> > device reset, as this tends to have fewer bugs. Let's move the wiphy
>> > unregistration code into the common reset and removal code.
>> >
>> > In particular, it's very hard to properly handle the reset sequence when
>> > something fails. Currently, if mwifiex_reinit_sw() fails, we've failed
>> > to unregister the associated wiphy, and so running something as simple
>> > as "iw phy" can trigger an OOPS, as the wiphy still has hooks back into
>> > freed mwifiex data structures. For example, KASAN complained:
>> >
>> > [... see reset fail for other reasons ...]
>> > [ 1184.821158] mwifiex_pcie 0000:01:00.0: info: dnld wifi firmware from 174948 bytes
>> > [ 1186.870914] mwifiex_pcie 0000:01:00.0: info: FW download over, size 608396 bytes
>> > [ 1187.685990] mwifiex_pcie 0000:01:00.0: WLAN FW is active
>> > [ 1187.692673] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -512
>> > [ 1187.699075] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device
>> > [ 1187.713476] mwifiex: Failed to bring up adapter: -5
>> > [ 1187.718644] mwifiex_pcie 0000:01:00.0: reinit failed: -5
>> >
>> > [... run `iw phy` ...]
>> > [ 1212.902419] ==================================================================
>> > [ 1212.909806] BUG: KASAN: use-after-free in mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex] at addr ffffffc0ad1a8028
>> > [ 1212.920246] Read of size 1 by task iw/3127
>> > [...]
>> > [ 1212.934946] page dumped because: kasan: bad access detected
>> > [...]
>> > [ 1212.950665] Call trace:
>> > [ 1212.953148] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
>> > [ 1212.958572] [<ffffffc00020a96c>] show_stack+0x20/0x28
>> > [ 1212.963648] [<ffffffc0005ce18c>] dump_stack+0xa4/0xcc
>> > [ 1212.968723] [<ffffffc0003c4430>] kasan_report+0x378/0x500
>> > [ 1212.974140] [<ffffffc0003c3358>] __asan_load1+0x44/0x4c
>> > [ 1212.979462] [<ffffffbffc2e8360>] mwifiex_cfg80211_get_antenna+0x54/0xfc [mwifiex]
>> > [ 1212.987131] [<ffffffbffc084fc4>] nl80211_send_wiphy+0x75c/0x2de0 [cfg80211]
>> > [ 1212.994246] [<ffffffbffc094f60>] nl80211_dump_wiphy+0x32c/0x438 [cfg80211]
>> > [ 1213.001149] [<ffffffc000ab6404>] genl_lock_dumpit+0x48/0x64
>> > [ 1213.006746] [<ffffffc000ab3474>] netlink_dump+0x178/0x398
>> > [ 1213.012171] [<ffffffc000ab3d18>] __netlink_dump_start+0x1bc/0x260
>> > [...]
>> >
>> > This all goes away if we just tear down the wiphy on the way down, and
>> > set it back up if/when we bring the device back up.
>> 
>> I don't know exactly what kind of reset this is about,
>
> Marvell firmwares are known to be quite buggy, and there are plenty of
> situations in which they crash (often resulting in a command timeout).

That is a common problem with firmwares :/

> The current best workaround for these is to essentially unwind the
> whole driver, reset the card, and reprobe the whole thing. See
> anywhere that the ->card_reset() callback is called.
>
> This has been around for a long time on SDIO, and you recently merged my
> changes to enable this for PCIe:
>
> 6d7d579a8243 mwifiex: pcie: add card_reset() support
>
>> but isn't this a
>> quite intrusive solution? For example, phy name changes because of this?
>
> Yes, it is a bit intrusive. But the whole process is intrusive, as it
> deletes all the virtual interfaces and loses your settings. This all
> relies on user space being prepared to clean up and reinitialize
> everything afterward.
>
> And yes, this causes a phy name change.
>
> In favor: this is what the SDIO reset code *used* to do, before this
> commit:
>
> c742e623e941 mwifiex: sdio card reset enhancement
>
> where the SDIO driver started using the half-baked reset solution
> written for PCIe.
>
> Lastly, I still need to analyze a few more things in this driver, but
> AFAICT, if we *don't* unregister the wiphy, we are exposed to quite a
> few more race conditions -- not just the easy-to-notice condition
> described above. What happens if the wiphy still processes cfg80211
> operations while we're still resetting the firmware? Much of the driver
> may not be prepared for this. At the moment, I can't find anything
> terribly wrong; if I slow down the reset (e.g., with msleep()s) I can
> just trigger complaints about "cmd node not available" or "card is
> removed", but I haven't yet found a true bug.
>
> That's not to say that there aren't such bugs out there. I'd still be
> willing to bet there are. And IMO, it seems wise to just do the same
> teardown/setup as one would do for (e.g.) 'rmmod', to prevent exposing
> *too* many new permutations of "wiphy is available but rest of the
> driver is torn down".

This feels like a sledge hammer approach causing all sort of problems
for user space and I really like the mac80211 approach more. For
example, if an ath10k firmware crash happens user only sees a few second
pause in data traffic and a warning in kernel log, otherwise everything
happens behind the scenes. Of course there are very likely races
somewhere but at least I haven't seen that many reports related to
firmware restart functionality.

> But if none of this is convincing to you, I can take a stab at a
> different solution.

I don't have any problem applying this patch but more about being
curious why doing it like this. And hopefully finding a less intrusive
solution in the future.

> BTW, since you're taking an interest in this code now, can I trouble you
> with a question? Looking at mwifiex_uninit_sw() (after this patchset),
> you can see a loop like this:
>
>         /* Stop data */
>         for (i = 0; i < adapter->priv_num; i++) {
>                 priv = adapter->priv[i];
>                 if (priv && priv->netdev) {
>                         mwifiex_stop_net_dev_queue(priv->netdev, adapter);
>                         if (netif_carrier_ok(priv->netdev))
>                                 netif_carrier_off(priv->netdev);
>                         netif_device_detach(priv->netdev);
>                 }
>         }
>
> That seems to be the only attempt to prevent user space from talking to
> the device while we proceed to shut down (mwifiex_shutdown_drv()). AIUI,
> that's wholly insufficient, and we need to actually stop all the virtual
> interfaces (and possibly the wiphy as well) first. I'm looking at trying
> to move the mwifiex_del_virtual_intf() loop up much further in this
> function (but there are other bugs preventing me from doing that yet).
>
> Does that sound like the right approach to you? I'm kinda figuring this
> should better mimic the mac80211 ieee80211_remove_interfaces()
> structure.

Johannes is much better person to answer this (CCed).

-- 
Kalle Valo

  reply	other threads:[~2017-06-05 15:54 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-25  0:11 [PATCH 01/14] mwifiex: pcie: properly synchronize, disable interrupts in driver callbacks Brian Norris
2017-05-25  0:11 ` [PATCH 02/14] mwifiex: reunite copy-and-pasted remove/reset code Brian Norris
2017-05-25  0:11 ` [PATCH 03/14] mwifiex: reset interrupt status across device reset Brian Norris
2017-05-25  0:11 ` [PATCH 04/14] mwifiex: pcie: don't allow cmd buffer reuse after reset Brian Norris
2017-05-25  0:11 ` [PATCH 05/14] mwifiex: re-register wiphy across reset Brian Norris
2017-06-01  9:15   ` Kalle Valo
2017-06-01 17:39     ` Brian Norris
2017-06-05 15:54       ` Kalle Valo [this message]
2017-06-09  9:03         ` Johannes Berg
2017-06-21 18:27           ` Brian Norris
2017-06-22 13:02             ` Johannes Berg
2017-06-27 20:48               ` Brian Norris
2017-06-28  7:28                 ` Johannes Berg
2017-06-29 18:45                   ` Brian Norris
2017-07-04 14:10                 ` Kalle Valo
2017-06-21 17:48         ` Brian Norris
2017-06-22 12:59           ` Johannes Berg
2017-06-27 19:50             ` Brian Norris
2017-06-28  7:21               ` Johannes Berg
2017-05-25  0:11 ` [PATCH 06/14] mwifiex: don't short-circuit netdev notifiers on interface deletion Brian Norris
2017-05-25  0:11 ` [PATCH 07/14] mwifiex: fixup init_channel_scan_gap error case Brian Norris
2017-05-25  0:11 ` [PATCH 08/14] mwifiex: ensure "disable auto DS" struct is initialized Brian Norris
2017-05-25  0:11 ` [PATCH 09/14] mwifiex: pcie: remove redundant synchronize_irq() Brian Norris
2017-05-25  0:11 ` [PATCH 10/14] mwifiex: pcie: stop masking interrupts in FW downloader Brian Norris
2017-05-25  0:11 ` [PATCH 11/14] mwifiex: utilize netif_tx_{wake,stop}_all_queues() Brian Norris
2017-05-25  0:11 ` [PATCH 12/14] mwifiex: don't open-code ARRAY_SIZE() Brian Norris
2017-05-25  0:11 ` [PATCH 13/14] mwifiex: drop 'add_tail' param from mwifiex_insert_cmd_to_pending_q() Brian Norris
2017-05-25  0:11 ` [PATCH 14/14] mwifiex: pcie: fix whitespace Brian Norris
2017-05-31 17:10 ` [PATCH 01/14] mwifiex: pcie: properly synchronize, disable interrupts in driver callbacks Brian Norris
2017-06-05 11:49   ` Xinming Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87inka77md.fsf@codeaurora.org \
    --to=kvalo@codeaurora.org \
    --cc=amitkarwar@gmail.com \
    --cc=briannorris@chromium.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=gbhat@marvell.com \
    --cc=johannes@sipsolutions.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=nishants@marvell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).