* [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
@ 2017-03-16 10:28 Amitkumar Karwar
2017-03-16 18:33 ` Dmitry Torokhov
0 siblings, 1 reply; 5+ messages in thread
From: Amitkumar Karwar @ 2017-03-16 10:28 UTC (permalink / raw)
To: linux-wireless
Cc: Cathy Luo, Nishant Sarmukadam, rajatja, dmitry.torokhov,
briannorris, Amitkumar Karwar
We observed a SHUTDOWN command timeout during reboot stress test
due to a corner case firmware bug. It leads to use-after-free on
adapter structure pointer and crash.
Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
any work scheduled after cancel_work_sync() call in teardown path
to resolve the issue.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
---
v2: New work_flag has been added to resolve the issue cleanly as per
Brian's suggestion.
---
drivers/net/wireless/marvell/mwifiex/main.h | 1 +
drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
3 files changed, 9 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 5c82972..d5b1fd6 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
enum mwifiex_iface_work_flags {
MWIFIEX_IFACE_WORK_DEVICE_DUMP,
MWIFIEX_IFACE_WORK_CARD_RESET,
+ MWIFIEX_IFACE_WORK_DONT_RUN,
};
struct mwifiex_private {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index a0d9180..bb3d798 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
if (!adapter || !adapter->priv_num)
return;
+ set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
cancel_work_sync(&card->work);
reg = card->pcie.reg;
@@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
struct pcie_service_card *card =
container_of(work, struct pcie_service_card, work);
+ if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
+ return;
+
if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP,
&card->work_flags))
mwifiex_pcie_device_dump_work(card->adapter);
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index a4b356d..8140bb4 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -387,6 +387,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
if (!adapter || !adapter->priv_num)
return;
+ set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
cancel_work_sync(&card->work);
mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);
@@ -2514,6 +2515,9 @@ static void mwifiex_sdio_work(struct work_struct *work)
struct sdio_mmc_card *card =
container_of(work, struct sdio_mmc_card, work);
+ if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
+ return;
+
if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP,
&card->work_flags))
mwifiex_sdio_device_dump_work(card->adapter);
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
2017-03-16 10:28 [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout Amitkumar Karwar
@ 2017-03-16 18:33 ` Dmitry Torokhov
2017-03-16 18:41 ` Brian Norris
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Torokhov @ 2017-03-16 18:33 UTC (permalink / raw)
To: Amitkumar Karwar
Cc: linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja, briannorris
On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> We observed a SHUTDOWN command timeout during reboot stress test
> due to a corner case firmware bug. It leads to use-after-free on
> adapter structure pointer and crash.
>
> Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
> any work scheduled after cancel_work_sync() call in teardown path
> to resolve the issue.
>
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> ---
> v2: New work_flag has been added to resolve the issue cleanly as per
> Brian's suggestion.
> ---
> drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> 3 files changed, 9 insertions(+)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> index 5c82972..d5b1fd6 100644
> --- a/drivers/net/wireless/marvell/mwifiex/main.h
> +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> enum mwifiex_iface_work_flags {
> MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> MWIFIEX_IFACE_WORK_CARD_RESET,
> + MWIFIEX_IFACE_WORK_DONT_RUN,
> };
>
> struct mwifiex_private {
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index a0d9180..bb3d798 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> if (!adapter || !adapter->priv_num)
> return;
>
> + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> cancel_work_sync(&card->work);
>
> reg = card->pcie.reg;
> @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> struct pcie_service_card *card =
> container_of(work, struct pcie_service_card, work);
>
> + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> + return;
I do not see how this could possible prevent use-after-free, assuming
that the "card" memory is gone by the time mwifiex_pcie_work() gets to
run. You need to check this flag before queueing firmware dump work, and
make sure it is not racy with setting this flag in mwifiex_pcie_remove()
(and sdio).
Thanks.
--
Dmitry
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
2017-03-16 18:33 ` Dmitry Torokhov
@ 2017-03-16 18:41 ` Brian Norris
2017-03-16 19:38 ` Brian Norris
0 siblings, 1 reply; 5+ messages in thread
From: Brian Norris @ 2017-03-16 18:41 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja
Hi Dmitry,
On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> > We observed a SHUTDOWN command timeout during reboot stress test
> > due to a corner case firmware bug. It leads to use-after-free on
> > adapter structure pointer and crash.
> >
> > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
> > any work scheduled after cancel_work_sync() call in teardown path
> > to resolve the issue.
> >
> > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > ---
> > v2: New work_flag has been added to resolve the issue cleanly as per
> > Brian's suggestion.
> > ---
> > drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> > drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> > drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> > 3 files changed, 9 insertions(+)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> > index 5c82972..d5b1fd6 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/main.h
> > +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> > enum mwifiex_iface_work_flags {
> > MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> > MWIFIEX_IFACE_WORK_CARD_RESET,
> > + MWIFIEX_IFACE_WORK_DONT_RUN,
> > };
> >
> > struct mwifiex_private {
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index a0d9180..bb3d798 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> > if (!adapter || !adapter->priv_num)
> > return;
> >
> > + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> > cancel_work_sync(&card->work);
> >
> > reg = card->pcie.reg;
> > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> > struct pcie_service_card *card =
> > container_of(work, struct pcie_service_card, work);
> >
> > + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> > + return;
>
> I do not see how this could possible prevent use-after-free, assuming
> that the "card" memory is gone by the time mwifiex_pcie_work() gets to
> run.
The 'card' memory isn't getting freed; it's the 'adapter' memory we're
worried about. This is either already freed (because the FW init
procedure failed), or else it's freed later in this function via
mwifiex_remove_card().
(We're also worried about having the FW dump race with the FW shutdown
sequence, which can begin later in this function. This patch blocks both
races AFAICT.)
> You need to check this flag before queueing firmware dump work, and
> make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> (and sdio).
That's another approach that could work, but it's a little more
invasive.
I'm still reviewing and testing this, but I believe this is nearly
equivalent to my own draft version, which tested out fine.
Brian
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
2017-03-16 18:41 ` Brian Norris
@ 2017-03-16 19:38 ` Brian Norris
2017-03-16 20:52 ` Brian Norris
0 siblings, 1 reply; 5+ messages in thread
From: Brian Norris @ 2017-03-16 19:38 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja
Hi Dmitry and Amit,
On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote:
> On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> > On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> > > We observed a SHUTDOWN command timeout during reboot stress test
> > > due to a corner case firmware bug. It leads to use-after-free on
> > > adapter structure pointer and crash.
> > >
> > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
BTW, the 'DONT_RUN' suggestion was more of a pseudo-code suggestion than
a real name, but I guess it's not terrible :)
> > > any work scheduled after cancel_work_sync() call in teardown path
> > > to resolve the issue.
> > >
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: New work_flag has been added to resolve the issue cleanly as per
> > > Brian's suggestion.
> > > ---
> > > drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> > > drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> > > drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> > > 3 files changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> > > index 5c82972..d5b1fd6 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/main.h
> > > +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> > > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> > > enum mwifiex_iface_work_flags {
> > > MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> > > MWIFIEX_IFACE_WORK_CARD_RESET,
> > > + MWIFIEX_IFACE_WORK_DONT_RUN,
> > > };
> > >
> > > struct mwifiex_private {
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index a0d9180..bb3d798 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> > > if (!adapter || !adapter->priv_num)
> > > return;
> > >
> > > + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> > > cancel_work_sync(&card->work);
> > >
> > > reg = card->pcie.reg;
> > > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> > > struct pcie_service_card *card =
> > > container_of(work, struct pcie_service_card, work);
> > >
> > > + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> > > + return;
> >
> > I do not see how this could possible prevent use-after-free, assuming
> > that the "card" memory is gone by the time mwifiex_pcie_work() gets to
> > run.
>
> The 'card' memory isn't getting freed; it's the 'adapter' memory we're
> worried about. This is either already freed (because the FW init
> procedure failed), or else it's freed later in this function via
> mwifiex_remove_card().
I guess there was a slight miscommunication here: Dmitry pointed out to
me that he *was* actually talking about 'card' getting freed -- when it
gets freed after remove() finishes.
So the sequence would have to go like:
1. enter remove()
2. set DONT_RUN flag; cancel_work_sync()
3. begin to shutdown firmware
4. hit, e.g., a command timeout that schedules the work again
5. ** scheduler decides not to schedule the work for a while **
6. we finish mwifiex_remove_card(), and exit from remove() successfully
7. devm_* frees the pcie_service_card (and enclosed work_struct)
8. scheduler tries to run our work item
9. use-after-free!
However unlikely that the delay from 4 to 8 might be, this is indeed a
race condition.
> (We're also worried about having the FW dump race with the FW shutdown
> sequence, which can begin later in this function. This patch blocks both
> races AFAICT.)
>
> > You need to check this flag before queueing firmware dump work, and
> > make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> > (and sdio).
>
> That's another approach that could work, but it's a little more
> invasive.
Never mind, that isn't too invasive. There's only one schedule_work() in
pcie.c and two in sdio.c. We could even factor out a helper, that knows
how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted
to...
Brian
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
2017-03-16 19:38 ` Brian Norris
@ 2017-03-16 20:52 ` Brian Norris
0 siblings, 0 replies; 5+ messages in thread
From: Brian Norris @ 2017-03-16 20:52 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja
On Thu, Mar 16, 2017 at 12:38:57PM -0700, Brian Norris wrote:
> On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote:
> > On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> > > You need to check this flag before queueing firmware dump work, and
> > > make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> > > (and sdio).
> >
> > That's another approach that could work, but it's a little more
> > invasive.
>
> Never mind, that isn't too invasive. There's only one schedule_work() in
> pcie.c and two in sdio.c. We could even factor out a helper, that knows
> how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted
> to...
OK, so I took a crack at implementing this, and after thinking about it,
the "make sure it is not racy with setting this flag" part is tougher
than it seems. In the end, I think the key is that to eliminate the
race between setting and checking the flag, we just want to halt all
sources of more work -- e.g., commands (which could time out), or
debugfs entries (which could trigger a FW dump manually) -- without
fiddling with extra flags. We do this already in the first half of
mwifiex_remove_card(), when we terminate the main workqueue(s) and
unregister the net and wiphy devices.
IOW, we can move the cancel_work_sync() into the .cleanup_if() callback,
which occurs after the above described teardown, but before the PCIe
driver has actually called things like pci_disable_device() [1]. Then we
don't need any DONT_RUN flag either.
I'll test the above a bit more here, then send a v3 myself, with the
above reasoning captured. I *think* that should eliminate all the races
we've discussed here.
Brian
[1] BTW, I think I previously blamed mwifiex_init_shutdown_fw() for
racing with the FW dumper; I think that is not actually the smoking
gun (it was an educated guess). Based on testing, I see aborts if
we're still accessing the PCIe device (e.g., in the FW dumper) after
mwifiex_cleanup_pcie() -> pci_disable_device().
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-03-16 21:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-16 10:28 [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout Amitkumar Karwar
2017-03-16 18:33 ` Dmitry Torokhov
2017-03-16 18:41 ` Brian Norris
2017-03-16 19:38 ` Brian Norris
2017-03-16 20:52 ` Brian Norris
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.