* [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
@ 2017-03-16 10:28 Amitkumar Karwar
  2017-03-16 18:33 ` Dmitry Torokhov
  0 siblings, 1 reply; 5+ messages in thread
From: Amitkumar Karwar @ 2017-03-16 10:28 UTC
  To: linux-wireless
  Cc: Cathy Luo, Nishant Sarmukadam, rajatja, dmitry.torokhov,
	briannorris, Amitkumar Karwar

We observed a SHUTDOWN command timeout during a reboot stress test,
caused by a corner-case firmware bug. It leads to a use-after-free on
the adapter structure pointer and a crash.

Let's add a MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
any work scheduled after the cancel_work_sync() call in the teardown
path.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
---
v2: New work_flag has been added to resolve the issue cleanly as per
Brian's suggestion.
---
 drivers/net/wireless/marvell/mwifiex/main.h | 1 +
 drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
 drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 5c82972..d5b1fd6 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
 enum mwifiex_iface_work_flags {
 	MWIFIEX_IFACE_WORK_DEVICE_DUMP,
 	MWIFIEX_IFACE_WORK_CARD_RESET,
+	MWIFIEX_IFACE_WORK_DONT_RUN,
 };
 
 struct mwifiex_private {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index a0d9180..bb3d798 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
 	if (!adapter || !adapter->priv_num)
 		return;
 
+	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
 	cancel_work_sync(&card->work);
 
 	reg = card->pcie.reg;
@@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
 	struct pcie_service_card *card =
 		container_of(work, struct pcie_service_card, work);
 
+	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
+		return;
+
 	if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP,
 			       &card->work_flags))
 		mwifiex_pcie_device_dump_work(card->adapter);
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index a4b356d..8140bb4 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -387,6 +387,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter)
 	if (!adapter || !adapter->priv_num)
 		return;
 
+	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
 	cancel_work_sync(&card->work);
 
 	mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num);
@@ -2514,6 +2515,9 @@ static void mwifiex_sdio_work(struct work_struct *work)
 	struct sdio_mmc_card *card =
 		container_of(work, struct sdio_mmc_card, work);
 
+	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
+		return;
+
 	if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP,
 			       &card->work_flags))
 		mwifiex_sdio_device_dump_work(card->adapter);
-- 
1.9.1


* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
  2017-03-16 10:28 [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout Amitkumar Karwar
@ 2017-03-16 18:33 ` Dmitry Torokhov
  2017-03-16 18:41   ` Brian Norris
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Torokhov @ 2017-03-16 18:33 UTC
  To: Amitkumar Karwar
  Cc: linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja, briannorris

On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> We observed a SHUTDOWN command timeout during reboot stress test
> due to a corner case firmware bug. It leads to use-after-free on
> adapter structure pointer and crash.
> 
> Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
> any work scheduled after cancel_work_sync() call in teardown path
> to resolve the issue.
> 
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> ---
> v2: New work_flag has been added to resolve the issue cleanly as per
> Brian's suggestion.
> ---
>  drivers/net/wireless/marvell/mwifiex/main.h | 1 +
>  drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
>  3 files changed, 9 insertions(+)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> index 5c82972..d5b1fd6 100644
> --- a/drivers/net/wireless/marvell/mwifiex/main.h
> +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
>  enum mwifiex_iface_work_flags {
>  	MWIFIEX_IFACE_WORK_DEVICE_DUMP,
>  	MWIFIEX_IFACE_WORK_CARD_RESET,
> +	MWIFIEX_IFACE_WORK_DONT_RUN,
>  };
>  
>  struct mwifiex_private {
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index a0d9180..bb3d798 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>  	if (!adapter || !adapter->priv_num)
>  		return;
>  
> +	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
>  	cancel_work_sync(&card->work);
>  
>  	reg = card->pcie.reg;
> @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
>  	struct pcie_service_card *card =
>  		container_of(work, struct pcie_service_card, work);
>  
> +	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> +		return;

I do not see how this could possibly prevent use-after-free, assuming
that the "card" memory is gone by the time mwifiex_pcie_work() gets to
run. You need to check this flag before queueing firmware dump work, and
make sure it is not racy with setting this flag in mwifiex_pcie_remove()
(and sdio).

Thanks.

-- 
Dmitry


* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
  2017-03-16 18:33 ` Dmitry Torokhov
@ 2017-03-16 18:41   ` Brian Norris
  2017-03-16 19:38     ` Brian Norris
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Norris @ 2017-03-16 18:41 UTC
  To: Dmitry Torokhov
  Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja

Hi Dmitry,

On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> > We observed a SHUTDOWN command timeout during reboot stress test
> > due to a corner case firmware bug. It leads to use-after-free on
> > adapter structure pointer and crash.
> > 
> > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing
> > any work scheduled after cancel_work_sync() call in teardown path
> > to resolve the issue.
> > 
> > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > ---
> > v2: New work_flag has been added to resolve the issue cleanly as per
> > Brian's suggestion.
> > ---
> >  drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> >  drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> >  drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> >  3 files changed, 9 insertions(+)
> > 
> > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> > index 5c82972..d5b1fd6 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/main.h
> > +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> >  enum mwifiex_iface_work_flags {
> >  	MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> >  	MWIFIEX_IFACE_WORK_CARD_RESET,
> > +	MWIFIEX_IFACE_WORK_DONT_RUN,
> >  };
> >  
> >  struct mwifiex_private {
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index a0d9180..bb3d798 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> >  	if (!adapter || !adapter->priv_num)
> >  		return;
> >  
> > +	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> >  	cancel_work_sync(&card->work);
> >  
> >  	reg = card->pcie.reg;
> > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> >  	struct pcie_service_card *card =
> >  		container_of(work, struct pcie_service_card, work);
> >  
> > +	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> > +		return;
> 
> I do not see how this could possibly prevent use-after-free, assuming
> that the "card" memory is gone by the time mwifiex_pcie_work() gets to
> run.

The 'card' memory isn't getting freed; it's the 'adapter' memory we're
worried about. This is either already freed (because the FW init
procedure failed), or else it's freed later in this function via
mwifiex_remove_card().

(We're also worried about having the FW dump race with the FW shutdown
sequence, which can begin later in this function. This patch blocks both
races AFAICT.)

> You need to check this flag before queueing firmware dump work, and
> make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> (and sdio).

That's another approach that could work, but it's a little more
invasive.

I'm still reviewing and testing this, but I believe this is nearly
equivalent to my own draft version, which tested out fine.

Brian


* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
  2017-03-16 18:41   ` Brian Norris
@ 2017-03-16 19:38     ` Brian Norris
  2017-03-16 20:52       ` Brian Norris
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Norris @ 2017-03-16 19:38 UTC
  To: Dmitry Torokhov
  Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja

Hi Dmitry and Amit,

On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote:
> On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> > On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> > > We observed a SHUTDOWN command timeout during reboot stress test
> > > due to a corner case firmware bug. It leads to use-after-free on
> > > adapter structure pointer and crash.
> > > 
> > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing

BTW, the 'DONT_RUN' suggestion was more of a pseudo-code suggestion than
a real name, but I guess it's not terrible :)

> > > any work scheduled after cancel_work_sync() call in teardown path
> > > to resolve the issue.
> > > 
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: New work_flag has been added to resolve the issue cleanly as per
> > > Brian's suggestion.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> > >  drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> > >  3 files changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> > > index 5c82972..d5b1fd6 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/main.h
> > > +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> > > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> > >  enum mwifiex_iface_work_flags {
> > >  	MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> > >  	MWIFIEX_IFACE_WORK_CARD_RESET,
> > > +	MWIFIEX_IFACE_WORK_DONT_RUN,
> > >  };
> > >  
> > >  struct mwifiex_private {
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index a0d9180..bb3d798 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> > >  	if (!adapter || !adapter->priv_num)
> > >  		return;
> > >  
> > > +	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> > >  	cancel_work_sync(&card->work);
> > >  
> > >  	reg = card->pcie.reg;
> > > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> > >  	struct pcie_service_card *card =
> > >  		container_of(work, struct pcie_service_card, work);
> > >  
> > > +	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> > > +		return;
> > 
> > I do not see how this could possibly prevent use-after-free, assuming
> > that the "card" memory is gone by the time mwifiex_pcie_work() gets to
> > run.
> 
> The 'card' memory isn't getting freed; it's the 'adapter' memory we're
> worried about. This is either already freed (because the FW init
> procedure failed), or else it's freed later in this function via
> mwifiex_remove_card().

I guess there was a slight miscommunication here: Dmitry pointed out to
me that he *was* actually talking about 'card' getting freed -- when it
gets freed after remove() finishes.

So the sequence would have to go like:

1. enter remove()
2. set DONT_RUN flag; cancel_work_sync()
3. begin to shut down firmware
4. hit, e.g., a command timeout that schedules the work again
5. ** scheduler decides not to schedule the work for a while **
6. we finish mwifiex_remove_card(), and exit from remove() successfully
7. devm_* frees the pcie_service_card (and enclosed work_struct)
8. scheduler tries to run our work item
9. use-after-free!

However unlikely the delay between steps 4 and 8 might be, this is
indeed a race condition.
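
The nine steps above can be modelled in a few lines of userspace C. This
is only a sketch, not driver code: every name in it (race_card,
race_run_work, race_remove) is hypothetical, and the "freed" field merely
stands in for devm_* releasing the card. It is meant to show why a flag
stored in the card cannot protect a handler that runs after the card is
gone.

```c
/* Userspace model only; none of these names exist in mwifiex. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum race_outcome {
	RACE_IDLE,		/* nothing was queued */
	RACE_RAN,		/* handler ran its normal body */
	RACE_SKIPPED,		/* handler saw DONT_RUN and returned early */
	RACE_USE_AFTER_FREE,	/* handler would have read freed memory */
};

struct race_card {
	bool freed;		/* models devm_* releasing the card (step 7) */
	bool dont_run;		/* models the DONT_RUN bit in work_flags */
	bool work_pending;	/* models a queued work item */
};

/* Models the work handler: even the flag test needs a live card. */
static enum race_outcome race_run_work(struct race_card *card)
{
	assert(card != NULL);
	if (!card->work_pending)
		return RACE_IDLE;
	if (card->freed)
		return RACE_USE_AFTER_FREE;
	card->work_pending = false;
	return card->dont_run ? RACE_SKIPPED : RACE_RAN;
}

/* Walks the remove() sequence; the worker runs before or after the free. */
static enum race_outcome race_remove(bool timeout, bool worker_runs_before_free)
{
	struct race_card card = { false, false, false };
	enum race_outcome seen = RACE_IDLE;

	card.dont_run = true;			/* step 2: set DONT_RUN */
	card.work_pending = false;		/* step 2: cancel_work_sync() */
	if (timeout)
		card.work_pending = true;	/* step 4: timeout re-queues work */
	if (worker_runs_before_free)
		seen = race_run_work(&card);	/* flag check is still valid */
	card.freed = true;			/* steps 6-7: card released */
	if (!worker_runs_before_free)
		seen = race_run_work(&card);	/* steps 8-9 */
	return seen;
}
```

In this model, race_remove(true, true) yields RACE_SKIPPED (the flag does
its job while the card is alive), whereas race_remove(true, false) yields
RACE_USE_AFTER_FREE, which is the case described in steps 5 through 9.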

> (We're also worried about having the FW dump race with the FW shutdown
> sequence, which can begin later in this function. This patch blocks both
> races AFAICT.)
> 
> > You need to check this flag before queueing firmware dump work, and
> > make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> > (and sdio).
> 
> That's another approach that could work, but it's a little more
> invasive.

Never mind, that isn't too invasive. There's only one schedule_work() in
pcie.c and two in sdio.c. We could even factor out a helper that knows
how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted
to...
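
A rough userspace sketch of what such a helper might look like follows.
It is an illustration under assumptions, not the driver's API: gate_card,
gate_queue_work and gate_begin_remove are hypothetical names, the
"queued" counter stands in for schedule_work(), and the model is
single-threaded, so it does not address the atomicity between checking
and setting the flag that Dmitry asked about.

```c
/* Userspace model only; these names are hypothetical, not mwifiex symbols. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum gate_work_flag {
	GATE_WORK_DEVICE_DUMP,
	GATE_WORK_CARD_RESET,
	GATE_WORK_DONT_RUN,
};

struct gate_card {
	unsigned long work_flags;
	unsigned int queued;	/* counts calls to the schedule_work() stand-in */
};

/* Refuses to queue anything once teardown has set DONT_RUN. */
static bool gate_queue_work(struct gate_card *card, enum gate_work_flag what)
{
	assert(card != NULL);
	if (card->work_flags & (1UL << GATE_WORK_DONT_RUN))
		return false;
	card->work_flags |= 1UL << what;
	card->queued++;		/* stand-in for schedule_work(&card->work) */
	return true;
}

/* Models the start of remove(): no new work may be queued afterwards. */
static void gate_begin_remove(struct gate_card *card)
{
	assert(card != NULL);
	card->work_flags |= 1UL << GATE_WORK_DONT_RUN;
}
```

In a real driver the check and the queueing would have to be made atomic
with respect to remove() setting the flag (for instance under a lock);
the sketch deliberately leaves that part out.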

Brian


* Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
  2017-03-16 19:38     ` Brian Norris
@ 2017-03-16 20:52       ` Brian Norris
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Norris @ 2017-03-16 20:52 UTC
  To: Dmitry Torokhov
  Cc: Amitkumar Karwar, linux-wireless, Cathy Luo, Nishant Sarmukadam, rajatja

On Thu, Mar 16, 2017 at 12:38:57PM -0700, Brian Norris wrote:
> On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote:
> > On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> > > You need to check this flag before queueing firmware dump work, and
> > > make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> > > (and sdio).
> > 
> > That's another approach that could work, but it's a little more
> > invasive.
> 
> Never mind, that isn't too invasive. There's only one schedule_work() in
> pcie.c and two in sdio.c. We could even factor out a helper, that knows
> how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted
> to...

OK, so I took a crack at implementing this, and after thinking about it,
the "make sure it is not racy with setting this flag" part is tougher
than it seems. In the end, I think the key is that to eliminate the
race between setting and checking the flag, we just want to halt all
sources of more work -- e.g., commands (which could time out), or
debugfs entries (which could trigger a FW dump manually) -- without
fiddling with extra flags. We do this already in the first half of
mwifiex_remove_card(), when we terminate the main workqueue(s) and
unregister the net and wiphy devices.

IOW, we can move the cancel_work_sync() into the .cleanup_if() callback,
which occurs after the teardown described above, but before the PCIe
driver has actually called things like pci_disable_device() [1]. Then we
don't need any DONT_RUN flag either.
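
The ordering argument can be sketched as a small userspace model. This is
an illustration only: td_dev and the td_* functions are hypothetical
names, and each field merely stands in for the driver state mentioned in
the text (work sources stopped, a pending work item, the device still
enabled).

```c
/* Userspace model only; td_* names are hypothetical, not mwifiex symbols. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct td_dev {
	bool sources_stopped;	/* workqueues terminated, net/wiphy unregistered */
	bool work_pending;	/* a queued work item */
	bool device_enabled;	/* false after the pci_disable_device() stand-in */
	unsigned int bad_accesses; /* work that ran against a disabled device */
};

/* Models a command timeout or debugfs request trying to queue work. */
static bool td_trigger_work(struct td_dev *d)
{
	assert(d != NULL);
	if (d->sources_stopped)
		return false;
	d->work_pending = true;
	return true;
}

/* Models the scheduler eventually running whatever is still queued. */
static void td_run_pending(struct td_dev *d)
{
	assert(d != NULL);
	if (!d->work_pending)
		return;
	d->work_pending = false;
	if (!d->device_enabled)
		d->bad_accesses++;
}

/* Teardown in the proposed order: stop sources, cancel work, disable. */
static void td_teardown(struct td_dev *d)
{
	assert(d != NULL);
	d->sources_stopped = true;	/* first half of the remove path */
	d->work_pending = false;	/* cancel_work_sync() stand-in, in cleanup */
	d->device_enabled = false;	/* only now is the device disabled */
}
```

Because nothing can queue work once the sources are stopped, the cancel
that follows is final in this model, and no work item is left to touch
the device after it has been disabled.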

I'll test the above a bit more here, then send a v3 myself, with the
above reasoning captured. I *think* that should eliminate all the races
we've discussed here.

Brian


[1] BTW, I think I previously blamed mwifiex_init_shutdown_fw() for
    racing with the FW dumper; I think that is not actually the smoking
    gun (it was an educated guess). Based on testing, I see aborts if
    we're still accessing the PCIe device (e.g., in the FW dumper) after
    mwifiex_cleanup_pcie() -> pci_disable_device().


