linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
@ 2018-04-25  5:19 Ganapathi Bhat
  2018-04-25  7:16 ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: Ganapathi Bhat @ 2018-04-25  5:19 UTC (permalink / raw)
  To: linux-wireless
  Cc: Brian Norris, Cathy Luo, Xinming Hu, Zhiyuan Yang, James Cao,
	Mangesh Malusare, Ganapathi Bhat

Whenever sched_scan(BG_SCAN) is in progress and driver downloads
any command, firmware will send an event BG_SCAN_STOPPED. On
recieving this, driver calls cfg80211_sched_scan_stopped. This
function in turn will try to acquire rtnl_lock. But if the
rtnl_lock was already held(while sending the command above), this
will result in nested rtnl locking. To fix this driver must call
rtnl version of the API if rtnl_is_locked().

Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
---
 drivers/net/wireless/marvell/mwifiex/sta_event.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 93dfb76..03ef625c 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
 
 	case EVENT_BG_SCAN_STOPPED:
 		dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
-		cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
+		if (rtnl_is_locked())
+			cfg80211_sched_scan_stopped_rtnl(priv->wdev.wiphy, 0);
+		else
+			cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
 		if (priv->sched_scanning)
 			priv->sched_scanning = false;
 		break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-04-25  5:19 [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED Ganapathi Bhat
@ 2018-04-25  7:16 ` Kalle Valo
  2018-04-25  8:06   ` [EXT] " Ganapathi Bhat
  2018-06-27 15:00   ` Ganapathi Bhat
  0 siblings, 2 replies; 7+ messages in thread
From: Kalle Valo @ 2018-04-25  7:16 UTC (permalink / raw)
  To: Ganapathi Bhat
  Cc: linux-wireless, Brian Norris, Cathy Luo, Xinming Hu,
	Zhiyuan Yang, James Cao, Mangesh Malusare

Ganapathi Bhat <gbhat@marvell.com> writes:

> Whenever sched_scan(BG_SCAN) is in progress and driver downloads
> any command, firmware will send an event BG_SCAN_STOPPED. On
> recieving this, driver calls cfg80211_sched_scan_stopped. This
> function in turn will try to acquire rtnl_lock. But if the
> rtnl_lock was already held(while sending the command above), this
> will result in nested rtnl locking. To fix this driver must call
> rtnl version of the API if rtnl_is_locked().
>
> Signed-off-by: Cathy Luo <cluo@marvell.com>
> Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>

Which one is the author? If Cathy is the author, you should add a From
header to indicate that.

> --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
>  
>  	case EVENT_BG_SCAN_STOPPED:
>  		dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> -		cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> +		if (rtnl_is_locked())
> +			cfg80211_sched_scan_stopped_rtnl(priv->wdev.wiphy, 0);
> +		else
> +			cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);

IMHO checking if a lock is taking is rather ugly and an indication
there's a problem with the locking. Instead making an ugly workaround
like this I would rather investigate who is holding the rtnl and solve
that.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [EXT] Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-04-25  7:16 ` Kalle Valo
@ 2018-04-25  8:06   ` Ganapathi Bhat
  2018-12-08  0:58     ` Brian Norris
  2018-06-27 15:00   ` Ganapathi Bhat
  1 sibling, 1 reply; 7+ messages in thread
From: Ganapathi Bhat @ 2018-04-25  8:06 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-wireless, Brian Norris, Cathy Luo, Xinming Hu,
	Zhiyuan Yang, James Cao, Mangesh Malusare

Hi Kalle,

> ----------------------------------------------------------------------
> Ganapathi Bhat <gbhat@marvell.com> writes:
>
> > Whenever sched_scan(BG_SCAN) is in progress and driver downloads any
> > command, firmware will send an event BG_SCAN_STOPPED. On recieving
> > this, driver calls cfg80211_sched_scan_stopped. This function in turn
> > will try to acquire rtnl_lock. But if the rtnl_lock was already
> > held(while sending the command above), this will result in nested rtnl
> > locking. To fix this driver must call rtnl version of the API if
> > rtnl_is_locked().
> >
> > Signed-off-by: Cathy Luo <cluo@marvell.com>
> > Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
>
> Which one is the author? If Cathy is the author, you should add a From
> header to indicate that.
Sorry for the miss. Myself was the author. Will send v2 with corrections once we address the below comment.
>
> > --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct
> > mwifiex_private *priv)
> >
> >     case EVENT_BG_SCAN_STOPPED:
> >             dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > -           cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > +           if (rtnl_is_locked())
> > +                   cfg80211_sched_scan_stopped_rtnl(priv-
> >wdev.wiphy, 0);
> > +           else
> > +                   cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> 0);
>
> IMHO checking if a lock is taking is rather ugly and an indication there's a
> problem with the locking. Instead making an ugly workaround like this I
> would rather investigate who is holding the rtnl and solve that.
There can be applications trying to access driver(via cfg80211), holding rtnl_lock.
For example(in our case):
1. "iw dev"  was called, when BG_SCAN was active.
2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in internal_flags)
3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
4. mwifiex will download RF_TX_PWR command to firmware
5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl locking.

Please share your comments further.
>
> --
> Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [EXT] Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-04-25  7:16 ` Kalle Valo
  2018-04-25  8:06   ` [EXT] " Ganapathi Bhat
@ 2018-06-27 15:00   ` Ganapathi Bhat
  2018-06-27 15:22     ` Kalle Valo
  1 sibling, 1 reply; 7+ messages in thread
From: Ganapathi Bhat @ 2018-06-27 15:00 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-wireless, Brian Norris, Cathy Luo, 'Xinming Hu',
	Zhiyuan Yang, James Cao, Mangesh Malusare

Hi Kalle,

> > >
> > >   case EVENT_BG_SCAN_STOPPED:
> > >           dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > > -         cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > > +         if (rtnl_is_locked())
> > > +                 cfg80211_sched_scan_stopped_rtnl(priv-
> > >wdev.wiphy, 0);
> > > +         else
> > > +                 cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> > 0);
> >
> > IMHO checking if a lock is taking is rather ugly and an indication there's a
> > problem with the locking. Instead making an ugly workaround like this I
> > would rather investigate who is holding the rtnl and solve that.
> There can be applications trying to access driver(via cfg80211), holding
> rtnl_lock.
> For example(in our case):
> 1. "iw dev"  was called, when BG_SCAN was active.
> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in
> internal_flags)
> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
> 4. mwifiex will download RF_TX_PWR command to firmware
> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl
> locking.
>
> Please share your comments further.
Let us know your thoughts on above sequence.

Regards,
Ganapathi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [EXT] Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-06-27 15:00   ` Ganapathi Bhat
@ 2018-06-27 15:22     ` Kalle Valo
  0 siblings, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2018-06-27 15:22 UTC (permalink / raw)
  To: Ganapathi Bhat
  Cc: linux-wireless, Brian Norris, Cathy Luo, 'Xinming Hu',
	Zhiyuan Yang, James Cao, Mangesh Malusare

Ganapathi Bhat <gbhat@marvell.com> writes:

> Hi Kalle,
>
>> > >
>> > >   case EVENT_BG_SCAN_STOPPED:
>> > >           dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
>> > > -         cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
>> > > +         if (rtnl_is_locked())
>> > > +                 cfg80211_sched_scan_stopped_rtnl(priv-
>> > >wdev.wiphy, 0);
>> > > +         else
>> > > +                 cfg80211_sched_scan_stopped(priv->wdev.wiphy,
>> > 0);
>> >
>> > IMHO checking if a lock is taking is rather ugly and an indication there's a
>> > problem with the locking. Instead making an ugly workaround like this I
>> > would rather investigate who is holding the rtnl and solve that.
>> There can be applications trying to access driver(via cfg80211), holding
>> rtnl_lock.
>> For example(in our case):
>> 1. "iw dev"  was called, when BG_SCAN was active.
>> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in
>> internal_flags)
>> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
>> 4. mwifiex will download RF_TX_PWR command to firmware
>> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
>> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl
>> locking.
>>
>> Please share your comments further.
>
> Let us know your thoughts on above sequence.

Sorry, I don't have time to help with this. Hopefully someone else can.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [EXT] Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-04-25  8:06   ` [EXT] " Ganapathi Bhat
@ 2018-12-08  0:58     ` Brian Norris
  2018-12-10  3:27       ` Ganapathi Bhat
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Norris @ 2018-12-08  0:58 UTC (permalink / raw)
  To: Ganapathi Bhat
  Cc: Kalle Valo, linux-wireless, Cathy Luo, Xinming Hu, Zhiyuan Yang,
	James Cao, Mangesh Malusare

Hi Ganapathi,

For some reason, I'm daft enough to reply to this ancient thread. It
appears as if you probably have not resolved this issue yet though, so
I figured I could lend some advice.

On Wed, Apr 25, 2018 at 1:06 AM Ganapathi Bhat <gbhat@marvell.com> wrote:
> > Ganapathi Bhat <gbhat@marvell.com> writes:
> > > --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
> > > @@ -848,7 +848,10 @@ int mwifiex_process_sta_event(struct
> > > mwifiex_private *priv)
> > >
> > >     case EVENT_BG_SCAN_STOPPED:
> > >             dev_dbg(adapter->dev, "event: BGS_STOPPED\n");
> > > -           cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0);
> > > +           if (rtnl_is_locked())
> > > +                   cfg80211_sched_scan_stopped_rtnl(priv-
> > >wdev.wiphy, 0);
> > > +           else
> > > +                   cfg80211_sched_scan_stopped(priv->wdev.wiphy,
> > 0);
> >
> > IMHO checking if a lock is taking is rather ugly and an indication there's a
> > problem with the locking. Instead making an ugly workaround like this I
> > would rather investigate who is holding the rtnl and solve that.

Agreed, this is not good. You are now bound to hit ASSERT_RTNL()
warnings occasionally, since you might see rtnl locked here, but a
split second later, when you're running in
__cfg80211_stop_sched_scan(), it could be released by some other
thread.

> There can be applications trying to access driver(via cfg80211), holding rtnl_lock.
> For example(in our case):
> 1. "iw dev"  was called, when BG_SCAN was active.
> 2. NL80211_CMD_GET_INTERFACE requires rtnl_lock to be hold(specified in internal_flags)
> 3. cfg80211 will  hold rtnl_lock before calling "get_tx_power"(in pre_doit).
> 4. mwifiex will download RF_TX_PWR command to firmware
> 5. firmware will send BG_SCAN_STOPPED event(since BG_SCAN was active).
> 6. mwifiex will call "cfg80211_sched_scan_stopped" causing nested rtnl locking.

IIUC, it's not exactly a nested lock, but a lock inversion issue.

#1
NL80211_CMD_GET_INTERFACE thread is holding rtnl lock and later
waiting on one of its HostCmd_CMD_* to complete

In the meantime:
#2
a EVENT_BG_SCAN_STOPPED is queued, and it grabs the rtnl lock

Because events are served on the same thread as commands, you get #1
waiting on #2, which is waiting on the lock held in #1. i.e., ABBA.

The way to resolve this is to either move event processing to a
different thread than command processing (that looks it would be very
difficult to do correctly, given the ossified structure of your
driver; but that might be a wise move in the long term)...
...or else maybe you could defer this specific piece of work to its
own thread. That might require yet another workqueue?

Anyway, the key point is that you really don't want to hold rtnl_lock
in your main workqueue, or in any other way that might block the rest
of the operation of your driver.

Brian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [EXT] Re: [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED
  2018-12-08  0:58     ` Brian Norris
@ 2018-12-10  3:27       ` Ganapathi Bhat
  0 siblings, 0 replies; 7+ messages in thread
From: Ganapathi Bhat @ 2018-12-10  3:27 UTC (permalink / raw)
  To: Brian Norris
  Cc: Kalle Valo, linux-wireless, Cathy Luo, Xinming Hu, Zhiyuan Yang,
	James Cao, Mangesh Malusare

Hi Brian,
> 
> IIUC, it's not exactly a nested lock, but a lock inversion issue.
> 
> #1
> NL80211_CMD_GET_INTERFACE thread is holding rtnl lock and later waiting
> on one of its HostCmd_CMD_* to complete
> 
> In the meantime:
> #2
> a EVENT_BG_SCAN_STOPPED is queued, and it grabs the rtnl lock
> 
> Because events are served on the same thread as commands, you get #1
> waiting on #2, which is waiting on the lock held in #1. i.e., ABBA.
> 
> The way to resolve this is to either move event processing to a different
> thread than command processing (that looks it would be very difficult to do
> correctly, given the ossified structure of your driver; but that might be a wise
> move in the long term)...
> ...or else maybe you could defer this specific piece of work to its own thread.
> That might require yet another workqueue?
Yes, with current design we should be doing this. We will try to get this change.
> 
> Anyway, the key point is that you really don't want to hold rtnl_lock in your
> main workqueue, or in any other way that might block the rest of the
> operation of your driver.
Ok. Thanks for clarifying the scenario. I missed out the fact that same thread is blocked by the lock.
> 
> Brian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-12-10  3:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25  5:19 [PATCH] mwifiex: fix nested rtnl locking on BG_SCAN_STOPPED Ganapathi Bhat
2018-04-25  7:16 ` Kalle Valo
2018-04-25  8:06   ` [EXT] " Ganapathi Bhat
2018-12-08  0:58     ` Brian Norris
2018-12-10  3:27       ` Ganapathi Bhat
2018-06-27 15:00   ` Ganapathi Bhat
2018-06-27 15:22     ` Kalle Valo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).