All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cfg80211: Warn once for delayed scan completion
@ 2014-10-15 14:57 Sujith Manoharan
  2014-10-16 20:38 ` Emmanuel Grumbach
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-15 14:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless

From: Sujith Manoharan <c_manoha@qca.qualcomm.com>

Scan completion for drivers that use HW scanning
causes a warning to be hit if the interface has been
removed. Since cfg80211 ensures that both use-after-free
and scheduling a new scan doesn't happen in this case,
WARN_ON_ONCE can be used. With ath9k this warning is seen
many times when multi-channel-concurrency is enabled.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
---
 net/wireless/core.c | 2 +-
 net/wireless/scan.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/wireless/core.c b/net/wireless/core.c
index f52a4cd..2be08e8 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -931,7 +931,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 	case NETDEV_DOWN:
 		cfg80211_update_iface_num(rdev, wdev->iftype, -1);
 		if (rdev->scan_req && rdev->scan_req->wdev == wdev) {
-			if (WARN_ON(!rdev->scan_req->notified))
+			if (WARN_ON_ONCE(!rdev->scan_req->notified))
 				rdev->scan_req->aborted = true;
 			___cfg80211_scan_done(rdev, false);
 		}
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index bda39f1..c0fc0b3 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -239,7 +239,7 @@ void __cfg80211_scan_done(struct work_struct *wk)
 void cfg80211_scan_done(struct cfg80211_scan_request *request, bool aborted)
 {
 	trace_cfg80211_scan_done(request, aborted);
-	WARN_ON(request != wiphy_to_rdev(request->wiphy)->scan_req);
+	WARN_ON_ONCE(request != wiphy_to_rdev(request->wiphy)->scan_req);
 
 	request->aborted = aborted;
 	request->notified = true;
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-15 14:57 [PATCH] cfg80211: Warn once for delayed scan completion Sujith Manoharan
@ 2014-10-16 20:38 ` Emmanuel Grumbach
  2014-10-17  1:32   ` Sujith Manoharan
  0 siblings, 1 reply; 17+ messages in thread
From: Emmanuel Grumbach @ 2014-10-16 20:38 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Johannes Berg, linux-wireless

On Wed, Oct 15, 2014 at 5:57 PM, Sujith Manoharan <sujith@msujith.org> wrote:
> From: Sujith Manoharan <c_manoha@qca.qualcomm.com>
>
> Scan completion for drivers that use HW scanning
> causes a warning to be hit if the interface has been
> removed. Since cfg80211 ensures that both use-after-free
> and scheduling a new scan doesn't happen in this case,
> WARN_ON_ONCE can be used. With ath9k this warning is seen
> many times when multi-channel-concurrency is enabled.
>
> Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>

It is not really my business, but how come this happen with ath9k?
The scan should be cancelled before the interface is removed. You mean
that the scan cancellation is
asynchronous and the notification of the cancellation termination can
come after the cfg80211 has removed
the interface?

> ---
>  net/wireless/core.c | 2 +-
>  net/wireless/scan.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/wireless/core.c b/net/wireless/core.c
> index f52a4cd..2be08e8 100644
> --- a/net/wireless/core.c
> +++ b/net/wireless/core.c
> @@ -931,7 +931,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
>         case NETDEV_DOWN:
>                 cfg80211_update_iface_num(rdev, wdev->iftype, -1);
>                 if (rdev->scan_req && rdev->scan_req->wdev == wdev) {
> -                       if (WARN_ON(!rdev->scan_req->notified))
> +                       if (WARN_ON_ONCE(!rdev->scan_req->notified))
>                                 rdev->scan_req->aborted = true;
>                         ___cfg80211_scan_done(rdev, false);
>                 }
> diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> index bda39f1..c0fc0b3 100644
> --- a/net/wireless/scan.c
> +++ b/net/wireless/scan.c
> @@ -239,7 +239,7 @@ void __cfg80211_scan_done(struct work_struct *wk)
>  void cfg80211_scan_done(struct cfg80211_scan_request *request, bool aborted)
>  {
>         trace_cfg80211_scan_done(request, aborted);
> -       WARN_ON(request != wiphy_to_rdev(request->wiphy)->scan_req);
> +       WARN_ON_ONCE(request != wiphy_to_rdev(request->wiphy)->scan_req);
>
>         request->aborted = aborted;
>         request->notified = true;
> --
> 2.1.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-16 20:38 ` Emmanuel Grumbach
@ 2014-10-17  1:32   ` Sujith Manoharan
  2014-10-20  9:33     ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-17  1:32 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Johannes Berg, linux-wireless

Emmanuel Grumbach wrote:
> It is not really my business, but how come this happen with ath9k?
> The scan should be cancelled before the interface is removed. You mean
> that the scan cancellation is
> asynchronous and the notification of the cancellation termination can
> come after the cfg80211 has removed
> the interface?

Yes, the scan cancellation is initiated on interface removal,
but it's async. These commits explain the race:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/wireless/core.c?id=a617302c531eaf497ccd02a61d380efc119ba999
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/wireless/core.c?id=f9d15d162b3acf28f85b3ac05c4883e5ed588d28

We see it often with ath9k on a low-powered board, with
a 'state manager' that plays with the interfaces.

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-17  1:32   ` Sujith Manoharan
@ 2014-10-20  9:33     ` Johannes Berg
  2014-10-20  9:42       ` Sujith Manoharan
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-20  9:33 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Fri, 2014-10-17 at 07:02 +0530, Sujith Manoharan wrote:

> Yes, the scan cancellation is initiated on interface removal,
> but it's async. These commits explain the race:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/wireless/core.c?id=a617302c531eaf497ccd02a61d380efc119ba999
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/wireless/core.c?id=f9d15d162b3acf28f85b3ac05c4883e5ed588d28
> 
> We see it often with ath9k on a low-powered board, with
> a 'state manager' that plays with the interfaces.

Those commits say they *fix* the race though, so in case I was wrong you
should report the sequence of events so we can look at fixing it
properly?

I'm not going to apply this patch, it's not even a band-aid, it's only
like an aspirin pill that makes the pain go away...

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20  9:33     ` Johannes Berg
@ 2014-10-20  9:42       ` Sujith Manoharan
  2014-10-20  9:51         ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-20  9:42 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

Johannes Berg wrote:
> Those commits say they *fix* the race though, so in case I was wrong you
> should report the sequence of events so we can look at fixing it
> properly?

The commits fix both the issues (use-after-free, new scan) correctly,
so I don't see a race ? If the driver is slow in notifying scan completion
and the interface is removed in the meantime, this warning is hit.

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20  9:42       ` Sujith Manoharan
@ 2014-10-20  9:51         ` Johannes Berg
  2014-10-20 10:27           ` Sujith Manoharan
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-20  9:51 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Mon, 2014-10-20 at 15:12 +0530, Sujith Manoharan wrote:
> Johannes Berg wrote:
> > Those commits say they *fix* the race though, so in case I was wrong you
> > should report the sequence of events so we can look at fixing it
> > properly?
> 
> The commits fix both the issues (use-after-free, new scan) correctly,
> so I don't see a race ? If the driver is slow in notifying scan completion
> and the interface is removed in the meantime, this warning is hit.

But the driver can/should just do it when the interface is removed from
it? Otherwise it has no valid way to run/continue the scan anyway?

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20  9:51         ` Johannes Berg
@ 2014-10-20 10:27           ` Sujith Manoharan
  2014-10-20 10:37             ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-20 10:27 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

Johannes Berg wrote:
> But the driver can/should just do it when the interface is removed from
> it? Otherwise it has no valid way to run/continue the scan anyway?

cancel_hw_scan() is called before remove_interface() and if the
interface is removed in cfg80211 before __cfg80211_scan_done() gets
a chance to run, then this warning is hit.

a617302c531eaf497ccd02a61d380efc119ba999 mentions that the work item
cannot be flushed and prevents a use-after-free crash.

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20 10:27           ` Sujith Manoharan
@ 2014-10-20 10:37             ` Johannes Berg
  2014-10-20 10:56               ` Sujith Manoharan
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-20 10:37 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Mon, 2014-10-20 at 15:57 +0530, Sujith Manoharan wrote:
> Johannes Berg wrote:
> > But the driver can/should just do it when the interface is removed from
> > it? Otherwise it has no valid way to run/continue the scan anyway?
> 
> cancel_hw_scan() is called before remove_interface() and if the
> interface is removed in cfg80211 before __cfg80211_scan_done() gets
> a chance to run, then this warning is hit.

I'm not sure that's true - the WARN_ON you're modifying keys off
something that happens in cfg80211_scan_done(), not
__cfg80211_scan_done(), so as long as the driver called
cfg80211_scan_done() properly things should be OK?

How are you implementing cancel_hw_scan()? You need to call
ieee80211_scan_completed() by the time the function returns.

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20 10:37             ` Johannes Berg
@ 2014-10-20 10:56               ` Sujith Manoharan
  2014-10-20 12:02                 ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-20 10:56 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

Johannes Berg wrote:
> I'm not sure that's true - the WARN_ON you're modifying keys off
> something that happens in cfg80211_scan_done(), not
> __cfg80211_scan_done(), so as long as the driver called
> cfg80211_scan_done() properly things should be OK?

Here is a trace log showing the issue - cfg80211_scan_done()
is called after NETDEV_DOWN is received in cfg80211_netdev_notifier_call().

http://pastebin.com/raw.php?i=AJqFLtZR

I've removed entries to show the issue clearly, if
you need the full trace, please let me know.

> How are you implementing cancel_hw_scan()? You need to call
> ieee80211_scan_completed() by the time the function returns.

Yes, that's what we do in ath9k.

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20 10:56               ` Sujith Manoharan
@ 2014-10-20 12:02                 ` Johannes Berg
  2014-10-20 12:45                   ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-20 12:02 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Mon, 2014-10-20 at 16:26 +0530, Sujith Manoharan wrote:

> Here is a trace log showing the issue - cfg80211_scan_done()
> is called after NETDEV_DOWN is received in cfg80211_netdev_notifier_call().
> 
> http://pastebin.com/raw.php?i=AJqFLtZR
> 
> I've removed entries to show the issue clearly, if
> you need the full trace, please let me know.

Ah, so mac80211 is also delaying and not doing the right thing ... need
to look at that.

johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20 12:02                 ` Johannes Berg
@ 2014-10-20 12:45                   ` Johannes Berg
  2014-10-21  3:56                     ` Sujith Manoharan
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-20 12:45 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Mon, 2014-10-20 at 14:02 +0200, Johannes Berg wrote:
> On Mon, 2014-10-20 at 16:26 +0530, Sujith Manoharan wrote:
> 
> > Here is a trace log showing the issue - cfg80211_scan_done()
> > is called after NETDEV_DOWN is received in cfg80211_netdev_notifier_call().
> > 
> > http://pastebin.com/raw.php?i=AJqFLtZR
> > 
> > I've removed entries to show the issue clearly, if
> > you need the full trace, please let me know.
> 
> Ah, so mac80211 is also delaying and not doing the right thing ... need
> to look at that.

Maybe we need something like this:

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index af0d094b2f2f..45b74ab1c59d 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -985,7 +985,6 @@ void ieee80211_scan_cancel(struct ieee80211_local *local)
 			drv_cancel_hw_scan(local,
 				rcu_dereference_protected(local->scan_sdata,
 						lockdep_is_held(&local->mtx)));
-		goto out;
 	}
 
 	/*

johannes


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-20 12:45                   ` Johannes Berg
@ 2014-10-21  3:56                     ` Sujith Manoharan
  2014-10-21  6:42                       ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-21  3:56 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

Johannes Berg wrote:
> Maybe we need something like this:
> 
> diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
> index af0d094b2f2f..45b74ab1c59d 100644
> --- a/net/mac80211/scan.c
> +++ b/net/mac80211/scan.c
> @@ -985,7 +985,6 @@ void ieee80211_scan_cancel(struct ieee80211_local *local)
>  			drv_cancel_hw_scan(local,
>  				rcu_dereference_protected(local->scan_sdata,
>  						lockdep_is_held(&local->mtx)));
> -		goto out;
>  	}

Is the big comment at the beginning of ieee80211_scan_cancel()
outdated ? It does seem valid to me, at least the point about
not cancelling scan_work for HW scan.

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-21  3:56                     ` Sujith Manoharan
@ 2014-10-21  6:42                       ` Johannes Berg
  2014-10-21 12:27                         ` Eliad Peller
  2014-10-21 13:18                         ` Sujith Manoharan
  0 siblings, 2 replies; 17+ messages in thread
From: Johannes Berg @ 2014-10-21  6:42 UTC (permalink / raw)
  To: Sujith Manoharan; +Cc: Emmanuel Grumbach, linux-wireless

On Tue, 2014-10-21 at 09:26 +0530, Sujith Manoharan wrote:
> Johannes Berg wrote:
> > Maybe we need something like this:
> > 
> > diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
> > index af0d094b2f2f..45b74ab1c59d 100644
> > --- a/net/mac80211/scan.c
> > +++ b/net/mac80211/scan.c
> > @@ -985,7 +985,6 @@ void ieee80211_scan_cancel(struct ieee80211_local *local)
> >  			drv_cancel_hw_scan(local,
> >  				rcu_dereference_protected(local->scan_sdata,
> >  						lockdep_is_held(&local->mtx)));
> > -		goto out;
> >  	}
> 
> Is the big comment at the beginning of ieee80211_scan_cancel()
> outdated ? It does seem valid to me, at least the point about
> not cancelling scan_work for HW scan.

Yeah you're right.

However, when the interface is deleted, the driver must still also
remove the scan and definitely schedule the work, so most drivers will
really always do so in cancel_hw_scan()? We could change the API and
require that to be synchronous, like in the interface removal case, but
maybe we don't want to.

How about this then?

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index e469b3390f2a..6f1b90eb568c 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -766,10 +766,12 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
 	int i, flushed;
 	struct ps_data *ps;
 	struct cfg80211_chan_def chandef;
+	bool cancel_scan;
 
 	clear_bit(SDATA_STATE_RUNNING, &sdata->state);
 
-	if (rcu_access_pointer(local->scan_sdata) == sdata)
+	cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
+	if (cancel_scan)
 		ieee80211_scan_cancel(local);
 
 	/*
@@ -993,6 +995,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
 
 	ieee80211_recalc_ps(local, -1);
 
+	if (cancel_scan)
+		flush_delayed_work(&local->scan_work);
+
 	if (local->open_count == 0) {
 		ieee80211_stop_device(local);
 

johannes


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-21  6:42                       ` Johannes Berg
@ 2014-10-21 12:27                         ` Eliad Peller
  2014-10-21 12:50                           ` Johannes Berg
  2014-10-21 13:18                         ` Sujith Manoharan
  1 sibling, 1 reply; 17+ messages in thread
From: Eliad Peller @ 2014-10-21 12:27 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Sujith Manoharan, Emmanuel Grumbach, linux-wireless

On Tue, Oct 21, 2014 at 9:42 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Tue, 2014-10-21 at 09:26 +0530, Sujith Manoharan wrote:
>> Johannes Berg wrote:
>> > Maybe we need something like this:
>> >
>> > diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
>> > index af0d094b2f2f..45b74ab1c59d 100644
>> > --- a/net/mac80211/scan.c
>> > +++ b/net/mac80211/scan.c
>> > @@ -985,7 +985,6 @@ void ieee80211_scan_cancel(struct ieee80211_local *local)
>> >                     drv_cancel_hw_scan(local,
>> >                             rcu_dereference_protected(local->scan_sdata,
>> >                                             lockdep_is_held(&local->mtx)));
>> > -           goto out;
>> >     }
>>
>> Is the big comment at the beginning of ieee80211_scan_cancel()
>> outdated ? It does seem valid to me, at least the point about
>> not cancelling scan_work for HW scan.
>
> Yeah you're right.
>
> However, when the interface is deleted, the driver must still also
> remove the scan and definitely schedule the work, so most drivers will
> really always do so in cancel_hw_scan()? We could change the API and
> require that to be synchronous, like in the interface removal case, but
> maybe we don't want to.
>
> How about this then?
>
> diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
> index e469b3390f2a..6f1b90eb568c 100644
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -766,10 +766,12 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
>         int i, flushed;
>         struct ps_data *ps;
>         struct cfg80211_chan_def chandef;
> +       bool cancel_scan;
>
>         clear_bit(SDATA_STATE_RUNNING, &sdata->state);
>
> -       if (rcu_access_pointer(local->scan_sdata) == sdata)
> +       cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
> +       if (cancel_scan)
>                 ieee80211_scan_cancel(local);
>
>         /*
> @@ -993,6 +995,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
>
>         ieee80211_recalc_ps(local, -1);
>
> +       if (cancel_scan)
> +               flush_delayed_work(&local->scan_work);
> +
>         if (local->open_count == 0) {
>                 ieee80211_stop_device(local);
>
This patch actually solves a kernel panic that can be reproduced
easily by increasing the delay in ieee80211_scan_completed() and
removing the driver right after initiating a scan - the delayed
scan_work never gets flushed, resulting in invalid memory access, etc.

Eliad.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-21 12:27                         ` Eliad Peller
@ 2014-10-21 12:50                           ` Johannes Berg
  2014-10-21 13:19                             ` Eliad Peller
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2014-10-21 12:50 UTC (permalink / raw)
  To: Eliad Peller; +Cc: Sujith Manoharan, Emmanuel Grumbach, linux-wireless

On Tue, 2014-10-21 at 15:27 +0300, Eliad Peller wrote:

> This patch actually solves a kernel panic that can be reproduced
> easily by increasing the delay in ieee80211_scan_completed() and
> removing the driver right after initiating a scan - the delayed
> scan_work never gets flushed, resulting in invalid memory access, etc.

Yeah I thought about that too - we flush the workqueue but if the timer
hasn't yet fired you can get issues.

OTOH, the delay there is 0, so the workqueue code doesn't even arm the
timer but rather puts the work on the workqueue immediately. Therefore
flush_work() should be enough?

Or did you not test with something like

@@ -357,7 +357,7 @@ void ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
        set_bit(SCAN_COMPLETED, &local->scanning);
        if (aborted)
                set_bit(SCAN_ABORTED, &local->scanning);
-       ieee80211_queue_delayed_work(&local->hw, &local->scan_work, 0);
+       ieee80211_queue_delayed_work(&local->hw, &local->scan_work, 1);


johannes


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-21  6:42                       ` Johannes Berg
  2014-10-21 12:27                         ` Eliad Peller
@ 2014-10-21 13:18                         ` Sujith Manoharan
  1 sibling, 0 replies; 17+ messages in thread
From: Sujith Manoharan @ 2014-10-21 13:18 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

Johannes Berg wrote:
> Yeah you're right.
> 
> However, when the interface is deleted, the driver must still also
> remove the scan and definitely schedule the work, so most drivers will
> really always do so in cancel_hw_scan()? We could change the API and
> require that to be synchronous, like in the interface removal case, but
> maybe we don't want to.
> 
> How about this then?
> 
> diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
> index e469b3390f2a..6f1b90eb568c 100644
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -766,10 +766,12 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
>  	int i, flushed;
>  	struct ps_data *ps;
>  	struct cfg80211_chan_def chandef;
> +	bool cancel_scan;
>  
>  	clear_bit(SDATA_STATE_RUNNING, &sdata->state);
>  
> -	if (rcu_access_pointer(local->scan_sdata) == sdata)
> +	cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
> +	if (cancel_scan)
>  		ieee80211_scan_cancel(local);
>  
>  	/*
> @@ -993,6 +995,9 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata,
>  
>  	ieee80211_recalc_ps(local, -1);
>  
> +	if (cancel_scan)
> +		flush_delayed_work(&local->scan_work);
> +
>  	if (local->open_count == 0) {
>  		ieee80211_stop_device(local);

Yep, this fixes the issue. Thanks !

Sujith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] cfg80211: Warn once for delayed scan completion
  2014-10-21 12:50                           ` Johannes Berg
@ 2014-10-21 13:19                             ` Eliad Peller
  0 siblings, 0 replies; 17+ messages in thread
From: Eliad Peller @ 2014-10-21 13:19 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Sujith Manoharan, Emmanuel Grumbach, linux-wireless

On Tue, Oct 21, 2014 at 3:50 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Tue, 2014-10-21 at 15:27 +0300, Eliad Peller wrote:
>
>> This patch actually solves a kernel panic that can be reproduced
>> easily by increasing the delay in ieee80211_scan_completed() and
>> removing the driver right after initiating a scan - the delayed
>> scan_work never gets flushed, resulting in invalid memory access, etc.
>
> Yeah I thought about that too - we flush the workqueue but if the timer
> hasn't yet fired you can get issues.
>
> OTOH, the delay there is 0, so the workqueue code doesn't even arm the
> timer but rather puts the work on the workqueue immediately. Therefore
> flush_work() should be enough?
>
> Or did you not test with something like
>
> @@ -357,7 +357,7 @@ void ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted)
>         set_bit(SCAN_COMPLETED, &local->scanning);
>         if (aborted)
>                 set_bit(SCAN_ABORTED, &local->scanning);
> -       ieee80211_queue_delayed_work(&local->hw, &local->scan_work, 0);
> +       ieee80211_queue_delayed_work(&local->hw, &local->scan_work, 1);
>
yeah, that's how i tested it (with a larger delay).

you're right about the delay=0.
(this might not be enough if scan_work is already delayed_queued, but
i currently don't see how this can happen)

anyway, the patch looks good.

Eliad.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-10-21 13:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-15 14:57 [PATCH] cfg80211: Warn once for delayed scan completion Sujith Manoharan
2014-10-16 20:38 ` Emmanuel Grumbach
2014-10-17  1:32   ` Sujith Manoharan
2014-10-20  9:33     ` Johannes Berg
2014-10-20  9:42       ` Sujith Manoharan
2014-10-20  9:51         ` Johannes Berg
2014-10-20 10:27           ` Sujith Manoharan
2014-10-20 10:37             ` Johannes Berg
2014-10-20 10:56               ` Sujith Manoharan
2014-10-20 12:02                 ` Johannes Berg
2014-10-20 12:45                   ` Johannes Berg
2014-10-21  3:56                     ` Sujith Manoharan
2014-10-21  6:42                       ` Johannes Berg
2014-10-21 12:27                         ` Eliad Peller
2014-10-21 12:50                           ` Johannes Berg
2014-10-21 13:19                             ` Eliad Peller
2014-10-21 13:18                         ` Sujith Manoharan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.