linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mac80211: fix sw scan bracketing
@ 2010-06-18 10:32 Johannes Berg
  2010-07-27  8:04 ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2010-06-18 10:32 UTC (permalink / raw)
  To: John Linville; +Cc: linux-wireless

From: Johannes Berg <johannes.berg@intel.com>

Currently, detection in hwsim and ath9k can
detect that two sw scans are in flight at the
same time, which isn't really true. It is
caused by a race condition, because the scan
complete callback is called too late, after
the lock has been dropped, so that a new scan
can be started before it is called.

It is also called too early semantically, as
it is currently called _after_ the return to
the operating channel -- it should be before
so that drivers know this is the operating
channel again.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/scan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- wireless-testing.orig/net/mac80211/scan.c	2010-06-18 12:22:56.000000000 +0200
+++ wireless-testing/net/mac80211/scan.c	2010-06-18 12:23:25.000000000 +0200
@@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
 	local->scanning = 0;
 	local->scan_channel = NULL;
 
+	drv_sw_scan_complete(local);
+
 	/* we only have to protect scan_req and hw/sw scan */
 	mutex_unlock(&local->scan_mtx);
 
@@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
 
 	ieee80211_configure_filter(local);
 
-	drv_sw_scan_complete(local);
-
 	ieee80211_offchannel_return(local, true);
 
  done:



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-06-18 10:32 [PATCH] mac80211: fix sw scan bracketing Johannes Berg
@ 2010-07-27  8:04 ` Luis R. Rodriguez
  2010-07-27  8:27   ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27  8:04 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> Currently, detection in hwsim and ath9k can
> detect that two sw scans are in flight at the
> same time, which isn't really true. It is
> caused by a race condition, because the scan
> complete callback is called too late, after
> the lock has been dropped, so that a new scan
> can be started before it is called.
>
> It is also called too early semantically, as
> it is currently called _after_ the return to
> the operating channel -- it should be before
> so that drivers know this is the operating
> channel again.
>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  net/mac80211/scan.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>        local->scanning = 0;
>        local->scan_channel = NULL;
>
> +       drv_sw_scan_complete(local);
> +
>        /* we only have to protect scan_req and hw/sw scan */
>        mutex_unlock(&local->scan_mtx);
>
> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>
>        ieee80211_configure_filter(local);
>
> -       drv_sw_scan_complete(local);
> -
>        ieee80211_offchannel_return(local, true);
>
>  done:

Leaving the entire patch in the e-mail reply as the patch is small to
help with context.

Turns out this patch broke ath9k in the sense that right after a scan
we get disconnected from the AP. I just bisected wireless-testing by
master-tag dates and found 2010-06-18 to be where we hit poo in. I
reverted this patch and its cured. I'm going to test a recent
wireless-testing now and then just try reverting this to see if its OK
even on a recent wireless-testing, and then see if we can fix this
properly on ath9k or whatever.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  8:04 ` Luis R. Rodriguez
@ 2010-07-27  8:27   ` Luis R. Rodriguez
  2010-07-27  9:22     ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27  8:27 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 1:04 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
>> From: Johannes Berg <johannes.berg@intel.com>
>>
>> Currently, detection in hwsim and ath9k can
>> detect that two sw scans are in flight at the
>> same time, which isn't really true. It is
>> caused by a race condition, because the scan
>> complete callback is called too late, after
>> the lock has been dropped, so that a new scan
>> can be started before it is called.
>>
>> It is also called too early semantically, as
>> it is currently called _after_ the return to
>> the operating channel -- it should be before
>> so that drivers know this is the operating
>> channel again.
>>
>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>> ---
>>  net/mac80211/scan.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
>> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
>> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>>        local->scanning = 0;
>>        local->scan_channel = NULL;
>>
>> +       drv_sw_scan_complete(local);
>> +
>>        /* we only have to protect scan_req and hw/sw scan */
>>        mutex_unlock(&local->scan_mtx);
>>
>> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>>
>>        ieee80211_configure_filter(local);
>>
>> -       drv_sw_scan_complete(local);
>> -
>>        ieee80211_offchannel_return(local, true);
>>
>>  done:
>
> Leaving the entire patch in the e-mail reply as the patch is small to
> help with context.
>
> Turns out this patch broke ath9k in the sense that right after a scan
> we get disconnected from the AP. I just bisected wireless-testing by
> master-tag dates and found 2010-06-18 to be where we hit poo in. I
> reverted this patch and its cured. I'm going to test a recent
> wireless-testing now and then just try reverting this to see if its OK
> even on a recent wireless-testing, and then see if we can fix this
> properly on ath9k or whatever.

Yeap, reverting this patch alone today on wireless-testing makes me
ath9k happy once again.

 Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  8:27   ` Luis R. Rodriguez
@ 2010-07-27  9:22     ` Luis R. Rodriguez
  2010-07-27  9:30       ` Luis R. Rodriguez
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27  9:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 1:27 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 1:04 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
>> <johannes@sipsolutions.net> wrote:
>>> From: Johannes Berg <johannes.berg@intel.com>
>>>
>>> Currently, detection in hwsim and ath9k can
>>> detect that two sw scans are in flight at the
>>> same time, which isn't really true. It is
>>> caused by a race condition, because the scan
>>> complete callback is called too late, after
>>> the lock has been dropped, so that a new scan
>>> can be started before it is called.
>>>
>>> It is also called too early semantically, as
>>> it is currently called _after_ the return to
>>> the operating channel -- it should be before
>>> so that drivers know this is the operating
>>> channel again.
>>>
>>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>>> ---
>>>  net/mac80211/scan.c |    4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
>>> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
>>> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>>>        local->scanning = 0;
>>>        local->scan_channel = NULL;
>>>
>>> +       drv_sw_scan_complete(local);
>>> +
>>>        /* we only have to protect scan_req and hw/sw scan */
>>>        mutex_unlock(&local->scan_mtx);
>>>
>>> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>>>
>>>        ieee80211_configure_filter(local);
>>>
>>> -       drv_sw_scan_complete(local);
>>> -
>>>        ieee80211_offchannel_return(local, true);
>>>
>>>  done:
>>
>> Leaving the entire patch in the e-mail reply as the patch is small to
>> help with context.
>>
>> Turns out this patch broke ath9k in the sense that right after a scan
>> we get disconnected from the AP. I just bisected wireless-testing by
>> master-tag dates and found 2010-06-18 to be where we hit poo in. I
>> reverted this patch and its cured. I'm going to test a recent
>> wireless-testing now and then just try reverting this to see if its OK
>> even on a recent wireless-testing, and then see if we can fix this
>> properly on ath9k or whatever.
>
> Yeap, reverting this patch alone today on wireless-testing makes me
> ath9k happy once again.

So after some though and review in order to fix this we need a little
more time and thought. The above patch changes the order in which we
configure hardware upon a scan complete. It used to be we would
configure the hardware, configure the filter and then scan complete.
By then we'd be on the home channel and the filters would be set
correctly. ath9k's scan complete routine does a few things which
depend on the channel and will make hardware poo otherwise or do
stupid things:

  * start TX poll routine, due to the new latency introduced to
configure hardware and configure the RX filter the TX polll routine
now has some extra work done by hardware before it actually starts
TXing. I don't suspect this could introduce a huge regression but it
is worth noting.
  * ANI gets started, and the wrong channel is used if we're not yet
on the home channel. A simple patch to test if this may be causing the
disconnect issue I am seeing I just tried was to increase the delay
before starting ANI, this did indeed help.
  * ath_beacon_config() which I suspect means that if we now scan in
AP mode/IBSS mode we could potentially be leaking beacons on the last
channel of a scan, prior to sending them on the actual desired home
channel.

To address and review all this I need more time and instead I think
its easier to ask for revert of this patch. I think we should more
carefully address these issues acanll together in one patch to avoid
these regressions. Right now ath9k is just completely useless with
this. It disconnects upon every scan, so for network manager users
this is every 60 seconds.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  9:22     ` Luis R. Rodriguez
@ 2010-07-27  9:30       ` Luis R. Rodriguez
  2010-07-27  9:31         ` Luis R. Rodriguez
  2010-07-27 10:22       ` Johannes Berg
  2010-09-13 22:22       ` Luis R. Rodriguez
  2 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27  9:30 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 2:22 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 1:27 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:04 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>>> On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
>>> <johannes@sipsolutions.net> wrote:
>>>> From: Johannes Berg <johannes.berg@intel.com>
>>>>
>>>> Currently, detection in hwsim and ath9k can
>>>> detect that two sw scans are in flight at the
>>>> same time, which isn't really true. It is
>>>> caused by a race condition, because the scan
>>>> complete callback is called too late, after
>>>> the lock has been dropped, so that a new scan
>>>> can be started before it is called.
>>>>
>>>> It is also called too early semantically, as
>>>> it is currently called _after_ the return to
>>>> the operating channel -- it should be before
>>>> so that drivers know this is the operating
>>>> channel again.
>>>>
>>>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>>>> ---
>>>>  net/mac80211/scan.c |    4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
>>>> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
>>>> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>>>>        local->scanning = 0;
>>>>        local->scan_channel = NULL;
>>>>
>>>> +       drv_sw_scan_complete(local);
>>>> +
>>>>        /* we only have to protect scan_req and hw/sw scan */
>>>>        mutex_unlock(&local->scan_mtx);
>>>>
>>>> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>>>>
>>>>        ieee80211_configure_filter(local);
>>>>
>>>> -       drv_sw_scan_complete(local);
>>>> -
>>>>        ieee80211_offchannel_return(local, true);
>>>>
>>>>  done:
>>>
>>> Leaving the entire patch in the e-mail reply as the patch is small to
>>> help with context.
>>>
>>> Turns out this patch broke ath9k in the sense that right after a scan
>>> we get disconnected from the AP. I just bisected wireless-testing by
>>> master-tag dates and found 2010-06-18 to be where we hit poo in. I
>>> reverted this patch and its cured. I'm going to test a recent
>>> wireless-testing now and then just try reverting this to see if its OK
>>> even on a recent wireless-testing, and then see if we can fix this
>>> properly on ath9k or whatever.
>>
>> Yeap, reverting this patch alone today on wireless-testing makes me
>> ath9k happy once again.
>
> So after some though and review in order to fix this we need a little
> more time and thought. The above patch changes the order in which we
> configure hardware upon a scan complete. It used to be we would
> configure the hardware, configure the filter and then scan complete.
> By then we'd be on the home channel and the filters would be set
> correctly. ath9k's scan complete routine does a few things which
> depend on the channel and will make hardware poo otherwise or do
> stupid things:
>
>  * start TX poll routine, due to the new latency introduced to
> configure hardware and configure the RX filter the TX polll routine
> now has some extra work done by hardware before it actually starts
> TXing. I don't suspect this could introduce a huge regression but it
> is worth noting.
>  * ANI gets started, and the wrong channel is used if we're not yet
> on the home channel. A simple patch to test if this may be causing the
> disconnect issue I am seeing I just tried was to increase the delay
> before starting ANI, this did indeed help.

Oh and having the right filters set up for ANI processing is important too.

At first I thought trying to detect upon hw config the home channel
would be a good place to move the ani start, but then we'd also need
to move the beacon config and test that.. at this point there are just
too many unknowns on possible regressions introduced by this.

>  * ath_beacon_config() which I suspect means that if we now scan in
> AP mode/IBSS mode we could potentially be leaking beacons on the last
> channel of a scan, prior to sending them on the actual desired home
> channel.
>
> To address and review all this I need more time and instead I think
> its easier to ask for revert of this patch. I think we should more
> carefully address these issues acanll together in one patch to avoid
> these regressions. Right now ath9k is just completely useless with
> this. It disconnects upon every scan, so for network manager users
> this is every 60 seconds.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  9:30       ` Luis R. Rodriguez
@ 2010-07-27  9:31         ` Luis R. Rodriguez
  0 siblings, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27  9:31 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 2:30 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 2:22 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:27 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>>> On Tue, Jul 27, 2010 at 1:04 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>>>> On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
>>>> <johannes@sipsolutions.net> wrote:
>>>>> From: Johannes Berg <johannes.berg@intel.com>
>>>>>
>>>>> Currently, detection in hwsim and ath9k can
>>>>> detect that two sw scans are in flight at the
>>>>> same time, which isn't really true. It is
>>>>> caused by a race condition, because the scan
>>>>> complete callback is called too late, after
>>>>> the lock has been dropped, so that a new scan
>>>>> can be started before it is called.
>>>>>
>>>>> It is also called too early semantically, as
>>>>> it is currently called _after_ the return to
>>>>> the operating channel -- it should be before
>>>>> so that drivers know this is the operating
>>>>> channel again.
>>>>>
>>>>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>>>>> ---
>>>>>  net/mac80211/scan.c |    4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
>>>>> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
>>>>> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>>>>>        local->scanning = 0;
>>>>>        local->scan_channel = NULL;
>>>>>
>>>>> +       drv_sw_scan_complete(local);
>>>>> +
>>>>>        /* we only have to protect scan_req and hw/sw scan */
>>>>>        mutex_unlock(&local->scan_mtx);
>>>>>
>>>>> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>>>>>
>>>>>        ieee80211_configure_filter(local);
>>>>>
>>>>> -       drv_sw_scan_complete(local);
>>>>> -
>>>>>        ieee80211_offchannel_return(local, true);
>>>>>
>>>>>  done:
>>>>
>>>> Leaving the entire patch in the e-mail reply as the patch is small to
>>>> help with context.
>>>>
>>>> Turns out this patch broke ath9k in the sense that right after a scan
>>>> we get disconnected from the AP. I just bisected wireless-testing by
>>>> master-tag dates and found 2010-06-18 to be where we hit poo in. I
>>>> reverted this patch and its cured. I'm going to test a recent
>>>> wireless-testing now and then just try reverting this to see if its OK
>>>> even on a recent wireless-testing, and then see if we can fix this
>>>> properly on ath9k or whatever.
>>>
>>> Yeap, reverting this patch alone today on wireless-testing makes me
>>> ath9k happy once again.
>>
>> So after some though and review in order to fix this we need a little
>> more time and thought. The above patch changes the order in which we
>> configure hardware upon a scan complete. It used to be we would
>> configure the hardware, configure the filter and then scan complete.
>> By then we'd be on the home channel and the filters would be set
>> correctly. ath9k's scan complete routine does a few things which
>> depend on the channel and will make hardware poo otherwise or do
>> stupid things:
>>
>>  * start TX poll routine, due to the new latency introduced to
>> configure hardware and configure the RX filter the TX polll routine
>> now has some extra work done by hardware before it actually starts
>> TXing. I don't suspect this could introduce a huge regression but it
>> is worth noting.
>>  * ANI gets started, and the wrong channel is used if we're not yet
>> on the home channel. A simple patch to test if this may be causing the
>> disconnect issue I am seeing I just tried was to increase the delay
>> before starting ANI, this did indeed help.
>
> Oh and having the right filters set up for ANI processing is important too.
>
> At first I thought trying to detect upon hw config the home channel
> would be a good place to move the ani start, but then we'd also need
> to move the beacon config and test that.. at this point there are just
> too many unknowns on possible regressions introduced by this.

I'm also a bit surprised there are no other drivers with issues due to this too.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  9:22     ` Luis R. Rodriguez
  2010-07-27  9:30       ` Luis R. Rodriguez
@ 2010-07-27 10:22       ` Johannes Berg
  2010-07-27 15:54         ` Luis R. Rodriguez
  2010-09-13 22:22       ` Luis R. Rodriguez
  2 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2010-07-27 10:22 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: John Linville, linux-wireless

On Tue, 2010-07-27 at 02:22 -0700, Luis R. Rodriguez wrote:

> > Yeap, reverting this patch alone today on wireless-testing makes me
> > ath9k happy once again.
> 
> So after some though and review in order to fix this we need a little
> more time and thought. The above patch changes the order in which we
> configure hardware upon a scan complete. It used to be we would
> configure the hardware, configure the filter and then scan complete.
> By then we'd be on the home channel and the filters would be set
> correctly. ath9k's scan complete routine does a few things which
> depend on the channel and will make hardware poo otherwise or do
> stupid things:
> 
>   * start TX poll routine, due to the new latency introduced to
> configure hardware and configure the RX filter the TX polll routine
> now has some extra work done by hardware before it actually starts
> TXing. I don't suspect this could introduce a huge regression but it
> is worth noting.

I don't even understand why this is stopped during scan?

>   * ANI gets started, and the wrong channel is used if we're not yet
> on the home channel. A simple patch to test if this may be causing the
> disconnect issue I am seeing I just tried was to increase the delay
> before starting ANI, this did indeed help.

This really should be done when the channel is configured after scan
anyway.

>   * ath_beacon_config() which I suspect means that if we now scan in
> AP mode/IBSS mode we could potentially be leaking beacons on the last
> channel of a scan, prior to sending them on the actual desired home
> channel.

This should be done by the driver based on mac80211's indication of
whether it should be beaconing or not, not based on the scan callbacks.

I'm starting to think that it was a mistake to add these callbacks to
start with ...

> To address and review all this I need more time and instead I think
> its easier to ask for revert of this patch.

I'm OK with a revert, but *ONLY* if at the same time you remove the
double-scan-detected message from ath9k and stop claiming it's a
mac80211 bug :-)

johannes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27 10:22       ` Johannes Berg
@ 2010-07-27 15:54         ` Luis R. Rodriguez
  2010-07-27 17:29           ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27 15:54 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 3:22 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Tue, 2010-07-27 at 02:22 -0700, Luis R. Rodriguez wrote:
>
>> > Yeap, reverting this patch alone today on wireless-testing makes me
>> > ath9k happy once again.
>>
>> So after some though and review in order to fix this we need a little
>> more time and thought. The above patch changes the order in which we
>> configure hardware upon a scan complete. It used to be we would
>> configure the hardware, configure the filter and then scan complete.
>> By then we'd be on the home channel and the filters would be set
>> correctly. ath9k's scan complete routine does a few things which
>> depend on the channel and will make hardware poo otherwise or do
>> stupid things:
>>
>>   * start TX poll routine, due to the new latency introduced to
>> configure hardware and configure the RX filter the TX polll routine
>> now has some extra work done by hardware before it actually starts
>> TXing. I don't suspect this could introduce a huge regression but it
>> is worth noting.
>
> I don't even understand why this is stopped during scan?

Good question, perhaps some scans take too long and you don't TX at
all on some set of passive channels? Its the only thing I can think
of.

>>   * ANI gets started, and the wrong channel is used if we're not yet
>> on the home channel. A simple patch to test if this may be causing the
>> disconnect issue I am seeing I just tried was to increase the delay
>> before starting ANI, this did indeed help.
>
> This really should be done when the channel is configured after scan
> anyway.

Indeed, I really prefer to do so on a new development kernel series,
think its too late in the series now.

>>   * ath_beacon_config() which I suspect means that if we now scan in
>> AP mode/IBSS mode we could potentially be leaking beacons on the last
>> channel of a scan, prior to sending them on the actual desired home
>> channel.
>
> This should be done by the driver based on mac80211's indication of
> whether it should be beaconing or not, not based on the scan callbacks.

Yeah, this is odd, I'll check with our team to see how we came up with
this here.

> I'm starting to think that it was a mistake to add these callbacks to
> start with ...

Could be.

>> To address and review all this I need more time and instead I think
>> its easier to ask for revert of this patch.
>
> I'm OK with a revert, but *ONLY* if at the same time you remove the
> double-scan-detected message from ath9k and stop claiming it's a
> mac80211 bug :-)

Deal :)

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27 15:54         ` Luis R. Rodriguez
@ 2010-07-27 17:29           ` Luis R. Rodriguez
  2010-07-27 20:09             ` Johannes Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-07-27 17:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, linux-wireless

On Tue, Jul 27, 2010 at 8:54 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 3:22 AM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
>> On Tue, 2010-07-27 at 02:22 -0700, Luis R. Rodriguez wrote:
>>> To address and review all this I need more time and instead I think
>>> its easier to ask for revert of this patch.
>>
>> I'm OK with a revert, but *ONLY* if at the same time you remove the
>> double-scan-detected message from ath9k and stop claiming it's a
>> mac80211 bug :-)
>
> Deal :)

I just reviewed this and noticed both of these prints are using
KERN_DEBUG, and since the warning will become valid *after* this gets
properly fixed, are you OK with just a revert of your patch?

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27 17:29           ` Luis R. Rodriguez
@ 2010-07-27 20:09             ` Johannes Berg
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Berg @ 2010-07-27 20:09 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: John Linville, linux-wireless

On Tue, 2010-07-27 at 10:29 -0700, Luis R. Rodriguez wrote:

> I just reviewed this and noticed both of these prints are using
> KERN_DEBUG, and since the warning will become valid *after* this gets
> properly fixed, are you OK with just a revert of your patch?

I've seen this show up in bug reports, and we _know_ that there's a
problem since I fixed it, so no, I really do want the message to go
away.

johannes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mac80211: fix sw scan bracketing
  2010-07-27  9:22     ` Luis R. Rodriguez
  2010-07-27  9:30       ` Luis R. Rodriguez
  2010-07-27 10:22       ` Johannes Berg
@ 2010-09-13 22:22       ` Luis R. Rodriguez
  2 siblings, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-09-13 22:22 UTC (permalink / raw)
  To: Johannes Berg, Paul Stewart; +Cc: John Linville, linux-wireless, Amod Bodas

On Tue, Jul 27, 2010 at 2:22 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 1:27 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:04 AM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
>>> On Fri, Jun 18, 2010 at 3:32 AM, Johannes Berg
>>> <johannes@sipsolutions.net> wrote:
>>>> From: Johannes Berg <johannes.berg@intel.com>
>>>>
>>>> Currently, detection in hwsim and ath9k can
>>>> detect that two sw scans are in flight at the
>>>> same time, which isn't really true. It is
>>>> caused by a race condition, because the scan
>>>> complete callback is called too late, after
>>>> the lock has been dropped, so that a new scan
>>>> can be started before it is called.
>>>>
>>>> It is also called too early semantically, as
>>>> it is currently called _after_ the return to
>>>> the operating channel -- it should be before
>>>> so that drivers know this is the operating
>>>> channel again.
>>>>
>>>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>>>> ---
>>>>  net/mac80211/scan.c |    4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> --- wireless-testing.orig/net/mac80211/scan.c   2010-06-18 12:22:56.000000000 +0200
>>>> +++ wireless-testing/net/mac80211/scan.c        2010-06-18 12:23:25.000000000 +0200
>>>> @@ -287,6 +287,8 @@ void ieee80211_scan_completed(struct iee
>>>>        local->scanning = 0;
>>>>        local->scan_channel = NULL;
>>>>
>>>> +       drv_sw_scan_complete(local);
>>>> +
>>>>        /* we only have to protect scan_req and hw/sw scan */
>>>>        mutex_unlock(&local->scan_mtx);
>>>>
>>>> @@ -296,8 +298,6 @@ void ieee80211_scan_completed(struct iee
>>>>
>>>>        ieee80211_configure_filter(local);
>>>>
>>>> -       drv_sw_scan_complete(local);
>>>> -
>>>>        ieee80211_offchannel_return(local, true);
>>>>
>>>>  done:
>>>
>>> Leaving the entire patch in the e-mail reply as the patch is small to
>>> help with context.
>>>
>>> Turns out this patch broke ath9k in the sense that right after a scan
>>> we get disconnected from the AP. I just bisected wireless-testing by
>>> master-tag dates and found 2010-06-18 to be where we hit poo in. I
>>> reverted this patch and its cured. I'm going to test a recent
>>> wireless-testing now and then just try reverting this to see if its OK
>>> even on a recent wireless-testing, and then see if we can fix this
>>> properly on ath9k or whatever.
>>
>> Yeap, reverting this patch alone today on wireless-testing makes me
>> ath9k happy once again.
>
> So after some though and review in order to fix this we need a little
> more time and thought. The above patch changes the order in which we
> configure hardware upon a scan complete. It used to be we would
> configure the hardware, configure the filter and then scan complete.
> By then we'd be on the home channel and the filters would be set
> correctly. ath9k's scan complete routine does a few things which
> depend on the channel and will make hardware poo otherwise or do
> stupid things:
>
>  * start TX poll routine, due to the new latency introduced to
> configure hardware and configure the RX filter the TX polll routine
> now has some extra work done by hardware before it actually starts
> TXing. I don't suspect this could introduce a huge regression but it
> is worth noting.
>  * ANI gets started, and the wrong channel is used if we're not yet
> on the home channel. A simple patch to test if this may be causing the
> disconnect issue I am seeing I just tried was to increase the delay
> before starting ANI, this did indeed help.
>  * ath_beacon_config() which I suspect means that if we now scan in
> AP mode/IBSS mode we could potentially be leaking beacons on the last
> channel of a scan, prior to sending them on the actual desired home
> channel.

Fun enough reverting this change also created one side regression
caused by some new patch merged in 2.6.36-rc series (ath9k: prevent
calibration during off-channel activity). Since the scan was not yet
completed after reverting this patch the flag for scanning was still
true in ath9k when changing the to the last home channel, this mean
ath_beacon_config() for STA mode was not being called, and its
required since otherwise we do not set the beacon timers and get
disassociated after a bg scan.

I'll be addressing a fix for this next, but I just wanted to highlight
this for future scan completion work.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-09-13 22:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-18 10:32 [PATCH] mac80211: fix sw scan bracketing Johannes Berg
2010-07-27  8:04 ` Luis R. Rodriguez
2010-07-27  8:27   ` Luis R. Rodriguez
2010-07-27  9:22     ` Luis R. Rodriguez
2010-07-27  9:30       ` Luis R. Rodriguez
2010-07-27  9:31         ` Luis R. Rodriguez
2010-07-27 10:22       ` Johannes Berg
2010-07-27 15:54         ` Luis R. Rodriguez
2010-07-27 17:29           ` Luis R. Rodriguez
2010-07-27 20:09             ` Johannes Berg
2010-09-13 22:22       ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).