* How many null-data probes on connection loss?
@ 2018-09-25 23:12 Ben Greear
  2018-09-26  8:38 ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2018-09-25 23:12 UTC (permalink / raw)
  To: linux-wireless

While testing out some other issue, I noticed that my ath10k system creates
several hundred null-data probes when I abruptly down the AP the station
is connected to.

I guess this is because I use the mac80211 stack to handle the probes, and
the firmware then retries each mac80211 probe many times.

So, in the case where mac80211 is sending a null-data probe, is the assumption
that the driver will try each frame exactly once?

Or are several hundred frames expected?  I'm guessing the former, but before I go
hacking firmware, I thought I would ask...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: How many null-data probes on connection loss?
  2018-09-25 23:12 How many null-data probes on connection loss? Ben Greear
@ 2018-09-26  8:38 ` Johannes Berg
  2018-09-26 18:04   ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2018-09-26  8:38 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Tue, 2018-09-25 at 16:12 -0700, Ben Greear wrote:
> While testing out some other issue, I noticed that my ath10k system creates
> several hundred null-data probes when I abruptly down the AP the station
> is connected to.
> 
> I guess this is because I use the mac80211 stack to handle the probes, and
> the firmware then retries each mac80211 probe many times.
> 
> So, in the case where mac80211 is sending a null-data probe, is the assumption
> that the driver will try each frame exactly once?

Not really, it should be treated like any other management frame.

> Or are several hundred frames expected?  I'm guessing the former, but before I go
> hacking firmware, I thought I would ask...

Certainly not several hundred, but maybe a dozen? I think iwlwifi uses
16, and minstrel would set up max_rate_tries, which drivers set to
somewhere between 1 and 18? One seems a bit low, mt76?

johannes


* Re: How many null-data probes on connection loss?
  2018-09-26  8:38 ` Johannes Berg
@ 2018-09-26 18:04   ` Ben Greear
  2018-09-26 18:26     ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2018-09-26 18:04 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On 09/26/2018 01:38 AM, Johannes Berg wrote:
> On Tue, 2018-09-25 at 16:12 -0700, Ben Greear wrote:
>> While testing out some other issue, I noticed that my ath10k system creates
>> several hundred null-data probes when I abruptly down the AP the station
>> is connected to.
>>
>> I guess this is because I use the mac80211 stack to handle the probes, and
>> the firmware then retries each mac80211 probe many times.
>>
>> So, in the case where mac80211 is sending a null-data probe, is the assumption
>> that the driver will try each frame exactly once?
>
> Not really, it should be treated like any other management frame.
>
>> Or are several hundred frames expected?  I'm guessing the former, but before I go
>> hacking firmware, I thought I would ask...
>
> Certainly not several hundred, but maybe a dozen? I think iwlwifi uses
> 16, and minstrel would set up max_rate_tries, which drivers set to
> somewhere between 1 and 18? One seems a bit low, mt76?
>
> johannes
>

I have been running with mac80211/mlme.c's max_nullfunc_tries set to 5 for many years.
Long ago it helped with connectivity issues with lots of vdevs and/or loaded APs
if I recall correctly.

In fact, I see 62 frames captured on air all with the same sequence number
in the test I just did, and subsequent frames with the next seq-no are sent
immediately after the first one.  The frames are all right after each other, so
I guess this is probably firmware doing lots of HW retransmits and then *also*
doing software retransmits in the firmware (my reading of mlme.c indicates it should
only probe every 500ms).

I think I'll start by making sure the firmware does not do software retransmits
for frames from the driver (self-gen frames are OK to be retransmitted I guess).

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: How many null-data probes on connection loss?
  2018-09-26 18:04   ` Ben Greear
@ 2018-09-26 18:26     ` Johannes Berg
  2018-09-26 18:47       ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2018-09-26 18:26 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Wed, 2018-09-26 at 11:04 -0700, Ben Greear wrote:

> I have been running with mac80211/mlme.c's max_nullfunc_tries set to 5 for many years.
> Long ago it helped with connectivity issues with lots of vdevs and/or loaded APs
> if I recall correctly.

That's different, that's the number of distinct frames mac80211 will
send.

I thought you were asking about *retries*.

> In fact, I see 62 frames captured on air all with the same sequence number
> in the test I just did, and subsequent frames with the next seq-no are sent
> immediately after the first one.  The frames are all right after each other, so
> I guess this is probably firmware doing lots of HW retransmits and then *also*
> doing software retransmits in the firmware (my reading of mlme.c indicates it should
> only probe every 500ms).

Yes.

> I think I'll start by making sure the firmware does not do software retransmits
> for frames from the driver (self-gen frames are OK to be retransmitted I guess).

You do want it to be doing retries for frames from the driver, since you
want it to recover from temporary collisions with a microwave and
whatnot ... just not *that many*, I guess.

johannes


* Re: How many null-data probes on connection loss?
  2018-09-26 18:26     ` Johannes Berg
@ 2018-09-26 18:47       ` Ben Greear
  2018-09-26 18:48         ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2018-09-26 18:47 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On 09/26/2018 11:26 AM, Johannes Berg wrote:
> On Wed, 2018-09-26 at 11:04 -0700, Ben Greear wrote:
>
>> I have been running with mac80211/mlme.c's max_nullfunc_tries set to 5 for many years.
>> Long ago it helped with connectivity issues with lots of vdevs and/or loaded APs
>> if I recall correctly.
>
> That's different, that's the number of distinct frames mac80211 will
> send.
>
> I thought you were asking about *retries*.

Well, it retries the probe action 5 times in my case.

I am also asking about the total number of retried frames on the air.
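A back-of-the-envelope model of the distinction drawn here: max_nullfunc_tries controls how many distinct null-data frames mac80211 sends, while firmware and hardware decide how many times each of those goes out on the air, so the total is roughly the product. This is a sketch with an illustrative function name; using the 62 on-air copies Ben counted for one sequence number as the figure for every probe is a simplifying assumption.

```python
def total_on_air(distinct_probes: int, tx_per_probe: int) -> int:
    """Approximate on-air transmissions during one probe cycle.

    distinct_probes: distinct null-data frames mac80211 sends
                     (for example, max_nullfunc_tries).
    tx_per_probe:    transmissions of each frame by firmware/hardware,
                     counting the first attempt.
    """
    if distinct_probes < 0 or tx_per_probe < 0:
        raise ValueError("counts must be non-negative")
    return distinct_probes * tx_per_probe


# 5 distinct probes, each assumed to be sent ~62 times
print(total_on_air(5, 62))  # 310, i.e. "several hundred"
```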

>> In fact, I see 62 frames captured on air all with the same sequence number
>> in the test I just did, and subsequent frames with the next seq-no are sent
>> immediately after the first one.  The frames are all right after each other, so
>> I guess this is probably firmware doing lots of HW retransmits and then *also*
>> doing software retransmits in the firmware (my reading of mlme.c indicates it should
>> only probe every 500ms).
>
> Yes.
>
>> I think I'll start by making sure the firmware does not do software retransmits
>> for frames from the driver (self-gen frames are OK to be retransmitted I guess).
>
> You do want it to be doing retries for frames from the driver, since you
> want it to recover from temporary collisions with a microwave and
> whatnot ... just not *that many*, I guess.

From what I can tell so far, my firmware has this sort of logic:

frame from stack to the driver
   -> send to firmware
   -> in firmware, hardware will do up to X retries (maybe 16 or so, need to check)
   -> On failure, the firmware may re-queue the packet (firmware-software retry)
   -> back to hardware retries (~32 frames on air at this point)
   ...
   Eventually tx-fail notification is sent back to the driver one way or another.

I am thinking it would be best to have the software retry in the firmware
disabled.

Then, when mac80211 sends a null-data frame, you would see at most about
16 of them on air, every 500ms or so until it recovers or considers the
connection lost.
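In the two-level retry flow above, the retries multiply: each firmware re-queue triggers another full round of hardware retries. A minimal model, assuming a fixed number of hardware transmissions per pass (the "16 or so" is Ben's unverified estimate, and both parameter names are made up for illustration):

```python
def frames_on_air(hw_tries: int, fw_requeues: int) -> int:
    """On-air copies of one driver frame under the two-level retry scheme.

    hw_tries:    transmissions per hardware pass (first try + HW retries).
    fw_requeues: times the firmware re-queues the frame after a failed
                 hardware pass (the firmware-software retry); 0 = disabled.
    """
    if hw_tries < 1 or fw_requeues < 0:
        raise ValueError("need hw_tries >= 1 and fw_requeues >= 0")
    # Every firmware re-queue buys another complete hardware pass.
    return hw_tries * (1 + fw_requeues)


print(frames_on_air(16, 1))  # 32: one firmware re-queue
print(frames_on_air(16, 0))  # 16: firmware software retry disabled
```

With the software retry disabled, the count per mac80211 frame collapses to the hardware retry budget alone, which is the "at most about 16" figure above.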

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: How many null-data probes on connection loss?
  2018-09-26 18:47       ` Ben Greear
@ 2018-09-26 18:48         ` Johannes Berg
  2018-09-26 18:53           ` Ben Greear
  2018-09-26 22:21           ` Ben Greear
  0 siblings, 2 replies; 12+ messages in thread
From: Johannes Berg @ 2018-09-26 18:48 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Wed, 2018-09-26 at 11:47 -0700, Ben Greear wrote:

> > > I think I'll start by making sure the firmware does not do software retransmits
> > > for frames from the driver (self-gen frames are OK to be retransmitted I guess).
> > 
> > You do want it to be doing retries for frames from the driver, since you
> > want it to recover from temporary collisions with a microwave and
> > whatnot ... just not *that many*, I guess.
> 
>  From what I can tell so far, my firmware has this sort of logic:
> 
> frame from stack to the driver
>    -> send to firmware
>    -> in firmware, hardware will do up to X retries (maybe 16 or so, need to check)
>    -> On failure, the firmware may re-queue the packet (firmware-software retry)
>    -> back to hardware retries (~32 frames on air at this point)
>    ...
>    Eventually tx-fail notification is sent back to the driver one way or another.
> 
> I am thinking it would be best to have the software retry in the firmware
> disabled.
> 
> Then, when mac80211 sends a null-data frame, you would see at most about
> 16 of them on air, every 500ms or so until it recovers or considers the
> connection lost.

Yes, that seems reasonable. In fact, I'd argue that such software-retry
should just be disabled completely - it's better to lose the occasional
frame than to keep using airtime for it forever ...

Toke is probably getting nightmares reading this - sweet dreams ;-)

johannes


* Re: How many null-data probes on connection loss?
  2018-09-26 18:48         ` Johannes Berg
@ 2018-09-26 18:53           ` Ben Greear
  2018-09-26 22:21           ` Ben Greear
  1 sibling, 0 replies; 12+ messages in thread
From: Ben Greear @ 2018-09-26 18:53 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On 09/26/2018 11:48 AM, Johannes Berg wrote:
> On Wed, 2018-09-26 at 11:47 -0700, Ben Greear wrote:
>
>>>> I think I'll start by making sure the firmware does not do software retransmits
>>>> for frames from the driver (self-gen frames are OK to be retransmitted I guess).
>>>
>>> You do want it to be doing retries for frames from the driver, since you
>>> want it to recover from temporary collisions with a microwave and
>>> whatnot ... just not *that many*, I guess.
>>
>>  From what I can tell so far, my firmware has this sort of logic:
>>
>> frame from stack to the driver
>>    -> send to firmware
>>    -> in firmware, hardware will do up to X retries (maybe 16 or so, need to check)
>>    -> On failure, the firmware may re-queue the packet (firmware-software retry)
>>    -> back to hardware retries (~32 frames on air at this point)
>>    ...
>>    Eventually tx-fail notification is sent back to the driver one way or another.
>>
>> I am thinking it would be best to have the software retry in the firmware
>> disabled.
>>
>> Then, when mac80211 sends a null-data frame, you would see at most about
>> 16 of them on air, every 500ms or so until it recovers or considers the
>> connection lost.
>
> Yes, that seems reasonable. In fact, I'd argue that such software-retry
> should just be disabled completely - it's better to lose the occasional
> frame than to keep using airtime for it forever ...
>
> Toke is probably getting nightmares reading this - sweet dreams ;-)

I *think* this software retry does not apply to frames on a block-ack-enabled
TID, at least... That appeared to be the case in the wave-2 firmware, where I just
finished modifying similar logic.

Just maybe that will help him sleep a bit better :)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: How many null-data probes on connection loss?
  2018-09-26 18:48         ` Johannes Berg
  2018-09-26 18:53           ` Ben Greear
@ 2018-09-26 22:21           ` Ben Greear
  2018-09-27  7:12             ` Johannes Berg
  1 sibling, 1 reply; 12+ messages in thread
From: Ben Greear @ 2018-09-26 22:21 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On 09/26/2018 11:48 AM, Johannes Berg wrote:
> On Wed, 2018-09-26 at 11:47 -0700, Ben Greear wrote:
>
>>>> I think I'll start by making sure the firmware does not do software retransmits
>>>> for frames from the driver (self-gen frames are OK to be retransmitted I guess).
>>>
>>> You do want it to be doing retries for frames from the driver, since you
>>> want it to recover from temporary collisions with a microwave and
>>> whatnot ... just not *that many*, I guess.
>>
>>  From what I can tell so far, my firmware has this sort of logic:
>>
>> frame from stack to the driver
>>    -> send to firmware
>>    -> in firmware, hardware will do up to X retries (maybe 16 or so, need to check)
>>    -> On failure, the firmware may re-queue the packet (firmware-software retry)
>>    -> back to hardware retries (~32 frames on air at this point)
>>    ...
>>    Eventually tx-fail notification is sent back to the driver one way or another.
>>
>> I am thinking it would be best to have the software retry in the firmware
>> disabled.
>>
>> Then, when mac80211 sends a null-data frame, you would see at most about
>> 16 of them on air, every 500ms or so until it recovers or considers the
>> connection lost.
>
> Yes, that seems reasonable. In fact, I'd argue that such software-retry
> should just be disabled completely - it's better to lose the occasional
> frame than to keep using airtime for it forever ...
>
> Toke is probably getting nightmares reading this - sweet dreams ;-)

I fixed the firmware....

Now there are only 4 retries per frame, but it seems mac80211 is sending all 5 of its
null-data probes within a few milliseconds.  Is that expected, or should
there be a bit more pause between each of the probe requests to better
weather periodic network glitches?

Thanks,
Ben

>
> johannes
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: How many null-data probes on connection loss?
  2018-09-26 22:21           ` Ben Greear
@ 2018-09-27  7:12             ` Johannes Berg
  2018-09-27 15:32               ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2018-09-27  7:12 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Wed, 2018-09-26 at 15:21 -0700, Ben Greear wrote:
> 
> Now there are only 4 retries per frame, but it seems mac80211 is sending all 5 of its
> null-data probes within a few milliseconds.  Is that expected, or should
> there be a bit more pause between each of the probe requests to better
> weather periodic network glitches?

Hmm. That's a good point, but it seems we've never considered this
before. This must be a consequence of retrying immediately on lost ACK,
but I guess I could see that delaying it for a little bit would make
sense.

We do delay it if there's no reliable ACK reporting, IIRC, but if we
know it failed for sure ...
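A simplified model of the scheduling Johannes describes (he hedges it with "IIRC"; this is not mac80211 source, and the helper names are hypothetical): when the driver reports TX status reliably and a probe is known to have failed, the next probe goes out right away; otherwise the station waits out the probe timeout. Airtime is ignored, so back-to-back probes show up at the same timestamp here rather than "a few milliseconds" apart.

```python
def next_probe_delay_ms(ack_status_reliable: bool, probe_wait_ms: int) -> int:
    """Delay before re-probing, assuming the previous probe was NOT acked."""
    if ack_status_reliable:
        return 0  # failure known for sure: retry immediately
    return probe_wait_ms  # no trustworthy status: wait for the timeout


def probe_send_times(n_probes: int, ack_status_reliable: bool,
                     probe_wait_ms: int) -> list:
    """Send times (ms) of n_probes consecutive unanswered probes."""
    t = 0
    times = []
    for _ in range(n_probes):
        times.append(t)
        t += next_probe_delay_ms(ack_status_reliable, probe_wait_ms)
    return times


print(probe_send_times(5, True, 500))   # [0, 0, 0, 0, 0]
print(probe_send_times(3, False, 500))  # [0, 500, 1000]
```

The first case matches what Ben saw once the firmware stopped its own re-queueing: all probes bunched together, leaving little chance to ride out a short outage.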

It seems though that if there's some noise or so on the channel you
wouldn't be transmitting, so what kind of "network glitches" might
affect this? AP going away unexpectedly for some time?

johannes


* Re: How many null-data probes on connection loss?
  2018-09-27  7:12             ` Johannes Berg
@ 2018-09-27 15:32               ` Ben Greear
  2018-09-28  7:19                 ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2018-09-27 15:32 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless



On 09/27/2018 12:12 AM, Johannes Berg wrote:
> On Wed, 2018-09-26 at 15:21 -0700, Ben Greear wrote:
>>
>> Now there are only 4 retries per frame, but it seems mac80211 is sending all 5 of its
>> null-data probes within a few milliseconds.  Is that expected, or should
>> there be a bit more pause between each of the probe requests to better
>> weather periodic network glitches?
>
> Hmm. That's a good point, but it seems we've never considered this
> before. This must be a consequence of retrying immediately on lost ACK,
> but I guess I could see that delaying it for a little bit would make
> sense.
>
> We do delay it if there's no reliable ACK reporting, IIRC, but if we
> know it failed for sure ...
>
> It seems though that if there's some noise or so on the channel you
> wouldn't be transmitting, so what kind of "network glitches" might
> affect this? AP going away unexpectedly for some time?

I am thinking that if the 'timeout' is 500ms, and the number of probes is 2
(the default values), then it should probe at 0ms, and at 250ms, and then finally
fail at 500ms if nothing was received.  In other words, X probes, timeout/X apart.
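The proposed spacing can be written down directly: X probes spaced timeout/X apart, with the connection declared lost at the timeout. A sketch of that schedule only (the function name is illustrative, and it says nothing about how mac80211 would implement it); it reproduces the 0 ms / 250 ms / fail-at-500 ms example for the defaults.

```python
def spread_probe_schedule(timeout_ms: int, n_probes: int):
    """Return (probe send times in ms, time at which loss is declared).

    Probes are spaced timeout_ms / n_probes apart; if nothing has been
    received by timeout_ms, the connection is considered lost.
    """
    if timeout_ms <= 0 or n_probes < 1:
        raise ValueError("need timeout_ms > 0 and n_probes >= 1")
    interval = timeout_ms / n_probes
    sends = [i * interval for i in range(n_probes)]
    return sends, timeout_ms


print(spread_probe_schedule(500, 2))  # ([0.0, 250.0], 500)
```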

I should have a patch to make it work more like I think it should work
later today for discussion.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: How many null-data probes on connection loss?
  2018-09-27 15:32               ` Ben Greear
@ 2018-09-28  7:19                 ` Johannes Berg
  2018-09-28 15:14                   ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2018-09-28  7:19 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Thu, 2018-09-27 at 08:32 -0700, Ben Greear wrote:

> > It seems though that if there's some noise or so on the channel you
> > wouldn't be transmitting, so what kind of "network glitches" might
> > affect this? AP going away unexpectedly for some time?
> 
> I am thinking that if the 'timeout' is 500ms, and the number of probes is 2
> (the default values), then it should probe at 0ms, and at 250ms, and then finally
> fail at 500ms if nothing was received.  In other words, X probes, timeout/X apart.

That seems reasonable I guess. Although I'm not sure - perhaps once we
know it failed we *do* want to try a bit quicker again? Otherwise we
have a totally dead period there in the meantime, no?
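One possible middle ground between "retry immediately" and "wait the full spacing", sketched here as a hypothetical policy: keep the even spacing by default, but when the driver reports the previous probe as failed, retry after a short, configurable delay instead. The fast_retry_ms parameter is invented for illustration; nothing in the thread proposes a specific value.

```python
from typing import Optional


def next_probe_time(last_send_ms: float, spacing_ms: float,
                    failure_reported_at_ms: Optional[float] = None,
                    fast_retry_ms: Optional[float] = None) -> float:
    """When to send the next probe under a hypothetical fast-retry policy.

    By default, wait the full spacing after the last probe. If a TX
    failure was reported and fast_retry_ms is set, retry that long after
    the report instead, but never later than the default time.
    """
    default = last_send_ms + spacing_ms
    if failure_reported_at_ms is None or fast_retry_ms is None:
        return default
    return min(default, failure_reported_at_ms + fast_retry_ms)


print(next_probe_time(0, 250))                            # 250
print(next_probe_time(0, 250, failure_reported_at_ms=5,
                      fast_retry_ms=50))                  # 55
```

A small delay still gives a briefly absent AP some time to come back, while avoiding the "totally dead period" of waiting the full spacing.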

johannes


* Re: How many null-data probes on connection loss?
  2018-09-28  7:19                 ` Johannes Berg
@ 2018-09-28 15:14                   ` Ben Greear
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Greear @ 2018-09-28 15:14 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless



On 09/28/2018 12:19 AM, Johannes Berg wrote:
> On Thu, 2018-09-27 at 08:32 -0700, Ben Greear wrote:
>
>>> It seems though that if there's some noise or so on the channel you
>>> wouldn't be transmitting, so what kind of "network glitches" might
>>> affect this? AP going away unexpectedly for some time?
>>
>> I am thinking that if the 'timeout' is 500ms, and the number of probes is 2
>> (the default values), then it should probe at 0ms, and at 250ms, and then finally
>> fail at 500ms if nothing was received.  In other words, X probes, timeout/X apart.
>
> That seems reasonable I guess. Although I'm not sure - perhaps once we
> know it failed we *do* want to try a bit quicker again? Otherwise we
> have a totally dead period there in the meantime, no?

I don't think the tx path stops just because the mlme decides probes
are timing out, so if the AP is really functional, you won't have a dead period.

If it is dead, then the sooner you probe and discover timeout and disconnect,
the sooner you can re-connect to some better AP (assuming one exists).  A
false disconnect due to missing a few probes would be disruptive though, and
probing very often in idle situations would use more airtime, so
of course there is a trade-off.

While watching the sniffer, it seems the initial mlme probe happens about
3 seconds after I admin down the AP, and then disconnect is about 1s after
that (I have my timeout set to 1000ms instead of the default 500ms, and my retry
count set to 5 instead of 2).
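Ben's sniffer observation can be laid out as a timeline. This sketch assumes the probes are spread evenly across the timeout as he proposed earlier in the thread; whether his posted patch spaces them exactly this way is an assumption, and the ~3 s delay before the first probe is simply his measured value, not something the code derives.

```python
def loss_timeline(first_probe_ms: float, timeout_ms: float, n_probes: int):
    """Probe times and disconnect time, measured from the AP going down.

    first_probe_ms: observed delay before mlme sends its first probe.
    Probes are assumed evenly spaced timeout_ms / n_probes apart.
    """
    interval = timeout_ms / n_probes
    probes = [first_probe_ms + i * interval for i in range(n_probes)]
    return probes, first_probe_ms + timeout_ms


# ~3 s until the first probe, 1000 ms timeout, 5 probes
print(loss_timeline(3000, 1000, 5))
# ([3000.0, 3200.0, 3400.0, 3600.0, 3800.0], 4000)
```

Most of the roughly 4 s to disconnect is spent before probing even starts, so the probe spacing only governs the final second.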

Did you see that patch I posted?  It looks good in the sniffer and logs,
as far as I can tell....

Thanks,
Ben

>
> johannes
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

