* Probe Response packets sometimes delayed by 200ms
@ 2014-10-03  5:48 Avery Pennarun
  2014-10-03  6:10 ` Michal Kazior
  0 siblings, 1 reply; 5+ messages in thread
From: Avery Pennarun @ 2014-10-03  5:48 UTC (permalink / raw)
  To: ath10k

[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]

Hi all,

We're chasing a problem where, with an ath10k device running as access
point and a Macbook Air as a client, responses to probe requests are
sometimes missed by the Macbook Air because they are delayed for a
long time, and the Macbook has changed channels by the time they come
through.

ath10k driver version: from backports as of kvalo/ath-next on
2014-03-08 (v3.15-rc1-237-gd9bc4b9)

Firmware version: 10.1.467.3

Steps:
- start capturing packets, either on the ath10k AP itself or a
secondary monitor system
- on the Macbook Air (which has joined your SSID at least once
before), open and close the wifi dropdown menu a few times.  (May also
happen with clients other than Macbook Air; not heavily tested.)

Expected:
- Probe requests are sent by the station and answered by the AP within
a millisecond or two.

Actual:
- Probe requests are often answered within a millisecond or two, but
very frequently they are delayed by 200ms.  They are therefore not
seen by the Macbook, and thus not acknowledged, so you can see them
retransmitted several times by the AP.

I suspect this has something to do with the AP thinking the Macbook is
in power saving mode.  In the attached capture, the last packet (a
wifi ack) received from the Macbook before the probe request does
indeed say it's going to sleep.  But then it sends a new probe request
a few ms later.  The AP doesn't respond until its original wakeup
time.  I think it should consider the station to have woken up
immediately, since it just sent a packet.  This is especially true
since the powersaving flag in the Probe Request packet is set to zero
(station will stay up).
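
For reference, the power save flag discussed above is the Power Management bit (bit 12) of the 802.11 Frame Control field. Below is a minimal Python sketch of decoding it from raw header bytes. The example byte values are hypothetical and are not taken from the attached capture, and the function name is only illustrative.

```python
import struct

PM_BIT = 1 << 12  # Power Management flag in the 16-bit Frame Control field


def parse_frame_control(header: bytes) -> dict:
    """Decode type, subtype and the Power Management flag of an 802.11 frame."""
    (fc,) = struct.unpack_from("<H", header, 0)  # Frame Control is little-endian
    return {
        "type": (fc >> 2) & 0x3,      # 0 = management, 1 = control, 2 = data
        "subtype": (fc >> 4) & 0xF,
        "power_mgmt": bool(fc & PM_BIT),
    }


# Hypothetical example bytes (not from the pcap):
# a Probe Request (type 0, subtype 4) with the PM bit clear, and
# a Null data frame (type 2, subtype 4) with the PM bit set.
probe_req = parse_frame_control(bytes([0x40, 0x00]))
null_sleep = parse_frame_control(bytes([0x48, 0x10]))

print(probe_req)
print(null_sleep)
```

Running the same decoder over the frames in the capture would show whether the last frame before the Probe Request had the bit set and whether the Probe Request itself had it clear, which is the sequence described above.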

See attached pcap.  I trimmed out some beacons for APs that are not
related to the test, but otherwise it's intact.

$ zcat /tmp/reduced-delayed-proberesp.pcap.gz | tcpdump -r -
...
00:42:12.670744 1947809503us tsft 6.0 Mb/s 5745 MHz 11a -38dB signal
antenna 1 Probe Request (AVERY_Bu1_TestWifi) [6.0 9.0 12.0 18.0 24.0
36.0 48.0 54.0 Mbit]
00:42:12.672960 1947812849us tsft 6.0 Mb/s 5745 MHz 11a -37dB signal
antenna 1 Probe Request () [6.0 9.0 12.0 18.0 24.0 36.0 48.0 54.0
Mbit]
...
00:42:12.712490 1947853161us tsft 6.0 Mb/s 5745 MHz 11a -39dB signal
antenna 1 Beacon (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0 24.0* 36.0
48.0 54.0 Mbit] ESS CH: 149, PRIVACY
...
00:42:12.814832 1947955593us tsft 6.0 Mb/s 5745 MHz 11a -40dB signal
antenna 1 Beacon (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0 24.0* 36.0
48.0 54.0 Mbit] ESS CH: 149, PRIVACY
...
00:42:12.878434 1948017652us tsft 6.0 Mb/s 5745 MHz 11a -40dB signal
antenna 1 Probe Response (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0
24.0* 36.0 48.0 54.0 Mbit] CH: 149, PRIVACY
00:42:12.878450 1948017995us tsft 6.0 Mb/s 5745 MHz 11a -39dB signal
antenna 1 Probe Response (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0
24.0* 36.0 48.0 54.0 Mbit] CH: 149, PRIVACY
00:42:12.878455 1948018367us tsft 6.0 Mb/s 5745 MHz 11a -39dB signal
antenna 1 Probe Response (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0
24.0* 36.0 48.0 54.0 Mbit] CH: 149, PRIVACY
00:42:12.878463 1948018730us tsft 6.0 Mb/s 5745 MHz 11a -39dB signal
antenna 1 Probe Response (AVERY_Bu1_TestWifi) [6.0* 9.0 12.0* 18.0
24.0* 36.0 48.0 54.0 Mbit] CH: 149, PRIVACY

Any suggestions?  I looked at ath10k driver patches since the date we
last grabbed a copy, and I don't see anything relevant.
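
The timing in the trace above can be checked directly from the TSFT values. This is a small Python sketch with the numbers copied from the tcpdump output; the variable names are illustrative, and it assumes the two beacons shown are consecutive beacons from the test AP.

```python
# TSFT values in microseconds, copied from the tcpdump output above.
probe_req_us = 1947809503  # first Probe Request (AVERY_Bu1_TestWifi)
probe_resp_us = [1948017652, 1948017995, 1948018367, 1948018730]
beacon_us = [1947853161, 1947955593]

# Delay from the first Probe Request to the first Probe Response.
delay_ms = (probe_resp_us[0] - probe_req_us) / 1000

# Spacing between the repeated Probe Responses (retransmissions).
retry_gaps_us = [b - a for a, b in zip(probe_resp_us, probe_resp_us[1:])]

# Gap between the two beacons shown in the trace.
beacon_gap_ms = (beacon_us[1] - beacon_us[0]) / 1000

print(f"request -> first response: {delay_ms:.3f} ms")
print(f"retransmission gaps: {retry_gaps_us} us")
print(f"beacon gap: {beacon_gap_ms:.3f} ms")
```

With these values the first response arrives about 208 ms after the request, the retransmissions are a few hundred microseconds apart, and the beacon gap is about 102.4 ms.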

[-- Attachment #2: reduced-delayed-proberesp.pcap.gz --]
[-- Type: application/x-gzip, Size: 2743 bytes --]

[-- Attachment #3: Type: text/plain, Size: 146 bytes --]

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

* Re: Probe Response packets sometimes delayed by 200ms
  2014-10-03  5:48 Probe Response packets sometimes delayed by 200ms Avery Pennarun
@ 2014-10-03  6:10 ` Michal Kazior
  2014-10-03  6:39   ` Avery Pennarun
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Kazior @ 2014-10-03  6:10 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: ath10k

On 3 October 2014 07:48, Avery Pennarun <apenwarr@gmail.com> wrote:
> Hi all,
>
> We're chasing a problem where, with an ath10k device running as access
> point and a Macbook Air as a client, responses to probe requests are
> sometimes missed by the Macbook Air because they are delayed for a
> long time, and the Macbook has changed channels by the time they come
> through.
>
> ath10k driver version: from backports as of kvalo/ath-next on
> 2014-03-08 (v3.15-rc1-237-gd9bc4b9)
>
> Firmware version: 10.1.467.3
>
> Steps:
> - start capturing packets, either on the ath10k AP itself or a
> secondary monitor system
> - on the Macbook Air (which has joined your SSID at least once
> before), open and close the wifi dropdown menu a few times.  (May also
> happen with clients other than Macbook Air; not heavily tested.)

Was the Macbook Air disconnected cleanly on the AP?

There's a tx credit starvation bug which blocks wmi commands after
disassoc+deauth frames are queued (via wmi as well) and aren't acked
by station in which case wmi peer delete command times out and
sta_state splats a calltrace in kernel logs. This effectively leaves
firmware thinking the peer is still connected and it is never
disconnected (you can expect spurious sta kickout events after an hour
once that happens). This could explain why ath10k AP tries to play
powersave with the Macbook Air.

Or perhaps this is related to uAPSD? Do you have it enabled in
hostapd? Is Macbook Air associating with uAPSD enabled?

Or maybe firmware expects host driver should reset some ap-sta
powersave related settings before deleting peer entries.

I don't have any other ideas now.


Michał

* Re: Probe Response packets sometimes delayed by 200ms
  2014-10-03  6:10 ` Michal Kazior
@ 2014-10-03  6:39   ` Avery Pennarun
  2014-10-03  7:37     ` Michal Kazior
  0 siblings, 1 reply; 5+ messages in thread
From: Avery Pennarun @ 2014-10-03  6:39 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On Fri, Oct 3, 2014 at 2:10 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
> On 3 October 2014 07:48, Avery Pennarun <apenwarr@gmail.com> wrote:
>> Steps:
>> - start capturing packets, either on the ath10k AP itself or a
>> secondary monitor system
>> - on the Macbook Air (which has joined your SSID at least once
>> before), open and close the wifi dropdown menu a few times.  (May also
>> happen with clients other than Macbook Air; not heavily tested.)
>
> Was the Macbook Air disconnected cleanly on the AP?

In my particular test case, I was actually already associated with the
AP while I was doing these steps.  I don't think that affects the
results, which means in this case that there is no question of being
uncleanly disconnected since I was not disconnected at all.

> There's a tx credit starvation bug which blocks wmi commands after
> disassoc+deauth frames are queued (via wmi as well) and aren't acked
> by station in which case wmi peer delete command times out and
> sta_state splats a calltrace in kernel logs. This effectively leaves
> firmware thinking the peer is still connected and it is never
> disconnected (you can expect spurious sta kickout events after an hour
> once that happens). This could explain why ath10k AP tries to play
> powersave with the Macbook Air.

I think we previously ran into the tx credit starvation bug and
cherry-picked one of your patches to fix it.  So I don't think that's
the problem here.

> Or perhaps this is related to uAPSD? Do you have it enabled in
> hostapd? Is Macbook Air associating with uAPSD enabled?

We tried enabling uAPSD but it caused lots of problems so we turned it
off again.

> Or maybe firmware expects host driver should reset some ap-sta
> powersave related settings before deleting peer entries.

I'm not sure it's deleting peer entries, but my guess is that the host
driver needs to reset some power save related setting, yeah.  I looked
through the ath10k driver and wasn't really sure where to look
however.  I found some global per-vdev power save mode settings, but
not any per-STA ones.

Thanks!

* Re: Probe Response packets sometimes delayed by 200ms
  2014-10-03  6:39   ` Avery Pennarun
@ 2014-10-03  7:37     ` Michal Kazior
  2014-10-04  3:02       ` Avery Pennarun
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Kazior @ 2014-10-03  7:37 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: ath10k

On 3 October 2014 08:39, Avery Pennarun <apenwarr@gmail.com> wrote:
> On Fri, Oct 3, 2014 at 2:10 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>> On 3 October 2014 07:48, Avery Pennarun <apenwarr@gmail.com> wrote:
>>> Steps:
>>> - start capturing packets, either on the ath10k AP itself or a
>>> secondary monitor system
>>> - on the Macbook Air (which has joined your SSID at least once
>>> before), open and close the wifi dropdown menu a few times.  (May also
>>> happen with clients other than Macbook Air; not heavily tested.)
>>
>> Was the Macbook Air disconnected cleanly on the AP?
>
> In my particular test case, I was actually already associated with the
> AP while I was doing these steps.  I don't think that affects the
> results, which means in this case that there is no question of being
> uncleanly disconnected since I was not disconnected at all.

But this kind of confirms that if there's a peer entry then ath10k AP
will try to do powersave game with probe req / resp.


>> There's a tx credit starvation bug which blocks wmi commands after
>> disassoc+deauth frames are queued (via wmi as well) and aren't acked
>> by station in which case wmi peer delete command times out and
>> sta_state splats a calltrace in kernel logs. This effectively leaves
>> firmware thinking the peer is still connected and it is never
>> disconnected (you can expect spurious sta kickout events after an hour
>> once that happens). This could explain why ath10k AP tries to play
>> powersave with the Macbook Air.
>
> I think we previously ran into the tx credit starvation bug and
> cherry-picked one of your patches to fix it.  So I don't think that's
> the problem here.

Tx starvation credit bug cannot be simply fixed in host. It needs
firmware changes as well which aren't there. Perhaps this is actually
what causes the problem? I recall my patches had a timeout on wmi mgmt
tx. Wasn't it 2x beacon interval? That's the 200ms. Pcap suggests your
beacon interval is 100ms.
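
The arithmetic behind that guess can be sketched in Python as follows. It assumes a beacon interval of 100 TU (an 802.11 time unit is 1024 microseconds) and takes the 2x beacon interval timeout as recalled above, not as verified against the patch. The TSFT values are copied from the tcpdump output in the first message.

```python
TU_US = 1024  # one 802.11 time unit (TU) in microseconds

beacon_interval_tu = 100  # assumption: consistent with the beacon spacing in the pcap
timeout_us = 2 * beacon_interval_tu * TU_US  # recalled timeout: 2x beacon interval

# TSFT values (microseconds) from the tcpdump output in the first message.
probe_req_us = [1947809503, 1947812849]  # directed and wildcard Probe Requests
first_resp_us = 1948017652               # first Probe Response

# How much later than the timeout the first response appears, per request.
excess_us = [first_resp_us - req - timeout_us for req in probe_req_us]

print(f"2x beacon interval: {timeout_us / 1000:.1f} ms")
for req, extra in zip(probe_req_us, excess_us):
    print(f"request at {req} us: response is {extra} us past the timeout")
```

Under these assumptions the timeout is 204.8 ms, and the first Probe Response lands within a few milliseconds of that after either Probe Request. That is consistent with the timeout theory but does not by itself show which request the response answers.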

Can you look at ath10k logs if each wmi mgmt tx is sent immediately
after wmi mgmt rx? Can you share the exact patch you cherry-picked?


>> Or perhaps this is related to uAPSD? Do you have it enabled in
>> hostapd? Is Macbook Air associating with uAPSD enabled?
>
> We tried enabling uAPSD but it caused lots of problems so we turned it
> off again.

I'm asking since it calls per-peer powersave wmi command a few times
(wmi_ap_ps_peer_cmd).


Michał

* Re: Probe Response packets sometimes delayed by 200ms
  2014-10-03  7:37     ` Michal Kazior
@ 2014-10-04  3:02       ` Avery Pennarun
  0 siblings, 0 replies; 5+ messages in thread
From: Avery Pennarun @ 2014-10-04  3:02 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On Fri, Oct 3, 2014 at 3:37 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
> On 3 October 2014 08:39, Avery Pennarun <apenwarr@gmail.com> wrote:
>> On Fri, Oct 3, 2014 at 2:10 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>>> Was the Macbook Air disconnected cleanly on the AP?
>>
>> In my particular test case, I was actually already associated with the
>> AP while I was doing these steps.  I don't think that affects the
>> results, which means in this case that there is no question of being
>> uncleanly disconnected since I was not disconnected at all.
>
> But this kind of confirms that if there's a peer entry then ath10k AP
> will try to do powersave game with probe req / resp.

Yeah, I'm not at all surprised if it's a powersave game.

>>> There's a tx credit starvation bug which blocks wmi commands after
>>> disassoc+deauth frames are queued (via wmi as well) and aren't acked
>>> by station in which case wmi peer delete command times out and
>>> sta_state splats a calltrace in kernel logs. This effectively leaves
>>> firmware thinking the peer is still connected and it is never
>>> disconnected (you can expect spurious sta kickout events after an hour
>>> once that happens). This could explain why ath10k AP tries to play
>>> powersave with the Macbook Air.
>>
>> I think we previously ran into the tx credit starvation bug and
>> cherry-picked one of your patches to fix it.  So I don't think that's
>> the problem here.
>
> Tx starvation credit bug cannot be simply fixed in host. It needs
> firmware changes as well which aren't there. Perhaps this is actually
> what causes the problem? I recall my patches had a timeout on wmi mgmt
> tx. Wasn't it 2x beacon interval? That's the 200ms. Pcap suggests your
> beacon interval is 100ms.
>
> Can you look at ath10k logs if each wmi mgmt tx is sent immediately
> after wmi mgmt rx? Can you share the exact patch you cherry-picked?

Hmm, okay, you're right, I see the 2*beacon_interval delay in your patch.

0531 ath10k: fix wmi-htc tx credit starvation
0532 ath10k: wait for mgmt tx when flushing too
0533 ath10k: improve tx flushing

Exact patches are visible here:
https://gfiber.googlesource.com/buildroot/+/master/package/backports-custom/

Looks like none of these were ever applied to kvalo's tree.  They are
essential for avoiding some really serious problems we had in the
field (ie. beacons stop getting sent).

I see from looking back at those threads that you did have a comment
about it being unfixable without a firmware change.  Reducing the WMI
timeout to 1s could help (sort of), but would not fix the problem in
the current thread, which requires transmissions in well under 100ms.
How is anyone surviving without a fix?  The problem triggers
frequently.

>>> Or perhaps this is related to uAPSD? Do you have it enabled in
>>> hostapd? Is Macbook Air associating with uAPSD enabled?
>>
>> We tried enabling uAPSD but it caused lots of problems so we turned it
>> off again.
>
> I'm asking since it calls per-peer powersave wmi command a few times
> (wmi_ap_ps_peer_cmd).

I could look up what problems uAPSD caused.  ISTR it was random driver
or firmware crashes on our setup, and we didn't have time to debug
further since it's an optional feature.

Thanks!

Avery
