* Aggregation problem with rt2800 AP and Intel 5100 STA
@ 2011-03-23 22:58 Helmut Schaa
  2011-03-24  6:50 ` Emmanuel Grumbach
  0 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-23 22:58 UTC (permalink / raw)
  To: users; +Cc: Jay Hung, Eddy Tsai, linux-wireless

Hi,

[CC'ing Jay and Eddy as they might be able to answer some questions
 regarding 11n aggregation on rt2800 devices]

I'm using rt2800pci as AP. 11n aggregation works just fine when using an
Intel 5100 client and transferring data in both directions on Linux. Also
my Intel 4965 works just fine.

However, using Windows Vista on the same machine (Intel 5100) the Intel
RX reorder buffer seems to get confused. At least I can see (I can also
provide a pcap if anyone is interested) that the Intel STA still BlockAcks
received AMPDUs but the frames never make it out of the driver. It seems
as if they are stuck in the Intel driver/firmware (Windows).

I first thought this might be due to an old Windows driver but even after
updating it to the newest version I can still reproduce this issue.

After debugging some more I noticed that this issue seems to appear as
soon as some parts (or maybe all parts) of an AMPDU (rt2800pci -> Intel 5100
Windows STA) are not BlockAcked and need to be retried. However, as far as I
can see the whole aggregation stuff including retransmissions on failures
should be handled entirely in hardware (only the BA session setup/teardown is
done by the driver/mac80211).

According to 802.11n-2009 the BA originator could send a BlockAckReq if 
an AMPDU is not BlockAcked in time. However, I never see the hw sending a
BlockAckReq.

Jay, Eddy, does the hw ever send a BlockAckReq if it doesn't receive a
BlockAck for an AMPDU?

As an experiment, I hacked rt2x00 to set IEEE80211_TX_STAT_AMPDU_NO_BACK for
failed aggregated frames so that mac80211 sends a BlockAckReq for the failed
MPDU. This indeed seems to improve the situation but doesn't fix it completely.

Anything else I could try? Any ideas?

Thanks,
Helmut

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-23 22:58 Aggregation problem with rt2800 AP and Intel 5100 STA Helmut Schaa
@ 2011-03-24  6:50 ` Emmanuel Grumbach
  2011-03-24  7:36   ` Helmut Schaa
  0 siblings, 1 reply; 17+ messages in thread
From: Emmanuel Grumbach @ 2011-03-24  6:50 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: users, Jay Hung, Eddy Tsai, linux-wireless

>
> According to 802.11n-2009 the BA originator could send a BlockAckReq if
> an AMPDU is not BlockAcked in time. However, I never see the hw sending a
> BlockAckReq.
>

Not sending BAR (BlockAckReq) can be quite problematic. The originator
needs to send BAR to tell the recipient that the frames that are
pending in the reordering buffer can be released to the upper level
even if there are holes in the packet sequence. Not sending BAR could
in theory lead to deadlock. In practice, most implementations of
reordering buffers release frames out of order after timeout.

taken from rx.c of mac80211:
/*
 * Timeout (in jiffies) for skb's that are waiting in the RX reorder buffer. If
 * the skb was added to the buffer longer than this time ago, the earlier
 * frames that have not yet been received are assumed to be lost and the skb
 * can be released for processing. This may also release other skb's from the
 * reorder buffer if there are no additional gaps between the frames.
 *
 * Callers must hold tid_agg_rx->reorder_lock.
 */
#define HT_RX_REORDER_BUF_TIMEOUT (HZ / 10)

I am quite surprised that you see that frames are "stuck in the
driver". This would mean that the Windows driver's implementation of
the reordering buffer doesn't have a timer... or I'm missing something...
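
[Illustration] A minimal sketch of the timeout-based release described above, written as standalone C rather than kernel code. The names (`struct reorder_buf`, `reorder_rx`, `reorder_timeout`), the 64-slot window and the millisecond clock are assumptions made for this example; mac80211's real implementation differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEQ_MODULO 4096u /* 802.11 sequence numbers are 12 bits wide */
#define BUF_SIZE   64u   /* hypothetical reorder window size */
#define TIMEOUT    100u  /* hypothetical timeout in ms, roughly HZ / 10 */

struct reorder_buf {
    uint16_t head_seq;          /* next sequence number expected in order */
    bool     present[BUF_SIZE]; /* slot currently holds a buffered frame */
    uint32_t rx_time[BUF_SIZE]; /* arrival time (ms) of the buffered frame */
};

static void reorder_init(struct reorder_buf *rb, uint16_t start_seq)
{
    assert(rb != NULL);
    memset(rb, 0, sizeof(*rb));
    rb->head_seq = start_seq;
}

/* Pass up frames starting at head_seq until the next hole is reached. */
static unsigned release_in_order(struct reorder_buf *rb)
{
    unsigned released = 0;

    while (rb->present[rb->head_seq % BUF_SIZE]) {
        rb->present[rb->head_seq % BUF_SIZE] = false;
        rb->head_seq = (uint16_t)((rb->head_seq + 1) % SEQ_MODULO);
        released++;
    }
    return released;
}

/* Buffer one received frame; returns how many frames were passed up. */
static unsigned reorder_rx(struct reorder_buf *rb, uint16_t seq, uint32_t now)
{
    uint16_t offset = (uint16_t)((seq - rb->head_seq) & (SEQ_MODULO - 1));

    if (offset >= BUF_SIZE)
        return 0; /* outside the window: simply ignored in this sketch */
    rb->present[seq % BUF_SIZE] = true;
    rb->rx_time[seq % BUF_SIZE] = now;
    return release_in_order(rb);
}

/*
 * Timer path: if the first frame behind a hole has waited for TIMEOUT or
 * longer, treat the missing frames as lost, skip them and release what
 * follows. Stops at the first buffered frame that is still fresh.
 */
static unsigned reorder_timeout(struct reorder_buf *rb, uint32_t now)
{
    unsigned released = 0;

    for (;;) {
        unsigned skip = 0;
        unsigned idx;

        while (skip < BUF_SIZE &&
               !rb->present[(rb->head_seq + skip) % BUF_SIZE])
            skip++;
        if (skip == BUF_SIZE)
            break; /* nothing buffered at all */
        idx = (rb->head_seq + skip) % BUF_SIZE;
        if (now - rb->rx_time[idx] < TIMEOUT)
            break; /* frame has not waited long enough yet */
        rb->head_seq = (uint16_t)((rb->head_seq + skip) % SEQ_MODULO);
        released += release_in_order(rb);
    }
    return released;
}
```

With the window starting at 100, buffering frames 102 and 103 releases nothing because 100 and 101 are missing. Once the timeout has elapsed, `reorder_timeout()` gives up on the hole and passes both frames up. A receiver without such a timer path would hold them until a BlockAckReq arrives.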

> At least, I hacked rt2x00 to set IEEE80211_TX_STAT_AMPDU_NO_BACK for failed
> aggregated frames such that mac80211 sends a BlockAckReq for this failed MPDU
> and this indeed seems to improve the situation but doesn't fix it completely.

Can you please be more specific about "improve the situation"? Better
throughput? Lower packet loss?
What do you mean by "not fixed completely"?
Do you track packets out of the driver?


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24  6:50 ` Emmanuel Grumbach
@ 2011-03-24  7:36   ` Helmut Schaa
  2011-03-24 13:09     ` Helmut Schaa
  0 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-24  7:36 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: users, Jay Hung, Eddy Tsai, linux-wireless

On Thursday, 24 March 2011, Emmanuel Grumbach wrote:
> > According to 802.11n-2009 the BA originator could send a BlockAckReq if
> > an AMPDU is not BlockAcked in time. However, I never see the hw sending a
> > BlockAckReq.
> >
> 
> Not sending BAR (BlockAckReq) can be quite problematic. The originator
> needs to send BAR to tell the recipient that the frames that are
> pending in the reordering buffer can be released to the upper level
> even if there are holes in the packet sequence. Not sending BAR could
> in theory lead to deadlock. In practice, most implementations of
> reordering buffers release frames out of order after timeout.

Thanks for the clarification. However, 802.11n-2009 doesn't seem to require
a BA originator to send a BlockAckReq in that case:

Page 173, 9.10.7.7:

"The originator may send a BlockAckReq for non-Protected Block Ack agreement or a Robust Management
ADDBA frame for Protected Block Ack agreement when a data MPDU that was previously transmitted
within an A-MPDU that had the Ack Policy field set to Normal Ack is discarded due to exhausted MSDU
lifetime. The purpose of this BlockAckReq is to shift the recipient’s WinStartB value past the hole in the
sequence number space that is created by the discarded data MPDU and thereby to allow the earliest
possible passing of buffered frames up to the next MAC process."

But I agree, it totally makes sense to do so if an AMPDU subframe failed, and
that's what I've tried by using IEEE80211_TX_STAT_AMPDU_NO_BACK.

> taken from rx.c of mac80211:
> /*
>  * Timeout (in jiffies) for skb's that are waiting in the RX reorder buffer. If
>  * the skb was added to the buffer longer than this time ago, the earlier
>  * frames that have not yet been received are assumed to be lost and the skb
>  * can be released for processing. This may also release other skb's from the
>  * reorder buffer if there are no additional gaps between the frames.
>  *
>  * Callers must hold tid_agg_rx->reorder_lock.
>  */
> #define HT_RX_REORDER_BUF_TIMEOUT (HZ / 10)
> 
> I am quite surprised that you see that frames are "stuck in the
> driver". This would mean that the windows drivers's implementation of
> the reordering buffer doesn't have timer... or I miss something...

I don't know what the Windows driver is doing but it _looks_ as if it doesn't
release the frames with a timeout. I also tried to completely disable TX
aggregation on the rt2800 AP and that made the connection stable. Hence my
assumption that the frames are stuck somewhere in the reorder buffer. More
details below ...

> > At least, I hacked rt2x00 to set IEEE80211_TX_STAT_AMPDU_NO_BACK for failed
> > aggregated frames such that mac80211 sends a BlockAckReq for this failed MPDU
> > and this indeed seems to improve the situation but doesn't fix it completely.
> 
> Can you please be more specific by "improve situation" ? better TPT ?
> lower packet loss ?
> What do you mean by not fixed completely ?
> Do you track packets out of the driver ?

That's what I'm doing on the Intel machine:
- Associate the Intel 5100 Windows STA with the rt2800pci AP
- Start a ping from the Intel STA to a station on the LAN
- Run iperf between the Intel STA and a different station on the LAN attached
  to the AP (the direction doesn't matter that much as long as iperf is 
  running in TCP mode and thus frames are sent in both directions)

And that's what I observe:

It takes just a few seconds until iperf stops printing stats and the
concurrently running ping starts timing out. Sniffing the traffic with a
different wifi STA I can see the pings (Intel -> AP -> LAN station) on the
air and the responses (LAN station -> AP -> Intel) as well. The responses
also get BlockAcked by the Intel STA, but the user space ping process never
receives them and only prints timeouts. So I can only observe that the ping
response is correctly sent over the air by the rt2800 AP and that the Intel
STA seems to receive it correctly but doesn't pass it to user space.
Stopping all traffic now leads to the BA session (AP -> STA) being torn
down (after a few seconds), and afterwards I can ping again.

After adding the code to issue BARs when an AMPDU subframe failed, the issue
mostly went away. However, in some rare cases it happened again and
the Intel STA wasn't able to "receive" anything anymore (for example, it still
happened after running iperf for ~150 seconds instead of just a few seconds
as before).

At least I'll first submit a patch to pass IEEE80211_TX_STAT_AMPDU_NO_BACK to
mac80211 in case of a failed AMPDU subframe. And the remaining issue might be
due to a different bug.

Thanks,
Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24  7:36   ` Helmut Schaa
@ 2011-03-24 13:09     ` Helmut Schaa
  2011-03-24 13:19       ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-24 13:09 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

[trimmed CC as this is not rt2x00 specific anymore]

On Thursday, 24 March 2011, Helmut Schaa wrote:
> After adding the code to issue BARs when a AMPDU subframe failed, the issue
> seems to not happen anymore. However, in some rare cases it happened again and
> the Intel STA wasn't able to "receive" anything anymore (for example it still
> happenend after running an iperf for ~150 seconds instead of just a few seconds
> as before).

Ok, found out some more. The problem now seems to trigger only if the Intel STA
leaves powersave and mac80211 dropped a frame while the STA was sleeping (either
due to the size of the STA PS buffer or due to a race between rx/tx processing).

This can also cause a sequence number hole in the STA's rx reordering buffer,
but we won't send a BAR in that case. We end up in the same situation: the
Intel STA BlockAcks the AMPDUs sent to it but doesn't pass them to user space.

Johannes, does it make sense to always send a BAR when a STA wakes up from
sleep and we've got an active aggregation session?

Thanks,
Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 13:09     ` Helmut Schaa
@ 2011-03-24 13:19       ` Johannes Berg
  2011-03-24 13:57         ` Helmut Schaa
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-24 13:19 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: Emmanuel Grumbach, linux-wireless

On Thu, 2011-03-24 at 14:09 +0100, Helmut Schaa wrote:

> Ok, found out some more. The problem now seems to trigger only if the Intel STA
> leaves powersave and mac80211 dropped a frame while the STA was sleeping (either
> due to the size of the STA PS buffer or due to a race between rx/tx processing).

Why do we even have aggregation sessions active while the station goes
to sleep? Is it scanning?

Maybe the PS buffer should be larger then? I don't see how we can lose a
frame due to rx/tx processing races either, how does that happen? I
thought we had the ability to avoid all those races now.

> This can also cause a sequence number hole in the STAs rx reordering buffer but
> we won't send a BAR in that case ending in the same situation that the Intel STA
> BlockAcks AMPDUs to it but doesn't pass them to the user space.
> 
> Johannes, does it make sense to always send a BAR when a STA wakes up from
> sleep and we've got an active aggregation session?

I don't know, you tell me, does it? :)

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 13:19       ` Johannes Berg
@ 2011-03-24 13:57         ` Helmut Schaa
  2011-03-24 17:32           ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-24 13:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

On Thursday, 24 March 2011, Johannes Berg wrote:
> On Thu, 2011-03-24 at 14:09 +0100, Helmut Schaa wrote:
> 
> > Ok, found out some more. The problem now seems to trigger only if the Intel STA
> > leaves powersave and mac80211 dropped a frame while the STA was sleeping (either
> > due to the size of the STA PS buffer or due to a race between rx/tx processing).
> 
> Why do we even have aggregation sessions active while the station goes
> to sleep? Is it scanning?

I guess so, yes (as I wrote before this is an Intel 5100 client running Windows
Vista and the latest Intel driver). It sends a nullfunc when going to sleep and
returns a few hundred ms later. In the meantime I can see my rt2800 AP sending
an AMPDU to the sleeping STA, which of course times out and therefore ends up
as a filtered frame.

> Maybe the PS buffer should be larger then?

128 frames per STA is already large, no? And also it's nothing unusual
that some frames get dropped if the STA stays in powersave for a long time.

> I don't see how we can lose a
> frame due to rx/tx processing races either, how does that happen? I
> thought we had the ability to avoid all those races now.

Good point. Maybe this only happens with rt2x00. The frame exchange looks
basically like (if you want to see the pcap just ask ;) ):

STA -> AP  nullfunc PM=1
AP -> STA  AMPDU (seqnr 3106 - 3112)
AP -> STA  AMPDU (seqnr 3106 - 3112), retry
AP -> STA  AMPDU (seqnr 3106 - 3112), retry
AP -> STA  AMPDU (seqnr 3106 - 3112), retry
AP -> STA  AMPDU (seqnr 3106 - 3112), retry
AP -> STA  AMPDU (seqnr 3106 - 3112), retry
...
STA -> AP  nullfunc PM=0
...
AP -> STA  AMPDU (seqnr 3108)
STA -> AP  BlockAck
AP -> STA  AMPDU (seqnr 3109 - 3114)
STA -> AP  BlockAck
...

As you can see, 3106 and 3107 somehow got lost and thus leave a hole in
the STA's reorder buffer, leading to the strange behavior I described before.
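
[Illustration] A small sketch of how the hole in the trace above can be found mechanically: given the recipient's window start and the sequence numbers that were actually BlockAcked, list the ones that never arrived. The helper names (`seq_dist`, `find_holes`) are hypothetical, and the code assumes every received frame lies less than half the 12-bit sequence space ahead of the window start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_MASK 0x0fffu /* 802.11 sequence numbers wrap at 4096 */

/* Forward distance from 'from' to 'to' in 12-bit sequence space. */
static uint16_t seq_dist(uint16_t from, uint16_t to)
{
    return (uint16_t)((to - from) & SEQ_MASK);
}

/*
 * Write every sequence number between win_start and the furthest received
 * frame that is absent from received[] into holes[] (at most max_holes
 * entries). Returns the number of entries written.
 */
static size_t find_holes(uint16_t win_start,
                         const uint16_t *received, size_t n_received,
                         uint16_t *holes, size_t max_holes)
{
    uint16_t span = 0;
    size_t count = 0;

    assert(received != NULL || n_received == 0);
    assert(holes != NULL || max_holes == 0);

    /* How far ahead of win_start does the furthest received frame lie? */
    for (size_t i = 0; i < n_received; i++) {
        uint16_t d = seq_dist(win_start, received[i]);

        if (d > span)
            span = d;
    }

    /* Every sequence number before that point must have been received. */
    for (uint16_t d = 0; d < span; d++) {
        uint16_t seq = (uint16_t)((win_start + d) & SEQ_MASK);
        bool seen = false;

        for (size_t i = 0; i < n_received; i++)
            if (received[i] == seq)
                seen = true;
        if (!seen && count < max_holes)
            holes[count++] = seq;
    }
    return count;
}
```

Fed with a window start of 3106 and the received frames 3108 to 3114 from the trace, this reports 3106 and 3107 as missing. A recipient that waits for exactly those two frames will hold back everything that follows.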

> > This can also cause a sequence number hole in the STAs rx reordering buffer but
> > we won't send a BAR in that case ending in the same situation that the Intel STA
> > BlockAcks AMPDUs to it but doesn't pass them to the user space.
> > 
> > Johannes, does it make sense to always send a BAR when a STA wakes up from
> > sleep and we've got an active aggregation session?
> 
> I don't know, you tell me, does it? :)

Ha Ha ;)

Sending a BAR at this point would at least ensure that the Intel STA's RX
reorder buffer gets flushed (in case we dropped any frames while the STA was
sleeping, which happened in this case).

Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 13:57         ` Helmut Schaa
@ 2011-03-24 17:32           ` Johannes Berg
  2011-03-24 20:38             ` Emmanuel Grumbach
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-24 17:32 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: Emmanuel Grumbach, linux-wireless

On Thu, 2011-03-24 at 14:57 +0100, Helmut Schaa wrote:

> I guess so, yes (as I wrote before this is an Intel 5100 client running Windows
> Vista and the latest Intel driver). It sends a nullfunc going to sleep and returns
> a few hundret ms later. And in the meantime I can see my rt2800 AP sending an
> AMPDU to the sleeping STA which of course times out and therefore ends up as
> filtered frame.

Ok so it gets reported to mac80211.

> > Maybe the PS buffer should be larger then?
> 
> 128 frames per STA is already large, no? And also it's nothing unusual
> that some frames get dropped if the STA stays in powersave for a long time.
> 
> > I don't see how we can lose a
> > frame due to rx/tx processing races either, how does that happen? I
> > thought we had the ability to avoid all those races now.
> 
> Good point. Maybe this only happens with rt2x00. The frame exchange looks
> basically like (if you want to see the pcap just ask ;) ):
> 
> STA -> AP  nullfunc PM=1
> AP -> STA  AMPDU (seqnr 3106 - 3112)
> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> ...
> STA -> AP nullfunc PM=0
> ...
> AP -> STA  AMPDU (seqnr 3108)
> STA -> AP  BlockAck
> AP -> STA  AMPDU (seqnr 3109 - 3114)
> STA -> AP  BlockAck
> ...
> 
> As you can see 3106 and 3107 somehow got lost and thus leave a hole in
> the STAs reorder buffer leading to the strange behavior I described before.

But if they are reported to mac80211, shouldn't they be put on the queue
again? Are they maybe not reported back quite in the right way?

> > I don't know, you tell me, does it? :)
> 
> Ha Ha ;)
> 
> Sending a BAR at this point would at least ensure that the Intel STAs RX
> reorder buffer gets flushed (in case we dropped any frames while the STA was
> sleeping, which happened in this case).

Right, but so far this looks like it's more like a bug in the powersave
code rather than something we desperately need to recover from with a
BAR.

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 17:32           ` Johannes Berg
@ 2011-03-24 20:38             ` Emmanuel Grumbach
  2011-03-24 20:40               ` Johannes Berg
                                 ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Emmanuel Grumbach @ 2011-03-24 20:38 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Helmut Schaa, linux-wireless

On Thu, Mar 24, 2011 at 19:32, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2011-03-24 at 14:57 +0100, Helmut Schaa wrote:
>
>> I guess so, yes (as I wrote before this is an Intel 5100 client running Windows
>> Vista and the latest Intel driver). It sends a nullfunc going to sleep and returns
>> a few hundret ms later. And in the meantime I can see my rt2800 AP sending an
>> AMPDU to the sleeping STA which of course times out and therefore ends up as
>> filtered frame.
>
> Ok so it gets reported to mac80211.
>
>> > Maybe the PS buffer should be larger then?
>>
>> 128 frames per STA is already large, no? And also it's nothing unusual
>> that some frames get dropped if the STA stays in powersave for a long time.
>>
>> > I don't see how we can lose a
>> > frame due to rx/tx processing races either, how does that happen? I
>> > thought we had the ability to avoid all those races now.
>>
>> Good point. Maybe this only happens with rt2x00. The frame exchange looks
>> basically like (if you want to see the pcap just ask ;) ):
>>
>> STA -> AP  nullfunc PM=1
>> AP -> STA  AMPDU (seqnr 3106 - 3112)
>> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> ...
>> STA -> AP nullfunc PM=0
>> ...

Hold on, why does mac80211 send a frame to a sleeping station?
Am I stupid or is this buggy?
Johannes, is this the buggy behavior in PS you were pointing out later
in your mail?


>> AP -> STA  AMPDU (seqnr 3108)
>> STA -> AP  BlockAck
>> AP -> STA  AMPDU (seqnr 3109 - 3114)
>> STA -> AP  BlockAck
>> ...
>>
>> As you can see 3106 and 3107 somehow got lost and thus leave a hole in
>> the STAs reorder buffer leading to the strange behavior I described before.
>
> but if they are reported to mac80211 they should be put on the queue
> again? Are they maybe not reported back quite in the right way?
>
>> > I don't know, you tell me, does it? :)
>>
>> Ha Ha ;)
>>
>> Sending a BAR at this point would at least ensure that the Intel STAs RX
>> reorder buffer gets flushed (in case we dropped any frames while the STA was
>> sleeping, which happened in this case).
>
> Right, but so far this looks like it's more like a bug in the powersave
> code rather than something we desperately need to recover from with a
> BAR.
>

The originator tried to send data and this data didn't get acked. In
this case a BAR should be sent regardless of the reason why this frame
didn't get acked.
This is done by reporting IEEE80211_TX_STAT_AMPDU_NO_BACK to mac80211.
Can it be that the rt2800 doesn't properly report the xmit failure?


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 20:38             ` Emmanuel Grumbach
@ 2011-03-24 20:40               ` Johannes Berg
  2011-03-24 20:41                 ` Johannes Berg
  2011-03-25  8:25               ` Helmut Schaa
  2011-03-25 11:08               ` Helmut Schaa
  2 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-24 20:40 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Helmut Schaa, linux-wireless

On Thu, 2011-03-24 at 22:38 +0200, Emmanuel Grumbach wrote:

> >> STA -> AP  nullfunc PM=1
> >> AP -> STA  AMPDU (seqnr 3106 - 3112)
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> ...
> >> STA -> AP nullfunc PM=0
> >> ...
> 
> Hold on, why do mac80211 send a frame to a sleeping station ?
> Am I stupid or is this buggy ?

Heh, neither. This happens when the frame is already queued on the
hardware queue.

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 20:40               ` Johannes Berg
@ 2011-03-24 20:41                 ` Johannes Berg
  2011-03-24 20:48                   ` Emmanuel Grumbach
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-24 20:41 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Helmut Schaa, linux-wireless

On Thu, 2011-03-24 at 21:40 +0100, Johannes Berg wrote:
> On Thu, 2011-03-24 at 22:38 +0200, Emmanuel Grumbach wrote:
> 
> > >> STA -> AP  nullfunc PM=1
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112)
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> > >> ...
> > >> STA -> AP nullfunc PM=0
> > >> ...
> > 
> > Hold on, why do mac80211 send a frame to a sleeping station ?
> > Am I stupid or is this buggy ?
> 
> Heh, neither. This happens when the frame is already queued on the
> hardware queue.

Of course, most (sane) hardware/firmware designs have some way to avoid
that race by having the device itself note the WAKE->SLEEP transition,
and then requiring the driver to tell the device about the SLEEP->WAKE
transition, which enables us to never do this (sending a frame to a
station that is now asleep).

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 20:41                 ` Johannes Berg
@ 2011-03-24 20:48                   ` Emmanuel Grumbach
  0 siblings, 0 replies; 17+ messages in thread
From: Emmanuel Grumbach @ 2011-03-24 20:48 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Helmut Schaa, linux-wireless

On Thu, Mar 24, 2011 at 22:41, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2011-03-24 at 21:40 +0100, Johannes Berg wrote:
>> On Thu, 2011-03-24 at 22:38 +0200, Emmanuel Grumbach wrote:
>>
>> > >> STA -> AP  nullfunc PM=1
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112)
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> > >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
>> > >> ...
>> > >> STA -> AP nullfunc PM=0
>> > >> ...
>> >
>> > Hold on, why do mac80211 send a frame to a sleeping station ?
>> > Am I stupid or is this buggy ?
>>
>> Heh, neither. This happens when the frame is already queued on the
>> hardware queue.
>
> Of course, most (sane) hardware/firmware designs have some way to avoid
> that race by having the device itself note the WAKE->SLEEP transition,
> and the requiring the driver to tell the device about the SLEEP->WAKE
> transition, which enables us to never do this (sending a frame to a
> station that is now asleep).
>

Good point. For the same reasons, DELBA, Deauth and friends are handled
in FW too...


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 20:38             ` Emmanuel Grumbach
  2011-03-24 20:40               ` Johannes Berg
@ 2011-03-25  8:25               ` Helmut Schaa
  2011-03-25 11:08               ` Helmut Schaa
  2 siblings, 0 replies; 17+ messages in thread
From: Helmut Schaa @ 2011-03-25  8:25 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Johannes Berg, linux-wireless

On Thursday, 24 March 2011, Emmanuel Grumbach wrote:
> The originator tried to send data and this data didn't get acked. In
> this case a BAR should be sent regardless the reason why this frame
> didn't get acked.
> This is done by reporting IEEE80211_TX_STAT_AMPDU_NO_BACK to mac80211.
> Can it be that the rt2800 doesn't report properly the xmit failure ?

Good point. During "normal" operation rt2800 correctly sets
IEEE80211_TX_STAT_AMPDU_NO_BACK on a failure, so it _should_ have done
this in the above example but I'll add some more debug output and verify
this again.

Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-24 20:38             ` Emmanuel Grumbach
  2011-03-24 20:40               ` Johannes Berg
  2011-03-25  8:25               ` Helmut Schaa
@ 2011-03-25 11:08               ` Helmut Schaa
  2011-03-25 11:23                 ` Johannes Berg
  2 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-25 11:08 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Johannes Berg, linux-wireless

On Thursday, 24 March 2011, Emmanuel Grumbach wrote:
> On Thu, Mar 24, 2011 at 19:32, Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Thu, 2011-03-24 at 14:57 +0100, Helmut Schaa wrote:
> >
> >> I guess so, yes (as I wrote before this is an Intel 5100 client running Windows
> >> Vista and the latest Intel driver). It sends a nullfunc going to sleep and returns
> >> a few hundret ms later. And in the meantime I can see my rt2800 AP sending an
> >> AMPDU to the sleeping STA which of course times out and therefore ends up as
> >> filtered frame.
> >
> > Ok so it gets reported to mac80211.
> >
> >> > Maybe the PS buffer should be larger then?
> >>
> >> 128 frames per STA is already large, no? And also it's nothing unusual
> >> that some frames get dropped if the STA stays in powersave for a long time.
> >>
> >> > I don't see how we can lose a
> >> > frame due to rx/tx processing races either, how does that happen? I
> >> > thought we had the ability to avoid all those races now.
> >>
> >> Good point. Maybe this only happens with rt2x00. The frame exchange looks
> >> basically like (if you want to see the pcap just ask ;) ):
> >>
> >> STA -> AP  nullfunc PM=1
> >> AP -> STA  AMPDU (seqnr 3106 - 3112)
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> AP -> STA  AMPDU (seqnr 3106 - 3112), retry
> >> ...
> >> STA -> AP nullfunc PM=0
> >> ...
> 
> Hold on, why do mac80211 send a frame to a sleeping station ?
> Am I stupid or is this buggy ?
> Johannes, is this the buggy behavior in PS you were pointing out later
> in your mail ?
> 
> 
> >> AP -> STA  AMPDU (seqnr 3108)
> >> STA -> AP  BlockAck
> >> AP -> STA  AMPDU (seqnr 3109 - 3114)
> >> STA -> AP  BlockAck
> >> ...
> >>
> >> As you can see 3106 and 3107 somehow got lost and thus leave a hole in
> >> the STAs reorder buffer leading to the strange behavior I described before.
> >
> > but if they are reported to mac80211 they should be put on the queue
> > again? Are they maybe not reported back quite in the right way?
> >
> >> > I don't know, you tell me, does it? :)
> >>
> >> Ha Ha ;)
> >>
> >> Sending a BAR at this point would at least ensure that the Intel STAs RX
> >> reorder buffer gets flushed (in case we dropped any frames while the STA was
> >> sleeping, which happened in this case).
> >
> > Right, but so far this looks like it's more like a bug in the powersave
> > code rather than something we desperately need to recover from with a
> > BAR.
> >
> 
> The originator tried to send data and this data didn't get acked. In
> this case a BAR should be sent regardless the reason why this frame
> didn't get acked.
> This is done by reporting IEEE80211_TX_STAT_AMPDU_NO_BACK to mac80211.

mac80211 isn't sending a BAR in the case above because of this code
in ieee80211_tx_status (status.c):

        acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
        if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
                /*
                 * The STA is in power save mode, so assume
                 * that this TX packet failed because of that.
                 */
                ieee80211_handle_filtered_frame(local, sta, skb);
                rcu_read_unlock();
                return;
        }
        ...
        if ((info->flags & IEEE80211_TX_STAT_AMPDU_NO_BACK) &&
            (ieee80211_is_data_qos(fc))) {
                u16 tid, ssn;
                u8 *qc;

                qc = ieee80211_get_qos_ctl(hdr);
                tid = qc[0] & 0xf;
                ssn = ((le16_to_cpu(hdr->seq_ctrl) + 0x10)
                                        & IEEE80211_SCTL_SEQ);
                ieee80211_send_bar(sta->sdata, hdr->addr1,
                                   tid, ssn);
        }

So, if a frame failed because the STA entered PS state it will be added
to the filtered frames queue and no BAR will get generated.
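
[Illustration] The decision made by the quoted snippet can be modelled in a few lines of standalone C. This is only a sketch: the flag values, `enum tx_action` and `handle_tx_status()` are stand-ins invented for the example, and the Sequence Control field is assumed to be in host byte order already. The `+ 0x10` works because the low four bits of Sequence Control hold the fragment number, so adding 0x10 advances the 12-bit sequence number by one and the mask makes it wrap at 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequence Control layout: bits 0-3 fragment number, bits 4-15 sequence. */
#define SCTL_FRAG 0x000f
#define SCTL_SEQ  0xfff0

/* Hypothetical stand-ins for the mac80211 TX status flags. */
#define TX_STAT_ACK           (1u << 0)
#define TX_STAT_AMPDU_NO_BACK (1u << 1)

enum tx_action {
    TX_DONE,             /* nothing special to do */
    TX_REQUEUE_FILTERED, /* STA asleep: keep the frame for later */
    TX_SEND_BAR          /* ask the recipient to move its window */
};

struct tx_result {
    enum tx_action action;
    uint16_t       bar_ssn; /* sequence number, valid for TX_SEND_BAR */
};

static struct tx_result handle_tx_status(unsigned flags, bool sta_in_ps,
                                         bool is_qos_data, uint16_t seq_ctrl)
{
    struct tx_result res = { TX_DONE, 0 };
    bool acked = (flags & TX_STAT_ACK) != 0;

    /* The power-save check comes first and returns early: no BAR here. */
    if (!acked && sta_in_ps) {
        res.action = TX_REQUEUE_FILTERED;
        return res;
    }

    if ((flags & TX_STAT_AMPDU_NO_BACK) && is_qos_data) {
        /* Next sequence number after the failed frame, still shifted. */
        uint16_t ssn_field = (uint16_t)((seq_ctrl + 0x10) & SCTL_SEQ);

        res.action = TX_SEND_BAR;
        res.bar_ssn = (uint16_t)(ssn_field >> 4);
    }
    return res;
}
```

For a failed frame with sequence number 3107 and an awake station, the sketch yields a BAR with SSN 3108. With the station flagged as sleeping, the same status report takes the filtered-frame path and never reaches the BAR branch.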

However, if this was the case mac80211 should send out all filtered frames
after the STA wakes up again, which does not happen in the situation
above (3106 and 3107 got lost) :(

Generating the BAR also for filtered frames might improve the situation but
looks more like a workaround to me.

Still debugging the rt2800 tx status reporting.

Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-25 11:08               ` Helmut Schaa
@ 2011-03-25 11:23                 ` Johannes Berg
  2011-03-25 12:55                   ` Helmut Schaa
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-25 11:23 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: Emmanuel Grumbach, linux-wireless

On Fri, 2011-03-25 at 12:08 +0100, Helmut Schaa wrote:

> mac80211 isn't sending a BAR in the case above because of this code
> in ieee80211_tx_status (status.c):
> 
>         acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
>         if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
>                 /*
>                  * The STA is in power save mode, so assume
>                  * that this TX packet failed because of that.
>                  */
>                 ieee80211_handle_filtered_frame(local, sta, skb);
>                 rcu_read_unlock();
>                 return;
>         }
>         ...
>         if ((info->flags & IEEE80211_TX_STAT_AMPDU_NO_BACK) &&
>             (ieee80211_is_data_qos(fc))) {
>                 u16 tid, ssn;
>                 u8 *qc;
>
>                 qc = ieee80211_get_qos_ctl(hdr);
>                 tid = qc[0] & 0xf;
>                 ssn = ((le16_to_cpu(hdr->seq_ctrl) + 0x10)
>                                         & IEEE80211_SCTL_SEQ);
>                 ieee80211_send_bar(sta->sdata, hdr->addr1,
>                                    tid, ssn);
>         }
> 
> So, if a frame failed because the STA entered PS state it will be added
> to the filtered frames queue and no BAR will get generated.

Right. Of course, if the device could help and the driver would set
IEEE80211_TX_STAT_TX_FILTERED this would be better since then the frames
wouldn't even be on the air...

FWIW, since Intel hw uses special queues for aggregation the ucode just
stops those queues when the station goes to sleep. Caused me some grey
hair in fact since I had naively expected it to reject those frames too.

> However, if this was the case mac80211 should send out all filtered frames
> after the STA wakes up again, which does not happen in the situation
> above (3106 and 3107 got lost) :(

Yeah, that's in fact strange. Unless they timed out?

> Generating the BAR also for filtered frames might improve the situation but
> looks more like a workaround to me.

Agree.

> Still debugging the rt2800 tx status reporting.

Maybe it's not reporting the right status for all frames?

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-25 11:23                 ` Johannes Berg
@ 2011-03-25 12:55                   ` Helmut Schaa
  2011-03-25 13:06                     ` Johannes Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Helmut Schaa @ 2011-03-25 12:55 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

On Friday, 25 March 2011, Johannes Berg wrote:
> On Fri, 2011-03-25 at 12:08 +0100, Helmut Schaa wrote:
> > So, if a frame failed because the STA entered PS state it will be added
> > to the filtered frames queue and no BAR will get generated.
> 
> Right. Of course, if the device could help and the driver would set
> IEEE80211_TX_STAT_TX_FILTERED this would be better since then the frames
> wouldn't even be on the air...

Yeah, unfortunately Ralink devices don't seem to be capable of handling
that in hw/fw :(
 
> FWIW, since Intel hw uses special queues for aggregation the ucode just
> stops those queues when the station goes to sleep. Caused me some grey
> hair in fact since I had naively expected it to reject those frames too.
> 
> > However, if this was the case mac80211 should send out all filtered frames
> > after the STA wakes up again, which does not happen in the situation
> > above (3106 and 3107 got lost) :(
> 
> Yeah, that's in fact strange. Unless they timed out?

Hmm, right, maybe I should enable mac80211 PS debugging to see if frames
time out or get dropped from the buffer ...

> > Generating the BAR also for filtered frames might improve the situation but
> > looks more like a workaround to me.
> 
> Agree.
> 
> > Still debugging the rt2800 tx status reporting.
> 
> Maybe it's not reporting the right status for all frames?

So far the tx status as returned by the device looks correct. I printed the
sequence numbers of failed AMPDU subframes and had a second device capturing
the traffic and compared them.

But I noticed that there still appear holes from time to time that cause the
strange Intel driver stuck problem (which is really really strange and IMHO
the driver has a problem too). And it is not necessarily PS related.
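
A comparison like this can be automated. The following is only a rough sketch and not the tooling actually used here: count_missing_seqs() is a hypothetical helper that takes the 12-bit sequence numbers seen in a capture, tolerates retransmissions arriving out of order, and assumes the capture spans fewer than 2048 sequence numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_MODULO 4096u /* 802.11 sequence numbers are 12 bits wide */

/*
 * Count the sequence numbers that never show up between the first
 * captured frame and the furthest one seen after it. Distances are
 * taken modulo 4096 relative to seqs[0]; anything that looks older
 * than the first frame is ignored.
 */
size_t count_missing_seqs(const uint16_t *seqs, size_t n)
{
        bool seen[SEQ_MODULO] = { false };
        uint16_t span = 0;
        size_t missing = 0;

        if (n == 0)
                return 0;

        for (size_t i = 0; i < n; i++) {
                uint16_t d = (uint16_t)((seqs[i] - seqs[0]) & (SEQ_MODULO - 1));

                if (d >= SEQ_MODULO / 2)
                        continue;
                seen[d] = true;
                if (d > span)
                        span = d;
        }

        for (uint16_t d = 0; d <= span; d++)
                if (!seen[d])
                        missing++;

        return missing;
}
```

For a capture containing 3104, 3105, 3108 and 3109 this reports two missing sequence numbers, while a late retry that fills a gap is not counted as a hole.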

Johannes, what happens if a frame gets dropped on its way through the tx path
after getting a seqnr assigned? That would cause a hole in the sequence number
series without the chance to get an IEEE80211_TX_STAT_AMPDU_NO_BACK tx status.

Helmut


* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-25 12:55                   ` Helmut Schaa
@ 2011-03-25 13:06                     ` Johannes Berg
  2011-03-28 11:38                       ` Helmut Schaa
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2011-03-25 13:06 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: Emmanuel Grumbach, linux-wireless

On Fri, 2011-03-25 at 13:55 +0100, Helmut Schaa wrote:

> So far the tx status as returned by the device looks correct. I printed the
> sequence numbers of failed AMPDU subframes and had a second device capturing
> the traffic and compared them.
> 
> But I noticed that there still appear holes from time to time that cause the
> strange Intel driver stuck problem (which is really really strange and IMHO
> the driver has a problem too). And it is not necessarily PS related.
> 
> Johannes, what happens if a frame gets dropped on its way through the tx path
> after getting a seqnr assigned? That would cause a hole in the sequence number
> series without the chance to get an IEEE80211_TX_STAT_AMPDU_NO_BACK tx status.

Yeah that's correct, try not to do that :) It's one of the reasons the
Intel driver assigns its own sequence numbers to AMPDU frames (the other
being some odd seqno/queue place match requirement.)
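
To illustrate why such a hole hurts on the receive side, below is a deliberately simplified model of an RX reorder buffer. It is a sketch under stated assumptions, not the Intel or mac80211 implementation: all names are hypothetical, the window is fixed at 64 frames, frames outside the window are simply ignored, and there is no release timeout.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MODULO 4096u
#define WIN_SIZE   64u   /* assumed reorder window; 4096 is a multiple of it */

struct reorder_model {
        uint16_t head;           /* next sequence number expected in order */
        bool     held[WIN_SIZE]; /* held[sn % WIN_SIZE]: frame is buffered */
};

/* Distance from b forward to a, modulo 4096. */
static uint16_t seq_sub(uint16_t a, uint16_t b)
{
        return (uint16_t)((a - b) & (SEQ_MODULO - 1));
}

/* Hand up consecutive frames starting at head; returns how many. */
unsigned release_in_order(struct reorder_model *m)
{
        unsigned n = 0;

        while (m->held[m->head % WIN_SIZE]) {
                m->held[m->head % WIN_SIZE] = false;
                m->head = (uint16_t)((m->head + 1) % SEQ_MODULO);
                n++;
        }
        return n;
}

/* Receive one MPDU; returns the number of frames released upwards. */
unsigned model_rx_mpdu(struct reorder_model *m, uint16_t sn)
{
        if (seq_sub(sn, m->head) >= WIN_SIZE)
                return 0; /* outside the window: ignored in this model */
        m->held[sn % WIN_SIZE] = true;
        return release_in_order(m);
}

/* Receive a BlockAckReq: give up on everything before ssn. */
unsigned model_rx_bar(struct reorder_model *m, uint16_t ssn)
{
        unsigned n = 0;

        if (seq_sub(ssn, m->head) >= SEQ_MODULO / 2)
                return 0; /* stale BAR, window already moved past it */

        while (m->head != ssn) {
                if (m->held[m->head % WIN_SIZE]) {
                        m->held[m->head % WIN_SIZE] = false;
                        n++;
                }
                m->head = (uint16_t)((m->head + 1) % SEQ_MODULO);
        }
        return n + release_in_order(m);
}
```

In this model, with head at 3106 and frames 3106 and 3107 never arriving, 3108 and 3109 stay buffered indefinitely. They are only released once a BAR with start sequence number 3108 moves the window forward.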

johannes



* Re: Aggregation problem with rt2800 AP and Intel 5100 STA
  2011-03-25 13:06                     ` Johannes Berg
@ 2011-03-28 11:38                       ` Helmut Schaa
  0 siblings, 0 replies; 17+ messages in thread
From: Helmut Schaa @ 2011-03-28 11:38 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Emmanuel Grumbach, linux-wireless

On Friday, 25 March 2011, Johannes Berg wrote:
> On Fri, 2011-03-25 at 13:55 +0100, Helmut Schaa wrote:
> 
> > So far the tx status as returned by the device looks correct. I printed the
> > sequence numbers of failed AMPDU subframes and had a second device capturing
> > the traffic and compared them.

FYI, it does indeed seem that the tx status reporting in rt2800 is not 100%
accurate [1]. I'm not sure whether this is really the culprit, but it looks
plausible.

Helmut

[1]  http://rt2x00.serialmonkey.com/pipermail/users_rt2x00.serialmonkey.com/2011-March/003456.html

