[RFC] ath9k: improve aggregation throughput by using only first rate

* [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-26 17:10 Björn Smedman
  2010-07-26 17:44 ` [ath9k-devel] " Felix Fietkau
  0 siblings, 1 reply; 7+ messages in thread
From: Björn Smedman @ 2010-07-26 17:10 UTC
  To: ath9k-devel, linux-wireless

Hi all,

I've been running a lot of iperf on AR913x /
compat-wireless-2010-07-16 (w/ openwrt/trunk@22388).

I think there are some (in theory) simple improvements that can be
done to the tx aggregation / rate control logic. A proof of concept of
one such improvement is provided below. Basically, it's a hack that
makes ath9k output aggregates with only the first rate in the rate
series. The reasoning is that a failure is not a problem for
aggregates because there is software retry. Retrying in hardware at a
slower rate is counterproductive. So, better to fail and do a
software retry at possibly another rate. Also, since the aggregate
size is often limited by the slowest rate in the MRR series (4 ms txop
limit) having a slow rate in the series may affect performance even if
it is never used by the hardware.
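
To make the 4 ms argument concrete, here is a rough sketch of how the slowest entry in a rate series caps the aggregate size. It is an illustration, not the driver's code: `max_4ms_bytes` and `series_limit` are hypothetical helpers, the rates are nominal HT20 long-GI PHY rates in kbit/s, preamble and delimiter overhead is ignored, and the 65535-byte cap is an assumed stand-in for `ATH_AMPDU_LIMIT_MAX`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AMPDU_LIMIT_MAX 65535u	/* assumed cap, standing in for ATH_AMPDU_LIMIT_MAX */

/*
 * Bytes that fit into 4 ms at a given PHY rate, ignoring all overhead.
 * rate_kbps * 4 ms gives bits; dividing by 8 gives bytes, i.e. rate_kbps / 2.
 */
static uint32_t max_4ms_bytes(uint32_t rate_kbps)
{
	uint32_t bytes = rate_kbps / 2;

	return bytes < AMPDU_LIMIT_MAX ? bytes : AMPDU_LIMIT_MAX;
}

/* Aggregate size limit when the first n_rates entries of the series are considered. */
static uint32_t series_limit(const uint32_t *rates_kbps, size_t n_rates)
{
	uint32_t limit = AMPDU_LIMIT_MAX;
	size_t i;

	assert(n_rates <= 4);	/* the MRR series has four entries */
	for (i = 0; i < n_rates; i++) {
		uint32_t len = max_4ms_bytes(rates_kbps[i]);

		if (len < limit)
			limit = len;
	}
	return limit;
}
```

For a series of MCS15, MCS12, MCS8 and MCS0 (130, 78, 13 and 6.5 Mbit/s), considering all four entries limits the aggregate to 3250 bytes under these assumptions, while considering only the first entry allows 65000 bytes.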

In my (not so scientific) tests, max AP downstream throughput increases
about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
in a noisy environment with 20 meters and a few walls between AP and
client).

Of course, if all rates in the series are high then this patch has no effect.

/Björn
---
diff -urpN a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
--- a/drivers/net/wireless/ath/ath9k/xmit.c	2010-07-26 15:35:17.000000000 +0200
+++ b/drivers/net/wireless/ath/ath9k/xmit.c	2010-07-26 17:11:33.000000000 +0200
@@ -565,7 +565,7 @@ static u32 ath_lookup_rate(struct ath_so
 	 */
 	max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

-	for (i = 0; i < 4; i++) {
+	for (i = 0; i < 1; i++) {
 		if (rates[i].count) {
 			int modeidx;
 			if (!(rates[i].flags & IEEE80211_TX_RC_MCS)) {
@@ -1553,6 +1553,9 @@ static void ath_buf_set_rate(struct ath_
 	if (sc->sc_flags & SC_OP_PREAMBLE_SHORT)
 		ctsrate |= rate->hw_value_short;

+	if (bf_isaggr(bf))
+		rates[1].count = rates[2].count = rates[3].count = 0;
+
 	for (i = 0; i < 4; i++) {
 		bool is_40, is_sgi, is_sp;
 		int phy;

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 17:10 [RFC] ath9k: improve aggregation throughput by using only first rate Björn Smedman
@ 2010-07-26 17:44 ` Felix Fietkau
  2010-07-26 19:23   ` Björn Smedman
  0 siblings, 1 reply; 7+ messages in thread
From: Felix Fietkau @ 2010-07-26 17:44 UTC
  To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 7:10 PM, Björn Smedman wrote:
> Hi all,
> 
> I've been running a lot of iperf on AR913x /
> compat-wireless-2010-07-16 (w/ openwrt/trunk@22388).
> 
> I think there are some (in theory) simple improvements that can be
> done to the tx aggregation / rate control logic. A proof of concept of
> one such improvement is provided below. Basically, it's a hack that
> makes ath9k output aggregates with only the first rate in the rate
> series. The reasoning is that a failure is not a problem for
> aggregates because there is software retry. Retrying in hardware at a
> slower rate is counter productive. So, better to fail and do a
> software retry at possibly another rate. Also, since the aggregate
> size is often limited by the slowest rate in the MRR series (4 ms txop
> limit) having a slow rate in the series may affect performance even if
> it is never used by the hardware.
> 
> In my (not so scientific) tests max AP downstream throughput increases
> about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
> in noisy environment with 20 meters and a few walls between AP and
> client).
> 
> Of course, if all rates in the series are high then this patch has no effect.
I think it makes sense to rely less on on-chip MRR for fallback, but I
think to make this workable, we really should use the MRR table for
something, otherwise the rate control algorithm will take much longer to
adapt.
It's probably better to fix this properly after I'm done with my A-MPDU
rewrite, because then I can more easily push parts of the software
retransmission behaviour into minstrel_ht directly.

- Felix

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 17:44 ` [ath9k-devel] " Felix Fietkau
@ 2010-07-26 19:23   ` Björn Smedman
  2010-07-26 19:41     ` Felix Fietkau
  0 siblings, 1 reply; 7+ messages in thread
From: Björn Smedman @ 2010-07-26 19:23 UTC
  To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless

2010/7/26 Felix Fietkau <nbd@openwrt.org>:
> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>> I think there are some (in theory) simple improvements that can be
>> done to the tx aggregation / rate control logic. A proof of concept of
>> one such improvement is provided below. Basically, it's a hack that
> I think it makes sense to rely less on on-chip MRR for fallback, but I
> think to make this workable, we really should use the MRR table for
> something, otherwise the rate control algorithm will take much longer to
> adapt.
> It's probably better to fix this properly after I'm done with my A-MPDU
> rewrite, because then I can more easily push parts of the software
> retransmission behaviour into minstrel_ht directly.

Sounds very reasonable. I'm sure you've thought of it but now that
it's fresh in my head it would be great if the new aggregation design
allowed us to experiment with stuff like this:

* The rate control logic treats the average aggregate length as a
measured independent variable, when in fact it depends heavily on the
rates selected (via the 4 ms txop limit).

* When tx is aggregated most rate control probe frames end up inside
aggregates and are never used for probing (effective probe frequency
is divided by average aggregate length).

* When setting up a hardware MRR for an aggregate the focus should be
on throughput (as explained earlier in this thread). But there are
situations when reliability is important: e.g. when a subframe in the
aggregate is about to expire (because of time or block ack window). It
may even be advantageous to tx the subframes that are about to expire
in their own aggregate with lower / more reliable bitrate?

* In many busy radio environments the packet success rate depends very
much on the protection method being used (none, cts-to-self or
rts-cts), often more so than on the bitrate itself. It would be
interesting to experiment with including the protection method in the
rate selection, i.e. to probe for the optimal protection method and
bitrate combination.

* In order to have the best possible rate control in very dynamic rf
environments it's important to keep the hardware queue short and
select rates as late as possible (to not introduce unnecessary delay
when selecting new rates). I have no idea how to do this but it would
be great if the tx queue could be kept long enough to never stall tx,
but no longer.

* If I understand correctly the Atheros hardware does not adjust the
rts / cts-to-self duration field when going through the MRR
(correct?). In that case it may be even more advantageous to use
software retry as much as possible when some form of protection is
enabled.
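
The idea of probing protection method and bitrate together could be sketched roughly as below. This is only an illustration under stated assumptions: `combo_stats`, `expected_kbps` and `best_combo` are hypothetical names, the success percentages would have to come from real measurements, and the estimate ignores the extra airtime that RTS/CTS or CTS-to-self costs.

```c
#include <assert.h>
#include <stddef.h>

enum protection { PROT_NONE, PROT_CTS_TO_SELF, PROT_RTS_CTS };

/* One probed combination of protection method and bitrate (hypothetical layout). */
struct combo_stats {
	enum protection prot;
	unsigned int mcs;
	unsigned int rate_kbps;		/* nominal PHY rate */
	unsigned int success_pct;	/* measured delivery probability, 0..100 */
};

/* Crude expected throughput in kbit/s: PHY rate scaled by success probability. */
static unsigned int expected_kbps(const struct combo_stats *c)
{
	return c->rate_kbps / 100 * c->success_pct;
}

/* Return the combination with the highest expected throughput, or NULL if n is 0. */
static const struct combo_stats *best_combo(const struct combo_stats *tbl, size_t n)
{
	const struct combo_stats *best = NULL;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!best || expected_kbps(&tbl[i]) > expected_kbps(best))
			best = &tbl[i];
	}
	return best;
}
```

With made-up figures, MCS15 without protection at 40% success comes out at 52000 kbit/s, while MCS15 with rts-cts at 75% success comes out at 97500 kbit/s, so the protected combination would be picked even though the bitrate is the same.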

Looking forward to the new aggregation code!

/Björn

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 19:23   ` Björn Smedman
@ 2010-07-26 19:41     ` Felix Fietkau
  2010-07-26 20:37       ` Björn Smedman
  2010-07-27  4:48       ` Ranga Rao Ravuri
  0 siblings, 2 replies; 7+ messages in thread
From: Felix Fietkau @ 2010-07-26 19:41 UTC
  To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 9:23 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>> I think there are some (in theory) simple improvements that can be
>>> done to the tx aggregation / rate control logic. A proof of concept of
>>> one such improvement is provided below. Basically, it's a hack that
>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>> think to make this workable, we really should use the MRR table for
>> something, otherwise the rate control algorithm will take much longer to
>> adapt.
>> It's probably better to fix this properly after I'm done with my A-MPDU
>> rewrite, because then I can more easily push parts of the software
>> retransmission behaviour into minstrel_ht directly.
> 
> Sounds very reasonable. I'm sure you've thought of it but now that
> it's fresh in my head it would be great if the new aggregation design
> allowed us to experiment with stuff like this:
> 
> * The rate control logic treats the average aggregate length as a
> measured independent variable, when in fact it depends heavily on the
> rates selected (via the 4 ms txop limit).
Yes, with the new design maybe we could use the initial rate lookup only
for setting the sampling flag, and then doing a separate per-AMPDU
lookup, which properly takes the AMPDU length into account.

> * When tx is aggregated most rate control probe frames end up inside
> aggregates and are never used for probing (effective probe frequency
> is divided by average aggregate length).
Nope, a probing frame never ends up inside an aggregate. It's always
sent out as a single frame, which is why I had to make the decision
about sending a probing frame more complex in minstrel_ht, compared to
minstrel - the previous 10% stuff was limiting aggregation size.

> * When setting up a hardware MRR for an aggregate the focus should be
> on throughput (as explained earlier in this thread). But there are
> situations when reliability is important: e.g. when a subframe in the
> aggregate is about to expire (because of time or block ack window). It
> may even be advantageous to tx the subframes that are about to expire
> in their own aggregate with lower / more reliable bitrate?
Yes, that's what I was thinking as well. We should probably make this
decision based on the number of sw-retransmitted frames, and maybe
consider the offset of seqno vs baw_tail as well.
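
A minimal sketch of such a check, assuming 12-bit 802.11 sequence numbers and a 64-frame block-ack window. The names `seq_distance` and `needs_reliable_rate`, both thresholds, and the treatment of the window tail as a plain sequence number are hypothetical; none of this is taken from ath9k or minstrel_ht.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MASK	0x0fffu	/* 802.11 sequence numbers are 12 bits wide */
#define BAW_SIZE	64u	/* assumed block-ack window size */
#define BAW_MARGIN	8u	/* hypothetical: how close to the window edge is too close */
#define RETRY_THRESHOLD	3u	/* hypothetical: software retries before falling back */

/* Distance from one sequence number to a later one, modulo 4096. */
static unsigned int seq_distance(uint16_t from, uint16_t to)
{
	return (uint16_t)(to - from) & SEQ_MASK;
}

/*
 * Decide whether a subframe should go out at a more reliable rate: either it
 * has been retried in software too often, or it lags so far behind the tail
 * of the window that it is about to stall it.
 */
static bool needs_reliable_rate(unsigned int sw_retries, uint16_t seqno,
				uint16_t tail_seqno)
{
	assert(seqno <= SEQ_MASK && tail_seqno <= SEQ_MASK);
	if (sw_retries >= RETRY_THRESHOLD)
		return true;
	return seq_distance(seqno, tail_seqno) >= BAW_SIZE - BAW_MARGIN;
}
```

The cast to `uint16_t` before masking keeps the subtraction well defined when the sequence number wraps from 4095 back to 0.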

> * In many busy radio environments the packet success rate depends very
> much on the protection method being used (none, cts-to-self or
> rts-cts), often more so than on the bitrate itself. It would be
> interesting to experiment with including the protection method in the
> rate selection, i.e. to probe for the optimal protection method and
> bitrate combination.
Sounds good.

> * In order to have the best possible rate control in very dynamic rf
> environments it's important to keep the hardware queue short and
> select rates as late as possible (to not introduce unnecessary delay
> when selecting new rates). I have no idea how to do this but it would
> be great if the tx queue could be kept long enough to never stall tx,
> but no longer.
This would work with what I suggested above - per-AMPDU rate lookup.
With software scheduling that's easy to do, since we already restrict
the queue to max. 2 AMPDUs.

> * If I understand correctly the Atheros hardware does not adjust the
> rts / cts-to-self duration field when going through the MRR
> (correct?). In that case it may be even more advantageous to use
> software retry as much as possible when some form of protection is
> enabled.
Not sure, but I think it does adjust the duration field according to the
rate, while transmitting.

> Looking forward to the new aggregation code!
That will still take some time, I recently came up with some better
design ideas, which require some larger changes to the code that I
already wrote.

- Felix

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 19:41     ` Felix Fietkau
@ 2010-07-26 20:37       ` Björn Smedman
  2010-07-26 20:41         ` Felix Fietkau
  2010-07-27  4:48       ` Ranga Rao Ravuri
  1 sibling, 1 reply; 7+ messages in thread
From: Björn Smedman @ 2010-07-26 20:37 UTC
  To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless

2010/7/26 Felix Fietkau <nbd@openwrt.org>:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

Ok, I must have jumped to conclusions. I looked quickly at the code
and had the impression that it only cared about the RATE_PROBE flag if
it was on the first subframe of the aggregate, and then I compared
debug output from rc and xmit like this:

root@OpenWrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type      rate     throughput  ewma prob   this prob  this succ/attempt   success    attempts
HT20/LGI    MCS0        5.8       87.3       50.0          0(  0)         48          54
HT20/LGI    MCS1       12.6       94.6      100.0          0(  0)         46          48
HT20/LGI    MCS2       18.9       95.8      100.0          0(  0)         52          73
HT20/LGI    MCS3       24.8       94.8      100.0          0(  0)         53          62
HT20/LGI    MCS4       38.4       99.2      100.0          0(  0)         45          55
HT20/LGI    MCS5       47.4       94.0      100.0          0(  0)         56          72
HT20/LGI    MCS6       55.4       98.7      100.0          0(  0)         60          78
HT20/LGI   PMCS7       56.2       88.8       66.6          0(  0)        112         143
HT20/LGI    MCS8       10.8       81.4       50.0          0(  0)         50          62
HT20/LGI    MCS9       23.6       90.4      100.0          0(  0)         66          81
HT20/LGI    MCS10      30.6       79.0       50.0          0(  0)         51          64
HT20/LGI    MCS11      50.1       99.2      100.0          0(  0)         56          63
HT20/LGI    MCS12      60.1       80.6      100.0          0(  0)        217         382
HT20/LGI    MCS13      66.6       70.6       50.0          0(  0)       2440        3042
HT20/LGI  t MCS14      82.9       77.9       65.9          0(  0)      70446       86949
HT20/LGI T  MCS15      85.5       73.5       77.1        264(342)      31170       43240

Total packet count::    ideal 117093      lookaround 1322
Average A-MPDU length: 10.6
                            BE         BK        VI        VO

MPDUs Queued:              120          0         0       224
MPDUs Completed:           120          0         0       224
Aggregates:               7555          0         0         0
AMPDUs Queued:          118358          0         0        50
AMPDUs Completed:       118247          0         0        20
AMPDUs Retried:          15406          0         0       300
AMPDUs XRetried:            21          0         0        30
FIFO Underrun:               0          0         0         0
TXOP Exceeded:               0          0         0         0
TXTIMER Expiry:              0          0         0         0
DESC CFG Error:              0          0         0         0
DATA Underrun:               0          0         0         0
DELIM Underrun:              0          0         0         0

Rate control says 1322 lookaround (=probe frames?) but ath9k xmit says
only 120 + 224 MPDUs.
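
As a rough cross-check of the earlier guess that the effective probe frequency is divided by the average aggregate length, the figures above can be plugged in directly. This is only arithmetic on the reported counters, with `effective_probes` as a hypothetical helper; it does not show where the probe frames actually went.

```c
#include <assert.h>

/*
 * Number of probe frames that would still go out on their own if the
 * probe count were divided by the average aggregate length.
 */
static double effective_probes(unsigned int lookaround, double avg_ampdu_len)
{
	assert(avg_ampdu_len > 0.0);
	return lookaround / avg_ampdu_len;
}
```

With 1322 lookaround packets and an average A-MPDU length of 10.6 this gives a little under 125, which is in the same range as the 120 BE MPDUs reported by ath9k.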

/Björn

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 20:37       ` Björn Smedman
@ 2010-07-26 20:41         ` Felix Fietkau
  0 siblings, 0 replies; 7+ messages in thread
From: Felix Fietkau @ 2010-07-26 20:41 UTC
  To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 10:37 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>>> * When tx is aggregated most rate control probe frames end up inside
>>> aggregates and are never used for probing (effective probe frequency
>>> is divided by average aggregate length).
>> Nope, a probing frame never ends up inside an aggregate. It's always
>> sent out as a single frame, which is why I had to make the decision
>> about sending a probing frame more complex in minstrel_ht, compared to
>> minstrel - the previous 10% stuff was limiting aggregation size.
> 
> Ok, I must have jumped to conclusions. I looked quickly at the code
> and had the impression that it only cared about the RATE_PROBE flag if
> it was on the first subframe of the aggregate, and then I compared
> debug output from rc and xmit like this:
Oh, wait. It seems that you may be right after all. I think I was
remembering stuff from the wrong codebase again. Well, at least what I
described is what I think the code should be doing ;)

- Felix

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
  2010-07-26 19:41     ` Felix Fietkau
  2010-07-26 20:37       ` Björn Smedman
@ 2010-07-27  4:48       ` Ranga Rao Ravuri
  1 sibling, 0 replies; 7+ messages in thread
From: Ranga Rao Ravuri @ 2010-07-27  4:48 UTC
  To: Felix Fietkau; +Cc: Björn Smedman, ath9k-devel, linux-wireless



On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau<nbd@openwrt.org>:
>>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>>> I think there are some (in theory) simple improvements that can be
>>>> done to the tx aggregation / rate control logic. A proof of concept of
>>>> one such improvement is provided below. Basically, it's a hack that
>>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>>> think to make this workable, we really should use the MRR table for
>>> something, otherwise the rate control algorithm will take much longer to
>>> adapt.
>>> It's probably better to fix this properly after I'm done with my A-MPDU
>>> rewrite, because then I can more easily push parts of the software
>>> retransmission behaviour into minstrel_ht directly.
>> Sounds very reasonable. I'm sure you've thought of it but now that
>> it's fresh in my head it would be great if the new aggregation design
>> allowed us to experiment with stuff like this:
>>
>> * The rate control logic treats the average aggregate length as a
>> measured independent variable, when in fact it depends heavily on the
>> rates selected (via the 4 ms txop limit).
> Yes, with the new design maybe we could use the initial rate lookup only
> for setting the sampling flag, and then doing a separate per-AMPDU
> lookup, which properly takes the AMPDU length into account.
>
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.
>
>> * When setting up a hardware MRR for an aggregate the focus should be
>> on throughput (as explained earlier in this thread). But there are
>> situations when reliability is important: e.g. when a subframe in the
>> aggregate is about to expire (because of time or block ack window). It
>> may even be advantageous to tx the subframes that are about to expire
>> in their own aggregate with lower / more reliable bitrate?
> Yes, that's what I was thinking as well. We should probably make this
> decision based on the number of sw-retransmitted frames, and maybe
> consider the offset of seqno vs baw_tail as well.
>
>> * In many busy radio environments the packet success rate depends very
>> much on the protection method being used (none, cts-to-self or
>> rts-cts), often more so than on the bitrate itself. It would be
>> interesting to experiment with including the protection method in the
>> rate selection, i.e. to probe for the optimal protection method and
>> bitrate combination.
> Sounds good.
>
>> * In order to have the best possible rate control in very dynamic rf
>> environments it's important to keep the hardware queue short and
>> select rates as late as possible (to not introduce unnecessary delay
>> when selecting new rates). I have no idea how to do this but it would
>> be great if the tx queue could be kept long enough to never stall tx,
>> but no longer.
> This would work with what I suggested above - per-AMPDU rate lookup.
> With software scheduling that's easy to do, since we already restrict
> the queue to max. 2 AMPDUs
>
>> * If I understand correctly the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.
[ranga] Yes, it does. If you enable RTS on all rates, you will see
RTS frames going out with different duration values.
>> Looking forward to the new aggregation code!
> That will still take some time, I recently came up with some better
> design ideas, which require some larger changes to the code that I
> already wrote.
>
> - Felix