* Need to get msdu-chaining working.
@ 2014-02-26 23:38 Ben Greear
  2014-02-27  6:51 ` Michal Kazior
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-02-26 23:38 UTC (permalink / raw)
  To: ath10k


I'm using raw-rx mode in my firmware, and its throughput is
extremely poor at higher speeds.  I see large numbers of warnings
about msdu_chaining, so I suspect this may be the reason.

Does anyone have any suggestions for what needs to be done
to handle these sorts of frames properly?

			/* FIXME: we do not support chaining yet.
			 * this needs investigation */
			if (msdu_chaining) {
				ath10k_warn("htt rx msdu_chaining is true\n");
				ath10k_htt_rx_free_msdu_chain(msdu_head);
				continue;
			}

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


* Re: Need to get msdu-chaining working.
  2014-02-26 23:38 Need to get msdu-chaining working Ben Greear
@ 2014-02-27  6:51 ` Michal Kazior
  2014-02-27 16:08   ` Ben Greear
  2014-02-28  1:18   ` Ben Greear
  0 siblings, 2 replies; 13+ messages in thread
From: Michal Kazior @ 2014-02-27  6:51 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 27 February 2014 00:38, Ben Greear <greearb@candelatech.com> wrote:
>
> I'm using raw-rx mode in my firmware, and its throughput is
> extremely poor at higher speeds.  I see large numbers of warnings
> about msdu_chaining, so I suspect this may be the reason.

I assume this requires your custom FW?


> Does anyone have any suggestions for what needs to be done
> to handle these sorts of frames properly?
>
>                         /* FIXME: we do not support chaining yet.
>                          * this needs investigation */
>                         if (msdu_chaining) {
>                                 ath10k_warn("htt rx msdu_chaining is true\n");
>                                 ath10k_htt_rx_free_msdu_chain(msdu_head);
>                                 continue;
>                         }

From what I understand, a chained msdu is an msdu that did not fit into
a single rx buffer and is split across the popped amsdu list. I suspect
only the first msdu in the chain has the htt_rx_desc and the others do not
(this is what the current code does, but you'll need to verify that).

I would try to concatenate all msdus into one (lots of memcpy :( ) or
increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
single buffer (hopefully FW/HW is capable of doing that).
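
A minimal user-space sketch of the concatenation option, for illustration
only. It assumes a chain is a singly linked list of buffers; struct
chain_buf and both helpers are hypothetical names, not ath10k or skb APIs.
It shows where the per-buffer memcpy() cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one rx buffer in a chain; not an ath10k type. */
struct chain_buf {
	const unsigned char *data;
	size_t len;
	struct chain_buf *next;
};

/* Sum the payload lengths of every buffer in the chain. */
static size_t chain_total_len(const struct chain_buf *head)
{
	size_t total = 0;

	for (; head; head = head->next)
		total += head->len;
	return total;
}

/*
 * Copy every buffer of the chain, in order, into one newly allocated
 * linear buffer. Returns NULL for an empty chain or on allocation
 * failure; the caller frees the result. One memcpy() per chained buffer.
 */
static unsigned char *chain_coalesce(const struct chain_buf *head,
				     size_t *out_len)
{
	size_t total = chain_total_len(head);
	size_t off = 0;
	unsigned char *flat;

	assert(out_len != NULL);
	*out_len = 0;
	if (total == 0)
		return NULL;

	flat = malloc(total);
	if (!flat)
		return NULL;

	for (; head; head = head->next) {
		if (head->len)
			memcpy(flat + off, head->data, head->len);
		off += head->len;
	}
	*out_len = total;
	return flat;
}
```

Every payload byte is touched once more on the host, which is the cost
the ":(" above refers to.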


Michał


* Re: Need to get msdu-chaining working.
  2014-02-27  6:51 ` Michal Kazior
@ 2014-02-27 16:08   ` Ben Greear
  2014-02-28  7:36     ` Michal Kazior
  2014-02-28  1:18   ` Ben Greear
  1 sibling, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-02-27 16:08 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 02/26/2014 10:51 PM, Michal Kazior wrote:
> On 27 February 2014 00:38, Ben Greear <greearb@candelatech.com> wrote:
>>
>> I'm using raw-rx mode in my firmware, and its throughput is
>> extremely poor at higher speeds.  I see large numbers of warnings
>> about msdu_chaining, so I suspect this may be the reason.
>
> I assume this requires your custom FW?

Yes.  I am not sure you can use raw tx with standard firmware (yet)?

Or at least, you cannot have tx cooked and rx raw.

>> Does anyone have any suggestions for what needs to be done
>> to handle these sorts of frames properly?
>>
>>                          /* FIXME: we do not support chaining yet.
>>                           * this needs investigation */
>>                          if (msdu_chaining) {
>>                                  ath10k_warn("htt rx msdu_chaining is true\n");
>>                                  ath10k_htt_rx_free_msdu_chain(msdu_head);
>>                                  continue;
>>                          }
>
> From what I understand chained msdu is a msdu that hasn't fit into the
> rx buffer and is split across the popped amsdu list. I suspect only
> the first msdu in chain has the htt_rx_desc and all other have not
> (this is what the current code does, but you'll need to verify that).
>
> I would try to concatenate all msdus into one (lots of memcpy :( ) or
> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
> single buffer (hopefully FW/HW is capable of doing that).

I think the hardware/firmware may not try to re-assemble the skb in raw
mode, or when no peer object can be found.  This might be an issue in
monitor mode as well, but I'm not certain about that.

I think there is a way to chain skb data chunks w/out doing the copying,
but I am not certain how that would be done properly.
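
A rough user-space sketch of the no-copy idea, assuming the chunks stay
where they are and a small table records their order. struct frag_frame
and its helpers are hypothetical names for illustration; in the kernel the
analogous mechanism would be an skb's fragment list, which this sketch does
not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 8

/* One chunk of frame data that stays in its original buffer. */
struct frag {
	const unsigned char *data;
	size_t len;
};

/* A logical frame described as an ordered table of chunks (no copy). */
struct frag_frame {
	struct frag frags[MAX_FRAGS];
	size_t nr_frags;
	size_t total_len;
};

/* Append a chunk by reference. Returns 0 on success, -1 if the table is full. */
static int frag_frame_add(struct frag_frame *f, const unsigned char *data,
			  size_t len)
{
	assert(f != NULL);
	if (f->nr_frags == MAX_FRAGS)
		return -1;
	f->frags[f->nr_frags].data = data;
	f->frags[f->nr_frags].len = len;
	f->nr_frags++;
	f->total_len += len;
	return 0;
}

/* Read the byte at a logical offset by walking the chunks; -1 if out of range. */
static int frag_frame_byte_at(const struct frag_frame *f, size_t off)
{
	size_t i;

	for (i = 0; i < f->nr_frags; i++) {
		if (off < f->frags[i].len)
			return f->frags[i].data[off];
		off -= f->frags[i].len;
	}
	return -1;
}
```

The trade-off is that every consumer has to walk the table instead of
assuming one linear buffer.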

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-02-27  6:51 ` Michal Kazior
  2014-02-27 16:08   ` Ben Greear
@ 2014-02-28  1:18   ` Ben Greear
  2014-02-28  7:41     ` Michal Kazior
  1 sibling, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-02-28  1:18 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 02/26/2014 10:51 PM, Michal Kazior wrote:

> From what I understand chained msdu is a msdu that hasn't fit into the
> rx buffer and is split across the popped amsdu list. I suspect only
> the first msdu in chain has the htt_rx_desc and all other have not
> (this is what the current code does, but you'll need to verify that).
> 
> I would try to concatenate all msdus into one (lots of memcpy :( ) or
> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
> single buffer (hopefully FW/HW is capable of doing that).

I got this working, basically using memcpy approach.

Throughput is comparable to what I was seeing on stock firmware,
but I do notice an issue:

The reported rx speed is almost always 6Mbps when I drive at high speeds.

(At low speeds I see rx rate reported at 1.3Gbps most of the time.)

I am pretty sure this is related to the msdu-chaining somehow.

There are lots of FIXMEs in the ath10k_htt_rx_amsdu_pop method.

Any idea where the rx rate problem might lie?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-02-27 16:08   ` Ben Greear
@ 2014-02-28  7:36     ` Michal Kazior
  2014-03-03 22:13       ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Kazior @ 2014-02-28  7:36 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 27 February 2014 17:08, Ben Greear <greearb@candelatech.com> wrote:
> On 02/26/2014 10:51 PM, Michal Kazior wrote:
>>
>> On 27 February 2014 00:38, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>>
>>> I'm using raw-rx mode in my firmware, and its throughput is
>>> extremely poor at higher speeds.  I see large numbers of warnings
>>> about msdu_chaining, so I suspect this may be the reason.
>>
>>
>> I assume this requires your custom FW?
>
>
> Yes.  I am not sure you can use raw tx with standard firmware (yet)?
>
> Or at least, you cannot have tx cooked and rx raw.

I recall there were issues with different tx/rx modes. I think the
10.x branch suffers from this, but FW 636 does not, as it has the
tx_encap_type vdev parameter.


>>> Does anyone have any suggestions for what needs to be done
>>> to handle these sorts of frames properly?
>>>
>>>                          /* FIXME: we do not support chaining yet.
>>>                           * this needs investigation */
>>>                          if (msdu_chaining) {
>>>                                  ath10k_warn("htt rx msdu_chaining is true\n");
>>>                                  ath10k_htt_rx_free_msdu_chain(msdu_head);
>>>                                  continue;
>>>                          }
>>
>>
>> From what I understand chained msdu is a msdu that hasn't fit into the
>> rx buffer and is split across the popped amsdu list. I suspect only
>> the first msdu in chain has the htt_rx_desc and all other have not
>> (this is what the current code does, but you'll need to verify that).
>>
>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>> single buffer (hopefully FW/HW is capable of doing that).
>
>
> I think in raw mode the hardware/firmware may not try to re-assemble the skb,
> or also when no peer object can be found.  This might be an issue in monitor
> mode as well, but I'm not certain about that.
>
> I think there is a way to chain skb data chunks w/out doing the copying,
> but I am not certain how that would be done properly.

Did you try increasing the HTT_RX_BUF_SIZE and see if you get complete
frames instead of chained ones?


Michał


* Re: Need to get msdu-chaining working.
  2014-02-28  1:18   ` Ben Greear
@ 2014-02-28  7:41     ` Michal Kazior
  2014-02-28 19:12       ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Kazior @ 2014-02-28  7:41 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 28 February 2014 02:18, Ben Greear <greearb@candelatech.com> wrote:
> On 02/26/2014 10:51 PM, Michal Kazior wrote:
>
>> From what I understand chained msdu is a msdu that hasn't fit into the
>> rx buffer and is split across the popped amsdu list. I suspect only
>> the first msdu in chain has the htt_rx_desc and all other have not
>> (this is what the current code does, but you'll need to verify that).
>>
>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>> single buffer (hopefully FW/HW is capable of doing that).
>
> I got this working, basically using memcpy approach.
>
> Throughput is comparable what I was seeing on stock firmware,
> but I do notice an issue:

Doing memcpy() is pretty bad, though. It'll hurt you, sooner or later,
or hurt others with less powerful host systems.

The problem is that chained msdus suggest A-MSDU rx. This means you
re-assemble the aggregate, just to pass it to mac80211, which splits
the frame into 802.3 frames (again, memcpy). This can easily flush your
d-cache.
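
To make the splitting step concrete, here is a small user-space sketch
that walks the subframes of an A-MSDU payload. It relies on the 802.11
subframe layout (6-byte DA, 6-byte SA, 2-byte big-endian length, payload,
padding to a 4-byte boundary on all but the last subframe). struct
subframe and amsdu_split() are hypothetical names; this is not mac80211's
code, and it only locates the subframes rather than building 802.3 frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AMSDU_SUBHDR_LEN 14 /* DA (6) + SA (6) + length (2) */

/* Pointers into the original buffer; nothing is copied. */
struct subframe {
	const uint8_t *da;
	const uint8_t *sa;
	const uint8_t *payload;
	size_t len;
};

/*
 * Walk an A-MSDU payload and record up to max_out subframes.
 * Stops at the first truncated subframe. Returns the number found.
 */
static size_t amsdu_split(const uint8_t *buf, size_t buf_len,
			  struct subframe *out, size_t max_out)
{
	size_t off = 0;
	size_t n = 0;

	assert(buf != NULL && out != NULL);
	while (n < max_out && off < buf_len &&
	       buf_len - off >= AMSDU_SUBHDR_LEN) {
		const uint8_t *hdr = buf + off;
		size_t len = ((size_t)hdr[12] << 8) | hdr[13];
		size_t sub = AMSDU_SUBHDR_LEN + len;

		if (sub > buf_len - off)
			break; /* truncated subframe */

		out[n].da = hdr;
		out[n].sa = hdr + 6;
		out[n].payload = hdr + AMSDU_SUBHDR_LEN;
		out[n].len = len;
		n++;

		/* The next subframe starts on a 4-byte boundary. */
		off += (sub + 3) & ~(size_t)3;
	}
	return n;
}
```

Reassembling a chain only to run a walk like this over it afterwards is
the double handling described above.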


> The reported rx speed is almost always 6Mbps when I drive at high speeds.
>
> (At low speeds I see rx rate reported at 1.3Gbps most of the time.)
>
> I am pretty sure this is related to the msdu-chaining somehow.
>
> There are lots of FIXME's in the ath10k_htt_rx_amsdu_pop method.
>
> Any idea where the rx rate problem might lie?

Interesting. Rx rates are computed from the htt_rx_indication
structure, not popped frames from rx ring. I'm guessing FW/HW doesn't
fill in some stuff for raw rx at all. You probably can't do much about
it if it's HW. You can probably hack your way around if it's just FW
not copying some register values.


Michał


* Re: Need to get msdu-chaining working.
  2014-02-28  7:41     ` Michal Kazior
@ 2014-02-28 19:12       ` Ben Greear
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Greear @ 2014-02-28 19:12 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 02/27/2014 11:41 PM, Michal Kazior wrote:
> On 28 February 2014 02:18, Ben Greear <greearb@candelatech.com> wrote:
>> On 02/26/2014 10:51 PM, Michal Kazior wrote:
>>
>>> From what I understand chained msdu is a msdu that hasn't fit into the
>>> rx buffer and is split across the popped amsdu list. I suspect only
>>> the first msdu in chain has the htt_rx_desc and all other have not
>>> (this is what the current code does, but you'll need to verify that).
>>>
>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>> single buffer (hopefully FW/HW is capable of doing that).
>>
>> I got this working, basically using memcpy approach.
>>
>> Throughput is comparable what I was seeing on stock firmware,
>> but I do notice an issue:
> 
> Doing memcpy() is pretty bad, though. It'll hurt you, sooner or later,
> or hurt others with less powerful host systems.
> 
> The problem is chained msdus suggest A-MSDU rx. This means you
> re-assemble the aggregate, just to pass it to mac80211 which splits
> the frame into 802.3 frames (again, memcpy). This can easily flush your
> d-cache.

That is interesting...maybe a better API would allow us to pass
a-msdu chains directly to mac80211?


>> The reported rx speed is almost always 6Mbps when I drive at high speeds.
>>
>> (At low speeds I see rx rate reported at 1.3Gbps most of the time.)
>>
>> I am pretty sure this is related to the msdu-chaining somehow.
>>
>> There are lots of FIXME's in the ath10k_htt_rx_amsdu_pop method.
>>
>> Any idea where the rx rate problem might lie?
> 
> Interesting. Rx rates are computed from the htt_rx_indication
> structure, not popped frames from rx ring. I'm guessing FW/HW doesn't
> fill in some stuff for raw rx at all. You probably can't do much about
> it if it's HW. You can probably hack your way around if it's just FW
> not copying some register values.

Ok, I'll go look at this in more detail.

I do see expected rates at lower speeds (where a-msdu does not happen),
so it's not *just* a problem with raw rx mode.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-02-28  7:36     ` Michal Kazior
@ 2014-03-03 22:13       ` Ben Greear
  2014-03-04  7:36         ` Michal Kazior
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-03-03 22:13 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 02/27/2014 11:36 PM, Michal Kazior wrote:

>>> rx buffer and is split across the popped amsdu list. I suspect only
>>> the first msdu in chain has the htt_rx_desc and all other have not
>>> (this is what the current code does, but you'll need to verify that).
>>>
>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>> single buffer (hopefully FW/HW is capable of doing that).

Just FYI:  At least on my firmware in raw rx mode, I increased
HTT_RX_BUF_SIZE (to 4 * 1920), and at least some chaining remains.
Performance did not change noticeably.  I'm using fairly powerful
core i7 processor systems, so maybe the memcpy doesn't
make enough difference to notice in my tests.

I did not put any effort into figuring out why.
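
For reference, a tiny sketch of the buffer arithmetic only. It assumes
the previous buffer size was 1920 bytes (inferred from "4 * 1920") and
ignores any per-buffer descriptor overhead; rx_bufs_needed() is a
hypothetical helper. It does not explain why chaining remained in this
test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of rx buffers a frame of frame_len bytes would occupy if each
 * buffer holds buf_size payload bytes (ceiling division).
 */
static size_t rx_bufs_needed(size_t frame_len, size_t buf_size)
{
	assert(buf_size > 0);
	return (frame_len + buf_size - 1) / buf_size;
}
```

As an illustrative input, 7935 bytes is the larger A-MSDU size limit in
802.11n: rx_bufs_needed(7935, 1920) is 5 and rx_bufs_needed(7935, 4 * 1920)
is 2, so a frame of that size would still span two buffers under these
assumptions.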

I'm currently getting about 540Mbps upload TCP goodput,
and only 420Mbps download TCP goodput.  Not sure why
the discrepancy, but perhaps the rx raw performance
is worse for a variety of reasons.  My firmware changes
to support multiple stations to same AP may also be slowing
things down, though these numbers are from  a single station
test...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-03-03 22:13       ` Ben Greear
@ 2014-03-04  7:36         ` Michal Kazior
  2014-03-04 16:59           ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Kazior @ 2014-03-04  7:36 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 3 March 2014 23:13, Ben Greear <greearb@candelatech.com> wrote:
> On 02/27/2014 11:36 PM, Michal Kazior wrote:
>
>>>> rx buffer and is split across the popped amsdu list. I suspect only
>>>> the first msdu in chain has the htt_rx_desc and all other have not
>>>> (this is what the current code does, but you'll need to verify that).
>>>>
>>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>>> single buffer (hopefully FW/HW is capable of doing that).
>
> Just FYI:  At least on my firmware in raw rx mode, increasing the
> HTT_RX_BUF_SIZE (to 4 * 1920) and at least some chaining remains.
> Performance did not change noticeably.  I'm using fairly powerful
> core i7 processor systems, so maybe the memcpy doesn't
> make enough difference to notice in my tests.
>
> I did not put any effort into figuring out why.

Getting rid of memcpy() was a huge performance win for AP135 and its
MIPS processor.


> I'm currently getting about 540Mbps upload TCP goodput,
> and only 420Mbps download TCP goodput.  Not sure why
> the discrepancy, but perhaps the rx raw performance
> is worse for a variety of reasons.  My firmware changes
> to support multiple stations to same AP may also be slowing
> things down, though these numbers are from  a single station
> test...

Hmm, I assume you test this without any bridging. It's probably going
to be a little slower due to tx timings being directly visible to the
TCP subsystem because both TCP and ath10k are locally on the same
machine. You could try moving the actual TCP endpoints behind bridges.

Or you're actually seeing the memcpy() at work...

Did you try to test performance on vanilla driver/firmware?


Michał


* Re: Need to get msdu-chaining working.
  2014-03-04  7:36         ` Michal Kazior
@ 2014-03-04 16:59           ` Ben Greear
  2014-03-05  8:09             ` Michal Kazior
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-03-04 16:59 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 03/03/2014 11:36 PM, Michal Kazior wrote:
> On 3 March 2014 23:13, Ben Greear <greearb@candelatech.com> wrote:
>> On 02/27/2014 11:36 PM, Michal Kazior wrote:
>>
>>>>> rx buffer and is split across the popped amsdu list. I suspect only
>>>>> the first msdu in chain has the htt_rx_desc and all other have not
>>>>> (this is what the current code does, but you'll need to verify that).
>>>>>
>>>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>>>> single buffer (hopefully FW/HW is capable of doing that).
>>
>> Just FYI:  At least on my firmware in raw rx mode, increasing the
>> HTT_RX_BUF_SIZE (to 4 * 1920) and at least some chaining remains.
>> Performance did not change noticeably.  I'm using fairly powerful
>> core i7 processor systems, so maybe the memcpy doesn't
>> make enough difference to notice in my tests.
>>
>> I did not put any effort into figuring out why.
> 
> Getting rid of memcpy() was a huge performance win for AP135 and its
> MIPS processor.

No doubt, but at this point, my problems appear to lie elsewhere.

>> I'm currently getting about 540Mbps upload TCP goodput,
>> and only 420Mbps download TCP goodput.  Not sure why
>> the discrepancy, but perhaps the rx raw performance
>> is worse for a variety of reasons.  My firmware changes
>> to support multiple stations to same AP may also be slowing
>> things down, though these numbers are from  a single station
>> test...
> 
> Hmm, I assume you test this without any bridging. It's probably going
> to be a little slower due to tx timings being directly visible to the
> TCP subsystem because both TCP and ath10k are locally on the same
> machine. You could try moving the actual TCP endpoints behind bridges.
> 
> Or you're actually seeing the memcpy() at work...
> 
> Did you try to test performance on vanilla driver/firmware?

I have used vanilla firmware on AP for all tests, because my firmware
will not do AP mode on WLE900VX for some reason.  Using my slightly modified
driver makes no noticeable difference (and it now works virtually identically
to upstream code when not using my modified firmware).

For station machine, vanilla firmware performs no better than my firmware,
and I see the same issue where upload is 150Mbps or so faster than
download.

I tried putting TCP/UDP endpoints on AP, and using AP as bridge, and both
cases have similar throughput.  Interestingly to me, UDP and TCP have similar
throughput, so it is unlikely that we are actually hitting limits on
the spectrum (otherwise, UDP would do better because it has no ACK packets
and generally runs much faster total throughput on wifi in my experience
with /a/b/g/n NICs).

With vanilla firmware, there should be little to no amsdu packets
(I assume), so it is unlikely to be related to memcpy.  perf top
shows no obvious hot spots in download test (running about 380Mbps
in this case):

  2.24%  [kernel]                      [k] swiotlb_tbl_unmap_single
  1.88%  [kernel]                      [k] do_raw_spin_lock
  1.88%  [kernel]                      [k] ioread32
  1.84%  [kernel]                      [k] tcp_packet
  1.53%  [mac80211]                    [k] ieee80211_rx_handlers
  1.31%  [kernel]                      [k] copy_user_generic_string
  1.19%  [ath10k_core]                 [k] ath10k_htt_rx_amsdu.isra.29
  1.14%  btserver                      [.] do_big_while()
  1.09%  [kernel]                      [k] _raw_spin_lock_irqsave

What throughputs are you seeing, and what NICs are you using for AP
and stations?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-03-04 16:59           ` Ben Greear
@ 2014-03-05  8:09             ` Michal Kazior
  2014-03-05 20:50               ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Kazior @ 2014-03-05  8:09 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 4 March 2014 17:59, Ben Greear <greearb@candelatech.com> wrote:
> On 03/03/2014 11:36 PM, Michal Kazior wrote:
>> On 3 March 2014 23:13, Ben Greear <greearb@candelatech.com> wrote:
>>> On 02/27/2014 11:36 PM, Michal Kazior wrote:
>>>
>>>>>> rx buffer and is split across the popped amsdu list. I suspect only
>>>>>> the first msdu in chain has the htt_rx_desc and all other have not
>>>>>> (this is what the current code does, but you'll need to verify that).
>>>>>>
>>>>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>>>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>>>>> single buffer (hopefully FW/HW is capable of doing that).
>>>
>>> Just FYI:  At least on my firmware in raw rx mode, increasing the
>>> HTT_RX_BUF_SIZE (to 4 * 1920) and at least some chaining remains.
>>> Performance did not change noticeably.  I'm using fairly powerful
>>> core i7 processor systems, so maybe the memcpy doesn't
>>> make enough difference to notice in my tests.
>>>
>>> I did not put any effort into figuring out why.
>>
>> Getting rid of memcpy() was a huge performance win for AP135 and its
>> MIPS processor.
>
> No doubt, but at this point, my problems appear to lie elsewhere.
>
>>> I'm currently getting about 540Mbps upload TCP goodput,
>>> and only 420Mbps download TCP goodput.  Not sure why
>>> the discrepancy, but perhaps the rx raw performance
>>> is worse for a variety of reasons.  My firmware changes
>>> to support multiple stations to same AP may also be slowing
>>> things down, though these numbers are from  a single station
>>> test...
>>
>> Hmm, I assume you test this without any bridging. It's probably going
>> to be a little slower due to tx timings being directly visible to the
>> TCP subsystem because both TCP and ath10k are locally on the same
>> machine. You could try moving the actual TCP endpoints behind bridges.
>>
>> Or you're actually seeing the memcpy() at work...
>>
>> Did you try to test performance on vanilla driver/firmware?
>
> I have used vanilla firmware on AP for all tests, because my firmware
> will not do AP mode on WLE900VX for some reason.  Using my slightly modified
> driver has no noticeable difference (and it now works virtually identical
> to upstream code when not using my modified firmware).
>
> For station machine, vanilla firmware performs no better than my firmware,
> and I see the same issue where upload is 150Mbps or so faster than
> download.
>
> I tried putting TCP/UDP endpoints on AP, and using AP as bridge, and both
> cases have similar throughput.  Interestingly to me, UDP and TCP have similar
> throughput, so it is unlikely that we are actually hitting limits on
> the spectrum (otherwise, UDP would do better because it has no ACK packets
> and generally runs much faster total throughput on wifi in my experience
> with /a/b/g/n NICs).
>
> With vanilla firmware, there should be little to no amsdu packets
> (I assume), so it is unlikely to be related to memcpy.  perf top

Incorrect. There's actually quite a lot of amsdu with vanilla firmware
(keep in mind I refer to nwifi rx). In the early days ath10k was
stitching msdus back together just to have them torn down again in
mac80211, and this was hurting performance.


> shows no obvious hot spots in download test (running about 380Mbps
> in this case):
>
>   2.24%  [kernel]                      [k] swiotlb_tbl_unmap_single
>   1.88%  [kernel]                      [k] do_raw_spin_lock
>   1.88%  [kernel]                      [k] ioread32
>   1.84%  [kernel]                      [k] tcp_packet
>   1.53%  [mac80211]                    [k] ieee80211_rx_handlers
>   1.31%  [kernel]                      [k] copy_user_generic_string
>   1.19%  [ath10k_core]                 [k] ath10k_htt_rx_amsdu.isra.29
>   1.14%  btserver                      [.] do_big_while()
>   1.09%  [kernel]                      [k] _raw_spin_lock_irqsave
>
> What throughputs are you seeing, and what NICs are you using for AP
> and stations?

With the current master branch, CUS223-CUS223 should get over 800Mbps in
UDP tx/rx and 700Mbps in TCP tx/rx with a cabled setup on a poor AP135.
Since AP135's CPU doesn't have any time to idle around I expect x86 to
perform better and I don't think OTA should be terribly slower in a
reasonably clean environment.


Michał


* Re: Need to get msdu-chaining working.
  2014-03-05  8:09             ` Michal Kazior
@ 2014-03-05 20:50               ` Ben Greear
  2014-03-06  7:36                 ` Michal Kazior
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2014-03-05 20:50 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k

On 03/05/2014 12:09 AM, Michal Kazior wrote:

> Incorrect. There's actually quite a lot of amsdu with vanilla firmware
> (keep in mind I refer to nwifi rx). In the early days ath10k was
> stitching msdus back just to be torn down again in mac80211 and this
> was hitting performance.

I guess with native wifi you do not see any chained amsdu
frames at ath10k now?  Out of curiosity, how do amsdu frames appear
in ath10k?  Does the firmware take care of everything
before it ever gets to the host for native wifi?

>> What throughputs are you seeing, and what NICs are you using for AP
>> and stations?
> 
> With current master branch CUS223-CUS223 should get over 800mbps in
> udp tx/rx and 700mbps in tcp tx/rx with cabled setup on a poor AP135.
> Since AP135's CPU doesn't have any time to idle around I expect x86 to
> perform better and I don't think OTA should be terribly slower in a
> reasonably clean environment.

Over-the-air, CUS223 wasn't any better than WLE900VX, so it is probably not
an issue with the NICs.  Rearranging the antennas gave me a bit of
a boost, but still only around 600Mbps TCP upload, and download was
still in the 400Mbps range.  There are no APs in scan range in the
lower end of 5GHz where I am running, so it should be pretty clean.

I cabled WLE900VX NICs up with 30dB attenuation, and I do see about 800Mbps
UDP upload, 700Mbps TCP upload, and 460Mbps TCP download.  Download still
sucks compared to upload for me.

I switched functions:  AP logic moved to the STA machine, and vice versa.
The mismatch between upload and download follows the station/AP roles,
not the hardware.

(In this test, both machines are using stock 467.1-1 firmware.)

Maybe something with how I am configuring the AP interfaces is the problem.

Can you verify that you see similar upload and download speeds for your
test scenario?

Also, at 800Mbps UDP, the main offender in perf top is a raw spin lock at
17%, but in general my system was not working too hard as far as I can tell.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Need to get msdu-chaining working.
  2014-03-05 20:50               ` Ben Greear
@ 2014-03-06  7:36                 ` Michal Kazior
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Kazior @ 2014-03-06  7:36 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

On 5 March 2014 21:50, Ben Greear <greearb@candelatech.com> wrote:
> On 03/05/2014 12:09 AM, Michal Kazior wrote:
>
>> Incorrect. There's actually quite a lot of amsdu with vanilla firmware
>> (keep in mind I refer to nwifi rx). In the early days ath10k was
>> stitching msdus back just to be torn down again in mac80211 and this
>> was hitting performance.
>
> I guess with native wifi you do not see any chained amsdu
> frames at ath10k now?  Out of curiosity, how do amsdu frames appear
> in ath10k?  Does the firmware take care of everything
> before it ever gets to the host for native wifi?

For nwifi amsdu you get linked skbuffs. Each skbuff is an 802.11
non-qos data frame. ath10k fixes the header (using the decapped header in
rx_hdr_status) and submits each subframe separately as an independent
802.11 frame. This required a small change in mac80211
(RX_FLAG_AMSDU_MORE) because all subframes share the same sequence
number and it conflicted with retransmission/deduplication
countermeasures.


>>> What throughputs are you seeing, and what NICs are you using for AP
>>> and stations?
>>
>> With current master branch CUS223-CUS223 should get over 800mbps in
>> udp tx/rx and 700mbps in tcp tx/rx with cabled setup on a poor AP135.
>> Since AP135's CPU doesn't have any time to idle around I expect x86 to
>> perform better and I don't think OTA should be terribly slower in a
>> reasonably clean environment.
>
> Over-the-air, CUS223 wasn't any better than WLE900VX, so probably not
> an issue with the NICs.  Arranging the antennas a bit gave me a bit of
> a boost, but still only around 600Mbps TCP upload, and download was
> still in the 400Mbps range.  There are no APs in scan range in the
> lower end of 5Ghz where I am running, so it should be pretty clean.
>
> I cabled WLE900VX NICs up with 30dB attenuation, and I do see about 800Mbps
> UDP upload, 700Mbps TCP upload, and 460Mbps TCP download.  Download still
> sucks compared to upload for me.
>
> I switched functions:  AP logic moved to STA machine, vice-versa.
> The mis-match between upload follows the stations/APs, not the hardware.
>
> (In this test, both machines are using stock 467.1-1 firmware.)
>
> Maybe something with how I am configuring APs interfaces is the problem.
>
> Can you verify that you see similar upload and download speeds for your
> test scenario?

I'm busy so I can't investigate this much, sorry.

Like I mentioned in my other mail - on AP135 UDP is 800Mbps, TCP is
700Mbps, both ways, and the bottleneck is AP135's CPU (MIPS 74Kc @
720MHz).

You might try to change your qdiscs to pfifo and check if it makes any
difference for you.


> Also, at 800Mbps UDP, perf top main offender is a 17% raw spin lock,
> but in general my system was not working too hard as far as I can tell.

To be honest I haven't tested x86-x86 performance for some time now.
I've been mainly focused on AP135. It bottlenecks elsewhere (it is
non-preemptive so spinlocks work completely differently).


Michał

