* fq_codel_drop vs a udp flood
@ 2016-05-01  3:41 Dave Taht
  2016-05-01  4:46 ` [Make-wifi-fast] " Jonathan Morton
                   ` (3 more replies)
  0 siblings, 4 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-01  3:41 UTC (permalink / raw)
  To: ath10k, codel, make-wifi-fast

There were a few things on this thread that went by, and I wasn't on
the ath10k list

(https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)

first up, udp flood...

>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>> Yeryomin <leroi.li...@gmail.com>
>>> Sent: Friday, April 8, 2016 8:14 PM
>>> To: ath10k@lists.infradead.org
>>> Subject: ath10k performance, master branch from 20160407
>>>
>>> Hello!
>>>
>>> I've seen performance patches were committed so I've decided to give it
>>> a try (using 4.1 kernel and backports).
>>> The results are quite disappointing: TCP download (client pov) dropped
>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>> 250Mbps, before (latest official backports release from January) I was
>>> able to get 900Mbps.
>>> Hardware is basically ap152 + qca988x 3x3.
>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>> Here is the output when running iperf3 UDP test:
>>>
>>>     45.78%  [kernel]       [k] fq_codel_drop
>>>      3.05%  [kernel]       [k] ag71xx_poll
>>>      2.18%  [kernel]       [k] skb_release_data
>>>      2.01%  [kernel]       [k] r4k_dma_cache_inv

The udp flood behavior is not "weird".  The test is wrong. It is so filling
the local queue as to dramatically exceed the bandwidth on the link.

The size of the local queue has exceeded anything rational, gentle
tcp-friendly methods have failed, we're out of configured queue space,
 and as a last ditch move, fq_codel_drop is attempting to reduce the
backlog via brute force.

Approaches:

0) Fix the test

The udp flood test should seek an operating point roughly equal to
the bandwidth of the link, to where there is near zero queuing delay,
and nearly 100% utilization.

There are several well known methods for an endpoint to seek
equilibrium, - filling the pipe and not the queue - notably the ones
outlined in this:

http://ee.lbl.gov/papers/congavoid.pdf

are a good starting point for further research. :)

Now, a unicast flood test is useful for figuring out how many packets
can fit in a link (both large and small), and tweaking the cpu (or
running a box out of memory).

However -

I have seen a lot of udp flood tests that are constructed badly.

Measuring time to *send* X packets without counting the queue length
in the test is one. This was iperf3 with what options, exactly? Running
locally or via a test client connected via ethernet? (so at local cpu
speeds, rather than the network ingress speed?)

Simple test of your test: if your udp flood test tool reports a better
result with a 10000 packet local queue than a 1000 packet one, it's
broken.

A "Good" udp flood test merely counts the number of *received* packets
and bytes over some (set of) intervals, gradually ramping up until it
sees no further improvements. A better one might shock the system and
try to measure the rate controller or aggregator as well, AND count
and graph packet loss over time, etc.
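
To make that concrete, a rough sketch of the receive side of such a test
(a hypothetical stand-alone counter, not any existing tool; the port and
the one-second report interval are arbitrary). The sender then just ramps
its offered load until these numbers stop improving:

/* udp_rx_count.c - count what actually arrived, per interval.
 * Build: cc -O2 -o udp_rx_count udp_rx_count.c
 */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        char buf[65536];
        unsigned long pkts = 0, bytes = 0;
        time_t last = time(NULL);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5001);            /* arbitrary test port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("socket/bind");
                return 1;
        }
        for (;;) {
                ssize_t n = recv(fd, buf, sizeof(buf), 0);

                if (n < 0)
                        continue;
                pkts++;
                bytes += n;
                if (time(NULL) != last) {       /* report roughly once a second */
                        printf("%lu pkts, %lu bytes, ~%.1f Mbit/s\n",
                               pkts, bytes, bytes * 8 / 1e6);
                        pkts = bytes = 0;
                        last = time(NULL);
                }
        }
}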

And then there are side effects like running out of cpu on an artificial
test. Still, in the real world, udp floods exist, and we can rip some
of the cpu cost out of fq_codel_drop.

fq_codel_drop looks through 1024 queues in the mainline version and
4096 in this. [4] That's *expensive*.

1) fq_codel_drop should probably bump up the codel count on every drop
to give the main portion of the algorithm a higher drop frequency,
faster.

Won't hurt, but won't help much in the face of a large disparity of
input vs output rates for a fairly long time. A smaller disparity
(like with gigE feeding 800mbit wifi) will naturally have the main
part of the algo kick in sooner.

2) fq_codel_drop can simply taildrop. That would cut the cpu cost by
quite a lot and make the udp flood test easier to "pass".

It does little in the real world to actually shoot at the offending
flow, and a serious flood will end up hurting flows that are behaving sanely.

I favor this option as it is cheap and more or less what happened in
the pre-fq_codeled world. Coupling it with 1 above doesn't quite work
as well as you might want, either, but might help.

3) Steering - you could store the size of, and a ptr to, the biggest of
all the flows and drop from the head of that.

Or, to give more friendly behavior, store the top 3 and circulate between
them.

This incurs an ongoing cpu cost on every queue/dequeue of a packet.

4) Do it more along the lines of per-station airtime fairness (find the
station with the biggest backlog) and have a smaller number of fq_codel
queues per station. For most purposes, honestly, 64 queues per station
sounds like plenty at the moment.
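
A userspace toy of that shape (all names and sizes here are invented; this
is not driver or mac80211 code): keep a cached per-station backlog total,
pick the fattest station first, and only then scan that station's small
set of queues, so the drop path touches NSTA + 64 entries instead of
thousands:

#include <stdio.h>

#define NSTA    32
#define NQ      64      /* fq_codel-style queues per station, as above */

static unsigned int qbacklog[NSTA][NQ]; /* bytes queued per queue      */
static unsigned int sta_backlog[NSTA];  /* cached per-station totals   */

static void toy_enqueue(int sta, int q, unsigned int bytes)
{
        qbacklog[sta][q] += bytes;
        sta_backlog[sta] += bytes;
}

/* On overload: scan NSTA totals, then NQ queues - not NSTA * NQ queues. */
static void toy_drop_one(void)
{
        int s = 0, q = 0;
        unsigned int pkt;

        for (int i = 1; i < NSTA; i++)
                if (sta_backlog[i] > sta_backlog[s])
                        s = i;
        for (int i = 1; i < NQ; i++)
                if (qbacklog[s][i] > qbacklog[s][q])
                        q = i;
        if (!qbacklog[s][q])
                return;                 /* nothing queued anywhere */
        pkt = qbacklog[s][q] < 1500 ? qbacklog[s][q] : 1500;
        printf("drop from sta %d queue %d (%u bytes backlogged)\n",
               s, q, qbacklog[s][q]);
        qbacklog[s][q] -= pkt;          /* pretend we freed one packet */
        sta_backlog[s] -= pkt;
}

int main(void)
{
        toy_enqueue(3, 7, 100 * 1500);  /* a fat flow on station 3 */
        toy_enqueue(9, 1, 2 * 1500);    /* a sparse flow elsewhere */
        toy_drop_one();
        return 0;
}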

...

I am painfully aware we have a long way to go to get this right, but
http://blog.cerowrt.org/post/rtt_fair_on_wifi/ is the endgame for
normal traffic....

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-01  3:41 fq_codel_drop vs a udp flood Dave Taht
@ 2016-05-01  4:46 ` Jonathan Morton
  2016-05-01  5:08 ` Ben Greear
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 108+ messages in thread
From: Jonathan Morton @ 2016-05-01  4:46 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, codel, ath10k


> On 1 May, 2016, at 06:41, Dave Taht <dave.taht@gmail.com> wrote:
> 
> fq_codel_drop looks through 1024 queues in the mainline version and
> 4096 in this. [4] That's *expensive*.

Cake originally inherited this behaviour.

Some time ago, I changed it to search only the queues in the active lists.  In a simple case like this, that means it only has to consider 1-2 queues each time, which is a huge improvement and sufficient for any normal traffic situation.

The theoretical worst-case performance (eg. under a DDoS) remains poor.  To fix that, I think we would need to maintain a max-heap of queue lengths (which is a generalisation of the “keep 3 longest queues” idea).  On the upside, this heap would only need to be maintained when drops actually occur, and would remain fast in simple cases where only one or two queues are disproportionately long.
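
As a toy illustration of that structure (userspace only, with invented
names - this is neither Cake nor fq_codel code, and it sidesteps the
question of exactly when to maintain it): a binary max-heap over per-flow
backlogs plus a position map gives O(1) access to the fattest queue and
O(log n) fix-up whenever a backlog changes:

#include <stdio.h>

#define NFLOWS 1024

static unsigned int backlog[NFLOWS];    /* bytes queued per flow       */
static int heap[NFLOWS];                /* heap of flow indices        */
static int pos[NFLOWS];                 /* flow index -> heap position */
static int heap_len;

static void heap_swap(int a, int b)
{
        int fa = heap[a], fb = heap[b];

        heap[a] = fb; heap[b] = fa;
        pos[fb] = a;  pos[fa] = b;
}

static void sift_up(int i)
{
        while (i > 0) {
                int p = (i - 1) / 2;

                if (backlog[heap[i]] <= backlog[heap[p]])
                        break;
                heap_swap(i, p);
                i = p;
        }
}

static void sift_down(int i)
{
        for (;;) {
                int l = 2 * i + 1, r = l + 1, big = i;

                if (l < heap_len && backlog[heap[l]] > backlog[heap[big]])
                        big = l;
                if (r < heap_len && backlog[heap[r]] > backlog[heap[big]])
                        big = r;
                if (big == i)
                        break;
                heap_swap(i, big);
                i = big;
        }
}

/* Call whenever a flow's backlog changes (enqueue, dequeue, drop). */
static void flow_backlog_set(int flow, unsigned int bytes)
{
        backlog[flow] = bytes;
        sift_up(pos[flow]);
        sift_down(pos[flow]);
}

int main(void)
{
        for (int i = 0; i < NFLOWS; i++) {
                heap[i] = i;
                pos[i] = i;
        }
        heap_len = NFLOWS;

        flow_backlog_set(7, 1500);              /* a sparse flow        */
        flow_backlog_set(42, 64 * 1500);        /* the fat flow         */
        flow_backlog_set(3, 6 * 1500);          /* something in between */
        printf("fattest flow: %d (%u bytes queued)\n",
               heap[0], backlog[heap[0]]);
        return 0;
}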

It’s just as easy to head-drop as tail-drop, once you decide to be queue-fair (which *is* desirable to make DoS attacks harder).  It is the queue-fairness which is CPU intensive (and can be optimised, see above).  Head-drop is theoretically superior, since it gets the congestion signal to the receiver faster.  An unresponsive flow won’t care either way, but a responsive one will.

 - Jonathan Morton



* Re: fq_codel_drop vs a udp flood
  2016-05-01  3:41 fq_codel_drop vs a udp flood Dave Taht
  2016-05-01  4:46 ` [Make-wifi-fast] " Jonathan Morton
@ 2016-05-01  5:08 ` Ben Greear
  2016-05-01  5:23   ` Dave Taht
  2016-05-01 17:59 ` [Codel] " Eric Dumazet
  2016-05-02 13:47 ` [Make-wifi-fast] " Roman Yeryomin
  3 siblings, 1 reply; 108+ messages in thread
From: Ben Greear @ 2016-05-01  5:08 UTC (permalink / raw)
  To: Dave Taht, ath10k, codel, make-wifi-fast



On 04/30/2016 08:41 PM, Dave Taht wrote:
> There were a few things on this thread that went by, and I wasn't on
> the ath10k list
>
> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>
> first up, udp flood...
>
>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>> Yeryomin <leroi.li...@gmail.com>
>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>> To: ath10k@lists.infradead.org
>>>> Subject: ath10k performance, master branch from 20160407
>>>>
>>>> Hello!
>>>>
>>>> I've seen performance patches were committed so I've decided to give it
>>>> a try (using 4.1 kernel and backports).
>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>> 250Mbps, before (latest official backports release from January) I was
>>>> able to get 900Mbps.
>>>> Hardware is basically ap152 + qca988x 3x3.
>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>> Here is the output when running iperf3 UDP test:
>>>>
>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>
> The udp flood behavior is not "weird".  The test is wrong. It is so filling
> the local queue as to dramatically exceed the bandwidth on the link.

It would be nice if you could provide backpressure so that you could
simply select on the udp socket and use that to know when you can send
more frames??

Any idea how that works with codel?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: fq_codel_drop vs a udp flood
  2016-05-01  5:08 ` Ben Greear
@ 2016-05-01  5:23   ` Dave Taht
  2016-05-01 14:47     ` [Make-wifi-fast] " dpreed
  0 siblings, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-01  5:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: make-wifi-fast, codel, ath10k

On Sat, Apr 30, 2016 at 10:08 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 08:41 PM, Dave Taht wrote:
>>
>> There were a few things on this thread that went by, and I wasn't on
>> the ath10k list
>>
>> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>>
>> first up, udp flood...
>>
>>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>>> Yeryomin <leroi.li...@gmail.com>
>>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>>> To: ath10k@lists.infradead.org
>>>>> Subject: ath10k performance, master branch from 20160407
>>>>>
>>>>> Hello!
>>>>>
>>>>> I've seen performance patches were committed so I've decided to give it
>>>>> a try (using 4.1 kernel and backports).
>>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>>> 250Mbps, before (latest official backports release from January) I was
>>>>> able to get 900Mbps.
>>>>> Hardware is basically ap152 + qca988x 3x3.
>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>>> Here is the output when running iperf3 UDP test:
>>>>>
>>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>>
>>
>> The udp flood behavior is not "weird".  The test is wrong. It is so
>> filling
>> the local queue as to dramatically exceed the bandwidth on the link.
>
>
> It would be nice if you could provide backpressure so that you could
> simply select on the udp socket and use that to know when you can send
> more frames??

The qdisc version returns  NET_XMIT_CN to the upper layers of the
stack in the case
where the dropped packet's flow = the ingress packet's flow, but that
is after the
exhaustive search...

I don't know what effect (if any) that had on udp sockets. Hmm... will
look. Eric would "just know".

That might provide more backpressure in the local scenario. SO_SNDBUF
should interact with this stuff in some sane way...

... but over the wire from a test driver box elsewhere, tho, aside
from ethernet flow control itself, where enabled, no.

... but in that case you have a much lower inbound/outbound
performance disparity in the general case to start with... which can
still be quite high...

>
> Any idea how that works with codel?

Beautifully.

For responsive TCP flows. It immediately reduces the window without a RTT.

> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-01  5:23   ` Dave Taht
@ 2016-05-01 14:47     ` dpreed
  2016-05-02 14:03       ` Roman Yeryomin
  0 siblings, 1 reply; 108+ messages in thread
From: dpreed @ 2016-05-01 14:47 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, Ben Greear, codel, ath10k

Maybe I missed something, but why is it important to optimize for a UDP flood?

A general observation of control theory is that there is almost always an adversarial strategy that will destroy any control regime. Sometimes one has to invoke an "oracle" that knows the state of the control system at all times to get there.

So a handwave is that *there is always a DDoS that will work* no matter how clever you are.

And the corollary is illustrated by the TSA. If you can't anticipate all possible attacks, it is not clearly better to just congest the whole system at all times with controls that can't possibly solve all possible attacks - i.e. Security Theater. We don't want "anti-DDoS theater" I don't think.

There is an alternative mechanism that has been effective at dealing with DDoS in general - track the disruption back to the source and kill it.  (this is what the end-to-end argument would be: don't try to solve a fundamentally end-to-end problem, DDoS, solely in the network [switches], since you have to solve it at the edges anyway. Just include in the network things that will help you solve it at the edges - traceback tools that work fast and targeted shutdown of sources).

I don't happen to know of a "normal" application that benefits from UDP flooding - not even "gossip protocols" do that!

In context, then, let's not focus on UDP flood performance (or any other "extreme case" that just seems fun to work on in a research paper because it is easy to state compared to the real world) too much.

I know that the reaction to this post will be to read it and pretty much go on as usual focusing on UDP floods. But I have to try. There are so many more important issues (like understanding how to use congestion signalling in gossip protocols, gaming, or live AV conferencing better, as some related examples, which are end-to-end problems for which queue management and congestion signalling are truly crucial).



On Sunday, May 1, 2016 1:23am, "Dave Taht" <dave.taht@gmail.com> said:

> On Sat, Apr 30, 2016 at 10:08 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 08:41 PM, Dave Taht wrote:
>>>
>>> There were a few things on this thread that went by, and I wasn't on
>>> the ath10k list
>>>
>>> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>>>
>>> first up, udp flood...
>>>
>>>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>>>> Yeryomin <leroi.li...@gmail.com>
>>>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>>>> To: ath10k@lists.infradead.org
>>>>>> Subject: ath10k performance, master branch from 20160407
>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I've seen performance patches were committed so I've decided to give it
>>>>>> a try (using 4.1 kernel and backports).
>>>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>>>> 250Mbps, before (latest official backports release from January) I was
>>>>>> able to get 900Mbps.
>>>>>> Hardware is basically ap152 + qca988x 3x3.
>>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>>>> Here is the output when running iperf3 UDP test:
>>>>>>
>>>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>>>
>>>
>>> The udp flood behavior is not "weird".  The test is wrong. It is so
>>> filling
>>> the local queue as to dramatically exceed the bandwidth on the link.
>>
>>
>> It would be nice if you could provide backpressure so that you could
>> simply select on the udp socket and use that to know when you can send
>> more frames??
> 
> The qdisc version returns  NET_XMIT_CN to the upper layers of the
> stack in the case
> where the dropped packet's flow = the ingress packet's flow, but that
> is after the
> exhaustive search...
> 
> I don't know what effect (if any) that had on udp sockets. Hmm... will
> look. Eric would "just know".
> 
> That might provide more backpressure in the local scenario. SO_SNDBUF
> should interact with this stuff in some sane way...
> 
> ... but over the wire from a test driver box elsewhere, tho, aside
> from ethernet flow control itself, where enabled, no.
> 
> ... but in that case you have a much lower inbound/outbound
> performance disparity in the general case to start with... which can
> still be quite high...
> 
>>
>> Any idea how that works with codel?
> 
> Beautifully.
> 
> For responsive TCP flows. It immediately reduces the window without a RTT.
> 
>> Thanks,
>> Ben
>>
>> --
>> Ben Greear <greearb@candelatech.com>
>> Candela Technologies Inc  http://www.candelatech.com
> 
> 
> 
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
> 




* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01  3:41 fq_codel_drop vs a udp flood Dave Taht
  2016-05-01  4:46 ` [Make-wifi-fast] " Jonathan Morton
  2016-05-01  5:08 ` Ben Greear
@ 2016-05-01 17:59 ` Eric Dumazet
  2016-05-01 18:20   ` Jonathan Morton
                     ` (2 more replies)
  2016-05-02 13:47 ` [Make-wifi-fast] " Roman Yeryomin
  3 siblings, 3 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-01 17:59 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, codel, ath10k

On Sat, 2016-04-30 at 20:41 -0700, Dave Taht wrote:
> >>>
> >>>     45.78%  [kernel]       [k] fq_codel_drop
> >>>      3.05%  [kernel]       [k] ag71xx_poll
> >>>      2.18%  [kernel]       [k] skb_release_data
> >>>      2.01%  [kernel]       [k] r4k_dma_cache_inv
> 
> The udp flood behavior is not "weird".  The test is wrong. It is so filling
> the local queue as to dramatically exceed the bandwidth on the link.

Well, just _kill_ the offender, instead of trying to be gentle.

fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
single one.

It is too cpu intensive to be kind to the elephant, since under pressure
fq_codel_drop() needs to be called for every enqueue.

Really, we should not try to let inelastic flows hurt us.

I can provide a patch.





* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 17:59 ` [Codel] " Eric Dumazet
@ 2016-05-01 18:20   ` Jonathan Morton
  2016-05-01 18:46     ` Eric Dumazet
  2016-05-03  2:26     ` [Codel] fq_codel_drop vs a udp flood Dave Taht
  2016-05-01 18:26   ` Dave Taht
  2016-05-02 14:09   ` Roman Yeryomin
  2 siblings, 2 replies; 108+ messages in thread
From: Jonathan Morton @ 2016-05-01 18:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: make-wifi-fast, codel, Dave Taht, ath10k


> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
> single one.

Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-of-magnitude longer timescale than may have been configured as optimum.

The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.

 - Jonathan Morton



* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 17:59 ` [Codel] " Eric Dumazet
  2016-05-01 18:20   ` Jonathan Morton
@ 2016-05-01 18:26   ` Dave Taht
  2016-05-01 22:30     ` Eric Dumazet
  2016-05-02 14:09   ` Roman Yeryomin
  2 siblings, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-01 18:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: make-wifi-fast, codel, ath10k

On Sun, May 1, 2016 at 10:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-30 at 20:41 -0700, Dave Taht wrote:
>> >>>
>> >>>     45.78%  [kernel]       [k] fq_codel_drop
>> >>>      3.05%  [kernel]       [k] ag71xx_poll
>> >>>      2.18%  [kernel]       [k] skb_release_data
>> >>>      2.01%  [kernel]       [k] r4k_dma_cache_inv
>>
>> The udp flood behavior is not "weird".  The test is wrong. It is so filling
>> the local queue as to dramatically exceed the bandwidth on the link.
>
> Well, just _kill_ the offender, instead of trying to be gentle.

I like it. :) Killing off a malfunctioning program flooding the local
network interface (intentionally or unintentionally) seems like a
useful idea.

it will break some test tools that deserve to be broken, too.

> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
> single one.
>
> It is too cpu intensive to be kind to the elephant, since under pressure
> fq_codel_drop() needs to be called for every enqueue.

A somewhat gentler approach might be to drop 3 or more packets per
fq_codel_drop round - or "nearly" the entire flow (all but the last
packet).

But sure, dropping *all* of an unresponsive elephant (as more will be
arriving), in the event of the extreme overload that hitting
fq_codel_drop represents, sounds pretty good.

> Really, we should not try to let inelastic flows hurt us.

+10.

>
> I can provide a patch.

Killing the bad program, and dropping all of the fattest flow strike
me as two patches.[1]

This approach is akin to some of the thoughts in here:

https://tools.ietf.org/html/draft-ietf-tsvwg-circuit-breaker-14 [2]

... all that said, is there any way to exert flow control on a udp
socket from down in these layers?

[1] what sort of error code should a program killed for flooding the network
 return?

[2] I don't actually agree with some of the thinking in the circuit
breakers doc, but.... this is something of an outgrowth of the
obsolete and wrong source quench idea, which might be useful history
for someone: https://tools.ietf.org/html/rfc6633.






-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 18:20   ` Jonathan Morton
@ 2016-05-01 18:46     ` Eric Dumazet
  2016-05-01 19:55       ` Eric Dumazet
  2016-05-01 20:35       ` Jonathan Morton
  2016-05-03  2:26     ` [Codel] fq_codel_drop vs a udp flood Dave Taht
  1 sibling, 2 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-01 18:46 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On Sun, 2016-05-01 at 21:20 +0300, Jonathan Morton wrote:
> > On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
> > single one.
> 
> Unfortunately, that could have bad consequences if the “fat flow”
> happens to be a TCP in slow-start on a long-RTT path.  Such a flow is
> responsive, but on an order-of-magnitude longer timescale than may have
> been configured as optimum.

Are you trying to reinvent Hystart ? ;)
> 
> The real problem is that fq_codel_drop() performs the same (excessive)
> amount of work to cope with a single unresponsive flow as it would for
> a true DDoS.  Optimising the search function is sufficient.

Optimizing the search function is not possible, unless you slow down the
fast path. This was my design choice.

Just drop half backlog packets instead of 1, (maybe add a cap of 64
packets to avoid too big bursts of kfree_skb() which might add cpu
spikes) and problem is gone.

TCP in slow start won't be hurt at all. A fat TCP flow is still fat.

Only bad CC could possibly be hurt.





* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 18:46     ` Eric Dumazet
@ 2016-05-01 19:55       ` Eric Dumazet
  2016-05-02  7:47         ` Jesper Dangaard Brouer
  2016-05-01 20:35       ` Jonathan Morton
  1 sibling, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-01 19:55 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On Sun, 2016-05-01 at 11:46 -0700, Eric Dumazet wrote:

> Just drop half backlog packets instead of 1, (maybe add a cap of 64
> packets to avoid too big bursts of kfree_skb() which might add cpu
> spikes) and problem is gone.
> 

I used the following patch and it indeed solved the issue in my tests.

(Not the DDOS case, but when few fat flows are really bad citizens)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5e420b3d4ab..0cb8699624bc 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -135,11 +135,11 @@ static inline void flow_queue_add(struct fq_codel_flow *flow,
 	skb->next = NULL;
 }
 
-static unsigned int fq_codel_drop(struct Qdisc *sch)
+static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	unsigned int maxbacklog = 0, idx = 0, i, len;
+	unsigned int maxbacklog = 0, idx = 0, i, len = 0;
 	struct fq_codel_flow *flow;
 
 	/* Queue is full! Find the fat flow and drop packet from it.
@@ -153,15 +153,26 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 			idx = i;
 		}
 	}
+	/* As the search was painful, drop half bytes of this fat flow.
+	 * Limit to max packets to not inflict too big latencies,
+	 * as kfree_skb() might be quite expensive.
+	 */
+	maxbacklog >>= 1;
+
 	flow = &q->flows[idx];
-	skb = dequeue_head(flow);
-	len = qdisc_pkt_len(skb);
+	for (i = 0; i < max;) {
+		skb = dequeue_head(flow);
+		len += qdisc_pkt_len(skb);
+		kfree_skb(skb);
+		i++;
+		if (len >= maxbacklog)
+			break;
+	}
+	sch->qstats.drops += i;
+	sch->qstats.backlog -= len;
 	q->backlogs[idx] -= len;
-	sch->q.qlen--;
-	qdisc_qstats_drop(sch);
-	qdisc_qstats_backlog_dec(sch, skb);
-	kfree_skb(skb);
-	flow->dropped++;
+	sch->q.qlen -= i;
+	flow->dropped += i;
 	return idx;
 }
 
@@ -170,14 +181,14 @@ static unsigned int fq_codel_qdisc_drop(struct Qdisc *sch)
 	unsigned int prev_backlog;
 
 	prev_backlog = sch->qstats.backlog;
-	fq_codel_drop(sch);
+	fq_codel_drop(sch, 1U);
 	return prev_backlog - sch->qstats.backlog;
 }
 
 static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
-	unsigned int idx, prev_backlog;
+	unsigned int idx, prev_backlog, prev_qlen;
 	struct fq_codel_flow *flow;
 	int uninitialized_var(ret);
 
@@ -206,16 +217,15 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
-	q->drop_overlimit++;
-	/* Return Congestion Notification only if we dropped a packet
-	 * from this flow.
-	 */
-	if (fq_codel_drop(sch) == idx)
-		return NET_XMIT_CN;
+	prev_qlen = sch->q.qlen;
+	ret = fq_codel_drop(sch, 64U);
+	q->drop_overlimit += prev_qlen - sch->q.qlen;
+
+	/* As we dropped packet(s), better let upper stack know this */
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+				  prev_backlog - sch->qstats.backlog);
 
-	/* As we dropped a packet, better let upper stack know this */
-	qdisc_tree_reduce_backlog(sch, 1, prev_backlog - sch->qstats.backlog);
-	return NET_XMIT_SUCCESS;
+	return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
 }
 
 /* This is the specific function called from codel_dequeue()




* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 18:46     ` Eric Dumazet
  2016-05-01 19:55       ` Eric Dumazet
@ 2016-05-01 20:35       ` Jonathan Morton
  2016-05-01 20:55         ` Eric Dumazet
  1 sibling, 1 reply; 108+ messages in thread
From: Jonathan Morton @ 2016-05-01 20:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: make-wifi-fast, codel, Dave Taht, ath10k


> On 1 May, 2016, at 21:46, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> Optimizing the search function is not possible, unless you slow down the
> fast path. This was my design choice.

I beg to differ.  Cake iterates over the queues in the bulk and sparse lists, rather than all queues full stop.  That’s a straightforward optimisation which covers the case in question here, and has no effect on the fast path.

 - Jonathan Morton


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 20:35       ` Jonathan Morton
@ 2016-05-01 20:55         ` Eric Dumazet
  2016-05-02 14:18           ` Roman Yeryomin
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-01 20:55 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On Sun, 2016-05-01 at 23:35 +0300, Jonathan Morton wrote:
> > On 1 May, 2016, at 21:46, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > Optimizing the search function is not possible, unless you slow down the
> > fast path. This was my design choice.
> 
> I beg to differ.  Cake iterates over the queues in the bulk and sparse
> lists, rather than all queues full stop.  That’s a straightforward
> optimisation which covers the case in question here, and has no effect
> on the fast path.


That's not going to help, sorry, if you have hundreds of flows in these
queues. (I had 200 TCP_STREAM running in my test, plus one hostile
UDP_STREAM)

It is going to be _much_ slower, since you'll bring way more memory into
the cpu caches, while the qdisc spinlock is held.

Better have a known cost of 1 cache line miss per drop, instead of 200
cache line misses per drop.

(4096 bytes to store q->backlogs[] array -> 64 cache lines.
If we drop 64 skb per pass, this averages to 1 cache line miss per drop)

Listen, I never thought people were going to use fq_codel in some
hostile env.

I simply designed it to be used in home routers, so I would not imagine
someone would be trying to kill its own Internet connection.

So I believe I will send this patch as is to David Miller for inclusion.




* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 18:26   ` Dave Taht
@ 2016-05-01 22:30     ` Eric Dumazet
  0 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-01 22:30 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, codel, ath10k

On Sun, 2016-05-01 at 11:26 -0700, Dave Taht wrote:
> On Sun, May 1, 2016 at 10:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Well, just _kill_ the offender, instead of trying to be gentle.
> 
> I like it. :) Killing off a malfunctioning program flooding the local
> network interface (intentionally or unintentionally) seems like a
> useful idea.
> 
..

> Killing the bad program, and dropping all of the fattest flow strike
> me as two patches.[1]

What I meant by 'killing' was to drop more than one packet from this fat
flow, not actually killing a task.

I will submit an official patch, dropping 50% of the fat flow backlog,
and a configurable cap of 64 packets to somewhat control max ->enqueue()
latency.








* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 19:55       ` Eric Dumazet
@ 2016-05-02  7:47         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-02  7:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, ath10k, make-wifi-fast

On Sun, 01 May 2016 12:55:53 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Sun, 2016-05-01 at 11:46 -0700, Eric Dumazet wrote:
> 
> > Just drop half backlog packets instead of 1, (maybe add a cap of 64
> > packets to avoid too big bursts of kfree_skb() which might add cpu
> > spikes) and problem is gone.
> >   
> 
> I used the following patch and it indeed solved the issue in my tests.
> 
> (Not the DDOS case, but when few fat flows are really bad citizens)
> 
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index a5e420b3d4ab..0cb8699624bc 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -135,11 +135,11 @@ static inline void flow_queue_add(struct fq_codel_flow *flow,
>  	skb->next = NULL;
>  }
>  
> -static unsigned int fq_codel_drop(struct Qdisc *sch)
> +static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max)
>  {
>  	struct fq_codel_sched_data *q = qdisc_priv(sch);
>  	struct sk_buff *skb;
> -	unsigned int maxbacklog = 0, idx = 0, i, len;
> +	unsigned int maxbacklog = 0, idx = 0, i, len = 0;
>  	struct fq_codel_flow *flow;
>  
>  	/* Queue is full! Find the fat flow and drop packet from it.
> @@ -153,15 +153,26 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
>  			idx = i;
>  		}
>  	}
> +	/* As the search was painful, drop half bytes of this fat flow.
> +	 * Limit to max packets to not inflict too big latencies,
> +	 * as kfree_skb() might be quite expensive.
> +	 */
> +	maxbacklog >>= 1;
> +
>  	flow = &q->flows[idx];
> -	skb = dequeue_head(flow);
> -	len = qdisc_pkt_len(skb);
> +	for (i = 0; i < max;) {
> +		skb = dequeue_head(flow);
> +		len += qdisc_pkt_len(skb);
> +		kfree_skb(skb);
> +		i++;
> +		if (len >= maxbacklog)
> +			break;
> +	}

What about using bulk free of SKBs here?

There is a very high probability that we are hitting SLUB slowpath,
which involves a locked cmpxchg_double per packet.  Instead we can
amortize this cost via kmem_cache_free_bulk().

Maybe extend kfree_skb_list() to hide the slab/kmem_cache call?


> +	sch->qstats.drops += i;
> +	sch->qstats.backlog -= len;
>  	q->backlogs[idx] -= len;
> -	sch->q.qlen--;
> -	qdisc_qstats_drop(sch);
> -	qdisc_qstats_backlog_dec(sch, skb);
> -	kfree_skb(skb);
> -	flow->dropped++;
> +	sch->q.qlen -= i;
> +	flow->dropped += i;
>  	return idx;
>  }


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-01  3:41 fq_codel_drop vs a udp flood Dave Taht
                   ` (2 preceding siblings ...)
  2016-05-01 17:59 ` [Codel] " Eric Dumazet
@ 2016-05-02 13:47 ` Roman Yeryomin
  2016-05-02 15:01   ` Eric Dumazet
  3 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 13:47 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, codel, ath10k

On 1 May 2016 at 06:41, Dave Taht <dave.taht@gmail.com> wrote:
> There were a few things on this thread that went by, and I wasn't on
> the ath10k list
>
> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>
> first up, udp flood...
>
>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>> Yeryomin <leroi.li...@gmail.com>
>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>> To: ath10k@lists.infradead.org
>>>> Subject: ath10k performance, master branch from 20160407
>>>>
>>>> Hello!
>>>>
>>>> I've seen performance patches were committed so I've decided to give it
>>>> a try (using 4.1 kernel and backports).
>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>> 250Mbps, before (latest official backports release from January) I was
>>>> able to get 900Mbps.
>>>> Hardware is basically ap152 + qca988x 3x3.
>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>> Here is the output when running iperf3 UDP test:
>>>>
>>>>     45.78%  [kernel]       [k] fq_codel_drop
>>>>      3.05%  [kernel]       [k] ag71xx_poll
>>>>      2.18%  [kernel]       [k] skb_release_data
>>>>      2.01%  [kernel]       [k] r4k_dma_cache_inv
>
> The udp flood behavior is not "weird".  The test is wrong. It is so filling
> the local queue as to dramatically exceed the bandwidth on the link.

Are you trying to say that generating 250Mbps and having 250Mbps, and
generating, e.g., 700Mbps and having 30Mbps, is normal and I should
blame iperf3? Even if before I could get 900Mbps with the same
tools/parameters/hw? Really?

> The size of the local queue has exceeded anything rational, gentle
> tcp-friendly methods have failed, we're out of configured queue space,
>  and as a last ditch move, fq_codel_drop is attempting to reduce the
> backlog via brute force.

So it looks to me that fq_codel is just broken if it needs half of my resources.

> Approaches:
>
> 0) Fix the test
>
> The udp flood test should seek an operating point roughly equal to
> the bandwidth of the link, to where there is near zero queuing delay,
> and nearly 100% utilization.
>
> There are several well known methods for an endpoint to seek
> equilibrium, - filling the pipe and not the queue - notably the ones
> outlined in this:
>
> http://ee.lbl.gov/papers/congavoid.pdf
>
> are a good starting point for further research. :)
>
> Now, a unicast flood test is useful for figuring out how many packets
> can fit in a link (both large and small), and tweaking the cpu (or
> running a box out of memory).
>
> However -
>
> I have seen a lot of udp flood tests that are constructed badly.
>
> Measuring time to *send* X packets without counting the queue length
> in the test is one. This was iperf3 with what options, exactly? Running
> locally or via a test client connected via ethernet? (so at local cpu
> speeds, rather than the network ingress speed?)

iperf3 -c <server_ip> -u -b900M -l1472 -R -t600
server_ip is on ethernet side, no NAT, minimal system, client is 3x3 MacBook Pro


Regards,
Roman


* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-01 14:47     ` [Make-wifi-fast] " dpreed
@ 2016-05-02 14:03       ` Roman Yeryomin
  2016-05-02 18:40         ` Dave Taht
  2016-05-02 19:47         ` David Lang
  0 siblings, 2 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 14:03 UTC (permalink / raw)
  To: dpreed; +Cc: make-wifi-fast, Ben Greear, codel, Dave Taht, ath10k

On 1 May 2016 at 17:47,  <dpreed@reed.com> wrote:
> Maybe I missed something, but why is it important to optimize for a UDP flood?

We don't need to optimize for UDP, but UDP is used, e.g., by torrents
to achieve higher throughput, and it is used a lot in general.
And, again, in this case TCP is broken too (750Mbps down to 550), so
it's not, as Dave is saying, that the UDP test is broken - fq_codel is
just too hungry for CPU.

> A general observation of control theory is that there is almost always an adversarial strategy that will destroy any control regime. Sometimes one has to invoke an "oracle" that knows the state of the control system at all times to get there.
>
> So a handwave is that *there is always a DDoS that will work* no matter how clever you are.
>
> And the corollary is illustrated by the TSA. If you can't anticipate all possible attacks, it is not clearly better to just congest the whole system at all times with controls that can't possibly solve all possible attacks - i.e. Security Theater. We don't want "anti-DDoS theater" I don't think.
>
> There is an alternative mechanism that has been effective at dealing with DDoS in general - track the disruption back to the source and kill it.  (this is what the end-to-end argument would be: don't try to solve a fundamentally end-to-end problem, DDoS, solely in the network [switches], since you have to solve it at the edges anyway. Just include in the network things that will help you solve it at the edges - traceback tools that work fast and targeted shutdown of sources).
>
> I don't happen to know of a "normal" application that benefits from UDP flooding - not even "gossip protocols" do that!
>
> In context, then, let's not focus on UDP flood performance (or any other "extreme case" that just seems fun to work on in a research paper because it is easy to state compared to the real world) too much.
>
> I know that the reaction to this post will be to read it and pretty much go on as usual focusing on UDP floods. But I have to try. There are so many more important issues (like understanding how to use congestion signalling in gossip protocols, gaming, or live AV conferencing better, as some related examples, which are end-to-end problems for which queue management and congestion signalling are truly crucial).
>
>
>
> On Sunday, May 1, 2016 1:23am, "Dave Taht" <dave.taht@gmail.com> said:
>
>> On Sat, Apr 30, 2016 at 10:08 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>>
>>> On 04/30/2016 08:41 PM, Dave Taht wrote:
>>>>
>>>> There were a few things on this thread that went by, and I wasn't on
>>>> the ath10k list
>>>>
>>>> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>>>>
>>>> first up, udp flood...
>>>>
>>>>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>>>>> Yeryomin <leroi.li...@gmail.com>
>>>>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>>>>> To: ath10k@lists.infradead.org
>>>>>>> Subject: ath10k performance, master branch from 20160407
>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I've seen performance patches were committed so I've decided to give it
>>>>>>> a try (using 4.1 kernel and backports).
>>>>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>>>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>>>>> 250Mbps, before (latest official backports release from January) I was
>>>>>>> able to get 900Mbps.
>>>>>>> Hardware is basically ap152 + qca988x 3x3.
>>>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>>>>> Here is the output when running iperf3 UDP test:
>>>>>>>
>>>>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>>>>
>>>>
>>>> The udp flood behavior is not "weird".  The test is wrong. It is so
>>>> filling
>>>> the local queue as to dramatically exceed the bandwidth on the link.
>>>
>>>
>>> It would be nice if you could provide backpressure so that you could
>>> simply select on the udp socket and use that to know when you can send
>>> more frames??
>>
>> The qdisc version returns  NET_XMIT_CN to the upper layers of the
>> stack in the case
>> where the dropped packet's flow = the ingress packet's flow, but that
>> is after the
>> exhaustive search...
>>
>> I don't know what effect (if any) that had on udp sockets. Hmm... will
>> look. Eric would "just know".
>>
>> That might provide more backpressure in the local scenario. SO_SNDBUF
>> should interact with this stuff in some sane way...
>>
>> ... but over the wire from a test driver box elsewhere, tho, aside
>> from ethernet flow control itself, where enabled, no.
>>
>> ... but in that case you have a much lower inbound/outbound
>> performance disparity in the general case to start with... which can
>> still be quite high...
>>
>>>
>>> Any idea how that works with codel?
>>
>> Beautifully.
>>
>> For responsive TCP flows. It immediately reduces the window without a RTT.
>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear <greearb@candelatech.com>
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>>
>> --
>> Dave Täht
>> Let's go make home routers and wifi faster! With better software!
>> http://blog.cerowrt.org
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast


* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 17:59 ` [Codel] " Eric Dumazet
  2016-05-01 18:20   ` Jonathan Morton
  2016-05-01 18:26   ` Dave Taht
@ 2016-05-02 14:09   ` Roman Yeryomin
  2016-05-02 15:04     ` Eric Dumazet
  2 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 14:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On 1 May 2016 at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-30 at 20:41 -0700, Dave Taht wrote:
>> >>>
>> >>>     45.78%  [kernel]       [k] fq_codel_drop
>> >>>      3.05%  [kernel]       [k] ag71xx_poll
>> >>>      2.18%  [kernel]       [k] skb_release_data
>> >>>      2.01%  [kernel]       [k] r4k_dma_cache_inv
>>
>> The udp flood behavior is not "weird".  The test is wrong. It is so filling
>> the local queue as to dramatically exceed the bandwidth on the link.
>
> Well, just _kill_ the offender, instead of trying to be gentle.
>
> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
> single one.
>
> It is too cpu intensive to be kind to the elephant, since under pressure
> fq_codel_drop() needs to be called for every enqueue.
>
> Really, we should not try to let inelastic flows hurt us.
>
> I can provide a patch.
>

So if I run some UDP download you will just kill me? Sounds broken.

Regards,
Roman


* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 20:55         ` Eric Dumazet
@ 2016-05-02 14:18           ` Roman Yeryomin
  2016-05-02 15:07             ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 14:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 1 May 2016 at 23:55, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-01 at 23:35 +0300, Jonathan Morton wrote:
>> > On 1 May, 2016, at 21:46, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> > Optimizing the search function is not possible, unless you slow down the
>> > fast path. This was my design choice.
>>
>> I beg to differ.  Cake iterates over the queues in the bulk and sparse
>> lists, rather than all queues full stop.  That’s a straightforward
>> optimisation which covers the case in question here, and has no effect
>> on the fast path.
>
>
> That's not going to help, sorry, if you have hundreds of flows in these
> queues. (I had 200 TCP_STREAM running in my test, plus one hostile
> UDP_STREAM)
>
> It is going to be _much_ slower, since you'll bring way more memory into
> the cpu caches, while the qdisc spinlock is held.
>
> Better have a known cost of 1 cache line miss per drop, instead of 200
> cache line misses per drop.
>
> (4096 bytes to store q->backlogs[] array -> 64 cache lines.
> If we drop 64 skb per pass, this averages to 1 cache line miss per drop)
>
> Listen, I never thought people were going to use fq_codel in some
> hostile env.
>
> I simply designed it to be used in home routers, so I would not imagine
> someone would be trying to kill its own Internet connection.

Imagine you are a video operator, have MacBook Pro, gigabit LAN and
NAS on ethernet side. You would want to get maximum speed. And
fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
and to 30Mbps for UDP (instead of 900Mbps).
So, again, it looks broken to me.

Regards,
Roman


* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-02 13:47 ` [Make-wifi-fast] " Roman Yeryomin
@ 2016-05-02 15:01   ` Eric Dumazet
  0 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-02 15:01 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On Mon, 2016-05-02 at 16:47 +0300, Roman Yeryomin wrote:

> So it looks to me that fq_codel is just broken if it needs half of my resources.

Agreed.

When I wrote fq_codel, I was not expecting that one UDP socket could
fill fq_codel with packets, since we have standard backpressure.

SO_SNDBUF default on UDP does not allow to send more than 256 packets in
a qdisc. So when sk_sndbuf limit is hit, sendmsg() either blocks or
returns an error (if in non blocking mode)
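
For illustration, a minimal sender that just rides on that backpressure
(hypothetical code, not iperf3; the address, port and packet size are
placeholders): with a blocking socket, sendto() stalls once roughly
SO_SNDBUF worth of skbs are sitting unsent in the qdisc, so the
application cannot outrun the link by much more than its own socket
buffer.

/* Blocking UDP sender: backpressure comes "for free" from SO_SNDBUF. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst;
        char payload[1472];             /* 1500 - IP - UDP, as in the test */
        int sndbuf;
        socklen_t len = sizeof(sndbuf);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(5001);                     /* placeholder */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* placeholder */

        memset(payload, 0, sizeof(payload));
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
        fprintf(stderr, "SO_SNDBUF = %d bytes\n", sndbuf);

        for (;;) {
                /* Blocks once sk_wmem_alloc reaches sk_sndbuf, i.e. when
                 * the qdisc already holds our share of unsent skbs.
                 */
                if (sendto(fd, payload, sizeof(payload), 0,
                           (struct sockaddr *)&dst, sizeof(dst)) < 0) {
                        perror("sendto");
                        return 1;
                }
        }
}

A non-blocking sender should be able to get much the same effect by
waiting for POLLOUT, which is more or less what Ben was asking about
earlier in the thread.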

So it looks like someone really tries hard ;)

To reproduce the issue I had to change (as root user)

echo 1000000 >/proc/sys/net/core/wmem_default

_and_ setup fq_codel with a much lower limit than the default 10240
limit

tc qdisc replace dev eth0 root fq_codel limit 300

So please try this fix :

https://patchwork.ozlabs.org/patch/617307/







* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 14:09   ` Roman Yeryomin
@ 2016-05-02 15:04     ` Eric Dumazet
  2016-05-02 15:42       ` Roman Yeryomin
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-02 15:04 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On Mon, 2016-05-02 at 17:09 +0300, Roman Yeryomin wrote:

> So if I run some UDP download you will just kill me? Sounds broken.
> 

Seriously guys, I was never suggesting killing a _task_ but the _flow_

Meaning dropping packets. See ?

If you do not want to drop packets, do not use fq_codel and simply use
bufferbloat pfifo_fast.






* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 14:18           ` Roman Yeryomin
@ 2016-05-02 15:07             ` Eric Dumazet
  2016-05-02 15:43               ` Roman Yeryomin
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-02 15:07 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:

> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
> NAS on ethernet side. You would want to get maximum speed. And
> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
> and to 30Mbps for UDP (instead of 900Mbps).
> So, again, it looks broken to me.

Can you show us your qdisc config ?

It looks broken to me.




* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 15:04     ` Eric Dumazet
@ 2016-05-02 15:42       ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 15:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: make-wifi-fast, codel, Dave Taht, ath10k

On 2 May 2016 at 18:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-02 at 17:09 +0300, Roman Yeryomin wrote:
>
>> So if I run some UDP download you will just kill me? Sounds broken.
>>
>
> Seriously guys, I was never suggesting killing a _task_ but the _flow_
>
> Meaning dropping packets. See ?
>
> If you do not want to drop packets, do not use fq_codel and simply use
> bufferbloat pfifo_fast.
>

I understand what you mean, but in the case of a flow running through the
AP it probably could be considered a "task"

I've tried pfifo before; it was better but didn't help much.
How much faster is pfifo_fast?

Regards,
Roman


* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 15:07             ` Eric Dumazet
@ 2016-05-02 15:43               ` Roman Yeryomin
  2016-05-02 16:14                 ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-02 15:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>
>> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>> NAS on ethernet side. You would want to get maximum speed. And
>> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>> and to 30Mbps for UDP (instead of 900Mbps).
>> So, again, it looks broken to me.
>
> Can you show us your qdisc config ?

Which build do you want? Before it broke or after?

> It looks broken to me.
>
>


* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 15:43               ` Roman Yeryomin
@ 2016-05-02 16:14                 ` Eric Dumazet
  2016-05-02 17:08                   ` Dave Taht
  2016-05-05 14:53                   ` Roman Yeryomin
  0 siblings, 2 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-02 16:14 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
> >
> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
> >> NAS on ethernet side. You would want to get maximum speed. And
> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
> >> and to 30Mbps for UDP (instead of 900Mbps).
> >> So, again, it looks broken to me.
> >
> > Can you show us your qdisc config ?
> 
> Which build do you want? Before it broke or after?
> 


I want to check your qdisc configuration, the one that you used and
where you had fq_codel performance issues

tc -s -d qdisc



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 16:14                 ` Eric Dumazet
@ 2016-05-02 17:08                   ` Dave Taht
  2016-05-02 17:44                     ` Eric Dumazet
  2016-05-05 14:32                     ` Roman Yeryomin
  2016-05-05 14:53                   ` Roman Yeryomin
  1 sibling, 2 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-02 17:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: make-wifi-fast, ath10k, codel, Michal Kazior, Jonathan Morton,
	Roman Yeryomin

On Mon, May 2, 2016 at 9:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
>> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>> >
>> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>> >> NAS on ethernet side. You would want to get maximum speed. And
>> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>> >> and to 30Mbps for UDP (instead of 900Mbps).
>> >> So, again, it looks broken to me.

The big regression being addressed here is the decades-long increase in
wifi overbuffering for slow and normal clients.

The number that was making me happy was seeing low speed clients
finally have sane behavior:
http://blog.cerowrt.org/post/fq_codel_on_ath10k/

I will add your iperf flood test to the testbench. Certainly we don't
want to hurt peak speeds overmuch... but we'd also like to see people
trying traffic at lower speeds.

Incidentally, if you are doing openwrt builds, that would be of great help.

>> > Can you show us your qdisc config ?
>>
>> Which build do you want? Before it broke or after?

Commit hashes for each would help.

>
>
> I want to check your qdisc configuration, the one that you used and
> where you had fq_codel performance issues
>
> tc -s -d qdisc

Not sure it's the qdisc version under test here? If it is, I'd be
perversely happy, as for the first time ever the wifi layer started
exerting some backpressure on the upper layers of the stack.

I'm not sure which parts of which patchset are under test here,
either. I saw a few too many patches go by all around, and I am only
just this week able to add ath10k to my test matrix. Commit?

https://github.com/kazikcz/linux/commits/fqmac-v3.5 has a version of
fq_codel in it (and the underlying driver changes) *at the mac80211
layer*, not the qdisc layer. It disables the overlying qdisc. It will
also need the equivalent of the new fq_codel_drop logic that eric just
added, to do better on the udp flood test.

There was a prior branch that did pretty darn well at high speeds,
results I put on that blog post I linked to above - 820Mbps for tcp,
an actual improvement on the baseline test. The current branch is
simpler and did not do as well due in part to not being integrated
with rate control (I think).

There are pieces dropping in all over; there was an amsdu patch,
another patch on rx/tx, ...


-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 17:08                   ` Dave Taht
@ 2016-05-02 17:44                     ` Eric Dumazet
  2016-05-05 14:32                     ` Roman Yeryomin
  1 sibling, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-02 17:44 UTC (permalink / raw)
  To: Dave Taht
  Cc: make-wifi-fast, ath10k, codel, Michal Kazior, Jonathan Morton,
	Roman Yeryomin

On Mon, 2016-05-02 at 10:08 -0700, Dave Taht wrote:
> On Mon, May 2, 2016 at 9:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> >
> > I want to check your qdisc configuration, the one that you used and
> > where you had fq_codel performance issues
> >
> > tc -s -d qdisc
> 
> Not sure it's the qdisc version under test here. ? If it is, I'd be
> perversely happy as for the first time ever the wifi layer started
> exerting some backpressure on the upper layers of the stack.

I wrote fq_codel with a configurable limit of packets.

Default is 10240

I want to see what limit was set on this particular qdisc.

All qdiscs are going to drop a hell of a lot of packets if, say, the limit is set to 5.

Claiming they are buggy is quite misleading.

fq_codel was designed with the hope of dropping excess packets at
dequeue() time, not enqueue() time.

Apparently someone did not understand this part.
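
(For reference - and assuming an interface named wlan0 purely for
illustration - the limit in effect can be inspected and changed along
these lines:

  tc -s -d qdisc show dev wlan0
  tc qdisc replace dev wlan0 root fq_codel limit 10240 flows 1024 target 5ms interval 100ms

The second command just re-applies fq_codel with the usual defaults
spelled out; the limit value is the knob in question here.)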



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-02 14:03       ` Roman Yeryomin
@ 2016-05-02 18:40         ` Dave Taht
  2016-05-05 13:55           ` Roman Yeryomin
  2016-05-02 19:47         ` David Lang
  1 sibling, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-02 18:40 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: make-wifi-fast, David Reed, codel, Ben Greear, ath10k

On Mon, May 2, 2016 at 7:03 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 1 May 2016 at 17:47,  <dpreed@reed.com> wrote:
>> Maybe I missed something, but why is it important to optimize for a UDP flood?
>
> We don't need to optimize it to UDP but UDP is used e.g. by torrents
> to achieve higher throughput and used a lot in general.

Torrents use uTP congestion control and won't hit this function at
all. And eric just made fq_codel_drop more efficient for tests that
do.

There are potentially zillions of other issues with ampdu's, txop
usage, aggregate "packing", etc that can also affect and other
protocools.

> And, again, in this case TCP is broken too (750Mbps down to 550), so
> it's not like Dave is saying that UDP test is broken, fq_codel is just
> too hungry for CPU

"fq_codel_drop" was too hungry for cpu. fixed. thx eric. :)

I've never seen ath10k tcp throughput in the real world (e.g. not wired
up, over the air) even close to 750 under test (I've seen 300, and I'm
getting some better gear up this week)... and everybody tests wifi
differently.

(for the record, what was your iperf tcp test line?). More people
testing differently = good.

Did fq_codel_drop show up in the perf trace for the tcp test?

(More likely you would have seen timestamping rise significantly for
the tcp test, as well as enqueue time)

That said, more people testing the same ways, good too.

I'd love it if you could re-run your test via flent, rather than
iperf, and look at the tcp sawtooth or lack thereof, and the overall
curve of the throughput, before and after this set of commits.

Flent can be made to run on osx via macports or brew. (much easier to
get running on linux) And try to tag along on observing/fixing low
wifi rate behavior?
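
(As a hedged example of what I mean - the host is just the address used
elsewhere in this thread and would need netserver running on it, and the
flags are only illustrative:

  flent rrul -p all_scaled -l 60 -H 172.26.64.200 -t "ath10k-before" -o ath10k-before.png

The rrul test runs bidirectional TCP plus latency probes, which makes the
sawtooth, or its absence, easy to see before and after the commits.)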

This was the more recent dql vs wifi test:

http://blog.cerowrt.org/post/dql_on_wifi_2/

and series.

>> A general observation of control theory is that there is almost always an adversarial strategy that will destroy any control regime. Sometimes one has to invoke an "oracle" that knows the state of the control system at all times to get there.
>>
>> So a handwave is that *there is always a DDoS that will work* no matter how clever you are.
>>
>> And the corollary is illustrated by the TSA. If you can't anticipate all possible attacks, it is not clearly better to just congest the whole system at all times with controls that can't possibly solve all possible attacks - i.e. Security Theater. We don't want "anti-DDoS theater" I don't think.
>>
>> There is an alternative mechanism that has been effective at dealing with DDoS in general - track the disruption back to the source and kill it.  (this is what the end-to-end argument would be: don't try to solve a fundamentally end-to-end problem, DDoS, solely in the network [switches], since you have to solve it at the edges anyway. Just include in the network things that will help you solve it at the edges - traceback tools that work fast and targeted shutdown of sources).
>>
>> I don't happen to know of a "normal" application that benefits from UDP flooding - not even "gossip protocols" do that!
>>
>> In context, then, let's not focus on UDP flood performance (or any other "extreme case" that just seems fun to work on in a research paper because it is easy to state compared to the real world) too much.
>>
>> I know that the reaction to this post will be to read it and pretty much go on as usual focusing on UDP floods. But I have to try. There are so many more important issues (like understanding how to use congestion signalling in gossip protocols, gaming, or live AV conferencing better, as some related examples, which are end-to-end problems for which queue management and congestion signalling are truly crucial).
>>
>>
>>
>> On Sunday, May 1, 2016 1:23am, "Dave Taht" <dave.taht@gmail.com> said:
>>
>>> On Sat, Apr 30, 2016 at 10:08 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>>
>>>> On 04/30/2016 08:41 PM, Dave Taht wrote:
>>>>>
>>>>> There were a few things on this thread that went by, and I wasn't on
>>>>> the ath10k list
>>>>>
>>>>> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>>>>>
>>>>> first up, udp flood...
>>>>>
>>>>>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>>>>>> Yeryomin <leroi.li...@gmail.com>
>>>>>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>>>>>> To: ath10k@lists.infradead.org
>>>>>>>> Subject: ath10k performance, master branch from 20160407
>>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I've seen performance patches were commited so I've decided to give it
>>>>>>>> a try (using 4.1 kernel and backports).
>>>>>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>>>>>> from 750Mbps to ~550 and UDP shows completely weird behavour - if
>>>>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>>>>>> 250Mbps, before (latest official backports release from January) I was
>>>>>>>> able to get 900Mbps.
>>>>>>>> Hardware is basically ap152 + qca988x 3x3.
>>>>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>>>>>> Here is the output when running iperf3 UDP test:
>>>>>>>>
>>>>>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>>>>>
>>>>>
>>>>> The udp flood behavior is not "weird".  The test is wrong. It is so
>>>>> filling
>>>>> the local queue as to dramatically exceed the bandwidth on the link.
>>>>
>>>>
>>>> It would be nice if you could provide backpressure so that you could
>>>> simply select on the udp socket and use that to know when you can send
>>>> more frames??
>>>
>>> The qdisc version returns  NET_XMIT_CN to the upper layers of the
>>> stack in the case
>>> where the dropped packet's flow = the ingress packet's flow, but that
>>> is after the
>>> exhaustive search...
>>>
>>> I don't know what effect (if any) that had on udp sockets. Hmm... will
>>> look. Eric would "just know".
>>>
>>> That might provide more backpressure in the local scenario. SO_SND_BUF
>>> should interact with this stuff in some sane way...
>>>
>>> ... but over the wire from a test driver box elsewhere, tho, aside
>>> from ethernet flow control itself, where enabled, no.
>>>
>>> ... but in that case you have a much lower inbound/outbound
>>> performance disparity in the general case to start with... which can
>>> still be quite high...
>>>
>>>>
>>>> Any idea how that works with codel?
>>>
>>> Beautifully.
>>>
>>> For responsive TCP flows. It immediately reduces the window without a RTT.
>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> --
>>>> Ben Greear <greearb@candelatech.com>
>>>> Candela Technologies Inc  http://www.candelatech.com
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>> Let's go make home routers and wifi faster! With better software!
>>> http://blog.cerowrt.org
>>> _______________________________________________
>>> Make-wifi-fast mailing list
>>> Make-wifi-fast@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>>
>>
>>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-02 14:03       ` Roman Yeryomin
  2016-05-02 18:40         ` Dave Taht
@ 2016-05-02 19:47         ` David Lang
  1 sibling, 0 replies; 108+ messages in thread
From: David Lang @ 2016-05-02 19:47 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: make-wifi-fast, dpreed, codel, Ben Greear, ath10k

On Mon, 2 May 2016, Roman Yeryomin wrote:

> On 1 May 2016 at 17:47,  <dpreed@reed.com> wrote:
>> Maybe I missed something, but why is it important to optimize for a UDP flood?
>
> We don't need to optimize it to UDP but UDP is used e.g. by torrents
> to achieve higher throughput and used a lot in general.
> And, again, in this case TCP is broken too (750Mbps down to 550), so
> it's not like Dave is saying that UDP test is broken, fq_codel is just
> too hungry for CPU

While I wouldn't do it via wifi, syslog to/from relay systems can result in a 
lot of UDP traffic that could look like a flood.

David Lang

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-01 18:20   ` Jonathan Morton
  2016-05-01 18:46     ` Eric Dumazet
@ 2016-05-03  2:26     ` Dave Taht
  2016-05-03  5:21       ` Dave Taht
  2016-05-03 13:20       ` Kevin Darbyshire-Bryant
  1 sibling, 2 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-03  2:26 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, codel, ath10k, Eric Dumazet

On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
>> single one.
>
> Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-magnitude longer timescale than may have been configured as optimum.
>
> The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.

Don't think so.

I did some tests today (not with the fq_codel batch drop patch yet)

When hit with a 900mbit flood, cake shaping down to 250mbit, results
in nearly 100% cpu use in the ksoftirq1 thread on the apu2, and
150mbits of actual throughput (as measured by iperf3, which is now a
measurement I don't trust)

cake *does* hold the packet count down a lot better than fq_codel does.

fq_codel (pre eric's patch) basically goes to the configured limit and
stays there.

In both cases I will eventually get an error like this (in my babel
routed environment) that suggests that we're also not delivering
packets from other flows (arp?) with either fq_codel or cake in these
extreme conditions.

iperf3 -c 172.26.64.200 -u -b900Mbit -t 600

[  4]  47.00-48.00  sec   107 MBytes   895 Mbits/sec  13659
iperf3: error - unable to write to stream socket: No route to host

...

The results I get from iperf are a bit puzzling over the interval it
samples at - this is from a 100Mbit test (downshifting from 900mbit)

[ 15]  25.00-26.00  sec   152 KBytes  1.25 Mbits/sec  0.998 ms
29673/29692 (1e+02%)
[ 15]  26.00-27.00  sec   232 KBytes  1.90 Mbits/sec  1.207 ms
10235/10264 (1e+02%)
[ 15]  27.00-28.00  sec  72.0 KBytes   590 Kbits/sec  1.098 ms
19035/19044 (1e+02%)
[ 15]  28.00-29.00  sec  0.00 Bytes  0.00 bits/sec  1.098 ms  0/0 (-nan%)
[ 15]  29.00-30.00  sec  72.0 KBytes   590 Kbits/sec  1.044 ms
22468/22477 (1e+02%)
[ 15]  30.00-31.00  sec  64.0 KBytes   524 Kbits/sec  1.060 ms
13078/13086 (1e+02%)
[ 15]  31.00-32.00  sec  0.00 Bytes  0.00 bits/sec  1.060 ms  0/0 (-nan%)
^C[ 15]  32.00-32.66  sec  64.0 KBytes   797 Kbits/sec  1.050 ms
25420/25428 (1e+02%)

Not that I care all that much about how iperf is interpreting its drop
rate (I guess pulling apart the actual caps is in order).

As for cake struggling to cope:

root@apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0

qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 100.0ms raw
 Sent 219736818 bytes 157121 pkt (dropped 989289, overlimits 1152272 requeues 0)
 backlog 449646b 319p requeues 0
 memory used: 2658432b of 5000000b
 capacity estimate: 100Mbit
             Bulk    Best Effort     Video       Voice
  thresh       100Mbit   93750Kbit      75Mbit      25Mbit
  target         5.0ms       5.0ms       5.0ms       5.0ms
  interval     100.0ms     100.0ms     100.0ms     100.0ms
  pk_delay         0us       5.2ms        92us        48us
  av_delay         0us       5.1ms         4us         2us
  sp_delay         0us       5.0ms         4us         2us
  pkts               0     1146649          31          49
  bytes              0  1607004053        2258        8779
  way_inds           0           0           0           0
  way_miss           0          15           2           1
  way_cols           0           0           0           0
  drops              0      989289           0           0
  marks              0           0           0           0
  sp_flows           0           0           0           0
  bk_flows           0           1           0           0
  last_len           0        1514          66         138
  max_len            0        1514         110         487

...

But I am very puzzled as to why flow isolation would fail in the face
of this overload.

>  - Jonathan Morton
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03  2:26     ` [Codel] fq_codel_drop vs a udp flood Dave Taht
@ 2016-05-03  5:21       ` Dave Taht
  2016-05-03 12:39         ` Agarwal, Anil
  2016-05-03 13:20       ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-03  5:21 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, codel, ath10k, Eric Dumazet

On Mon, May 2, 2016 at 7:26 PM, Dave Taht <dave.taht@gmail.com> wrote:
> On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
>>> single one.
>>
>> Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-magnitude longer timescale than may have been configured as optimum.
>>
>> The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.
>
> Don't think so.
>
> I did some tests today,  (not the fq_codel batch drop patch yet)
>
> When hit with a 900mbit flood, cake shaping down to 250mbit, results
> in nearly 100% cpu use in the ksoftirq1 thread on the apu2, and
> 150mbits of actual throughput (as measured by iperf3, which is now a
> measurement I don't trust)
>
> cake *does* hold the packet count down a lot better than fq_codel does.
>
> fq_codel (pre eric's patch) basically goes to the configured limit and
> stays there.
>
> In both cases I will eventually get an error like this (in my babel
> routed environment) that suggests that we're also not delivering
> packets from other flows (arp?) with either fq_codel or cake in these
> extreme conditions.
>
> iperf3 -c 172.26.64.200 -u -b900Mbit -t 600
>
> [  4]  47.00-48.00  sec   107 MBytes   895 Mbits/sec  13659
> iperf3: error - unable to write to stream socket: No route to host
>
> ...
>
> The results I get from iperf are a bit puzzling over the interval it
> samples at - this is from a 100Mbit test (downshifting from 900mbit)
>
> [ 15]  25.00-26.00  sec   152 KBytes  1.25 Mbits/sec  0.998 ms
> 29673/29692 (1e+02%)
> [ 15]  26.00-27.00  sec   232 KBytes  1.90 Mbits/sec  1.207 ms
> 10235/10264 (1e+02%)
> [ 15]  27.00-28.00  sec  72.0 KBytes   590 Kbits/sec  1.098 ms
> 19035/19044 (1e+02%)
> [ 15]  28.00-29.00  sec  0.00 Bytes  0.00 bits/sec  1.098 ms  0/0 (-nan%)
> [ 15]  29.00-30.00  sec  72.0 KBytes   590 Kbits/sec  1.044 ms
> 22468/22477 (1e+02%)
> [ 15]  30.00-31.00  sec  64.0 KBytes   524 Kbits/sec  1.060 ms
> 13078/13086 (1e+02%)
> [ 15]  31.00-32.00  sec  0.00 Bytes  0.00 bits/sec  1.060 ms  0/0 (-nan%)
> ^C[ 15]  32.00-32.66  sec  64.0 KBytes   797 Kbits/sec  1.050 ms
> 25420/25428 (1e+02%)

OK, the above weirdness in calculating a "rate" is due to me sending
8k fragmented packets.

-l1470 fixed that.

> Not that I care all that much about how iperf is intepreting it's drop


> rate (I guess pulling apart the actual caps is in order).
>
> As for cake struggling to cope:
>
> root@apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0
>
> qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 100.0ms raw
>  Sent 219736818 bytes 157121 pkt (dropped 989289, overlimits 1152272 requeues 0)
>  backlog 449646b 319p requeues 0
>  memory used: 2658432b of 5000000b
>  capacity estimate: 100Mbit
>              Bulk    Best Effort     Video       Voice
>   thresh       100Mbit   93750Kbit      75Mbit      25Mbit
>   target         5.0ms       5.0ms       5.0ms       5.0ms
>   interval     100.0ms     100.0ms     100.0ms     100.0ms
>   pk_delay         0us       5.2ms        92us        48us
>   av_delay         0us       5.1ms         4us         2us
>   sp_delay         0us       5.0ms         4us         2us
>   pkts               0     1146649          31          49
>   bytes              0  1607004053        2258        8779
>   way_inds           0           0           0           0
>   way_miss           0          15           2           1
>   way_cols           0           0           0           0
>   drops              0      989289           0           0
>   marks              0           0           0           0
>   sp_flows           0           0           0           0
>   bk_flows           0           1           0           0
>   last_len           0        1514          66         138
>   max_len            0        1514         110         487
>
> ...
>
> But I am very puzzled as to why flow isolation would fail in the face
> of this overload.

And to simplify matters I got rid of the advanced qdiscs entirely,
switched back to htb+pfifo, and got the same ultimate result of the
test aborting...

Joy.

OK,

ethtool -s enp2s0 advertise 0x008 # 100mbit

Feeding packets in at 900mbit into a 1000 packet fifo queue at 100Mbit
is predictably horrific... other flows get starved entirely, you
can't even type on the thing, and still eventually

[ 28]  28.00-29.00  sec  11.4 MBytes  95.7 Mbits/sec  0.120 ms
72598/80726 (90%)
[ 28]  29.00-30.00  sec  11.4 MBytes  95.7 Mbits/sec  0.119 ms
46187/54314 (85%)
[ 28] 189.00-190.00 sec  8.73 MBytes  73.2 Mbits/sec  0.162 ms
55276/61493 (90%)
[ 28] 190.00-191.00 sec  0.00 Bytes  0.00 bits/sec  0.162 ms  0/0 (-nan%)

vs:

[  4] 188.00-189.00 sec   105 MBytes   879 Mbits/sec  74614
iperf3: error - unable to write to stream socket: No route to host

Yea! More people should do that to themselves. The system is bloody
useless with a 1000 packet full queue and way more useful with
fq_codel in this scenario...

But still, this ping should be surviving with fq_codel going and one
full-rate udp flood, if it weren't for all the cpu being used up
throwing away packets. I think.

64 bytes from 172.26.64.200: icmp_seq=50 ttl=63 time=6.92 ms
64 bytes from 172.26.64.200: icmp_seq=52 ttl=63 time=7.15 ms
64 bytes from 172.26.64.200: icmp_seq=53 ttl=63 time=7.11 ms
64 bytes from 172.26.64.200: icmp_seq=55 ttl=63 time=6.68 ms
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host

...

OK, tomorrow, eric's new patch! A new, brighter day now that I've
burned this one melting 3 boxes into the ground. and perf.




-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* RE: [Codel] fq_codel_drop vs a udp flood
  2016-05-03  5:21       ` Dave Taht
@ 2016-05-03 12:39         ` Agarwal, Anil
  2016-05-03 12:50           ` Agarwal, Anil
  0 siblings, 1 reply; 108+ messages in thread
From: Agarwal, Anil @ 2016-05-03 12:39 UTC (permalink / raw)
  To: Dave Taht, Jonathan Morton; +Cc: make-wifi-fast, codel, ath10k

Dave et al,

Here is another possible approach to improving the code performance when dropping packets.

Keep track of the queue with the largest number of packets, as you go, using an efficient algorithm.
Consequently, a search is not required when the occasion arises. 
There is a small amount of overhead for every packet enqueue and dequeue operation.
Here is some pseudo-code -

// Called after enqueuing a packet with updated queue length
static inline void
maxq_update_enq(q, idx, qlen)
{
    if (qlen > q->maxqlen) {
        q->maxqlen = qlen;
        q->maxqidx = idx;
    }
}

// Called after dequeuing a packet with updated queue length
static inline void
maxq_update_deq(q, idx, qlen)
{
    if (idx == q->maxqidx) {
        q->maxqlen = qlen;
    }
}

// Returns idx of the largest queue
static inline int
maxq_get_idx(q)
{
    return (q->maxqidx);
}

Given that we dequeue packets in a round-robin manner, the maxqidx value may sometimes be slightly inaccurate, perhaps pointing to the second-largest queue on occasion.
The code will scale gracefully to a larger number of queues and multiple unresponsive flows.

Please see if this makes sense. I have not gone through the fq_codel code in detail.
I had sent a similar suggestion to Rong Pan of the PIE group a few months ago; not sure if they ever got to it.
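
(As a rough, self-contained illustration of the idea - plain C, userspace
only, with all names (FLOWS, struct maxq, the flow indices) made up for
this sketch rather than taken from fq_codel:

#include <stdio.h>

#define FLOWS 1024

struct maxq {
    unsigned int qlen[FLOWS];   /* per-flow packet counts */
    unsigned int maxqlen;       /* length of the (roughly) biggest queue */
    unsigned int maxqidx;       /* index of that queue */
};

/* O(1) bookkeeping on enqueue: the new longest queue can only be this one. */
static void maxq_enqueue(struct maxq *q, unsigned int idx)
{
    if (++q->qlen[idx] > q->maxqlen) {
        q->maxqlen = q->qlen[idx];
        q->maxqidx = idx;
    }
}

/* O(1) bookkeeping on dequeue: may lag the true maximum briefly,
 * which is the imprecision discussed above. */
static void maxq_dequeue(struct maxq *q, unsigned int idx)
{
    if (q->qlen[idx] > 0)
        q->qlen[idx]--;
    if (idx == q->maxqidx)
        q->maxqlen = q->qlen[idx];
}

int main(void)
{
    static struct maxq q;               /* zero-initialised */

    for (int i = 0; i < 5000; i++)
        maxq_enqueue(&q, 7);            /* one unresponsive flood */
    maxq_enqueue(&q, 12);               /* one sparse flow */
    maxq_dequeue(&q, 7);

    printf("drop candidate: flow %u, backlog %u packets\n",
           q.maxqidx, q.maxqlen);
    return 0;
}

A real version would presumably have to track byte backlogs rather than
packet counts, since GSO/GRO make packet sizes very unequal.)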

Regards,
Anil

-----Original Message-----
From: Codel [mailto:codel-bounces@lists.bufferbloat.net] On Behalf Of Dave Taht
Sent: Tuesday, May 03, 2016 1:22 AM
To: Jonathan Morton
Cc: make-wifi-fast@lists.bufferbloat.net; codel@lists.bufferbloat.net; ath10k
Subject: Re: [Codel] fq_codel_drop vs a udp flood

On Mon, May 2, 2016 at 7:26 PM, Dave Taht <dave.taht@gmail.com> wrote:
> On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of 
>>> a single one.
>>
>> Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-magnitude longer timescale than may have been configured as optimum.
>>
>> The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.
>
> Don't think so.
>
> I did some tests today,  (not the fq_codel batch drop patch yet)
>
> When hit with a 900mbit flood, cake shaping down to 250mbit, results 
> in nearly 100% cpu use in the ksoftirq1 thread on the apu2, and 
> 150mbits of actual throughput (as measured by iperf3, which is now a 
> measurement I don't trust)
>
> cake *does* hold the packet count down a lot better than fq_codel does.
>
> fq_codel (pre eric's patch) basically goes to the configured limit and 
> stays there.
>
> In both cases I will eventually get an error like this (in my babel 
> routed environment) that suggests that we're also not delivering 
> packets from other flows (arp?) with either fq_codel or cake in these 
> extreme conditions.
>
> iperf3 -c 172.26.64.200 -u -b900Mbit -t 600
>
> [  4]  47.00-48.00  sec   107 MBytes   895 Mbits/sec  13659
> iperf3: error - unable to write to stream socket: No route to host
>
> ...
>
> The results I get from iperf are a bit puzzling over the interval it 
> samples at - this is from a 100Mbit test (downshifting from 900mbit)
>
> [ 15]  25.00-26.00  sec   152 KBytes  1.25 Mbits/sec  0.998 ms
> 29673/29692 (1e+02%)
> [ 15]  26.00-27.00  sec   232 KBytes  1.90 Mbits/sec  1.207 ms
> 10235/10264 (1e+02%)
> [ 15]  27.00-28.00  sec  72.0 KBytes   590 Kbits/sec  1.098 ms
> 19035/19044 (1e+02%)
> [ 15]  28.00-29.00  sec  0.00 Bytes  0.00 bits/sec  1.098 ms  0/0 (-nan%)
> [ 15]  29.00-30.00  sec  72.0 KBytes   590 Kbits/sec  1.044 ms
> 22468/22477 (1e+02%)
> [ 15]  30.00-31.00  sec  64.0 KBytes   524 Kbits/sec  1.060 ms
> 13078/13086 (1e+02%)
> [ 15]  31.00-32.00  sec  0.00 Bytes  0.00 bits/sec  1.060 ms  0/0 (-nan%)
> ^C[ 15]  32.00-32.66  sec  64.0 KBytes   797 Kbits/sec  1.050 ms
> 25420/25428 (1e+02%)

OK, the above weirdness in calculating a "rate" is due to me sending 8k fragmented packets.

-l1470 fixed that.

> Not that I care all that much about how iperf is intepreting it's drop


> rate (I guess pulling apart the actual caps is in order).
>
> As for cake struggling to cope:
>
> root@apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0
>
> qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 
> 100.0ms raw  Sent 219736818 bytes 157121 pkt (dropped 989289, 
> overlimits 1152272 requeues 0)  backlog 449646b 319p requeues 0  
> memory used: 2658432b of 5000000b  capacity estimate: 100Mbit
>              Bulk    Best Effort     Video       Voice
>   thresh       100Mbit   93750Kbit      75Mbit      25Mbit
>   target         5.0ms       5.0ms       5.0ms       5.0ms
>   interval     100.0ms     100.0ms     100.0ms     100.0ms
>   pk_delay         0us       5.2ms        92us        48us
>   av_delay         0us       5.1ms         4us         2us
>   sp_delay         0us       5.0ms         4us         2us
>   pkts               0     1146649          31          49
>   bytes              0  1607004053        2258        8779
>   way_inds           0           0           0           0
>   way_miss           0          15           2           1
>   way_cols           0           0           0           0
>   drops              0      989289           0           0
>   marks              0           0           0           0
>   sp_flows           0           0           0           0
>   bk_flows           0           1           0           0
>   last_len           0        1514          66         138
>   max_len            0        1514         110         487
>
> ...
>
> But I am very puzzled as to why flow isolation would fail in the face 
> of this overload.

And to simplify matters I got rid of the advanced qdiscs entirely, switched back to htb+pfifo and get the same ultimate result of the test aborting...

Joy.

OK,

ethtool -s enp2s0 advertise 0x008 # 100mbit

Feeding packets in at 900mbit into a 1000 packet fifo queue at 100Mbit is predictably horriffic... other flows get starved entirely, you can't even type on the thing, and still eventually

[ 28]  28.00-29.00  sec  11.4 MBytes  95.7 Mbits/sec  0.120 ms
72598/80726 (90%)
[ 28]  29.00-30.00  sec  11.4 MBytes  95.7 Mbits/sec  0.119 ms
46187/54314 (85%)
[ 28] 189.00-190.00 sec  8.73 MBytes  73.2 Mbits/sec  0.162 ms
55276/61493 (90%)
[ 28] 190.00-191.00 sec  0.00 Bytes  0.00 bits/sec  0.162 ms  0/0 (-nan%)

vs:

[  4] 188.00-189.00 sec   105 MBytes   879 Mbits/sec  74614
iperf3: error - unable to write to stream socket: No route to host

Yea!  More people should do that to themselves. System is bloody useless with a 1000 packet full queue  and way more useful with fq_codel in this scenario...

but still this ping should be surviving with fq_codel going and one full rate udp flood, if it wasn't for all the cpu being used up throwing away packets. I think.

64 bytes from 172.26.64.200: icmp_seq=50 ttl=63 time=6.92 ms
64 bytes from 172.26.64.200: icmp_seq=52 ttl=63 time=7.15 ms
64 bytes from 172.26.64.200: icmp_seq=53 ttl=63 time=7.11 ms
64 bytes from 172.26.64.200: icmp_seq=55 ttl=63 time=6.68 ms
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host

...

OK, tomorrow, eric's new patch! A new, brighter day now that I've burned this one melting 3 boxes into the ground. and perf.




--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* RE: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 12:39         ` Agarwal, Anil
@ 2016-05-03 12:50           ` Agarwal, Anil
  2016-05-03 13:35             ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Agarwal, Anil @ 2016-05-03 12:50 UTC (permalink / raw)
  To: Agarwal, Anil, Dave Taht, Jonathan Morton; +Cc: make-wifi-fast, codel, ath10k


I should be more precise about my statement regarding the inaccuracy of the algorithm.
Given that we dequeue packets in a round-robin manner, the maxqidx value may, on occasion, point to a queue
which is smaller than the largest queue by up to one MTU.

Anil

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03  2:26     ` [Codel] fq_codel_drop vs a udp flood Dave Taht
  2016-05-03  5:21       ` Dave Taht
@ 2016-05-03 13:20       ` Kevin Darbyshire-Bryant
  1 sibling, 0 replies; 108+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-05-03 13:20 UTC (permalink / raw)
  To: ath10k


[-- Attachment #1.1: Type: text/plain, Size: 3196 bytes --]



On 03/05/16 03:26, Dave Taht wrote:
> On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
>>> single one.
>> Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-magnitude longer timescale than may have been configured as optimum.
>>
>> The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.
> Don't think so.
>
> I did some tests today,  (not the fq_codel batch drop patch yet)
>
>
> As for cake struggling to cope:
>
> root@apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0
>
> qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 100.0ms raw
>  Sent 219736818 bytes 157121 pkt (dropped 989289, overlimits 1152272 requeues 0)
>  backlog 449646b 319p requeues 0
>  memory used: 2658432b of 5000000b
>  capacity estimate: 100Mbit
>              Bulk    Best Effort     Video       Voice
>   thresh       100Mbit   93750Kbit      75Mbit      25Mbit
>   target         5.0ms       5.0ms       5.0ms       5.0ms
>   interval     100.0ms     100.0ms     100.0ms     100.0ms
>   pk_delay         0us       5.2ms        92us        48us
>   av_delay         0us       5.1ms         4us         2us
>   sp_delay         0us       5.0ms         4us         2us
>   pkts               0     1146649          31          49
>   bytes              0  1607004053        2258        8779
>   way_inds           0           0           0           0
>   way_miss           0          15           2           1
>   way_cols           0           0           0           0
>   drops              0      989289           0           0
>   marks              0           0           0           0
>   sp_flows           0           0           0           0
>   bk_flows           0           1           0           0
>   last_len           0        1514          66         138
>   max_len            0        1514         110         487
I think it's interesting that the memory used is only about 55% of
the autoconfigured max buffer size, so cake_enqueue won't have gone into
its 'buffer full' cake_drop routine - I do wonder whether, should it ever
get there, it should 'use a shovel' of 64 packets rather than a teaspoon,
iterating over single-packet drops.

The above also suggests that cake is spending time dropping packets over
the target threshold individually. Does/could cake's codel drop do
something like 'drop sufficient bytes (in packet lumps) to get that flow
back under the target threshold - after ECN signalling' in one go, up to
some packet-count limit (64)? Again, the idea is to really jump on the
naughty flow!



>
> ...
>
> But I am very puzzled as to why flow isolation would fail in the face
> of this overload.
>
>>  - Jonathan Morton
>>
>
>



[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4816 bytes --]

[-- Attachment #2: Type: text/plain, Size: 146 bytes --]

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 12:50           ` Agarwal, Anil
@ 2016-05-03 13:35             ` Eric Dumazet
  2016-05-03 15:37               ` Agarwal, Anil
  2016-05-03 17:37               ` Dave Taht
  0 siblings, 2 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-03 13:35 UTC (permalink / raw)
  To: Agarwal, Anil; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On Tue, 2016-05-03 at 12:50 +0000, Agarwal, Anil wrote:
> I should be more precise about the statement about the inaccuracy of the algorithm.
> Given that we dequeue packets in round robin manner, the maxqidx value may, on occasions, point to a queue 
> which is smaller than the largest queue by up to one MTU.

That is not true.

Linux qdiscs (fq_codel being one of them) can carry big packets, up to
64KB in size.

You can not assume GRO/GSO are disabled. We absolutely want them for
high performance.

There is no way fq_codel will track in real time the biggest flow 'just
in case we have to drop packets at enqueue()'

This is a conscious choice I made years ago.

This patch will fix the performance issue and keep the normal operations
fast.

https://patchwork.ozlabs.org/patch/617307/
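
(Roughly, the idea as I read it: pay the O(number of flows) scan for the
fattest flow once, then drop a whole batch of packets from that one flow,
so the scan is amortised over many drops instead of being repeated per
packet. A hedged, userspace-only sketch - not the actual patch, and the
names and the batch size of 64 are illustrative:

#include <stdio.h>

#define FLOWS 1024
#define BATCH 64

static unsigned int backlog[FLOWS];     /* queued packets per flow */

/* Find the fattest flow once, then drop up to BATCH packets from it. */
static unsigned int batch_drop(void)
{
    unsigned int i, fat = 0, dropped = 0;

    for (i = 1; i < FLOWS; i++)
        if (backlog[i] > backlog[fat])
            fat = i;

    while (dropped < BATCH && backlog[fat] > 0) {
        backlog[fat]--;
        dropped++;
    }
    return dropped;
}

int main(void)
{
    backlog[3] = 9000;                  /* the unresponsive flood */
    backlog[9] = 2;                     /* a sparse, well-behaved flow */

    unsigned int n = batch_drop();
    printf("dropped %u packets; flood backlog now %u, sparse flow still %u\n",
           n, backlog[3], backlog[9]);
    return 0;
}

Dropping in one lump rather than one packet per full search is where the
cpu savings reported later in the thread would come from.)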




_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* RE: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 13:35             ` Eric Dumazet
@ 2016-05-03 15:37               ` Agarwal, Anil
  2016-05-03 17:37               ` Dave Taht
  1 sibling, 0 replies; 108+ messages in thread
From: Agarwal, Anil @ 2016-05-03 15:37 UTC (permalink / raw)
  To: Dave Taht, Eric Dumazet; +Cc: Jonathan Morton, codel, ath10k, make-wifi-fast


This problem is also caused by some weaknesses in the Codel algorithm itself.
With an unresponsive UDP flow, one would expect Codel to naturally drop packets from the rogue queue and maintain a queuing delay close to the target delay value.
But it does not.
If you were to do a simple simulation of a single-queue Codel, with, say, a 10 Mbps link and a UDP flow at 20 Mbps, you will see the queuing delay oscillate between 5 milliseconds and 10+ seconds, over periods of ~30 seconds.
There are ways to fix this, if there is interest in doing so.
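
(One way to see why, without a full simulator: while the sojourn time
stays above target, CoDel schedules the n-th drop interval/sqrt(n) after
entering the dropping state, so the drop rate ramps up only slowly against
a grossly overdriven, unresponsive flow. A minimal sketch of that spacing
law - plain C with the usual 100 ms interval assumed, not the kernel code:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double interval = 0.100;      /* 100 ms, the usual CoDel default */
    double elapsed = 0.0;

    for (unsigned int count = 1; count <= 1024; count++) {
        elapsed += interval / sqrt((double)count);
        if ((count & (count - 1)) == 0) /* print at powers of two */
            printf("drop %4u: spacing %5.1f ms, elapsed ~%5.2f s\n",
                   count, 1e3 * interval / sqrt((double)count), elapsed);
    }
    return 0;
}

It takes on the order of six seconds to reach the thousandth drop, while a
20 Mbps unresponsive flow into a 10 Mbps link needs roughly 800 drops per
second (at ~1500 byte packets) just to stop the queue growing - hence the
multi-second delay excursions.)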

Anil


-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Tuesday, May 03, 2016 9:36 AM
To: Agarwal, Anil
Cc: Dave Taht; Jonathan Morton; make-wifi-fast@lists.bufferbloat.net; codel@lists.bufferbloat.net; ath10k
Subject: Re: [Codel] fq_codel_drop vs a udp flood

On Tue, 2016-05-03 at 12:50 +0000, Agarwal, Anil wrote:
> I should be more precise about the statement about the inaccuracy of the algorithm.
> Given that we dequeue packets in round robin manner, the maxqidx value 
> may, on occasions, point to a queue which is smaller than the largest queue by up to one MTU.

That is not true.

Linux qdiscs (fq_codel being one of them) can carry big packets, up to 64KB in size.

You can not assume GRO/GSO are disabled. We absolutely want them for high performance.

There is no way fq_codel will track in real time the biggest flow 'just in case we have to drop packets at enqueue()'

This is a conscious choice I made years ago.

This patch will fix the performance issue and keep the normal operations fast.

https://patchwork.ozlabs.org/patch/617307/



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 13:35             ` Eric Dumazet
  2016-05-03 15:37               ` Agarwal, Anil
@ 2016-05-03 17:37               ` Dave Taht
  2016-05-03 17:54                 ` Eric Dumazet
  1 sibling, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-03 17:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jonathan Morton, Agarwal, Anil, codel, ath10k, make-wifi-fast

Thus far this batch drop patch is testing out beautifully. Under a
900Mbit flood going into 100Mbit on the pcengines apu2,  cpu usage for
ksoftirqd now doesn't crack 10%, where before (under
pie,pfifo,fq_codel,cake & the prior fq_codel) it went to 88% and
ultimately bad things happened, like losing routability.

I've had it running for hours and I hardly notice it's there.

Performance for the normal cc controlled and/or sparse flows is
unaffected, aside from the uncontrolled flows eating their percentage
of the link.

Nice work. Thx. This should go into -stable.

https://patchwork.ozlabs.org/patch/617307/

Sigh. The RFC is past last call...

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 17:37               ` Dave Taht
@ 2016-05-03 17:54                 ` Eric Dumazet
  2016-05-03 18:11                   ` Dave Taht
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-03 17:54 UTC (permalink / raw)
  To: Dave Taht; +Cc: Jonathan Morton, Agarwal, Anil, codel, ath10k, make-wifi-fast

On Tue, 2016-05-03 at 10:37 -0700, Dave Taht wrote:
> Thus far this batch drop patch is testing out beautifully. Under a
> 900Mbit flood going into 100Mbit on the pcengines apu2,  cpu usage for
> ksoftirqd now doesn't crack 10%, where before (under
> pie,pfifo,fq_codel,cake & the prior fq_codel) it went to 88% and
> ultimately bad things happened, like losing routability.
> 
> I've had it running for hours and I hardly notice it's there.
> 

Excellent, thanks for testing it.

> Performance for the normal cc controlled and/or sparse flows is
> unaffected, aside from the uncontrolled flows eating their percentage
> of the link.
> 
> Nice work. Thx. This should go into -stable.
> 
> https://patchwork.ozlabs.org/patch/617307/
> 
> Sigh. The RFC is past last call...

It is merged :

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9d18562a227874289fda8ca5d117d8f503f1dcca



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-03 17:54                 ` Eric Dumazet
@ 2016-05-03 18:11                   ` Dave Taht
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-03 18:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jonathan Morton, Agarwal, Anil, codel, ath10k, make-wifi-fast

On Tue, May 3, 2016 at 10:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-05-03 at 10:37 -0700, Dave Taht wrote:
>> Thus far this batch drop patch is testing out beautifully. Under a
>> 900Mbit flood going into 100Mbit on the pcengines apu2,  cpu usage for
>> ksoftirqd now doesn't crack 10%, where before (under
>> pie,pfifo,fq_codel,cake & the prior fq_codel) it went to 88% and
>> ultimately bad things happened, like losing routability.
>>
>> I've had it running for hours and I hardly notice it's there.
>>
>
> Excellent, thanks for testing it.

Getting it up to 4 floods with 8k udp fragments each could take it up
to about 20-30% of cpu.

iperf3 -c 172.26.64.200 -u -P 4 -b200Mbit -t 600 &

still, beyond awesome.
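
(For anyone wanting to reproduce the cpu numbers: nothing fancy is needed,
just watch the box while the flood above runs; the exact tools and flags
here are only a suggestion:)

perf top                       # fq_codel_drop shows up here if it's hot
top -b -n 1 | grep ksoftirqd   # rough one-shot reading of ksoftirqd's share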

>
>> Performance for the normal cc controlled and/or sparse flows is
>> unaffected, aside from the uncontrolled flows eating their percentage
>> of the link.
>>
>> Nice work. Thx. This should go into -stable.
>>
>> https://patchwork.ozlabs.org/patch/617307/
>>
>> Sigh. The RFC is past last call...
>
> It is merged :
>
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9d18562a227874289fda8ca5d117d8f503f1dcca

the ietf approval process is about 17512 hours longer than the netdev
approval process.

>
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-02 18:40         ` Dave Taht
@ 2016-05-05 13:55           ` Roman Yeryomin
  2016-05-05 14:55             ` Roman Yeryomin
  0 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 13:55 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, David Reed, codel, Ben Greear, ath10k

On 2 May 2016 at 21:40, Dave Taht <dave.taht@gmail.com> wrote:
> On Mon, May 2, 2016 at 7:03 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 1 May 2016 at 17:47,  <dpreed@reed.com> wrote:
>>> Maybe I missed something, but why is it important to optimize for a UDP flood?
>>
>> We don't need to optimize it for UDP, but UDP is used e.g. by torrents
>> to achieve higher throughput, and is used a lot in general.
>
> Torrents use uTP congestion control and won't hit this function at
> all. And eric just made fq_codel_drop more efficient for tests that
> do.
>
> There are potentially zillions of other issues with ampdu's, txop
> usage, aggregate "packing", etc. that can also affect this and other
> protocols.
>
>> And, again, in this case TCP is broken too (750Mbps down to 550), so
>> it's not like Dave is saying that UDP test is broken, fq_codel is just
>> too hungry for CPU
>
> "fq_codel_drop" was too hungry for cpu. fixed. thx eric. :)
>
> I've never seen ath10k tcp throughput in the real world (e.g not wired
> up, over the air) even close to 750 under test on the ath10k (I've
> seen 300, and I'm getting some better gear up this week)... and
> everybody tests wifi differently.

perhaps you didn't have 3x3 client and AP?

> (for the record, what was your iperf tcp test line?). More people
> testing differently = good.

iperf3 -c <server_ip> -t600

> Did fq_codel_drop show up in the perf trace for the tcp test?

yes, but it was less hungry, something about 15-20% if I remember correctly

> (More likely you would have seen timestamping rise significantly for
> the tcp test, as well as enqueue time)
>
> That said, more people testing the same ways, good too.
>
> I'd love it if you could re-run your test via flent, rather than
> iperf, and look at the tcp sawtooth or lack thereof, and the overall
> curve of the throughput, before and after this set of commits.

I guess I should try flent but the performance drop was too evident
even with iperf
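
For reference - not an exact recipe, flags from memory, and a netperf
server reachable on the wired side is assumed - a before/after flent
comparison would look roughly like:

flent tcp_download -H <server_ip> -l 60 -t "before-patch"
flent tcp_download -H <server_ip> -l 60 -t "after-patch"
flent-gui *.flent.gz   # compare the throughput curves / sawtooth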

> Flent can be made to run on osx via macports or brew. (much easier to
> get running on linux) And try to tag along on observing/fixing low
> wifi rate behavior?
>
> This was the more recent dql vs wifi test:
>
> http://blog.cerowrt.org/post/dql_on_wifi_2/
>
> and series.
>
>>> A general observation of control theory is that there is almost always an adversarial strategy that will destroy any control regime. Sometimes one has to invoke an "oracle" that knows the state of the control system at all times to get there.
>>>
>>> So a handwave is that *there is always a DDoS that will work* no matter how clever you are.
>>>
>>> And the corollary is illustrated by the TSA. If you can't anticipate all possible attacks, it is not clearly better to just congest the whole system at all times with controls that can't possibly solve all possible attacks - i.e. Security Theater. We don't want "anti-DDoS theater" I don't think.
>>>
>>> There is an alternative mechanism that has been effective at dealing with DDoS in general - track the disruption back to the source and kill it.  (this is what the end-to-end argument would be: don't try to solve a fundamentally end-to-end problem, DDoS, solely in the network [switches], since you have to solve it at the edges anyway. Just include in the network things that will help you solve it at the edges - traceback tools that work fast and targeted shutdown of sources).
>>>
>>> I don't happen to know of a "normal" application that benefits from UDP flooding - not even "gossip protocols" do that!
>>>
>>> In context, then, let's not focus on UDP flood performance (or any other "extreme case" that just seems fun to work on in a research paper because it is easy to state compared to the real world) too much.
>>>
>>> I know that the reaction to this post will be to read it and pretty much go on as usual focusing on UDP floods. But I have to try. There are so many more important issues (like understanding how to use congestion signalling in gossip protocols, gaming, or live AV conferencing better, as some related examples, which are end-to-end problems for which queue management and congestion signalling are truly crucial).
>>>
>>>
>>>
>>> On Sunday, May 1, 2016 1:23am, "Dave Taht" <dave.taht@gmail.com> said:
>>>
>>>> On Sat, Apr 30, 2016 at 10:08 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>>>
>>>>>
>>>>> On 04/30/2016 08:41 PM, Dave Taht wrote:
>>>>>>
>>>>>> There were a few things on this thread that went by, and I wasn't on
>>>>>> the ath10k list
>>>>>>
>>>>>> (https://www.mail-archive.com/ath10k@lists.infradead.org/msg04461.html)
>>>>>>
>>>>>> first up, udp flood...
>>>>>>
>>>>>>>>> From: ath10k <ath10k-boun...@lists.infradead.org> on behalf of Roman
>>>>>>>>> Yeryomin <leroi.li...@gmail.com>
>>>>>>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>>>>>>> To: ath10k@lists.infradead.org
>>>>>>>>> Subject: ath10k performance, master branch from 20160407
>>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> I've seen performance patches were commited so I've decided to give it
>>>>>>>>> a try (using 4.1 kernel and backports).
>>>>>>>>> The results are quite disappointing: TCP download (client pov) dropped
>>>>>>>>> from 750Mbps to ~550 and UDP shows completely weird behavour - if
>>>>>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>>>>>>>> 250Mbps, before (latest official backports release from January) I was
>>>>>>>>> able to get 900Mbps.
>>>>>>>>> Hardware is basically ap152 + qca988x 3x3.
>>>>>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>>>>>>>> Here is the output when running iperf3 UDP test:
>>>>>>>>>
>>>>>>>>>      45.78%  [kernel]       [k] fq_codel_drop
>>>>>>>>>       3.05%  [kernel]       [k] ag71xx_poll
>>>>>>>>>       2.18%  [kernel]       [k] skb_release_data
>>>>>>>>>       2.01%  [kernel]       [k] r4k_dma_cache_inv
>>>>>>
>>>>>>
>>>>>> The udp flood behavior is not "weird".  The test is wrong. It is so
>>>>>> filling
>>>>>> the local queue as to dramatically exceed the bandwidth on the link.
>>>>>
>>>>>
>>>>> It would be nice if you could provide backpressure so that you could
>>>>> simply select on the udp socket and use that to know when you can send
>>>>> more frames??
>>>>
>>>> The qdisc version returns  NET_XMIT_CN to the upper layers of the
>>>> stack in the case
>>>> where the dropped packet's flow = the ingress packet's flow, but that
>>>> is after the
>>>> exhaustive search...
>>>>
>>>> I don't know what effect (if any) that had on udp sockets. Hmm... will
>>>> look. Eric would "just know".
>>>>
>>>> That might provide more backpressure in the local scenario. SO_SND_BUF
>>>> should interact with this stuff in some sane way...
>>>>
>>>> ... but over the wire from a test driver box elsewhere, tho, aside
>>>> from ethernet flow control itself, where enabled, no.
>>>>
>>>> ... but in that case you have a much lower inbound/outbound
>>>> performance disparity in the general case to start with... which can
>>>> still be quite high...
>>>>
>>>>>
>>>>> Any idea how that works with codel?
>>>>
>>>> Beautifully.
>>>>
>>>> For responsive TCP flows. It immediately reduces the window without a RTT.
>>>>
>>>>> Thanks,
>>>>> Ben
>>>>>
>>>>> --
>>>>> Ben Greear <greearb@candelatech.com>
>>>>> Candela Technologies Inc  http://www.candelatech.com
>>>>
>>>>
>>>>
>>>> --
>>>> Dave Täht
>>>> Let's go make home routers and wifi faster! With better software!
>>>> http://blog.cerowrt.org
>>>> _______________________________________________
>>>> Make-wifi-fast mailing list
>>>> Make-wifi-fast@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>>>
>>>
>>>
>>> _______________________________________________
>>> Make-wifi-fast mailing list
>>> Make-wifi-fast@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 17:08                   ` Dave Taht
  2016-05-02 17:44                     ` Eric Dumazet
@ 2016-05-05 14:32                     ` Roman Yeryomin
  1 sibling, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 14:32 UTC (permalink / raw)
  To: Dave Taht
  Cc: Eric Dumazet, make-wifi-fast, ath10k, codel, Michal Kazior,
	Jonathan Morton

On 2 May 2016 at 20:08, Dave Taht <dave.taht@gmail.com> wrote:
> On Mon, May 2, 2016 at 9:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
>>> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>>> >
>>> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>>> >> NAS on ethernet side. You would want to get maximum speed. And
>>> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>>> >> and to 30Mbps for UDP (instead of 900Mbps).
>>> >> So, again, it looks broken to me.
>
> The big regression trying to be addressed here is the decades long
> increase in wifi overbuffering for slow and normal clients.
>
> The number that was making me happy was seeing low speed clients
> finally have sane behavior:
> http://blog.cerowrt.org/post/fq_codel_on_ath10k/
>
> I will add your iperf flood test to the testbench. Certainly we don't
> want to hurt peak speeds overmuch... but we'd also like to see people
> trying traffic at lower speeds.
>
> Incidentally if you are doing openwrt builds that would be of great help.
>
>>> > Can you show us your qdisc config ?
>>>
>>> Which build do you want? Before it broke or after?
>
> Commit hashes for each would help.

commit hashes wouldn't help much, I was reverting patches (see
previous emails in this thread).

>>
>>
>> I want to check your qdisc configuration, the one that you used and
>> where you had fq_codel performance issues
>>
>> tc -s -d qdisc
>
> Not sure it's the qdisc version under test here? If it is, I'd be
> perversely happy as for the first time ever the wifi layer started
> exerting some backpressure on the upper layers of the stack.
>
> I'm not sure which parts of which patchset are under test here,
> either. I saw a few too many patches go by all around, and I am only
> just this week able to add ath10k to my test matrix. Commit?
>
> https://github.com/kazikcz/linux/commits/fqmac-v3.5 has a version of
> fq_codel in it (and the underlying driver changes) *at the mac80211
> layer*, not the qdisc layer. It disables the overlying qdisc. It will
> also need the equivalent of the new fq_codel_drop logic that eric
> just added, in order to do better on the udp flood test.
>
> There was a prior branch that did pretty darn well at high speeds,
> results I put on that blog post I linked to above - 820Mbps for tcp,
> an actual improvement on the baseline test. The current branch is
> simpler and did not do as well due in part to not being integrated
> with rate control (I think).
>
> There are pieces dropping in all over: there was an amsdu patch,
> another patch on rx/tx, ...
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-02 16:14                 ` Eric Dumazet
  2016-05-02 17:08                   ` Dave Taht
@ 2016-05-05 14:53                   ` Roman Yeryomin
  2016-05-05 15:32                     ` Dave Taht
  2016-05-05 16:12                     ` Eric Dumazet
  1 sibling, 2 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 14:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 2 May 2016 at 19:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
>> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>> >
>> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>> >> NAS on ethernet side. You would want to get maximum speed. And
>> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>> >> and to 30Mbps for UDP (instead of 900Mbps).
>> >> So, again, it looks broken to me.
>> >
>> > Can you show us your qdisc config ?
>>
>> Which build do you want? Before it broke or after?
>>
>
>
> I want to check your qdisc configuration, the one that you used and
> where you had fq_codel performance issues
>
> tc -s -d qdisc
>

qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 29775 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 29775 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


Will try your patch now.

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] fq_codel_drop vs a udp flood
  2016-05-05 13:55           ` Roman Yeryomin
@ 2016-05-05 14:55             ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 14:55 UTC (permalink / raw)
  To: Dave Taht; +Cc: make-wifi-fast, David Reed, codel, Ben Greear, ath10k

On 5 May 2016 at 16:55, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 2 May 2016 at 21:40, Dave Taht <dave.taht@gmail.com> wrote:
>> On Mon, May 2, 2016 at 7:03 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 1 May 2016 at 17:47,  <dpreed@reed.com> wrote:
>>>> Maybe I missed something, but why is it important to optimize for a UDP flood?
>>>
>>> We don't need to optimize it for UDP, but UDP is used e.g. by torrents
>>> to achieve higher throughput, and is used a lot in general.
>>
>> Torrents use uTP congestion control and won't hit this function at
>> all. And eric just made fq_codel_drop more efficient for tests that
>> do.
>>
>> There are potentially zillions of other issues with ampdu's, txop
>> usage, aggregate "packing", etc. that can also affect this and other
>> protocols.
>>
>>> And, again, in this case TCP is broken too (750Mbps down to 550), so
>>> it's not like Dave is saying that UDP test is broken, fq_codel is just
>>> too hungry for CPU
>>
>> "fq_codel_drop" was too hungry for cpu. fixed. thx eric. :)
>>
>> I've never seen ath10k tcp throughput in the real world (e.g not wired
>> up, over the air) even close to 750 under test on the ath10k (I've
>> seen 300, and I'm getting some better gear up this week)... and
>> everybody tests wifi differently.
>
> perhaps you didn't have 3x3 client and AP?
>
>> (for the record, what was your iperf tcp test line?). More people
>> testing differently = good.
>
> iperf3 -c <server_ip> -t600

actually `iperf3 -c <server_ip> -t600 -R` for download, client POV

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 14:53                   ` Roman Yeryomin
@ 2016-05-05 15:32                     ` Dave Taht
  2016-05-05 16:07                       ` Roman Yeryomin
  2016-05-05 16:12                     ` Eric Dumazet
  1 sibling, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-05 15:32 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Jonathan Morton, codel, ath10k, Eric Dumazet, make-wifi-fast

On Thu, May 5, 2016 at 7:53 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 2 May 2016 at 19:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
>>> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>>> >
>>> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>>> >> NAS on ethernet side. You would want to get maximum speed. And
>>> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>>> >> and to 30Mbps for UDP (instead of 900Mbps).
>>> >> So, again, it looks broken to me.
>>> >
>>> > Can you show us your qdisc config ?
>>>
>>> Which build do you want? Before it broke or after?
>>>
>>
>>
>> I want to check your qdisc configuration, the one that you used and
>> where you had fq_codel performance issues
>>
>> tc -s -d qdisc
>>
Looks fine.

If you could sample that a few times during your various tests,
that would be good.
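
(Something as simple as this, left running on the AP during the tests, is
enough - the 10s interval is arbitrary:)

while true; do date; tc -s -d qdisc show dev wlan0; sleep 10; done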

> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc mq 0: dev wlan0 root
>  Sent 29775 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
> 1514 target 5.0ms interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
> 1514 target 5.0ms interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
> 1514 target 5.0ms interval 100.0ms ecn
>  Sent 29775 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
> 1514 target 5.0ms interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
>
>
> Will try your patch now.
>
> Regards,
> Roman



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 15:32                     ` Dave Taht
@ 2016-05-05 16:07                       ` Roman Yeryomin
  2016-05-05 16:59                         ` Jonathan Morton
  0 siblings, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 16:07 UTC (permalink / raw)
  To: Dave Taht; +Cc: Jonathan Morton, codel, ath10k, Eric Dumazet, make-wifi-fast

On 5 May 2016 at 18:32, Dave Taht <dave.taht@gmail.com> wrote:
> On Thu, May 5, 2016 at 7:53 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 2 May 2016 at 19:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Mon, 2016-05-02 at 18:43 +0300, Roman Yeryomin wrote:
>>>> On 2 May 2016 at 18:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> > On Mon, 2016-05-02 at 17:18 +0300, Roman Yeryomin wrote:
>>>> >
>>>> >> Imagine you are a video operator, have MacBook Pro, gigabit LAN and
>>>> >> NAS on ethernet side. You would want to get maximum speed. And
>>>> >> fq_codel just dropped it down to 550Mbps for TCP (instead of 750Mbps)
>>>> >> and to 30Mbps for UDP (instead of 900Mbps).
>>>> >> So, again, it looks broken to me.
>>>> >
>>>> > Can you show us your qdisc config ?
>>>>
>>>> Which build do you want? Before it broke or after?
>>>>
>>>
>>>
>>> I want to check your qdisc configuration, the one that you used and
>>> where you had fq_codel performance issues
>>>
>>> tc -s -d qdisc
>>>
> Looks fine.
>
> If you could sample that a few times during your various tests,
> that would be good.
>

UDP:

just started:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 84919 bytes 460 pkt (dropped 0, overlimits 0 requeues 2)
 backlog 0b 0p requeues 2
  maxpacket 1374 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 87417 bytes 400 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 64155 bytes 309 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 21958 bytes 77 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after 10s of test:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 87093 bytes 489 pkt (dropped 0, overlimits 0 requeues 2)
 backlog 0b 0p requeues 2
  maxpacket 1374 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 35600270 bytes 23892 pkt (dropped 679412, overlimits 0 requeues 5)
 backlog 1514Kb 1024p requeues 5
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 35891436 bytes 24003 pkt (dropped 685860, overlimits 0 requeues 5)
 backlog 1514Kb 1024p requeues 5
  maxpacket 1514 drop_overlimit 682059 new_flow_count 11 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 22442 bytes 83 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after 20s of test:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 87591 bytes 498 pkt (dropped 0, overlimits 0 requeues 2)
 backlog 0b 0p requeues 2
  maxpacket 1374 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 60034332 bytes 40042 pkt (dropped 1176284, overlimits 0 requeues 5)
 backlog 1514Kb 1024p requeues 5
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 60398170 bytes 40201 pkt (dropped 1184220, overlimits 0 requeues 5)
 backlog 1514Kb 1024p requeues 5
  maxpacket 1514 drop_overlimit 1172942 new_flow_count 22 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 22442 bytes 83 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


TCP:

just started:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 46784 bytes 299 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 52313 bytes 265 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 33248 bytes 194 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 17761 bytes 57 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after 10s of test:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 4274547 bytes 64354 pkt (dropped 0, overlimits 0 requeues 1)
 backlog 0b 0p requeues 1
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 766641277 bytes 507064 pkt (dropped 0, overlimits 0 requeues 482)
 backlog 483032b 320p requeues 482
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 768001532 bytes 507905 pkt (dropped 0, overlimits 0 requeues 482)
 backlog 529900b 350p requeues 482
  maxpacket 1514 drop_overlimit 0 new_flow_count 37 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 17761 bytes 57 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after 20s of test:
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 9064993 bytes 136936 pkt (dropped 0, overlimits 0 requeues 1)
 backlog 0b 0p requeues 1
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev wlan0 root
 Sent 1638525871 bytes 1083456 pkt (dropped 0, overlimits 0 requeues 945)
 backlog 0b 0p requeues 945
qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1304 bytes 14 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 1638506477 bytes 1083381 pkt (dropped 0, overlimits 0 requeues 945)
 backlog 0b 0p requeues 945
  maxpacket 1514 drop_overlimit 0 new_flow_count 70 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum
1514 target 5.0ms interval 100.0ms ecn
 Sent 18090 bytes 61 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


That's with https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9d18562a227874289fda8ca5d117d8f503f1dcca
Having same (low) speeds.
So it didn't help at all :(


Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 14:53                   ` Roman Yeryomin
  2016-05-05 15:32                     ` Dave Taht
@ 2016-05-05 16:12                     ` Eric Dumazet
  2016-05-05 16:25                       ` Roman Yeryomin
  1 sibling, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-05 16:12 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:

> 
> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0


Limit of 1024 packets and 1024 flows is not wise I think.

(If all buckets are in use, each bucket has a virtual queue of 1 packet,
which is almost the same than having no queue at all)

I suggest to have at least 8 packets per bucket, to let Codel have a
chance to trigger.

So you could either reduce number of buckets to 128 (if memory is
tight), or increase limit to 8192.
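
(Untested sketch of the two variants, using eth0 since it has fq_codel at
the root; the flow count generally can't be changed on a live qdisc, hence
"replace" rather than "change":)

tc qdisc replace dev eth0 root fq_codel limit 8192 flows 1024
tc qdisc replace dev eth0 root fq_codel limit 1024 flows 128

(For wlan0 the mq children would each need the same treatment, e.g.
"tc qdisc replace dev wlan0 parent :1 fq_codel limit 8192".)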



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:12                     ` Eric Dumazet
@ 2016-05-05 16:25                       ` Roman Yeryomin
  2016-05-05 16:42                         ` Roman Yeryomin
  2016-05-05 19:23                         ` Eric Dumazet
  0 siblings, 2 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 16:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>
>>
>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>> quantum 1514 target 5.0ms interval 100.0ms ecn
>>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>
>
> Limit of 1024 packets and 1024 flows is not wise I think.
>
> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> which is almost the same than having no queue at all)
>
> I suggest to have at least 8 packets per bucket, to let Codel have a
> chance to trigger.
>
> So you could either reduce number of buckets to 128 (if memory is
> tight), or increase limit to 8192.

Will try, but what I've posted is default, I didn't change/configure that.

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:25                       ` Roman Yeryomin
@ 2016-05-05 16:42                         ` Roman Yeryomin
  2016-05-06 10:55                           ` Roman Yeryomin
  2016-05-05 19:23                         ` Eric Dumazet
  1 sibling, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 16:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 5 May 2016 at 19:25, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>
>>>
>>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>
>>
>> Limit of 1024 packets and 1024 flows is not wise I think.
>>
>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> which is almost the same than having no queue at all)
>>
>> I suggest to have at least 8 packets per bucket, to let Codel have a
>> chance to trigger.
>>
>> So you could either reduce number of buckets to 128 (if memory is
>> tight), or increase limit to 8192.
>
> Will try, but what I've posted is default, I didn't change/configure that.

So it didn't change anything :(
Do you want new stats?

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:07                       ` Roman Yeryomin
@ 2016-05-05 16:59                         ` Jonathan Morton
  2016-05-05 17:39                           ` Roman Yeryomin
  2016-05-05 18:33                           ` Dave Taht
  0 siblings, 2 replies; 108+ messages in thread
From: Jonathan Morton @ 2016-05-05 16:59 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: make-wifi-fast, ath10k, codel, Dave Taht, Eric Dumazet

> Having same (low) speeds.
> So it didn't help at all :(

Although the new “emergency drop” code is now dropping batches of consecutive packets, Codel is also still dropping individual packets in between these batches, probably at a high rate.  Since all fragments of an original packet are required to reassemble it, but Codel doesn’t link related fragments when deciding to drop, each fragment lost in this way reduces throughput efficiency.  Only a fraction of the original packets can be reassembled correctly, but the surviving (yet useless) fragments still occupy link capacity.

This phenomenon is not Codel specific; I would also expect to see it on most other AQMs, and definitely on RED variants, including PIE.  Fortunately for real traffic, it normally arises only on artificial traffic such as iperf runs with large UDP packets.  Unfortunately for AQM advocates, iperf uses large UDP packets by default, and it is very easy to misinterpret the results unfavourably for AQM (as opposed to unfavourably for iperf).

If you re-run the test with iperf set to a packet size compatible with the path MTU, you should see much better throughput numbers due to the elimination of fragmented packets.  A UDP payload size of 1280 bytes is a safe, conservative figure for a normal MTU in the vicinity of 1500.
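
Concretely - the command is only a sketch, adjust the address and rate to
your setup - that means something like:

iperf3 -c <server_ip> -u -b 900M -l 1280 -t 600

instead of the ~8k default payload mentioned earlier in the thread, which
has to be fragmented.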

> Limit of 1024 packets and 1024 flows is not wise I think.
> 
> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> which is almost the same than having no queue at all)

This, while theoretically important in extreme cases with very large numbers of flows, is not relevant to the specific test in question.

 - Jonathan Morton


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:59                         ` Jonathan Morton
@ 2016-05-05 17:39                           ` Roman Yeryomin
  2016-05-05 18:16                             ` Dave Taht
  2016-05-05 18:33                           ` Dave Taht
  1 sibling, 1 reply; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-05 17:39 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: make-wifi-fast, ath10k, codel, Dave Taht, Eric Dumazet

On 5 May 2016 at 19:59, Jonathan Morton <chromatix99@gmail.com> wrote:
>> Having same (low) speeds.
>> So it didn't help at all :(
>
> Although the new “emergency drop” code is now dropping batches of consecutive packets, Codel is also still dropping individual packets in between these batches, probably at a high rate.  Since all fragments of an original packet are required to reassemble it, but Codel doesn’t link related fragments when deciding to drop, each fragment lost in this way reduces throughput efficiency.  Only a fraction of the original packets can be reassembled correctly, but the surviving (yet useless) fragments still occupy link capacity.
>
> This phenomenon is not Codel specific; I would also expect to see it on most other AQMs, and definitely on RED variants, including PIE.  Fortunately for real traffic, it normally arises only on artificial traffic such as iperf runs with large UDP packets.  Unfortunately for AQM advocates, iperf uses large UDP packets by default, and it is very easy to misinterpret the results unfavourably for AQM (as opposed to unfavourably for iperf).
>
> If you re-run the test with iperf set to a packet size compatible with the path MTU, you should see much better throughput numbers due to the elimination of fragmented packets.  A UDP payload size of 1280 bytes is a safe, conservative figure for a normal MTU in the vicinity of 1500.

Setting packet size to 1280 (-l1280) instead of 1472, I got even lower
speed (18-20Mbps).
Other ideas?

>> Limit of 1024 packets and 1024 flows is not wise I think.
>>
>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> which is almost the same than having no queue at all)
>
> This, while theoretically important in extreme cases with very large numbers of flows, is not relevant to the specific test in question.
>
>  - Jonathan Morton
>

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 17:39                           ` Roman Yeryomin
@ 2016-05-05 18:16                             ` Dave Taht
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-05 18:16 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Jonathan Morton, codel, ath10k, Eric Dumazet, make-wifi-fast

On Thu, May 5, 2016 at 10:39 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 5 May 2016 at 19:59, Jonathan Morton <chromatix99@gmail.com> wrote:
>>> Having same (low) speeds.
>>> So it didn't help at all :(
>>
>> Although the new “emergency drop” code is now dropping batches of consecutive packets, Codel is also still dropping individual packets in between these batches, probably at a high rate.  Since all fragments of an original packet are required to reassemble it, but Codel doesn’t link related fragments when deciding to drop, each fragment lost in this way reduces throughput efficiency.  Only a fraction of the original packets can be reassembled correctly, but the surviving (yet useless) fragments still occupy link capacity.
>>
>> This phenomenon is not Codel specific; I would also expect to see it on most other AQMs, and definitely on RED variants, including PIE.  Fortunately for real traffic, it normally arises only on artificial traffic such as iperf runs with large UDP packets.  Unfortunately for AQM advocates, iperf uses large UDP packets by default, and it is very easy to misinterpret the results unfavourably for AQM (as opposed to unfavourably for iperf).
>>
>> If you re-run the test with iperf set to a packet size compatible with the path MTU, you should see much better throughput numbers due to the elimination of fragmented packets.  A UDP payload size of 1280 bytes is a safe, conservative figure for a normal MTU in the vicinity of 1500.
>
> Setting packet size to 1280 (-l1280) instead of 1472, I got even lower
> speed (18-20Mbps).
> Other ideas?

How about:

completely dropping your hand-picked patch set and joining us on michal's tree?

https://github.com/kazikcz/linux/commits/fqmac-v4%2Bdqlrfc%2Bcpuregrfix

He just put a commit in there on top of that patchset that might point
at the problem you're seeing, in particular, and the code moves all of
fq_codel into the mac80211 layer where it can be scheduled better.
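
(One way to get onto that branch, assuming an existing kernel git checkout;
the remote name here is arbitrary and the branch name is just the URL above
with the %2Bs decoded:)

git remote add kazikcz https://github.com/kazikcz/linux.git
git fetch kazikcz
git checkout -b fqmac-test kazikcz/fqmac-v4+dqlrfc+cpuregrfix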

I'm still working off the prior patch set, finding bugs in 802.11e
(that for all I know pre-exist):

http://blog.cerowrt.org/post/cs5_lockout/

(I would love it if people had more insight into the VI queue)

and I wrote up the first tests of the prior (fqmac 3.5) patch here:

http://blog.cerowrt.org/post/ath10_ath9k_1/

with pretty pictures, and a circle and arrow on the back of each one
to be used as evidence against us.

I did just get another 3x3 card to play with, but I'd like to finish
up comprehensively evaluating what I got against mainline first, at
the bandwidths (ath9k to ath10k) I can currently achieve and that will
take til monday, at least. Since your hardware is weaker than mine
(single core?) it would be good to amble along in parallel.

>>> Limit of 1024 packets and 1024 flows is not wise I think.
>>>
>>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>> which is almost the same than having no queue at all)
>>
>> This, while theoretically important in extreme cases with very large numbers of flows, is not relevant to the specific test in question.
>>
>>  - Jonathan Morton
>>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:59                         ` Jonathan Morton
  2016-05-05 17:39                           ` Roman Yeryomin
@ 2016-05-05 18:33                           ` Dave Taht
  1 sibling, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-05 18:33 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: make-wifi-fast, Roman Yeryomin, codel, ath10k, Eric Dumazet

On Thu, May 5, 2016 at 9:59 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> Having same (low) speeds.
>> So it didn't help at all :(
>
> Although the new “emergency drop” code is now dropping batches of consecutive packets, Codel is also still dropping individual packets in between these batches, probably at a high rate.  Since all fragments of an original packet are required to reassemble it, but Codel doesn’t link related fragments when deciding to drop, each fragment lost in this way reduces throughput efficiency.  Only a fraction of the original packets can be reassembled correctly, but the surviving (yet useless) fragments still occupy link capacity.

I could see an AQM dropper testing to see if it is dropping a frag,
and then dropping any further fragments, also. We're looking at the IP
headers anyway in that section of the code, and the decision to drop
is (usually) rare, and fragments are a PITA.

> This phenomenon is not Codel specific; I would also expect to see it on most other AQMs, and definitely on RED variants, including PIE.  Fortunately for real traffic, it normally arises only on artificial traffic such as iperf runs with large UDP packets.  Unfortunately for AQM advocates, iperf uses large UDP packets by default, and it is very easy to misinterpret the results unfavourably for AQM (as opposed to unfavourably for iperf).
>
> If you re-run the test with iperf set to a packet size compatible with the path MTU, you should see much better throughput numbers due to the elimination of fragmented packets.  A UDP payload size of 1280 bytes is a safe, conservative figure for a normal MTU in the vicinity of 1500.
>
>> Limit of 1024 packets and 1024 flows is not wise I think.
>>
>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> which is almost the same than having no queue at all)
>
> This, while theoretically important in extreme cases with very large numbers of flows, is not relevant to the specific test in question.
>
>  - Jonathan Morton
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:25                       ` Roman Yeryomin
  2016-05-05 16:42                         ` Roman Yeryomin
@ 2016-05-05 19:23                         ` Eric Dumazet
  2016-05-05 19:41                           ` Dave Taht
  2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
  1 sibling, 2 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-05 19:23 UTC (permalink / raw)
  To: Roman Yeryomin; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
> On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> >
> >>
> >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> >> quantum 1514 target 5.0ms interval 100.0ms ecn
> >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> >>  backlog 0b 0p requeues 0
> >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> >>   new_flows_len 0 old_flows_len 0
> >
> >
> > Limit of 1024 packets and 1024 flows is not wise I think.
> >
> > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > which is almost the same than having no queue at all)
> >
> > I suggest to have at least 8 packets per bucket, to let Codel have a
> > chance to trigger.
> >
> > So you could either reduce number of buckets to 128 (if memory is
> > tight), or increase limit to 8192.
> 
> Will try, but what I've posted is default, I didn't change/configure that.

fq_codel has a default of 10240 packets and 1024 buckets.

http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413

If someone changed that in the linux variant you use, he probably should
explain the rationale.
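
(A quick way to see what a given build's compiled-in defaults actually are:
create the qdisc with no parameters and read them back; eth0 is just an
example device:)

tc qdisc replace dev eth0 root fq_codel
tc -s -d qdisc show dev eth0
# stock upstream reports "limit 10240p flows 1024" here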





_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 19:23                         ` Eric Dumazet
@ 2016-05-05 19:41                           ` Dave Taht
  2016-05-06  8:41                             ` moeller0
  2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
  1 sibling, 1 reply; 108+ messages in thread
From: Dave Taht @ 2016-05-05 19:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jonathan Morton, Roman Yeryomin, codel, ath10k, make-wifi-fast

On Thu, May 5, 2016 at 12:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>> On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>> >
>> >>
>> >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>> >> quantum 1514 target 5.0ms interval 100.0ms ecn
>> >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>> >>  backlog 0b 0p requeues 0
>> >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> >>   new_flows_len 0 old_flows_len 0
>> >
>> >
>> > Limit of 1024 packets and 1024 flows is not wise I think.
>> >
>> > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> > which is almost the same than having no queue at all)
>> >
>> > I suggest to have at least 8 packets per bucket, to let Codel have a
>> > chance to trigger.
>> >
>> > So you could either reduce number of buckets to 128 (if memory is
>> > tight), or increase limit to 8192.
>>
>> Will try, but what I've posted is default, I didn't change/configure that.
>
> fq_codel has a default of 10240 packets and 1024 buckets.
>
> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>
> If someone changed that in the linux variant you use, he probably should
> explain the rationale.

I guess that would be me.

Openwrt has long shipped with the fq_codel outer queue limit set lower
than the upstream default (1024 instead of 10240). Think: itty bitty 32MB
routers. 10240 packets can = boom, particularly while there were 4
fq_codel instances per wifi interface (and people in the habit of
creating 2 or more wifi interfaces).

Back then I viewed the probability of flooding all 1024 queues as low,
and thus the queue depth would be sufficient for any given set of
flows to do well (and long ago we gave codel a probability of working
on all queues). And we did not do enough udp flood testing. :(

Totally not the right answer, I know. And the problem is even worse
now, with 128MB arm boxes like the armada 385 (linksys 1200ac, turris
omnia) using software GRO to bulk up 64k packets at gigE and
trying to ship them to an isp at 5mbit, or over wifi at some rate
lower than that.

cake switched to byte, rather than packet, accounting, for these
reasons, and we're still trying various methods to peel apart
superpackets at some load level efficiently.

 And routers are tending to ship with a lot more memory these days,
overall. We are discussing changing the sqm system to dynamically size
the packet limit by overall memory limits here, for example:
https://github.com/tohojo/sqm-scripts/issues/42
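
(Purely illustrative of the idea being discussed there - this is not what
sqm-scripts does today, and the constants are made up: budget very roughly
1/16th of RAM for queueing at ~2 kB per packet, clamped to sane bounds:)

memtotal_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
limit=$(( memtotal_kb / 32 ))           # (memtotal/16) kB / ~2 kB per packet
[ "$limit" -gt 10240 ] && limit=10240   # never exceed the upstream default
[ "$limit" -lt 1024 ]  && limit=1024    # never go below the openwrt value
tc qdisc replace dev eth0 root fq_codel limit "$limit"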

AND: As sorta now implemented in the mac80211 fq_codel code, it's per
radio, rather than per interface (or was, when I last thought about
it), which is *vastly saner* than four fq_codel instances for each
SSID.

>
>
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 19:41                           ` Dave Taht
@ 2016-05-06  8:41                             ` moeller0
  2016-05-06 11:33                               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 108+ messages in thread
From: moeller0 @ 2016-05-06  8:41 UTC (permalink / raw)
  To: Dave Täht
  Cc: Eric Dumazet, make-wifi-fast, ath10k, codel, Jonathan Morton,
	Roman Yeryomin

Hi All,

> On May 5, 2016, at 21:41 , Dave Taht <dave.taht@gmail.com> wrote:
> 
> On Thu, May 5, 2016 at 12:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>> On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> […]
>> 
>> fq_codel has a default of 10240 packets and 1024 buckets.
>> 
>> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>> 
>> If someone changed that in the linux variant you use, he probably should
>> explain the rationale.
> 
> I guess that would be me.

	IIRC, I was making a lot of noise back then as well.

> 
> Openwrt has long shipped with the fq_codel default outer queue limit
> being lower than the default (e.g. 1024). Think: itty bitty 32MB
> routers. 10240 packets can = boom, particuarly while there were 4
> fq_codel instances per wifi interface (and people in the habit of
> creating 2 or more wifi interfaces).

	In my case I could force an OOM reboot of my 64MB router with a “simple” unidirectional UDP flood with randomized port numbers; at the 10240 packet limit it was eating approximately 20MB of the 64MB of RAM in the device, which made it go “boom”. I tried to convince people that bad queueing is not the most important concern under those conditions; staying up is rather more important…


> 
> back then: I viewed the probability of flooding all 1024 queues as low
> and thus the queue depth would be sufficient for any given set of
> flows to do well. (and long ago we gave codel a probability of working
> on all queues). And did not do enough udp flood testing. :(

	I would argue that the main goal for behaviour under attack should be (IMHO) “staying alive” rather than unscheduled OOM reboots/crashes. Keeping the lights on, so to speak, should be the first priority, followed by trying to still maintain fairness guarantees.
> 
> Totally not the right answer, I know. And the problem is even worse
> now, with 128MB arm boxes like the armada 385 (linksys 1200ac, turris
> omnia) using software GRO to be bulking up 64k packets at gigE and
> trying to ship them to an isp at 5mbit, or over wifi at some rate
> lower than that.
> 
> cake switched to byte, rather than packet, accounting, for these
> reasons, and we're still trying various methods to peel apart
> superpackets at some load level efficiently.

	Speaking out of total ignorance, I ask: why not account GRO/GSO packets by the number of their fragments against the packet limit? Counting a 64kB packet as equivalent to a 64B packet is probably the right thing if one tries to account for the work the OS needs to perform to figure out what to do with the packet, but for limiting memory consumption it introduces an impressive/manly level of uncertainty (2 orders of magnitude).


Best Regards
	Sebastian

> 
> And routers are tending to ship with a lot more memory these days,
> overall. We are discussing changing the sqm system to dynamically size
> the packet limit by overall memory limits here, for example:
> https://github.com/tohojo/sqm-scripts/issues/42
> 
> AND: As sorta now implemented in the mac80211 fq_codel code, it's per
> radio, rather than per interface (or was, when I last thought about
> it), which is *vastly saner* than four fq_codel instances for each
> SSID.
> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
> _______________________________________________
> Codel mailing list
> Codel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-05 19:23                         ` Eric Dumazet
@ 2016-05-06  9:42                             ` Jesper Dangaard Brouer
  2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
  1 sibling, 0 replies; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06  9:42 UTC (permalink / raw)
  To: Eric Dumazet, Felix Fietkau, Dave Taht
  Cc: make-wifi-fast, zajec5, ath10k, netdev, codel, Jonathan Morton,
	Roman Yeryomin


Hi Felix,

This is an important fix for OpenWRT, please read!

OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
without also adjusting q->flows_cnt.  Eric explains below that you must
also adjust the buckets (q->flows_cnt) for this not to break. (Just
adjust it to 128)
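
For a quick sanity check from userspace, Eric's suggestion boils down
to something like this with plain tc (the device name is only an
example):

  tc qdisc replace dev eth0 root fq_codel limit 1024 flows 128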

Problematic OpenWRT commit in question:
 http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")


I also highly recommend you cherry-pick this very recent commit:
 net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
 https://git.kernel.org/davem/net-next/c/9d18562a227

This should fix the very high CPU usage in case fq_codel goes into drop mode.
The problem is that drop mode was considered rare, and implementation-wise
it was chosen to be more expensive (to save cycles in normal mode).
Unfortunately it is easy to trigger with a UDP flood. Drop mode is
especially expensive for smaller devices, as it scans a 4KB array,
thus 64 cache misses for small devices!

The fix is to allow drop-mode to bulk-drop more packets when entering
drop-mode (default 64 bulk drop).  That way we don't suddenly
experience a significantly higher processing cost per packet, but
instead can amortize this.
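
Put differently: the 4KB scan costs ~64 cache misses either way, but
with a batch of 64 it is paid once per 64 dropped packets, i.e. roughly
one cache miss per drop instead of 64.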

To Eric: should we recommend OpenWRT adjust the default (max) bulk drop
of 64, given we also recommend a bucket size of 128? (The amount of
memory to scan is then smaller, but so is their CPU.)

--Jesper


On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
> > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:  
> > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > >  
> > >>
> > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > >>  backlog 0b 0p requeues 0
> > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > >>   new_flows_len 0 old_flows_len 0  
> > >
> > >
> > > Limit of 1024 packets and 1024 flows is not wise I think.
> > >
> > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > which is almost the same than having no queue at all)
> > >
> > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > chance to trigger.
> > >
> > > So you could either reduce number of buckets to 128 (if memory is
> > > tight), or increase limit to 8192.  
> > 
> > Will try, but what I've posted is default, I didn't change/configure that.  
> 
> fq_codel has a default of 10240 packets and 1024 buckets.
> 
> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
> 
> If someone changed that in the linux variant you use, he probably should
> explain the rationale.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-06  9:42                             ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06  9:42 UTC (permalink / raw)
  To: Eric Dumazet, Felix Fietkau, Dave Taht
  Cc: make-wifi-fast, zajec5, ath10k, netdev, codel, brouer,
	Jonathan Morton, Roman Yeryomin


Hi Felix,

This is an important fix for OpenWRT, please read!

OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
without also adjusting q->flows_cnt.  Eric explains below that you must
also adjust the buckets (q->flows_cnt) for this not to break. (Just
adjust it to 128)

Problematic OpenWRT commit in question:
 http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")


I also highly recommend you cherry-pick this very recent commit:
 net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
 https://git.kernel.org/davem/net-next/c/9d18562a227

This should fix very high CPU usage in-case fq_codel goes into drop mode.
The problem is that drop mode was considered rare, and implementation
wise it was chosen to be more expensive (to save cycles on normal mode).
Unfortunately is it easy to trigger with an UDP flood. Drop mode is
especially expensive for smaller devices, as it scans a 4K big array,
thus 64 cache misses for small devices!

The fix is to allow drop-mode to bulk-drop more packets when entering
drop-mode (default 64 bulk drop).  That way we don't suddenly
experience a significantly higher processing cost per packet, but
instead can amortize this.

To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
drop, given we also recommend bucket size to be 128 ? (thus the amount
of memory to scan is less, but their CPU is also much smaller).

--Jesper


On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
> > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:  
> > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > >  
> > >>
> > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > >>  backlog 0b 0p requeues 0
> > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > >>   new_flows_len 0 old_flows_len 0  
> > >
> > >
> > > Limit of 1024 packets and 1024 flows is not wise I think.
> > >
> > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > which is almost the same than having no queue at all)
> > >
> > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > chance to trigger.
> > >
> > > So you could either reduce number of buckets to 128 (if memory is
> > > tight), or increase limit to 8192.  
> > 
> > Will try, but what I've posted is default, I didn't change/configure that.  
> 
> fq_codel has a default of 10240 packets and 1024 buckets.
> 
> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
> 
> If someone changed that in the linux variant you use, he probably should
> explain the rationale.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-05 16:42                         ` Roman Yeryomin
@ 2016-05-06 10:55                           ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-06 10:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jonathan Morton, codel, Dave Taht, ath10k, make-wifi-fast

On 5 May 2016 at 19:42, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 5 May 2016 at 19:25, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>
>>>>
>>>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>> Limit of 1024 packets and 1024 flows is not wise I think.
>>>
>>> (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>> which is almost the same than having no queue at all)
>>>
>>> I suggest to have at least 8 packets per bucket, to let Codel have a
>>> chance to trigger.
>>>
>>> So you could either reduce number of buckets to 128 (if memory is
>>> tight), or increase limit to 8192.
>>
>> Will try, but what I've posted is default, I didn't change/configure that.
>
> So it didn't change anything :(
> Do you want new stats?

Just realized that the patch should be applied/ported to Linux kernel
OpenWrt part, not compat-wireless.
Will try to do that now.

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06  8:41                             ` moeller0
@ 2016-05-06 11:33                               ` Jesper Dangaard Brouer
  2016-05-06 11:46                                 ` moeller0
  0 siblings, 1 reply; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06 11:33 UTC (permalink / raw)
  To: moeller0, Eric Dumazet
  Cc: make-wifi-fast, Dave Täht, ath10k, codel, brouer,
	Jonathan Morton, Roman Yeryomin


On Fri, 6 May 2016 10:41:53 +0200 moeller0 <moeller0@gmx.de> wrote:

> 	Speaking out of total ignorance, I ask why not account
> GRO/GSO packets by the number of their fragments against the packet
> limit? Counting a 64kB packets as equivalent to a 64B packet probably
> is the right thing if one tries to account for the work the OS needs
> to perform to figure out what to do with the packet, but for limiting
> the memory consumption it introduces an impressive/manly level of
> uncertainty (2 orders of magnitude). 

Looking at the drop code in fq_codel:
 https://github.com/torvalds/linux/blob/v4.6-rc6/net/sched/sch_fq_codel.c#L136

It looks like we are finding the "fat" flow to drop from based on the
number of bytes queued.  And AFAIK skb->len also accounts for the total
length of all GSO packets (Eric?)

Even better, we are using qdisc_pkt_len(skb), which also accounts for
the GSO headers in qdisc_pkt_len_init().
 https://github.com/torvalds/linux/blob/v4.6-rc6/net/core/dev.c#L2993

If anything, the GSO packets get hit harder by the fq_codel_drop
function, as we drop the entire GSO skb.


Is the issue you are raising that the 1024 packet limit would allow
1024 x 64K bytes to be queued before the drop kicks in? (Resulting in
using too much memory on your small device.)
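
(Worst case that is 1024 * 64KB = 64MB of payload alone, i.e. the
entire RAM of the smaller devices discussed here, before any truesize
overhead is even counted.)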

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 11:33                               ` Jesper Dangaard Brouer
@ 2016-05-06 11:46                                 ` moeller0
  2016-05-06 13:25                                   ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: moeller0 @ 2016-05-06 11:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, make-wifi-fast, Dave Täht, ath10k, codel,
	Jonathan Morton, Roman Yeryomin

Hi Jesper,

> On May 6, 2016, at 13:33 , Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> 
> On Fri, 6 May 2016 10:41:53 +0200 moeller0 <moeller0@gmx.de> wrote:
> 
>> 	Speaking out of total ignorance, I ask why not account
>> GRO/GSO packets by the number of their fragments against the packet
>> limit? Counting a 64kB packets as equivalent to a 64B packet probably
>> is the right thing if one tries to account for the work the OS needs
>> to perform to figure out what to do with the packet, but for limiting
>> the memory consumption it introduces an impressive/manly level of
>> uncertainty (2 orders of magnitude). 
> 
> Looking at the drop code in fq_codel:
> https://github.com/torvalds/linux/blob/v4.6-rc6/net/sched/sch_fq_codel.c#L136
> 
> It looks like we are finding the "fat" flow to drop from based on
> number of bytes queued.  And AFAIK skb->len also account for the total
> length of all GSO packets (Eric?)
> 
> Even better, we are using qdisc_pkt_len(skb), which also account for
> the GSO headers in qdisc_pkt_len_init(). 
> https://github.com/torvalds/linux/blob/v4.6-rc6/net/core/dev.c#L2993
> 
> If anything, the GSO packets get hit harder by the fq_codel_drop
> function, as we drop the entire GSO skb.

	This sounds all very reassuring!

> 
> 
> Is the issue you are raising that the 1024 packet limit, would allow
> 1024 x 64K bytes to be queued before the drop kicks in? (Resulting in
> using too much memory on your small device).

	Yes, I guess I need to explain better. My wndr3700v7 only sports a measly 64MB of RAM total, so in the past, with the default 10240 limit, I could force OOM-initiated reboots. So my angle on this issue is always: I want my router to survive even if the network is over-saturated (I do not expect the router to work well under those circumstances, but I somehow think it should at least not reboot during DOS conditions…). Cake allows specifying a hard memory limit that looks at skb->truesize, allowing stricter memory control for people like me and making the whole GRO/GSO issue go away, at least as far as OOM is concerned. And as I begin to understand, once a link reaches a certain bandwidth GRO/GSO become helpful again, especially on small routers, as they help reduce the work the kernel needs to do per byte…
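
For reference, the kind of knob I mean (assuming a cake build that
understands the memlimit keyword; device and values are only examples):

  tc qdisc replace dev eth0 root cake bandwidth 5mbit memlimit 4mb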

Best Regards & Many Thanks
	Sebastian


> 
> -- 
> Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Principal Kernel Engineer at Red Hat
>  Author of http://www.iptv-analyzer.org
>  LinkedIn: http://www.linkedin.com/in/brouer


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
@ 2016-05-06 12:47                               ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06 12:47 UTC (permalink / raw)
  To: Felix Fietkau, Dave Taht
  Cc: make-wifi-fast, zajec5, ath10k, codel, netdev, Jonathan Morton,
	Roman Yeryomin, openwrt-devel


I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
closed Felix's OpenWRT email account (bad choice! emails bouncing).
Sounds like the OpenWRT and LEDE (https://www.lede-project.org/) projects
are in some kind of conflict.

OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349

[2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335


On Fri, 6 May 2016 11:42:43 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Hi Felix,
> 
> This is an important fix for OpenWRT, please read!
> 
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt.  Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
> 
> Problematic OpenWRT commit in question:
>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
> 
> 
> I also highly recommend you cherry-pick this very recent commit:
>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>  https://git.kernel.org/davem/net-next/c/9d18562a227
> 
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
> 
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop).  That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
> 
> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
> drop, given we also recommend bucket size to be 128 ? (thus the amount
> of memory to scan is less, but their CPU is also much smaller).
> 
> --Jesper
> 
> 
> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:  
> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:    
> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > > >    
> > > >>
> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > > >>  backlog 0b 0p requeues 0
> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > > >>   new_flows_len 0 old_flows_len 0    
> > > >
> > > >
> > > > Limit of 1024 packets and 1024 flows is not wise I think.
> > > >
> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > > which is almost the same than having no queue at all)
> > > >
> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > > chance to trigger.
> > > >
> > > > So you could either reduce number of buckets to 128 (if memory is
> > > > tight), or increase limit to 8192.    
> > > 
> > > Will try, but what I've posted is default, I didn't change/configure that.    
> > 
> > fq_codel has a default of 10240 packets and 1024 buckets.
> > 
> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
> > 
> > If someone changed that in the linux variant you use, he probably should
> > explain the rationale.  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-06 12:47                               ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 108+ messages in thread
From: Jesper Dangaard Brouer @ 2016-05-06 12:47 UTC (permalink / raw)
  To: Felix Fietkau, Dave Taht
  Cc: make-wifi-fast, zajec5, brouer, ath10k, codel, netdev,
	Jonathan Morton, Roman Yeryomin, openwrt-devel


I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
closed Felix'es OpenWRT email account (bad choice! emails bouncing).
Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
is in some kind of conflict.

OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349

[2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335


On Fri, 6 May 2016 11:42:43 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Hi Felix,
> 
> This is an important fix for OpenWRT, please read!
> 
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt.  Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
> 
> Problematic OpenWRT commit in question:
>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
> 
> 
> I also highly recommend you cherry-pick this very recent commit:
>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>  https://git.kernel.org/davem/net-next/c/9d18562a227
> 
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
> 
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop).  That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
> 
> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
> drop, given we also recommend bucket size to be 128 ? (thus the amount
> of memory to scan is less, but their CPU is also much smaller).
> 
> --Jesper
> 
> 
> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:  
> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:    
> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
> > > >    
> > > >>
> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
> > > >>  backlog 0b 0p requeues 0
> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> > > >>   new_flows_len 0 old_flows_len 0    
> > > >
> > > >
> > > > Limit of 1024 packets and 1024 flows is not wise I think.
> > > >
> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
> > > > which is almost the same than having no queue at all)
> > > >
> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
> > > > chance to trigger.
> > > >
> > > > So you could either reduce number of buckets to 128 (if memory is
> > > > tight), or increase limit to 8192.    
> > > 
> > > Will try, but what I've posted is default, I didn't change/configure that.    
> > 
> > fq_codel has a default of 10240 packets and 1024 buckets.
> > 
> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
> > 
> > If someone changed that in the linux variant you use, he probably should
> > explain the rationale.  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 11:46                                 ` moeller0
@ 2016-05-06 13:25                                   ` Eric Dumazet
  2016-05-06 15:25                                     ` moeller0
  2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
  0 siblings, 2 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-06 13:25 UTC (permalink / raw)
  To: moeller0
  Cc: make-wifi-fast, Dave Täht, ath10k, codel,
	Jesper Dangaard Brouer, Jonathan Morton, Roman Yeryomin

On Fri, 2016-05-06 at 13:46 +0200, moeller0 wrote:
> Hi Jesper,
> 
> > On May 6, 2016, at 13:33 , Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > 
> > 
> > On Fri, 6 May 2016 10:41:53 +0200 moeller0 <moeller0@gmx.de> wrote:
> > 
> >> 	Speaking out of total ignorance, I ask why not account
> >> GRO/GSO packets by the number of their fragments against the packet
> >> limit? Counting a 64kB packets as equivalent to a 64B packet
> probably
> >> is the right thing if one tries to account for the work the OS
> needs
> >> to perform to figure out what to do with the packet, but for
> limiting
> >> the memory consumption it introduces an impressive/manly level of
> >> uncertainty (2 orders of magnitude). 
> > 
> > Looking at the drop code in fq_codel:
> >
> https://github.com/torvalds/linux/blob/v4.6-rc6/net/sched/sch_fq_codel.c#L136
> > 
> > It looks like we are finding the "fat" flow to drop from based on
> > number of bytes queued.  And AFAIK skb->len also account for the
> total
> > length of all GSO packets (Eric?)
> > 
> > Even better, we are using qdisc_pkt_len(skb), which also account for
> > the GSO headers in qdisc_pkt_len_init(). 
> > https://github.com/torvalds/linux/blob/v4.6-rc6/net/core/dev.c#L2993
> > 
> > If anything, the GSO packets get hit harder by the fq_codel_drop
> > function, as we drop the entire GSO skb.
> 
> 	This sounds all very reassuring!
> 
> > 
> > 
> > Is the issue you are raising that the 1024 packet limit, would allow
> > 1024 x 64K bytes to be queued before the drop kicks in? (Resulting
> in
> > using too much memory on your small device).
> 
> 	Yes, I guess I need to explain better. My wndr3700v7 only sports a
> measly 64MB ram total, so in the past with the default 10240 limit I
> could force OOM-initiated reboots. So my angle on this issue is
> always, I want my router to survive even if the the network is
> over-saturated (I do not expect the router to work well under those
> circumstances, but I somehow think it should at least not reboot
> during DOS conditions…). Cake allows to specify a hard memory limit
> that looks a skb-truesize to allow stricter memory control for people
> like me, making the whole GRO/GSO issue go away, at least as far as
> OOM is concerned. And as I begin to understand once a link reaches a
> certain bandwidth GRO/GSO will become helpful again, especially on
> small routers as they help reduce the work the kernel needs to to per
> byte…

Angles of attack :

1) I will provide a per-device /sys/class/net/eth0/gro_max_frags so that
we can more easily control the number of segs per GRO packet. It makes
sense to have GRO, but not so much to allow it to cook big packets that
might hurt FQ.

2) Tracking skb->truesize looks mandatory for small devices.
I will add this to fq_codel.

3) Making sure skb->truesize is accurate is a long-term effort we want
to constantly monitor, since some drivers are underestimating it.
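
If 1) lands as proposed, the tuning would then be as simple as
(hypothetical, since this knob does not exist yet):

  echo 8 > /sys/class/net/eth0/gro_max_frags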

Thanks.






_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 13:25                                   ` Eric Dumazet
@ 2016-05-06 15:25                                     ` moeller0
  2016-05-06 15:58                                       ` Eric Dumazet
  2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
  1 sibling, 1 reply; 108+ messages in thread
From: moeller0 @ 2016-05-06 15:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: make-wifi-fast, Dave Täht, ath10k, codel,
	Jesper Dangaard Brouer, Jonathan Morton, Roman Yeryomin

Hi Eric,

> On May 6, 2016, at 15:25 , Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Fri, 2016-05-06 at 13:46 +0200, moeller0 wrote:
>> Hi Jesper,
>> 
>>> On May 6, 2016, at 13:33 , Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>>> 
>>> 
>>> On Fri, 6 May 2016 10:41:53 +0200 moeller0 <moeller0@gmx.de> wrote:
>>> 
>>>> 	Speaking out of total ignorance, I ask why not account
>>>> GRO/GSO packets by the number of their fragments against the packet
>>>> limit? Counting a 64kB packets as equivalent to a 64B packet
>> probably
>>>> is the right thing if one tries to account for the work the OS
>> needs
>>>> to perform to figure out what to do with the packet, but for
>> limiting
>>>> the memory consumption it introduces an impressive/manly level of
>>>> uncertainty (2 orders of magnitude). 
>>> 
>>> Looking at the drop code in fq_codel:
>>> 
>> https://github.com/torvalds/linux/blob/v4.6-rc6/net/sched/sch_fq_codel.c#L136
>>> 
>>> It looks like we are finding the "fat" flow to drop from based on
>>> number of bytes queued.  And AFAIK skb->len also account for the
>> total
>>> length of all GSO packets (Eric?)
>>> 
>>> Even better, we are using qdisc_pkt_len(skb), which also account for
>>> the GSO headers in qdisc_pkt_len_init(). 
>>> https://github.com/torvalds/linux/blob/v4.6-rc6/net/core/dev.c#L2993
>>> 
>>> If anything, the GSO packets get hit harder by the fq_codel_drop
>>> function, as we drop the entire GSO skb.
>> 
>> 	This sounds all very reassuring!
>> 
>>> 
>>> 
>>> Is the issue you are raising that the 1024 packet limit, would allow
>>> 1024 x 64K bytes to be queued before the drop kicks in? (Resulting
>> in
>>> using too much memory on your small device).
>> 
>> 	Yes, I guess I need to explain better. My wndr3700v7 only sports a
>> measly 64MB ram total, so in the past with the default 10240 limit I
>> could force OOM-initiated reboots. So my angle on this issue is
>> always, I want my router to survive even if the the network is
>> over-saturated (I do not expect the router to work well under those
>> circumstances, but I somehow think it should at least not reboot
>> during DOS conditions…). Cake allows to specify a hard memory limit
>> that looks a skb-truesize to allow stricter memory control for people
>> like me, making the whole GRO/GSO issue go away, at least as far as
>> OOM is concerned. And as I begin to understand once a link reaches a
>> certain bandwidth GRO/GSO will become helpful again, especially on
>> small routers as they help reduce the work the kernel needs to to per
>> byte…
> 
> Angles of attack :
> 
> 1) I will provide a per device /sys/class/net/eth0/gro_max_frags so that
> we can more easily control amount of segs per GRO packets. It makes
> sense to have GRO, but not so much allowing it to cook big packets that
> might hurt FQ.

	This sounds great, so we can teach, say, sqm to set this to a reasonable value given the (shaped) bandwidth of a given interface. Would something like this also make sense / be possible on the send side for GSO/TSO?

> 
> 2) Tracking skb->truesize looks mandatory for small devices.
> I will add this to fq_codel.

	Thanks.

> 
> 3) Making sure skb->truesize is accurate is a long term effort we want
> to constantly monitor, since some drivers are doing under estimations.

	Is there an easy way to test/measure this? Say, if 2) is implemented, how can one figure out how much memory is actually allocated for a given queue?

Best Regards & Merci beaucoup
	Sebastian

> 
> Thanks.
> 
> 
> 
> 
> 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-06 13:25                                   ` Eric Dumazet
  2016-05-06 15:25                                     ` moeller0
@ 2016-05-06 15:55                                     ` Eric Dumazet
  2016-05-09  3:49                                       ` David Miller
                                                         ` (2 more replies)
  1 sibling, 3 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-06 15:55 UTC (permalink / raw)
  To: David Miller; +Cc: Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

From: Eric Dumazet <edumazet@google.com>

On small embedded routers, one wants to control maximal amount of
memory used by fq_codel, instead of controlling number of packets or
bytes, since GRO/TSO make these not practical.

Assuming skb->truesize is accurate, we have to keep track of
skb->truesize sum for skbs in queue.

This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.

I chose a default value of 32 MBytes, which looks reasonable even
for heavy duty usages. (Prior fq_codel users should not be hurt
when they upgrade their kernels)

Two fields are added to tc_fq_codel_qd_stats to report :
 - Current memory usage
 - Number of drops caused by memory limits

# tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
..
# tc -s -d qd sh dev eth1
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
requeues 21705223) 
 rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 
  maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
  ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
  new_flows_len 1 old_flows_len 177


Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Dave Täht <dave.taht@gmail.com>
Cc: Sebastian Möller <moeller0@gmx.de>
---
 include/uapi/linux/pkt_sched.h |    3 +++
 net/sched/sch_fq_codel.c       |   27 ++++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a11afecd4482..2382eed50278 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -719,6 +719,7 @@ enum {
 	TCA_FQ_CODEL_QUANTUM,
 	TCA_FQ_CODEL_CE_THRESHOLD,
 	TCA_FQ_CODEL_DROP_BATCH_SIZE,
+	TCA_FQ_CODEL_MEMORY_LIMIT,
 	__TCA_FQ_CODEL_MAX
 };
 
@@ -743,6 +744,8 @@ struct tc_fq_codel_qd_stats {
 	__u32	new_flows_len;	/* count of flows in new list */
 	__u32	old_flows_len;	/* count of flows in old list */
 	__u32	ce_mark;	/* packets above ce_threshold */
+	__u32	memory_usage;	/* in bytes */
+	__u32	drop_overmemory;
 };
 
 struct tc_fq_codel_cl_stats {
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index e7b42b0d5145..bb8bd9314629 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -60,8 +60,11 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	u32		drop_batch_size;
+	u32		memory_limit;
 	struct codel_params cparams;
 	struct codel_stats cstats;
+	u32		memory_usage;
+	u32		drop_overmemory;
 	u32		drop_overlimit;
 	u32		new_flow_count;
 
@@ -143,6 +146,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
 	unsigned int maxbacklog = 0, idx = 0, i, len;
 	struct fq_codel_flow *flow;
 	unsigned int threshold;
+	unsigned int mem = 0;
 
 	/* Queue is full! Find the fat flow and drop packet(s) from it.
 	 * This might sound expensive, but with 1024 flows, we scan
@@ -167,11 +171,13 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
 	do {
 		skb = dequeue_head(flow);
 		len += qdisc_pkt_len(skb);
+		mem += skb->truesize;
 		kfree_skb(skb);
 	} while (++i < max_packets && len < threshold);
 
 	flow->dropped += i;
 	q->backlogs[idx] -= len;
+	q->memory_usage -= mem;
 	sch->qstats.drops += i;
 	sch->qstats.backlog -= len;
 	sch->q.qlen -= i;
@@ -193,6 +199,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	unsigned int idx, prev_backlog, prev_qlen;
 	struct fq_codel_flow *flow;
 	int uninitialized_var(ret);
+	bool memory_limited;
 
 	idx = fq_codel_classify(skb, sch, &ret);
 	if (idx == 0) {
@@ -215,7 +222,9 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		flow->deficit = q->quantum;
 		flow->dropped = 0;
 	}
-	if (++sch->q.qlen <= sch->limit)
+	q->memory_usage += skb->truesize;
+	memory_limited = q->memory_usage > q->memory_limit;
+	if (++sch->q.qlen <= sch->limit && !memory_limited)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -229,7 +238,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	ret = fq_codel_drop(sch, q->drop_batch_size);
 
 	q->drop_overlimit += prev_qlen - sch->q.qlen;
-
+	if (memory_limited)
+		q->drop_overmemory += prev_qlen - sch->q.qlen;
 	/* As we dropped packet(s), better let upper stack know this */
 	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
 				  prev_backlog - sch->qstats.backlog);
@@ -308,6 +318,7 @@ begin:
 			list_del_init(&flow->flowchain);
 		goto begin;
 	}
+	q->memory_usage -= skb->truesize;
 	qdisc_bstats_update(sch, skb);
 	flow->deficit -= qdisc_pkt_len(skb);
 	/* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
@@ -355,6 +366,7 @@ static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
 	[TCA_FQ_CODEL_QUANTUM]	= { .type = NLA_U32 },
 	[TCA_FQ_CODEL_CE_THRESHOLD] = { .type = NLA_U32 },
 	[TCA_FQ_CODEL_DROP_BATCH_SIZE] = { .type = NLA_U32 },
+	[TCA_FQ_CODEL_MEMORY_LIMIT] = { .type = NLA_U32 },
 };
 
 static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
@@ -409,7 +421,11 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_DROP_BATCH_SIZE])
 		q->drop_batch_size = min(1U, nla_get_u32(tb[TCA_FQ_CODEL_DROP_BATCH_SIZE]));
 
-	while (sch->q.qlen > sch->limit) {
+	if (tb[TCA_FQ_CODEL_MEMORY_LIMIT])
+		q->memory_limit = min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT]));
+
+	while (sch->q.qlen > sch->limit ||
+	       q->memory_usage > q->memory_limit) {
 		struct sk_buff *skb = fq_codel_dequeue(sch);
 
 		q->cstats.drop_len += qdisc_pkt_len(skb);
@@ -454,6 +470,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 
 	sch->limit = 10*1024;
 	q->flows_cnt = 1024;
+	q->memory_limit = 32 << 20; /* 32 MBytes */
 	q->drop_batch_size = 64;
 	q->quantum = psched_mtu(qdisc_dev(sch));
 	q->perturbation = prandom_u32();
@@ -515,6 +532,8 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 			q->quantum) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_DROP_BATCH_SIZE,
 			q->drop_batch_size) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_MEMORY_LIMIT,
+			q->memory_limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
 			q->flows_cnt))
 		goto nla_put_failure;
@@ -543,6 +562,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
 	st.qdisc_stats.new_flow_count = q->new_flow_count;
 	st.qdisc_stats.ce_mark = q->cstats.ce_mark;
+	st.qdisc_stats.memory_usage  = q->memory_usage;
+	st.qdisc_stats.drop_overmemory = q->drop_overmemory;
 
 	list_for_each(pos, &q->new_flows)
 		st.qdisc_stats.new_flows_len++;

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 15:25                                     ` moeller0
@ 2016-05-06 15:58                                       ` Eric Dumazet
  2016-05-06 16:30                                         ` moeller0
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-06 15:58 UTC (permalink / raw)
  To: moeller0
  Cc: make-wifi-fast, Dave Täht, ath10k, codel,
	Jesper Dangaard Brouer, Jonathan Morton, Roman Yeryomin

On Fri, 2016-05-06 at 17:25 +0200, moeller0 wrote:
> Hi Eric,
> 
> > On May 6, 2016, at 15:25 , Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Angles of attack :
> > 
> > 1) I will provide a per device /sys/class/net/eth0/gro_max_frags so that
> > we can more easily control amount of segs per GRO packets. It makes
> > sense to have GRO, but not so much allowing it to cook big packets that
> > might hurt FQ.
> 
> 	This sounds great, so we can teach, say sqm to set this to a
> reasonable value given the (shaped) bandwidth of a given interface.
> Would something like this also make sense/is possible on the send side
> for GSO/TSO?

The problem with doing this on the send side is that too-big GRO packets
would need to be segmented _before_ reaching the qdisc, and we do not have
such support. (The segmentation happens after the qdisc, before hitting
the device.)

In any case, that would be more cpu cycles. It is probably better to
control GRO sizes to optimal values.

I posted the fq_codel patch to netdev :

 https://patchwork.ozlabs.org/patch/619344/



_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Codel] fq_codel_drop vs a udp flood
  2016-05-06 15:58                                       ` Eric Dumazet
@ 2016-05-06 16:30                                         ` moeller0
  0 siblings, 0 replies; 108+ messages in thread
From: moeller0 @ 2016-05-06 16:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: make-wifi-fast, Dave Täht, ath10k, codel,
	Jesper Dangaard Brouer, Jonathan Morton, Roman Yeryomin

Hi Eric,

> On May 6, 2016, at 17:58 , Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Fri, 2016-05-06 at 17:25 +0200, moeller0 wrote:
>> Hi Eric,
>> 
>>> On May 6, 2016, at 15:25 , Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>>> Angles of attack :
>>> 
>>> 1) I will provide a per device /sys/class/net/eth0/gro_max_frags so that
>>> we can more easily control amount of segs per GRO packets. It makes
>>> sense to have GRO, but not so much allowing it to cook big packets that
>>> might hurt FQ.
>> 
>> 	This sounds great, so we can teach, say sqm to set this to a
>> reasonable value given the (shaped) bandwidth of a given interface.
>> Would something like this also make sense/is possible on the send side
>> for GSO/TSO?
> 
> Problem of doing this on the send side, is that too big GRO packets
> would need to be segmented _before_ reaching qdisc, and we do not have
> such support. (The segmentation happens after qdisc before hitting
> device)

	Ah, so not really possible then.

> 
> In any case, that would be more cpu cycles. It is probably better to
> control GRO sizes to optimal values.

	I guess we can always limit the GRO segments on the internal ingress interfaces, so that the external egress interface never sees too “long” (as in required transmission time) aggregates/super-packets.
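
In the extreme one can of course simply switch GRO off on the ingress
side today (a blunt instrument, but it works; the interface name is
only an example):

  ethtool -K eth0 gro off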

> 
> I posted the fq_codel patch to netdev :
> 
> https://patchwork.ozlabs.org/patch/619344/

	Saw this, to quote Stimpy “happy happy joy joy”

Best Regards
	Sebastian

> 
> 


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-06 12:47                               ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
@ 2016-05-06 18:43                                 ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-06 18:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, codel, netdev,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
> is in some kind of conflict.
>
> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>
> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335

OK, so, after porting the patch to 4.1 openwrt kernel and playing a
bit with fq_codel limits I was able to get 420Mbps UDP like this:
tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256

This is certainly better than 30Mbps but still less than half of what
it was before (900).
TCP also improved a little (550 to ~590).

Felix, others, do you want to see the ported patch? Maybe I did something wrong.
It doesn't look like it will save ath10k from the performance regression.

>
> On Fri, 6 May 2016 11:42:43 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt.  Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation
>> wise it was chosen to be more expensive (to save cycles on normal mode).
>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4K big array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
>>
>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>> of memory to scan is less, but their CPU is also much smaller).
>>
>> --Jesper
>>
>>
>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>> > > >
>> > > >>
>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>> > > >>  backlog 0b 0p requeues 0
>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> > > >>   new_flows_len 0 old_flows_len 0
>> > > >
>> > > >
>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>> > > >
>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> > > > which is almost the same than having no queue at all)
>> > > >
>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>> > > > chance to trigger.
>> > > >
>> > > > So you could either reduce number of buckets to 128 (if memory is
>> > > > tight), or increase limit to 8192.
>> > >
>> > > Will try, but what I've posted is default, I didn't change/configure that.
>> >
>> > fq_codel has a default of 10240 packets and 1024 buckets.
>> >
>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>> >
>> > If someone changed that in the linux variant you use, he probably should
>> > explain the rationale.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-06 18:43                                 ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-06 18:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	codel, netdev, Jonathan Morton, OpenWrt Development List,
	Felix Fietkau

On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
> is in some kind of conflict.
>
> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>
> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335

OK, so, after porting the patch to 4.1 openwrt kernel and playing a
bit with fq_codel limits I was able to get 420Mbps UDP like this:
tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256

This is certainly better than 30Mbps but still more than two times
less than before (900).
TCP also improved a little (550 to ~590).

Felix, others, do you want to see the ported patch, maybe I did something wrong?
Doesn't look like it will save ath10k from performance regression.

>
> On Fri, 6 May 2016 11:42:43 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt.  Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation
>> wise it was chosen to be more expensive (to save cycles on normal mode).
>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4K big array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
>>
>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>> of memory to scan is less, but their CPU is also much smaller).
>>
>> --Jesper
>>
>>
>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>> > > >
>> > > >>
>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>> > > >>  backlog 0b 0p requeues 0
>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> > > >>   new_flows_len 0 old_flows_len 0
>> > > >
>> > > >
>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>> > > >
>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>> > > > which is almost the same than having no queue at all)
>> > > >
>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>> > > > chance to trigger.
>> > > >
>> > > > So you could either reduce number of buckets to 128 (if memory is
>> > > > tight), or increase limit to 8192.
>> > >
>> > > Will try, but what I've posted is default, I didn't change/configure that.
>> >
>> > fq_codel has a default of 10240 packets and 1024 buckets.
>> >
>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>> >
>> > If someone changed that in the linux variant you use, he probably should
>> > explain the rationale.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-06 18:43                                 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-06 18:56                                   ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-06 18:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, codel, netdev,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>
>> I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
>> closed Felix's OpenWRT email account (bad choice! emails bouncing).
>> Sounds like OpenWRT and the LEDE project (https://www.lede-project.org/)
>> are in some kind of conflict.
>>
>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>
>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>
> OK, so, after porting the patch to the 4.1 OpenWRT kernel and playing a
> bit with the fq_codel limits, I was able to get 420Mbps UDP like this:
> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256

Forgot to mention, I've reduced drop_batch_size down to 32
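(For reference, applying the same settings to all four mq children would look
something like the sketch below; it assumes an iproute2 that understands the
drop_batch keyword, otherwise drop_batch_size would have to be changed in the
kernel source:)

  for q in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$q fq_codel flows 16 limit 256 drop_batch 32
  done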

> This is certainly better than 30Mbps but still less than half of what it
> was before (900).
> TCP also improved a little (550 to ~590).
>
> Felix, others, do you want to see the ported patch? Maybe I did something wrong.
> It doesn't look like it will save ath10k from the performance regression.
>
>>
>> On Fri, 6 May 2016 11:42:43 +0200
>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>
>>> Hi Felix,
>>>
>>> This is an important fix for OpenWRT, please read!
>>>
>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>> without also adjusting q->flows_cnt.  Eric explains below that you must
>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>> adjust it to 128)
>>>
>>> Problematic OpenWRT commit in question:
>>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>
>>>
>>> I also highly recommend you cherry-pick this very recent commit:
>>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>>
>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>> The problem is that drop mode was considered rare, and implementation
>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>> especially expensive for smaller devices, as it scans a 4K big array,
>>> thus 64 cache misses for small devices!
>>>
>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>>> experience a significantly higher processing cost per packet, but
>>> instead can amortize this.
>>>
>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>> of memory to scan is less, but their CPU is also much smaller).
>>>
>>> --Jesper
>>>
>>>
>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>> > > >
>>> > > >>
>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>> > > >>  backlog 0b 0p requeues 0
>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>> > > >>   new_flows_len 0 old_flows_len 0
>>> > > >
>>> > > >
>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>> > > >
>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>> > > > which is almost the same than having no queue at all)
>>> > > >
>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>> > > > chance to trigger.
>>> > > >
>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>> > > > tight), or increase limit to 8192.
>>> > >
>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>> >
>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>> >
>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>> >
>>> > If someone changed that in the linux variant you use, he probably should
>>> > explain the rationale.
>>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>   Author of http://www.iptv-analyzer.org
>>   LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-06 18:56                                   ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-06 19:43                                     ` Dave Taht
  -1 siblings, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-06 19:43 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>
>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>> is in some kind of conflict.
>>>
>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>
>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>
>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>
> Forgot to mention, I've reduced drop_batch_size down to 32

0) Not clear to me if that's the right line; there are 4 wifi queues,
and the third one is the BE queue. That is also too low a limit for
normal use. And: for the purpose of this particular UDP test, flows 16
is ok, but not ideal.

1) What's the tcp number (with a simultaneous ping) with this latest patchset?
(I care about tcp performance a lot more than udp floods - surviving a
udp flood yes, performance, no)

before/after?

tc -s qdisc show dev wlan0 during/after results?

IF you are doing builds for the archer c7v2, I can join in on this... (?)

I did do a test of the ath10k "before": fq_codel *never engaged*, and
tcp-induced latencies under load, e.g. at 100mbit, cracked 600ms, while
staying flat (20ms) at 100mbit (not the same patches you are testing)
on x86. I have got tcp 300Mbit out of an osx box, with similar latency,
but have yet to get anything more on anything I currently have,
before or after the patchsets.

I'll go add flooding to the tests, I just finished a series comparing
two different speed stations and life was good on that.

"before" - fq_codel never engages, we see seconds of latency under load.

root@apu2:~# tc -s qdisc show dev wlp4s0
qdisc mq 0: root
 Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
  new_flows_len 1 old_flows_len 3
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 1 old_flows_len 0


>> This is certainly better than 30Mbps but still more than two times
>> less than before (900).

The number that I am still not sure we got: were you sending
900mbit udp and receiving 900mbit on the prior tests?

>> TCP also improved a little (550 to ~590).

The limit is probably a bit low, also.  You might want to try target
20ms as well.
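(i.e. something along these lines on each mq child; the values follow the
suggestions above rather than anything validated:)

  tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 target 20ms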

>>
>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>> Doesn't look like it will save ath10k from performance regression.

what was tcp "before"? (I'm sorry, such a long thread)

>>
>>>
>>> On Fri, 6 May 2016 11:42:43 +0200
>>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>
>>>> Hi Felix,
>>>>
>>>> This is an important fix for OpenWRT, please read!
>>>>
>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>> without also adjusting q->flows_cnt.  Eric explains below that you must
>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>> adjust it to 128)
>>>>
>>>> Problematic OpenWRT commit in question:
>>>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>
>>>>
>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>
>>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>>> The problem is that drop mode was considered rare, and implementation
>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>>> especially expensive for smaller devices, as it scans a 4K big array,
>>>> thus 64 cache misses for small devices!
>>>>
>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>>>> experience a significantly higher processing cost per packet, but
>>>> instead can amortize this.
>>>>
>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>
>>>> --Jesper
>>>>
>>>>
>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>> > > >
>>>> > > >>
>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>> > > >>  backlog 0b 0p requeues 0
>>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>> > > >>   new_flows_len 0 old_flows_len 0
>>>> > > >
>>>> > > >
>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>> > > >
>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>> > > > which is almost the same than having no queue at all)
>>>> > > >
>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>> > > > chance to trigger.
>>>> > > >
>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>> > > > tight), or increase limit to 8192.
>>>> > >
>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>> >
>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>> >
>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>> >
>>>> > If someone changed that in the linux variant you use, he probably should
>>>> > explain the rationale.
>>>
>>> --
>>> Best regards,
>>>   Jesper Dangaard Brouer
>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>   Author of http://www.iptv-analyzer.org
>>>   LinkedIn: http://www.linkedin.com/in/brouer



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
  2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
  (?)
  (?)
@ 2016-05-07  9:57                             ` Kevin Darbyshire-Bryant
  2016-05-15 22:47                                 ` Roman Yeryomin
  -1 siblings, 1 reply; 108+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-05-07  9:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Eric Dumazet, Felix Fietkau, Dave Taht
  Cc: make-wifi-fast, zajec5, ath10k, codel, netdev, Jonathan Morton,
	Roman Yeryomin


[-- Attachment #1.1: Type: text/plain, Size: 1814 bytes --]



On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt.  Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
>
> Problematic OpenWRT commit in question:
>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
I 'pull requested' this to the lede-staging tree on github.
https://github.com/lede-project/staging/pull/11

One way or another Felix & co should see the change :-)
>
>
> I also highly recommend you cherry-pick this very recent commit:
>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>  https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop).  That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
I haven't done the above cherry-pick patch & backport patch creation for
4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
me to it :-)
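(The cherry-pick itself is the small part; a rough sketch, where the remote name
and the OpenWRT patch filename are made up purely for illustration:)

  git fetch net-next
  git cherry-pick 9d18562a2278   # fq_codel: add batch ability to fq_codel_drop()
  # fix up any conflicts against 4.4/4.1/3.18, then refresh the OpenWRT patch
  git format-patch -1 --stdout > target/linux/generic/patches-4.4/660-fq_codel-batch-drop.patch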

Kevin


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4816 bytes --]

[-- Attachment #2: Type: text/plain, Size: 146 bytes --]

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
@ 2016-05-09  3:49                                       ` David Miller
  2016-05-09  4:14                                       ` Cong Wang
  2016-05-16  1:16                                       ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
  2 siblings, 0 replies; 108+ messages in thread
From: David Miller @ 2016-05-09  3:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: brouer, dave.taht, netdev, moeller0

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 06 May 2016 08:55:12 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> On small embedded routers, one wants to control the maximal amount of
> memory used by fq_codel, instead of controlling the number of packets or
> bytes, since GRO/TSO make these impractical.
> 
> Assuming skb->truesize is accurate, we have to keep track of
> skb->truesize sum for skbs in queue.
> 
> This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
> 
> I chose a default value of 32 MBytes, which looks reasonable even
> for heavy duty usages. (Prior fq_codel users should not be hurt
> when they upgrade their kernels)
> 
> Two fields are added to tc_fq_codel_qd_stats to report :
>  - Current memory usage
>  - Number of drops caused by memory limits
> 
> # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
> ..
> # tc -s -d qd sh dev eth1
> qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
>  quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
>  Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
> requeues 21705223) 
>  rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 
>   maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
>   ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
>   new_flows_len 1 old_flows_len 177
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
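(Tying this back to the OpenWRT discussion: with this applied, a small router
could cap each fq_codel instance by memory rather than relying on packet counts
alone, e.g. something like the line below, shown purely to illustrate the new
knob; repeat for the other mq children:)

  tc qdisc replace dev wlan0 parent :3 fq_codel flows 128 memory_limit 4M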

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
  2016-05-09  3:49                                       ` David Miller
@ 2016-05-09  4:14                                       ` Cong Wang
  2016-05-09  4:31                                         ` Eric Dumazet
  2016-05-16  1:16                                       ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
  2 siblings, 1 reply; 108+ messages in thread
From: Cong Wang @ 2016-05-09  4:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Fri, May 6, 2016 at 8:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> @@ -193,6 +199,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         unsigned int idx, prev_backlog, prev_qlen;
>         struct fq_codel_flow *flow;
>         int uninitialized_var(ret);
> +       bool memory_limited;
>
>         idx = fq_codel_classify(skb, sch, &ret);
>         if (idx == 0) {
> @@ -215,7 +222,9 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>                 flow->deficit = q->quantum;
>                 flow->dropped = 0;
>         }
> -       if (++sch->q.qlen <= sch->limit)
> +       q->memory_usage += skb->truesize;
> +       memory_limited = q->memory_usage > q->memory_limit;
> +       if (++sch->q.qlen <= sch->limit && !memory_limited)
>                 return NET_XMIT_SUCCESS;
>
>         prev_backlog = sch->qstats.backlog;
> @@ -229,7 +238,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         ret = fq_codel_drop(sch, q->drop_batch_size);
>
>         q->drop_overlimit += prev_qlen - sch->q.qlen;
> -
> +       if (memory_limited)
> +               q->drop_overmemory += prev_qlen - sch->q.qlen;

So when a packet is dropped due to the memory limit, should
we return failure in this case? Or am I missing something?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-09  4:14                                       ` Cong Wang
@ 2016-05-09  4:31                                         ` Eric Dumazet
  2016-05-09  5:07                                           ` Cong Wang
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-09  4:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:

> So when the packet is dropped due to memory over limit, should
> we return failure for this case? Or I miss anything?

Same behavior than before.

If we dropped some packets of this flow, we return NET_XMIT_CN

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-09  4:31                                         ` Eric Dumazet
@ 2016-05-09  5:07                                           ` Cong Wang
  2016-05-09 14:26                                             ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Cong Wang @ 2016-05-09  5:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>
>> So when the packet is dropped due to memory over limit, should
>> we return failure for this case? Or I miss anything?
>
> Same behavior than before.
>
> If we dropped some packets of this flow, we return NET_XMIT_CN

I think for the limited memory case, the upper layer is supposed
to stop sending more packets when hitting the limit.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-09  5:07                                           ` Cong Wang
@ 2016-05-09 14:26                                             ` Eric Dumazet
  2016-05-10  4:34                                               ` Cong Wang
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-09 14:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >
> >> So when the packet is dropped due to memory over limit, should
> >> we return failure for this case? Or I miss anything?
> >
> > Same behavior than before.
> >
> > If we dropped some packets of this flow, we return NET_XMIT_CN
> 
> I think for the limited memory case, the upper layer is supposed
> to stop sending more packets when hitting the limit.

They do. NET_XMIT_CN, for example, aborts IP fragmentation.

TCP flows will also instantly react.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-09 14:26                                             ` Eric Dumazet
@ 2016-05-10  4:34                                               ` Cong Wang
  2016-05-10  4:45                                                 ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Cong Wang @ 2016-05-10  4:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >
>> >> So when the packet is dropped due to memory over limit, should
>> >> we return failure for this case? Or I miss anything?
>> >
>> > Same behavior than before.
>> >
>> > If we dropped some packets of this flow, we return NET_XMIT_CN
>>
>> I think for the limited memory case, the upper layer is supposed
>> to stop sending more packets when hitting the limit.
>
> They doe. NET_XMIT_CN for example aborts IP fragmentation.
>
> TCP flows will also instantly react.

But not for the NET_XMIT_SUCCESS case:

        return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-10  4:34                                               ` Cong Wang
@ 2016-05-10  4:45                                                 ` Eric Dumazet
  2016-05-10  4:57                                                   ` Cong Wang
  0 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-10  4:45 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >> >
> >> >> So when the packet is dropped due to memory over limit, should
> >> >> we return failure for this case? Or I miss anything?
> >> >
> >> > Same behavior than before.
> >> >
> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
> >>
> >> I think for the limited memory case, the upper layer is supposed
> >> to stop sending more packets when hitting the limit.
> >
> > They doe. NET_XMIT_CN for example aborts IP fragmentation.
> >
> > TCP flows will also instantly react.
> 
> But not for the NET_XMIT_SUCCESS case:
> 
>         return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;


I believe you missed the whole point of FQ (SFQ, FQ_CODEL, FQ, ...)

If we dropped a packet of another flow because that other flow is an
elephant, why should we notify the mouse that we shot an elephant?

We return NET_XMIT_SUCCESS because we properly queued this packet for
this flow. This is absolutely right.

If you do not like fq, just use pfifo, and yes, you'll kill the mice.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-10  4:45                                                 ` Eric Dumazet
@ 2016-05-10  4:57                                                   ` Cong Wang
  2016-05-10  5:10                                                     ` Eric Dumazet
  0 siblings, 1 reply; 108+ messages in thread
From: Cong Wang @ 2016-05-10  4:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Mon, May 9, 2016 at 9:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
>> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >> >
>> >> >> So when the packet is dropped due to memory over limit, should
>> >> >> we return failure for this case? Or I miss anything?
>> >> >
>> >> > Same behavior than before.
>> >> >
>> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
>> >>
>> >> I think for the limited memory case, the upper layer is supposed
>> >> to stop sending more packets when hitting the limit.
>> >
>> > They doe. NET_XMIT_CN for example aborts IP fragmentation.
>> >
>> > TCP flows will also instantly react.
>>
>> But not for the NET_XMIT_SUCCESS case:
>>
>>         return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
>
>
> I believe you missed whole point of FQ (SFQ, FQ_CODEL, FQ, ...)
>
> If we dropped a packet of another flow because this other flow is an
> elephant, why should we notify the mouse that we shot an elephant ?
>
> We return NET_XMIT_SUCCESS because we properly queued this packet for
> this flow. This is absolutely right.
>

Sure, but we are talking about the memory-constrained case, aren't we?

If the whole system is suffering from memory pressure, shouldn't the whole
qdisc be halted?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: add memory limitation per queue
  2016-05-10  4:57                                                   ` Cong Wang
@ 2016-05-10  5:10                                                     ` Eric Dumazet
  0 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-10  5:10 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

On Mon, 2016-05-09 at 21:57 -0700, Cong Wang wrote:

> Sure, but we are talking about memory constraint case, aren't we?
> 
> If the whole system are suffering from memory pressure, the whole
> qdisc should be halted?

Please read the patch again.

I added a mem control, exactly to control memory usage in the first
place. If the admin allows this qdisc to consume 4MBytes, then we can
queue up to 4 Mbytes on it.

If we evict packets from _another_ flow because whatever limit was hit
(be it the number of packets or memory usage), we do not report to the
innocent guy that some packets were dropped.

The innocent guy's packet _is_ queued and _should_ be sent eventually.

Of course, if we could predict the future and know that 456 usec later the
packet will be lost anyway, we would notify the innocent guy right away.

But this is left for future improvement.
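(Once this is in place, the new counters make it easy to see whether the memory
limit, rather than the packet limit, is what is actually biting:)

  tc -s -d qdisc show dev eth1 | grep -E 'memory_used|drop_overmemory'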

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-06 19:43                                     ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
@ 2016-05-15 22:34                                       ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:34 UTC (permalink / raw)
  To: Dave Taht
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>> is in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32
>
> 0) Not clear to me if that's the right line, there are 4 wifi queues,
> and the third one
> is the BE queue.

That was an example, sorry, I should have stated that. I've applied the same
settings to all 4 queues.

> That is too low a limit, also, for normal use. And:
> for the purpose of this particular UDP test, flows 16 is ok, but not
> ideal.

I played with different combinations; it doesn't make any
(significant) difference: 20-30Mbps, not more.
What numbers would you propose?

> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)

During the test (both TCP and UDP) it's roughly 5ms on average, and ~2ms
when not running tests. Actually I'm now wondering if target is working at
all, because I had the same result with target 80ms...
So, yes, latency is good, but performance is poor.
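(One way to sanity-check whether codel itself ever drops: compare the fq_codel
child's total "dropped" count with its drop_overlimit counter. In the stats
below they are identical, i.e. every drop comes from the overlimit/bulk-drop
path, which is why changing target makes no visible difference:)

  tc -s qdisc show dev wlan0 | grep -E 'dropped|drop_overlimit'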

> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?

during the test:

qdisc mq 0: root
 Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
 backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
 backlog 1541252b 1018p requeues 17
  maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


after the test (60sec):

qdisc mq 0: root
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
  maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


> IF you are doing builds for the archer c7v2, I can join in on this... (?)

I'm not, but I have a c7 somewhere, so I can do a build for it and also
test, so we are on the same page.

> I did do a test of the ath10k "before", fq_codel *never engaged*, and
> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
> staying flat (20ms) at 100mbit. (not the same patches you are testing)
> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
> have yet to get anything more on anything I currently have
> before/after patchsets.
>
> I'll go add flooding to the tests, I just finished a series comparing
> two different speed stations and life was good on that.
>
> "before" - fq_codel never engages, we see seconds of latency under load.
>
> root@apu2:~# tc -s qdisc show dev wlp4s0
> qdisc mq 0: root
>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>   new_flows_len 1 old_flows_len 3
> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 1 old_flows_len 0
>   ```
>
>
>>> This is certainly better than 30Mbps but still more than two times
>>> less than before (900).
>
> The number that I still am not sure we got is that you were sending
> 900mbit udp and recieving 900mbit on the prior tests?

900 was the sending rate, AP POV (the wifi client is downloading)

>>> TCP also improved a little (550 to ~590).
>
> The limit is probably a bit low, also.  You might want to try target
> 20ms as well.

I've tried limit up to 1024 and target up to 80ms

>>>
>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>> Doesn't look like it will save ath10k from performance regression.
>
> what was tcp "before"? (I'm sorry, such a long thread)

750Mbps

>>>
>>>>
>>>> On Fri, 6 May 2016 11:42:43 +0200
>>>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>>> This is an important fix for OpenWRT, please read!
>>>>>
>>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>>> without also adjusting q->flows_cnt.  Eric explains below that you must
>>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>>> adjust it to 128)
>>>>>
>>>>> Problematic OpenWRT commit in question:
>>>>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>>
>>>>>
>>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>>
>>>>> This should fix very high CPU usage in-case fq_codel goes into drop mode.
>>>>> The problem is that drop mode was considered rare, and implementation
>>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>>> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
>>>>> especially expensive for smaller devices, as it scans a 4K big array,
>>>>> thus 64 cache misses for small devices!
>>>>>
>>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>>>>> experience a significantly higher processing cost per packet, but
>>>>> instead can amortize this.
>>>>>
>>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>>
>>>>> --Jesper
>>>>>
>>>>>
>>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>
>>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>> > > >
>>>>> > > >>
>>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> > > >>  backlog 0b 0p requeues 0
>>>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> > > >>   new_flows_len 0 old_flows_len 0
>>>>> > > >
>>>>> > > >
>>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>>> > > >
>>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>>> > > > which is almost the same than having no queue at all)
>>>>> > > >
>>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>>> > > > chance to trigger.
>>>>> > > >
>>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>>> > > > tight), or increase limit to 8192.
>>>>> > >
>>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>>> >
>>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>>> >
>>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>>> >
>>>>> > If someone changed that in the linux variant you use, he probably should
>>>>> > explain the rationale.
>>>>
>>>> --
>>>> Best regards,
>>>>   Jesper Dangaard Brouer
>>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>>   Author of http://www.iptv-analyzer.org
>>>>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-15 22:34                                       ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:34 UTC (permalink / raw)
  To: Dave Taht
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Michal Kazior, Jesper Dangaard Brouer, Jonathan Morton,
	OpenWrt Development List, Felix Fietkau

On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>> is in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32
>
> 0) Not clear to me if that's the right line, there are 4 wifi queues,
> and the third one
> is the BE queue.

That was an example, sorry, I should have stated that. I've applied the same
settings to all 4 queues.
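
Concretely, something along these lines (a sketch; it assumes the mq children
are :1 through :4, as shown in the qdisc dumps below):

  for q in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$q fq_codel flows 16 limit 256
  done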

> That is too low a limit, also, for normal use. And:
> for the purpose of this particular UDP test, flows 16 is ok, but not
> ideal.

I played with different combinations, it doesn't make any
(significant) difference: 20-30Mbps, not more.
What numbers would you propose?

> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)

During the test (both TCP and UDP) it's roughly 5ms on average, and ~2ms when
not running tests. Actually I'm now wondering if target is working at
all, because I had the same result with target 80ms..
So, yes, latency is good, but performance is poor.
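
(For reference, a simple way to take that kind of reading is to run the load
and a ping concurrently from the client; the address and rate below are just
placeholders:

  iperf3 -c 192.168.1.1 -u -b 900M -t 60 &
  ping -i 0.2 -c 300 192.168.1.1
)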

> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?

during the test:

qdisc mq 0: root
 Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
 backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
 backlog 1541252b 1018p requeues 17
  maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


after the test (60sec):

qdisc mq 0: root
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
  maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0


> IF you are doing builds for the archer c7v2, I can join in on this... (?)

I'm not but I have c7 somewhere, so I can do a build for it and also
test, so we are on the same page.

> I did do a test of the ath10k "before", fq_codel *never engaged*, and
> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
> staying flat (20ms) at 100mbit. (not the same patches you are testing)
> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
> have yet to get anything more on anything I currently have
> before/after patchsets.
>
> I'll go add flooding to the tests, I just finished a series comparing
> two different speed stations and life was good on that.
>
> "before" - fq_codel never engages, we see seconds of latency under load.
>
> root@apu2:~# tc -s qdisc show dev wlp4s0
> qdisc mq 0: root
>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>   new_flows_len 1 old_flows_len 3
> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
> target 5.0ms interval 100.0ms ecn
>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 1 old_flows_len 0
>
>
>>> This is certainly better than 30Mbps but still more than two times
>>> less than before (900).
>
> The number that I still am not sure we got is that you were sending
> 900mbit udp and recieving 900mbit on the prior tests?

900 was sending, AP POV (wifi client is downloading)

>>> TCP also improved a little (550 to ~590).
>
> The limit is probably a bit low, also.  You might want to try target
> 20ms as well.

I've tried limit up to 1024 and target up to 80ms

>>>
>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>> Doesn't look like it will save ath10k from performance regression.
>
> what was tcp "before"? (I'm sorry, such a long thread)

750Mbps

>>>
>>>>
>>>> On Fri, 6 May 2016 11:42:43 +0200
>>>> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>>> This is an important fix for OpenWRT, please read!
>>>>>
>>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>>> without also adjusting q->flows_cnt.  Eric explains below that you must
>>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>>> adjust it to 128)
>>>>>
>>>>> Problematic OpenWRT commit in question:
>>>>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>>
>>>>>
>>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>>
>>>>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>>>>> The problem is that drop mode was considered rare, and implementation-wise
>>>>> it was chosen to be more expensive (to save cycles in normal mode).
>>>>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>>>>> especially expensive for smaller devices, as it scans a 4KB array,
>>>>> thus 64 cache misses for small devices!
>>>>>
>>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>>>>> experience a significantly higher processing cost per packet, but
>>>>> instead can amortize this.
>>>>>
>>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>>
>>>>> --Jesper
>>>>>
>>>>>
>>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>
>>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>> > > >
>>>>> > > >>
>>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> > > >>  backlog 0b 0p requeues 0
>>>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> > > >>   new_flows_len 0 old_flows_len 0
>>>>> > > >
>>>>> > > >
>>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>>> > > >
>>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>>> > > > which is almost the same as having no queue at all)
>>>>> > > >
>>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>>> > > > chance to trigger.
>>>>> > > >
>>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>>> > > > tight), or increase limit to 8192.
>>>>> > >
>>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>>> >
>>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>>> >
>>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>>> >
>>>>> > If someone changed that in the linux variant you use, he probably should
>>>>> > explain the rationale.
>>>>
>>>> --
>>>> Best regards,
>>>>   Jesper Dangaard Brouer
>>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>>   Author of http://www.iptv-analyzer.org
>>>>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
  2016-05-07  9:57                             ` Kevin Darbyshire-Bryant
@ 2016-05-15 22:47                                 ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:47 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Felix Fietkau, Dave Taht,
	make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton

On 7 May 2016 at 12:57, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt.  Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
> I 'pull requested' this to the lede-staging tree on github.
> https://github.com/lede-project/staging/pull/11
>
> One way or another Felix & co should see the change :-)

If you would follow the white rabbit, you would see that it doesn't help

>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation-wise
>> it was chosen to be more expensive (to save cycles in normal mode).
>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4KB array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
> I haven't done the above cherry-pick patch & backport patch creation for
> 4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
> me to it :-)
>
> Kevin
>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-15 22:47                                 ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 22:47 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant
  Cc: Felix Fietkau, Eric Dumazet, make-wifi-fast,
	Rafał Miłecki, Dave Taht, ath10k, netdev, codel,
	Jesper Dangaard Brouer, Jonathan Morton

On 7 May 2016 at 12:57, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt.  Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128)
>>
>> Problematic OpenWRT commit in question:
>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
> I 'pull requested' this to the lede-staging tree on github.
> https://github.com/lede-project/staging/pull/11
>
> One way or another Felix & co should see the change :-)

If you would follow the white rabbit, you would see that it doesn't help

>>
>>
>> I also highly recommend you cherry-pick this very recent commit:
>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>
>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>> The problem is that drop mode was considered rare, and implementation-wise
>> it was chosen to be more expensive (to save cycles in normal mode).
>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>> especially expensive for smaller devices, as it scans a 4KB array,
>> thus 64 cache misses for small devices!
>>
>> The fix is to allow drop-mode to bulk-drop more packets when entering
>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>> experience a significantly higher processing cost per packet, but
>> instead can amortize this.
> I haven't done the above cherry-pick patch & backport patch creation for
> 4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
> me to it :-)
>
> Kevin
>

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-15 22:34                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-15 23:07                                         ` Eric Dumazet
  -1 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-15 23:07 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:

> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>  backlog 1541252b 1018p requeues 17
>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>   new_flows_len 0 old_flows_len 1

Why do you have ce_threshold set ? You really should not (even if it
does not matter for the kind of traffic you have at this moment)

If your expected link speed is around 1Gbps, or 80,000 packets per
second, then you have to understand that a 1024-packet limit is about 12
ms at most.

Even if the queue is full, max sojourn time of a packet would be 12 ms.
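
(Back-of-the-envelope with the same numbers, just to make the arithmetic
explicit:

  echo '1024 / 80000' | bc -l    # ~0.0128 s, i.e. ~12-13 ms worst case
)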

I really do not see how 'target 80 ms' could be hit.

You basically have FQ, with no Codel effect, but with the associated
cost of Codel (having to take timestamps)



_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-15 23:07                                         ` Eric Dumazet
  0 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-15 23:07 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	netdev, codel, Michal Kazior, Jesper Dangaard Brouer,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:

> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>  backlog 1541252b 1018p requeues 17
>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>   new_flows_len 0 old_flows_len 1

Why do you have ce_threshold set ? You really should not (even if it
does not matter for the kind of traffic you have at this moment)

If your expected link speed is around 1Gbps, or 80,000 packets per
second, then you have to understand that a 1024-packet limit is about 12
ms at most.

Even if the queue is full, max sojourn time of a packet would be 12 ms.

I really do not see how 'target 80 ms' could be hit.

You basically have FQ, with no Codel effect, but with the associated
cost of Codel (having to take timestamps)




_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-15 23:07                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Eric Dumazet
@ 2016-05-15 23:27                                           ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 23:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 02:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:
>
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>  backlog 1541252b 1018p requeues 17
>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>
> Why do you have ce_threshold set ? You really should not (even if it
> does not matter for the kind of traffic you have at this moment)

No idea, it was there always. How do I unset it? Setting it to 0 doesn't help.

> If your expected link speed is around 1Gbps, or 80,000 packets per
> second, then you have to understand that 1024 packets limit is about 12
> ms at most.
>
> Even if the queue is full, max sojourn time of a packet would be 12 ms.
>
> I really do not see how 'target 80 ms' could be hit.

Well, as I said, I've tried different options. Neither target 20ms (as
Dave proposed) nor 12ms saves the situation.

> You basically have FQ, with no Codel effect, but with the associated
> cost of Codel (having to take timestamps)
>
>
>
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-15 23:27                                           ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-15 23:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	netdev, codel, Michal Kazior, Jesper Dangaard Brouer,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 02:07, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:
>
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>  backlog 1541252b 1018p requeues 17
>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>
> Why do you have ce_threshold set ? You really should not (even if it
> does not matter for the kind of traffic you have at this moment)

No idea, it was there always. How do I unset it? Setting it to 0 doesn't help.

> If your expected link speed is around 1Gbps, or 80,000 packets per
> second, then you have to understand that 1024 packets limit is about 12
> ms at most.
>
> Even if the queue is full, max sojourn time of a packet would be 12 ms.
>
> I really do not see how 'target 80 ms' could be hit.

Well, as I said, I've tried different options. Neither target 20ms (as
Dave proposed) nor 12ms saves the situation.

> You basically have FQ, with no Codel effect, but with the associated
> cost of Codel (having to take timestamps)
>
>
>

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH net-next] fq_codel: fix memory limitation drift
  2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
  2016-05-09  3:49                                       ` David Miller
  2016-05-09  4:14                                       ` Cong Wang
@ 2016-05-16  1:16                                       ` Eric Dumazet
  2016-05-17  1:57                                         ` David Miller
  2 siblings, 1 reply; 108+ messages in thread
From: Eric Dumazet @ 2016-05-16  1:16 UTC (permalink / raw)
  To: David Miller; +Cc: Jesper Dangaard Brouer, Dave Täht, netdev, moeller0

From: Eric Dumazet <edumazet@google.com>

memory_usage must be decreased in dequeue_func(), not in
fq_codel_dequeue(), otherwise packets dropped by Codel algo
are missing this decrease.

Also we need to clear memory_usage in fq_codel_reset()

Fixes: 95b58430abe7 ("fq_codel: add memory limitation per queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq_codel.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index bb8bd9314629..6883a8971562 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -262,6 +262,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
+		q->memory_usage -= skb->truesize;
 		sch->q.qlen--;
 		sch->qstats.backlog -= qdisc_pkt_len(skb);
 	}
@@ -318,7 +319,6 @@ begin:
 			list_del_init(&flow->flowchain);
 		goto begin;
 	}
-	q->memory_usage -= skb->truesize;
 	qdisc_bstats_update(sch, skb);
 	flow->deficit -= qdisc_pkt_len(skb);
 	/* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
@@ -355,6 +355,7 @@ static void fq_codel_reset(struct Qdisc *sch)
 	}
 	memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
 	sch->q.qlen = 0;
+	q->memory_usage = 0;
 }
 
 static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-15 22:34                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-16  8:12                                         ` David Lang
  -1 siblings, 0 replies; 108+ messages in thread
From: David Lang @ 2016-05-16  8:12 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, codel, netdev,
	OpenWrt Development List, Felix Fietkau

On Mon, 16 May 2016, Roman Yeryomin wrote:

> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?

How many different flows did you have going at once? I believe that the reason 
for higher numbers isn't for throughput, but to allow for more flows to be 
isolated from each other. If you have too few buckets, different flows will end 
up being combined into one bucket so that one will affect the other more.

David Lang
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16  8:12                                         ` David Lang
  0 siblings, 0 replies; 108+ messages in thread
From: David Lang @ 2016-05-16  8:12 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	codel, netdev, OpenWrt Development List, Felix Fietkau

On Mon, 16 May 2016, Roman Yeryomin wrote:

> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?

How many different flows did you have going at once? I believe that the reason 
for higher numbers isn't for throughput, but to allow for more flows to be 
isolated from each other. If you have too few buckets, different flows will end 
up being combined into one bucket so that one will affect the other more.

David Lang

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-15 22:34                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-16  8:14                                         ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16  8:14 UTC (permalink / raw)
  To: Rajkumar Manoharan, Michal Kazior
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, netdev, codel,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>> is in some kind of conflict.
>>>>>
>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>
>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>
>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>
>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>
>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>> and the third one
>> is the BE queue.
>
> That was an example, sorry, should have stated that. I've applied same
> settings to all 4 queues.
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?
>
>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>> (I care about tcp performance a lot more than udp floods - surviving a
>> udp flood yes, performance, no)
>
> During the test (both TCP and UDP) it's roughly 5ms in average, not
> running tests ~2ms. Actually I'm now wondering if target is working at
> all, because I had same result with target 80ms..
> So, yes, latency is good, but performance is poor.
>
>> before/after?
>>
>> tc -s qdisc show dev wlan0 during/after results?
>
> during the test:
>
> qdisc mq 0: root
>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>  backlog 1545794b 1021p requeues 17
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>  backlog 1541252b 1018p requeues 17
>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
>
>
> after the test (60sec):
>
> qdisc mq 0: root
>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>  backlog 0b 0p requeues 28
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>  backlog 0b 0p requeues 28
>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
>
>
>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>
> I'm not but I have c7 somewhere, so I can do a build for it and also
> test, so we are on the same page.
>
>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>> have yet to get anything more on anything I currently have
>> before/after patchsets.
>>
>> I'll go add flooding to the tests, I just finished a series comparing
>> two different speed stations and life was good on that.
>>
>> "before" - fq_codel never engages, we see seconds of latency under load.
>>
>> root@apu2:~# tc -s qdisc show dev wlp4s0
>> qdisc mq 0: root
>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>   new_flows_len 1 old_flows_len 3
>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>   new_flows_len 1 old_flows_len 0
>>
>>
>>>> This is certainly better than 30Mbps but still more than two times
>>>> less than before (900).
>>
>> The number that I still am not sure we got is that you were sending
>> 900mbit udp and recieving 900mbit on the prior tests?
>
> 900 was sending, AP POV (wifi client is downloading)
>
>>>> TCP also improved a little (550 to ~590).
>>
>> The limit is probably a bit low, also.  You might want to try target
>> 20ms as well.
>
> I've tried limit up to 1024 and target up to 80ms
>
>>>>
>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>> Doesn't look like it will save ath10k from performance regression.
>>
>> what was tcp "before"? (I'm sorry, such a long thread)
>
> 750Mbps

Michal, after retesting with your patch (sorry, it was late yesterday and I had
confused the compat-wireless archives) I saw the difference.
So the progress looks like this (all with fq_codel flows 16 limit 1024
target 20ms):
no patches: 380Mbps UDP, 550 TCP
Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
5-6ms during test
Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
up to 30-40ms during test
after Rajkumar's proposal to "try without registering wake_tx_queue
callback": 820Mbps UDP, 690 TCP.

So, very close to "as before": 900Mbps UDP, 750 TCP.
But still, I was expecting performance improvements from the latest ath10k
code, not regressions.
I know the hw is capable of 800Mbps TCP, which is what I'm targeting.

Regards,
Roman

p.s. sorry for confusion
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16  8:14                                         ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16  8:14 UTC (permalink / raw)
  To: Rajkumar Manoharan, Michal Kazior
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	netdev, codel, Jesper Dangaard Brouer, Jonathan Morton,
	OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>
>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>> is in some kind of conflict.
>>>>>
>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>
>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>
>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>
>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>
>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>> and the third one
>> is the BE queue.
>
> That was an example, sorry, should have stated that. I've applied same
> settings to all 4 queues.
>
>> That is too low a limit, also, for normal use. And:
>> for the purpose of this particular UDP test, flows 16 is ok, but not
>> ideal.
>
> I played with different combinations, it doesn't make any
> (significant) difference: 20-30Mbps, not more.
> What numbers would you propose?
>
>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>> (I care about tcp performance a lot more than udp floods - surviving a
>> udp flood yes, performance, no)
>
> During the test (both TCP and UDP) it's roughly 5ms in average, not
> running tests ~2ms. Actually I'm now wondering if target is working at
> all, because I had same result with target 80ms..
> So, yes, latency is good, but performance is poor.
>
>> before/after?
>>
>> tc -s qdisc show dev wlan0 during/after results?
>
> during the test:
>
> qdisc mq 0: root
>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>  backlog 1545794b 1021p requeues 17
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>  backlog 1541252b 1018p requeues 17
>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
>
>
> after the test (60sec):
>
> qdisc mq 0: root
>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>  backlog 0b 0p requeues 28
> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>  backlog 0b 0p requeues 28
>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
>
>
>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>
> I'm not but I have c7 somewhere, so I can do a build for it and also
> test, so we are on the same page.
>
>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>> have yet to get anything more on anything I currently have
>> before/after patchsets.
>>
>> I'll go add flooding to the tests, I just finished a series comparing
>> two different speed stations and life was good on that.
>>
>> "before" - fq_codel never engages, we see seconds of latency under load.
>>
>> root@apu2:~# tc -s qdisc show dev wlp4s0
>> qdisc mq 0: root
>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>   new_flows_len 1 old_flows_len 3
>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>> target 5.0ms interval 100.0ms ecn
>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>   new_flows_len 1 old_flows_len 0
>>
>>
>>>> This is certainly better than 30Mbps but still more than two times
>>>> less than before (900).
>>
>> The number that I still am not sure we got is that you were sending
>> 900mbit udp and recieving 900mbit on the prior tests?
>
> 900 was sending, AP POV (wifi client is downloading)
>
>>>> TCP also improved a little (550 to ~590).
>>
>> The limit is probably a bit low, also.  You might want to try target
>> 20ms as well.
>
> I've tried limit up to 1024 and target up to 80ms
>
>>>>
>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>> Doesn't look like it will save ath10k from performance regression.
>>
>> what was tcp "before"? (I'm sorry, such a long thread)
>
> 750Mbps

Michal, after retesting with your patch (sorry, it was late yesterday and I had
confused the compat-wireless archives) I saw the difference.
So the progress looks like this (all with fq_codel flows 16 limit 1024
target 20ms):
no patches: 380Mbps UDP, 550 TCP
Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
5-6ms during test
Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
up to 30-40ms during test
after Rajkumar's proposal to "try without registering wake_tx_queue
callback": 820Mbps UDP, 690 TCP.

So, very close to "as before": 900Mbps UDP, 750 TCP.
But still, I was expecting performance improvements from the latest ath10k
code, not regressions.
I know the hw is capable of 800Mbps TCP, which is what I'm targeting.

Regards,
Roman

p.s. sorry for confusion

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
  2016-05-16  8:12                                         ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " David Lang
@ 2016-05-16  8:26                                           ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16  8:26 UTC (permalink / raw)
  To: David Lang
  Cc: make-wifi-fast, Dave Taht, ath10k, codel, netdev,
	OpenWrt Development List

On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
> On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>> wrote:
>>>>
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>
>
> How many different flows did you have going at once? I believe that the
> reason for higher numbers isn't for throughput, but to allow for more flows
> to be isolated from each other. If you have too few buckets, different flows
> will end up being combined into one bucket so that one will affect the other
> more.

I'm testing with one flow; I never saw better performance with more
flows (e.g. -P8 with iperf3).
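
(If it helps, a sketch of a sweep that shows where the received rate stops
tracking the offered rate; the server address and the rate steps are
placeholders:

  for bw in 100M 200M 400M 600M 800M 900M; do
      iperf3 -c 192.168.1.1 -u -b $bw -t 30 -i 5
  done
)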

Regards,
Roman
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16  8:26                                           ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16  8:26 UTC (permalink / raw)
  To: David Lang
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	codel, netdev, OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
> On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>> wrote:
>>>>
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>
>
> How many different flows did you have going at once? I believe that the
> reason for higher numbers isn't for throughput, but to allow for more flows
> to be isolated from each other. If you have too few buckets, different flows
> will end up being combined into one bucket so that one will affect the other
> more.

I'm testing with one flow; I never saw better performance with more
flows (e.g. -P8 with iperf3).

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-16  8:26                                           ` Roman Yeryomin
@ 2016-05-16  8:46                                             ` David Lang
  -1 siblings, 0 replies; 108+ messages in thread
From: David Lang @ 2016-05-16  8:46 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, ath10k, codel, netdev,
	OpenWrt Development List, Felix Fietkau

On Mon, 16 May 2016, Roman Yeryomin wrote:

> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>
>>
>> How many different flows did you have going at once? I believe that the
>> reason for higher numbers isn't for throughput, but to allow for more flows
>> to be isolated from each other. If you have too few buckets, different flows
>> will end up being combined into one bucket so that one will affect the other
>> more.
>
> I'm testing with one flow, I never saw bigger performance with more
> flows (e.g. -P8 to iperf3).

The issue isn't performance, it's isolating a DNS request from a VoIP flow 
from a streaming video flow from a DVD image download.

The question is how many buckets you need to isolate these in practice; it
depends on how many flows you have. The default was 1024 buckets, but it got
changed to 128 for low-memory devices, and that lower value got made into
the default, even for devices with lots of memory.

I'm wondering: instead of trying to size this based on device memory, could it
be resized on the fly and grow if too many flows/collisions are detected?

David Lang
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16  8:46                                             ` David Lang
  0 siblings, 0 replies; 108+ messages in thread
From: David Lang @ 2016-05-16  8:46 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: make-wifi-fast, Rafał Miłecki, Dave Taht, ath10k,
	codel, netdev, OpenWrt Development List, Felix Fietkau

On Mon, 16 May 2016, Roman Yeryomin wrote:

> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>
>>
>> How many different flows did you have going at once? I believe that the
>> reason for higher numbers isn't for throughput, but to allow for more flows
>> to be isolated from each other. If you have too few buckets, different flows
>> will end up being combined into one bucket so that one will affect the other
>> more.
>
> I'm testing with one flow, I never saw bigger performance with more
> flows (e.g. -P8 to iperf3).

The issue isn't performance, it's isolating a DNS request from a VoIP flow 
from a streaming video flow from a DVD image download.

The question is how many buckets you need to isolate these in practice; it
depends on how many flows you have. The default was 1024 buckets, but it got
changed to 128 for low-memory devices, and that lower value got made into
the default, even for devices with lots of memory.

I'm wondering: instead of trying to size this based on device memory, could it
be resized on the fly and grow if too many flows/collisions are detected?

David Lang

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [OpenWrt-Devel] [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-16  8:46                                             ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " David Lang
@ 2016-05-16 10:34                                               ` Sebastian Moeller
  -1 siblings, 0 replies; 108+ messages in thread
From: Sebastian Moeller @ 2016-05-16 10:34 UTC (permalink / raw)
  To: David Lang, Roman Yeryomin
  Cc: make-wifi-fast, ath10k, codel, netdev, OpenWrt Development List

Hi David,

On May 16, 2016 10:46:25 AM GMT+02:00, David Lang <david@lang.hm> wrote:
>On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>>
>>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>
>>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin
><leroi.lists@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com>
>wrote:
>>>>>>>
>>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer
><brouer@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>
>>>>> That is too low a limit, also, for normal use. And:
>>>>> for the purpose of this particular UDP test, flows 16 is ok, but
>not
>>>>> ideal.
>>>>
>>>>
>>>> I played with different combinations, it doesn't make any
>>>> (significant) difference: 20-30Mbps, not more.
>>>> What numbers would you propose?
>>>
>>>
>>> How many different flows did you have going at once? I believe that
>the
>>> reason for higher numbers isn't for throughput, but to allow for
>more flows
>>> to be isolated from each other. If you have too few buckets,
>different flows
>>> will end up being combined into one bucket so that one will affect
>the other
>>> more.
>>
>> I'm testing with one flow, I never saw bigger performance with more
>> flows (e.g. -P8 to iperf3).
>
>The issue isn't performance, it's isolating a DNS request from a VoIP
>flow 
>from a streaming video flow from a DVD image download.
>
>The question is how many buckets do you need to have to isolate these
>in 
>practice? it depends how many flows you have. The default was 1024
>buckets, but 
>got changed to 128 for low memory devices, and that lower value got
>made into 
>the default, even for devices with lots of memory.

And I believe that the reduction was suboptimal; we need the hash buckets to spread the flows around to avoid shared fate due to shared buckets... So the 1024 buckets make a lot of sense even if the number of real concurrent flows is lower, think birthday paradox.
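
A rough way to see that birthday effect (purely illustrative; the only assumption is that flows hash uniformly over the buckets):

/* Estimate how often concurrent flows share an fq_codel hash bucket.
 * Build with: gcc -O2 -o birthday birthday.c -lm
 */
#include <math.h>
#include <stdio.h>

static void estimate(int flows, int buckets)
{
        int i;
        double p_no_collision = 1.0;
        double used;

        /* probability that no two flows picked the same bucket */
        for (i = 0; i < flows; i++)
                p_no_collision *= 1.0 - (double)i / buckets;

        /* expected number of distinct buckets actually occupied */
        used = buckets * (1.0 - pow(1.0 - 1.0 / buckets, flows));

        printf("%3d flows, %4d buckets: P(some sharing)=%.2f, ~%.1f colliding flows\n",
               flows, buckets, 1.0 - p_no_collision, flows - used);
}

int main(void)
{
        estimate(30, 128);
        estimate(30, 1024);
        estimate(100, 128);
        estimate(100, 1024);
        return 0;
}

With ~30 concurrent flows, 128 buckets already make some sharing very likely (roughly 97%), while 1024 buckets keep it down to roughly one chance in three.
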
The change came because at full saturation our reduced packet limit only allowed one packet per bucket, which is too low for decent performance... also, fewer hash buckets make searching faster.
Since we can now specify a memory limit in addition to the packet limit, we should set the packet limit back to its default of 10240 and instead set the memory limit to something sane for each platform. This will effectively have the same consequences as setting a packet limit, except that it becomes clearer why performance degrades, and I at least will gladly take a performance hit over a forced OOM reboot...
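
Something along these lines (a sketch only: memory_limit is the knob from Eric's recent fq_codel patch and needs a matching iproute2; the 4Mb value is just a placeholder, and the command would be repeated for each of the four wifi queues):

tc qdisc replace dev wlan0 parent :1 fq_codel limit 10240 flows 1024 memory_limit 4Mb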



>
>I'm wondering if instead of trying to size this based on device memory,
>can it 
>be resizable on the fly and grow if too many flows/collisions are
>detected?
>
>David Lang
>_______________________________________________
>openwrt-devel mailing list
>openwrt-devel@lists.openwrt.org
>https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [OpenWrt-Devel] [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16 10:34                                               ` Sebastian Moeller
  0 siblings, 0 replies; 108+ messages in thread
From: Sebastian Moeller @ 2016-05-16 10:34 UTC (permalink / raw)
  To: David Lang, Roman Yeryomin
  Cc: make-wifi-fast, Dave Taht, ath10k, codel, netdev,
	OpenWrt Development List

Hi David,

On May 16, 2016 10:46:25 AM GMT+02:00, David Lang <david@lang.hm> wrote:
>On Mon, 16 May 2016, Roman Yeryomin wrote:
>
>> On 16 May 2016 at 11:12, David Lang <david@lang.hm> wrote:
>>> On Mon, 16 May 2016, Roman Yeryomin wrote:
>>>
>>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>
>>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin
><leroi.lists@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com>
>wrote:
>>>>>>>
>>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer
><brouer@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>
>>>>> That is too low a limit, also, for normal use. And:
>>>>> for the purpose of this particular UDP test, flows 16 is ok, but
>not
>>>>> ideal.
>>>>
>>>>
>>>> I played with different combinations, it doesn't make any
>>>> (significant) difference: 20-30Mbps, not more.
>>>> What numbers would you propose?
>>>
>>>
>>> How many different flows did you have going at once? I believe that
>the
>>> reason for higher numbers isn't for throughput, but to allow for
>more flows
>>> to be isolated from each other. If you have too few buckets,
>different flows
>>> will end up being combined into one bucket so that one will affect
>the other
>>> more.
>>
>> I'm testing with one flow, I never saw bigger performance with more
>> flows (e.g. -P8 to iperf3).
>
>The issue isn't performance, it's isolating a DNS request from a VoIP
>flow 
>from a streaming video flow from a DVD image download.
>
>The question is how many buckets do you need to have to isolate these
>in 
>practice? it depends how many flows you have. The default was 1024
>buckets, but 
>got changed to 128 for low memory devices, and that lower value got
>made into 
>the default, even for devices with lots of memory.

And I believe that the reduction was suboptimal; we need the hash buckets to spread the flows around to avoid shared fate due to shared buckets... So the 1024 buckets make a lot of sense even if the number of real concurrent flows is lower, think birthday paradox.
The change came because at full saturation our reduced packet limit only allowed one packet per bucket, which is too low for decent performance... also, fewer hash buckets make searching faster.
Since we can now specify a memory limit in addition to the packet limit, we should set the packet limit back to its default of 10240 and instead set the memory limit to something sane for each platform. This will effectively have the same consequences as setting a packet limit, except that it becomes clearer why performance degrades, and I at least will gladly take a performance hit over a forced OOM reboot...



>
>I'm wondering if instead of trying to size this based on device memory,
>can it 
>be resizable on the fly and grow if too many flows/collisions are
>detected?
>
>David Lang
>_______________________________________________
>openwrt-devel mailing list
>openwrt-devel@lists.openwrt.org
>https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-16  8:14                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-16 14:23                                           ` Eric Dumazet
  -1 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-16 14:23 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
	ath10k, codel, netdev, OpenWrt Development List, Felix Fietkau

On Mon, 2016-05-16 at 11:14 +0300, Roman Yeryomin wrote:

> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.

One flow can reach 800Mbps.

To get this, a simple pfifo is enough.

But _if_ you also want to get decent results with hundreds of flows
under stress, you need something else, and I do not see how 'something'
else would come for free.

You will see some 'regressions' because of additional cpu costs, unless
you have enough cpu cycles and KB of memory to burn for free.

If your goal is to get max throughput on a single TCP flow, in a clean
env and on cheap hardware, you absolutely should stick to pfifo. Nothing
could beat pfifo (well, pfifo could be improved using a lockless
implementation, but that would only matter if you have different cpus
queueing and dequeueing packets).

But I guess your issues mostly come from too small packet limits, or
too big TCP windows.

Basically, if you test a single TCP flow, fq_codel should behave like a
pfifo, unless maybe your kernel has a very slow ktime_get_ns()
implementation [1]

If you set a limit of 1024 packets on pfifo, you'll have the same amount
of drops and lower TCP throughput.
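
E.g. something like (a sketch; adjust dev/parent to the queue actually
under test):

tc qdisc replace dev wlan0 parent :1 pfifo limit 1024
tc qdisc replace dev wlan0 parent :1 fq_codel limit 1024

i.e. the same 1024-packet ceiling, with and without the flow isolation
and codel logic on top.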

[1] We probably should have a self-test to have an estimation of
ktime_get_ns() cost
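
A very rough userspace stand-in for such a self-test (a sketch only: it
times clock_gettime(CLOCK_MONOTONIC) rather than the in-kernel
ktime_get_ns(), so it only hints at whether the platform's clocksource
is expensive):

/* Build with: gcc -O2 -o clockcost clockcost.c */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
        const int iters = 1000000;
        struct timespec start, now, end;
        uint64_t ns;
        int i;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < iters; i++)
                clock_gettime(CLOCK_MONOTONIC, &now);
        clock_gettime(CLOCK_MONOTONIC, &end);

        ns = (uint64_t)(end.tv_sec - start.tv_sec) * 1000000000ULL
             + (uint64_t)(end.tv_nsec - start.tv_nsec);
        printf("~%.1f ns per clock read\n", (double)ns / iters);
        return 0;
}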



_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16 14:23                                           ` Eric Dumazet
  0 siblings, 0 replies; 108+ messages in thread
From: Eric Dumazet @ 2016-05-16 14:23 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
	ath10k, codel, Michal Kazior, netdev, OpenWrt Development List,
	Felix Fietkau

On Mon, 2016-05-16 at 11:14 +0300, Roman Yeryomin wrote:

> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.

One flow can reach 800Mbps.

To get this, a simple pfifo is enough.

But _if_ you also want to get decent results with hundreds of flows
under stress, you need something else, and I do not see how 'something'
else would come for free.

You will see some 'regressions' because of additional cpu costs, unless
you have enough cpu cycles and KB of memory to burn for free.

If your goal is to get max throughput on a single TCP flow, in a clean
env and on cheap hardware, you absolutely should stick to pfifo. Nothing
could beat pfifo (well, pfifo could be improved using a lockless
implementation, but that would only matter if you have different cpus
queueing and dequeueing packets).

But I guess your issues mostly come from too small packet limits, or
too big TCP windows.

Basically, if you test a single TCP flow, fq_codel should behave like a
pfifo, unless maybe your kernel has a very slow ktime_get_ns()
implementation [1]

If you set a limit of 1024 packets on pfifo, you'll have the same amount
of drops and lower TCP throughput.

[1] We probably should have a self-test to have an estimation of
ktime_get_ns() cost




_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
  2016-05-16  8:14                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
@ 2016-05-16 16:04                                           ` Dave Taht
  -1 siblings, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-16 16:04 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Rajkumar Manoharan, Michal Kazior, Jesper Dangaard Brouer,
	Felix Fietkau, Jonathan Morton, codel, ath10k, make-wifi-fast,
	Rafał Miłecki, netdev, OpenWrt Development List

On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>
>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>> is in some kind of conflict.
>>>>>>
>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>
>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>
>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>
>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>
>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>> and the third one
>>> is the BE queue.
>>
>> That was an example, sorry, should have stated that. I've applied same
>> settings to all 4 queues.
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>>
>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>> (I care about tcp performance a lot more than udp floods - surviving a
>>> udp flood yes, performance, no)
>>
>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>> running tests ~2ms. Actually I'm now wondering if target is working at
>> all, because I had same result with target 80ms..
>> So, yes, latency is good, but performance is poor.
>>
>>> before/after?
>>>
>>> tc -s qdisc show dev wlan0 during/after results?
>>
>> during the test:
>>
>> qdisc mq 0: root
>>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>>  backlog 1545794b 1021p requeues 17
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>  backlog 1541252b 1018p requeues 17
>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>>
>>
>> after the test (60sec):
>>
>> qdisc mq 0: root
>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>  backlog 0b 0p requeues 28
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>  backlog 0b 0p requeues 28
>>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>>
>>
>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>
>> I'm not but I have c7 somewhere, so I can do a build for it and also
>> test, so we are on the same page.
>>
>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>> have yet to get anything more on anything I currently have
>>> before/after patchsets.
>>>
>>> I'll go add flooding to the tests, I just finished a series comparing
>>> two different speed stations and life was good on that.
>>>
>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>
>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>> qdisc mq 0: root
>>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>>   new_flows_len 1 old_flows_len 3
>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>   new_flows_len 1 old_flows_len 0
>>>   ```
>>>
>>>
>>>>> This is certainly better than 30Mbps but still more than two times
>>>>> less than before (900).
>>>
>>> The number that I still am not sure we got is that you were sending
>>> 900mbit udp and recieving 900mbit on the prior tests?
>>
>> 900 was sending, AP POV (wifi client is downloading)
>>
>>>>> TCP also improved a little (550 to ~590).
>>>
>>> The limit is probably a bit low, also.  You might want to try target
>>> 20ms as well.
>>
>> I've tried limit up to 1024 and target up to 80ms
>>
>>>>>
>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>> Doesn't look like it will save ath10k from performance regression.
>>>
>>> what was tcp "before"? (I'm sorry, such a long thread)
>>
>> 750Mbps
>
> Michal, after retesting with your patch (sorry, it was late yesterday,
> confused compat-wireless archives) I saw the difference.
> So the progress looks like this (all with fq_codel flows 16 limit 1024
> target 20ms):
> no patches: 380Mbps UDP, 550 TCP
> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
> 5-6ms during test
> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
> up to 30-40ms during test
> after Rajkumar's proposal to "try without registering wake_tx_queue
> callback": 820Mbps UDP, 690 TCP.

And the simultaneous ping on the last test was?

> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.
>
> Regards,
> Roman
>
> p.s. sorry for confusion



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16 16:04                                           ` Dave Taht
  0 siblings, 0 replies; 108+ messages in thread
From: Dave Taht @ 2016-05-16 16:04 UTC (permalink / raw)
  To: Roman Yeryomin
  Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
	ath10k, netdev, codel, Michal Kazior, Jesper Dangaard Brouer,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>
>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>> is in some kind of conflict.
>>>>>>
>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>
>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>
>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>
>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>
>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>> and the third one
>>> is the BE queue.
>>
>> That was an example, sorry, should have stated that. I've applied same
>> settings to all 4 queues.
>>
>>> That is too low a limit, also, for normal use. And:
>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>> ideal.
>>
>> I played with different combinations, it doesn't make any
>> (significant) difference: 20-30Mbps, not more.
>> What numbers would you propose?
>>
>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>> (I care about tcp performance a lot more than udp floods - surviving a
>>> udp flood yes, performance, no)
>>
>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>> running tests ~2ms. Actually I'm now wondering if target is working at
>> all, because I had same result with target 80ms..
>> So, yes, latency is good, but performance is poor.
>>
>>> before/after?
>>>
>>> tc -s qdisc show dev wlan0 during/after results?
>>
>> during the test:
>>
>> qdisc mq 0: root
>>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>>  backlog 1545794b 1021p requeues 17
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>  backlog 1541252b 1018p requeues 17
>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>>
>>
>> after the test (60sec):
>>
>> qdisc mq 0: root
>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>  backlog 0b 0p requeues 28
>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>  backlog 0b 0p requeues 28
>>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>>   new_flows_len 0 old_flows_len 1
>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>   new_flows_len 0 old_flows_len 0
>>
>>
>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>
>> I'm not but I have c7 somewhere, so I can do a build for it and also
>> test, so we are on the same page.
>>
>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>> have yet to get anything more on anything I currently have
>>> before/after patchsets.
>>>
>>> I'll go add flooding to the tests, I just finished a series comparing
>>> two different speed stations and life was good on that.
>>>
>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>
>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>> qdisc mq 0: root
>>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>>   new_flows_len 1 old_flows_len 3
>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>> target 5.0ms interval 100.0ms ecn
>>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>   new_flows_len 1 old_flows_len 0
>>>   ```
>>>
>>>
>>>>> This is certainly better than 30Mbps but still more than two times
>>>>> less than before (900).
>>>
>>> The number that I still am not sure we got is that you were sending
>>> 900mbit udp and recieving 900mbit on the prior tests?
>>
>> 900 was sending, AP POV (wifi client is downloading)
>>
>>>>> TCP also improved a little (550 to ~590).
>>>
>>> The limit is probably a bit low, also.  You might want to try target
>>> 20ms as well.
>>
>> I've tried limit up to 1024 and target up to 80ms
>>
>>>>>
>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>> Doesn't look like it will save ath10k from performance regression.
>>>
>>> what was tcp "before"? (I'm sorry, such a long thread)
>>
>> 750Mbps
>
> Michal, after retesting with your patch (sorry, it was late yesterday,
> confused compat-wireless archives) I saw the difference.
> So the progress looks like this (all with fq_codel flows 16 limit 1024
> target 20ms):
> no patches: 380Mbps UDP, 550 TCP
> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
> 5-6ms during test
> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
> up to 30-40ms during test
> after Rajkumar's proposal to "try without registering wake_tx_queue
> callback": 820Mbps UDP, 690 TCP.

And the simultaneous ping on the last test was?

> So, very close to "as before": 900Mbps UDP, 750 TCP.
> But still, I was expecting performance improvements from latest ath10k
> code, not regressions.
> I know that hw is capable of 800Mbps TCP, which I'm targeting.
>
> Regards,
> Roman
>
> p.s. sorry for confusion



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood)
  2016-05-16 16:04                                           ` Dave Taht
@ 2016-05-16 19:46                                             ` Roman Yeryomin
  -1 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16 19:46 UTC (permalink / raw)
  To: Dave Taht
  Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
	ath10k, netdev, codel, Jonathan Morton, OpenWrt Development List,
	Felix Fietkau

On 16 May 2016 at 19:04, Dave Taht <dave.taht@gmail.com> wrote:
> On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>>
>>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>>> is in some kind of conflict.
>>>>>>>
>>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>>
>>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>>
>>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>>
>>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>>
>>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>>> and the third one
>>>> is the BE queue.
>>>
>>> That was an example, sorry, should have stated that. I've applied same
>>> settings to all 4 queues.
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>>
>>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>>> (I care about tcp performance a lot more than udp floods - surviving a
>>>> udp flood yes, performance, no)
>>>
>>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>>> running tests ~2ms. Actually I'm now wondering if target is working at
>>> all, because I had same result with target 80ms..
>>> So, yes, latency is good, but performance is poor.
>>>
>>>> before/after?
>>>>
>>>> tc -s qdisc show dev wlan0 during/after results?
>>>
>>> during the test:
>>>
>>> qdisc mq 0: root
>>>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>>>  backlog 1545794b 1021p requeues 17
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>>  backlog 1541252b 1018p requeues 17
>>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>> after the test (60sec):
>>>
>>> qdisc mq 0: root
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>>  backlog 0b 0p requeues 28
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>>  backlog 0b 0p requeues 28
>>>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>>
>>> I'm not but I have c7 somewhere, so I can do a build for it and also
>>> test, so we are on the same page.
>>>
>>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>>> have yet to get anything more on anything I currently have
>>>> before/after patchsets.
>>>>
>>>> I'll go add flooding to the tests, I just finished a series comparing
>>>> two different speed stations and life was good on that.
>>>>
>>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>>
>>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>>> qdisc mq 0: root
>>>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>   new_flows_len 0 old_flows_len 0
>>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>>   new_flows_len 0 old_flows_len 1
>>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>>>   new_flows_len 1 old_flows_len 3
>>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>>   new_flows_len 1 old_flows_len 0
>>>>   ```
>>>>
>>>>
>>>>>> This is certainly better than 30Mbps but still more than two times
>>>>>> less than before (900).
>>>>
>>>> The number that I still am not sure we got is that you were sending
>>>> 900mbit udp and recieving 900mbit on the prior tests?
>>>
>>> 900 was sending, AP POV (wifi client is downloading)
>>>
>>>>>> TCP also improved a little (550 to ~590).
>>>>
>>>> The limit is probably a bit low, also.  You might want to try target
>>>> 20ms as well.
>>>
>>> I've tried limit up to 1024 and target up to 80ms
>>>
>>>>>>
>>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>>> Doesn't look like it will save ath10k from performance regression.
>>>>
>>>> what was tcp "before"? (I'm sorry, such a long thread)
>>>
>>> 750Mbps
>>
>> Michal, after retesting with your patch (sorry, it was late yesterday,
>> confused compat-wireless archives) I saw the difference.
>> So the progress looks like this (all with fq_codel flows 16 limit 1024
>> target 20ms):
>> no patches: 380Mbps UDP, 550 TCP
>> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
>> 5-6ms during test
>> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
>> up to 30-40ms during test
>> after Rajkumar's proposal to "try without registering wake_tx_queue
>> callback": 820Mbps UDP, 690 TCP.
>
> And the simultaneous ping on the last test was?

same as previous: 30-40ms

Regards,
Roman
_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
@ 2016-05-16 19:46                                             ` Roman Yeryomin
  0 siblings, 0 replies; 108+ messages in thread
From: Roman Yeryomin @ 2016-05-16 19:46 UTC (permalink / raw)
  To: Dave Taht
  Cc: Rajkumar Manoharan, make-wifi-fast, Rafał Miłecki,
	ath10k, netdev, codel, Michal Kazior, Jesper Dangaard Brouer,
	Jonathan Morton, OpenWrt Development List, Felix Fietkau

On 16 May 2016 at 19:04, Dave Taht <dave.taht@gmail.com> wrote:
> On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 16 May 2016 at 01:34, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>> On 6 May 2016 at 22:43, Dave Taht <dave.taht@gmail.com> wrote:
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>>>>>>>
>>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>>> is in some kind of conflict.
>>>>>>>
>>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>>
>>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>>
>>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>>
>>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>>
>>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>>> and the third one
>>>> is the BE queue.
>>>
>>> That was an example, sorry, should have stated that. I've applied same
>>> settings to all 4 queues.
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>>
>>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>>> (I care about tcp performance a lot more than udp floods - surviving a
>>>> udp flood yes, performance, no)
>>>
>>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>>> running tests ~2ms. Actually I'm now wondering if target is working at
>>> all, because I had same result with target 80ms..
>>> So, yes, latency is good, but performance is poor.
>>>
>>>> before/after?
>>>>
>>>> tc -s qdisc show dev wlan0 during/after results?
>>>
>>> during the test:
>>>
>>> qdisc mq 0: root
>>>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
>>>  backlog 1545794b 1021p requeues 17
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>>>  backlog 1541252b 1018p requeues 17
>>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>> after the test (60sec):
>>>
>>> qdisc mq 0: root
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>>  backlog 0b 0p requeues 28
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
>>>  backlog 0b 0p requeues 28
>>>   maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>>> IF you are doing builds for the archer c7v2, I can join in on this... (?)
>>>
>>> I'm not but I have c7 somewhere, so I can do a build for it and also
>>> test, so we are on the same page.
>>>
>>>> I did do a test of the ath10k "before", fq_codel *never engaged*, and
>>>> tcp induced latencies under load, e at 100mbit, cracked 600ms, while
>>>> staying flat (20ms) at 100mbit. (not the same patches you are testing)
>>>> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
>>>> have yet to get anything more on anything I currently have
>>>> before/after patchsets.
>>>>
>>>> I'll go add flooding to the tests, I just finished a series comparing
>>>> two different speed stations and life was good on that.
>>>>
>>>> "before" - fq_codel never engages, we see seconds of latency under load.
>>>>
>>>> root@apu2:~# tc -s qdisc show dev wlp4s0
>>>> qdisc mq 0: root
>>>>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>   new_flows_len 0 old_flows_len 0
>>>> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>>   new_flows_len 0 old_flows_len 1
>>>> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>>>>   new_flows_len 1 old_flows_len 3
>>>> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
>>>> target 5.0ms interval 100.0ms ecn
>>>>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>>>>  backlog 0b 0p requeues 0
>>>>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>>>>   new_flows_len 1 old_flows_len 0
>>>>   ```
>>>>
>>>>
>>>>>> This is certainly better than 30Mbps but still more than two times
>>>>>> less than before (900).
>>>>
>>>> The number that I still am not sure we got is that you were sending
>>>> 900mbit udp and recieving 900mbit on the prior tests?
>>>
>>> 900 was sending, AP POV (wifi client is downloading)
>>>
>>>>>> TCP also improved a little (550 to ~590).
>>>>
>>>> The limit is probably a bit low, also.  You might want to try target
>>>> 20ms as well.
>>>
>>> I've tried limit up to 1024 and target up to 80ms
>>>
>>>>>>
>>>>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>>>>> Doesn't look like it will save ath10k from performance regression.
>>>>
>>>> what was tcp "before"? (I'm sorry, such a long thread)
>>>
>>> 750Mbps
>>
>> Michal, after retesting with your patch (sorry, it was late yesterday,
>> confused compat-wireless archives) I saw the difference.
>> So the progress looks like this (all with fq_codel flows 16 limit 1024
>> target 20ms):
>> no patches: 380Mbps UDP, 550 TCP
>> Eric's (fq_codel drop) patch: 420Mbps UDP, 590 TCP (+40Mbps), latency
>> 5-6ms during test
>> Michal's (improve tx scheduling) patch: 580Mbps UDP, 660 TCP, latency
>> up to 30-40ms during test
>> after Rajkumar's proposal to "try without registering wake_tx_queue
>> callback": 820Mbps UDP, 690 TCP.
>
> And the simultaneous ping on the last test was?

same as previous: 30-40ms

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH net-next] fq_codel: fix memory limitation drift
  2016-05-16  1:16                                       ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
@ 2016-05-17  1:57                                         ` David Miller
  0 siblings, 0 replies; 108+ messages in thread
From: David Miller @ 2016-05-17  1:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: brouer, dave.taht, netdev, moeller0

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 15 May 2016 18:16:38 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> memory_usage must be decreased in dequeue_func(), not in
> fq_codel_dequeue(), otherwise packets dropped by Codel algo
> are missing this decrease.
> 
> Also we need to clear memory_usage in fq_codel_reset()
> 
> Fixes: 95b58430abe7 ("fq_codel: add memory limitation per queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply	[flat|nested] 108+ messages in thread

end of thread, other threads:[~2016-05-17  1:57 UTC | newest]

Thread overview: 108+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-01  3:41 fq_codel_drop vs a udp flood Dave Taht
2016-05-01  4:46 ` [Make-wifi-fast] " Jonathan Morton
2016-05-01  5:08 ` Ben Greear
2016-05-01  5:23   ` Dave Taht
2016-05-01 14:47     ` [Make-wifi-fast] " dpreed
2016-05-02 14:03       ` Roman Yeryomin
2016-05-02 18:40         ` Dave Taht
2016-05-05 13:55           ` Roman Yeryomin
2016-05-05 14:55             ` Roman Yeryomin
2016-05-02 19:47         ` David Lang
2016-05-01 17:59 ` [Codel] " Eric Dumazet
2016-05-01 18:20   ` Jonathan Morton
2016-05-01 18:46     ` Eric Dumazet
2016-05-01 19:55       ` Eric Dumazet
2016-05-02  7:47         ` Jesper Dangaard Brouer
2016-05-01 20:35       ` Jonathan Morton
2016-05-01 20:55         ` Eric Dumazet
2016-05-02 14:18           ` Roman Yeryomin
2016-05-02 15:07             ` Eric Dumazet
2016-05-02 15:43               ` Roman Yeryomin
2016-05-02 16:14                 ` Eric Dumazet
2016-05-02 17:08                   ` Dave Taht
2016-05-02 17:44                     ` Eric Dumazet
2016-05-05 14:32                     ` Roman Yeryomin
2016-05-05 14:53                   ` Roman Yeryomin
2016-05-05 15:32                     ` Dave Taht
2016-05-05 16:07                       ` Roman Yeryomin
2016-05-05 16:59                         ` Jonathan Morton
2016-05-05 17:39                           ` Roman Yeryomin
2016-05-05 18:16                             ` Dave Taht
2016-05-05 18:33                           ` Dave Taht
2016-05-05 16:12                     ` Eric Dumazet
2016-05-05 16:25                       ` Roman Yeryomin
2016-05-05 16:42                         ` Roman Yeryomin
2016-05-06 10:55                           ` Roman Yeryomin
2016-05-05 19:23                         ` Eric Dumazet
2016-05-05 19:41                           ` Dave Taht
2016-05-06  8:41                             ` moeller0
2016-05-06 11:33                               ` Jesper Dangaard Brouer
2016-05-06 11:46                                 ` moeller0
2016-05-06 13:25                                   ` Eric Dumazet
2016-05-06 15:25                                     ` moeller0
2016-05-06 15:58                                       ` Eric Dumazet
2016-05-06 16:30                                         ` moeller0
2016-05-06 15:55                                     ` [PATCH net-next] fq_codel: add memory limitation per queue Eric Dumazet
2016-05-09  3:49                                       ` David Miller
2016-05-09  4:14                                       ` Cong Wang
2016-05-09  4:31                                         ` Eric Dumazet
2016-05-09  5:07                                           ` Cong Wang
2016-05-09 14:26                                             ` Eric Dumazet
2016-05-10  4:34                                               ` Cong Wang
2016-05-10  4:45                                                 ` Eric Dumazet
2016-05-10  4:57                                                   ` Cong Wang
2016-05-10  5:10                                                     ` Eric Dumazet
2016-05-16  1:16                                       ` [PATCH net-next] fq_codel: fix memory limitation drift Eric Dumazet
2016-05-17  1:57                                         ` David Miller
2016-05-06  9:42                           ` OpenWRT wrong adjustment of fq_codel defaults (Was: fq_codel_drop vs a udp flood) Jesper Dangaard Brouer
2016-05-06  9:42                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
2016-05-06 12:47                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Jesper Dangaard Brouer
2016-05-06 12:47                               ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Jesper Dangaard Brouer
2016-05-06 18:43                               ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-06 18:43                                 ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-06 18:56                                 ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-06 18:56                                   ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-06 19:43                                   ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Dave Taht
2016-05-06 19:43                                     ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Dave Taht
2016-05-15 22:34                                     ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-15 22:34                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-15 23:07                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Eric Dumazet
2016-05-15 23:07                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Eric Dumazet
2016-05-15 23:27                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-15 23:27                                           ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-16  8:12                                       ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " David Lang
2016-05-16  8:12                                         ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " David Lang
2016-05-16  8:26                                         ` Roman Yeryomin
2016-05-16  8:26                                           ` Roman Yeryomin
2016-05-16  8:46                                           ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " David Lang
2016-05-16  8:46                                             ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " David Lang
2016-05-16 10:34                                             ` [OpenWrt-Devel] [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " Sebastian Moeller
2016-05-16 10:34                                               ` [OpenWrt-Devel] [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Sebastian Moeller
2016-05-16  8:14                                       ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-16  8:14                                         ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-16 14:23                                         ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: " Eric Dumazet
2016-05-16 14:23                                           ` [Make-wifi-fast] OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Eric Dumazet
2016-05-16 16:04                                         ` Dave Taht
2016-05-16 16:04                                           ` Dave Taht
2016-05-16 19:46                                           ` OpenWRT wrong adjustment of fq_codel defaults (Was: " Roman Yeryomin
2016-05-16 19:46                                             ` OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] " Roman Yeryomin
2016-05-07  9:57                             ` Kevin Darbyshire-Bryant
2016-05-15 22:47                               ` Roman Yeryomin
2016-05-15 22:47                                 ` Roman Yeryomin
2016-05-03  2:26     ` [Codel] fq_codel_drop vs a udp flood Dave Taht
2016-05-03  5:21       ` Dave Taht
2016-05-03 12:39         ` Agarwal, Anil
2016-05-03 12:50           ` Agarwal, Anil
2016-05-03 13:35             ` Eric Dumazet
2016-05-03 15:37               ` Agarwal, Anil
2016-05-03 17:37               ` Dave Taht
2016-05-03 17:54                 ` Eric Dumazet
2016-05-03 18:11                   ` Dave Taht
2016-05-03 13:20       ` Kevin Darbyshire-Bryant
2016-05-01 18:26   ` Dave Taht
2016-05-01 22:30     ` Eric Dumazet
2016-05-02 14:09   ` Roman Yeryomin
2016-05-02 15:04     ` Eric Dumazet
2016-05-02 15:42       ` Roman Yeryomin
2016-05-02 13:47 ` [Make-wifi-fast] " Roman Yeryomin
2016-05-02 15:01   ` Eric Dumazet
