* TCP reaching to maximum throughput after a long time
@ 2016-04-12 12:17 Machani, Yaniv
  2016-04-12 14:52 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Machani, Yaniv @ 2016-04-12 12:17 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Nandita Dukkipati, open list, Kama, Meirav

Hi,
After updating from kernel 3.14 to kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
With the 3.14 kernel, TCP reached its maximum throughput in less than a second, while with 4.4 it takes ~20-30 seconds.
UDP TX/RX and TCP RX performance is as expected.
We are using a Beagle Bone Black and a WiLink8 device.

Were there any related changes that might cause this behavior?
Kernel configuration and sysctl values were compared, but no significant differences were found.

See a log of the behavior below:
-----------------------------------------------------------
Client connecting to 10.2.46.5, TCP port 5001
TCP window size:  320 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  3] local 10.2.46.6 port 49282 connected with 10.2.46.5 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  5.75 MBytes  48.2 Mbits/sec
[  3]  1.0- 2.0 sec  6.50 MBytes  54.5 Mbits/sec
[  3]  2.0- 3.0 sec  6.50 MBytes  54.5 Mbits/sec
[  3]  3.0- 4.0 sec  6.50 MBytes  54.5 Mbits/sec
[  3]  4.0- 5.0 sec  6.75 MBytes  56.6 Mbits/sec
[  3]  5.0- 6.0 sec  3.38 MBytes  28.3 Mbits/sec
[  3]  6.0- 7.0 sec  6.38 MBytes  53.5 Mbits/sec
[  3]  7.0- 8.0 sec  6.88 MBytes  57.7 Mbits/sec
[  3]  8.0- 9.0 sec  7.12 MBytes  59.8 Mbits/sec
[  3]  9.0-10.0 sec  7.12 MBytes  59.8 Mbits/sec
[  3] 10.0-11.0 sec  7.12 MBytes  59.8 Mbits/sec
[  3] 11.0-12.0 sec  7.25 MBytes  60.8 Mbits/sec
[  3] 12.0-13.0 sec  7.12 MBytes  59.8 Mbits/sec
[  3] 13.0-14.0 sec  7.25 MBytes  60.8 Mbits/sec
[  3] 14.0-15.0 sec  7.62 MBytes  64.0 Mbits/sec
[  3] 15.0-16.0 sec  7.88 MBytes  66.1 Mbits/sec
[  3] 16.0-17.0 sec  8.12 MBytes  68.2 Mbits/sec
[  3] 17.0-18.0 sec  8.25 MBytes  69.2 Mbits/sec
[  3] 18.0-19.0 sec  8.50 MBytes  71.3 Mbits/sec
[  3] 19.0-20.0 sec  8.88 MBytes  74.4 Mbits/sec
[  3] 20.0-21.0 sec  8.75 MBytes  73.4 Mbits/sec
[  3] 21.0-22.0 sec  8.62 MBytes  72.4 Mbits/sec
[  3] 22.0-23.0 sec  8.75 MBytes  73.4 Mbits/sec
[  3] 23.0-24.0 sec  8.50 MBytes  71.3 Mbits/sec
[  3] 24.0-25.0 sec  8.62 MBytes  72.4 Mbits/sec
[  3] 25.0-26.0 sec  8.62 MBytes  72.4 Mbits/sec
[  3] 26.0-27.0 sec  8.62 MBytes  72.4 Mbits/sec

--
Thanks,
Yaniv Machani


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 12:17 TCP reaching to maximum throughput after a long time Machani, Yaniv
@ 2016-04-12 14:52 ` Eric Dumazet
  2016-04-12 15:04   ` Ben Greear
  2016-04-12 17:05   ` Yuchung Cheng
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2016-04-12 14:52 UTC (permalink / raw)
  To: Machani, Yaniv, netdev
  Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Nandita Dukkipati, open list, Kama, Meirav

On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
> Hi,
> After updating from kernel 3.14 to kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
> With the 3.14 kernel, TCP reached its maximum throughput in less than a second, while with 4.4 it takes ~20-30 seconds.
> UDP TX/RX and TCP RX performance is as expected.
> We are using a Beagle Bone Black and a WiLink8 device.
> 
> Were there any related changes that might cause this behavior?
> Kernel configuration and sysctl values were compared, but no significant differences were found.
> 
> See a log of the behavior below:
> [iperf log snipped, see the original message above]
> 

CC netdev, where this is better discussed.

This could be caused by a lot of different factors: a sender
problem, a receiver problem, ...

TCP behavior depends on the drivers, so maybe a change there can explain
this.

Could you capture the first 5000 frames of the flow and post the pcap?
(-s 128 captures only the headers)

tcpdump -p -s 128 -i eth0 -c 5000 host 10.2.46.5 -w flow.pcap

 
Also, while the test is running, you could fetch:
ss -temoi dst 10.2.46.5:5001
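
(For instance, to sample that once a second while iperf is running; just a
convenience wrapper, assuming watch(1) is available on the board:)

watch -n 1 'ss -temoi dst 10.2.46.5:5001'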


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 14:52 ` Eric Dumazet
@ 2016-04-12 15:04   ` Ben Greear
  2016-04-12 19:31     ` Machani, Yaniv
  2016-04-12 17:05   ` Yuchung Cheng
  1 sibling, 1 reply; 13+ messages in thread
From: Ben Greear @ 2016-04-12 15:04 UTC (permalink / raw)
  To: Machani, Yaniv, netdev
  Cc: Eric Dumazet, David S. Miller, Eric Dumazet, Neal Cardwell,
	Yuchung Cheng, Nandita Dukkipati, open list, Kama, Meirav

On 04/12/2016 07:52 AM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>> Hi,
>> After updating from kernel 3.14 to kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
>> With the 3.14 kernel, TCP reached its maximum throughput in less than a second, while with 4.4 it takes ~20-30 seconds.
>> UDP TX/RX and TCP RX performance is as expected.
>> We are using a Beagle Bone Black and a WiLink8 device.
>>
>> Were there any related changes that might cause this behavior?
>> Kernel configuration and sysctl values were compared, but no significant differences were found.

If you are using 'Cubic' TCP congestion control, then please try something different.
It was broken last I checked, at least when used with the ath10k driver.

https://marc.info/?l=linux-netdev&m=144405216005715&w=2
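
(For a quick A/B test, the congestion control can be checked and switched
with the standard sysctl knobs; a sketch, run as root:)

sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control
sysctl -w net.ipv4.tcp_congestion_control=reno   # picked up by new connections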

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 14:52 ` Eric Dumazet
  2016-04-12 15:04   ` Ben Greear
@ 2016-04-12 17:05   ` Yuchung Cheng
  1 sibling, 0 replies; 13+ messages in thread
From: Yuchung Cheng @ 2016-04-12 17:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Nandita Dukkipati, open list, Kama, Meirav

On Tue, Apr 12, 2016 at 7:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
> > Hi,
> > After updating from kernel 3.14 to kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
> > With the 3.14 kernel, TCP reached its maximum throughput in less than a second, while with 4.4 it takes ~20-30 seconds.
> > UDP TX/RX and TCP RX performance is as expected.
> > We are using a Beagle Bone Black and a WiLink8 device.
> >
> > Were there any related changes that might cause this behavior?
> > Kernel configuration and sysctl values were compared, but no significant differences were found.
> >
> > See a log of the behavior below:
> > [iperf log snipped, see the original message above]
> >
>
> CC netdev, where this is better discussed.
>
> This could be caused by a lot of different factors: a sender
> problem, a receiver problem, ...
>
> TCP behavior depends on the drivers, so maybe a change there can explain
> this.
>
> Could you capture the first 5000 frames of the flow and post the pcap?
> (-s 128 captures only the headers)
A pcap would be really helpful indeed. If possible, please capture on
both the 4.4 and 3.14 kernels.

>
> tcpdump -p -s 128 -i eth0 -c 5000 host 10.2.46.5 -w flow.pcap
>
>
> Also, while the test is running, you could fetch:
> ss -temoi dst 10.2.46.5:5001


* RE: TCP reaching to maximum throughput after a long time
  2016-04-12 15:04   ` Ben Greear
@ 2016-04-12 19:31     ` Machani, Yaniv
  2016-04-12 20:11       ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Machani, Yaniv @ 2016-04-12 19:31 UTC (permalink / raw)
  To: Ben Greear, netdev
  Cc: Eric Dumazet, David S. Miller, Eric Dumazet, Neal Cardwell,
	Yuchung Cheng, Nandita Dukkipati, open list, Kama, Meirav

On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote:
> On 04/12/2016 07:52 AM, Eric Dumazet wrote:
> > On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>>>
> 
> If you are using 'Cubic' TCP congestion control, then please try 
> something different.
> It was broken last I checked, at least when used with the ath10k driver.
> 

Thanks Ben, this indeed seems to be the issue!
Switching to reno got me to max throughput instantly.

I'm still looking through the thread you shared, but from what I understand there is no planned fix for it?

--
Thanks,
Yaniv


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 19:31     ` Machani, Yaniv
@ 2016-04-12 20:11       ` Ben Greear
  2016-04-12 20:17         ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2016-04-12 20:11 UTC (permalink / raw)
  To: Machani, Yaniv, netdev
  Cc: Eric Dumazet, David S. Miller, Eric Dumazet, Neal Cardwell,
	Yuchung Cheng, Nandita Dukkipati, open list, Kama, Meirav

On 04/12/2016 12:31 PM, Machani, Yaniv wrote:
> On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote:
>> On 04/12/2016 07:52 AM, Eric Dumazet wrote:
>>> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>>>>
>>
>> If you are using 'Cubic' TCP congestion control, then please try
>> something different.
>> It was broken last I checked, at least when used with the ath10k driver.
>>
>
> Thanks Ben, this indeed seems to be the issue!
> Switching to reno got me to max throughput instantly.
>
> I'm still looking through the thread you shared, but from what I understand there is no planned fix for it?

I think at the time it was blamed on ath10k and no one cared to try to fix it.

Or, maybe no one really uses CUBIC anymore?

Either way, I have no plans to try to fix CUBIC, but maybe someone who knows
this code better could give it a try.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 20:11       ` Ben Greear
@ 2016-04-12 20:17         ` Eric Dumazet
  2016-04-12 20:23           ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2016-04-12 20:17 UTC (permalink / raw)
  To: Ben Greear
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list, Kama,
	Meirav

On Tue, 2016-04-12 at 13:11 -0700, Ben Greear wrote:
> On 04/12/2016 12:31 PM, Machani, Yaniv wrote:
> > On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote:
> >> On 04/12/2016 07:52 AM, Eric Dumazet wrote:
> >>> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
> >>>>
> >>
> >> If you are using 'Cubic' TCP congestion control, then please try
> >> something different.
> >> It was broken last I checked, at least when used with the ath10k driver.
> >>
> >
> > Thanks Ben, this indeed seems to be the issue!
> > Switching to reno got me to max throughput instantly.
> >
> > I'm still looking through the thread you shared, but from what I understand there is no planned fix for it?
> 
> I think at the time it was blamed on ath10k and no one cared to try to fix it.
> 
> Or, maybe no one really uses CUBIC anymore?
> 
> Either way, I have no plans to try to fix CUBIC, but maybe someone who knows
> this code better could give it a try.

Well, cubic seems to work in many cases, assuming there are not too many
drops.

Assuming one flow can reach nominal speed in a few RTTs is kind of a dream,
and so far nobody has claimed a CC was able to do that while still being
fair and resilient.

TCP CCs are full of heuristics, and by definition heuristics that were
working 6 years ago might need to be refreshed.

We are still maintaining Cubic for sure.


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 20:17         ` Eric Dumazet
@ 2016-04-12 20:23           ` Ben Greear
  2016-04-12 20:29             ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2016-04-12 20:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list, Kama,
	Meirav

On 04/12/2016 01:17 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 13:11 -0700, Ben Greear wrote:
>> On 04/12/2016 12:31 PM, Machani, Yaniv wrote:
>>> On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote:
>>>> On 04/12/2016 07:52 AM, Eric Dumazet wrote:
>>>>> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>>>>>>
>>>>
>>>> If you are using 'Cubic' TCP congestion control, then please try
>>>> something different.
>>>> It was broken last I checked, at least when used with the ath10k driver.
>>>>
>>>
>>> Thanks Ben, this indeed seems to be the issue!
>>> Switching to reno got me to max throughput instantly.
>>>
>>> I'm still looking through the thread you shared, but from what I understand there is no planned fix for it?
>>
>> I think at the time it was blamed on ath10k and no one cared to try to fix it.
>>
>> Or, maybe no one really uses CUBIC anymore?
>>
>> Either way, I have no plans to try to fix CUBIC, but maybe someone who knows
>> this code better could give it a try.
>
> Well, cubic seems to work in many cases, assuming there are not too many
> drops.
>
> Assuming one flow can reach nominal speed in a few RTTs is kind of a dream,
> and so far nobody has claimed a CC was able to do that while still being
> fair and resilient.
>
> TCP CCs are full of heuristics, and by definition heuristics that were
> working 6 years ago might need to be refreshed.
>
> We are still maintaining Cubic for sure.

It worked well enough for years that I didn't even know other algorithms were
available.  It was broken around 4.0 time, and I reported it to the list,
and no one seemed to really care enough to do anything about it.  I changed
to reno and ignored the problem as well.

It is trivially easy to see the regression when using ath10k NIC, and from this email
thread, I guess other NICs have similar issues.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 20:23           ` Ben Greear
@ 2016-04-12 20:29             ` Eric Dumazet
  2016-04-12 21:40               ` Ben Greear
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2016-04-12 20:29 UTC (permalink / raw)
  To: Ben Greear
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list, Kama,
	Meirav

On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote:

> It worked well enough for years that I didn't even know other algorithms were
> available.  It was broken around 4.0 time, and I reported it to the list,
> and no one seemed to really care enough to do anything about it.  I changed
> to reno and ignored the problem as well.
> 
> It is trivially easy to see the regression when using ath10k NIC, and from this email
> thread, I guess other NICs have similar issues.

Since it is so trivial, why don't you start a bisection?

I asked for a capture, I did not say 'switch to Reno or whatever', right?

Guessing is nice, but investigating and fixing is better.

Do not assume that nothing can be done, please?

Thanks.


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 20:29             ` Eric Dumazet
@ 2016-04-12 21:40               ` Ben Greear
  2016-04-13  3:08                 ` Yuchung Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Greear @ 2016-04-12 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list, Kama,
	Meirav

On 04/12/2016 01:29 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote:
>
>> It worked well enough for years that I didn't even know other algorithms were
>> available.  It was broken around 4.0 time, and I reported it to the list,
>> and no one seemed to really care enough to do anything about it.  I changed
>> to reno and ignored the problem as well.
>>
>> It is trivially easy to see the regression when using ath10k NIC, and from this email
>> thread, I guess other NICs have similar issues.
>
> Since it is so trivial, why don't you start a bisection?

I vaguely remember doing a bisect, but I can't find any email about
that, so maybe I didn't.  At any rate, it is somewhere between 3.17 and 4.0.
From memory, it was between 3.19 and 4.0, but I am not certain of that.

Neil's suggestion, from the thread below, is that it was likely:  "605ad7f tcp: refine TSO autosizing"

Here is previous email thread:

https://www.mail-archive.com/netdev@vger.kernel.org/msg80803.html

This one has a link to a pcap I made at the time:

https://www.mail-archive.com/netdev@vger.kernel.org/msg80890.html

>
> I asked for a capture, I did not say 'switch to Reno or whatever', right?
>
> Guessing is nice, but investigating and fixing is better.
>
> Do not assume that nothing can be done, please?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: TCP reaching to maximum throughput after a long time
  2016-04-12 21:40               ` Ben Greear
@ 2016-04-13  3:08                 ` Yuchung Cheng
  2016-04-13  3:32                   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Yuchung Cheng @ 2016-04-13  3:08 UTC (permalink / raw)
  To: Ben Greear
  Cc: Eric Dumazet, Machani, Yaniv, netdev, David S. Miller,
	Eric Dumazet, Neal Cardwell, Nandita Dukkipati, open list, Kama,
	Meirav

On Tue, Apr 12, 2016 at 2:40 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 04/12/2016 01:29 PM, Eric Dumazet wrote:
>>
>> On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote:
>>
>>> It worked well enough for years that I didn't even know other algorithms
>>> were
>>> available.  It was broken around 4.0 time, and I reported it to the list,
>>> and no one seemed to really care enough to do anything about it.  I
>>> changed
>>> to reno and ignored the problem as well.
>>>
>>> It is trivially easy to see the regression when using ath10k NIC, and
>>> from this email
>>> thread, I guess other NICs have similar issues.
>>
>>
>> Since it is so trivial, why don't you start a bisection?
>
>
> I vaguely remember doing a bisect, but I can't find any email about
> that, so maybe I didn't.  At any rate, it is somewhere between 3.17 and 4.0.
> From memory, it was between 3.19 and 4.0, but I am not certain of that.
>
> Neil's suggestion, from the thread below, is that it was likely:  "605ad7f
> tcp: refine TSO autosizing"
>
> Here is previous email thread:
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg80803.html
>
> This one has a link to a pcap I made at the time:
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg80890.html
Based on the previous thread, I propose we disable hystart ack-train. It is
brittle under various circumstances. We've disabled it at Google for
years.
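
(For a quick experiment, cubic exposes this as the hystart_detect module
parameter; a sketch, assuming the usual bitmask where 0x1 is the ACK-train
heuristic and 0x2 the delay heuristic, so keeping only the delay part:)

echo 2 > /sys/module/tcp_cubic/parameters/hystart_detect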

>
>>
>> I asked for a capture, I did not say 'switch to Reno or whatever', right?
>>
>> Guessing is nice, but investigating and fixing is better.
>>
>> Do not assume that nothing can be done, please?
>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>


* Re: TCP reaching to maximum throughput after a long time
  2016-04-13  3:08                 ` Yuchung Cheng
@ 2016-04-13  3:32                   ` Eric Dumazet
       [not found]                     ` <CADVnQy=1eZbWxLRJ3t8grazBJzQrF6LjudiX3HF3sG=sNmGq5Q@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2016-04-13  3:32 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Ben Greear, Machani, Yaniv, netdev, David S. Miller,
	Eric Dumazet, Neal Cardwell, Nandita Dukkipati, open list, Kama,
	Meirav

On Tue, 2016-04-12 at 20:08 -0700, Yuchung Cheng wrote:

> Based on the previous thread, I propose we disable hystart ack-train. It is
> brittle under various circumstances. We've disabled it at Google for
> years.

Right, but because we also use the sch_fq packet scheduler and pacing ;)
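
(For anyone reproducing that setup, fq can be attached per interface with
tc; a sketch, the interface name is only a placeholder:)

tc qdisc replace dev wlan0 root fq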


* RE: TCP reaching to maximum throughput after a long time
       [not found]                     ` <CADVnQy=1eZbWxLRJ3t8grazBJzQrF6LjudiX3HF3sG=sNmGq5Q@mail.gmail.com>
@ 2016-04-13 20:26                       ` Machani, Yaniv
  0 siblings, 0 replies; 13+ messages in thread
From: Machani, Yaniv @ 2016-04-13 20:26 UTC (permalink / raw)
  To: Neal Cardwell, Eric Dumazet
  Cc: Yuchung Cheng, Ben Greear, netdev, David S. Miller, Eric Dumazet,
	Nandita Dukkipati, open list, Kama, Meirav

On Wed, Apr 13, 2016 at 17:32:29, Neal Cardwell wrote:
> 
> I like the idea of disabling hystart ack-train.
> 
> 
> Yaniv, can you please re-run your test with CUBIC in three different
> scenarios:
> 
> a) echo 0 > /sys/module/tcp_cubic/parameters/hystart_detect
This fixes the issue; I got to max throughput immediately.

> 
> b) echo 1 > /sys/module/tcp_cubic/parameters/hystart_detect
> 
This shows the same results as before, starting low and increasing slowly.

>
> c) echo 2 > /sys/module/tcp_cubic/parameters/hystart_detect
This gets us a bit higher at the beginning, but it never reaches the max (I waited ~60 sec).


Appreciate your help on this.
Yaniv
> 
> 
> This should help us isolate whether the hystart ack-train algorithm is 
> causing problems in this particular case.
> 
> Thanks!
> neal

