* TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-19 22:47 UTC (permalink / raw)
  To: netdev

I noticed something that struck me as a bit weird today,
but perhaps it's normal.

I was using our application to create 3 TCP streams from one port to
another (1Gbps, igb driver), running through a network emulator.
Traffic is flowing bi-directionally on each connection.

I am doing 24k byte writes per system call.  I tried 100ms, 10ms, and 1ms
latency (one-way) in the emulator, but behaviour is similar in each case.
The rest of this info was gathered with 1ms delay in the emulator.

If I ask all 3 connections to run 1Gbps, netstat shows 30+MB in the
sending queues and 1+ second latency (user-space to user-space).  Aggregate
throughput is around 700Mbps in each direction.

But, if I ask each of the connections to run at 300Mbps, latency averages
2ms and each connection runs right at 300Mbps (950Mbps or so on the wire).

It seems that when you over-drive the link, things back up and perform
quite badly overall.

This is a Core i7 3.2GHz with 12GB RAM, Fedora 14, 2.6.38.6 kernel
(with some hacks), 64-bit OS and user-space app.  Quick testing on 2.6.36.3
showed similar results, so I don't think it's a regression.

I am curious if others see similar results?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Stephen Hemminger @ 2011-05-19 23:18 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Thu, 19 May 2011 15:47:14 -0700
Ben Greear <greearb@candelatech.com> wrote:

> I noticed something that struck me as a bit weird today,
> but perhaps it's normal.
> 
> I was using our application to create 3 TCP streams from one port to
> another (1Gbps, igb driver), running through a network emulator.
> Traffic is flowing bi-directional in each connection.
> 
> I am doing 24k byte writes per system call.  I tried 100ms, 10ms, and 1ms
> latency (one-way) in the emulator, but behaviour is similar in each case.
> The rest of this info was gathered with 1ms delay in the emulator.
> 
> If I ask all 3 connections to run 1Gbps, netstat shows 30+MB in the
> sending queues and 1+ second latency (user-space to user-space).  Aggregate
> throughput is around 700Mbps in each direction.
> 
> But, if I ask each of the connections to run at 300Mbps, latency averages
> 2ms and each connection runs right at 300Mbps (950Mbps or so on the wire).
> 
> It seems that when you over-drive the link, things back up and perform
> quite badly over-all.
> 
> This is a core-i7 3.2Ghz with 12GB RAM, Fedora 14, 2.6.38.6 kernel
> (with some hacks), 64-bit OS and user-space app.  Quick testing on 2.6.36.3
> showed similar results, so I don't think it's a regression.
> 
> I am curious if others see similar results?
> 
> Thanks,
> Ben
> 

If you overdrive, TCP expects your network emulator to have
some, but limited, queueing (like a real router).

-- 


* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-19 23:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On 05/19/2011 04:18 PM, Stephen Hemminger wrote:
> On Thu, 19 May 2011 15:47:14 -0700
> Ben Greear<greearb@candelatech.com>  wrote:
>
>> I noticed something that struck me as a bit weird today,
>> but perhaps it's normal.
>>
>> I was using our application to create 3 TCP streams from one port to
>> another (1Gbps, igb driver), running through a network emulator.
>> Traffic is flowing bi-directional in each connection.
>>
>> I am doing 24k byte writes per system call.  I tried 100ms, 10ms, and 1ms
>> latency (one-way) in the emulator, but behaviour is similar in each case.
>> The rest of this info was gathered with 1ms delay in the emulator.
>>
>> If I ask all 3 connections to run 1Gbps, netstat shows 30+MB in the
>> sending queues and 1+ second latency (user-space to user-space).  Aggregate
>> throughput is around 700Mbps in each direction.
>>
>> But, if I ask each of the connections to run at 300Mbps, latency averages
>> 2ms and each connection runs right at 300Mbps (950Mbps or so on the wire).
>>
>> It seems that when you over-drive the link, things back up and perform
>> quite badly over-all.
>>
>> This is a core-i7 3.2Ghz with 12GB RAM, Fedora 14, 2.6.38.6 kernel
>> (with some hacks), 64-bit OS and user-space app.  Quick testing on 2.6.36.3
>> showed similar results, so I don't think it's a regression.
>>
>> I am curious if others see similar results?
>>
>> Thanks,
>> Ben
>>
>
> If you overdrive, TCP expects your network emulator to have
> some, but limited, queueing (like a real router).

The emulator is fine; it's not being over-driven (and it would apply only
limited queueing if it were).  The queues that are backing up are in the
TCP sockets on the sending machine.

But, just to make sure, I'll re-run the test with a looped back cable...

Ben

>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-19 23:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On 05/19/2011 04:20 PM, Ben Greear wrote:
> On 05/19/2011 04:18 PM, Stephen Hemminger wrote:

>> If you overdrive, TCP expects your network emulator to have
>> some, but limited, queueing (like a real router).
>
> The emulator is fine; it's not being over-driven (and it would apply
> only limited queueing if it were). The queues that are backing up are
> in the TCP sockets on the sending machine.
>
> But, just to make sure, I'll re-run the test with a looped back cable...

Well, with a looped-back cable, it isn't so bad.  I still see a small drop
in aggregate throughput (around 900Mbps instead of 950Mbps), and
latency goes above 600ms, but it still performs better than when
going through the emulator.

At 950+Mbps, the emulator is going to impart 1-2 ms of latency
even when configured for wide-open.

If I use a bridge in place of the emulator, it seems to settle on
around 450Mbps in one direction and 945Mbps in the other (on the wire),
with round-trip latencies often over 5 seconds (user-space to user-space),
and a consistent large chunk of data in the socket send buffers:

[root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Rick Jones @ 2011-05-20  0:05 UTC (permalink / raw)
  To: Ben Greear; +Cc: Stephen Hemminger, netdev

On Thu, 2011-05-19 at 16:42 -0700, Ben Greear wrote:
> On 05/19/2011 04:20 PM, Ben Greear wrote:
> > On 05/19/2011 04:18 PM, Stephen Hemminger wrote:
> 
> >> If you overdrive, TCP expects your network emulator to have
> >> some, but limited, queueing (like a real router).
> >
> > The emulator is fine; it's not being over-driven (and it would apply
> > only limited queueing if it were). The queues that are backing up are
> > in the TCP sockets on the sending machine.
> >
> > But, just to make sure, I'll re-run the test with a looped back cable...
> 
> Well, with looped back cable, it isn't so bad.  I still see a small drop
> in aggregate throughput (around 900Mbps instead of 950Mbps), and
> latency goes above 600ms, but it still performs better than when
> going through the emulator.
> 
> At 950+Mbps, the emulator is going to impart 1-2 ms of latency
> even when configured for wide-open.
> 
> If I use a bridge in place of the emulator, it seems to settle on
> around 450Mbps in one direction and 945Mbps in the other (on the wire),
> with round-trip latencies often over 5 seconds (user-space to user-space),
> and a consistent large chunk of data in the socket send buffers:
> 
> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED

I take it your system has higher values for the tcp_wmem value:

net.ipv4.tcp_wmem = 4096 16384 4194304

and whatever is creating the TCP connections is not making explicit
setsockopt() calls to set SO_*BUF.
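
Just as a sketch (these are the stock sysctl knobs, not a recommendation
of particular values), the current limits can be checked, and temporarily
capped for a re-test, with something like:

sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem
sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"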

rick jones



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-20  0:12 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, netdev

On 05/19/2011 05:05 PM, Rick Jones wrote:
> On Thu, 2011-05-19 at 16:42 -0700, Ben Greear wrote:
>> On 05/19/2011 04:20 PM, Ben Greear wrote:
>>> On 05/19/2011 04:18 PM, Stephen Hemminger wrote:
>>
>>>> If you overdrive, TCP expects your network emulator to have
>>>> some, but limited, queueing (like a real router).
>>>
>>> The emulator is fine; it's not being over-driven (and it would apply
>>> only limited queueing if it were). The queues that are backing up are
>>> in the TCP sockets on the sending machine.
>>>
>>> But, just to make sure, I'll re-run the test with a looped back cable...
>>
>> Well, with looped back cable, it isn't so bad.  I still see a small drop
>> in aggregate throughput (around 900Mbps instead of 950Mbps), and
>> latency goes above 600ms, but it still performs better than when
>> going through the emulator.
>>
>> At 950+Mbps, the emulator is going to impart 1-2 ms of latency
>> even when configured for wide-open.
>>
>> If I use a bridge in place of the emulator, it seems to settle on
>> around 450Mbps in one direction and 945Mbps in the other (on the wire),
>> with round-trip latencies often over 5 seconds (user-space to user-space),
>> and a consistent large chunk of data in the socket send buffers:
>>
>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
>> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
>> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
>> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
>> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
>> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
>> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
>> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
>> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
>> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
>
> I take it your system has higher values for the tcp_wmem value:
>
> net.ipv4.tcp_wmem = 4096 16384 4194304

Yes:
[root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
4096	16384	50000000

> and whatever is creating the TCP connections is not making explicit
> setsockopt() calls to set SO_*BUF.

It is configured not to, but if you know of an independent way to verify
that, I'm interested.

Thanks,
Ben

>
> rick jones


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Rick Jones @ 2011-05-20  0:24 UTC (permalink / raw)
  To: Ben Greear; +Cc: Stephen Hemminger, netdev

> >> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
> >> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
> >> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
> >> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
> >> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
> >> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
> >> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
> >> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
> >> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
> >> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
> >
> > I take it your system has higher values for the tcp_wmem value:
> >
> > net.ipv4.tcp_wmem = 4096 16384 4194304
> 
> Yes:
> [root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
> 4096	16384	50000000

Why?!?  Are you trying to get link-rate to Mars or something?  (I assume
tcp_rmem is similarly set...)  If you are indeed doing a single 1 GbE link
with no more than 100ms of delay, then the default (?) of 4194304 should
have been more than sufficient.

> > and whatever is creating the TCP connections is not making explicit
> > setsockopt() calls to set SO_*BUF.
> 
> It is configured not to, but if you know of an independent way to verify
> that, I'm interested.

You could always strace the code.
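
Something along these lines (the binary name is just a placeholder) should
catch any SO_SNDBUF/SO_RCVBUF setting without having to read the source:

strace -f -e trace=setsockopt ./your_app 2>&1 | grep -E 'SO_(SND|RCV)BUF'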

rick



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-20  0:37 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, netdev

On 05/19/2011 05:24 PM, Rick Jones wrote:
>>>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
>>>> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
>>>> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
>>>> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
>>>> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
>>>> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
>>>> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
>>>> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
>>>> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
>>>> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
>>>
>>> I take it your system has higher values for the tcp_wmem value:
>>>
>>> net.ipv4.tcp_wmem = 4096 16384 4194304
>>
>> Yes:
>> [root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
>> 4096	16384	50000000
>
> Why?!?  Are you trying to get link-rate to Mars or something?  (I assume
> tcp_rmem is similarly set...)  If you are indeed doing one 1 GbE, and no
> more than 100ms then the default (?) of 4194304 should have been more
> than sufficient.

Well, we occasionally do tests over emulated links that have several
seconds of delay and may be running multiple Gbps.  Either way,
I'd hope that offering extra RAM to a subsystem wouldn't cause it
to go nuts.  Assuming this isn't some magical 1Gbps issue, you
could probably hit the same problem with a wifi link and
default tcp_wmem settings...

>>> and whatever is creating the TCP connections is not making explicit
>>> setsockopt() calls to set SO_*BUF.
>>
>> It is configured not to, but if you know of an independent way to verify
>> that, I'm interested.
>
> You could always strace the code.

Yeah...it might be easier in this case to just comment out all those calls
and do a quick test.  It will be tomorrow before I can get to that,
however...

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Rick Jones @ 2011-05-20  0:46 UTC (permalink / raw)
  To: Ben Greear; +Cc: Stephen Hemminger, netdev

On Thu, 2011-05-19 at 17:37 -0700, Ben Greear wrote:
> On 05/19/2011 05:24 PM, Rick Jones wrote:
> >>>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
> >>>> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
> >>>> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
> >>>> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
> >>>> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
> >>>> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
> >>>> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
> >>>> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
> >>>> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
> >>>> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
> >>>
> >>> I take it your system has higher values for the tcp_wmem value:
> >>>
> >>> net.ipv4.tcp_wmem = 4096 16384 4194304
> >>
> >> Yes:
> >> [root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
> >> 4096	16384	50000000
> >
> > Why?!?  Are you trying to get link-rate to Mars or something?  (I assume
> > tcp_rmem is similarly set...)  If you are indeed doing one 1 GbE, and no
> > more than 100ms then the default (?) of 4194304 should have been more
> > than sufficient.
> 
> Well, we occasionally do tests over emulated links that have several
> seconds of delay and may be running multiple Gbps.  Either way,
> I'd hope that offering extra RAM to a subsystem wouldn't cause it
> to go nuts.  

It has been my experience that the autotuning tends to grow things
beyond the bandwidth-delay product.

As for several seconds of delay and multiple Gbps - unless you are
shooting the Moon, sounds like bufferbloat?-)

> Assuming this isn't some magical 1Gbps issue, you
> could probably hit the same problem with a wifi link and
> default tcp_wmem settings...

Do you also increase the tx queue lengths for the NIC(s)?
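
Easy enough to check and rule out, for what it's worth (eth0 here is just
an example interface name):

cat /sys/class/net/eth0/tx_queue_len          # current length, in packets
ip link set dev eth0 txqueuelen 1000          # set it explicitly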

rick



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-20  3:39 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, netdev

On 05/19/2011 05:46 PM, Rick Jones wrote:
> On Thu, 2011-05-19 at 17:37 -0700, Ben Greear wrote:
>> On 05/19/2011 05:24 PM, Rick Jones wrote:
>>>>>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
>>>>>> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
>>>>>> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
>>>>>> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
>>>>>> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
>>>>>> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
>>>>>> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
>>>>>> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
>>>>>> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
>>>>>> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
>>>>>
>>>>> I take it your system has higher values for the tcp_wmem value:
>>>>>
>>>>> net.ipv4.tcp_wmem = 4096 16384 4194304
>>>>
>>>> Yes:
>>>> [root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
>>>> 4096	16384	50000000
>>>
>>> Why?!?  Are you trying to get link-rate to Mars or something?  (I assume
>>> tcp_rmem is similarly set...)  If you are indeed doing one 1 GbE, and no
>>> more than 100ms then the default (?) of 4194304 should have been more
>>> than sufficient.
>>
>> Well, we occasionally do tests over emulated links that have several
>> seconds of delay and may be running multiple Gbps.  Either way,
>> I'd hope that offering extra RAM to a subsystem wouldn't cause it
>> to go nuts.
>
> It has been my experience that the autotuning tends to grow things
> beyond the bandwidth-delay product.

That seems a likely culprit.  Or perhaps it's not detecting the round-trip
time correctly, or the timestamp is taken when the packet goes into the
send queue rather than when it's actually handed to the NIC?
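
If it is the autotuning (or a bogus RTT estimate), something like this
while the test runs might help confirm it -- ss is from iproute2, and
the filter just matches the test subnet used above:

ss -tin dst 8.1.1.0/24

That should print the kernel's per-connection rtt estimate and congestion
window next to each socket.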

>
> As for several seconds of delay and multiple Gbps - unless you are
> shooting the Moon, sounds like bufferbloat?-)

We try to test our stuff in all sorts of strange cases.  Maybe
some users really are emulating lunar traffic, or even beyond.
We can also emulate bufferbloat, but in this particular case the
real round-trip time is about 1-2ms, so if the socket is queueing up
a second's worth of bytes in the xmit buffer, then it's not
the network's fault...it's the sender.

>> Assuming this isn't some magical 1Gbps issue, you
>> could probably hit the same problem with a wifi link and
>> default tcp_wmem settings...
>
> Do you also increase the tx queue lengths for the NIC(s)?

No, they are at the default (1000, I think).  That's only
a few ms at 1Gbps speed, so the problem is mostly higher
in the stack.
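
Back-of-the-envelope, assuming worst-case full-size frames:

  1000 packets * 1500 bytes * 8 bits = 12 Mbit  ->  ~12 ms at 1Gbps

which is still tiny next to the 600ms-5s latencies above.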

Thanks,
Ben

>
> rick


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: TCP funny-ness when over-driving a 1Gbps link (and wifi)
From: Ben Greear @ 2011-05-20 21:33 UTC (permalink / raw)
  To: rick.jones2; +Cc: Stephen Hemminger, netdev

On 05/19/2011 08:39 PM, Ben Greear wrote:
> On 05/19/2011 05:46 PM, Rick Jones wrote:
>> On Thu, 2011-05-19 at 17:37 -0700, Ben Greear wrote:
>>> On 05/19/2011 05:24 PM, Rick Jones wrote:
>>>>>>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
>>>>>>> tcp 0 0 8.1.1.1:33038 0.0.0.0:* LISTEN
>>>>>>> tcp 0 0 8.1.1.1:33040 0.0.0.0:* LISTEN
>>>>>>> tcp 0 0 8.1.1.1:33042 0.0.0.0:* LISTEN
>>>>>>> tcp 0 9328612 8.1.1.2:33039 8.1.1.1:33040 ESTABLISHED
>>>>>>> tcp 0 17083176 8.1.1.1:33038 8.1.1.2:33037 ESTABLISHED
>>>>>>> tcp 0 9437340 8.1.1.2:33037 8.1.1.1:33038 ESTABLISHED
>>>>>>> tcp 0 17024620 8.1.1.1:33040 8.1.1.2:33039 ESTABLISHED
>>>>>>> tcp 0 19557040 8.1.1.1:33042 8.1.1.2:33041 ESTABLISHED
>>>>>>> tcp 0 9416600 8.1.1.2:33041 8.1.1.1:33042 ESTABLISHED
>>>>>>
>>>>>> I take it your system has higher values for the tcp_wmem value:

I tried a different test today:  3 TCP connections between two
wifi station interfaces (using ath9k).  Each endpoint of each
connection is configured to send 100Mbps of traffic to its peer.

With a single connection, it does OK (maybe 250ms round-trip time max).
With 3 of them running, round-trip user-space to user-space latency
often goes above 3 seconds.

I had set tcp_wmem smaller for this test, and I verified that
the socket SND/RCV buffer setsockopt was not being called.

[root@lec2010-ath9k-1 lanforge]# netstat -an|grep tcp|grep 33
tcp        0      0 12.12.12.4:33040            0.0.0.0:*                   LISTEN
tcp        0      0 12.12.12.4:33042            0.0.0.0:*                   LISTEN
tcp        0      0 12.12.12.4:33044            0.0.0.0:*                   LISTEN
tcp        0 556072 12.12.12.4:33040            12.12.12.3:33039            ESTABLISHED
tcp        0 274916 12.12.12.3:33043            12.12.12.4:33044            ESTABLISHED
tcp        0      0 192.168.100.138:33738       192.168.100.3:2049          ESTABLISHED
tcp        0 205156 12.12.12.4:33042            12.12.12.3:33041            ESTABLISHED
tcp        0 217184 12.12.12.3:33041            12.12.12.4:33042            ESTABLISHED
tcp        0 436552 12.12.12.3:33039            12.12.12.4:33040            ESTABLISHED
tcp        0 288820 12.12.12.4:33044            12.12.12.3:33043            ESTABLISHED

[root@lec2010-ath9k-1 lanforge]# cat /proc/sys/net/ipv4/tcp_wmem
4096	16384	4000000

This is 2.6.39-wl+ kernel.

So, it seems to be a general issue with over-driving links with multiple TCP
connections.  It doesn't seem like a regression, and probably isn't really a bug,
but maybe the bufferbloat project will help this sort of thing...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: TCP funny-ness when over-driving a 1Gbps link (and wifi)
From: Chris Friesen @ 2011-05-26 15:28 UTC (permalink / raw)
  To: Ben Greear; +Cc: rick.jones2, Stephen Hemminger, netdev

On 05/20/2011 03:33 PM, Ben Greear wrote:

> I tried a different test today: 3 TCP connections between two
> wifi station interfaces (using ath9k). Each endpoint of each
> connection is configured to send 100Mbps of traffic to its peer.
>
> With a single connection, it does OK (maybe 250ms round-trip time max).
> With 3 of them running, round-trip user-space to user-space latency
> often goes above 3 seconds.

<snip>

> So, it seems to be a general issue with over-driving links with multiple
> TCP connections.  It doesn't seem like a regression, and probably isn't
> really a bug, but maybe the bufferbloat project will help this sort of
> thing...

Given that one rule of thumb for the send buffer size is twice the
bandwidth-delay product, it seems clear that on a wifi connection 3
seconds' worth of buffering is excessive.  I think I'd classify that as
a bug.
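
As a rough illustration (round numbers: ~100Mbps of usable wifi
throughput, ~10ms of real RTT):

  BDP     = 100 Mbit/s * 0.010 s = 1 Mbit ~= 125 KB
  2 * BDP                                 ~= 250 KB

yet the sockets in your netstat output are sitting on hundreds of KB
each, and the wired case earlier (1Gbps, ~2ms RTT, 2 * BDP ~= 500 KB)
was carrying 9-19 MB per socket.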

Chris


-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

