* Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-11-30 13:55 UTC
To: netdev
Hi,
I just wanted to share what is a rather pleasing,
though to me somewhat surprising result.
I am testing bonding using balance-rr mode with three physical links to try
to get > gigabit speed for a single stream. Why? Because I'd like to run
various tests at > gigabit speed and I don't have any 10G hardware at my
disposal.
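For reference, the bond was brought up along these lines; a minimal sketch rather
than the exact configuration (interface names, the bonding options and whether the
offload toggles go on the slaves or on bond0 are assumptions):
  # round-robin bond over the three gigabit ports (names assumed)
  modprobe bonding mode=balance-rr miimon=100
  ip link set bond0 up
  ifenslave bond0 eth0 eth1 eth2
  # baseline runs: segmentation offloads off on the sender's ports
  ethtool -K eth0 tso off gso off
  ethtool -K eth1 tso off gso off
  ethtool -K eth2 tso off gso off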
The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
LSO and GSO disabled on both the sender and receiver I see:
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
(172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1472    10.01      1646.13   40.01    -1.00    3.982   -1.000
But with GRO enabled on the receiver I see:
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
(172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1472    10.01      2613.83   19.32    -1.00    1.211   -1.000
Which is much better than any result I get tweaking tcp_reordering when
GRO is disabled on the receiver.
Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
negligible effect. Which is interesting, because my brief reading on the
subject indicated that tcp_reordering was the key tuning parameter for
bonding with balance-rr.
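For reference, the tcp_reordering tweaks were made through the usual sysctl; a
minimal sketch (the value shown is only an example, not necessarily one I tested):
  # show the current value (the default is 3)
  sysctl net.ipv4.tcp_reordering
  # allow more out-of-order segments before fast retransmit is triggered
  sysctl -w net.ipv4.tcp_reordering=10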
The only other parameter that seemed to have significant effect was to
increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
impact on throughput, though a significant positive effect on CPU
utilisation.
MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=off
netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   9872    10.01      2957.52   14.89    -1.00    0.825   -1.000
MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=on
netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   9872    10.01      2847.64   10.84    -1.00    0.624   -1.000
Test run using 2.6.37-rc1
* Re: Bonding, GRO and tcp_reordering
From: Ben Hutchings @ 2010-11-30 15:42 UTC
To: Simon Horman; +Cc: netdev
On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> Hi,
>
> I just wanted to share what is a rather pleasing,
> though to me somewhat surprising result.
>
> I am testing bonding using balance-rr mode with three physical links to try
> to get > gigabit speed for a single stream. Why? Because I'd like to run
> various tests at > gigabit speed and I don't have any 10G hardware at my
> disposal.
>
> The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> LSO and GSO disabled on both the sender and receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 1646.13 40.01 -1.00 3.982 -1.000
>
> But with GRO enabled on the receiver I see.
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 2613.83 19.32 -1.00 1.211 -1.000
>
> Which is much better than any result I get tweaking tcp_reordering when
> GRO is disabled on the receiver.
Did you also enable TSO/GSO on the sender?
What TSO/GSO will do is to change the round-robin scheduling from one
packet per interface to one super-packet per interface. GRO then
coalesces the physical packets back into a super-packet. The intervals
between receiving super-packets then tend to exceed the difference in
delay between interfaces, hiding the reordering.
If you only enabled GRO then I don't understand this.
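For reference, the toggles in question can be flipped with ethtool; a minimal
sketch, with eth0 standing in for whichever device is involved (the device names
are assumptions):
  # sender: segmentation offload on, so bonding round-robins super-packets
  ethtool -K eth0 tso on gso on
  # receiver: GRO on, so the physical packets are coalesced back together
  ethtool -K eth0 gro on
  # the current offload state can be inspected with
  ethtool -k eth0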
> Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> negligible effect. Which is interesting, because my brief reading on the
> subject indicated that tcp_reordering was the key tuning parameter for
> bonding with balance-rr.
>
> The only other parameter that seemed to have significant effect was to
> increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> impact on throughput, though a significant positive effect on CPU
> utilisation.
[...]
Increasing MTU also increases the interval between packets on a TCP flow
using maximum segment size so that it is more likely to exceed the
difference in delay.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-11-30 16:04 UTC
To: Ben Hutchings; +Cc: Simon Horman, netdev
On Tue, 30 Nov 2010 at 15:42 +0000, Ben Hutchings wrote:
> On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> > The only other parameter that seemed to have significant effect was to
> > increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> > impact on throughput, though a significant positive effect on CPU
> > utilisation.
> [...]
>
> Increasing MTU also increases the interval between packets on a TCP flow
> using maximum segment size so that it is more likely to exceed the
> difference in delay.
>
GRO really is operational only _if_ we receive several packets for the
same flow in the same NAPI run.
As soon as we exit NAPI mode, GRO packets are flushed.
Big MTU --> bigger delays between packets, so there is a good chance that
GRO cannot trigger at all, since each NAPI run handles only one packet.
One possibility with a big MTU is to tweak the "ethtool -c eth0" params
rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5
so that "rx-usecs" is bigger than the delay between two MTU-full-sized
packets.
Gigabit speed means 1 nanosecond per bit, and MTU=9000 means a 72 us
delay between packets.
So try:
ethtool -C eth0 rx-usecs 100
to get a chance that several packets are delivered at once by the NIC.
Unfortunately, this also adds some latency, so it helps bulk transfers
and slows down interactive traffic.
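The same back-of-the-envelope figure for other MTUs or link speeds, as a sketch
(the 38 bytes of Ethernet framing overhead is an assumption; drop it for a rough
number):
  # inter-packet gap in microseconds = frame bits / link speed in Mbit/s
  awk 'BEGIN { mtu=9000; overhead=38; mbps=1000;
               printf "%.1f us per full-sized packet\n", (mtu+overhead)*8/mbps }'
  # roughly 72 us at MTU=9000 on gigabit, so rx-usecs wants to sit above that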
* Re: Bonding, GRO and tcp_reordering
From: Rick Jones @ 2010-11-30 17:56 UTC
To: Simon Horman; +Cc: netdev
Simon Horman wrote:
> Hi,
>
> I just wanted to share what is a rather pleasing,
> though to me somewhat surprising result.
>
> I am testing bonding using balance-rr mode with three physical links to try
> to get > gigabit speed for a single stream. Why? Because I'd like to run
> various tests at > gigabit speed and I don't have any 10G hardware at my
> disposal.
>
> The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> LSO and GSO disabled on both the sender and receiver I see:
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
Why 1472 bytes per send? If you wanted a 1:1 correspondence between the send size
and the MSS, I would guess that 1448 would have been in order. 1472 would be the
maximum data payload for a UDP/IPv4 datagram; TCP will have more header than UDP.
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 1646.13 40.01 -1.00 3.982 -1.000
>
> But with GRO enabled on the receiver I see.
>
> # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> (172.17.60.216) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 1472 10.01 2613.83 19.32 -1.00 1.211 -1.000
If you are changing things on the receiver, you should probably enable remote
CPU utilization measurement with the -C option.
> Which is much better than any result I get tweaking tcp_reordering when
> GRO is disabled on the receiver.
>
> Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> negligible effect. Which is interesting, because my brief reading on the
> subject indicated that tcp_reordering was the key tuning parameter for
> bonding with balance-rr.
You are in a maze of twisty heuristics and algorithms, all interacting :) If
there are only three links in the bond, I suspect the chances for spurious fast
retransmission are somewhat smaller than if you had, say, four, based on just
the hand-waving that three duplicate ACKs require receipt of perhaps four
out-of-order segments.
> The only other parameter that seemed to have significant effect was to
> increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> impact on throughput, though a significant positive effect on CPU
> utilisation.
>
> MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=off
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
9872?
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 9872 10.01 2957.52 14.89 -1.00 0.825 -1.000
>
> MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=on
> netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
>
> 87380 16384 9872 10.01 2847.64 10.84 -1.00 0.624 -1.000
Short of packet traces, taking snapshots of netstat statistics before and after
each netperf run might be goodness - you can look at things like ratio of ACKs
to data segments/bytes and such. LRO/GRO can have a non-trivial effect on the
number of ACKs, and ACKs are what matter for fast retransmit.
netstat -s > before
netperf ...
netstat -s > after
beforeafter before after > delta
where beforeafter comes from ftp://ftp.cup.hp.com/dist/networking/tools/ (for
now; the site will have to go away before long, as the campus on which it is
located has been sold) and will subtract before from after.
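If beforeafter is not to hand, nstat from iproute2 keeps its own baseline and
prints only the change in each counter between invocations; a sketch (assuming
nstat is installed):
  nstat > /dev/null     # record a baseline
  netperf -c -C -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1448
  nstat                 # per-counter deltas since the baseline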
happy benchmarking,
rick jones
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-11-30 18:14 UTC
To: Rick Jones; +Cc: Simon Horman, netdev
On Tue, 30 Nov 2010 at 09:56 -0800, Rick Jones wrote:
> Short of packet traces, taking snapshots of netstat statistics before and after
> each netperf run might be goodness - you can look at things like ratio of ACKs
> to data segments/bytes and such. LRO/GRO can have a non-trivial effect on the
> number of ACKs, and ACKs are what matter for fast retransmit.
>
> netstat -s > before
> netperf ...
> netstat -s > after
> beforeafter before after > delta
>
> where beforeafter comes (for now, the site will have to go away before long as
> the campus on which it is located has been sold)
> ftp://ftp.cup.hp.com/dist/networking/tools/ and will subtract before from after.
>
> happy benchmarking,
Yes indeed. With a fast enough medium (or small MTUs), we can run into a
backlog processing problem (filling huge receive queues), as seen on
loopback lately...
netstat -s can show some receive queue overruns in this case:
TCPBacklogDrop: xxx
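A quick way to watch for it around a run, as a sketch (if the counter has never
fired, the grep may simply come back empty):
  # compare before and after the netperf run
  netstat -s | grep -i TCPBacklogDrop
  # or keep an eye on it while the test runs
  watch -n1 'netstat -s | grep -i TCPBacklogDrop'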
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:30 UTC
To: Rick Jones; +Cc: netdev
On Tue, Nov 30, 2010 at 09:56:02AM -0800, Rick Jones wrote:
> Simon Horman wrote:
> >Hi,
> >
> >I just wanted to share what is a rather pleasing,
> >though to me somewhat surprising result.
> >
> >I am testing bonding using balance-rr mode with three physical links to try
> >to get > gigabit speed for a single stream. Why? Because I'd like to run
> >various tests at > gigabit speed and I don't have any 10G hardware at my
> >disposal.
> >
> >The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> >LSO and GSO disabled on both the sender and receiver I see:
> >
> ># netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
>
> Why 1472 bytes per send? If you wanted a 1-1 between the send size
> and the MSS, I would guess that 1448 would have been in order. 1472
> would be the maximum data payload for a UDP/IPv4 datagram. TCP will
> have more header than UDP.
Only to be consistent with UDP testing that I was doing at the same time.
I'll re-test with 1448.
>
> >TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> >(172.17.60.216) port 0 AF_INET
> >Recv Send Send Utilization Service Demand
> >Socket Socket Message Elapsed Send Recv Send Recv
> >Size Size Size Time Throughput local remote local remote
> >bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 1472 10.01 1646.13 40.01 -1.00 3.982 -1.000
> >
> >But with GRO enabled on the receiver I see.
> >
> ># netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> >TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> >(172.17.60.216) port 0 AF_INET
> >Recv Send Send Utilization Service Demand
> >Socket Socket Message Elapsed Send Recv Send Recv
> >Size Size Size Time Throughput local remote local remote
> >bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 1472 10.01 2613.83 19.32 -1.00 1.211 -1.000
>
> If you are changing things on the receiver, you should probably
> enable remote CPU utilization measurement with the -C option.
Thanks, I will do so.
> >Which is much better than any result I get tweaking tcp_reordering when
> >GRO is disabled on the receiver.
> >
> >Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> >negligible effect. Which is interesting, because my brief reading on the
> >subject indicated that tcp_reordering was the key tuning parameter for
> >bonding with balance-rr.
>
> You are in a maze of twisty heuristics and algorithms, all
> interacting :) If there are only three links in the bond, I suspect
> the chances for spurrious fast retransmission are somewhat smaller
> than if you had say four, based on just hand-waving on three
> duplicate ACKs requires receipt of perhaps four out of order
> segments.
Unfortunately, NIC/slot availability only stretches to three links :-(
If you think it's really worthwhile I can obtain some more dual-port NICs.
> >The only other parameter that seemed to have significant effect was to
> >increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> >impact on throughput, though a significant positive effect on CPU
> >utilisation.
> >
> >MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=off
> >netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
>
> 9872?
It should have been 8972; I'll retest with 8948 as per your suggestion above.
> >Recv Send Send Utilization Service Demand
> >Socket Socket Message Elapsed Send Recv Send Recv
> >Size Size Size Time Throughput local remote local remote
> >bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 9872 10.01 2957.52 14.89 -1.00 0.825 -1.000
> >
> >MTU=9000, sender,receiver:tcp_reordering=3(default), receiver:GRO=on
> >netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 9872
> >Recv Send Send Utilization Service Demand
> >Socket Socket Message Elapsed Send Recv Send Recv
> >Size Size Size Time Throughput local remote local remote
> >bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 9872 10.01 2847.64 10.84 -1.00 0.624 -1.000
>
> Short of packet traces, taking snapshots of netstat statistics
> before and after each netperf run might be goodness - you can look
> at things like ratio of ACKs to data segments/bytes and such.
> LRO/GRO can have a non-trivial effect on the number of ACKs, and
> ACKs are what matter for fast retransmit.
>
> netstat -s > before
> netperf ...
> netstat -s > after
> beforeafter before after > delta
>
> where beforeafter comes (for now, the site will have to go away
> before long as the campus on which it is located has been sold)
> ftp://ftp.cup.hp.com/dist/networking/tools/ and will subtract
> before from after.
Thanks, I'll take a look into that.
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:31 UTC
To: Ben Hutchings; +Cc: netdev
On Tue, Nov 30, 2010 at 03:42:56PM +0000, Ben Hutchings wrote:
> On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
> > Hi,
> >
> > I just wanted to share what is a rather pleasing,
> > though to me somewhat surprising result.
> >
> > I am testing bonding using balance-rr mode with three physical links to try
> > to get > gigabit speed for a single stream. Why? Because I'd like to run
> > various tests at > gigabit speed and I don't have any 10G hardware at my
> > disposal.
> >
> > The result I have is that with a 1500 byte MTU, tcp_reordering=3 and both
> > LSO and GSO disabled on both the sender and receiver I see:
> >
> > # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> > (172.17.60.216) port 0 AF_INET
> > Recv Send Send Utilization Service Demand
> > Socket Socket Message Elapsed Send Recv Send Recv
> > Size Size Size Time Throughput local remote local remote
> > bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 1472 10.01 1646.13 40.01 -1.00 3.982 -1.000
> >
> > But with GRO enabled on the receiver I see.
> >
> > # netperf -c -4 -t TCP_STREAM -H 172.17.60.216 -- -m 1472
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216
> > (172.17.60.216) port 0 AF_INET
> > Recv Send Send Utilization Service Demand
> > Socket Socket Message Elapsed Send Recv Send Recv
> > Size Size Size Time Throughput local remote local remote
> > bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
> >
> > 87380 16384 1472 10.01 2613.83 19.32 -1.00 1.211 -1.000
> >
> > Which is much better than any result I get tweaking tcp_reordering when
> > GRO is disabled on the receiver.
>
> Did you also enable TSO/GSO on the sender?
It didn't seem to make any difference either way.
I'll re-test just in case I missed something.
>
> What TSO/GSO will do is to change the round-robin scheduling from one
> packet per interface to one super-packet per interface. GRO then
> coalesces the physical packets back into a super-packet. The intervals
> between receiving super-packets then tend to exceed the difference in
> delay between interfaces, hiding the reordering.
>
> If you only enabled GRO then I don't understand this.
>
> > Tweaking tcp_reordering when GRO is enabled on the receiver seems to have
> > negligible effect. Which is interesting, because my brief reading on the
> > subject indicated that tcp_reordering was the key tuning parameter for
> > bonding with balance-rr.
> >
> > The only other parameter that seemed to have significant effect was to
> > increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> > impact on throughput, though a significant positive effect on CPU
> > utilisation.
> [...]
>
> Increasing MTU also increases the interval between packets on a TCP flow
> using maximum segment size so that it is more likely to exceed the
> difference in delay.
I hadn't considered that, thanks.
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-01 4:34 UTC
To: Eric Dumazet; +Cc: Ben Hutchings, netdev
On Tue, Nov 30, 2010 at 05:04:33PM +0100, Eric Dumazet wrote:
> On Tue, 30 Nov 2010 at 15:42 +0000, Ben Hutchings wrote:
> > On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
>
> > > The only other parameter that seemed to have significant effect was to
> > > increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> > > impact on throughput, though a significant positive effect on CPU
> > > utilisation.
> > [...]
> >
> > Increasing MTU also increases the interval between packets on a TCP flow
> > using maximum segment size so that it is more likely to exceed the
> > difference in delay.
> >
>
> GRO really is operational _if_ we receive in same NAPI run several
> packets for the same flow.
>
> As soon as we exit NAPI mode, GRO packets are flushed.
>
> Big MTU --> bigger delays between packets, so big chance that GRO cannot
> trigger at all, since NAPI runs for one packet only.
>
> One possibility with big MTU is to tweak "ethtool -c eth0" params
> rx-usecs: 20
> rx-frames: 5
> rx-usecs-irq: 0
> rx-frames-irq: 5
> so that "rx-usecs" is bigger than the delay between two MTU full sized
> packets.
>
> Gigabit speed means 1 nano second per bit, and MTU=9000 means 72 us
> delay between packets.
>
> So try :
>
> ethtool -C eth0 rx-usecs 100
>
> to get chance that several packets are delivered at once by NIC.
>
> Unfortunately, this also add some latency, so it helps bulk transferts,
> and slowdown interactive traffic
Thanks Eric,
I was tweaking those values recently for some latency tuning
but I didn't think of them in relation to last night's tests.
In terms of my measurements, it's just benchmarking at this stage.
So a trade-off between throughput and latency is acceptable, so long
as I remember to measure what it is.
* Re: Bonding, GRO and tcp_reordering
From: Eric Dumazet @ 2010-12-01 4:47 UTC
To: Simon Horman; +Cc: Ben Hutchings, netdev
On Wed, 01 Dec 2010 at 13:34 +0900, Simon Horman wrote:
> I was tweaking those values recently for some latency tuning
> but I didn't think of them in relation to last night's tests.
>
> In terms of my measurements, its just benchmarking at this stage.
> So a trade-off between throughput and latency is acceptable, so long
> as I remember to measure what it is.
>
I was thinking again this morning about GRO and bonding, and don't know
if it actually works...
Is GRO on on the individual eth0/eth1/eth2 devices you use, or on the bonding
device itself?
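A quick way to see where GRO is currently on, as a sketch (the grep pattern
matches the ethtool -k output of recent-ish versions):
  for dev in bond0 eth0 eth1 eth2; do
      echo -n "$dev: "
      ethtool -k $dev | grep -i receive-offload
  done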
* Re: Bonding, GRO and tcp_reordering
From: Rick Jones @ 2010-12-01 19:42 UTC
To: Simon Horman; +Cc: netdev
>>You are in a maze of twisty heuristics and algorithms, all
>>interacting :) If there are only three links in the bond, I suspect
>>the chances for spurrious fast retransmission are somewhat smaller
>>than if you had say four, based on just hand-waving on three
>>duplicate ACKs requires receipt of perhaps four out of order
>>segments.
>
>
> Unfortunately NIC/slot availability only stretches to three links :-(
> If you think its really worthwhile I can obtain some more dual-port nics.
Only if you want to increase the chances of reordering that triggers spurious
fast retransmits.
rick jones
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-02 6:39 UTC
To: Eric Dumazet; +Cc: Ben Hutchings, netdev
On Wed, Dec 01, 2010 at 05:47:06AM +0100, Eric Dumazet wrote:
> On Wed, 01 Dec 2010 at 13:34 +0900, Simon Horman wrote:
>
> > I was tweaking those values recently for some latency tuning
> > but I didn't think of them in relation to last night's tests.
> >
> > In terms of my measurements, its just benchmarking at this stage.
> > So a trade-off between throughput and latency is acceptable, so long
> > as I remember to measure what it is.
> >
>
> I was thinking again this morning about GRO and bonding, and dont know
> if it actually works...
>
> Is GRO on on individual eth0/eth1/eth2 you use, or on bonding device
> itself ?
All of the above. I can check different combinations if it helps.
* Re: Bonding, GRO and tcp_reordering
From: Simon Horman @ 2010-12-03 13:38 UTC
To: Eric Dumazet; +Cc: Ben Hutchings, netdev
On Wed, Dec 01, 2010 at 01:34:45PM +0900, Simon Horman wrote:
> On Tue, Nov 30, 2010 at 05:04:33PM +0100, Eric Dumazet wrote:
> > On Tue, 30 Nov 2010 at 15:42 +0000, Ben Hutchings wrote:
> > > On Tue, 2010-11-30 at 22:55 +0900, Simon Horman wrote:
To clarify my statement in a previous email that GSO had no effect: I
re-ran the tests and I still haven't observed any effect of GSO on my
results. However, I did notice that in order for GRO on the server to have
an effect I also need TSO enabled on the client. I thought that I had
previously checked that, but I was mistaken.
Enabling TSO on the client while leaving GSO disabled on the server
resulted in increased CPU utilisation on the client, from ~15% to ~20%.
> > > > The only other parameter that seemed to have significant effect was to
> > > > increase the mtu. In the case of MTU=9000, GRO seemed to have a negative
> > > > impact on throughput, though a significant positive effect on CPU
> > > > utilisation.
> > > [...]
> > >
> > > Increasing MTU also increases the interval between packets on a TCP flow
> > > using maximum segment size so that it is more likely to exceed the
> > > difference in delay.
> > >
> >
> > GRO really is operational _if_ we receive in same NAPI run several
> > packets for the same flow.
> >
> > As soon as we exit NAPI mode, GRO packets are flushed.
> >
> > Big MTU --> bigger delays between packets, so big chance that GRO cannot
> > trigger at all, since NAPI runs for one packet only.
> >
> > One possibility with big MTU is to tweak "ethtool -c eth0" params
> > rx-usecs: 20
> > rx-frames: 5
> > rx-usecs-irq: 0
> > rx-frames-irq: 5
> > so that "rx-usecs" is bigger than the delay between two MTU full sized
> > packets.
> >
> > Gigabit speed means 1 nano second per bit, and MTU=9000 means 72 us
> > delay between packets.
> >
> > So try :
> >
> > ethtool -C eth0 rx-usecs 100
> >
> > to get chance that several packets are delivered at once by NIC.
> >
> > Unfortunately, this also add some latency, so it helps bulk transferts,
> > and slowdown interactive traffic
>
> Thanks Eric,
>
> I was tweaking those values recently for some latency tuning
> but I didn't think of them in relation to last night's tests.
>
> In terms of my measurements, its just benchmarking at this stage.
> So a trade-off between throughput and latency is acceptable, so long
> as I remember to measure what it is.
Thanks, rx-usecs was set to 3 and changing it to 15 on the server
did seem to increase throughput with 1500 byte packets, although
CPU utilisation increased too, disproportionately so on the client.
MTU=1500, client,server:tcp_reordering=3, client:GSO=off,
client:TSO=on, server:GRO=off, server:rx-usecs=3(default)
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1591.34   16.35     5.80    1.683    2.390
MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
client:TSO=on, server:GRO=off server:rx-usecs=15
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1774.38   23.75     7.58    2.193    2.801
I also saw an improvement with GRO enabled on the server and TSO enabled on
the client, although in this case I found rx-usecs=45 to be the best value.
MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
client:TSO=on, server:GRO=on server:rx-usecs=3(default)
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2553.27   13.31     3.35    0.854    0.860
MTU=1500, client,server:tcp_reordering=3(default), client:GSO=off,
client:TSO=on, server:GRO=on server:rx-usecs=45
# netperf -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      2727.53   29.45     9.48    1.769    2.278
I did not observe any improvement in throughput when increasing rx-usecs
from 3 when using MTU=9000, although there was a slight increase in CPU
utilisation (maybe; there is quite a lot of noise in the results).
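For reference, the rx-usecs values above were swept roughly like this; a sketch
(the interface name and the exact value list are assumptions):
  for usecs in 3 15 45 100; do
      ethtool -C eth0 rx-usecs $usecs
      netperf -c -4 -t TCP_STREAM -H 172.17.60.216
  done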