linux-kernel.vger.kernel.org archive mirror
* (no subject)
@ 2002-08-27 18:22 Steffen Persvold
  2002-08-27 19:27 ` your mail Willy Tarreau
  2002-08-28  8:06 ` Channel bonding GbE (Tigon3) Steffen Persvold
  0 siblings, 2 replies; 4+ messages in thread
From: Steffen Persvold @ 2002-08-27 18:22 UTC (permalink / raw)
  To: linux-kernel

Dear list people,

Lately I've been testing out a couple of Dell PowerEdge 2650 machines. 
These babies have dual onboard BCM95701A10 NICs (Tigon3 chip) mounted
on the same 64-bit, 133 MHz PCI-X bus.

Since they have dual onboard GbE, I've been trying to channel bond them
using just two crossover cables between the two machines. The results I'm
seeing are, at first glance, very strange: performance when bonded (round
robin) is about _half_ (and sometimes even less) of what a single
interface gives. Here are some netpipe-2.4 results:

64k message size, single interface
  1:     65536 bytes  190 times -->  760.54 Mbps in 0.000657 sec

256k message size, single interface
  1:    262144 bytes   53 times -->  855.04 Mbps in 0.002339 sec

64k message size, both interfaces (using round robin)
  1:     65536 bytes   65 times -->  257.06 Mbps in 0.001945 sec

256k message size, both interfaces (using round robin)
  1:    262144 bytes   25 times -->  376.01 Mbps in 0.005319 sec
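
For completeness, a round-robin bond of this kind is typically set up
along these lines (just a sketch; the addresses here are made up and the
exact options may differ on our boxes):

  modprobe bonding mode=0 miimon=100     # mode 0 = round-robin
  ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
  ifenslave bond0 eth0 eth1

with the mirror image (192.168.1.2) on the other machine.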

Looking at the output of netstat -s after a test run with 256k message
size, I see some differences (main items):

Single interface :
 Tcp:
      0 segments retransmited

 TcpExt:
     109616 packets directly queued to recvmsg prequeue.
     52249581 packets directly received from backlog
     125694404 packets directly received from prequeue
     78 packets header predicted
     124999 packets header predicted and directly queued to user
     TCPPureAcks: 93
     TCPHPAcks: 22981

      
Bonded interfaces :
  Tcp:
      234 segments retransmited

  TcpExt:
      1 delayed acks sent
      Quick ack mode was activated 234 times
      67087 packets directly queued to recvmsg prequeue.
      6058227 packets directly received from backlog
      13276665 packets directly received from prequeue
      6232 packets header predicted
      4625 packets header predicted and directly queued to user
      TCPPureAcks: 25708
      TCPHPAcks: 4456


The biggest differences, as far as I can see, are in 'packets header
predicted', 'packets header predicted and directly queued to user',
'TCPPureAcks' and 'TCPHPAcks'.

My theory is that this happens because the packets arrive out of order at
the receiving node (i.e. the bonding device alternates between the two
interfaces when sending, so on the receiver it is possible that the first
interface gets packets 0, 2, 4 and 6 in one interrupt and queues them to
the network stack before packets 1, 3 and 5 have been handled on the other
interface).
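
(I guess one way to verify this would be to run tcpdump on the receiver
during a netpipe run and look at the TCP sequence numbers for out-of-order
arrivals and the duplicate ACKs they trigger, e.g. something like

  tcpdump -n -i bond0 tcp

but I haven't done that yet.)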

If this is the case, any ideas on how to fix it?

I would really love to get 2Gbit/sec on these machines....


PS

I've also seen this behaviour with the Intel GbE cards (e1000), but that
driver has a parameter named RxIntDelay which can be set to 0 to get an
interrupt for each packet. Is this possible with the tg3 driver too?
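
With e1000 that would be, for two ports, something like:

  modprobe e1000 RxIntDelay=0,0

For tg3 I don't know of an equivalent module parameter; perhaps the
receive coalescing can be tuned through ethtool, along the lines of

  ethtool -C eth0 rx-usecs 0 rx-frames 1

but that is just a guess on my part.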

DS

Regards,
--
  Steffen Persvold   |       Scali AS
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY




* Re: your mail
  2002-08-27 18:22 Steffen Persvold
@ 2002-08-27 19:27 ` Willy Tarreau
  2002-08-28  8:06 ` Channel bonding GbE (Tigon3) Steffen Persvold
  1 sibling, 0 replies; 4+ messages in thread
From: Willy Tarreau @ 2002-08-27 19:27 UTC (permalink / raw)
  To: Steffen Persvold; +Cc: linux-kernel

On Tue, Aug 27, 2002 at 08:22:03PM +0200, Steffen Persvold wrote:
 
> My theory is that this happens because the packets arrive out of order at 
> the receiving node (i.e. the bonding device alternates between the two 
> interfaces when sending, so on the receiver it is possible that the first 
> interface gets packets 0, 2, 4 and 6 in one interrupt and queues them to 
> the network stack before packets 1, 3 and 5 have been handled on the other 
> interface).

You've put your finger on exactly this common problem.
You can use the XOR bonding mode (modprobe bonding mode=2), which uses a
hash of the MAC addresses to select the outgoing interface. This is
interesting if you have lots of L2 hosts on the same network switch.

Or, if you only have a few hosts on the same switch, you're better off
using the "nexthop" parameter of "ip route". IIRC, it should be something
like:
  ip route add <destination> nexthop dev eth0 nexthop dev eth1
but read the help, I'm not certain.
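
A fully spelled-out multipath route would look something like this (the
addresses are made up, adapt them to your two back-to-back links):

  ip route add 10.0.2.0/24 \
      nexthop via 10.0.0.2 dev eth0 weight 1 \
      nexthop via 10.0.1.2 dev eth1 weight 1

Check "ip route help" for the exact syntax.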

Cheers,
Willy



* Channel bonding GbE (Tigon3)
  2002-08-27 18:22 Steffen Persvold
  2002-08-27 19:27 ` your mail Willy Tarreau
@ 2002-08-28  8:06 ` Steffen Persvold
  2002-08-28  8:15   ` David S. Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Steffen Persvold @ 2002-08-28  8:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: beowulf


Sorry for reposting this one, guys, but I noticed that my original email
had no subject (which in some cases doesn't get people's attention :)

On Tue, 27 Aug 2002, Steffen Persvold wrote:

> Dear list people,
> 
> Lately I've been testing out a couple of Dell PowerEdge 2650 machines. 
> These babies have dual onboard BCM95701A10 NICs (Tigon3 chip) mounted
> on the same 64-bit, 133 MHz PCI-X bus.
> 
> Since they have dual onboard GbE, I've been trying to channel bond them
> using just two crossover cables between the two machines. The results I'm
> seeing are, at first glance, very strange: performance when bonded (round
> robin) is about _half_ (and sometimes even less) of what a single
> interface gives. Here are some netpipe-2.4 results:
> 
> 64k message size, single interface
>   1:     65536 bytes  190 times -->  760.54 Mbps in 0.000657 sec
> 
> 256k message size, single interface
>   1:    262144 bytes   53 times -->  855.04 Mbps in 0.002339 sec
> 
> 64k message size, both interfaces (using round robin)
>   1:     65536 bytes   65 times -->  257.06 Mbps in 0.001945 sec
> 
> 256k message size, both interfaces (using round robin)
>   1:    262144 bytes   25 times -->  376.01 Mbps in 0.005319 sec
> 
> Looking at the output of netstat -s after a test run with 256k message
> size, I see some differences (main items):
> 
> Single interface :
>  Tcp:
>       0 segments retransmited
> 
>  TcpExt:
>      109616 packets directly queued to recvmsg prequeue.
>      52249581 packets directly received from backlog
>      125694404 packets directly received from prequeue
>      78 packets header predicted
>      124999 packets header predicted and directly queued to user
>      TCPPureAcks: 93
>      TCPHPAcks: 22981
> 
>       
> Bonded interfaces :
>   Tcp:
>       234 segments retransmited
> 
>   TcpExt:
>       1 delayed acks sent
>       Quick ack mode was activated 234 times
>       67087 packets directly queued to recvmsg prequeue.
>       6058227 packets directly received from backlog
>       13276665 packets directly received from prequeue
>       6232 packets header predicted
>       4625 packets header predicted and directly queued to user
>       TCPPureAcks: 25708
>       TCPHPAcks: 4456
> 
> 
> The biggest differences, as far as I can see, are in 'packets header
> predicted', 'packets header predicted and directly queued to user',
> 'TCPPureAcks' and 'TCPHPAcks'.
> 
> My theory is that this happens because the packets arrive out of order at
> the receiving node (i.e. the bonding device alternates between the two
> interfaces when sending, so on the receiver it is possible that the first
> interface gets packets 0, 2, 4 and 6 in one interrupt and queues them to
> the network stack before packets 1, 3 and 5 have been handled on the other
> interface).
> 
> If this is the case, any ideas on how to fix it?
> 
> I would really love to get 2Gbit/sec on these machines....
> 
> 
> PS
> 
> I've also seen this behaviour with the Intel GbE cards (e1000), but that
> driver has a parameter named RxIntDelay which can be set to 0 to get an
> interrupt for each packet. Is this possible with the tg3 driver too?
> 
> DS
> 
> Regards,
-- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com | http://www.scali.com
Tel: (+47) 2262 8950 |  Olaf Helsets vei 6
Fax: (+47) 2262 8951 |  N0621 Oslo, NORWAY



* Re: Channel bonding GbE (Tigon3)
  2002-08-28  8:06 ` Channel bonding GbE (Tigon3) Steffen Persvold
@ 2002-08-28  8:15   ` David S. Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David S. Miller @ 2002-08-28  8:15 UTC (permalink / raw)
  To: sp; +Cc: linux-kernel, beowulf

   From: Steffen Persvold <sp@scali.com>
   Date: Wed, 28 Aug 2002 10:06:19 +0200 (CEST)

   > My theory is that this happens because the packets arrive out of order at
   > the receiving node (i.e. the bonding device alternates between the two
   > interfaces when sending, so on the receiver it is possible that the first
   > interface gets packets 0, 2, 4 and 6 in one interrupt and queues them to
   > the network stack before packets 1, 3 and 5 have been handled on the other
   > interface).

That is exactly what is happening.  Packets are being reordered.

Welcome to one of the flaws of round-robin trunking. :-)

   > If this is the case, any ideas on how to fix it?

Don't use round-robin; choose the output device based upon a
hash of some bits in the IP/TCP headers :-)
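
For example, picking the slave as something like

  slave_index = hash(src/dst IP, src/dst TCP port) % nr_slaves

keeps every segment of a given connection on the same physical link, so
nothing can get reordered.  (Just a sketch; the exact hash depends on the
bonding mode.)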

You won't get 2Gb/sec for a single TCP stream, but you will
for 2 or more.

