All of lore.kernel.org
* One way TCP bottleneck over 6 x Gbit tecl aggregated link
@ 2015-03-21  0:10 Wolfgang Rosner
  2015-03-21  1:15 ` Eric Dumazet
       [not found] ` <201503220912.39912.wrosner@tirnet.de>
  0 siblings, 2 replies; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-21  0:10 UTC (permalink / raw)
  To: netdev

Hello, 

I'm trying to configure a Beowulf-style cluster based on venerable HP blade 
server hardware.
I have configured 6 parallel GBit VLANs between 16 blade nodes and a gateway server 
with teql link aggregation.

After lots of tuning, nearly everything runs fine (i.e. > 5.5 Gbit/s iperf 
transfer rate, which is 95 % of the theoretical limit), but one bottleneck 
remains:

From gateway to blade nodes, I get only half of the full rate if I use a 
single iperf process / single TCP connection. 
With 2 or more iperf in parallel, the transfer rate is OK.

I don't see this bottleneck in the other direction, nor in the links between 
the blade nodes: 
there I always have > 5.5 Gbit/s, even for a single process.

Is there just some simple tuning parameter I have overlooked, or do I have to dig 
for a deeper cause?


Wolfgang Rosner



===== test results ================================

A single process yields only a little more than half of the 6 Gbit capacity:


root@cruncher:~# iperf -c 192.168.130.225
------------------------------------------------------------
Client connecting to 192.168.130.225, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  3] local 192.168.130.250 port 38589 connected with 192.168.130.225 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.89 GBytes  3.34 Gbits/sec


If I use two (or more) connections in parallel to share the physical link, 
they add up to ~ 95 % of the physical link limit:

root@cruncher:~# iperf -c 192.168.130.225 -P2
------------------------------------------------------------
Client connecting to 192.168.130.225, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  4] local 192.168.130.250 port 38591 connected with 192.168.130.225 port 5001
[  3] local 192.168.130.250 port 38590 connected with 192.168.130.225 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  3.31 GBytes  2.84 Gbits/sec
[  3]  0.0-10.0 sec  3.30 GBytes  2.84 Gbits/sec
[SUM]  0.0-10.0 sec  6.61 GBytes  5.68 Gbits/sec


The values are quite reproducible, within +- 0.2 Gbit I'd estimate.

When I run 6 iperf in parallel, each directly over one of the six physical links 
(bypassing teql), I get 6 x 990 MBit.

===== eof test results ================================

Futile trials:

MTU is 9000 on all links, txqlen 1000
(was 100 on teql, no effect)

I tried to tweak the well-known mem limits to really(?) large values - no 
effect:

  echo 16777216 > /proc/sys/net/core/rmem_default
  echo 16777216 > /proc/sys/net/core/wmem_default
  echo 16777216 > /proc/sys/net/core/wmem_max
  echo 16777216 > /proc/sys/net/core/rmem_max

  echo 33554432 > /proc/sys/net/core/wmem_max
  echo 33554432 > /proc/sys/net/core/rmem_max

  echo 4096 500000 12000000 > /proc/sys/net/ipv4/tcp_wmem
  echo 4096 500000 12000000 > /proc/sys/net/ipv4/tcp_rmem


Increased the TCP window size on both iperf sides to 1 MB - no effect.


 ...  ethtool -G  eth* tx 1024 ... no effect
(ring size was 254 before, is 512 on the blades)
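As a sanity check on the window-tuning attempts above, a back-of-envelope bandwidth-delay product estimate (a sketch; the ~0.4 ms RTT is an assumed LAN-scale value, not a measured one) suggests the default 488 KByte window should already be enough to fill a 6 Gbit/s path, consistent with the "no effect" results:

```shell
# Hypothetical BDP estimate: bytes in flight needed to fill 6 Gbit/s
# at an assumed ~0.4 ms RTT. The result (~293 KiB) is below the
# 488 KByte default window, so window size alone should not be the limiter.
awk 'BEGIN { rate = 6e9; rtt = 0.0004; printf "BDP = %.0f bytes\n", rate * rtt / 8 }'
```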


========== some considerations ==================

I assume the cause is hidden somewhere on the gateway's sending side.

I think I can exclude a general system bottleneck:
in top, it's hard to trace any CPU load at all.

iperf over local loopback: 33.3 Gbits/sec on the gateway
iperf over local loopback:  7.95 Gbits/sec on the blades

So I don't see why core system performance would be a limit on the 
gateway. On the blades, where we are close to total system throughput, I don't 
see a bottleneck for blade-blade connections either.


I had an issue with a physical limit on one of the quad-port NICs on the gateway 
due to PCIe misconfiguration: only x1 bandwidth was available, instead 
of the x4 required. 
But I think this is solved. 
If it were not, I would not get full speed in the other direction, nor with 2 
or more iperf in parallel.


I suppose there must be some bottleneck parameter, either per process, per CPU, 
per TCP connection or similar, which I haven't identified.

What is different in the gateway, compared to the blades?
- Interrupt configuration?
- Chip and driver of NIC
- Mainboard structure
- Kernel version/configuration
- switch connectivity


I think I can rule out the switch, because then the bottleneck would not 
depend on how processes are allocated to the links.

======= System details =============================

Client nodes:
16 x HP Blade server 460 G1
each dual quad-core Xeon 2.66 GHz, 32 GByte RAM
HP NC373i - BCM5708S 2 Port GBit Nic (onboard)
	eth{0,1}	-  driver bnx2
HP NC325m - BCM5715S 4 Port GBit Nic (mezzanine PCIe Card)
	eth{2-5}	-  driver tg3
debian wheezy 7.8
debian backport kernel
Linux version 3.16.0-0.bpo.4-amd64 
nfsroot

Gateway:
Asus Sabertooth 990FX R2.0
AMD FX-8320 8-core  1400 ... 3500 MHz, 16 GByte RAM

RTL8111/8168/8411 PCI Express Gigabit 
	eth0 - driver r8169
RTL-8100/8101L/8139 PCI Fast Ethernet Adapter
	eth1 - driver 8139too
HP NC364T / Intel 82571EB  4 Port GBit Nic PCIe
Intel PRO/1000 PT / Intel 82571EB  4 Port GBit Nic PCIe
	eth{2-9},  driver e1000e
debian wheezy 7.7
Vanilla Kernel 3.19
oldconfig from debian 3.16 backport
nfs4 server for the blades (dnsmasq, pxelinux)
aufs, btrfs, ext4-root

==============Network configuration: ============

gateway
eth0 is uplink to outside world
eth1 is HP blade center config network
eth2...7 is cluster interconnect network

subnet 192.168.130.0/27 is required for boot (PXE, nfsroot)
and shares link with 192.168.130.32/27

192.168.130.32/27 to 192.168.130.192/27 are the parallel physical interlink 
networks

192.168.130.224/27 is the aggregated teql layer over those 6 links

root@cruncher:/cluster/etc/scripts/available# ip route
default via 192.168.0.18 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.53
192.168.129.0/24 dev eth1  proto kernel  scope link  src 192.168.129.250

192.168.130.0/27 dev eth6  proto kernel  scope link  src 192.168.130.30

192.168.130.32/27 dev eth6  proto kernel  scope link  src 192.168.130.62
192.168.130.64/27 dev eth7  proto kernel  scope link  src 192.168.130.94
192.168.130.96/27 dev eth4  proto kernel  scope link  src 192.168.130.126
192.168.130.128/27 dev eth5  proto kernel  scope link  src 192.168.130.158
192.168.130.160/27 dev eth2  proto kernel  scope link  src 192.168.130.190
192.168.130.192/27 dev eth3  proto kernel  scope link  src 192.168.130.222

192.168.130.224/27 dev teql0  proto kernel  scope link  src 192.168.130.250

ip link 
.....
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc teql0 state UP mode 
DEFAULT qlen 1000
.....(same for eth3...7)...
12: teql0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state 
UNKNOWN mode DEFAULT qlen 1000

matching config on the blades for eth0..eth5

Layer 2:
6 x "HP blade center 1/10 Virtual connect (VC) Ethernet modules"
Each VC module is internally linked to one GBit port of each of the blades 
and uplinked by one TP cable to one of eth2...7 on the gateway.

Each of the 6 .../27 subnets matches one of those modules and is configured as a 
separate VLAN in the HP VC connection manager.
(This was the only way I got the VC to utilize all uplinks in parallel rather 
than as failover.)


Why teql?

Initially, I tried the (layer 2) Linux bonding module instead of teql for link 
aggregation. 
This did not load-balance well in LACP mode.
It did fine in round-robin mode on internal connections between blades, but I 
could not get the VC switch layer to round-robin on the uplink (only LACP or 
failover).

I expected to have more control over the misbehaving switch by using layer 3 
bonding, where each channel still has its own accessible IP and MAC.
The performance of the blade-blade connections is proof that it works.
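For context, a minimal teql setup looks roughly like the following (a sketch with assumed device names and an example address, not my exact configuration scripts):

```shell
# Minimal teql sketch (assumed device names/addresses, for illustration):
modprobe sch_teql                        # provides the teql0 master device
tc qdisc add dev eth2 root teql0         # enslave each member link's root qdisc
tc qdisc add dev eth3 root teql0         # ... repeat for every physical link
ip link set dev teql0 up mtu 9000
ip addr add 192.168.130.250/27 dev teql0 # address on the aggregated layer
```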

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link
  2015-03-21  0:10 One way TCP bottleneck over 6 x Gbit tecl aggregated link Wolfgang Rosner
@ 2015-03-21  1:15 ` Eric Dumazet
       [not found] ` <201503220912.39912.wrosner@tirnet.de>
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2015-03-21  1:15 UTC (permalink / raw)
  To: Wolfgang Rosner; +Cc: netdev

On Sat, 2015-03-21 at 01:10 +0100, Wolfgang Rosner wrote:
> Hello, 
> 
> I'm trying to configure a Beowulf-style cluster based on venerable HP blade 
> server hardware.
> I have configured 6 parallel GBit VLANs between 16 blade nodes and a gateway server 
> with teql link aggregation.
> 
> After lots of tuning, nearly everything runs fine (i.e. > 5.5 Gbit/s iperf 
> transfer rate, which is 95 % of the theoretical limit), but one bottleneck 
> remains:
> 
> From gateway to blade nodes, I get only half of the full rate if I use a 
> single iperf process / single TCP connection. 
> With 2 or more iperf in parallel, the transfer rate is OK.
> 
> I don't see this bottleneck in the other direction, nor in the links between 
> the blade nodes: 
> there I always have > 5.5 Gbit/s, even for a single process.
> 
> Is there just some simple tuning parameter I have overlooked, or do I have to dig 
> for a deeper cause?

What Linux version runs on the sender?

Could you send output of :

nstat >/dev/null
iperf -c 192.168.130.225
nstat

Also, please send ss output while iperf is running, as in:

(please use a recent ss command, found in the iproute2 package:
https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/  
so that it outputs the reordering level)

iperf -c 192.168.130.225 &
ss -temoi dst 192.168.130.225


* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link
       [not found]   ` <1427020159.25985.43.camel@edumazet-glaptop2.roam.corp.google.com>
@ 2015-03-22 15:14     ` Wolfgang Rosner
  2015-03-22 16:54       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-22 15:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hello, Eric, 

On Sunday, 22 March 2015 11:29:19, you wrote:
> On Sun, 2015-03-22 at 09:12 +0100, Wolfgang Rosner wrote:
> > On Saturday, 21 March 2015 22:48:22, you wrote:
> > > On Sat, 2015-03-21 at 12:32 +0100, Wolfgang Rosner wrote:
> > > > btw: as it looks to me, the current git version seems to ignore the
> > > > dsc filter. Sorry, you have to dig for the right line.
> > >
> > > Note I mentioned dst not dsc ;)
> >
> > sorry for the typo, but it's only in the mail text.
> > In the script it says "dst"
> > and  I checked it again.

> >
> > Anyway, I think this tiny bug is not going to hurt us here, is it?
> >
> > Wolfgang
> >
> >
> > ===================================
> >
> > root@cruncher:/home/tmp/netdev#
> > /home/build/iproute2/shemminger/iproute2/misc/ss -temoi dst
> > 192.168.130.225
> > State      Recv-Q Send-Q                                    Local
> > Address:Port Peer Address:Port
> > ESTAB      0      0                                         
> > 192.168.0.53:ssh 192.168.0.50:21760                

> > (....)
>
> ss -temoi dst 192.168.130.225
>
> should not output tcp sessions to other destination than 192.168.130.225
>
> Your example shows sessions to 192.168.0.50, this makes no sense to me.


Agreed. 
This is why I considered it a bug.

But it's just a sideline, not my main problem.
And maybe it is even related to my build environment,
or to the fact that I call it with a full path, or whatever....

I don't know whether this repo you pointed me to
https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/  
is a bleeding-edge developer fork that is expected to show such fluctuations, 
or a should-be-stable mainstream one, where such behaviour is not supposed to 
occur. 

So if you see reason to worry about this issue, we can try to debug 
it. Let me know, and give me instructions for tests, if required.





However, from my perspective, I'd prefer to return to my real problem:
	"One way TCP bottleneck over 6 x Gbit tecl aggregated link"

If you got lost in the flood of the netdev list (I just subscribed, and am really 
overwhelmed...), I don't mind sending you the info again; I have even removed the 
superfluous lines manually for you - see below.

Do these figures tell you a story?
Any hint where to look for the bottleneck?
Presumably related to some per-TCP-connection resource?


Wolfgang Rosner


##########################################################################


# cat test.sh

SSPATH=/home/build/iproute2/shemminger/iproute2/misc/ss
SEP=============================================================

(sleep 5 ; echo $SEP; $SSPATH -temoi dst 192.168.130.225 ; echo $SEP ) &

nstat >/dev/null
iperf -c 192.168.130.225
echo $SEP
nstat



# end of test script
#########################################################################
# output follows, superfluous lines from ss bug removed manually


------------------------------------------------------------
Client connecting to 192.168.130.225, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  3] local 192.168.130.250 port 39256 connected with 192.168.130.225 port 5001
============================================================

# ....ss output - buggy lines removed ....

State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
ESTAB      0      4554532 192.168.130.250:39256              192.168.130.225:5001                  timer:(on,200ms,0) ino:514423 sk:ffff8801bdddd140 <->
	 skmem:(r0,rb500000,t66560,tb12000000,f3556096,w8469760,o0,bl3072) ts sack cubic wscale:7,10 rto:204 rtt:0.357/0.01 mss:8948 cwnd:956 ssthresh:17 send 191692.7Mbps lastsnd:4 lastrcv:140569508 lastack:4 unacked:16 retrans:0/1 sacked:1 reordering:22 rcv_space:26880

# .... buggy lines removed ....


============================================================
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.78 GBytes  3.24 Gbits/sec
============================================================
#kernel
IpInReceives                    444749             0.0
IpForwDatagrams                 4                  0.0
IpInDelivers                    444745             0.0
IpOutRequests                   453019             0.0
TcpActiveOpens                  1                  0.0
TcpInSegs                       444742             0.0
TcpOutSegs                      453013             0.0
TcpRetransSegs                  1                  0.0
UdpInDatagrams                  1                  0.0
UdpOutDatagrams                 1                  0.0
UdpIgnoredMulti                 2                  0.0
TcpExtTCPHPHits                 12                 0.0
TcpExtTCPPureAcks               125155             0.0
TcpExtTCPHPAcks                 71412              0.0
TcpExtTCPSackRecovery           1                  0.0
TcpExtTCPSACKReorder            9                  0.0
TcpExtTCPTSReorder              1                  0.0
TcpExtTCPPartialUndo            1                  0.0
TcpExtTCPFastRetrans            1                  0.0
TcpExtTCPDSACKRecv              1                  0.0
TcpExtTCPDSACKIgnoredNoUndo     1                  0.0
TcpExtTCPSackShiftFallback      50003              0.0
TcpExtTCPAutoCorking            23882              0.0
TcpExtTCPOrigDataSent           453010             0.0
TcpExtTCPHystartTrainDetect     1                  0.0
TcpExtTCPHystartTrainCwnd       17                 0.0
IpExtInBcastPkts                3                  0.0
IpExtOutBcastPkts               1                  0.0
IpExtInOctets                   24508230           0.0
IpExtOutOctets                  4076970728         0.0
IpExtInBcastOctets              538                0.0
IpExtOutBcastOctets             76                 0.0
IpExtInNoECTPkts                444749             0.0


* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link
  2015-03-22 15:14     ` Wolfgang Rosner
@ 2015-03-22 16:54       ` Eric Dumazet
  2015-03-22 20:47         ` Wolfgang Rosner
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2015-03-22 16:54 UTC (permalink / raw)
  To: Wolfgang Rosner; +Cc: netdev

On Sun, 2015-03-22 at 16:14 +0100, Wolfgang Rosner wrote:

> Agreed. 
> This is why I considered it a bug.
> 
> But it's just a sideline, not my main problem.
> And maybe it is even related to my build environment,
> or to the fact that I call it with a full path, or whatever....
> 
> I don't know whether this repo you pointed me to
> https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/  
> is a bleeding-edge developer fork that is expected to show such fluctuations, 
> or a should-be-stable mainstream one, where such behaviour is not supposed to 
> occur. 
> 
> So if you see reason to worry about this issue, we can try to debug 
> it. Let me know, and give me instructions for tests, if required.
> 
> 
> 
> 
> 
> However, from my perspective, I'd prefer to return to my real problem:
> 	"One way TCP bottleneck over 6 x Gbit tecl aggregated link"
> 
> If you got lost in the flood of the netdev list (I just subscribed, and am really 
> overwhelmed...), I don't mind sending you the info again; I have even removed the 
> superfluous lines manually for you - see below.
> 
> Do these figures tell you a story?
> Any hint where to look for the bottleneck?
> Presumably related to some per-TCP-connection resource?
> 

I see nothing wrong but a single retransmit.


You might be bitten by the tcp_metrics cache.

Check /proc/sys/net/ipv4/tcp_no_metrics_save

(set it to 0, and flush the tcp metrics cache):

echo 0 >/proc/sys/net/ipv4/tcp_no_metrics_save
ip tcp_metrics flush


Also, not clear why you use jumbo frames with 1Gbit NIC (do not do that,
really), and it seems TSO is disabled on your NIC(s) ????


for dev in eth0 eth1 eth2 eth3 ....
do
 ethtool -k $dev
done


* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link
  2015-03-22 16:54       ` Eric Dumazet
@ 2015-03-22 20:47         ` Wolfgang Rosner
  2015-03-23  0:57           ` One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue Wolfgang Rosner
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-22 20:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 11140 bytes --]

Hello, Eric,

On Sunday, 22 March 2015 17:54:54, Eric Dumazet wrote:
>
> I see nothing wrong but a single retransmit.
> You might be bitten by tcp_metric cache.
> check /proc/sys/net/ipv4/tcp_no_metrics_save

OK, let's go testing.

reference figures before any changes:
gateway -> blade 	3.21 Gbits/sec
blade -> blade		5.49 Gbits/sec


both on gateway and on blade:

root@cruncher:/home/tmp/netdev# cat /proc/sys/net/ipv4/tcp_no_metrics_save
0
root@blade-002:~# cat /proc/sys/net/ipv4/tcp_no_metrics_save
0

> (set it to 0, and flush tcp metric cache :
> echo 0 >/proc/sys/net/ipv4/tcp_no_metrics_save

no need to do so, right?

> ip tcp_metrics flush

root@cruncher:/home/tmp/netdev# ip tcp_metrics flush
	Object "tcp_metrics" is unknown, try "ip help".


Looks like debian-stable's iproute tools are quite old, aren't they?
So I suppose the iproute2 package you pointed me to will do the job.
Is it OK this way, or should I install and configure it on my systems?
Or will the iproute2 from debian backports do?

btw: the teql aggregation is constructed by ip and tc commands, which are part 
of the iproute2 package, right? Could the old toolbox, maybe in combination with a 
recent 3.19 vanilla kernel, cause the problem?


/home/build/iproute2/shemminger/iproute2/ip/ip tcp_metrics flush
(both on blades and on gateway)



root@blade-002:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.130.226 port 5001 connected with 192.168.130.250 port 49522
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  3.78 GBytes  3.25 Gbits/sec


root@cruncher:/home/tmp/netdev# iperf -c 192.168.130.226
------------------------------------------------------------
Client connecting to 192.168.130.226, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  3] local 192.168.130.250 port 49522 connected with 192.168.130.226 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.78 GBytes  3.25 Gbits/sec


repeated tests: 3.25, 3.25, 3.12
was 3.12 before.
Sad to say: no effect

>
>
> Also, not clear why you use jumbo frames with 1Gbit NIC (do not do that,
> really), and it seems TSO is disabled on your NIC(s) ????

Because it increased throughput by a factor of three.

Jumbo frames were THE ONE AND ONLY really helpful tuning parameter.

What is wrong with jumbo frames in your view?

That's what many TCP tuning tips out there on the internet recommend.
I could follow their argument that the smaller number of chunks to transfer 
reduces system load.

And my half-educated guess was: even if GBit NICs do a lot of work in 
hardware / firmware, there is still the teql layer, which I suppose is 100 % 
handled by the CPU, right?



From my early setup test notes:

default MTU (1500)
	1622.65 MBit / s

jumbo frames (MTU = 9000)
(on both eth links and on teql)
	5143.78 MBit / s

This was on blade-blade links.
If I had this figure on the gateway->blade link too, I would be glad and would not 
bother you.
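The per-packet arithmetic behind that gain (a sketch, assuming a 6 Gbit/s aggregate rate and ignoring header overhead) shows why the frame size matters so much for a CPU-handled path like teql:

```shell
# Packets per second needed to sustain 6 Gbit/s at each MTU (rough,
# header overhead ignored): larger frames mean ~6x fewer per-packet
# trips through the teql/qdisc path on the CPU.
awk 'BEGIN {
    rate = 6e9 / 8                         # bytes per second
    printf "MTU 1500: %.0f pkt/s\n", rate / 1500
    printf "MTU 9000: %.0f pkt/s\n", rate / 9000
}'
```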


> and it seems TSO is disabled on your NIC(s) ????
> for dev in eth0 eth1 eth2 eth3 ....
> do
>  ethtool -k $dev
> done

TSO is also known as tcp-segmentation-offload, right?
Let me grep a little to keep the mail text readable.
I'll put the unfiltered ethtool -k outputs in attachments for further 
scrutiny.


On the gateway:

root@cruncher:~# for i in `seq 0 9` ; do  ethtool -k  eth$i | grep 'Features\|segmentation-offload'  ; done
Features for eth0:
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]
Features for eth1:
tcp-segmentation-offload: off
generic-segmentation-offload: on
Features for eth2:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth3:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth4:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth5:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth6:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth7:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth8:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth9:
tcp-segmentation-offload: on
generic-segmentation-offload: on

Never mind the first two:
eth0 is the external uplink and mostly used for ssh
eth1 is a 100Mbit link for the blade center admin subnet

This is what we need:
eth2 .... eth7 contribute the teql links towards the blades.


On the blades

root@blade-002:~# for i in `seq 0 5` ; do  ethtool -k  eth$i | grep 'Features\|segmentation-offload'  ; done
Features for eth0:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth1:
tcp-segmentation-offload: on
generic-segmentation-offload: on
Features for eth2:
tcp-segmentation-offload: off
generic-segmentation-offload: on
Features for eth3:
tcp-segmentation-offload: off
generic-segmentation-offload: on
Features for eth4:
tcp-segmentation-offload: off
generic-segmentation-offload: on
Features for eth5:
tcp-segmentation-offload: off
generic-segmentation-offload: on


This may be due to different card types:

root@blade-002:~# lspci | grep -i ether
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
16:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)
16:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)
18:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)
18:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3)

lspci -k .... 
eth{0,1}
        Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
        Kernel driver in use: bnx2

eth{2,3,4,5}
        Subsystem: Hewlett-Packard Company NC325m PCIe Quad Port Adapter
        Kernel driver in use: tg3


I remember some discussion on the web that generic-segmentation-offload would 
be preferable over tcp-segmentation-offload anyway.
Nevertheless, let's give it a try - but the card does not want to:

root@blade-002:~# ethtool -K eth4 tso on
Could not change any device features

Do I interpret this correctly: the card/driver does not support TSO?


Just out of curiosity:
I tried to set tso off for all cluster links eth? on the gateway, too:
	.... 3.31 Gbits/sec -> no effect


But still the old picture: over 2 different connections, I can easily saturate 
the link:


# iperf -c 192.168.130.226 -P2
------------------------------------------------------------
Client connecting to 192.168.130.226, TCP port 5001
TCP window size:  780 KByte (default)
------------------------------------------------------------
[  4] local 192.168.130.250 port 49543 connected with 192.168.130.226 port 5001
[  3] local 192.168.130.250 port 49542 connected with 192.168.130.226 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  3.29 GBytes  2.82 Gbits/sec
[  3]  0.0-10.0 sec  3.30 GBytes  2.84 Gbits/sec
[SUM]  0.0-10.0 sec  6.59 GBytes  5.66 Gbits/sec


And doing it the other way round, I can saturate the link with only one 
connection:

# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  4] local 192.168.130.250 port 5001 connected with 192.168.130.226 port 60139
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  6.52 GBytes  5.60 Gbits/sec

root@blade-002:~# iperf -c 192.168.130.250
------------------------------------------------------------
Client connecting to 192.168.130.250, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 192.168.130.226 port 60139 connected with 192.168.130.250 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.52 GBytes  5.60 Gbits/sec


****************************************************************

What puzzles me are the differences in TCP window size:
Do they relate to the tcp_metrics cache issue?

What we also might consider is that one of the physical links is used as the 
"boot net".

As you can see, there are two addresses assigned to the first physical link 
(eth6 on the gateway, eth0 on the blades):
Subnet 192.168.130.0/27 is used for boot (PXE, tftp, nfsroot).
Subnet 192.168.130.32/27 is the first teql-assigned subnet.

While there is no traffic blade <-> blade and not much blade -> gateway,
during boot all the nfsroot traffic is transferred gateway -> blade,
right in the direction where we encounter bad teql performance.

I don't think that there is much transfer during the iperf tests, 
so it is not an issue of congestion.
But could it be that the boot traffic "skews" the tcp_metrics cache?
Could even a small number of transfers imbalance the teql flow?
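One way to check that suspicion (a sketch; it assumes the freshly built iproute2 `ip` from the build path mentioned above, since the Debian wheezy `ip` lacks tcp_metrics support) would be to inspect the cached per-destination metrics after a boot, and flush them before re-testing:

```shell
# Inspect cached per-destination TCP metrics (needs a recent iproute2 ip);
# a stale rtt/cwnd/reordering entry for a blade address would support the
# "boot traffic skews the cache" theory.
IP=/home/build/iproute2/shemminger/iproute2/ip/ip
$IP tcp_metrics show | grep 192.168.130.
# ... and clear the cache again before the next iperf run:
$IP tcp_metrics flush
```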


root@cruncher:~# ip route
default via 192.168.0.18 dev eth0
192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.53
192.168.129.0/24 dev eth1  proto kernel  scope link  src 192.168.129.250
192.168.130.0/27 dev eth6  proto kernel  scope link  src 192.168.130.30
192.168.130.32/27 dev eth6  proto kernel  scope link  src 192.168.130.62
192.168.130.64/27 dev eth7  proto kernel  scope link  src 192.168.130.94
192.168.130.96/27 dev eth4  proto kernel  scope link  src 192.168.130.126
192.168.130.128/27 dev eth5  proto kernel  scope link  src 192.168.130.158
192.168.130.160/27 dev eth2  proto kernel  scope link  src 192.168.130.190
192.168.130.192/27 dev eth3  proto kernel  scope link  src 192.168.130.222
192.168.130.224/27 dev teql0  proto kernel  scope link  src 192.168.130.250


root@blade-002:~# ip ro
default via 192.168.130.30 dev eth0
192.168.130.0/27 dev eth0  proto kernel  scope link  src 192.168.130.2
192.168.130.32/27 dev eth0  proto kernel  scope link  src 192.168.130.34
192.168.130.64/27 dev eth1  proto kernel  scope link  src 192.168.130.66
192.168.130.96/27 dev eth2  proto kernel  scope link  src 192.168.130.98
192.168.130.128/27 dev eth3  proto kernel  scope link  src 192.168.130.130
192.168.130.160/27 dev eth4  proto kernel  scope link  src 192.168.130.162
192.168.130.192/27 dev eth5  proto kernel  scope link  src 192.168.130.194
192.168.130.224/27 dev teql0  proto kernel  scope link  src 192.168.130.226



Sorry that things are this weird.
But I wouldn't have bothered this venerable list if there had been a simple 
solution....



Wolfgang Rosner

[-- Attachment #2: eth_offload_blade-002.log --]
[-- Type: text/x-log, Size: 7694 bytes --]

Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth2:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth3:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth4:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth5:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]


[-- Attachment #3: eth_offload_gateway.log --]
[-- Type: text/x-log, Size: 12178 bytes --]

Features for eth0:
rx-checksumming: on
tx-checksumming: off
	tx-checksum-ipv4: off
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: off
	tx-scatter-gather: off
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth1:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on [fixed]
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [fixed]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth2:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth3:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth4:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth5:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth6:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth7:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth8:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Features for eth9:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-22 20:47         ` Wolfgang Rosner
@ 2015-03-23  0:57           ` Wolfgang Rosner
  2015-03-23  2:33             ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-23  0:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hello, Eric,

It's the kernel, stupid me....

I can't believe it:

I just booted my gateway into a Debian kernel, and it works right away:
	5.93 Gbits/sec	over a single TCP connection :-)))

root@cruncher:~# cat /proc/version
Linux version 3.16.0-0.bpo.4-amd64 (debian-kernel@lists.debian.org) (gcc 
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 
(2015-02-12)

as opposed to vanilla 3.19, which was running in the previous tests.

=====================
The sad thing is that I had to remove the "udba=notify" option from my 
aufs-layered nfsroot mimics.
So any time I make config changes or add packages to the blade setup, I have 
to run a script to remount all the NFS roots, to make the blades aware of the 
change.
(This was the reason why I went for 3.19.)
======================
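The remount step itself is easy to script. A minimal sketch, assuming the 
blade nfsroots follow the naming shown in the mount listing later in this 
thread; it only prints the commands, so it can be reviewed before piping it 
to a shell:

```shell
# Hypothetical sketch: print a remount command for each of the 16 blade
# nfsroots (paths follow the naming used in this thread); pipe the output
# to `sh` to actually run the remounts.
for i in $(seq -w 1 16); do
    echo "mount -o remount /cluster/nfs/nfsr/blade-0$i.crunchnet"
done
```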


Therefore, I would be glad to get it working with 3.19, too.
And I hope that you, too, are eager to rule out the possibility of a 
regression bug :-)


So, what could have happened:
- a regression bug somewhere in the kernel
- broken compatibility between 3.16 and 3.19 (maybe the iproute2 tools?)
- broken compatibility between the vanilla and the Debian kernel
- ????

Are you ready to assist me on this track?



Wolfgang Rosner

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23  0:57           ` One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue Wolfgang Rosner
@ 2015-03-23  2:33             ` Eric Dumazet
  2015-03-23  9:05               ` Wolfgang Rosner
  2015-03-26 19:34               ` Wolfgang Rosner
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2015-03-23  2:33 UTC (permalink / raw)
  To: Wolfgang Rosner; +Cc: netdev

On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:

> 
> Are you ready to assist me on this track?

First check if the problem is already solved, using this git tree ?

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23  2:33             ` Eric Dumazet
@ 2015-03-23  9:05               ` Wolfgang Rosner
  2015-03-23  9:25                 ` Zimmermann, Alexander
  2015-03-26 19:34               ` Wolfgang Rosner
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-23  9:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Monday, 23 March 2015 at 03:33:17, Eric Dumazet wrote:
> On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
> > Are you ready to assist me on this track?
>
> First check if the problem is already solved, using this git tree ?
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git


Is it OK to use this 3.19 tag?
https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net.git/+/v3.19


I see that HEAD is at v4.0-rc4.
I'm afraid that this might break other things.
I'm thinking of aufs, which I need to boot my clients over nfsroot.
Without clients, I can't iperf the teql link.

I have to admit that I'm not the master of this git beast.
But I'm working my way, somehow...
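If the regression is still there in the net tree, the standard way to hunt 
it down is `git bisect` between a known-good and a known-bad tag (here v3.16 
and v3.19), rebuilding and re-running the single-stream iperf test at each 
step. Below is a self-contained demonstration of the mechanics on a throwaway 
repository; the repo, file name and "bug" condition are invented purely for 
illustration:

```shell
# Demonstrate the git-bisect workflow on a throwaway repository; in the
# real case the range would be v3.16..v3.19 and the test command would
# rebuild the kernel and re-run the single-stream iperf measurement.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
    echo "$i" > file
    git add file
    git commit -qm "rev $i"
done
# Pretend the regression appeared in "rev 4": the test passes only while
# file's content is <= 3. `git bisect run` drives the search automatically,
# marking each revision good (exit 0) or bad (non-zero).
git bisect start HEAD HEAD~4
git bisect run sh -c 'test "$(cat file)" -le 3'
```

Between real kernel tags every bisect step means a rebuild and a reboot, so 
expect roughly log2 of the number of commits between the two tags as the 
number of iterations.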




Wolfgang Rosner

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23  9:05               ` Wolfgang Rosner
@ 2015-03-23  9:25                 ` Zimmermann, Alexander
  2015-03-23 13:54                   ` Wolfgang Rosner
  0 siblings, 1 reply; 13+ messages in thread
From: Zimmermann, Alexander @ 2015-03-23  9:25 UTC (permalink / raw)
  To: Wolfgang Rosner; +Cc: Eric Dumazet, netdev

Hi Wolfgang,

> On 23.03.2015 at 10:05, Wolfgang Rosner <wrosner@tirnet.de> wrote:
> 
> On Monday, 23 March 2015 at 03:33:17, Eric Dumazet wrote:
>> On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
>>> Are you ready to assist me on this track?
>> 
>> First check if the problem is already solved, using this git tree ?
>> 
>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
> 
> 
> Is it OK to use this 3.19 tag?
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net.git/+/v3.19
> 
> 
> I see that head is v4.0-rc4
> I'm afraid that this might break other things.
> I think of aufs, which I need to get my clients nfsroot booted.

Perhaps you could take a look at OverlayFS (instead of aufs), which
was introduced in kernel 3.18.

Alex

> Without clients, I can't iperf the teql link.
> 
> I have to admit that I'm not the master of this git beast.
> But I'm working my way, somehow...
> 
> 
> 
> 
> Wolfgang Rosner


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23  9:25                 ` Zimmermann, Alexander
@ 2015-03-23 13:54                   ` Wolfgang Rosner
  2015-03-24 12:59                     ` Zimmermann, Alexander
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-23 13:54 UTC (permalink / raw)
  To: Zimmermann, Alexander; +Cc: netdev

On Monday, 23 March 2015 at 10:25:33, you wrote:
> Hi Wolfgang,
>
> > On 23.03.2015 at 10:05, Wolfgang Rosner <wrosner@tirnet.de> wrote:
> >
> > On Monday, 23 March 2015 at 03:33:17, Eric Dumazet wrote:
> >> On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
> >>> Are you ready to assist me on this track?
> >>
> >> First check if the problem is already solved, using this git tree ?
> >>
> >> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
> >
> > Is it OK to use this 3.19 tag?
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net.git/+/
> >v3.19
> >
> >
> > I see that head is v4.0-rc4
> > I'm afraid that this might break other things.
> > I think of aufs, which I need to get my clients nfsroot booted.
>
> Perhaps you could take a look at OverlayFS (instead of aufs), which
> was introduced in kernel 3.18
>
> Alex


Hi Alex,

Thank you for the tip.
But if only it were as easy as just having a look at it...

As far as I know, overlayfs only allows two layers, right?
So for more complex trees, I could perhaps stack overlay mounts (or so I hope).

To do so, I would at least have to redesign the complete script that builds 
the nfsroot mimics shown below.

And as far as I could figure out, the representation of whiteouts differs 
between aufs and overlayfs. So I cannot simply take my existing aufs branches 
and stack them with overlayfs without breaking the file/whiteout consistency, 
or can I?


However, when I made the decision in favour of aufs, the udba ("user direct 
branch access") feature (and the lack thereof in overlayfs) was the main 
reason not to go for overlayfs:

> Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined,
( from https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt )

Is this going to change in the foreseeable future?
Are you aware of any plans to implement UDBA in overlayfs?
If so, I would be glad to participate in testing.

I could live without the udba feature for a limited test period.
But I have got used to being able to install packages in a package layer, 
fine-tune the configuration in another layer, and live-test this on a 
parallel cluster of 16 machines, without dropping off nfsroot each time.

Maybe part of this functionality could be replaced by btrfs snapshotting.
Nevertheless, it would amount to a complete redesign of my cluster setup.
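For comparison, an overlayfs-based equivalent would have to chain mounts, 
since each overlay mount takes a single lowerdir on 3.18 (support for 
multiple lower layers only arrived later). A rough, unverified configuration 
sketch with made-up paths; note that each overlay also needs a writable 
upperdir plus a workdir on the same filesystem, so the intermediate layer 
here ends up writable, which is exactly the kind of mismatch with the aufs 
setup described above:

```shell
# Hypothetical sketch only: stacking two overlay mounts to approximate a
# multi-branch union. All paths are invented; this needs root, and whether
# nesting overlay mounts behaves well on 3.18 would need testing first.
mount -t overlay overlay \
    -o lowerdir=/roots/base,upperdir=/roots/pkg_std,workdir=/roots/work1 \
    /mnt/stage1
mount -t overlay overlay \
    -o lowerdir=/mnt/stage1,upperdir=/roots/cow/cow_001,workdir=/roots/work2 \
    /mnt/blade-001
```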


Wolfgang Rosner



############################################################################

This is what my aufs mounts look like:

aufs-nfsroot-blade-001 on /cluster/nfs/nfsr/blade-001.crunchnet type aufs 
(rw,relatime,si=88a4f4d462df4343)
        0 rw id=64 path=/cluster/node_roots/wheezy-HD-2015_02_21/cow/cow_001
        1 ro+wh id=65 path=/cluster/node_roots/wheezy-HD-2015_02_21/config
        2 ro+wh id=66 path=/cluster/node_roots/wheezy-HD-2015_02_21/mask
        3 ro+wh id=67 path=/cluster/node_roots/wheezy-HD-2015_02_21/pkg_std
        4 ro id=68 path=/cluster/node_roots/wheezy-HD-2015_02_21/base
        xino: /tmp/.aufs.xino
	:
	(repeated 16 times)
	:
aufs-nfsroot-blade-016



plus 3 admin layers (writable for each intermediary layer)



root@cruncher:/cluster/etc/scripts/available# mount | grep aufs
aufs-nfsroot-blade-001 on /cluster/nfs/nfsr/blade-001.crunchnet type aufs 
(rw,relatime,si=88a4f4d462df4343)
aufs-nfsroot-blade-002 on /cluster/nfs/nfsr/blade-002.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fe245343)
aufs-nfsroot-blade-003 on /cluster/nfs/nfsr/blade-003.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fe9c5343)
aufs-nfsroot-blade-004 on /cluster/nfs/nfsr/blade-004.crunchnet type aufs 
(rw,relatime,si=88a4f4d467b53343)
aufs-nfsroot-blade-005 on /cluster/nfs/nfsr/blade-005.crunchnet type aufs 
(rw,relatime,si=88a4f4d47db69343)
aufs-nfsroot-blade-006 on /cluster/nfs/nfsr/blade-006.crunchnet type aufs 
(rw,relatime,si=88a4f4d4630bb343)
aufs-nfsroot-blade-007 on /cluster/nfs/nfsr/blade-007.crunchnet type aufs 
(rw,relatime,si=88a4f4d4586cc343)
aufs-nfsroot-blade-008 on /cluster/nfs/nfsr/blade-008.crunchnet type aufs 
(rw,relatime,si=88a4f4d458689343)
aufs-nfsroot-blade-009 on /cluster/nfs/nfsr/blade-009.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fda97343)
aufs-nfsroot-blade-010 on /cluster/nfs/nfsr/blade-010.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fe3be343)
aufs-nfsroot-blade-011 on /cluster/nfs/nfsr/blade-011.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fe8cb343)
aufs-nfsroot-blade-012 on /cluster/nfs/nfsr/blade-012.crunchnet type aufs 
(rw,relatime,si=88a4f4d3fc90c343)
aufs-nfsroot-blade-013 on /cluster/nfs/nfsr/blade-013.crunchnet type aufs 
(rw,relatime,si=88a4f4d4622fb343)
aufs-nfsroot-blade-014 on /cluster/nfs/nfsr/blade-014.crunchnet type aufs 
(rw,relatime,si=88a4f4d3e2cc8343)
aufs-nfsroot-blade-015 on /cluster/nfs/nfsr/blade-015.crunchnet type aufs 
(rw,relatime,si=88a4f4d0ca64e343)
aufs-nfsroot-blade-016 on /cluster/nfs/nfsr/blade-016.crunchnet type aufs 
(rw,relatime,si=88a4f4d0ca92e343)
aufs_pkg_std on /cluster/node_roots/wheezy-HD-2015_02_21/admount-pkg_std type 
aufs (rw,relatime,si=88a4f4d46dced343)
aufs_mask on /cluster/node_roots/wheezy-HD-2015_02_21/admount-mask type aufs 
(rw,relatime,si=88a4f4d3b0a2b343)
aufs_config on /cluster/node_roots/wheezy-HD-2015_02_21/admount-config type 
aufs (rw,relatime,si=88a4f4d3fee4c343)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23 13:54                   ` Wolfgang Rosner
@ 2015-03-24 12:59                     ` Zimmermann, Alexander
  0 siblings, 0 replies; 13+ messages in thread
From: Zimmermann, Alexander @ 2015-03-24 12:59 UTC (permalink / raw)
  To: Wolfgang Rosner; +Cc: netdev

Hi Wolfgang,

> On 23.03.2015 at 14:54, Wolfgang Rosner <wrosner@tirnet.de> wrote:
> 
> On Monday, 23 March 2015 at 10:25:33, you wrote:
>> Hi Wolfgang,
>> 
>>> On 23.03.2015 at 10:05, Wolfgang Rosner <wrosner@tirnet.de> wrote:
>>> 
>>> On Monday, 23 March 2015 at 03:33:17, Eric Dumazet wrote:
>>>> On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
>>>>> Are you ready to assist me on this track?
>>>> 
>>>> First check if the problem is already solved, using this git tree ?
>>>> 
>>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
>>> 
>>> Is it OK to use this 3.19 tag?
>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net.git/+/
>>> v3.19
>>> 
>>> 
>>> I see that head is v4.0-rc4
>>> I'm afraid that this might break other things.
>>> I think of aufs, which I need to get my clients nfsroot booted.
>> 
>> Perhaps you could take a look at OverlayFS (instead of aufs), which
>> was introduced in kernel 3.18
>> 
>> Alex
> 
> 
> Hi Alex,
> 
> Thank you for the tip.
> But if only it were as easy as just having a look at it...
> 
> As far as I know, overlayfs only allows two layers, right?
> So for more complex trees, I could perhaps stack overlay mounts (or so I hope).
> 
> To do so, I would at least have to redesign the complete script that builds 
> the nfsroot mimics shown below.
> 
> And as far as I could figure out, the representation of whiteouts differs 
> between aufs and overlayfs. So I cannot simply take my existing aufs branches 
> and stack them with overlayfs without breaking the file/whiteout consistency, 
> or can I?
> 

Sorry about that, but I've no clue. I got rid of the overlay stuff last year.
I netboot my diskless machines via PXE, NFS read-only (except home), and 
tmpfs for the rest. That's it.

> 
> However, when I made the decision in favour of aufs, the udba ("user direct 
> branch access") feature (and the lack thereof in overlayfs) was the main 
> reason not to go for overlayfs:
> 
>> Changes to the underlying filesystems while part of a mounted overlay
>> filesystem are not allowed.  If the underlying filesystem is changed,
>> the behavior of the overlay is undefined,
> ( from https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt )
> 
> Is this going to change in the foreseeable future?
> Are you aware of any plans to implement UDBA in overlayfs?

I don’t follow the overlayfs development anymore.

> If so, I would be glad to participate in testing.
> 
> I could live without the udba feature for a limited test period.
> But I have got used to being able to install packages in a package layer, 
> fine-tune the configuration in another layer, and live-test this on a 
> parallel cluster of 16 machines, without dropping off nfsroot each time.

I see. Your environment is much more complicated than mine.

Alex

> 
> Maybe part of this functionality could be replaced by btrfs snapshotting.
> Nevertheless, it would amount to a complete redesign of my cluster setup.
> 
> 
> Wolfgang Rosner
> 
> 
> 
> ############################################################################
> 
> This is how my aufs mounts look like:
> 
> aufs-nfsroot-blade-001 on /cluster/nfs/nfsr/blade-001.crunchnet type aufs 
> (rw,relatime,si=88a4f4d462df4343)
>        0 rw id=64 path=/cluster/node_roots/wheezy-HD-2015_02_21/cow/cow_001
>        1 ro+wh id=65 path=/cluster/node_roots/wheezy-HD-2015_02_21/config
>        2 ro+wh id=66 path=/cluster/node_roots/wheezy-HD-2015_02_21/mask
>        3 ro+wh id=67 path=/cluster/node_roots/wheezy-HD-2015_02_21/pkg_std
>        4 ro id=68 path=/cluster/node_roots/wheezy-HD-2015_02_21/base
>        xino: /tmp/.aufs.xino
> 	:
> 	(repeated 16 times)
> 	:
> aufs-nfsroot-blade-016
> 
> 
> 
> plus 3 admin layers (writable for each intermediary layer)
> 
> 
> 
> root@cruncher:/cluster/etc/scripts/available# mount | grep aufs
> aufs-nfsroot-blade-001 on /cluster/nfs/nfsr/blade-001.crunchnet type aufs 
> (rw,relatime,si=88a4f4d462df4343)
> aufs-nfsroot-blade-002 on /cluster/nfs/nfsr/blade-002.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fe245343)
> aufs-nfsroot-blade-003 on /cluster/nfs/nfsr/blade-003.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fe9c5343)
> aufs-nfsroot-blade-004 on /cluster/nfs/nfsr/blade-004.crunchnet type aufs 
> (rw,relatime,si=88a4f4d467b53343)
> aufs-nfsroot-blade-005 on /cluster/nfs/nfsr/blade-005.crunchnet type aufs 
> (rw,relatime,si=88a4f4d47db69343)
> aufs-nfsroot-blade-006 on /cluster/nfs/nfsr/blade-006.crunchnet type aufs 
> (rw,relatime,si=88a4f4d4630bb343)
> aufs-nfsroot-blade-007 on /cluster/nfs/nfsr/blade-007.crunchnet type aufs 
> (rw,relatime,si=88a4f4d4586cc343)
> aufs-nfsroot-blade-008 on /cluster/nfs/nfsr/blade-008.crunchnet type aufs 
> (rw,relatime,si=88a4f4d458689343)
> aufs-nfsroot-blade-009 on /cluster/nfs/nfsr/blade-009.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fda97343)
> aufs-nfsroot-blade-010 on /cluster/nfs/nfsr/blade-010.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fe3be343)
> aufs-nfsroot-blade-011 on /cluster/nfs/nfsr/blade-011.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fe8cb343)
> aufs-nfsroot-blade-012 on /cluster/nfs/nfsr/blade-012.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3fc90c343)
> aufs-nfsroot-blade-013 on /cluster/nfs/nfsr/blade-013.crunchnet type aufs 
> (rw,relatime,si=88a4f4d4622fb343)
> aufs-nfsroot-blade-014 on /cluster/nfs/nfsr/blade-014.crunchnet type aufs 
> (rw,relatime,si=88a4f4d3e2cc8343)
> aufs-nfsroot-blade-015 on /cluster/nfs/nfsr/blade-015.crunchnet type aufs 
> (rw,relatime,si=88a4f4d0ca64e343)
> aufs-nfsroot-blade-016 on /cluster/nfs/nfsr/blade-016.crunchnet type aufs 
> (rw,relatime,si=88a4f4d0ca92e343)
> aufs_pkg_std on /cluster/node_roots/wheezy-HD-2015_02_21/admount-pkg_std type 
> aufs (rw,relatime,si=88a4f4d46dced343)
> aufs_mask on /cluster/node_roots/wheezy-HD-2015_02_21/admount-mask type aufs 
> (rw,relatime,si=88a4f4d3b0a2b343)
> aufs_config on /cluster/node_roots/wheezy-HD-2015_02_21/admount-config type 
> aufs (rw,relatime,si=88a4f4d3fee4c343)
> 
> 
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-23  2:33             ` Eric Dumazet
  2015-03-23  9:05               ` Wolfgang Rosner
@ 2015-03-26 19:34               ` Wolfgang Rosner
  2015-03-27 18:46                 ` Wolfgang Rosner
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-26 19:34 UTC (permalink / raw)
  To: Eric Dumazet, netdev

Hello, Eric

On Monday, 23 March 2015, 03:33:17, you wrote:
> On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
> > Are you ready to assist me on this track?
>
> First check if the problem is already solved, using this git tree ?
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git


Does not look like it :-(



I tried both 3.19.2 from kernel.org and 
a git clone of the above address ("master") this morning.
$ uname -a
Linux cruncher 4.0.0-rc5+ #1 SMP Thu Mar 26 08:48:08 CET 2015 x86_64 GNU/Linux


Sorry it took some time.

I've built a workaround for my aufs nfsroot to get rid of that, at least for 
testing.
Both kernels were built w/o aufs patches, so we cannot blame this anymore
(I've learned that people like to do so...)


-----------

I tried sequential runs to check for jitter and to allow for adaptation:

for i in `seq 10` ; do  iperf -c 192.168.130.225 -t 100 ; done

3.17 / 3.30 / 3.16 / 3.19 / 3.29 / 3.29 / 3.35 / 3.32 / 3.22 / 3.32
Gbits/sec
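The jitter of the ten sequential runs above can be summarised quickly; a small sketch (the figures are pasted in from the run above):

```shell
# mean and standard deviation of the ten single-stream iperf results (Gbit/s)
printf '%s\n' 3.17 3.30 3.16 3.19 3.29 3.29 3.35 3.32 3.22 3.32 |
  awk '{ s += $1; ss += $1 * $1 }
       END { m = s / NR
             printf "mean %.2f Gbit/s, sd %.2f\n", m, sqrt(ss / NR - m * m) }'
# -> mean 3.26 Gbit/s, sd 0.07
```

So the single-stream rate is stable at roughly 3.3 Gbit/s, not slowly adapting upwards.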

...full bandwidth with shared link utilisation:

[  4]  0.0-10.0 sec  3.31 GBytes  2.84 Gbits/sec
[  3]  0.0-10.0 sec  3.32 GBytes  2.85 Gbits/sec
[SUM]  0.0-10.0 sec  6.63 GBytes  5.69 Gbits/sec

...full bandwidth (5.66 Gbits/sec) in the opposite direction

(Test figures are from davem 4.0.0-rc5+, on 3.19.2 they were quite the same)

------------------------

I took a look at the differences in kernel .config
between the versions where the problem occurred:

diff /boot/config-3.16.0-0.bpo.4-amd64 /boot/config-3.19.0 | less

there is one occurrence of TCP:
> # CONFIG_TCP_CONG_DCTCP is not set

but it should not hurt if it is not set, should it?
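To make such a comparison less noisy, the diff can be restricted to TCP-related options first. A toy sketch (two made-up mini configs stand in for the real /boot/config-* files, which would be the actual inputs):

```shell
# reproduce the kind of diff quoted above on two tiny stand-in config files
tmp=$(mktemp -d)
printf 'CONFIG_TCP_CONG_CUBIC=y\n' > "$tmp/config-old"
printf 'CONFIG_TCP_CONG_CUBIC=y\n# CONFIG_TCP_CONG_DCTCP is not set\n' > "$tmp/config-new"

# keep only (set or unset) CONFIG_TCP_* lines before diffing
grep -E '^(# )?CONFIG_TCP' "$tmp/config-old" > "$tmp/old.tcp"
grep -E '^(# )?CONFIG_TCP' "$tmp/config-new" > "$tmp/new.tcp"
diff "$tmp/old.tcp" "$tmp/new.tcp" || true   # diff exits non-zero when files differ
```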

All the other stuff does not ring any bell with my (very limited) range of 
experience.


So what next?



Wolfgang Rosner


* Re: One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue
  2015-03-26 19:34               ` Wolfgang Rosner
@ 2015-03-27 18:46                 ` Wolfgang Rosner
  0 siblings, 0 replies; 13+ messages in thread
From: Wolfgang Rosner @ 2015-03-27 18:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hello Eric,


I tried to distinguish between two possibilities: Debian patches having solved 
the problem vs. a regression bug in recent kernels.

So I tested:
- Vanilla 3.16.7
- recent Debian 3.19.1

============

results:

- Vanilla 3.16.7
from
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.16.7.tar.xz
result: really bad, 
~ 500 Mbit/s both inbound and outbound;
inbound, even with -P24 parallel threads, not more than 800 Mbit/s;
outbound, with -P12, we are back over 5 Gbit/s again.

From that, it looks to me like the problem was introduced before 3.16.7
but only partly solved since then.

I tried to map vanilla versions onto the davem fork,
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/tree/
but got lost in the git / developer workflow.
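(For what it's worth, the usual way to pin down the offending commit between a known-good and a known-bad vanilla release is `git bisect`; a pseudocode sketch, where the good/bad tags are assumptions that would first have to be established by testing:)

```sh
# pseudocode: bisecting between a hypothetical good v3.14 and the bad v3.16.7
git bisect start
git bisect bad v3.16.7      # single-stream throughput halved here
git bisect good v3.14       # assumed last known-good release
# then, for each commit git checks out:
#   build + boot the gateway, run the single-stream iperf test,
#   and mark the result with `git bisect good` or `git bisect bad`
```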



- recent Debian 3.19.1
same results as with any other recent kernel
	inbound > 5.5 GBit
	outbound ~ 3.2 GBit
	outbound -P 2 > 5 Gbit


So it looks like the Debian people have _not_ found the solution, but were 
just lucky enough with their 
 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 
not to have hit it yet...



On Thursday, 26 March 2015, 20:34:05, Wolfgang Rosner wrote:
> Hello, Eric
>
> On Monday, 23 March 2015, 03:33:17, you wrote:
> > On Mon, 2015-03-23 at 01:57 +0100, Wolfgang Rosner wrote:
> > > Are you ready to assist me on this track?
> >
> > First check if the problem is already solved, using this git tree ?
> >
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git
>
> Does not look like it :-(
>
>
>
> I tried both 3.19.2 from kernel.org and
> a git clone of the above address ("master") this morning.
> $ uname -a
> Linux cruncher 4.0.0-rc5+ #1 SMP Thu Mar 26 08:48:08 CET 2015 x86_64
> GNU/Linux
>
>
> Sorry it took some time.
>
> I've built a workaround for my aufs nfsroot to get rid of that, at least
> for testing.
> Both kernels were built w/o aufs patches, so we cannot blame this anymore
> (I've learned that people like to do so...)
>
>
> -----------
>
> I tried sequential runs to check for jitter and to allow for adaptation:
>
> for i in `seq 10` ; do  iperf -c 192.168.130.225 -t 100 ; done
>
> 3.17 / 3.30 / 3.16 / 3.19 / 3.29 / 3.29 / 3.35 / 3.32 / 3.22 / 3.32
> Gbits/sec
>
> ...full bandwidth with shared link utilisation:
>
> [  4]  0.0-10.0 sec  3.31 GBytes  2.84 Gbits/sec
> [  3]  0.0-10.0 sec  3.32 GBytes  2.85 Gbits/sec
> [SUM]  0.0-10.0 sec  6.63 GBytes  5.69 Gbits/sec
>
> ...full bandwidth (5.66 Gbits/sec) in the opposite direction
>
> (Test figures are from davem 4.0.0-rc5+, on 3.19.2 they were quite the
> same)
>
> ------------------------
>
> I took a look at the differences in kernel .config
> between the versions where the problem occurred:
>
> diff /boot/config-3.16.0-0.bpo.4-amd64 /boot/config-3.19.0 | less
>
> there is one occurrence of TCP:
> > # CONFIG_TCP_CONG_DCTCP is not set
>
> but it should not hurt if it is not set, should it?
>
> All the other stuff does not ring any bell with my (very limited) range of
> experience.
>
>
> So what next?
>
>
>
> Wolfgang Rosner
>
> --


Wolfgang Rosner


end of thread, other threads:[~2015-03-27 18:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-21  0:10 One way TCP bottleneck over 6 x Gbit tecl aggregated link Wolfgang Rosner
2015-03-21  1:15 ` Eric Dumazet
     [not found] ` <201503220912.39912.wrosner@tirnet.de>
     [not found]   ` <1427020159.25985.43.camel@edumazet-glaptop2.roam.corp.google.com>
2015-03-22 15:14     ` Wolfgang Rosner
2015-03-22 16:54       ` Eric Dumazet
2015-03-22 20:47         ` Wolfgang Rosner
2015-03-23  0:57           ` One way TCP bottleneck over 6 x Gbit tecl aggregated link // Kernel issue Wolfgang Rosner
2015-03-23  2:33             ` Eric Dumazet
2015-03-23  9:05               ` Wolfgang Rosner
2015-03-23  9:25                 ` Zimmermann, Alexander
2015-03-23 13:54                   ` Wolfgang Rosner
2015-03-24 12:59                     ` Zimmermann, Alexander
2015-03-26 19:34               ` Wolfgang Rosner
2015-03-27 18:46                 ` Wolfgang Rosner
