* forcedeth driver hangs under heavy load
       [not found]       ` <4BBCA19C.5080204@atlanticlinux.ie>
@ 2010-04-10 23:36         ` Ben Hutchings
  2010-04-12 10:01           ` Bug#572201: " stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Hutchings @ 2010-04-10 23:36 UTC (permalink / raw)
  To: netdev, Eric Dumazet, Ayaz Abdulla; +Cc: stephen mulcahy, 572201

Stephen Mulcahy reported a regression in forcedeth at
<http://bugs.debian.org/572201>.  The system information and some
diagnostic information can be found there.  Anyone able to help?

Ben.

stephen mulcahy wrote:
> When running linux-image-2.6.32-trunk-amd64, the network stops 
> responding if large amounts of traffic are transmitted/received. Running 
> ifdown eth0 followed by ifup eth0 restores operation of the network. 
> There are no errors relating to this failure logged in /var/log that I 
> could see.
> 
> Downgrading to linux-image-2.6.30-2-amd64 results in a stable network. 
> Not sure if this is a forcedeth specific problem or a general problem in 
> the newer kernel (I have seen problems with forcedeth on other 
> distro/kernel combinations).
> 
> Happy to run further diagnostics to tie this down if you let me know 
> what to run.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


* Bug#572201: forcedeth driver hangs under heavy load
  2010-04-10 23:36         ` forcedeth driver hangs under heavy load Ben Hutchings
@ 2010-04-12 10:01           ` stephen mulcahy
  2010-04-12 12:39             ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-12 10:01 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings, Eric Dumazet, Ayaz Abdulla, 572201

Ben Hutchings wrote:
> Stephen Mulcahy reported a regression in forcedeth at
> <http://bugs.debian.org/572201>.  The system information and some
> diagnostic information can be found there.  Anyone able to help?

Incidentally, I also tried the 2.6.33.2 kernel with 
CONFIG_FORCEDETH_NAPI set to "y" to see if that made a difference.

It doesn't - further testing over the weekend saw 6 of 45 machines drop 
off the network with this problem. Nothing in dmesg or system logs. 
Happy to run more tests if someone can advise on what should be run.

-stephen

-- 
Stephen Mulcahy     Atlantic Linux         http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)


* Bug#572201: forcedeth driver hangs under heavy load
  2010-04-12 10:01           ` Bug#572201: " stephen mulcahy
@ 2010-04-12 12:39             ` stephen mulcahy
  2010-04-12 12:47               ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-12 12:39 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings, Eric Dumazet, Ayaz Abdulla, 572201

stephen mulcahy wrote:
> It doesn't - further testing over the weekend saw 6 of 45 machines drop 
> off the network with this problem. Nothing in dmesg or system logs. 
> Happy to run more tests if someone can advise on what should be run.

I also just tried using the 2.6.30-2-amd64 (Debian) forcedeth kernel 
module while running the 2.6.32-3-amd64 (Debian) kernel and experienced 
the same symptoms.

Not sure if that's any help.

-stephen


* Bug#572201: forcedeth driver hangs under heavy load
  2010-04-12 12:39             ` stephen mulcahy
@ 2010-04-12 12:47               ` Eric Dumazet
  2010-04-12 13:05                 ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-12 12:47 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Monday 12 April 2010 at 13:39 +0100, stephen mulcahy wrote:
> stephen mulcahy wrote:
> > It doesn't - further testing over the weekend saw 6 of 45 machines drop 
> > off the network with this problem. Nothing in dmesg or system logs. 
> > Happy to run more tests if someone can advise on what should be run.
> 
> I also just tried using the 2.6.30-2-amd64 (Debian) forcedeth kernel 
> module while running the 2.6.32-3-amd64 (Debian) kernel and experienced 
> the same symptoms.
> 
> Not sure if that's any help.
> 

I am not sure I understand. Are you saying that the 2.6.30-2-amd64
kernel also leaves your forcedeth adapter non-functional?

Are both directions non-functional (RX and TX), or only one?


* Re: forcedeth driver hangs under heavy load
  2010-04-12 12:47               ` Eric Dumazet
@ 2010-04-12 13:05                 ` stephen mulcahy
  2010-04-12 13:19                   ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-12 13:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> On Monday 12 April 2010 at 13:39 +0100, stephen mulcahy wrote:
> I am not sure I understand. Are you saying that the 2.6.30-2-amd64
> kernel also leaves your forcedeth adapter non-functional?

Hi Eric,

If I run my tests with the 2.6.30-2-amd64 kernel the network doesn't 
malfunction.

If I run my tests with the 2.6.32-3-amd64 kernel the network does 
malfunction.

If I take the forcedeth.ko module from the 2.6.30-2-amd64 kernel and 
drop that into /lib/modules/2.6.32-3-amd64/kernel/drivers/net/ and then 
reboot to 2.6.32-3-amd64 and rerun my tests - the network does malfunction.

> Are both directions non-functional (RX and TX), or only one?

What's the best way of testing this? (tcpdump listening on both hosts and 
then running pings between the systems?)
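
One possible way to see which direction has stalled, offered only as a
sketch and not as what was actually done in this thread, is to sample
the per-interface packet counters that Linux exposes under
/sys/class/net/<iface>/statistics and compare two readings. The helper
names read_counters and delta are hypothetical; eth0 is the interface
named earlier in the report.

```python
from pathlib import Path

# Counter files that exist under /sys/class/net/<iface>/statistics on Linux.
COUNTERS = ("rx_packets", "tx_packets", "rx_errors", "tx_errors")


def read_counters(stats_dir):
    """Read the selected counters from a statistics directory into a dict."""
    base = Path(stats_dir)
    return {name: int((base / name).read_text()) for name in COUNTERS}


def delta(before, after):
    """Return how much each counter moved between two readings."""
    return {name: after[name] - before[name] for name in before}
```

Taking read_counters("/sys/class/net/eth0/statistics") before and after
a few seconds of ping traffic, then passing both readings to delta,
shows whether rx_packets, tx_packets, or both are still advancing on
the affected node.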

-stephen


* Re: forcedeth driver hangs under heavy load
  2010-04-12 13:05                 ` stephen mulcahy
@ 2010-04-12 13:19                   ` stephen mulcahy
  2010-04-12 15:24                     ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-12 13:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

stephen mulcahy wrote:
>> Are both directions non-functional (RX and TX), or only one?
> 
> What's the best way of testing this? (tcpdump listening on both hosts and 
> then running pings between the systems?)


On one of the nodes that is in the malfunctioning state (node05), I ran

ssh node20

and grabbed the following output from running tcpdump on node20

root@node20:~# tcpdump host node20 and node05
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:12:59.612626 IP node05.webstar.cnet.36295 > node20.ssh: Flags [S], 
seq 3677858646, win 5840, options [mss 1460,sackOK,TS val 1599534 ecr 
0,nop,wscale 7], length 0
14:12:59.612656 IP node20.ssh > node05.webstar.cnet.36295: Flags [S.], 
seq 3610575850, ack 3677858647, win 5792, options [mss 1460,sackOK,TS 
val 1598775 ecr 1599534,nop,wscale 7], length 0
14:12:59.612718 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], 
ack 1, win 46, options [nop,nop,TS val 1599534 ecr 1598775], length 0
14:12:59.617434 IP node20.ssh > node05.webstar.cnet.36295: Flags [P.], 
seq 1:33, ack 1, win 46, options [nop,nop,TS val 1598776 ecr 1599534], 
length 32
14:12:59.617522 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], 
ack 33, win 46, options [nop,nop,TS val 1599535 ecr 1598776], length 0
14:12:59.617609 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 1:33, ack 33, win 46, options [nop,nop,TS val 1599535 ecr 1598776], 
length 32
14:12:59.820434 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 4294936586:4294936618, ack 2620194849, win 46, options [nop,nop,TS 
val 1599586 ecr 1598776], length 32
14:13:00.229069 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 4294961734:4294961766, ack 3928358945, win 46, options [nop,nop,TS 
val 1599688 ecr 1598776], length 32
14:13:01.044396 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 4294964167:4294964199, ack 410320929, win 46, options [nop,nop,TS 
val 1599892 ecr 1598776], length 32
14:13:02.676308 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 1:33, ack 33, win 46, options [nop,nop,TS val 1600300 ecr 1598776], 
length 32
14:13:05.940804 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 17294:17326, ack 3045851169, win 46, options [nop,nop,TS val 1601116 
ecr 1598776], length 32
14:13:12.468484 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 17294:17326, ack 3045851169, win 46, options [nop,nop,TS val 1602748 
ecr 1598776], length 32
14:13:23.846891 IP node20.ssh > node05.webstar.cnet.36084: Flags [F.], 
seq 2093054475, ack 2175389538, win 46, options [nop,nop,TS val 1604834 
ecr 1575591], length 0
14:13:23.847278 IP node05.webstar.cnet.36084 > node20.ssh: Flags [R], 
seq 2175389538, win 0, length 0
14:13:25.523850 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 1:33, ack 33, win 46, options [nop,nop,TS val 1606012 ecr 1598776], 
length 32
14:13:50.127509 IP node20.ssh > node05.webstar.cnet.36143: Flags [F.], 
seq 2526196657, ack 2590340885, win 46, options [nop,nop,TS val 1611404 
ecr 1582161], length 0
14:13:50.127879 IP node05.webstar.cnet.36143 > node20.ssh: Flags [R], 
seq 2590340885, win 0, length 0
14:13:51.633934 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 4294963190:4294963222, ack 9830433, win 46, options [nop,nop,TS val 
1612540 ecr 1598776], length 32
14:13:55.125525 ARP, Request who-has node05.webstar.cnet tell node20, 
length 28
14:13:55.125886 ARP, Reply node05.webstar.cnet is-at 00:30:48:ce:dc:02 
(oui Unknown), length 46
14:14:43.855380 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], 
seq 1:33, ack 33, win 46, options [nop,nop,TS val 1625596 ecr 1598776], 
length 32
14:14:48.855143 ARP, Request who-has node20 tell node05.webstar.cnet, 
length 46
14:14:48.855469 ARP, Reply node20 is-at 00:30:48:ce:de:34 (oui Unknown), 
length 28
14:14:59.617675 IP node20.ssh > node05.webstar.cnet.36295: Flags [F.], 
seq 33, ack 1, win 46, options [nop,nop,TS val 1628777 ecr 1599535], 
length 0
14:14:59.618202 IP node05.webstar.cnet.36295 > node20.ssh: Flags [FP.], 
seq 4294959654:4294960446, ack 3930456098, win 46, options [nop,nop,TS 
val 1629536 ecr 1628777], length 792
14:14:59.821527 IP node20.ssh > node05.webstar.cnet.36295: Flags [F.], 
seq 33, ack 1, win 46, options [nop,nop,TS val 1628828 ecr 1599535], 
length 0
14:14:59.821598 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], 
ack 34, win 46, options [nop,nop,TS val 1629587 ecr 1628828,nop,nop,sack 
1 {33:34}], length 0
^C
27 packets captured
31 packets received by filter
0 packets dropped by kernel


I then did ifdown and ifup on node05 and again ran

ssh node20

and grabbed the following output from running tcpdump on node20

root@node20:~# tcpdump host node20 and node05
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:15:50.626410 IP node05.webstar.cnet.36690 > node20.ssh: Flags [S], 
seq 2044900531, win 5840, options [mss 1460,sackOK,TS val 1642289 ecr 
0,nop,wscale 7], length 0
14:15:50.626441 IP node20.ssh > node05.webstar.cnet.36690: Flags [S.], 
seq 1976694445, ack 2044900532, win 5792, options [mss 1460,sackOK,TS 
val 1641529 ecr 1642289,nop,wscale 7], length 0
14:15:50.626482 IP node05.webstar.cnet.36690 > node20.ssh: Flags [.], 
ack 1, win 46, options [nop,nop,TS val 1642289 ecr 1641529], length 0
14:15:50.631138 IP node20.ssh > node05.webstar.cnet.36690: Flags [P.], 
seq 1:33, ack 1, win 46, options [nop,nop,TS val 1641530 ecr 1642289], 
length 32
14:15:50.631218 IP node05.webstar.cnet.36690 > node20.ssh: Flags [.], 
ack 33, win 46, options [nop,nop,TS val 1642290 ecr 1641530], length 0
14:15:50.631267 IP node05.webstar.cnet.36690 > node20.ssh: Flags [P.], 
seq 1:33, ack 33, win 46, options [nop,nop,TS val 1642290 ecr 1641530], 
length 32
14:15:50.631281 IP node20.ssh > node05.webstar.cnet.36690: Flags [.], 
ack 33, win 46, options [nop,nop,TS val 1641530 ecr 1642290], length 0
14:15:50.631367 IP node05.webstar.cnet.36690 > node20.ssh: Flags [P.], 
seq 33:825, ack 33, win 46, options [nop,nop,TS val 1642290 ecr 
1641530], length 792
14:15:50.631376 IP node20.ssh > node05.webstar.cnet.36690: Flags [.], 
ack 825, win 58, options [nop,nop,TS val 1641530 ecr 1642290], length 0
14:15:50.631808 IP node20.ssh > node05.webstar.cnet.36690: Flags [P.], 
seq 33:817, ack 825, win 58, options [nop,nop,TS val 1641530 ecr 
1642290], length 784
14:15:50.631950 IP node05.webstar.cnet.36690 > node20.ssh: Flags [P.], 
seq 825:849, ack 817, win 58, options [nop,nop,TS val 1642290 ecr 
1641530], length 24
14:15:50.633353 IP node20.ssh > node05.webstar.cnet.36690: Flags [P.], 
seq 817:969, ack 849, win 58, options [nop,nop,TS val 1641530 ecr 
1642290], length 152
14:15:50.633932 IP node05.webstar.cnet.36690 > node20.ssh: Flags [P.], 
seq 849:993, ack 969, win 71, options [nop,nop,TS val 1642291 ecr 
1641530], length 144
14:15:50.637998 IP node20.ssh > node05.webstar.cnet.36690: Flags [P.], 
seq 969:1689, ack 993, win 70, options [nop,nop,TS val 1641532 ecr 
1642291], length 720
14:15:50.676465 IP node05.webstar.cnet.36690 > node20.ssh: Flags [.], 
ack 1689, win 83, options [nop,nop,TS val 1642302 ecr 1641532], length 0
14:16:09.776134 IP node05.webstar.cnet.49671 > node20.50060: Flags [S], 
seq 2348078217, win 5840, options [mss 1460,sackOK,TS val 1647077 ecr 
0,nop,wscale 7], length 0
14:16:09.776498 IP node20.50060 > node05.webstar.cnet.49671: Flags [R.], 
seq 0, ack 2348078218, win 0, length 0
^C
17 packets captured
21 packets received by filter
0 packets dropped by kernel


Does that help?


* Re: forcedeth driver hangs under heavy load
  2010-04-12 13:19                   ` stephen mulcahy
@ 2010-04-12 15:24                     ` Eric Dumazet
  2010-04-12 16:11                       ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-12 15:24 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote:

> Does that help?

Well, yes, because it seems to be a TCP problem.

root@node20:~# tcpdump host node20 and node05
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:12:59.612626 IP node05.webstar.cnet.36295 > node20.ssh: Flags [S], seq 3677858646, win 5840, options [mss 1460,sackOK,TS val 1599534 ecr 0,nop,wscale 7], length 0
14:12:59.612656 IP node20.ssh > node05.webstar.cnet.36295: Flags [S.], seq 3610575850, ack 3677858647, win 5792, options [mss 1460,sackOK,TS val 1598775 ecr 1599534,nop,wscale 7], length 0
14:12:59.612718 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], ack 1, win 46, options [nop,nop,TS val 1599534 ecr 1598775], length 0
14:12:59.617434 IP node20.ssh > node05.webstar.cnet.36295: Flags [P.], seq 1:33, ack 1, win 46, options [nop,nop,TS val 1598776 ecr 1599534], length 32
14:12:59.617522 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], ack 33, win 46, options [nop,nop,TS val 1599535 ecr 1598776], length 0
14:12:59.617609 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 1:33, ack 33, win 46, options [nop,nop,TS val 1599535 ecr 1598776], length 32

All of the following transmitted frames are completely out of sync; this
makes no sense.

The sequence numbers went backward.

14:12:59.820434 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 4294936586:4294936618, ack 2620194849, win 46, options [nop,nop,TS val 1599586 ecr 1598776], length 32
14:13:00.229069 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 4294961734:4294961766, ack 3928358945, win 46, options [nop,nop,TS val 1599688 ecr 1598776], length 32
14:13:01.044396 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 4294964167:4294964199, ack 410320929, win 46, options [nop,nop,TS val 1599892 ecr 1598776], length 32


14:13:02.676308 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 1:33, ack 33, win 46, options [nop,nop,TS val 1600300 ecr 1598776], length 32
14:13:05.940804 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 17294:17326, ack 3045851169, win 46, options [nop,nop,TS val 1601116 ecr 1598776], length 32
14:13:12.468484 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 17294:17326, ack 3045851169, win 46, options [nop,nop,TS val 1602748 ecr 1598776], length 32
14:13:25.523850 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 1:33, ack 33, win 46, options [nop,nop,TS val 1606012 ecr 1598776], length 32
14:13:51.633934 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 4294963190:4294963222, ack 9830433, win 46, options [nop,nop,TS val 1612540 ecr 1598776], length 32
14:14:43.855380 IP node05.webstar.cnet.36295 > node20.ssh: Flags [P.], seq 1:33, ack 33, win 46, options [nop,nop,TS val 1625596 ecr 1598776], length 32
14:14:59.617675 IP node20.ssh > node05.webstar.cnet.36295: Flags [F.], seq 33, ack 1, win 46, options [nop,nop,TS val 1628777 ecr 1599535], length 0
14:14:59.618202 IP node05.webstar.cnet.36295 > node20.ssh: Flags [FP.], seq 4294959654:4294960446, ack 3930456098, win 46, options [nop,nop,TS val 1629536 ecr 1628777], length 792
14:14:59.821527 IP node20.ssh > node05.webstar.cnet.36295: Flags [F.], seq 33, ack 1, win 46, options [nop,nop,TS val 1628828 ecr 1599535], length 0
14:14:59.821598 IP node05.webstar.cnet.36295 > node20.ssh: Flags [.], ack 34, win 46, options [nop,nop,TS val 1629587 ecr 1628828,nop,nop,sack 1 {33:34}], length 0
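
The backward jump in sequence numbers is easier to see once the values
near 2^32 are read as signed offsets. By default tcpdump prints
sequence numbers relative to the initial sequence number once it has
seen the SYN, so a relative value just below 2^32 is a small negative
offset, i.e. a byte position before the start of the stream. The
following is a minimal sketch of that arithmetic only; the function
name signed_offset is hypothetical and the values are copied from the
node05 -> node20 trace above.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap modulo 2**32


def signed_offset(rel_seq):
    """Map an unsigned 32-bit relative sequence number to a signed offset."""
    rel_seq %= MOD
    return rel_seq - MOD if rel_seq >= MOD // 2 else rel_seq


# Starting "seq" values of the first data segments sent by node05.
observed = [1, 4294936586, 4294961734, 4294964167, 1, 17294]

for seq in observed:
    print(f"{seq:>10} -> {signed_offset(seq):+d}")
```

Read this way, the retransmissions of the same 32-byte segment claim
to start at offsets such as -30710, -5562 and -3129 instead of at 1,
which is the inconsistency pointed out above.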

Do you have any netfilter rules?




* Re: forcedeth driver hangs under heavy load
  2010-04-12 15:24                     ` Eric Dumazet
@ 2010-04-12 16:11                       ` stephen mulcahy
  2010-04-12 16:59                         ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-12 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote:
> 
> Do you have any netfilter rules?
> 

Hi Eric,

I don't have any netfilter rules:

root@node34:~# for table in filter nat mangle raw; do iptables -t $table -L; done
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination


I re-ran this on the 2.6.32 kernel (with the 2.6.32 forcedeth module) 
just in case that was screwing something up.

node33 is in the unresponsive state this time. I'm running tcpdump on 
node34. On node33 I try to ssh to node34 (using the IP address of node34). 
I note that I can ping between node33 and node34.

root@node34:~# tcpdump -v host node34 and node33
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 
bytes
17:05:19.622384 IP (tos 0x0, ttl 64, id 21435, offset 0, flags [DF], 
proto TCP (6), length 60)
     node33.webstar.cnet.43653 > node34.ssh: Flags [S], cksum 0xb994 
(correct), seq 1675314077, win 5840, options [mss 1460,sackOK,TS val 
331814 ecr 0,nop,wscale 7], length 0
17:05:19.622754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 
TCP (6), length 60)
     node34.ssh > node33.webstar.cnet.43653: Flags [S.], cksum 0x9d81 
(correct), seq 1669769379, ack 1675314078, win 5792, options [mss 
1460,sackOK,TS val 331779 ecr 331814,nop,wscale 7], length 0
17:05:19.622813 IP (tos 0x0, ttl 64, id 21436, offset 0, flags [DF], 
proto TCP (6), length 52)
     node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe2bf 
(correct), ack 1, win 46, options [nop,nop,TS val 331814 ecr 331779], 
length 0
17:05:19.627666 IP (tos 0x0, ttl 64, id 47271, offset 0, flags [DF], 
proto TCP (6), length 84)
     node34.ssh > node33.webstar.cnet.43653: Flags [P.], seq 1:33, ack 
1, win 46, options [nop,nop,TS val 331780 ecr 331814], length 32
17:05:19.627748 IP (tos 0x0, ttl 64, id 21437, offset 0, flags [DF], 
proto TCP (6), length 52)
     node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe29c 
(correct), ack 33, win 46, options [nop,nop,TS val 331816 ecr 331780], 
length 0
17:05:19.627833 IP (tos 0x0, ttl 64, id 21438, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum 1f8a (->d189)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
23413:23445, ack 2749038625, win 46, options [nop,nop,TS val 331816 ecr 
331780], length 32
17:05:19.831634 IP (tos 0x0, ttl 64, id 21439, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum d189 (->d188)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
33, win 46, options [nop,nop,TS val 331867 ecr 331780], length 32
17:05:20.239603 IP (tos 0x0, ttl 64, id 21440, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum 15c6 (->d187)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
30492:30524, ack 809893921, win 46, options [nop,nop,TS val 331969 ecr 
331780], length 32
17:05:21.055534 IP (tos 0x0, ttl 64, id 21441, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum d187 (->d186)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
33, win 46, options [nop,nop,TS val 332173 ecr 331780], length 32
17:05:22.687386 IP (tos 0x0, ttl 64, id 21442, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum d186 (->d185)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
33, win 46, options [nop,nop,TS val 332581 ecr 331780], length 32
17:05:25.950935 IP (tos 0x0, ttl 64, id 21443, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum 15c4 (->d184)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
30492:30524, ack 809893921, win 46, options [nop,nop,TS val 333397 ecr 
331780], length 32
17:05:32.478527 IP (tos 0x0, ttl 64, id 21444, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum c01 (->d183)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
43997:44029, ack 1311047713, win 46, options [nop,nop,TS val 335029 ecr 
331780], length 32
17:05:45.533370 IP (tos 0x0, ttl 64, id 21445, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum 23d (->d182)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 3348:3380, 
ack 4054450209, win 46, options [nop,nop,TS val 338293 ecr 331780], 
length 32
17:06:08.719187 IP (tos 0x0, ttl 64, id 27660, offset 0, flags [DF], 
proto TCP (6), length 1500, bad cksum 5360 (->b3b3)!)
     node33.webstar.cnet.50060 > node34.35725: Flags [.], seq 
1203473738:1203475186, ack 1191452767, win 54, options [nop,nop,TS val 
344089 ecr 256770], length 1448
17:06:11.643080 IP (tos 0x0, ttl 64, id 21446, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum e4f2 (->d181)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
47331:47363, ack 4110811169, win 46, options [nop,nop,TS val 344821 ecr 
331780], length 32
17:06:13.715233 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
node34 tell node33.webstar.cnet, length 46
17:06:13.715257 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
00:30:48:f0:06:72 (oui Unknown), length 28
17:07:03.866492 IP (tos 0x0, ttl 64, id 21447, offset 0, flags [DF], 
proto TCP (6), length 84, bad cksum b413 (->d180)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
28939:28971, ack 1913782305, win 46, options [nop,nop,TS val 357877 ecr 
331780], length 32
17:07:08.862055 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
node34 tell node33.webstar.cnet, length 46
17:07:08.862370 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
00:30:48:f0:06:72 (oui Unknown), length 28
17:07:19.627910 IP (tos 0x0, ttl 64, id 47272, offset 0, flags [DF], 
proto TCP (6), length 52)
     node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d6b 
(correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361780 ecr 
331816], length 0
17:07:19.628403 IP (tos 0x0, ttl 64, id 21448, offset 0, flags [DF], 
proto TCP (6), length 844, bad cksum aa4d (->ce87)!)
     node33.webstar.cnet.43653 > node34.ssh: Flags [FP.], seq 
20399:21191, ack 2356871202, win 46, options [nop,nop,TS val 361818 ecr 
361780], length 792
17:07:19.833456 IP (tos 0x0, ttl 64, id 47273, offset 0, flags [DF], 
proto TCP (6), length 52)
     node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d37 
(correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361832 ecr 
331816], length 0
17:07:19.833517 IP (tos 0x0, ttl 64, id 21449, offset 0, flags [DF], 
proto TCP (6), length 64)
     node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xa5e9 
(correct), ack 34, win 46, options [nop,nop,TS val 361870 ecr 
361832,nop,nop,sack 1 {33:34}], length 0
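
The "bad cksum X (->Y)!" annotations in the -v output above refer to
the IPv4 header checksum: X is the value carried in the packet and Y
is the value tcpdump computes from the header. The sketch below
recomputes Y with the RFC 1071 one's-complement sum, using the header
fields printed in the trace. The source address 10.141.0.33 for node33
is an assumption (only 10.141.0.34 for node34 appears in this
message), and the names ipv4_header_checksum and build_header are
hypothetical.

```python
import socket
import struct


def ipv4_header_checksum(header):
    """RFC 1071 checksum over the header, with its checksum field set to 0."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(ident, total_len, src, dst):
    """20-byte IPv4 header matching the trace: tos 0, DF, ttl 64, TCP."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,       # version 4, header length 5 words
        0x00,       # tos 0x0
        total_len,  # "length" as printed by tcpdump -v
        ident,      # "id" as printed by tcpdump -v
        0x4000,     # flags [DF], offset 0
        64,         # ttl
        6,          # proto TCP
        0,          # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


SRC = "10.141.0.33"  # assumed address of node33
DST = "10.141.0.34"  # address of node34, from the ssh error message

for ident in (21438, 21439):
    header = build_header(ident, 84, SRC, DST)
    print(ident, format(ipv4_header_checksum(header), "04x"))
```

Under that address assumption the computed values are d189 for id
21438 and d188 for id 21439, matching the expected values tcpdump
prints. In the trace, the packet with id 21439 carries d189, which is
the value expected for the previous packet.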

At this point, I see a "Connection closed by 10.141.0.34" message on 
node33 (from where I am attempting to ssh).

Again, if I run ifdown and then ifup on node33, I can then reach node34 
from node33 without problems.

-stephen


* Re: forcedeth driver hangs under heavy load
  2010-04-12 16:11                       ` stephen mulcahy
@ 2010-04-12 16:59                         ` Eric Dumazet
  2010-04-13 10:03                           ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-12 16:59 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Monday 12 April 2010 at 17:11 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote:
> > 
> > Do you have any netfilter rules?
> > 
> 
> Hi Eric,
> 
> I don't have any netfilters rules:
> 
> root@node34:~# for table in filter nat mangle raw; do iptables -t $table 
> -L; done
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain POSTROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain POSTROUTING (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> 
> I re-ran this on the 2.6.32 kernel (with the 2.6.32 forcedeth module) 
> just in case that was screwing something up.
> 
> node33 is in the unresponsive state this time. I'm running tcpdump on 
> node34. on node33 I try to ssh to node34 (using ip address of node34). I 
> note that I can ping between node33 and node34.
> 
> root@node34:~# tcpdump -v host node34 and node33
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 
> bytes
> 17:05:19.622384 IP (tos 0x0, ttl 64, id 21435, offset 0, flags [DF], 
> proto TCP (6), length 60)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [S], cksum 0xb994 
> (correct), seq 1675314077, win 5840, options [mss 1460,sackOK,TS val 
> 331814 ecr 0,nop,wscale 7], length 0
> 17:05:19.622754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 
> TCP (6), length 60)
>      node34.ssh > node33.webstar.cnet.43653: Flags [S.], cksum 0x9d81 
> (correct), seq 1669769379, ack 1675314078, win 5792, options [mss 
> 1460,sackOK,TS val 331779 ecr 331814,nop,wscale 7], length 0
> 17:05:19.622813 IP (tos 0x0, ttl 64, id 21436, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe2bf 
> (correct), ack 1, win 46, options [nop,nop,TS val 331814 ecr 331779], 
> length 0
> 17:05:19.627666 IP (tos 0x0, ttl 64, id 47271, offset 0, flags [DF], 
> proto TCP (6), length 84)
>      node34.ssh > node33.webstar.cnet.43653: Flags [P.], seq 1:33, ack 
> 1, win 46, options [nop,nop,TS val 331780 ecr 331814], length 32
> 17:05:19.627748 IP (tos 0x0, ttl 64, id 21437, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe29c 
> (correct), ack 33, win 46, options [nop,nop,TS val 331816 ecr 331780], 
> length 0
> 17:05:19.627833 IP (tos 0x0, ttl 64, id 21438, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 1f8a (->d189)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 23413:23445, ack 2749038625, win 46, options [nop,nop,TS val 331816 ecr 
> 331780], length 32
> 17:05:19.831634 IP (tos 0x0, ttl 64, id 21439, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d189 (->d188)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 331867 ecr 331780], length 32
> 17:05:20.239603 IP (tos 0x0, ttl 64, id 21440, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 15c6 (->d187)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 30492:30524, ack 809893921, win 46, options [nop,nop,TS val 331969 ecr 
> 331780], length 32
> 17:05:21.055534 IP (tos 0x0, ttl 64, id 21441, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d187 (->d186)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 332173 ecr 331780], length 32
> 17:05:22.687386 IP (tos 0x0, ttl 64, id 21442, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d186 (->d185)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 332581 ecr 331780], length 32
> 17:05:25.950935 IP (tos 0x0, ttl 64, id 21443, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 15c4 (->d184)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 30492:30524, ack 809893921, win 46, options [nop,nop,TS val 333397 ecr 
> 331780], length 32
> 17:05:32.478527 IP (tos 0x0, ttl 64, id 21444, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum c01 (->d183)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 43997:44029, ack 1311047713, win 46, options [nop,nop,TS val 335029 ecr 
> 331780], length 32
> 17:05:45.533370 IP (tos 0x0, ttl 64, id 21445, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 23d (->d182)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 3348:3380, 
> ack 4054450209, win 46, options [nop,nop,TS val 338293 ecr 331780], 
> length 32
> 17:06:08.719187 IP (tos 0x0, ttl 64, id 27660, offset 0, flags [DF], 
> proto TCP (6), length 1500, bad cksum 5360 (->b3b3)!)
>      node33.webstar.cnet.50060 > node34.35725: Flags [.], seq 
> 1203473738:1203475186, ack 1191452767, win 54, options [nop,nop,TS val 
> 344089 ecr 256770], length 1448
> 17:06:11.643080 IP (tos 0x0, ttl 64, id 21446, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum e4f2 (->d181)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 47331:47363, ack 4110811169, win 46, options [nop,nop,TS val 344821 ecr 
> 331780], length 32
> 17:06:13.715233 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> node34 tell node33.webstar.cnet, length 46
> 17:06:13.715257 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
> 00:30:48:f0:06:72 (oui Unknown), length 28
> 17:07:03.866492 IP (tos 0x0, ttl 64, id 21447, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum b413 (->d180)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 28939:28971, ack 1913782305, win 46, options [nop,nop,TS val 357877 ecr 
> 331780], length 32
> 17:07:08.862055 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> node34 tell node33.webstar.cnet, length 46
> 17:07:08.862370 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
> 00:30:48:f0:06:72 (oui Unknown), length 28
> 17:07:19.627910 IP (tos 0x0, ttl 64, id 47272, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d6b 
> (correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361780 ecr 
> 331816], length 0
> 17:07:19.628403 IP (tos 0x0, ttl 64, id 21448, offset 0, flags [DF], 
> proto TCP (6), length 844, bad cksum aa4d (->ce87)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [FP.], seq 
> 20399:21191, ack 2356871202, win 46, options [nop,nop,TS val 361818 ecr 
> 361780], length 792
> 17:07:19.833456 IP (tos 0x0, ttl 64, id 47273, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d37 
> (correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361832 ecr 
> 331816], length 0
> 17:07:19.833517 IP (tos 0x0, ttl 64, id 21449, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xa5e9 
> (correct), ack 34, win 46, options [nop,nop,TS val 361870 ecr 
> 361832,nop,nop,sack 1 {33:34}], length 0
> 
> At this point, I see a "Connection closed by 10.141.0.34" message on 
> node33 (from where I am attempting to ssh).
> 
> Again, if I ifdown on node33 and ifup again - I can then see from node33 
> to node34 without problems.
> 

OK, it seems forcedeth has a problem with checksums?

Can you try changing the "ethtool -k eth0" settings?

ethtool -K eth0 tso off tx off
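
For readers following the capture above: the "bad cksum X (->Y)!" notes on the IP lines are tcpdump's verification of the IPv4 header checksum, with Y being the value tcpdump computed itself. A minimal sketch of that ones'-complement sum (RFC 1071) is below. The function name inet_checksum is hypothetical and chosen for illustration; this is not code from forcedeth or tcpdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name): RFC 1071 Internet checksum
 * over a byte buffer, treating the data as big-endian 16-bit words.
 * For an IPv4 header, call it with the checksum field zeroed to get
 * the value that should be stored there; calling it on a header that
 * already carries a correct checksum returns 0.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero low byte. */
	if (i < len)
		sum += (uint32_t)data[i] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With TX checksum offload enabled the NIC is expected to fill in checksums on transmit, so checksums that look wrong in a capture taken on the receiving host point at the sender's offload path rather than at its IP stack.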




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-12 16:59                         ` Eric Dumazet
@ 2010-04-13 10:03                           ` stephen mulcahy
  2010-04-13 10:49                             ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 10:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> OK it seems forcedeth has problem with checksums ?
> 
> Try to change "ethtool -k eth0" settings ?
> 
> ethtool -K eth0 tso off tx off

Yes, that makes an unresponsive system responsive again immediately, nice!

Should the driver default to disabling this until the problem is corrected?

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 10:03                           ` stephen mulcahy
@ 2010-04-13 10:49                             ` Eric Dumazet
  2010-04-13 11:00                               ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 10:49 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > OK it seems forcedeth has problem with checksums ?
> > 
> > Try to change "ethtool -k eth0" settings ?
> > 
> > ethtool -K eth0 tso off tx off
> 
> Yes, that makes an unresponsive system responsive again immediately, nice!
> 
> Should the driver default to disabling this until the problem is corrected?
> 
> -stephen

Do both flags need to be disabled, or is only one enough?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 10:49                             ` Eric Dumazet
@ 2010-04-13 11:00                               ` stephen mulcahy
  2010-04-13 12:04                                 ` Ben Hutchings
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 11:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote:
>> Eric Dumazet wrote:
>>> OK it seems forcedeth has problem with checksums ?
>>>
>>> Try to change "ethtool -k eth0" settings ?
>>>
>>> ethtool -K eth0 tso off tx off
>> Yes, that makes an unresponsive system responsive again immediately, nice!
>>
>> Should the driver default to disabling this until the problem is corrected?
>>
>> -stephen
> 
> Both flags need to be disabled, or only one is OK ?

ethtool -K eth0 tx off

fixes the problem (without tso)

but running

ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

seems to indicate that tso is also disabled by this - does that sound 
correct?

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 11:00                               ` stephen mulcahy
@ 2010-04-13 12:04                                 ` Ben Hutchings
  2010-04-13 14:27                                   ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Hutchings @ 2010-04-13 12:04 UTC (permalink / raw)
  To: stephen mulcahy; +Cc: Eric Dumazet, netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tue, 2010-04-13 at 12:00 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote:
> >> Eric Dumazet wrote:
> >>> OK it seems forcedeth has problem with checksums ?
> >>>
> >>> Try to change "ethtool -k eth0" settings ?
> >>>
> >>> ethtool -K eth0 tso off tx off
> >> Yes, that makes an unresponsive system responsive again immediately, nice!
> >>
> >> Should the driver default to disabling this until the problem is corrected?
> >>
> >> -stephen
> > 
> > Both flags need to be disabled, or only one is OK ?
> 
> ethtool -K eth0 tx off
> 
> fixes the problem (without tso)
> 
> but running
> 
> ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> 
> seems to indicate that tso is also disabled by this - does that sound 
> correct?

That's correct - TSO requires TX offload.  What happens if you only turn
off TSO?

Ben. (wearing another hat)

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 12:04                                 ` Ben Hutchings
@ 2010-04-13 14:27                                   ` stephen mulcahy
  2010-04-13 14:42                                     ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 14:27 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, netdev, Ben Hutchings, Ayaz Abdulla, 572201

Ok, I've tried both of the following with my reproducer

1. ethtool -K eth0 tso off

RESULT: reproducer causes multiple hosts to become unresponsive on 
first run.

2. ethtool -K eth0 tx off

RESULT: reproducer runs three times without any hosts becoming unresponsive.

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 14:27                                   ` stephen mulcahy
@ 2010-04-13 14:42                                     ` Eric Dumazet
  2010-04-13 14:49                                       ` stephen mulcahy
  2010-04-13 21:43                                       ` David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 14:42 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
> Ok, I've tried both of the following with my reproducer
> 
> 1. ethtool -K eth0 tso off
> 
> RESULT: reproducer causes multiple hosts to become unresponsive on 
> first run.
> 
> 2. ethtool -K eth0 tx off
> 
> RESULT: reproducer runs three times without any hosts becoming unresponsive.
> 
> -stephen

Thanks Stephen !

Now some brave fouls to check the 6410 lines of this driver ? ;)

Question of the day: why is TSO broken in forcedeth?
Is it generically broken, or only broken for specific NICs?



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 14:42                                     ` Eric Dumazet
@ 2010-04-13 14:49                                       ` stephen mulcahy
  2010-04-13 15:00                                         ` stephen mulcahy
  2010-04-13 15:05                                         ` Eric Dumazet
  2010-04-13 21:43                                       ` David Miller
  1 sibling, 2 replies; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 14:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
>> Ok, I've tried both of the following with my reproducer
>>
>> 1. ethtool -K eth0 tso off
>>
>> RESULT: reproducer causes multiple hosts to become unresponsive on 
>> first run.
>>
>> 2. ethtool -K eth0 tx off
>>
>> RESULT: reproducer runs three times without any hosts becoming unresponsive.
>>
>> -stephen
> 
> Thanks Stephen !
> 
> Now some brave fouls to check the 6410 lines of this driver ? ;)
> 
> Question of the day : Why TSO is broken in forcedeth ?
> Is it generically broken or is it broken for specific NICS ?
> 

Actually, it is only when tx-checksumming is turned off that the problem 
  doesn't occur (so I'm not sure TSO is the problem).

Additionally, a Google search also turns up this existing Debian bug 
http://bugs.debian.org/506419 which seems to be related.

-stephen


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 14:49                                       ` stephen mulcahy
@ 2010-04-13 15:00                                         ` stephen mulcahy
  2010-04-13 15:05                                         ` Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 15:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

stephen mulcahy wrote:
>> Now some brave fouls to check the 6410 lines of this driver ? ;)
>>
>> Question of the day : Why TSO is broken in forcedeth ?
>> Is it generically broken or is it broken for specific NICS ?
>>
> 
> Actually, it is only when tx-checksumming is turned off that the problem 
>  doesn't occur (so I'm not sure TSO is the problem).
> 
> Additionally, a google also turns up this existing Debian bug 
> http://bugs.debian.org/506419 which seems to be related.

As mentioned in the original Debian bug - I can reproduce this by 
running Hadoop[1] TeraSort[2] but I haven't identified a simpler 
reproducer. I tried to recreate this with iperf and ping -f but neither 
helped - it may be that the problem only occurs when systems are passing 
large amounts of traffic and have very high cpu utilisation (when 
running the Hadoop TeraSort all 8 cores run at 70-100% utilisation as 
measured with htop - I plan to instrument the nodes with something like 
Zabbix or Ganglia but it hasn't happened yet).

-stephen

[1] http://hadoop.apache.org/
[2] 
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 14:49                                       ` stephen mulcahy
  2010-04-13 15:00                                         ` stephen mulcahy
@ 2010-04-13 15:05                                         ` Eric Dumazet
  2010-04-13 15:08                                           ` stephen mulcahy
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 15:05 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tuesday 13 April 2010 at 15:49 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
> >> Ok, I've tried both of the following with my reproducer
> >>
> >> 1. ethtool -K eth0 tso off
> >>
> >> RESULT: reproducer causes multiple hosts to become unresponsive on 
> >> first run.
> >>
> >> 2. ethtool -K eth0 tx off
> >>
> >> RESULT: reproducer runs three times without any hosts becoming unresponsive.
> >>
> >> -stephen
> > 
> > Thanks Stephen !
> > 
> > Now some brave fouls to check the 6410 lines of this driver ? ;)
> > 
> > Question of the day : Why TSO is broken in forcedeth ?
> > Is it generically broken or is it broken for specific NICS ?
> > 
> 
> Actually, it is only when tx-checksumming is turned off that the problem 
>   doesn't occur (so I'm not sure TSO is the problem).
> 
> Additionally, a google also turns up this existing Debian bug 
> http://bugs.debian.org/506419 which seems to be related.
> 
> -stephen
> 

I am scratching my head, but I thought you told me that

ethtool -K eth0 tso off
ethtool -K eth0 tx on 

was working ?

And, given that when you perform "ethtool -K eth0 tx off", TSO is
automatically off...

tx checksums work, TSO doesn't work, therefore TSO is broken?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 15:05                                         ` Eric Dumazet
@ 2010-04-13 15:08                                           ` stephen mulcahy
  2010-04-13 15:22                                             ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 15:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:

> I am scratching my head, but I thought you told me that
> 
> ethtool -K eth0 tso off
> ethtool -K eth0 tx on 
> 
> was working ?

No, sorry for the confusion.

ethtool -K eth0 tx off

fixes the problem.


Setting only

ethtool -K eth0 tso off
ethtool -K eth0 tx on

still results in failures.

-stephen


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 15:08                                           ` stephen mulcahy
@ 2010-04-13 15:22                                             ` Eric Dumazet
  2010-04-13 15:25                                               ` stephen mulcahy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 15:22 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tuesday 13 April 2010 at 16:08 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> 
> > I am scratching my head, but I thought you told me that
> > 
> > ethtool -K eth0 tso off
> > ethtool -K eth0 tx on 
> > 
> > was working ?
> 
> No, sorry for the confusion.
> 
> ethtool -K eth0 tx off
> 
> fixes the problem.
> 
> 
> Setting only
> 
> ethtool -K eth0 tso off
> ethtool -K eth0 tx on
> 
> still results in failures.

OK, thanks for the clarification.

Last question: did you try a vanilla kernel, 2.6.33.2 for
example?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 15:22                                             ` Eric Dumazet
@ 2010-04-13 15:25                                               ` stephen mulcahy
  2010-04-13 20:01                                                 ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: stephen mulcahy @ 2010-04-13 15:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

Eric Dumazet wrote:
> OK, thanks for clarification.
> 
> Last question, did you tried a vanilla kernel, aka 2.6.33.2 for
> example ?

I built a Debian package from the vanilla 2.6.33.2 and installed that on 
all nodes and tried my reproducer with the same results - nodes becoming 
unresponsive.

I didn't try changing the tso and tx settings with the 2.6.33.2 kernel 
though. Let me know if that would be useful (and/or if there is another 
kernel that you would like me to test with) and I'll try to fit it in.

Thanks again for your help.

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 15:25                                               ` stephen mulcahy
@ 2010-04-13 20:01                                                 ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 20:01 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201

On Tuesday 13 April 2010 at 16:25 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > OK, thanks for clarification.
> > 
> > Last question, did you tried a vanilla kernel, aka 2.6.33.2 for
> > example ?
> 
> I built a Debian package from the vanilla 2.6.33.2 and installed that on 
> all nodes and tried my reproducer with the same results - nodes becoming 
> unresponsive.
> 
> I didn't try changing the tso and tx settings with the 2.6.33.2 kernel 
> though. Let me know if that would be useful (and/or if there is another 
> kernel that you would like me to test with) and I'll try to fit it in.
> 

I tried 2.6.34-rc4 (64-bit) on an old machine I had lying around at home.



00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
	Subsystem: ASUSTeK Computer Inc. K8N4-E or A8N-E Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Latency: 0 (250ns min, 5000ns max)
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at d4000000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at b000 [size=8]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
	Kernel driver in use: forcedeth
	Kernel modules: forcedeth

I could not reproduce the problem you have.

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 31
model name	: AMD Athlon(tm) 64 Processor 3200+
stepping	: 0
cpu MHz		: 1000.000
cache size	: 512 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good lahf_lm
bogomips	: 2010.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


RAM : 3 Gbytes 

The only strange thing I noticed is the ethtool -S output, which shows an insane tx_broadcast:

# ethtool -S eth1
NIC statistics:
     tx_bytes: 90388
     tx_zero_rexmt: 348
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 413
     rx_multicast: 22
     rx_broadcast: 2
     rx_packets: 437
     rx_errors_total: 0
     tx_errors_total: 0
     tx_deferral: 718
     tx_packets: 718
     rx_bytes: 718
     tx_pause: 718
     rx_pause: 718
     rx_drop_frame: 718
     tx_unicast: 15748
     tx_multicast: 5552
     tx_broadcast: 115174309658

[root@localhost ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:11:D8:9A:6D:06  
          inet addr:192.168.99.99  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:466 errors:0 dropped:0 overruns:0 frame:0
          TX packets:354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:50751 (49.5 KiB)  TX bytes:92974 (90.7 KiB)
          Interrupt:21 Base address:0x2000 

[root@localhost ~]# grep eth1 /proc/interrupts 
 21:        954   IO-APIC-fasteoi   eth1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 14:42                                     ` Eric Dumazet
  2010-04-13 14:49                                       ` stephen mulcahy
@ 2010-04-13 21:43                                       ` David Miller
  2010-04-13 21:46                                         ` Eric Dumazet
  2010-04-14  5:31                                         ` Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check Ayaz Abdulla
  1 sibling, 2 replies; 30+ messages in thread
From: David Miller @ 2010-04-13 21:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 16:42:21 +0200

> On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
>> Ok, I've tried both of the following with my reproducer
>> 
>> 1. ethtool -K eth0 tso off
>> 
>> RESULT: reproducer causes multiple hosts to become unresponsive on 
>> first run.
>> 
>> 2. ethtool -K eth0 tx off
>> 
>> RESULT: reproducer runs three times without any hosts becoming unresponsive.
>> 
>> -stephen
> 
> Thanks Stephen !
> 
> Now some brave fouls to check the 6410 lines of this driver ? ;)
> 
> Question of the day : Why TSO is broken in forcedeth ?
> Is it generically broken or is it broken for specific NICS ?

Do you really come to the conclusion that TSO is broken with the above
test results?

I would conclude that there is a TX checksumming issue, since merely
turning TSO off does not fix the problem whereas turning TX
checksumming off does.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-13 21:43                                       ` David Miller
@ 2010-04-13 21:46                                         ` Eric Dumazet
  2010-04-14  5:33                                           ` Bug#572201: " Ayaz Abdulla
  2010-04-14  5:31                                         ` Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check Ayaz Abdulla
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2010-04-13 21:46 UTC (permalink / raw)
  To: David Miller; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201

On Tuesday 13 April 2010 at 14:43 -0700, David Miller wrote:
> Do you really come to the conclusion that TSO is broken with the above
> test results?
> 
> I would conclude that there is a TX checksumming issue, since merely
> turning TSO off does not fix the problem whereas turning TX
> checksumming off does.

Indeed, we clarified the point and it is a TX checksum issue.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-14  5:33                                           ` Bug#572201: " Ayaz Abdulla
@ 2010-04-14  1:41                                             ` David Miller
  2010-04-14 14:30                                             ` stephen mulcahy
  1 sibling, 0 replies; 30+ messages in thread
From: David Miller @ 2010-04-14  1:41 UTC (permalink / raw)
  To: aabdulla; +Cc: eric.dumazet, smulcahy, bhutchings, netdev, ben, 572201

From: Ayaz Abdulla <aabdulla@nvidia.com>
Date: Wed, 14 Apr 2010 01:33:15 -0400

> Attached fix has been submitted to netdev.

Thanks!

I'll apply this soon.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check
  2010-04-13 21:43                                       ` David Miller
  2010-04-13 21:46                                         ` Eric Dumazet
@ 2010-04-14  5:31                                         ` Ayaz Abdulla
  2010-04-14 10:14                                           ` stephen mulcahy
  1 sibling, 1 reply; 30+ messages in thread
From: Ayaz Abdulla @ 2010-04-14  5:31 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, smulcahy, bhutchings, netdev, ben, 572201

[-- Attachment #1: Type: text/plain, Size: 247 bytes --]

This patch fixes the TX_LIMIT2 feature flag check. The previous check for 
TX_LIMIT2 also matched a device that only had TX_LIMIT set.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>

This is a fix for bug 572201 @ bugs.debian.org





[-- Attachment #2: patch-forcedeth-tx-limit2-fix --]
[-- Type: text/plain, Size: 462 bytes --]

--- old/drivers/net/forcedeth.c	2010-04-14 01:18:51.000000000 -0400
+++ new/drivers/net/forcedeth.c	2010-04-14 01:20:40.000000000 -0400
@@ -5901,7 +5901,7 @@
 	/* Limit the number of tx's outstanding for hw bug */
 	if (id->driver_data & DEV_NEED_TX_LIMIT) {
 		np->tx_limit = 1;
-		if ((id->driver_data & DEV_NEED_TX_LIMIT2) &&
+		if (((id->driver_data & DEV_NEED_TX_LIMIT2) == DEV_NEED_TX_LIMIT2) &&
 		    pci_dev->revision >= 0xA2)
 			np->tx_limit = 0;
 	}
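
To illustrate why the one-line change matters: when a flag is a multi-bit mask that contains another flag's bit, `x & MASK` is true if any bit matches, while `(x & MASK) == MASK` requires all of them. The sketch below mirrors the shape of the patched block under stated assumptions. The EX_* values and the function names are hypothetical and are not the real definitions from drivers/net/forcedeth.c; the only property taken from the patch description is that the LIMIT2 mask overlaps the LIMIT bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define EX_NEED_TX_LIMIT   0x1u  /* device needs to limit outstanding tx */
#define EX_NEED_TX_LIMIT2  0x3u  /* LIMIT bit plus a second bit */

/* The bug only exists because the LIMIT2 mask contains the LIMIT bit. */
static_assert((EX_NEED_TX_LIMIT2 & EX_NEED_TX_LIMIT) == EX_NEED_TX_LIMIT,
	      "LIMIT2 must contain the LIMIT bit");

/*
 * Returns the tx_limit value chosen for a device.
 * patched == false uses the old "any bit set" test,
 * patched == true uses the "all bits set" test from the fix.
 */
int tx_limit_for(uint32_t driver_data, uint8_t revision, bool patched)
{
	int tx_limit = 0;

	if (driver_data & EX_NEED_TX_LIMIT) {
		bool limit2;

		tx_limit = 1;
		if (patched)
			limit2 = (driver_data & EX_NEED_TX_LIMIT2) ==
				 EX_NEED_TX_LIMIT2;
		else
			limit2 = (driver_data & EX_NEED_TX_LIMIT2) != 0;

		/* Only LIMIT2 devices at rev >= 0xA2 may lift the limit. */
		if (limit2 && revision >= 0xA2)
			tx_limit = 0;
	}
	return tx_limit;
}
```

In this sketch, a device with only the LIMIT bit set and a revision of 0xA2 or later ends up with tx_limit 0 under the old test and tx_limit 1 under the patched one, which matches the behaviour the patch description says it corrects.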

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Bug#572201: forcedeth driver hangs under heavy load
  2010-04-13 21:46                                         ` Eric Dumazet
@ 2010-04-14  5:33                                           ` Ayaz Abdulla
  2010-04-14  1:41                                             ` David Miller
  2010-04-14 14:30                                             ` stephen mulcahy
  0 siblings, 2 replies; 30+ messages in thread
From: Ayaz Abdulla @ 2010-04-14  5:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, smulcahy, bhutchings, netdev, ben, 572201

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

Attached fix has been submitted to netdev.

Ayaz


Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 14:43 -0700, David Miller wrote:
> 
>>Do you really come to the conclusion that TSO is broken with the above
>>test results?
>>
>>I would conclude that there is a TX checksumming issue, since merely
>>turning TSO off does not fix the problem whereas turning TX
>>checksumming off does.
> 
> 
> Indeed, we clarified the point and it is a TX checksum issue.
> 
> 

[-- Attachment #2: patch-forcedeth-tx-limit2-fix --]
[-- Type: text/plain, Size: 462 bytes --]

--- old/drivers/net/forcedeth.c	2010-04-14 01:18:51.000000000 -0400
+++ new/drivers/net/forcedeth.c	2010-04-14 01:20:40.000000000 -0400
@@ -5901,7 +5901,7 @@
 	/* Limit the number of tx's outstanding for hw bug */
 	if (id->driver_data & DEV_NEED_TX_LIMIT) {
 		np->tx_limit = 1;
-		if ((id->driver_data & DEV_NEED_TX_LIMIT2) &&
+		if (((id->driver_data & DEV_NEED_TX_LIMIT2) == DEV_NEED_TX_LIMIT2) &&
 		    pci_dev->revision >= 0xA2)
 			np->tx_limit = 0;
 	}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] forcedeth: fix tx limit2 flag check
  2010-04-14  5:31                                         ` Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check Ayaz Abdulla
@ 2010-04-14 10:14                                           ` stephen mulcahy
  0 siblings, 0 replies; 30+ messages in thread
From: stephen mulcahy @ 2010-04-14 10:14 UTC (permalink / raw)
  To: Ayaz Abdulla; +Cc: David Miller, eric.dumazet, bhutchings, netdev, ben, 572201

Ayaz Abdulla wrote:
> This patch fixes the TX_LIMIT feature flag. The previous logic check for 
> TX_LIMIT2 also took into account a device that only had TX_LIMIT set.
> 
> Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
> 
> This is a fix for bug 572201 @ bugs.debian.org

Hi,

Thanks! I'll rebuild my Debian kernel with this and run a test today.

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
  2010-04-14  5:33                                           ` Bug#572201: " Ayaz Abdulla
  2010-04-14  1:41                                             ` David Miller
@ 2010-04-14 14:30                                             ` stephen mulcahy
  1 sibling, 0 replies; 30+ messages in thread
From: stephen mulcahy @ 2010-04-14 14:30 UTC (permalink / raw)
  To: Ayaz Abdulla; +Cc: Eric Dumazet, David Miller, bhutchings, netdev, ben, 572201

Ayaz Abdulla wrote:
> Attached fix has been submitted to netdev.

I've run my reproducer with this patch applied to my Debian 2.6.32 
kernel and so far the problem with nodes becoming unresponsive hasn't 
occurred.

NIC settings were left at the defaults, so this looks positive:

root@node23:~# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

Thanks!

-stephen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: forcedeth driver hangs under heavy load
@ 2010-04-13 17:22 Xose Vazquez Perez
  0 siblings, 0 replies; 30+ messages in thread
From: Xose Vazquez Perez @ 2010-04-13 17:22 UTC (permalink / raw)
  To: netdev

stephen mulcahy  wrote:

> running Hadoop[1] TeraSort[2] but I haven't identified a simpler 
> reproducer. I tried to recreate this with iperf and ping -f but neither 
> helped - it may be that the problem only occurs when systems are passing 
> large amounts of traffic and have very high cpu utilisation (when 

Did you try the ISIC (IP Stack Integrity Checker)[1] tools?

Net drivers often break when these tools are run against them.


[1] http://isic.sf.net needs libnet[2]
[2] http://github.com/sam-github/libnet

-- 
«Let blind kings wage fierce war over there for one more span of land;
for here I hold as mine all that the wild sea embraces, on which no one
has imposed laws. And there is no shore, whichever it may be, nor flag
of splendor, that does not feel my right and pay tribute to my valor.»


Thread overview: 30+ messages
     [not found] <4B9E6C60.7030300@atlanticlinux.ie>
     [not found] ` <20100315182220.GQ2763@decadent.org.uk>
     [not found]   ` <4B9F5E5E.2060209@atlanticlinux.ie>
     [not found]     ` <1270393967.8341.11.camel@localhost>
     [not found]       ` <4BBCA19C.5080204@atlanticlinux.ie>
2010-04-10 23:36         ` forcedeth driver hangs under heavy load Ben Hutchings
2010-04-12 10:01           ` Bug#572201: " stephen mulcahy
2010-04-12 12:39             ` stephen mulcahy
2010-04-12 12:47               ` Eric Dumazet
2010-04-12 13:05                 ` stephen mulcahy
2010-04-12 13:19                   ` stephen mulcahy
2010-04-12 15:24                     ` Eric Dumazet
2010-04-12 16:11                       ` stephen mulcahy
2010-04-12 16:59                         ` Eric Dumazet
2010-04-13 10:03                           ` stephen mulcahy
2010-04-13 10:49                             ` Eric Dumazet
2010-04-13 11:00                               ` stephen mulcahy
2010-04-13 12:04                                 ` Ben Hutchings
2010-04-13 14:27                                   ` stephen mulcahy
2010-04-13 14:42                                     ` Eric Dumazet
2010-04-13 14:49                                       ` stephen mulcahy
2010-04-13 15:00                                         ` stephen mulcahy
2010-04-13 15:05                                         ` Eric Dumazet
2010-04-13 15:08                                           ` stephen mulcahy
2010-04-13 15:22                                             ` Eric Dumazet
2010-04-13 15:25                                               ` stephen mulcahy
2010-04-13 20:01                                                 ` Eric Dumazet
2010-04-13 21:43                                       ` David Miller
2010-04-13 21:46                                         ` Eric Dumazet
2010-04-14  5:33                                           ` Bug#572201: " Ayaz Abdulla
2010-04-14  1:41                                             ` David Miller
2010-04-14 14:30                                             ` stephen mulcahy
2010-04-14  5:31                                         ` Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check Ayaz Abdulla
2010-04-14 10:14                                           ` stephen mulcahy
2010-04-13 17:22 forcedeth driver hangs under heavy load Xose Vazquez Perez
