linux-kernel.vger.kernel.org archive mirror
* netdev issues (3c905B)
@ 2001-02-21  0:06 Vibol Hou
  2001-02-21  0:21 ` Martin Moerman
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Vibol Hou @ 2001-02-21  0:06 UTC (permalink / raw)
  To: Linux-Kernel

Hi,

I have some problems on a heavily loaded web server.  The first is that the
kernel is spitting out a bunch of "NETDEV WATCHDOG: eth0: transmit timed
out" errors.  I do not recall this happening in 2.4.0 under the same
conditions.

Another problem, which clients have reported to me, is that the server has
trouble talking to clients on modems.  This didn't occur with the 2.2 series
kernel (all other things held constant).  Each time a modem client tries to
load any site on the server, the connection just dies (or stalls).  This
does not apply to high-bandwidth connections (DSL and up), where everything
seems fine, but I tried connecting using my dial-up account with Earthlink,
and the reports seem to be true.  Can those of you on a 56k modem try
connecting to http://khmerconnection.com and see if the page loads?  Apache
isn't the only service affected.  It seems *any* TCP communication runs like
a turtle (even SSH: it takes minutes to log in, then minutes to echo each
letter, which doesn't happen on a DSL connection from the same computer).

The card that is exhibiting this problem is a 3c905B (lspci below):

00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
        Subsystem: 3Com Corporation: Unknown device 9055
        Flags: bus master, medium devsel, latency 80, IRQ 17
        I/O ports at e400 [size=128]
        Memory at e8001000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at e4000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 1

dmesg shows hordes of these at high peak usage (300KBps+):

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e601.
  diagnostics: net 0cd8 media 8880 dma 0000003a.
eth0: Interrupt posted but not delivered -- IRQ blocked by another device?
  Flags; bus-master 1, full 0; dirty 9256291(3) current 9256291(3).
  Transmit list 00000000 vs. f7de5230.
  0: @f7de5200  length 80000042 status 00010042
  1: @f7de5210  length 8000004a status 8001004a
  2: @f7de5220  length 80000036 status 80010036
  3: @f7de5230  length 80000036 status 00010036
  4: @f7de5240  length 80000042 status 00010042
  5: @f7de5250  length 80000036 status 00010036
  6: @f7de5260  length 800005ea status 000105ea
  7: @f7de5270  length 800005ea status 000105ea
  8: @f7de5280  length 8000003a status 0001003a
  9: @f7de5290  length 8000003e status 0001003e
  10: @f7de52a0  length 8000003a status 0001003a
  11: @f7de52b0  length 8000003e status 0001003e
  12: @f7de52c0  length 8000003e status 0001003e
  13: @f7de52d0  length 8000004a status 0001004a
  14: @f7de52e0  length 8000004a status 0001004a
  15: @f7de52f0  length 8000003e status 0001003e
eth0: Resetting the Tx ring pointer.
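
For anyone reading the dump: the number in parentheses after the "dirty" and
"current" counters appears to be the counter reduced modulo the ring size,
and the descriptor addresses are evenly spaced.  A minimal C sketch of that
arithmetic follows.  The three constants are read off this log rather than
taken from the driver source, and the helper names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions inferred from the dump above, not from the driver source:
 * 16 descriptors in the Tx ring, spaced 0x10 bytes apart, with the first
 * one at 0xf7de5200. */
#define TX_RING_ENTRIES 16u
#define TX_DESC_STRIDE  0x10u
#define TX_RING_BASE    0xf7de5200u

/* Map a free-running Tx counter (e.g. "dirty 9256291") to its ring slot. */
static unsigned ring_slot(uint32_t counter)
{
    return counter % TX_RING_ENTRIES;
}

/* Bus address of the descriptor that occupies a given ring slot. */
static uint32_t desc_addr(unsigned slot)
{
    assert(slot < TX_RING_ENTRIES);
    return TX_RING_BASE + slot * TX_DESC_STRIDE;
}
```

With the values from the log, ring_slot(9256291) is 3 and desc_addr(3) is
0xf7de5230, which matches the "(3)" printed after both counters and the
second address on the "Transmit list" line.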

Any ideas?

Thanks,
--
Vibol Hou



* Re: netdev issues (3c905B)
  2001-02-21  0:06 netdev issues (3c905B) Vibol Hou
@ 2001-02-21  0:21 ` Martin Moerman
  2001-02-21  0:34   ` Vibol Hou
  2001-02-21  9:47 ` 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) Ookhoi
  2001-02-21 10:57 ` David S. Miller
  2 siblings, 1 reply; 18+ messages in thread
From: Martin Moerman @ 2001-02-21  0:21 UTC (permalink / raw)
  To: Vibol Hou; +Cc: Linux-Kernel



Vibol,

I see that the card is on IRQ 17?

Can you send us /proc/interrupts?

/Martin



* RE: netdev issues (3c905B)
  2001-02-21  0:21 ` Martin Moerman
@ 2001-02-21  0:34   ` Vibol Hou
  0 siblings, 0 replies; 18+ messages in thread
From: Vibol Hou @ 2001-02-21  0:34 UTC (permalink / raw)
  To: Martin Moerman; +Cc: Linux-Kernel

Hi Martin,

Here's /proc/interrupts:

           CPU0       CPU1
  0:    2748043    2754927    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:       2737       2892    IO-APIC-edge  serial
 17:    9573612    9568840   IO-APIC-level  eth0
 18:     483436     482421   IO-APIC-level  aic7xxx
NMI:    5505505    5505399
LOC:    5502609    5502508
ERR:          0

-Vibol


* 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21  0:06 netdev issues (3c905B) Vibol Hou
  2001-02-21  0:21 ` Martin Moerman
@ 2001-02-21  9:47 ` Ookhoi
  2001-02-21 13:12   ` Gregory Maxwell
  2001-02-21 10:57 ` David S. Miller
  2 siblings, 1 reply; 18+ messages in thread
From: Ookhoi @ 2001-02-21  9:47 UTC (permalink / raw)
  To: Vibol Hou; +Cc: Linux-Kernel, sim

Hi!

> Another problem, which clients have reported to me, is that the server has
> trouble talking to clients on modems.  This didn't occur with the 2.2 series
> kernel (all other things held constant).  Each time a modem client tries to
> load any site on the server, the connection just dies (or stalls).  This
> does not apply to high-bandwidth connections (DSL and up), where everything
> seems fine, but I tried connecting using my dial-up account with Earthlink,
> and the reports seem to be true.  Can those of you on a 56k modem try
> connecting to http://khmerconnection.com and see if the page loads?  Apache
> isn't the only service affected.  It seems *any* TCP communication runs like
> a turtle (even SSH: it takes minutes to log in, then minutes to echo each
> letter, which doesn't happen on a DSL connection from the same computer).
> 
> The card that is exhibiting this problem is a 3c905B (lspci below):

[cut]

We have exactly the same problem, but in our case it depends on the
following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
header compression turned on; 3, a free internet access provider in
Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
connection').
If we remove any one of the three conditions, the connection is OK. Only
TCP is affected.
A packet on its way from the Linux server to the Windows client seems to
get dropped once and retransmitted. This makes the connection _very_ slow.

It seems that Simon has the same problem. Can I provide tcpdumps to
help find the cause of this problem? I am not sure yet whether this happens
only with 3Com NICs. I will test that.

	Ookhoi


Date: 	Fri, 16 Feb 2001 20:02:11 -0500
From: Simon Kirby <sim@stormix.com>
To: linux-kernel@vger.kernel.org, davem@redhat.com
Cc: Alan Evetts <alane@netnation.com>
Subject: Re: 2.4 TCP(?) timeouts

On Fri, Feb 16, 2001 at 07:08:05PM -0500, Simon Kirby wrote:

> Hello,
> 
> Today we put 2.4.1 on our mail server after having seen it perform well on
> some other boxes.  It seems now we are receiving a few calls every hour
> from customers reporting that the server tends to hang and eventually
> time out on them when downloading mail.  All customers that have reported
> this problem so far are on a dial-up connection.  Apparently the server
> will stop transmitting data (or the client seems to think so), and then
> their mail client will time out.

We recorded a trace on the mail server end to one of the customers having
the problem.  At first they closed the connection because their mail
client was set to a timeout of 1 minute, but then when they changed it to
5 seconds, it seemed to limp along further.  It seems to me just like
there's a huge amount of packet loss, but pinging the machine just after
this shows 0% loss (just occasional jumps in response time).

During this trace, when long periods of nothing went by, "netstat -tan
|grep ip" showed nothing abnormal: a 0 byte receive queue and some
data in the send queue equal to what would be retransmitted and
eventually go through two minutes later.

nmap:
Remote operating system guess: Windows 2000 Professional, Build 2128

16:26:14.738836 < client.1104 > mail.pop3: S 1263956200:1263956200(0) win 8760 <mss 536,nop,nop,sackOK> (DF)
16:26:14.738888 > mail.pop3 > client.1104: S 26894293:26894293(0) ack 1263956201 win 5840 <mss 1460,nop,nop,sackOK> (DF)
16:26:15.014145 < client.1104 > mail.pop3: . 1:1(0) ack 1 win 9112 (DF)
16:26:15.014866 > mail.pop3 > client.1104: P 1:92(91) ack 1 win 5840 (DF)
16:26:15.291998 < client.1104 > mail.pop3: P 1:16(15) ack 92 win 9021 (DF)
16:26:15.292199 > mail.pop3 > client.1104: . 92:92(0) ack 16 win 5840 (DF)
16:26:15.292305 > mail.pop3 > client.1104: P 92:115(23) ack 16 win 5840 (DF)
16:26:16.686295 > mail.pop3 > client.1104: P 92:115(23) ack 16 win 5840 (DF)
16:26:16.954563 < client.1104 > mail.pop3: P 16:30(14) ack 115 win 8998 (DF)
16:26:16.976908 > mail.pop3 > client.1104: P 115:137(22) ack 30 win 5840 (DF)
16:26:19.776322 > mail.pop3 > client.1104: P 115:137(22) ack 30 win 5840 (DF)
16:26:20.033951 < client.1104 > mail.pop3: P 30:36(6) ack 137 win 8976 (DF)
16:26:20.034063 > mail.pop3 > client.1104: P 137:149(12) ack 36 win 5840 (DF)
16:26:25.626301 > mail.pop3 > client.1104: P 137:149(12) ack 36 win 5840 (DF)
16:26:25.922151 < client.1104 > mail.pop3: P 36:42(6) ack 149 win 8964 (DF)
16:26:25.922254 > mail.pop3 > client.1104: P 149:219(70) ack 42 win 5840 (DF)
16:26:36.949499 < client.1104 > mail.pop3: P 36:42(6) ack 149 win 8964 (DF)
16:26:36.949533 > mail.pop3 > client.1104: . 219:219(0) ack 42 win 5840 <nop,nop, sack 1 {36:42} > (DF)
16:26:37.116302 > mail.pop3 > client.1104: P 149:219(70) ack 42 win 5840 (DF)
16:26:37.380554 < client.1104 > mail.pop3: P 42:50(8) ack 219 win 8894 (DF)
16:26:37.380645 > mail.pop3 > client.1104: . 219:219(0) ack 50 win 5840 (DF)
16:26:37.380709 > mail.pop3 > client.1104: P 219:231(12) ack 50 win 5840 (DF)
16:26:59.567440 < client.1104 > mail.pop3: P 42:50(8) ack 219 win 8894 (DF)
16:26:59.567476 > mail.pop3 > client.1104: . 231:231(0) ack 50 win 5840 <nop,nop, sack 1 {42:50} > (DF)
16:26:59.776301 > mail.pop3 > client.1104: P 219:231(12) ack 50 win 5840 (DF)
16:27:00.043125 < client.1104 > mail.pop3: P 50:59(9) ack 231 win 8882 (DF)
16:27:00.043186 > mail.pop3 > client.1104: . 231:231(0) ack 59 win 5840 (DF)
16:27:00.043475 > mail.pop3 > client.1104: . 231:767(536) ack 59 win 5840 (DF)
16:27:00.043491 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:27:44.399831 < client.1104 > mail.pop3: P 50:59(9) ack 231 win 8882 (DF)
16:27:44.399869 > mail.pop3 > client.1104: . 1220:1220(0) ack 59 win 5840 <nop,nop, sack 1 {50:59} > (DF)
16:27:44.836304 > mail.pop3 > client.1104: . 231:767(536) ack 59 win 5840 (DF)
16:27:45.295946 < client.1104 > mail.pop3: . 59:59(0) ack 767 win 9112 (DF)
16:27:45.296003 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:29:14.886322 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:29:15.264417 < client.1104 > mail.pop3: P 59:67(8) ack 1220 win 8659 (DF)
16:29:15.264479 > mail.pop3 > client.1104: . 1220:1220(0) ack 67 win 5840 (DF)
16:29:15.265127 > mail.pop3 > client.1104: . 1220:1756(536) ack 67 win 5840 (DF)
16:29:15.265145 > mail.pop3 > client.1104: . 1756:2292(536) ack 67 win 5840 (DF)
16:30:45.187652 < client.1104 > mail.pop3: P 59:67(8) ack 1220 win 8659 (DF)
16:30:45.187727 > mail.pop3 > client.1104: . 2292:2292(0) ack 67 win 5840 <nop,nop, sack 1 {59:67} > (DF)
16:31:16.326378 > mail.pop3 > client.1104: . 1220:1756(536) ack 67 win 5840 (DF)
16:31:17.513053 < client.1104 > mail.pop3: . 67:67(0) ack 1756 win 9112 (DF)
16:31:17.513129 > mail.pop3 > client.1104: . 1756:2292(536) ack 67 win 5840 (DF)
16:31:17.513143 > mail.pop3 > client.1104: . 2292:2828(536) ack 67 win 5840 (DF)
16:33:17.506376 > mail.pop3 > client.1104: . 1756:2292(536) ack 67 win 5840 (DF)
16:33:17.919146 < client.1104 > mail.pop3: . 67:67(0) ack 2292 win 9112 (DF)
16:33:17.919198 > mail.pop3 > client.1104: . 2292:2828(536) ack 67 win 5840 (DF)
16:33:17.919211 > mail.pop3 > client.1104: . 2828:3364(536) ack 67 win 5840 (DF)
16:35:17.916383 > mail.pop3 > client.1104: . 2292:2828(536) ack 67 win 5840 (DF)
16:35:18.401250 < client.1104 > mail.pop3: . 67:67(0) ack 2828 win 9112 (DF)
16:35:18.401394 > mail.pop3 > client.1104: . 2828:3364(536) ack 67 win 5840 (DF)
16:35:18.401414 > mail.pop3 > client.1104: . 3364:3900(536) ack 67 win 5840 (DF)
16:37:18.396373 > mail.pop3 > client.1104: . 2828:3364(536) ack 67 win 5840 (DF)
16:37:21.763859 < client.1104 > mail.pop3: . 67:67(0) ack 3364 win 9112 (DF)
16:37:21.764049 > mail.pop3 > client.1104: . 3364:3900(536) ack 67 win 5840 (DF)
16:37:21.764062 > mail.pop3 > client.1104: . 3900:4436(536) ack 67 win 5840 (DF)
16:42:22.308578 < client.1104 > mail.pop3: F 67:67(0) ack 3364 win 9112 (DF)
16:42:22.308625 > mail.pop3 > client.1104: R 26897657:26897657(0) win 0 (DF)

I'm not sure how the last part happened, but I'm guessing the server had
already closed the connection and was waiting for the next transmit to say
so, and the RST was sent because the socket was already closed locally when
the customer eventually closed the connection.

Would any of the networking changes in 2.4.1pre3 affect what is happening
here?
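
To put numbers on the stalls, the delay between the first and second
outbound transmission of the same data segment can be pulled out of the
trace.  The following is a rough C sketch, assuming the tcpdump line format
shown above; struct seg, parse_seg() and retransmit_gap() are hypothetical
helpers written for this illustration, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One parsed tcpdump line: time of day in seconds, direction marker
 * ('<' inbound, '>' outbound) and the sequence range start:end(len). */
struct seg {
    double t;
    char dir;
    unsigned long start, end;
    unsigned len;
};

/* Parse a line in the format used in the trace above.
 * Returns 1 on success, 0 if the line does not match. */
static int parse_seg(const char *line, struct seg *out)
{
    int h, m;
    double s;
    char src[64], dst[64], flags[8];

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%d:%d:%lf %c %63s > %63s %7s %lu:%lu(%u)",
               &h, &m, &s, &out->dir, src, dst, flags,
               &out->start, &out->end, &out->len) != 10)
        return 0;
    out->t = h * 3600.0 + m * 60.0 + s;
    return 1;
}

/* Seconds between the first and second outbound transmission of the data
 * segment start:end, or -1.0 if it was not sent at least twice. */
static double retransmit_gap(const char *const lines[], size_t n,
                             unsigned long start, unsigned long end)
{
    double first = -1.0;
    struct seg sg;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Skip unparsable lines, inbound packets and pure ACKs. */
        if (!parse_seg(lines[i], &sg) || sg.dir != '>' || sg.len == 0)
            continue;
        if (sg.start != start || sg.end != end)
            continue;
        if (first < 0.0)
            first = sg.t;
        else
            return sg.t - first;
    }
    return -1.0;
}
```

Working through the early part of the trace by hand gives gaps of roughly
1.39 s for 92:115, 2.80 s for 115:137, 5.59 s for 137:149, 11.19 s for
149:219 and 22.40 s for 219:231, so the wait about doubles with every
segment.  The later retransmissions are spaced about two minutes apart.
That pattern looks like retransmission-timer backoff rather than random
loss, though the trace alone does not prove it.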

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21  0:06 netdev issues (3c905B) Vibol Hou
  2001-02-21  0:21 ` Martin Moerman
  2001-02-21  9:47 ` 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) Ookhoi
@ 2001-02-21 10:57 ` David S. Miller
  2001-02-21 11:33   ` Ookhoi
                     ` (6 more replies)
  2 siblings, 7 replies; 18+ messages in thread
From: David S. Miller @ 2001-02-21 10:57 UTC (permalink / raw)
  To: ookhoi; +Cc: Vibol Hou, Linux-Kernel, sim


Ookhoi writes:
 > We have exactly the same problem, but in our case it depends on the
 > following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
 > header compression turned on; 3, a free internet access provider in
 > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
 > connection').
 > If we remove any one of the three conditions, the connection is OK. Only
 > TCP is affected.
 > A packet on its way from the Linux server to the Windows client seems to
 > get dropped once and retransmitted. This makes the connection _very_ slow.

:-( I hate these buggy systems.

Does this patch below fix the performance problem and are the windows
clients win2000 or win95?

--- include/net/ip.h.~1~	Mon Feb 19 00:12:31 2001
+++ include/net/ip.h	Wed Feb 21 02:56:15 2001
@@ -190,9 +190,11 @@
 
 static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
 {
+#if 0
 	if (iph->frag_off&__constant_htons(IP_DF))
 		iph->id = 0;
 	else
+#endif
 		__ip_select_ident(iph, dst);
 }
 

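
To make the effect of the patch concrete: every segment in Simon's trace is
marked (DF), so with the unpatched function each of those packets leaves
with IP ID 0, while the "#if 0" makes every packet go through
__ip_select_ident().  The following is a user-space C model of that
difference, not kernel code.  The struct, the DF constant in host byte
order and the counter standing in for __ip_select_ident() are all
simplifications made up for this sketch.  The thread does not say why some
dial-up paths mishandle the constant ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IP_DF 0x4000u   /* DF flag, host byte order for simplicity */

/* Cut-down stand-in for struct iphdr: only the fields the patch touches. */
struct model_iphdr {
    uint16_t id;
    uint16_t frag_off;
};

/* Hypothetical replacement for __ip_select_ident(): a plain counter that
 * starts at 1. The real kernel function chooses IDs differently. */
static uint16_t model_next_id = 1;

static void model_pick_ident(struct model_iphdr *iph)
{
    iph->id = model_next_id++;
}

/* Behaviour before the patch: packets with DF set always get ID 0. */
static void select_ident_before(struct model_iphdr *iph)
{
    assert(iph != NULL);
    if (iph->frag_off & MODEL_IP_DF)
        iph->id = 0;
    else
        model_pick_ident(iph);
}

/* Behaviour with the "#if 0" applied: every packet gets a chosen ID. */
static void select_ident_after(struct model_iphdr *iph)
{
    assert(iph != NULL);
    model_pick_ident(iph);
}
```

Calling select_ident_before() on a header with MODEL_IP_DF set leaves id at
0 every time, whereas select_ident_after() hands out a fresh value on each
call, so consecutive DF packets no longer share the same ID.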

* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 10:57 ` David S. Miller
@ 2001-02-21 11:33   ` Ookhoi
  2001-02-21 17:17   ` Ookhoi
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Ookhoi @ 2001-02-21 11:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: Vibol Hou, Linux-Kernel, sim

Hi David,

>  > We have exactly the same problem, but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
>  > header compression turned on; 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove any one of the three conditions, the connection is OK. Only
>  > TCP is affected.
>  > A packet on its way from the Linux server to the Windows client seems to
>  > get dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

It is 95 in our case. I'll test the patch today and report back to you.
Thanks a lot!

	Ookhoi


> --- include/net/ip.h.~1~	Mon Feb 19 00:12:31 2001
> +++ include/net/ip.h	Wed Feb 21 02:56:15 2001
> @@ -190,9 +190,11 @@
>  
>  static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
>  {
> +#if 0
>  	if (iph->frag_off&__constant_htons(IP_DF))
>  		iph->id = 0;
>  	else
> +#endif
>  		__ip_select_ident(iph, dst);
>  }
>  


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21  9:47 ` 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) Ookhoi
@ 2001-02-21 13:12   ` Gregory Maxwell
  0 siblings, 0 replies; 18+ messages in thread
From: Gregory Maxwell @ 2001-02-21 13:12 UTC (permalink / raw)
  To: Ookhoi; +Cc: Vibol Hou, Linux-Kernel, sim

On Wed, Feb 21, 2001 at 10:47:24AM +0100, Ookhoi wrote:
[snip]
> We have exactly the same problem, but in our case it depends on the
> following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
> header compression turned on; 3, a free internet access provider in
> Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
> connection').
> If we remove any one of the three conditions, the connection is OK. Only
> TCP is affected.
> A packet on its way from the Linux server to the Windows client seems to
> get dropped once and retransmitted. This makes the connection _very_ slow.
[snip]

It's been true for some time now that several firewalls, RAS, and NAT
devices break TCP connections in subtle but horrible ways when they
encounter SACK, timestamps, header compression, or other 'exotic'
features.

Has anyone compiled a list of such bugs so that a test application could be
created?


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 10:57 ` David S. Miller
  2001-02-21 11:33   ` Ookhoi
@ 2001-02-21 17:17   ` Ookhoi
  2001-02-21 19:06   ` Vibol Hou
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Ookhoi @ 2001-02-21 17:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: Vibol Hou, Linux-Kernel, sim

Hi David!

>  > We have exactly the same problem, but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
>  > header compression turned on; 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove any one of the three conditions, the connection is OK. Only
>  > TCP is affected.
>  > A packet on its way from the Linux server to the Windows client seems to
>  > get dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

Yes, the problem is fixed! Thank you very much. :-)  'great' patch!

	Ookhoi


> --- include/net/ip.h.~1~	Mon Feb 19 00:12:31 2001
> +++ include/net/ip.h	Wed Feb 21 02:56:15 2001
> @@ -190,9 +190,11 @@
>  
>  static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
>  {
> +#if 0
>  	if (iph->frag_off&__constant_htons(IP_DF))
>  		iph->id = 0;
>  	else
> +#endif
>  		__ip_select_ident(iph, dst);
>  }


* RE: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 10:57 ` David S. Miller
  2001-02-21 11:33   ` Ookhoi
  2001-02-21 17:17   ` Ookhoi
@ 2001-02-21 19:06   ` Vibol Hou
  2001-02-21 19:22   ` Vibol Hou
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Vibol Hou @ 2001-02-21 19:06 UTC (permalink / raw)
  To: David S. Miller, ookhoi; +Cc: Linux-Kernel, sim

Win2K here, I'll apply the patch and let you know what happens.

-Vibol


* RE: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 10:57 ` David S. Miller
                     ` (2 preceding siblings ...)
  2001-02-21 19:06   ` Vibol Hou
@ 2001-02-21 19:22   ` Vibol Hou
  2001-02-21 22:30   ` Jordan Mendelson
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Vibol Hou @ 2001-02-21 19:22 UTC (permalink / raw)
  To: David S. Miller, ookhoi; +Cc: Linux-Kernel, sim

It looks like the patch fixed the problem.  TCP communications over modem
seem fine now with the same settings that didn't work earlier.

-Vibol


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues  (3c905B))
  2001-02-21 10:57 ` David S. Miller
                     ` (3 preceding siblings ...)
  2001-02-21 19:22   ` Vibol Hou
@ 2001-02-21 22:30   ` Jordan Mendelson
  2001-02-22  8:28     ` Ookhoi
  2001-02-21 23:49   ` Jordan Mendelson
  2001-02-21 23:52   ` David S. Miller
  6 siblings, 1 reply; 18+ messages in thread
From: Jordan Mendelson @ 2001-02-21 22:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: ookhoi, Vibol Hou, Linux-Kernel, sim

"David S. Miller" wrote:
> 
> Ookhoi writes:
>  > We have exactly the same problem, but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine); 2, Windows IP
>  > header compression turned on; 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove any one of the three conditions, the connection is OK. Only
>  > TCP is affected.
>  > A packet on its way from the Linux server to the Windows client seems to
>  > get dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

I wanted to see if this would fix the problem I was seeing with Win9x
users on PPP w/ compression dialing up to Earthlink in the bay area
(there are others, but it's the only one I can reproduce).

I compiled 2.4.1 with this change and for some odd reason, the kernel
started dropping packets and became unusable (couldn't ssh in) after
around 4050 connections were opened. I tested it also with 2.4.1-ac20
and had the same problem right around 4050 connections.

This is on a VA Linux box with dual eepro100's (one used) connected to a
Cisco 6509.



> --- include/net/ip.h.~1~        Mon Feb 19 00:12:31 2001
> +++ include/net/ip.h    Wed Feb 21 02:56:15 2001
> @@ -190,9 +190,11 @@
> 
>  static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
>  {
> +#if 0
>         if (iph->frag_off&__constant_htons(IP_DF))
>                 iph->id = 0;
>         else
> +#endif
>                 __ip_select_ident(iph, dst);
>  }
> 


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues  (3c905B))
  2001-02-21 10:57 ` David S. Miller
                     ` (4 preceding siblings ...)
  2001-02-21 22:30   ` Jordan Mendelson
@ 2001-02-21 23:49   ` Jordan Mendelson
  2001-02-21 23:52   ` David S. Miller
  6 siblings, 0 replies; 18+ messages in thread
From: Jordan Mendelson @ 2001-02-21 23:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: ookhoi, Vibol Hou, Linux-Kernel, sim, netdev

"David S. Miller" wrote:
> 
> Ookhoi writes:
>  > We have exactly the same problem but in our case it depends on the
>  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
>  > header compression turned on, 3, a free internet access provider in
>  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
>  > connection').
>  > If we remove one of the three conditions, the connection is OK. It is
>  > only tcp which is affected.
>  > A packet on its way from linux server to windows client seems to get
>  > dropped once and retransmitted. This makes the connection _very_ slow.
> 
> :-( I hate these buggy systems.
> 
> Does this patch below fix the performance problem and are the windows
> clients win2000 or win95?

Just a note, however: this patch did fix the problem we were seeing
with retransmits and Win95 compressed PPP and dialup over Earthlink in
the bay area.

Now, if it didn't have the side effect of dropping packets left and
right after ~4000 open connections (simultaneously), I could finally
move our production system to 2.4.x.



Jordan


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues  (3c905B))
  2001-02-21 10:57 ` David S. Miller
                     ` (5 preceding siblings ...)
  2001-02-21 23:49   ` Jordan Mendelson
@ 2001-02-21 23:52   ` David S. Miller
  2001-02-22  0:10     ` Jordan Mendelson
                       ` (3 more replies)
  6 siblings, 4 replies; 18+ messages in thread
From: David S. Miller @ 2001-02-21 23:52 UTC (permalink / raw)
  To: Jordan Mendelson; +Cc: ookhoi, Vibol Hou, Linux-Kernel, sim, netdev


Jordan Mendelson writes:
 > Now, if it didn't have the side effect of dropping packets left and
 > right after ~4000 open connections (simultaneously), I could finally
 > move our production system to 2.4.x.

There is no reason my patch should have this effect.

All of this appears to be caused by a bug in Windows TCP header
compression: if the ID field of the IPv4 header does not change, then
it drops every other packet.

The change I posted is unacceptable as-is because it adds unnecessary
cost to a fast path.  The final change I actually use will likely
involve using the TCP sequence numbers to calculate an "always
changing" ID number in the IPv4 headers to placate these broken
Windows machines.

Later,
David S. Miller
davem@redhat.com
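
One way the "ID derived from TCP sequence numbers" idea in this message could look, as a rough sketch only. The helper name `ident_from_seq()` and the choice of taking the low 16 bits are assumptions for illustration; the message does not say how the final change computes the value.

```c
#include <arpa/inet.h>  /* htons */
#include <stdint.h>

/*
 * Hypothetical helper: use the low 16 bits of the segment's TCP sequence
 * number as the IPv4 ID (returned in network byte order). No shared
 * counter or lookup is needed, so the DF fast path stays cheap.
 */
static uint16_t ident_from_seq(uint32_t seq)
{
	return htons((uint16_t)(seq & 0xffff));
}
```

Under this assumption the ID changes between consecutive data segments whenever the payload length is not a multiple of 65536, but segments that do not advance the sequence number (pure ACKs, for example) would repeat the same ID, so a real implementation would need to account for that case.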


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues  (3c905B))
  2001-02-21 23:52   ` David S. Miller
@ 2001-02-22  0:10     ` Jordan Mendelson
  2001-02-22  0:50     ` Jordan Mendelson
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Jordan Mendelson @ 2001-02-22  0:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: ookhoi, Vibol Hou, Linux-Kernel, sim, netdev

"David S. Miller" wrote:
> 
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
> 
> There is no reason my patch should have this effect.

My guess is that the fast path prevented the need for looking up the
destination in some structure which is limited to ~4K entries (route
table?).


Jordan


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues  (3c905B))
  2001-02-21 23:52   ` David S. Miller
  2001-02-22  0:10     ` Jordan Mendelson
@ 2001-02-22  0:50     ` Jordan Mendelson
  2001-02-27  0:21     ` Simon Kirby
  2001-02-27  0:26     ` David S. Miller
  3 siblings, 0 replies; 18+ messages in thread
From: Jordan Mendelson @ 2001-02-22  0:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: ookhoi, Vibol Hou, Linux-Kernel, sim, netdev

"David S. Miller" wrote:
> 
> Jordan Mendelson writes:
>  > Now, if it didn't have the side effect of dropping packets left and
>  > right after ~4000 open connections (simultaneously), I could finally
>  > move our production system to 2.4.x.
> 
> The change I posted is unacceptable as-is because it adds unnecessary
> cost to a fast path.  The final change I actually use will likely
> involve using the TCP sequence numbers to calculate an "always
> changing" ID number in the IPv4 headers to placate these broken
> Windows machines.

Just for kicks I modified the fast path to use a globally incremented
count to see if it would fix both the Win9x problem and my 4K connection
problem, and it appears to be working just fine.

What probably happened was the sheer number of packets at 4K connections
without the fast path just slowed everything down to a crawl.


Thanks Dave,

Jordan
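
The message above does not include the modified code, so the following is only a guess at its shape: a userspace sketch in which the DF fast path takes its ID from a global counter instead of writing 0. `struct mock_iphdr`, `MOCK_IP_DF`, `global_ident` and `mock_slow_select_ident()` are hypothetical names; the C11 atomic is an assumption to keep the counter safe when several threads send at once.

```c
#include <arpa/inet.h>   /* htons */
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_IP_DF 0x4000  /* "don't fragment" bit of frag_off, host order */

/* Hypothetical stand-in for the two struct iphdr fields that matter here. */
struct mock_iphdr {
	uint16_t id;        /* network byte order */
	uint16_t frag_off;  /* network byte order */
};

/* Global counter shared by all connections; starts at 1 so no ID is 0 at first. */
static atomic_uint global_ident = 1;

/* Placeholder for the slow path (__ip_select_ident is not modelled here). */
static void mock_slow_select_ident(struct mock_iphdr *iph)
{
	iph->id = htons((uint16_t)atomic_fetch_add(&global_ident, 1));
}

/* Fast path keeps its early exit, but hands out a changing ID instead of 0. */
static void select_ident_global(struct mock_iphdr *iph)
{
	if (iph->frag_off & htons(MOCK_IP_DF))
		iph->id = htons((uint16_t)atomic_fetch_add(&global_ident, 1));
	else
		mock_slow_select_ident(iph);
}
```

The point of the design is that DF packets never reach the per-destination selection routine, yet consecutive packets still carry different IDs. The 16-bit value wraps around after 65536 packets, which is harmless for this purpose.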


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 22:30   ` Jordan Mendelson
@ 2001-02-22  8:28     ` Ookhoi
  0 siblings, 0 replies; 18+ messages in thread
From: Ookhoi @ 2001-02-22  8:28 UTC (permalink / raw)
  To: Jordan Mendelson; +Cc: David S. Miller, Vibol Hou, Linux-Kernel, sim

Hi Jordan,

> >  > We have exactly the same problem but in our case it depends on the
> >  > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip
> >  > header compression turned on, 3, a free internet access provider in
> >  > Holland called 'Wish' (which seems to stand for 'I Wish I had a faster
> >  > connection').
> >  > If we remove one of the three conditions, the connection is OK. It is
> >  > only tcp which is affected.
> >  > A packet on its way from linux server to windows client seems to get
> >  > dropped once and retransmitted. This makes the connection _very_ slow.
> > 
> > :-( I hate these buggy systems.
> > 
> > Does this patch below fix the performance problem and are the windows
> > clients win2000 or win95?
> 
> I wanted to see if this would fix the problem I was seeing with Win9x
> users on PPP w/ compression dialing up to Earthlink in the bay area
> (there are others, but it's the only one I can reproduce).
> 
> I compiled 2.4.1 with this change and for some odd reason, the kernel
> started dropping packets and became unusable (couldn't ssh in) after
> around 4050 connections were opened. I tested it also with 2.4.1-ac20
> and had the same problem right around 4050 connections.
> 
> This is on a VA Linux box with dual eepro100's (one used) connected to a
> Cisco 6509.

I patched two computers running 2.4.1-ac20. One of them is a fairly loaded
webserver. They have uptimes of 15.15 and 16.30 hours, and both are fine.
I didn't test with that many connections, though.

	Ookhoi


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 23:52   ` David S. Miller
  2001-02-22  0:10     ` Jordan Mendelson
  2001-02-22  0:50     ` Jordan Mendelson
@ 2001-02-27  0:21     ` Simon Kirby
  2001-02-27  0:26     ` David S. Miller
  3 siblings, 0 replies; 18+ messages in thread
From: Simon Kirby @ 2001-02-27  0:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jordan Mendelson, ookhoi, Vibol Hou, Linux-Kernel, netdev

On Wed, Feb 21, 2001 at 03:52:37PM -0800, David S. Miller wrote:

> There is no reason my patch should have this effect.
> 
> All of this appears to be caused by a bug in Windows TCP header
> compression: if the ID field of the IPv4 header does not change, then
> it drops every other packet.
> 
> The change I posted is unacceptable as-is because it adds unnecessary
> cost to a fast path.  The final change I actually use will likely
> involve using the TCP sequence numbers to calculate an "always
> changing" ID number in the IPv4 headers to placate these broken
> Windows machines.

Has such a patch gone in to the kernel yet?

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]


* Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
  2001-02-21 23:52   ` David S. Miller
                       ` (2 preceding siblings ...)
  2001-02-27  0:21     ` Simon Kirby
@ 2001-02-27  0:26     ` David S. Miller
  3 siblings, 0 replies; 18+ messages in thread
From: David S. Miller @ 2001-02-27  0:26 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Jordan Mendelson, ookhoi, Vibol Hou, Linux-Kernel, netdev


Simon Kirby writes:
 > Has such a patch gone in to the kernel yet?

Yep, it is in both the zerocopy and AC patches. (Linus is
away at the moment)

Later,
David S. Miller
davem@redhat.com



Thread overview: 18+ messages
2001-02-21  0:06 netdev issues (3c905B) Vibol Hou
2001-02-21  0:21 ` Martin Moerman
2001-02-21  0:34   ` Vibol Hou
2001-02-21  9:47 ` 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) Ookhoi
2001-02-21 13:12   ` Gregory Maxwell
2001-02-21 10:57 ` David S. Miller
2001-02-21 11:33   ` Ookhoi
2001-02-21 17:17   ` Ookhoi
2001-02-21 19:06   ` Vibol Hou
2001-02-21 19:22   ` Vibol Hou
2001-02-21 22:30   ` Jordan Mendelson
2001-02-22  8:28     ` Ookhoi
2001-02-21 23:49   ` Jordan Mendelson
2001-02-21 23:52   ` David S. Miller
2001-02-22  0:10     ` Jordan Mendelson
2001-02-22  0:50     ` Jordan Mendelson
2001-02-27  0:21     ` Simon Kirby
2001-02-27  0:26     ` David S. Miller
