* regression(?): starting with 2.6.21 sending packets became broken.
@ 2007-10-13 18:16 Peter Volkov
  2007-10-13 18:59 ` David
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Volkov @ 2007-10-13 18:16 UTC (permalink / raw)
  To: linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2399 bytes --]

Hello, all on the list.

Please CC me on replies, as I'm not subscribed. If this is the wrong
list, please tell me which one is correct.

Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
most TCP-based services freeze at some point during operation. I first
noticed this with ssh, but then found that at least one other service
behaves the same way. The problem lies somewhere in the kernel: I
compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
identical, since some options do not exist in every kernel, but the
enabled options appear to be the same), and I see the problem only with
2.6.21 and 2.6.22. I have tried to debug the problem a little, but not
much, as this is a production box working as a Linux-based
firewall/router.

First I ran tcpdump. An ssh connection to the router is not always
possible, as it often hangs before I get in, but after several attempts
a connection was established. On the client computer I started tcpdump
and worked for a while until the hang. The tcpdump output showed that
when I press a key, the packets are sent to the server and proper ACKs
are received. Later I found that all commands I type blindly are
executed on the router, but I receive no reply packets carrying data
(only pure ACKs). That is why nothing appears on the screen and the
session looks hung.
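
The difference between pure ACKs and segments that carry data can be made visible directly in the capture. As a minimal sketch (the helper name `data_filter` is hypothetical and not part of the original report), the function below builds the payload-length filter expression given in the tcpdump man page examples. It matches IPv4 only.

```shell
# Build a pcap filter that matches TCP segments on port $1 carrying payload.
# Payload length = IP total length - IP header length - TCP header length.
data_filter() {
    printf 'tcp port %s and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' "$1"
}
```

For example, `tcpdump -n -i eth0 "$(data_filter 22)"` on the client should print only ssh segments with payload, where `eth0` is a placeholder for the real interface. If replies from the router stop showing up here while an unfiltered capture still shows ACKs, that would match the symptom described.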

Next I logged in on the router itself and started an ssh connection from
the router to another server. It hung too. I attached strace and found
that ssh receives the key presses (read() calls in the output) and
writes them on to the kernel (write() calls), but tcpdump on the router
shows no packets. So the packets enter the kernel and are lost somewhere
inside.

Now some information about my system. It is a Pentium 4 system with
hyper-threading enabled; cpuinfo and lspci output are attached. The
kernels were built with "gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)" and
binutils version 2.17. The .config files for all the kernels I mentioned
are available here:

http://theor.ran.gpi.ru/linux-2.6.19-gentoo-r5-config (works)
http://theor.ran.gpi.ru/linux-2.6.21-gentoo-r4-config (does not work)
http://theor.ran.gpi.ru/linux-2.6.22-gentoo-r8-config (does not work)
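
Whether the enabled options really match across these configs can be checked mechanically once the files are downloaded. The following is only a rough sketch: the helper names `enabled_opts` and `config_diff` are hypothetical, and local copies of the config files are assumed.

```shell
# Print every option that is set (=y, =m, or a value) in a kernel .config, sorted.
enabled_opts() {
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$1" | sort
}

# Show options whose setting differs between two configs.
# "<" lines appear only in the first file, ">" lines only in the second.
config_diff() {
    cfg_a=$(mktemp)
    cfg_b=$(mktemp)
    enabled_opts "$1" > "$cfg_a"
    enabled_opts "$2" > "$cfg_b"
    # diff exits non-zero when the files differ, and grep when nothing
    # matches, so the pipeline status is ignored.
    diff "$cfg_a" "$cfg_b" | grep -E '^[<>]' || true
    rm -f "$cfg_a" "$cfg_b"
}
```

Something like `config_diff linux-2.6.19-gentoo-r5-config linux-2.6.21-gentoo-r4-config` would then list the options that differ between the working and the non-working kernel. Options that were renamed between releases show up as differences too, so the output still needs reading by hand.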

Besides the standard Gentoo patchsets, all kernels have the IMQ and
IPSET patches.

Does anybody have any idea what is going on with the latest kernels? How
can I debug this further?

-- 
Peter.

[-- Attachment #1.2: router-lspci.txt --]
[-- Type: text/plain, Size: 1174 bytes --]

00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)
02:0a.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
02:0b.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
03:04.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
03:05.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
04:04.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
04:05.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)


[-- Attachment #1.3: routers-cpuinfo.txt --]
[-- Type: text/plain, Size: 1396 bytes --]

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3198.784
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc
pni monitor ds_cpl cid xtpr
bogomips        : 6401.59

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3198.784
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc
pni monitor ds_cpl cid xtpr
bogomips        : 6397.43


[-- Attachment #2: This part of the message is digitally signed --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: regression(?): starting with 2.6.21 sending packets became broken.
  2007-10-13 18:16 regression(?): starting with 2.6.21 sending packets became broken Peter Volkov
@ 2007-10-13 18:59 ` David
  2007-10-13 20:35   ` Jan Engelhardt
  2007-10-28  8:33   ` Peter Volkov
  0 siblings, 2 replies; 5+ messages in thread
From: David @ 2007-10-13 18:59 UTC (permalink / raw)
  To: Peter Volkov; +Cc: linux-kernel

Peter Volkov wrote:
> Hello, all on the list.
>
> Please CC me on replies, as I'm not subscribed. If this is the wrong
> list, please tell me which one is correct.
>
> Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
> most TCP-based services freeze at some point during operation. I first
> noticed this with ssh, but then found that at least one other service
> behaves the same way. The problem lies somewhere in the kernel: I
> compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
> identical, since some options do not exist in every kernel, but the
> enabled options appear to be the same), and I see the problem only with
> 2.6.21 and 2.6.22. I have tried to debug the problem a little, but not
> much, as this is a production box working as a Linux-based
> firewall/router.
>   
Try

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

I bet you have broken router(s) between your machine and the problem
site(s).
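
To try this without losing the previous setting, the change can be wrapped so that the old value is restored afterwards. This is a sketch only: the names `WSCALE_FILE`, `wscale_get`, and `with_wscale_off` are hypothetical, and writing to the real /proc file requires root.

```shell
WSCALE_FILE=/proc/sys/net/ipv4/tcp_window_scaling

# Print the current setting: 1 means window scaling is enabled, 0 disabled.
wscale_get() {
    cat "${1:-$WSCALE_FILE}"
}

# Run a command with window scaling disabled, then restore the old value
# and return the command's exit status.
with_wscale_off() {
    ws_file=$WSCALE_FILE
    ws_old=$(wscale_get "$ws_file")
    echo 0 > "$ws_file"
    "$@"
    ws_status=$?
    echo "$ws_old" > "$ws_file"
    return "$ws_status"
}
```

For example, `with_wscale_off ssh user@host` (with `user@host` as a placeholder) opens a test connection with scaling off. The setting is only negotiated when a connection is opened, so connections that already exist are not affected. `sysctl -w net.ipv4.tcp_window_scaling=0` is an equivalent way to write the value.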

Cheers
David

* Re: regression(?): starting with 2.6.21 sending packets became broken.
  2007-10-13 18:59 ` David
@ 2007-10-13 20:35   ` Jan Engelhardt
  2007-10-13 23:23     ` Stephen Hemminger
  2007-10-28  8:33   ` Peter Volkov
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Engelhardt @ 2007-10-13 20:35 UTC (permalink / raw)
  To: David; +Cc: Peter Volkov, linux-kernel


On Oct 13 2007 19:59, David wrote:
>Try
>
>echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
>
>I bet you have broken router(s) between your machine and the problem
>site(s).

There is an xt_TCPOPTSTRIP module in the works that allows you to strip
Window Scaling only on the connections you want (rather than globally);
it looks set to land in 2.6.24 at the earliest, though a standalone
patch is also available.
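
As an illustration of what such a per-connection rule might look like, here is a sketch that assumes the syntax of the TCPOPTSTRIP target as documented in later iptables releases (mangle table, `--strip-options` with the symbolic name `wscale`). The patch under discussion at the time may have differed, and the helper name `strip_wscale` is hypothetical.

```shell
# Append a mangle-table rule that strips the window scale option from
# TCP packets sent to destination $1. Requires root and a kernel/iptables
# build that provides the TCPOPTSTRIP target.
# IPTABLES can be overridden (for example with "echo") for a dry run.
strip_wscale() {
    ${IPTABLES:-iptables} -t mangle -A POSTROUTING -p tcp -d "$1" \
        -j TCPOPTSTRIP --strip-options wscale
}
```

Setting `IPTABLES=echo` before calling `strip_wscale 192.0.2.10` prints the rule arguments instead of installing them, which is a cheap way to review the rule first. The address is a documentation placeholder.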

* Re: regression(?): starting with 2.6.21 sending packets became broken.
  2007-10-13 20:35   ` Jan Engelhardt
@ 2007-10-13 23:23     ` Stephen Hemminger
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2007-10-13 23:23 UTC (permalink / raw)
  To: linux-kernel

On Sat, 13 Oct 2007 22:35:25 +0200 (CEST)
Jan Engelhardt <jengelh@computergmbh.de> wrote:

> 
> On Oct 13 2007 19:59, David wrote:
> >Try
> >
> >echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
> >
> >I bet you have broken router(s) between your machine and the problem
> >site(s).
> 
> There is an xt_TCPOPTSTRIP module in the works that allows you to strip
> Window Scaling only on the connections you want (rather than globally);
> it looks set to land in 2.6.24 at the earliest, though a standalone
> patch is also available.

You can also do it on a per-route basis, which is easier than bothering
with filtering rules: just enforce a window size limit.

	ip route add {broken_dst}/32 via {gateway} window 65535


Long description at:
	http://lwn.net/Articles/92727/

-- 
Stephen Hemminger <shemminger@linux-foundation.org>


* Re: regression(?): starting with 2.6.21 sending packets became broken.
  2007-10-13 18:59 ` David
  2007-10-13 20:35   ` Jan Engelhardt
@ 2007-10-28  8:33   ` Peter Volkov
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Volkov @ 2007-10-28  8:33 UTC (permalink / raw)
  To: David, Jan Engelhardt; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3783 bytes --]

Hello, David, Jan.

On Sat, 13/10/2007 at 19:59 +0100, David wrote:
> Try
> 
> echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
> 
> I bet you have broken router(s) between your machine and the problem
> site(s).

Thank you for your help, but it turns out that the problem comes from
the IMQ patch. I reported this on the imqlinux mailing list, but it
seems to be closed to those who do not have a Yahoo ID, so I am copying
my report and the answer here.

===========my investigation of the problem=================

This bug was reported in our Bugzilla (bugs.gentoo.org/195731), but
since I found that IMQ is the root of the problem, I wanted to share my
experience here.

Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
most TCP-based services freeze at some point during operation. I first
noticed this with ssh, but then found that at least one other service
behaves the same way. The problem lies somewhere in the kernel: I
compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
identical, since some options do not exist in every kernel, but the
enabled options appear to be the same), and I see the problem only with
2.6.21 and 2.6.22. I have tried to debug the problem a little, but not
much, as this is a production box working as a Linux-based
firewall/router.

First I ran tcpdump. An ssh connection to the router is not always
possible, as it often hangs before I get in, but after several attempts
a connection was established. On the client computer I started tcpdump
and worked for a while until the hang. The tcpdump output showed that
when I press a key, the packets are sent to the server and proper ACKs
are received. Later I found that all commands I type blindly are
executed on the router, but I receive no reply packets carrying data
(only pure ACKs). That is why nothing appears on the screen and the
session looks hung.

Next I logged in on the router itself and started an ssh connection from
the router to another server. It hung too. I attached strace and found
that ssh receives the key presses (read() calls in the output) and
writes them on to the kernel (write() calls), but tcpdump on the router
shows no packets. So the packets enter the kernel and are lost somewhere
inside.

This problem was reproduced both on a single-core amd64 system and on an
x86 system with hyper-threading, so I suspect anybody can reproduce it.
Just start `yes`, which produces a lot of output, and then press Ctrl+C
to interrupt it. The connection hung here at about that moment.

The suggestion "echo 0 > /proc/sys/net/ipv4/tcp_window_scaling" did not
help here.

The end of the story is that I localized the problem to this IMQ patch:
http://www.linuximq.net/patchs/linux-2.6.21-img2.diff
With clean gentoo sources the connection does not freeze; with this
patch applied I have the problem. BTW, this patch
http://www.actusa.net/~linuximq/linux-2.6.23-imq.diff does not have the
problem here either.

Thank you for your attention. Maybe it would be a good idea to mark that
patch as questionable on the site?

===========================================================


======== vlad031's answer on the linuximq mailing list ========

Re: regression(?): starting with 2.6.21 sending packets became broken 

We already know that ... almost everybody does ... that's why we have
uploaded the good patches here:
http://www.actusa.net/~linuximq/

Andree doesn't seem to be taking care of this project anymore...

Please note that the 2.6.23 kernel has a lot of bugs and we don't
recommend using it yet, as we had some imq-related problems with it too
(however, they don't seem to come from the imq patch)

Cheers!

===========================================================

-- 
Peter.

[-- Attachment #2: This part of the message is digitally signed --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
