linux-kernel.vger.kernel.org archive mirror
* gateway icmp redirect handling problem (3.0.36-3.0.23)
From: Simon Roscic @ 2012-07-20 22:44 UTC
  To: linux-kernel

Hello,

I'm experiencing the following problem with kernel versions 3.0.36 
(down to 3.0.23):

on our network we all have one default gateway, 10.1.1.254, but
there are some networks for which we have another gateway, and for
these networks the default gateway sends an icmp redirect.

let's assume my test machine has ip 10.1.20.79 with netmask 255.255.0.0
and my default gateway is 10.1.1.254. i now ping 10.109.98.11, and my
default gateway (10.1.1.254) sends me an icmp redirect to another
gateway (10.1.1.1). at first everything works as expected and i get the
replies from 10.109.98.11, but not for long: after approx. 60 seconds
i only get "ping: sendmsg: Network is down".
(exact same problem with all other tcp/udp protocols, but i used ping 
for the tests because it also prints the redirect messages to the 
console)
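
The addressing in this scenario can be checked with a small sketch. It is only an illustration, assuming the usual expectation that a redirect points to a gateway on the directly connected network. The function names are made up for this example, and it is not a copy of the checks the kernel performs.

```python
import ipaddress

# addresses taken from the description above
host_if = ipaddress.ip_interface("10.1.20.79/255.255.0.0")
default_gw = ipaddress.ip_address("10.1.1.254")
redirect_gw = ipaddress.ip_address("10.1.1.1")
target = ipaddress.ip_address("10.109.98.11")


def on_link(addr, iface=host_if):
    """True if addr lies inside the directly connected network of iface."""
    return addr in iface.network


def redirect_looks_plausible(old_gw, new_gw, dest, iface=host_if):
    """Hypothetical sanity check, not the kernel's validation logic:
    both gateways are on-link, the destination is not, and the
    new gateway differs from the old one."""
    return (
        on_link(old_gw, iface)
        and on_link(new_gw, iface)
        and not on_link(dest, iface)
        and old_gw != new_gw
    )


if __name__ == "__main__":
    print("connected network:", host_if.network)
    print("redirect plausible:",
          redirect_looks_plausible(default_gw, redirect_gw, target))
```

With the addresses from this report both gateways sit inside 10.1.0.0/16 and the ping target does not, so the redirect itself looks like a normal one.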

so let's have a closer look:

not ok - kernel versions 3.0.36 down to 3.0.23:
-----------------------------------------------

test-simon:~ # ping 10.109.98.11
...
64 bytes from 10.109.98.11: icmp_seq=62 ttl=60 time=12.1 ms
64 bytes from 10.109.98.11: icmp_seq=63 ttl=60 time=11.6 ms
ping: sendmsg: Network is down
ping: sendmsg: Network is down

when looking at "ip neigh", the "ping: sendmsg: Network is down" message
appears at the exact moment the arp entry for the default gateway
(10.1.1.254) gets removed from the arp cache:

ping "OK"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

ping "dead"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
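
To catch this moment without watching by hand, the "ip neigh" output can be parsed. This is a minimal sketch, assuming the state is the last word of each line as in the output shown above. The function names and the "MISSING" marker are made up for this example.

```python
def parse_ip_neigh(text):
    """Map neighbour ip -> state, taking the last word of each line as state."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        entries[fields[0]] = fields[-1]
    return entries


def gateway_state(text, gateway):
    """State of the gateway entry, or "MISSING" if it is not in the cache."""
    return parse_ip_neigh(text).get(gateway, "MISSING")


# the two outputs quoted above
PING_OK = """\
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE
"""

PING_DEAD = """\
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
"""

if __name__ == "__main__":
    print("ping ok:  ", gateway_state(PING_OK, "10.1.1.254"))
    print("ping dead:", gateway_state(PING_DEAD, "10.1.1.254"))
```

Feeding it the output of "ip neigh" once per second while ping is running should show the entry going from STALE to MISSING at about the time the error starts.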

so it seems that when the default gateway is removed from the arp cache
something goes wrong in the kernel route handling. i don't know the
internals of the linux route handling, so i need your help: any ideas
what's going wrong?

i did a lot of tests. the problem i described first appears in kernel
version 3.0.23, and in the changelog of 3.0.23 i found the following
two commits:
(http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.0.23)

commit 42ab5316ddcaa0de23e88e8a3d363c767b9ab0b3
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Nov 18 15:24:32 2011 -0500
ipv4: fix redirect handling

commit bebee22bcbf0026f92141990972bd5863ef9b69c
Author: Flavio Leitner <fbl@redhat.com>
Date:   Mon Oct 24 02:56:38 2011 -0400
route: fix ICMP redirect validation

i then took the net/ipv4/route.c file from kernel 3.0.22 and replaced
the version in 3.0.23 with it. this reverts the two patches mentioned
above (if i haven't overlooked something), and after that the problem
disappears.
so those two patches surely fixed some problem, but for kernel versions
3.0.23-3.0.36 they broke the gateway icmp redirect handling as
described here.

i did some further tests with different kernel versions:
3.5-rc6: OK
3.4.4: OK
3.2.22: OK
3.0.1 - 3.0.22: OK
3.0.23 - 3.0.36: not OK
2.6.35.13: OK
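
For checking a given machine against this list, the results can be written down as a small lookup. This is only a sketch of the test results above: it assumes the two 3.0.x ranges are inclusive, the function names are made up, and every version not listed is reported as "untested" instead of guessed.

```python
def parse_version(version):
    """'3.0.23' -> (3, 0, 23); a suffix such as '-rc6' is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


# single versions that were tested and worked
TESTED_OK = {"3.5-rc6", "3.4.4", "3.2.22", "2.6.35.13"}


def reported_status(version):
    """Return "OK", "not OK" or "untested" according to the list above."""
    v = parse_version(version)
    if (3, 0, 1) <= v <= (3, 0, 22):
        return "OK"
    if (3, 0, 23) <= v <= (3, 0, 36):
        return "not OK"
    if version in TESTED_OK:
        return "OK"
    return "untested"


if __name__ == "__main__":
    for ver in ("3.0.22", "3.0.23", "3.0.34", "3.2.22", "3.1.5"):
        print(ver, "->", reported_status(ver))
```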

now let's have a closer look at a kernel version which works:
-------------------------------------------------------------

this is from 3.5-rc6, but 3.4.4, 3.2.22 and 2.6.35.13 also behave
exactly this way. 3.0.1-3.0.22 behave slightly differently, see note
below.

test-simon:~ # ping 10.109.98.11
PING 10.109.98.11 (10.109.98.11) 56(84) bytes of data.
64 bytes from 10.109.98.11: icmp_seq=1 ttl=60 time=15.2 ms
 From 10.1.1.254: icmp_seq=2 Redirect Host(New nexthop: 10.1.1.1)
...

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

and after approx 60 or so seconds:

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

and ping (and everything else) is still working, as expected.

note:
-----

on 3.0.1-3.0.22:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) 
to my test machine, while running tcpdump on the default gateway 
(10.1.1.254) i see every ping packet also arriving there and also some 
icmp redirect messages going out to my test machine.
but everything works so i think my test machine is correctly talking to 
the destination using the other gateway (10.1.1.1).
i also sniffed a windows 7 client pc, it looks the same there, so 
possibly no problem, but i mention this because kernel versions 3.5-rc6, 
3.4.4, 3.2.22 and 2.6.35.13 act differently (see below).

on 3.0.23-3.0.36:

i see lots of icmp redirects sent from the default gateway (10.1.1.254)
to my test machine. while running tcpdump on the default gateway
(10.1.1.254) i see up to 20 ping packets arriving there and also up to
17 icmp redirect messages going out to my test machine. after the 20th
ping packet i don't see further ping packets arriving at the default
gateway, so i think my test machine is then only talking to the other
gateway (10.1.1.1).
...
17:48:41.643952 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
...
17:48:44.649008 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64
17:48:44.649018 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64

on 3.5-rc6, 3.4.4, 3.2.22 and 2.6.35.13:

here it looks different, and for me this is the expected behavior, or 
at least the behavior i have seen from lots of linux machines on my 
network. i see 1-2 icmp redirects sent from the default gateway 
(10.1.1.254) to my test machine, while running tcpdump on the default 
gateway (10.1.1.254) i only see up to 2 ping packets arriving then 
nothing, so then my test machine seems to only talk to the other gateway 
(10.1.1.1).

17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64
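
The numbers above were counted by hand. A minimal sketch for counting them from a saved tcpdump text output could look like this. It assumes the line format shown in the traces above, re-joins the lines wrapped by the mail client, and all names in it are made up for this example.

```python
import re

TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}:\d{2}\.\d+ ")
ECHO_RE = re.compile(r"ICMP echo request, id \d+, seq (\d+)")
REDIRECT_RE = re.compile(r"ICMP redirect (\S+) to host (\S+),")


def unwrap(text):
    """Re-join wrapped lines: a record starts with a timestamp,
    any other line continues the previous record."""
    records = []
    for line in text.splitlines():
        if TIMESTAMP_RE.match(line) or not records:
            records.append(line.strip())
        else:
            records[-1] += " " + line.strip()
    return records


def summarize(text):
    """Collect the distinct echo request sequence numbers and count redirects."""
    seqs, redirects = set(), 0
    for record in unwrap(text):
        echo = ECHO_RE.search(record)
        if echo:
            seqs.add(int(echo.group(1)))
        elif REDIRECT_RE.search(record):
            redirects += 1
    return {"echo_seqs": sorted(seqs), "redirects": redirects}


# the trace quoted above, wrapped the same way
SAMPLE = """\
17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id
10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id
10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id
10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11
to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id
10766, seq 2, length 64
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Each echo request shows up twice in the capture, so the sketch counts distinct sequence numbers instead of lines.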

...

(before someone asks why i "must" use kernel 3.0.x ... because these are
SLES 11 SP2 VMs and they currently ship kernel 3.0.34)

i hope i described the problem in a way that the kernel network
stack maintainers can understand. please contact me if you have
further questions, and please CC me as i am not subscribed to
linux-kernel. this message is already on linux-netdev, if you wish you
can also CC your answer there.

kind regards,
Simon Roscic.



* Re: gateway icmp redirect handling problem (3.0.36-3.0.23)
From: Rune Darrud @ 2012-08-28 17:35 UTC
  To: linux-kernel

See inline answer below.

Simon Roscic <simon <at> segfault.info> writes:

> 
> i did some further tests with different kernel versions:
> 3.5-rc6: OK
> 3.4.4: OK
> 3.2.22: OK
> 3.0.1 - 3.0.22: OK
> 3.0.23 - 3.0.36: not OK
> 2.6.35.13: OK

Let me add that kernel 3.0.38 on SLES 11 SP2 shows the same problem. A
restart of the network resolves it temporarily, for a few hours. The machine
ran fine for a few hours after the upgrade from 2.6.3x to 3.0.38 via zypper
before the problem appeared, so this is not a good situation.

> 
> now lets have a closer look at a kernel version which works:
> .....
> 
> (before someone asks why i "must" use kernel 3.0.x ... because these are
> SLES 11 SP2 VMs and they currently ship kernel 3.0.34)

Going to raise an SR with Novell about this.

> 
> i hope i described the problem in a way so that the kernel network 
> stack maintainers can understand the problem, please contact me if you
> have further questions, and please CC me as i am not subscribed to 
> linux-kernel. this message is already on linux-netdev, if you wish you 
> can CC your answer also there.
> 
> kind regards,
> Simon Roscic.
> 
> 

Best regards,
Rune "TheFlyingCorpse" Darrud



