* 3.0: unexpected route cache entry for wrong segment?
@ 2012-02-09 17:02 Michael Tokarev
  2012-02-09 17:45 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2012-02-09 17:02 UTC
  To: netdev

Hello.

I'm observing a situation where a single IP
address from an entirely different segment is routed
locally, as if it were on a directly connected network.

Here's how.  First, the short version, to show the idea:

A host with a single eth0 interface and a single IP address
(not counting the loopback interface):

$ ip addr
8: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:c0:a8:b1:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.177.2/26 scope global eth0

$ ip route
default via 192.168.177.5 dev eth0
192.168.177.0/26 dev eth0  proto kernel  scope link  src 192.168.177.2

$ ip neigh
...
192.168.177.5 dev eth0 lladdr 00:90:27:30:6d:1c REACHABLE
192.168.177.33 dev eth0 lladdr 38:60:77:25:3f:95 REACHABLE
192.168.19.166 dev eth0  FAILED
192.168.177.21 dev eth0 lladdr 52:54:c0:a8:b1:15 REACHABLE

The address in question is 192.168.19.166.  It should not
be tried on the locally connected Ethernet segment; instead,
it should go to the (default) gateway at 192.168.177.5.

This machine is running a 3.0.18 kernel.  The gateway (also
running this kernel) can reach the IP in question just fine
(it is 2 hops away from the gateway and is not directly
reachable from either the gateway or the machine in question).

After some searching, we found a very similar-looking
issue:

 http://lists.openwall.net/netdev/2011/11/15/126
  "Unable to flush ICMP redirect routes in kernel 3.0+"

with a good reproducer:

 http://lists.openwall.net/netdev/2011/11/16/138

The issue, however, is that in our case I can't reproduce
this problem at all using the method described by Ivan Zahariev
in the last message: sending redirects from the gateway for
"random" addresses does not create corresponding "persistent"
cache entries; once the route on the gateway is removed, that
IP address starts working again from the machine in question.

So now we have only one IP address that behaves like this,
and I can't get other addresses to repeat its behaviour.

The problem appeared suddenly, while the network was in
use.

What is also interesting here is that the gateway should
never send a redirect like that, because it has an explicit
route for that network pointing to an entirely different
machine.

I can work around the _current_ problem we're facing by
moving the host in question (192.168.19.166) to another
IP address.  But I'd love to understand what's going on
here.

Also, it appears that the patch that emerged from that
discussion hasn't been included in any stable kernel release
so far.  Is there some issue with it?

And since I can't reproduce the issue as described above,
I have one more question: should it be reproducible?

And finally, here are some more details about our setup.
It is actually a "bit" more complex, involving bridges,
VLANs, veth and tap devices.

The "host" in question is an LXC guest on a veth interface.
Its veth interface is connected to a bridge "tls-br" on the
host.  I'm still omitting some details (like other LXC
guests, which have a very similar config, and also KVM
guests with tap interfaces).

 host$ ip addr
 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
 3: tls-vlan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master tls-br state UP
     link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
 4: tls-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
     link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
     inet 192.168.177.15/26 brd 192.168.177.63 scope global tls-br
 9: veth-tsrv: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master tls-br state UP qlen 1000
     link/ether 5e:e8:4f:67:80:17 brd ff:ff:ff:ff:ff:ff

tls-br connects tls-vlan@eth0 and veth-tsrv.  It has an
address from the same 192.168.177/26 segment as the guest
in question.

 host$ ip route
 default via 192.168.177.5 dev tls-br
 192.168.177.0/26 dev tls-br  proto kernel  scope link  src 192.168.177.15
 (this is the complete routing table; there are no other routes)

What is also very interesting is that this problem with
this single IP address affects ALL LXC guests on this
host at once, as well as the host itself:

 host$ ip neigh
 192.168.177.35 dev tls-br lladdr 6c:f0:49:9d:f2:0c STALE
 192.168.19.166 dev tls-br  FAILED
 192.168.177.38 dev tls-br lladdr 38:60:77:25:3f:9c STALE
 192.168.177.5 dev tls-br lladdr 00:90:27:30:6d:1c DELAY
 ...

(after trying to ping it).

Each "subdivision" on this host has its own ARP table, but
every subdivision (the host itself or any of its LXC guests, which
all have a similar config) always tries to reach this very
IP address directly.

 otherLXCguest$ ip n
 192.168.19.166 dev eth0  INCOMPLETE
 192.168.177.15 dev eth0 lladdr 00:1f:c6:ef:e5:1b STALE
 192.168.177.5 dev eth0 lladdr 00:90:27:30:6d:1c DELAY

So it looks like something does not work correctly across
namespaces.

Any clue what's going on?

Thank you!

/mjt

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-09 17:02 3.0: unexpected route cache entry for wrong segment? Michael Tokarev
@ 2012-02-09 17:45 ` Eric Dumazet
  2012-02-09 18:05   ` Eric Dumazet
  2012-02-09 18:37   ` Michael Tokarev
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-02-09 17:45 UTC
  To: Michael Tokarev; +Cc: netdev

On Thursday 09 February 2012 at 21:02 +0400, Michael Tokarev wrote:
> Hello.
[]
> Any clue what's going on?
>
> Thank you!

Did you try applying these commits by hand:

7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae   // added in 3.2
(route: fix ICMP redirect validation)

and
9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
(ipv4: fix redirect handling)

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-09 17:45 ` Eric Dumazet
@ 2012-02-09 18:05   ` Eric Dumazet
  2012-02-09 18:37   ` Michael Tokarev
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-02-09 18:05 UTC
  To: Michael Tokarev; +Cc: netdev

On Thursday 09 February 2012 at 18:45 +0100, Eric Dumazet wrote:

> Did you try applying these commits by hand:
> 
> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae   // added in 3.2
> (route: fix ICMP redirect validation)
> 
> and
> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
> (ipv4: fix redirect handling)

Oh well, please forgive my stupid questions.

David is currently working on backporting to 3.0 all necessary fixes for
this exact problem.

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-09 17:45 ` Eric Dumazet
  2012-02-09 18:05   ` Eric Dumazet
@ 2012-02-09 18:37   ` Michael Tokarev
  2012-02-15 12:10     ` Michael Tokarev
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2012-02-09 18:37 UTC
  To: Eric Dumazet; +Cc: netdev

On 09.02.2012 21:45, Eric Dumazet wrote:
> On Thursday 09 February 2012 at 21:02 +0400, Michael Tokarev wrote:
[]
>> The issue, however, is that in our case I can't reproduce
>> this problem at all using the method described by Ivan Zahariev
>> in the last message: sending redirects from the gateway for
>> "random" addresses does not create corresponding "persistent"
>> cache entries; once the route on the gateway is removed, that
>> IP address starts working again from the machine in question.
>>
>> So now we have only one IP address that behaves like this,
>> and I can't get other addresses to repeat its behaviour.
>>
>> The problem appeared suddenly, while the network was in
>> use.
>>
>> What is also interesting here is that the gateway should
>> never send a redirect like that, because it has an explicit
>> route for that network pointing to an entirely different
>> machine.
[]
>> What is also very interesting is that this problem with
>> this single IP address affects ALL LXC guests on this
>> host at once, as well as the host itself:
>>
>>  host$ ip neigh
>>  192.168.177.35 dev tls-br lladdr 6c:f0:49:9d:f2:0c STALE
>>  192.168.19.166 dev tls-br  FAILED
>>  192.168.177.38 dev tls-br lladdr 38:60:77:25:3f:9c STALE
>>  192.168.177.5 dev tls-br lladdr 00:90:27:30:6d:1c DELAY
>>  ...
>>
>> (after trying to ping it).
>>
>> Each "subdivision" on this host has its own ARP table, but
>> every subdivision (the host itself or any of its LXC guests, which
>> all have a similar config) always tries to reach this very
>> IP address directly.
>>
>>  otherLXCguest$ ip n
>>  192.168.19.166 dev eth0  INCOMPLETE
>>  192.168.177.15 dev eth0 lladdr 00:1f:c6:ef:e5:1b STALE
>>  192.168.177.5 dev eth0 lladdr 00:90:27:30:6d:1c DELAY
>>
>> So it looks like something does not work correctly across
>> namespaces.
>>
>> Any clue what's going on?
>>
>> Thank you!
> 
> Did you try applying these commits by hand:
> 
> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae   // added in 3.2
> (route: fix ICMP redirect validation)
> 
> and
> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
> (ipv4: fix redirect handling)

I haven't tried anything yet.  As mentioned above, this problem
appeared today, all of a sudden, and what's most important
(IMHO) is that I cannot reproduce it.  The host hasn't been
rebooted; I was thinking about running some experiments on it
before doing anything else.

But I blocked this specific IP address on the gateway, and the
cached entry expired after 10 minutes (that host tried to
check mail every minute, so the inactivity timer certainly
never triggered).  So at least one difference in behaviour is
now gone.

What bothers me more are three other issues around this:

1. Why was this specific IP cached in the first place?  I don't
   expect any ICMP redirects for that network at all, nor any
   spoofing or malicious traffic.

2. I can't reproduce the issue by forcing ICMP redirects.
   Maybe my original problem was not caused by a redirect but by
   something else?  I don't know.

3. Why does it affect the whole host and all of the many separate
   network namespaces on it?  _All_ LXC containers started treating
   this IP as reachable on the local subnet at once, even those
   that had never sent any packets to that IP before!

And in another email you wrote:

> Oh well, please forgive my stupid questions.
>
> David is currently working on backporting to 3.0 all necessary fixes for
> this exact problem.

I haven't even tried rebooting the host because, even if I
did, I would have no way to verify whether the problem is fixed,
or even whether it is the same problem or something else.  The
namespace aspect is the most interesting part here, IMHO.

But at least now I know why it hasn't appeared in 3.0 stable :)

Thank you!

/mjt

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-09 18:37   ` Michael Tokarev
@ 2012-02-15 12:10     ` Michael Tokarev
  2012-02-15 12:44       ` Michael Tokarev
  2012-02-15 12:46       ` Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Tokarev @ 2012-02-15 12:10 UTC
  To: Eric Dumazet; +Cc: netdev, David Miller

On 09.02.2012 22:37, Michael Tokarev wrote:
> On 09.02.2012 21:45, Eric Dumazet wrote:
[]
>> Did you try applying these commits by hand:
>>
>> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae   // added in 3.2
>> (route: fix ICMP redirect validation)
>>
>> and
>> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
>> (ipv4: fix redirect handling)
>
> I haven't tried anything yet.  As mentioned above, this problem
> appeared today, all of a sudden, and what's most important
> (IMHO) is that I cannot reproduce it.  The host hasn't been
> rebooted; I was thinking about running some experiments on it
> before doing anything else.
>
> But I blocked this specific IP address on the gateway, and the
> cached entry expired after 10 minutes (that host tried to
> check mail every minute, so the inactivity timer certainly
> never triggered).  So at least one difference in behaviour is
> now gone.
>
> What bothers me more are three other issues around this:
>
> 1. Why was this specific IP cached in the first place?  I don't
>    expect any ICMP redirects for that network at all, nor any
>    spoofing or malicious traffic.
>
> 2. I can't reproduce the issue by forcing ICMP redirects.
>    Maybe my original problem was not caused by a redirect but by
>    something else?  I don't know.
>
> 3. Why does it affect the whole host and all of the many separate
>    network namespaces on it?  _All_ LXC containers started treating
>    this IP as reachable on the local subnet at once, even those
>    that had never sent any packets to that IP before!

Do you have any insight into all this?  It smells fishy
somehow.  Or maybe not: all LXC guests here are connected
to the same bridge on the host, so apparently it is the host that
does the bad thing, thereby affecting all the guests (each guest
routes packets over veth to the host, which does the further
bridging/routing).

> And in another email you wrote:
>
>> David is currently working on backporting to 3.0 all necessary fixes for
>> this exact problem.

David, any progress with these?

7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae "route: fix ICMP redirect validation"
applies cleanly to 3.0, but 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
"ipv4: fix redirect handling" does not, due to some intervening changes,
though these should be easy to sort out.  Should I refresh this
patch myself?  It should be doable, I think.

Thank you!

/mjt

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-15 12:10     ` Michael Tokarev
@ 2012-02-15 12:44       ` Michael Tokarev
  2012-02-15 12:46       ` Eric Dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2012-02-15 12:44 UTC
  To: Eric Dumazet; +Cc: netdev, David Miller

[-- Attachment #1: Type: text/plain, Size: 1920 bytes --]

On 15.02.2012 16:10, Michael Tokarev wrote:
> On 09.02.2012 22:37, Michael Tokarev wrote:
>> On 09.02.2012 21:45, Eric Dumazet wrote:
> []
>>> Did you try applying these commits by hand:
>>>
>>> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae // added in 3.2
>>> (route: fix ICMP redirect validation)
>>>
>>> and
>>> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
>>> (ipv4: fix redirect handling)
[]
>>> David is currently working on backporting to 3.0 all necessary fixes for
>>> this exact problem.
>
> David, any progress with these?
>
> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae "route: fix ICMP redirect validation"
> applies cleanly to 3.0, but 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
> "ipv4: fix redirect handling" does not, due to some intervening changes,
> though these should be easy to sort out.  Should I refresh this
> patch myself?  It should be doable, I think.

A quick follow-up.

9cc20b268a5a14f5e57b8ad405a83513ab0d78dc does not apply to the current 3.0-stable
(3.0.21) because the last release included a backport of d3aaeb38c40e5a6c08dd31a1b64da65c4352be36
"net: fix NULL dereferences in check_peer_redir()", which changed the
check_peer_redir() routine a bit, so that it now differs from the one in
subsequent 3.2+ releases.  And 9cc20b268a5a... moves this routine up in the
net/ipv4/route.c file.

Here's the difference between check_peer_redir() in 3.0.21 and 3.2+:

          dst_confirm(&rt->dst);

          rt->rt_gateway = peer->redirect_learned.a4;
-        n = __arp_bind_neighbour(&rt->dst, rt->rt_gateway);
+
+        n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
          if (IS_ERR(n))
                  return PTR_ERR(n);
          old_n = xchg(&rt->dst._neighbour, n);


With this change in mind, attached is a "backport" of 9cc20b268a5a...
to 3.0.21, which applies on top of 7cc9150ebe8ec0... "route: fix
ICMP redirect validation".

I'm building a new kernel with the two patches applied.

Thanks!

/mjt

[-- Attachment #2: ipv4-fix-redirect-handling.diff --]
[-- Type: text/x-diff, Size: 4756 bytes --]

Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed, 15 Feb 2012 16:39:00 +0400
Subject: ipv4: fix redirect handling

[ Upstream commit 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc ]
    
    commit f39925dbde77 (ipv4: Cache learned redirect information in
    inetpeer.) introduced a regression in ICMP redirect handling.
    
    It assumed ipv4_dst_check() would be called because all possible routes
    were attached to the inetpeer we modify in ip_rt_redirect(), but that's
    not true.
    
    commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix
    this but solution was not complete. (It fixed only one route)
    
    So we must lookup existing routes (including different TOS values) and
    call check_peer_redir() on them.
    
    Reported-by: Ivan Zahariev <famzah@icdsoft.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    CC: Flavio Leitner <fbl@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 511f4a7..0c74da8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1304,16 +1304,41 @@ static void rt_del(unsigned hash, struct rtable *rt)
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 }
 
+static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
+{
+	struct rtable *rt = (struct rtable *) dst;
+	__be32 orig_gw = rt->rt_gateway;
+	struct neighbour *n, *old_n;
+
+	dst_confirm(&rt->dst);
+
+	rt->rt_gateway = peer->redirect_learned.a4;
+	n = __arp_bind_neighbour(&rt->dst, rt->rt_gateway);
+	if (IS_ERR(n))
+		return PTR_ERR(n);
+	old_n = xchg(&rt->dst._neighbour, n);
+	if (old_n)
+		neigh_release(old_n);
+	if (!n || !(n->nud_state & NUD_VALID)) {
+		if (n)
+			neigh_event_send(n, NULL);
+		rt->rt_gateway = orig_gw;
+		return -EAGAIN;
+	} else {
+		rt->rt_flags |= RTCF_REDIRECTED;
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+	}
+	return 0;
+}
+
 /* called in rcu_read_lock() section */
 void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 		    __be32 saddr, struct net_device *dev)
 {
 	int s, i;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	struct rtable *rt;
 	__be32 skeys[2] = { saddr, 0 };
 	int    ikeys[2] = { dev->ifindex, 0 };
-	struct flowi4 fl4;
 	struct inet_peer *peer;
 	struct net *net;
 
@@ -1336,33 +1362,42 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 			goto reject_redirect;
 	}
 
-	memset(&fl4, 0, sizeof(fl4));
-	fl4.daddr = daddr;
 	for (s = 0; s < 2; s++) {
 		for (i = 0; i < 2; i++) {
-			fl4.flowi4_oif = ikeys[i];
-			fl4.saddr = skeys[s];
-			rt = __ip_route_output_key(net, &fl4);
-			if (IS_ERR(rt))
-				continue;
-
-			if (rt->dst.error || rt->dst.dev != dev ||
-			    rt->rt_gateway != old_gw) {
-				ip_rt_put(rt);
-				continue;
-			}
+			unsigned int hash;
+			struct rtable __rcu **rthp;
+			struct rtable *rt;
+
+			hash = rt_hash(daddr, skeys[s], ikeys[i], rt_genid(net));
+
+			rthp = &rt_hash_table[hash].chain;
+
+			while ((rt = rcu_dereference(*rthp)) != NULL) {
+				rthp = &rt->dst.rt_next;
+
+				if (rt->rt_key_dst != daddr ||
+				    rt->rt_key_src != skeys[s] ||
+				    rt->rt_oif != ikeys[i] ||
+				    rt_is_input_route(rt) ||
+				    rt_is_expired(rt) ||
+				    !net_eq(dev_net(rt->dst.dev), net) ||
+				    rt->dst.error ||
+				    rt->dst.dev != dev ||
+				    rt->rt_gateway != old_gw)
+					continue;
 
-			if (!rt->peer)
-				rt_bind_peer(rt, rt->rt_dst, 1);
+				if (!rt->peer)
+					rt_bind_peer(rt, rt->rt_dst, 1);
 
-			peer = rt->peer;
-			if (peer) {
-				peer->redirect_learned.a4 = new_gw;
-				atomic_inc(&__rt_peer_genid);
+				peer = rt->peer;
+				if (peer) {
+					if (peer->redirect_learned.a4 != new_gw) {
+						peer->redirect_learned.a4 = new_gw;
+						atomic_inc(&__rt_peer_genid);
+					}
+					check_peer_redir(&rt->dst, peer);
+				}
 			}
-
-			ip_rt_put(rt);
-			return;
 		}
 	}
 	return;
@@ -1649,32 +1684,6 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 	}
 }
 
-static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
-{
-	struct rtable *rt = (struct rtable *) dst;
-	__be32 orig_gw = rt->rt_gateway;
-	struct neighbour *n, *old_n;
-
-	dst_confirm(&rt->dst);
-
-	rt->rt_gateway = peer->redirect_learned.a4;
-	n = __arp_bind_neighbour(&rt->dst, rt->rt_gateway);
-	if (IS_ERR(n))
-		return PTR_ERR(n);
-	old_n = xchg(&rt->dst._neighbour, n);
-	if (old_n)
-		neigh_release(old_n);
-	if (!n || !(n->nud_state & NUD_VALID)) {
-		if (n)
-			neigh_event_send(n, NULL);
-		rt->rt_gateway = orig_gw;
-		return -EAGAIN;
-	} else {
-		rt->rt_flags |= RTCF_REDIRECTED;
-		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
-	}
-	return 0;
-}
 
 static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 {

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-15 12:10     ` Michael Tokarev
  2012-02-15 12:44       ` Michael Tokarev
@ 2012-02-15 12:46       ` Eric Dumazet
  2012-02-15 12:57         ` Michael Tokarev
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-02-15 12:46 UTC
  To: Michael Tokarev; +Cc: netdev, David Miller

On Wednesday 15 February 2012 at 16:10 +0400, Michael Tokarev wrote:

> David, any progress with these?
> 
> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae "route: fix ICMP redirect validation"
> applies cleanly to 3.0, but 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
> "ipv4: fix redirect handling" does not, due to some intervening changes,
> though these should be easy to sort out.  Should I refresh this
> patch myself?  It should be doable, I think.

I totally screwed up when I said that; I was mixing things up.

Can you reproduce this problem with the latest 3.0.21?

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-15 12:46       ` Eric Dumazet
@ 2012-02-15 12:57         ` Michael Tokarev
  2012-02-15 13:03           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2012-02-15 12:57 UTC
  To: Eric Dumazet; +Cc: netdev, David Miller

On 15.02.2012 16:46, Eric Dumazet wrote:
> On Wednesday 15 February 2012 at 16:10 +0400, Michael Tokarev wrote:
>
>> David, any progress with these?
>>
>> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae "route: fix ICMP redirect validation"
>> applies cleanly to 3.0, but 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc
>> "ipv4: fix redirect handling" does not, due to some intervening changes,
>> though these should be easy to sort out.  Should I refresh this
>> patch myself?  It should be doable, I think.
>
> I totally screwed up when I said that; I was mixing things up.

Oh, OK.  However, at least the first patch (fix ICMP redirect validation)
appears to be quite relevant here, and the second patch also touches
the very same area.

> Can you reproduce this problem with the latest 3.0.21?

As I described before, this was the only occurrence of this issue here.
I wasn't able to reproduce it by sending "bad" ICMP redirects to the
host in question (running 3.0.18).  The host still hasn't been rebooted
and is still running the same 3.0.18, but since the neigh entry expired
there haven't been any other similar issues.  Or at least none that
I actually saw: there was one case which looked the same, but by the
time I looked it was already gone, because the remote machine had been
turned off and the entry had expired (it was from the same remote
segment, by the way).

Since I still don't understand where that entry came from in the first
place (caused by what?  A stray ICMP redirect?  Something else?), nor
how to reproduce this, I'm just waiting and watching.

3.0.21 included the "net: fix NULL dereferences in check_peer_redir()" patch
(which is fairly large; I wonder why it was rolled into a single patch
when in reality it consists of 7 commits, and I wonder why the final
result differs from the current version of the check_peer_redir()
routine, which I mentioned in my other email in this thread), but that
patch does not seem to address this particular issue, at least at a quick glance.

Thank you!

/mjt

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-15 12:57         ` Michael Tokarev
@ 2012-02-15 13:03           ` Eric Dumazet
  2012-02-28 11:38             ` Michael Tokarev
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-02-15 13:03 UTC
  To: Michael Tokarev; +Cc: netdev, David Miller

On Wednesday 15 February 2012 at 16:57 +0400, Michael Tokarev wrote:

> 3.0.21 included the "net: fix NULL dereferences in check_peer_redir()" patch
> (which is fairly large; I wonder why it was rolled into a single patch
> when in reality it consists of 7 commits, and I wonder why the final
> result differs from the current version of the check_peer_redir()
> routine, which I mentioned in my other email in this thread), but that
> patch does not seem to address this particular issue, at least at a quick glance.

That was the tricky part handled by David.

We couldn't apply all the needed commits without bringing too many things
from recent kernels into 3.0 (out of stable scope).

If you believe a fix is needed, just shout :)

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-15 13:03           ` Eric Dumazet
@ 2012-02-28 11:38             ` Michael Tokarev
  2012-02-28 19:07               ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2012-02-28 11:38 UTC
  To: Eric Dumazet; +Cc: netdev, David Miller

On 15.02.2012 17:03, Eric Dumazet wrote:
> On Wednesday 15 February 2012 at 16:57 +0400, Michael Tokarev wrote:
> 
>> 3.0.21 included the "net: fix NULL dereferences in check_peer_redir()" patch
>> (which is fairly large; I wonder why it was rolled into a single patch
>> when in reality it consists of 7 commits, and I wonder why the final
>> result differs from the current version of the check_peer_redir()
>> routine, which I mentioned in my other email in this thread), but that
>> patch does not seem to address this particular issue, at least at a quick glance.
> 
> That was the tricky part handled by David.
> 
> We couldn't apply all the needed commits without bringing too many things
> from recent kernels into 3.0 (out of stable scope).
> 
> If you believe a fix is needed, just shout :)

I think a fix is needed.  I still don't understand where our
unexpected redirects are coming from, but we had two more occurrences
of this very issue.  After applying the two patches:

7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae route: fix ICMP redirect validation
9cc20b268a5a14f5e57b8ad405a83513ab0d78dc ipv4: fix redirect handling

the issue does not occur anymore.  The system has been running this
kernel for almost 2 weeks now without any issue of this sort.

The first patch applies to 3.0 as is; the second needs minor
backporting to 3.0.  I already sent the backported version, see
http://patchwork.ozlabs.org/patch/141316/ .

I'm not sure which of the two patches actually helps, but it appears
that both are needed.

Thanks,

/mjt

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-28 11:38             ` Michael Tokarev
@ 2012-02-28 19:07               ` David Miller
  2012-02-29  1:00                 ` Michael Tokarev
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2012-02-28 19:07 UTC
  To: mjt; +Cc: eric.dumazet, netdev

From: Michael Tokarev <mjt@tls.msk.ru>
Date: Tue, 28 Feb 2012 15:38:36 +0400

> On 15.02.2012 17:03, Eric Dumazet wrote:
>> On Wednesday, 15 February 2012 at 16:57 +0400, Michael Tokarev wrote:
>> 
>>> 3.0.21 included the "net: fix NULL dereferences in check_peer_redir()" patch
>>> (which is somewhat large(ish); I wonder why it was rolled into a
>>> single patch when in reality it consists of 7 commits, and I wonder
>>> why the final result differs from the current version of the check_peer_redir()
>>> routine, which I mentioned in my other email in this thread), but that
>>> one does not seem to address this very issue, at least at a quick glance.
>> 
>> That was the tricky part handled by David.
>> 
>> We couldn't apply all the needed commits without bringing too many things
>> from recent kernels into 3.0 (out of stable scope).
>> 
>> If you believe a fix is needed, just shout :)
> 
> I think a fix is needed.  I still don't understand where our
> unexpected redirects are coming from, but we had two more occurrences
> of this very issue.  After applying the two patches:
> 
> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae route: fix ICMP redirect validation
> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc ipv4: fix redirect handling

If you were paying attention, you'd see that both of these patches are
in Greg's stable queue already.

* Re: 3.0: unexpected route cache entry for wrong segment?
  2012-02-28 19:07               ` David Miller
@ 2012-02-29  1:00                 ` Michael Tokarev
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2012-02-29  1:00 UTC
  To: David Miller; +Cc: eric.dumazet, netdev

On 28.02.2012 23:07, David Miller wrote:
[]
>>> If you believe a fix is needed, just shout :)
>>
>> I think a fix is needed.  I still don't understand where our
>> unexpected redirects are coming from, but we had two more occurrences
>> of this very issue.  After applying the two patches:
>>
>> 7cc9150ebe8ec06cafea9f1c10d92ddacf88d8ae route: fix ICMP redirect validation
>> 9cc20b268a5a14f5e57b8ad405a83513ab0d78dc ipv4: fix redirect handling
> 
> If you were paying attention, you'd see that both of these patches are
> in Greg's stable queue already.

Indeed, they're in the current 3.0.23 review cycle; somehow
I hadn't noticed.  This should be OK, thank you!

/mjt

end of thread

Thread overview: 12+ messages
2012-02-09 17:02 3.0: unexpected route cache entry for wrong segment? Michael Tokarev
2012-02-09 17:45 ` Eric Dumazet
2012-02-09 18:05   ` Eric Dumazet
2012-02-09 18:37   ` Michael Tokarev
2012-02-15 12:10     ` Michael Tokarev
2012-02-15 12:44       ` Michael Tokarev
2012-02-15 12:46       ` Eric Dumazet
2012-02-15 12:57         ` Michael Tokarev
2012-02-15 13:03           ` Eric Dumazet
2012-02-28 11:38             ` Michael Tokarev
2012-02-28 19:07               ` David Miller
2012-02-29  1:00                 ` Michael Tokarev
