All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
@ 2016-11-07 15:04 Stephen Suryaputra Lin
  2016-11-07 16:08 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Suryaputra Lin @ 2016-11-07 15:04 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Suryaputra Lin

ICMP redirects behavior is different after the commit above. An email
requesting the explanation on why the behavior needs to be different
was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
Since there isn't a reply yet, I decided to prepare this formal patch.

In v2.6 kernel, it used to be that ip_rt_redirect() calls
arp_bind_neighbour() which returns 0 and then the state of the neigh for
the new_gw is checked. If the state isn't valid then the redirected
route is deleted. This behavior is maintained up to v3.5.7 by
check_peer_redirect() because rt->rt_gateway is assigned to
peer->redirect_learned.a4 before calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely valid
since the old_gw is the one that sends the ICMP redirect message. Then the
new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may
never gets resolved and the traffic is blackholed.

Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..510045cefcab 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}
 
+	rt->rt_gateway = 0;
 	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	rt->rt_gateway = old_gw;
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
  2016-11-07 15:04 [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Stephen Suryaputra Lin
@ 2016-11-07 16:08 ` Eric Dumazet
  2016-11-07 16:20   ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2016-11-07 16:08 UTC (permalink / raw)
  To: Stephen Suryaputra Lin; +Cc: netdev, Stephen Suryaputra Lin

On Mon, 2016-11-07 at 10:04 -0500, Stephen Suryaputra Lin wrote:
> ICMP redirects behavior is different after the commit above. An email
> requesting the explanation on why the behavior needs to be different
> was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
> Since there isn't a reply yet, I decided to prepare this formal patch.
> 
> In v2.6 kernel, it used to be that ip_rt_redirect() calls
> arp_bind_neighbour() which returns 0 and then the state of the neigh for
> the new_gw is checked. If the state isn't valid then the redirected
> route is deleted. This behavior is maintained up to v3.5.7 by
> check_peer_redirect() because rt->rt_gateway is assigned to
> peer->redirect_learned.a4 before calling ipv4_neigh_lookup().
> 
> After the commit, ipv4_neigh_lookup() is performed without the
> rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
> isn't zero, the function uses it as the key. The neigh is most likely valid
> since the old_gw is the one that sends the ICMP redirect message. Then the
> new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may
> never gets resolved and the traffic is blackholed.
> 
> Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
> ---
>  net/ipv4/route.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 62d4d90c1389..510045cefcab 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
>  			goto reject_redirect;
>  	}
>  
> +	rt->rt_gateway = 0;
>  	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
> +	rt->rt_gateway = old_gw;
>  	if (!IS_ERR(n)) {
>  		if (!(n->nud_state & NUD_VALID)) {
>  			neigh_event_send(n, NULL);

In any case, rt is a shared object at that time, so even temporarily
clearing/restoring rt_gateway seems wrong to me.

I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
this point.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
  2016-11-07 16:08 ` Eric Dumazet
@ 2016-11-07 16:20   ` David Miller
  2016-11-07 23:05     ` Stephen Suryaputra Lin
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2016-11-07 16:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: stephen.suryaputra.lin, netdev, ssurya

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 07 Nov 2016 08:08:52 -0800

> In any case, rt is a shared object at that time, so even temporarily
> clearing/restoring rt_gateway seems wrong to me.
> 
> I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> this point.

Agreed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
  2016-11-07 16:20   ` David Miller
@ 2016-11-07 23:05     ` Stephen Suryaputra Lin
  0 siblings, 0 replies; 4+ messages in thread
From: Stephen Suryaputra Lin @ 2016-11-07 23:05 UTC (permalink / raw)
  To: netdev

I did the temporary clearing/restoring rt_gateway following the deleted
function check_peer_redir(). But, looking again at the function the
assigning of peer->redirect_learned.a4 to rt_gateway can be permanent
because restoring to the old_gw only happens on errors.

I have updated the patch to use __ipv4_neigh_lookup().

Thank you.

On Mon, Nov 07, 2016 at 11:20:16AM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 07 Nov 2016 08:08:52 -0800
> 
> > In any case, rt is a shared object at that time, so even temporarily
> > clearing/restoring rt_gateway seems wrong to me.
> > 
> > I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> > this point.
> 
> Agreed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-11-07 23:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-07 15:04 [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Stephen Suryaputra Lin
2016-11-07 16:08 ` Eric Dumazet
2016-11-07 16:20   ` David Miller
2016-11-07 23:05     ` Stephen Suryaputra Lin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.