All of lore.kernel.org
 help / color / mirror / Atom feed
* Question: should local address be expired when updating PMTU?
@ 2015-02-02  8:20 shengyong
  2015-02-02 21:31 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: shengyong @ 2015-02-02  8:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, yangyingling, steffen.klassert, hannes

Hi, David Miller
Since commit 81aded246 (ipv6: Handle PMTU in ICMP error handlers), the entries
in neigh table may get expired. But in the situation:

          Host only
    PC <------------> Virtual Machine

a packet is sent from PC to VM, and the packet looks like:
-----------------------------------
| IPv6 (src=PC-addr, dst=VM-addr) |
|---------------------------------|
|     ICMPv6 (Packet Too Big)     |
|---------------------------------|
| IPv6 (src=VM-addr, dst=VM-addr) |
|---------------------------------|
| ICMPv6 (Neighbor Advertisement) |
-----------------------------------

Then the local addr on VM will be updated with an expire value. After the
lifetime of the local addr is expired, the VM is unreachable from PC.

	# ip -6 route list table local
	local fe80::1 dev lo  metric 0 *expire 596*

I find that the current code seems not check whether the entry is a local one
when doing PMTU update. And if the following code is added, the situation could
be avoided.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b2614b2..b80317a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 {
        struct rt6_info *rt6 = (struct rt6_info*)dst;

+       if (rt6->rt6i_flags & RTF_LOCAL)
+               return;
+
        dst_confirm(dst);
        if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
                struct net *net = dev_net(dst->dev);

So is this modification correct? Or how can we avoid such expiring?

thx & BR,
Sheng

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-02  8:20 Question: should local address be expired when updating PMTU? shengyong
@ 2015-02-02 21:31 ` David Miller
  2015-02-03  0:52   ` Alex Gartrell
  2015-02-03  9:28 ` Steffen Klassert
  2 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2015-02-02 21:31 UTC (permalink / raw)
  To: shengyong1; +Cc: netdev, yangyingling, steffen.klassert, hannes

From: shengyong <shengyong1@huawei.com>
Date: Mon, 2 Feb 2015 16:20:24 +0800

> Hi, David Miller

There are other people on this list more skilled than I am at answering
this question, just FYI...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-02  8:20 Question: should local address be expired when updating PMTU? shengyong
@ 2015-02-03  0:52   ` Alex Gartrell
  2015-02-03  0:52   ` Alex Gartrell
  2015-02-03  9:28 ` Steffen Klassert
  2 siblings, 0 replies; 25+ messages in thread
From: Alex Gartrell @ 2015-02-03  0:52 UTC (permalink / raw)
  To: shengyong, davem
  Cc: netdev, yangyingling, steffen.klassert, hannes, lvs-devel,
	Calvin Owens, kernel-team

Hello Shengyong,

 > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
 > index b2614b2..b80317a 100644
 > --- a/net/ipv6/route.c
 > +++ b/net/ipv6/route.c
 > @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry 
*dst, struct sock *sk,
 >   {
 >          struct rt6_info *rt6 = (struct rt6_info*)dst;
 >
 > +       if (rt6->rt6i_flags & RTF_LOCAL)
 > +               return;
 > +
 >          dst_confirm(dst);
 >          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
 >                  struct net *net = dev_net(dst->dev);
 >
 > So is this modification correct? Or how can we avoid such expiring?


FWIW, we encountered this problem with IPVS tunneling.  Here's a patch 
done by Calvin (cc'ed) that fixes my attempted fix for this.  We're not 
particularly proud of this...

At a high level, I don't think the RTF_LOCAL check was sufficient, but I 
didn't investigate deeply enough and hopefully Calvin can say why.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f14d49b..c607a42 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct dst_entry 
*dst, struct sock *sk,
                 }
                 dst_metric_set(dst, RTAX_MTU, mtu);

-               /* FACEBOOK HACK: We need to not expire local non-expiring
-                * routes so that we don't accidentally start blackholing
-                * ipvs traffic when we happen to use it locally for
-                * healthchecking (see ip_vs_xmit.c --
-                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
-                * associated with a socket)
-                * Alex Gartrell <agartrell@fb.com>
+               /*
+                * FACEBOOK HACK: Only expire routes that aren't 
destined for
+                * the loopback interface.
+                *
+                * This prevents the strange route coalescing that 
happens when
+                * you add an address to the loopback that had a route 
that had
+                * been used when the address didn't exist from getting 
expired
+                * and causing packet loss in shiv.
                  */
-               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
-                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
-                       rt6_update_expires(
-                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
+               if (!(dst->dev->flags & IFF_LOOPBACK))
+                       rt6_update_expires(rt6,
+ 
net->ipv6.sysctl.ip6_rt_mtu_expires);
         }
  }


Cheers,
-- 
Alex Gartrell <agartrell@fb.com>

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
@ 2015-02-03  0:52   ` Alex Gartrell
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Gartrell @ 2015-02-03  0:52 UTC (permalink / raw)
  To: shengyong, davem
  Cc: netdev, yangyingling, steffen.klassert, hannes, lvs-devel,
	Calvin Owens, kernel-team

Hello Shengyong,

 > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
 > index b2614b2..b80317a 100644
 > --- a/net/ipv6/route.c
 > +++ b/net/ipv6/route.c
 > @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry 
*dst, struct sock *sk,
 >   {
 >          struct rt6_info *rt6 = (struct rt6_info*)dst;
 >
 > +       if (rt6->rt6i_flags & RTF_LOCAL)
 > +               return;
 > +
 >          dst_confirm(dst);
 >          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
 >                  struct net *net = dev_net(dst->dev);
 >
 > So is this modification correct? Or how can we avoid such expiring?


FWIW, we encountered this problem with IPVS tunneling.  Here's a patch 
done by Calvin (cc'ed) that fixes my attempted fix for this.  We're not 
particularly proud of this...

At a high level, I don't think the RTF_LOCAL check was sufficient, but I 
didn't investigate deeply enough and hopefully Calvin can say why.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f14d49b..c607a42 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct dst_entry 
*dst, struct sock *sk,
                 }
                 dst_metric_set(dst, RTAX_MTU, mtu);

-               /* FACEBOOK HACK: We need to not expire local non-expiring
-                * routes so that we don't accidentally start blackholing
-                * ipvs traffic when we happen to use it locally for
-                * healthchecking (see ip_vs_xmit.c --
-                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
-                * associated with a socket)
-                * Alex Gartrell <agartrell@fb.com>
+               /*
+                * FACEBOOK HACK: Only expire routes that aren't 
destined for
+                * the loopback interface.
+                *
+                * This prevents the strange route coalescing that 
happens when
+                * you add an address to the loopback that had a route 
that had
+                * been used when the address didn't exist from getting 
expired
+                * and causing packet loss in shiv.
                  */
-               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
-                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
-                       rt6_update_expires(
-                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
+               if (!(dst->dev->flags & IFF_LOOPBACK))
+                       rt6_update_expires(rt6,
+ 
net->ipv6.sysctl.ip6_rt_mtu_expires);
         }
  }


Cheers,
-- 
Alex Gartrell <agartrell@fb.com>

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03  0:52   ` Alex Gartrell
@ 2015-02-03  1:28     ` shengyong
  -1 siblings, 0 replies; 25+ messages in thread
From: shengyong @ 2015-02-03  1:28 UTC (permalink / raw)
  To: Alex Gartrell
  Cc: davem, netdev, yangyingliang, steffen.klassert, hannes,
	lvs-devel, Calvin Owens, kernel-team



在 2015/2/3 8:52, Alex Gartrell 写道:
> Hello Shengyong,
> 
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index b2614b2..b80317a 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>>   {
>>          struct rt6_info *rt6 = (struct rt6_info*)dst;
>>
>> +       if (rt6->rt6i_flags & RTF_LOCAL)
>> +               return;
>> +
>>          dst_confirm(dst);
>>          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
>>                  struct net *net = dev_net(dst->dev);
>>
>> So is this modification correct? Or how can we avoid such expiring?
> 
> 
> FWIW, we encountered this problem with IPVS tunneling.  Here's a patch done by Calvin (cc'ed) that fixes my attempted fix for this.  We're not particularly proud of this...
> 
> At a high level, I don't think the RTF_LOCAL check was sufficient, but I didn't investigate deeply enough and hopefully Calvin can say why.
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f14d49b..c607a42 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>                 }
>                 dst_metric_set(dst, RTAX_MTU, mtu);
> 
> -               /* FACEBOOK HACK: We need to not expire local non-expiring
> -                * routes so that we don't accidentally start blackholing
> -                * ipvs traffic when we happen to use it locally for
> -                * healthchecking (see ip_vs_xmit.c --
> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
> -                * associated with a socket)
> -                * Alex Gartrell <agartrell@fb.com>
> +               /*
> +                * FACEBOOK HACK: Only expire routes that aren't destined for
> +                * the loopback interface.
> +                *
> +                * This prevents the strange route coalescing that happens when
> +                * you add an address to the loopback that had a route that had
> +                * been used when the address didn't exist from getting expired
> +                * and causing packet loss in shiv.
>                  */
> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
> -                       rt6_update_expires(
> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
> +               if (!(dst->dev->flags & IFF_LOOPBACK))
> +                       rt6_update_expires(rt6,
> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>         }
>  }
Thanks, your approach can also solve the problem I met. I just a bit confuse that
is this kind of packets (like I sent in the first mail) normal?  and if they are
abnormal, I think we'd better drop them before update rt6i_flags.

thx,
Sheng
> 
> 
> Cheers,


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
@ 2015-02-03  1:28     ` shengyong
  0 siblings, 0 replies; 25+ messages in thread
From: shengyong @ 2015-02-03  1:28 UTC (permalink / raw)
  To: Alex Gartrell
  Cc: davem, netdev, yangyingliang, steffen.klassert, hannes,
	lvs-devel, Calvin Owens, kernel-team



在 2015/2/3 8:52, Alex Gartrell 写道:
> Hello Shengyong,
> 
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index b2614b2..b80317a 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>>   {
>>          struct rt6_info *rt6 = (struct rt6_info*)dst;
>>
>> +       if (rt6->rt6i_flags & RTF_LOCAL)
>> +               return;
>> +
>>          dst_confirm(dst);
>>          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
>>                  struct net *net = dev_net(dst->dev);
>>
>> So is this modification correct? Or how can we avoid such expiring?
> 
> 
> FWIW, we encountered this problem with IPVS tunneling.  Here's a patch done by Calvin (cc'ed) that fixes my attempted fix for this.  We're not particularly proud of this...
> 
> At a high level, I don't think the RTF_LOCAL check was sufficient, but I didn't investigate deeply enough and hopefully Calvin can say why.
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f14d49b..c607a42 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>                 }
>                 dst_metric_set(dst, RTAX_MTU, mtu);
> 
> -               /* FACEBOOK HACK: We need to not expire local non-expiring
> -                * routes so that we don't accidentally start blackholing
> -                * ipvs traffic when we happen to use it locally for
> -                * healthchecking (see ip_vs_xmit.c --
> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
> -                * associated with a socket)
> -                * Alex Gartrell <agartrell@fb.com>
> +               /*
> +                * FACEBOOK HACK: Only expire routes that aren't destined for
> +                * the loopback interface.
> +                *
> +                * This prevents the strange route coalescing that happens when
> +                * you add an address to the loopback that had a route that had
> +                * been used when the address didn't exist from getting expired
> +                * and causing packet loss in shiv.
>                  */
> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
> -                       rt6_update_expires(
> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
> +               if (!(dst->dev->flags & IFF_LOOPBACK))
> +                       rt6_update_expires(rt6,
> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>         }
>  }
Thanks, your approach can also solve the problem I met. I just a bit confuse that
is this kind of packets (like I sent in the first mail) normal?  and if they are
abnormal, I think we'd better drop them before update rt6i_flags.

thx,
Sheng
> 
> 
> Cheers,


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03  0:52   ` Alex Gartrell
@ 2015-02-03  2:10     ` Calvin Owens
  -1 siblings, 0 replies; 25+ messages in thread
From: Calvin Owens @ 2015-02-03  2:10 UTC (permalink / raw)
  To: Alex Gartrell
  Cc: shengyong, davem, netdev, yangyingling, steffen.klassert, hannes,
	lvs-devel, kernel-team

On Monday 02/02 at 16:52 -0800, Alex Gartrell wrote:
> Hello Shengyong,
> 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index b2614b2..b80317a 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct
> dst_entry *dst, struct sock *sk,
> >   {
> >          struct rt6_info *rt6 = (struct rt6_info*)dst;
> >
> > +       if (rt6->rt6i_flags & RTF_LOCAL)
> > +               return;
> > +
> >          dst_confirm(dst);
> >          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
> >                  struct net *net = dev_net(dst->dev);
> >
> > So is this modification correct? Or how can we avoid such expiring? 
> 
> FWIW, we encountered this problem with IPVS tunneling.  Here's a
> patch done by Calvin (cc'ed) that fixes my attempted fix for this.
> We're not particularly proud of this...
> 
> At a high level, I don't think the RTF_LOCAL check was sufficient,
> but I didn't investigate deeply enough and hopefully Calvin can say
> why.

I honestly didn't spend much time at all finding the underlying cause
because it appeared to be fixed upstream: on 3.19-rc5 you get all 3
expected routes after the last step of my repro below. I just really
needed to get this working at the time, and the gross disgusting
horrible ugly awful [more negative adjectives] patch included below made
it work.

FWIW, the explanation I wrote down in my notes was:

"The absence of RTF_NONEXTHOP is causing COWs to happen, which are
always marked as RTF_CACHE. Somehow that's screwing things up in
rt6_do_redirect()"

That could be BS though, I don't at all remember how I came to that
conclusion. 

(/me resolves to write better notes in the future...)

Here's how to get the weird behavior on 3.10 (+stable):

$ sudo ip addr add local 4444::1 dev lo
### Now I have 2 routes in /proc/net/ipv6_route, a local and a non-local
### Both have the RTF_NONEXTHOP flag set (0x00200000)
$ sudo ip route add local 4444::1 dev lo
### Now I have 3 routes in /proc/net/ipv6_route to 4444::1
### Notice the new route does NOT have the RTF_NONEXTHOP flag set
$ sudo ip addr del local 4444::1 dev lo
### Now I just have the one route I created before
$ sudo ip addr add local 4444::1 dev lo
### And now I have 3 routes again
$ sudo ping6 4444::1
[blah blah blah successful ping]
$ sudo ip addr del local 4444::1 dev lo
$ sudo ip addr add local 4444::1 dev lo
### Still have 3 routes
$ sudo ip addr del local 4444::1 dev lo
### Now I just have my one route yet again
### Now, *without the address on lo*, talk to it (it works), then re-add it
$ ping6 4444::1
[blah blah blah successful ping]
$ sudo ip addr add local 4444::1 dev lo
### Now I only have 2 routes... WAT!?
### Notice the LOCAL (0x80000000) route doesn't have the RTF_NONEXTHOP flag set

Thanks,
Calvin

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f14d49b..c607a42 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct
> dst_entry *dst, struct sock *sk,
>                 }
>                 dst_metric_set(dst, RTAX_MTU, mtu);
> 
> -               /* FACEBOOK HACK: We need to not expire local non-expiring
> -                * routes so that we don't accidentally start blackholing
> -                * ipvs traffic when we happen to use it locally for
> -                * healthchecking (see ip_vs_xmit.c --
> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
> -                * associated with a socket)
> -                * Alex Gartrell <agartrell@fb.com>
> +               /*
> +                * FACEBOOK HACK: Only expire routes that aren't destined for
> +                * the loopback interface.
> +                *
> +                * This prevents the strange route coalescing that happens when
> +                * you add an address to the loopback that had a route that had
> +                * been used when the address didn't exist from getting expired
> +                * and causing packet loss in shiv.
>                  */
> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
> -                       rt6_update_expires(
> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
> +               if (!(dst->dev->flags & IFF_LOOPBACK))
> +                       rt6_update_expires(rt6,
> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>         }
>  }
> 
> 
> Cheers,
> -- 
> Alex Gartrell <agartrell@fb.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
@ 2015-02-03  2:10     ` Calvin Owens
  0 siblings, 0 replies; 25+ messages in thread
From: Calvin Owens @ 2015-02-03  2:10 UTC (permalink / raw)
  To: Alex Gartrell
  Cc: shengyong, davem, netdev, yangyingling, steffen.klassert, hannes,
	lvs-devel, kernel-team

On Monday 02/02 at 16:52 -0800, Alex Gartrell wrote:
> Hello Shengyong,
> 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index b2614b2..b80317a 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct
> dst_entry *dst, struct sock *sk,
> >   {
> >          struct rt6_info *rt6 = (struct rt6_info*)dst;
> >
> > +       if (rt6->rt6i_flags & RTF_LOCAL)
> > +               return;
> > +
> >          dst_confirm(dst);
> >          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
> >                  struct net *net = dev_net(dst->dev);
> >
> > So is this modification correct? Or how can we avoid such expiring? 
> 
> FWIW, we encountered this problem with IPVS tunneling.  Here's a
> patch done by Calvin (cc'ed) that fixes my attempted fix for this.
> We're not particularly proud of this...
> 
> At a high level, I don't think the RTF_LOCAL check was sufficient,
> but I didn't investigate deeply enough and hopefully Calvin can say
> why.

I honestly didn't spend much time at all finding the underlying cause
because it appeared to be fixed upstream: on 3.19-rc5 you get all 3
expected routes after the last step of my repro below. I just really
needed to get this working at the time, and the gross disgusting
horrible ugly awful [more negative adjectives] patch included below made
it work.

FWIW, the explanation I wrote down in my notes was:

"The absence of RTF_NONEXTHOP is causing COWs to happen, which are
always marked as RTF_CACHE. Somehow that's screwing things up in
rt6_do_redirect()"

That could be BS though, I don't at all remember how I came to that
conclusion. 

(/me resolves to write better notes in the future...)

Here's how to get the weird behavior on 3.10 (+stable):

$ sudo ip addr add local 4444::1 dev lo
### Now I have 2 routes in /proc/net/ipv6_route, a local and a non-local
### Both have the RTF_NONEXTHOP flag set (0x00200000)
$ sudo ip route add local 4444::1 dev lo
### Now I have 3 routes in /proc/net/ipv6_route to 4444::1
### Notice the new route does NOT have the RTF_NONEXTHOP flag set
$ sudo ip addr del local 4444::1 dev lo
### Now I just have the one route I created before
$ sudo ip addr add local 4444::1 dev lo
### And now I have 3 routes again
$ sudo ping6 4444::1
[blah blah blah successful ping]
$ sudo ip addr del local 4444::1 dev lo
$ sudo ip addr add local 4444::1 dev lo
### Still have 3 routes
$ sudo ip addr del local 4444::1 dev lo
### Now I just have my one route yet again
### Now, *without the address on lo*, talk to it (it works), then re-add it
$ ping6 4444::1
[blah blah blah successful ping]
$ sudo ip addr add local 4444::1 dev lo
### Now I only have 2 routes... WAT!?
### Notice the LOCAL (0x80000000) route doesn't have the RTF_NONEXTHOP flag set

Thanks,
Calvin

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f14d49b..c607a42 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct
> dst_entry *dst, struct sock *sk,
>                 }
>                 dst_metric_set(dst, RTAX_MTU, mtu);
> 
> -               /* FACEBOOK HACK: We need to not expire local non-expiring
> -                * routes so that we don't accidentally start blackholing
> -                * ipvs traffic when we happen to use it locally for
> -                * healthchecking (see ip_vs_xmit.c --
> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
> -                * associated with a socket)
> -                * Alex Gartrell <agartrell@fb.com>
> +               /*
> +                * FACEBOOK HACK: Only expire routes that aren't destined for
> +                * the loopback interface.
> +                *
> +                * This prevents the strange route coalescing that happens when
> +                * you add an address to the loopback that had a route that had
> +                * been used when the address didn't exist from getting expired
> +                * and causing packet loss in shiv.
>                  */
> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
> -                       rt6_update_expires(
> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
> +               if (!(dst->dev->flags & IFF_LOOPBACK))
> +                       rt6_update_expires(rt6,
> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>         }
>  }
> 
> 
> Cheers,
> -- 
> Alex Gartrell <agartrell@fb.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03  2:10     ` Calvin Owens
@ 2015-02-03  3:21       ` shengyong
  -1 siblings, 0 replies; 25+ messages in thread
From: shengyong @ 2015-02-03  3:21 UTC (permalink / raw)
  To: Calvin Owens, Alex Gartrell
  Cc: davem, netdev, yangyingling, steffen.klassert, hannes, lvs-devel,
	kernel-team



在 2015/2/3 10:10, Calvin Owens 写道:
> On Monday 02/02 at 16:52 -0800, Alex Gartrell wrote:
>> Hello Shengyong,
>>
>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index b2614b2..b80317a 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>>> @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct
>> dst_entry *dst, struct sock *sk,
>>>   {
>>>          struct rt6_info *rt6 = (struct rt6_info*)dst;
>>>
>>> +       if (rt6->rt6i_flags & RTF_LOCAL)
>>> +               return;
>>> +
>>>          dst_confirm(dst);
>>>          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
>>>                  struct net *net = dev_net(dst->dev);
>>>
>>> So is this modification correct? Or how can we avoid such expiring? 
>>
>> FWIW, we encountered this problem with IPVS tunneling.  Here's a
>> patch done by Calvin (cc'ed) that fixes my attempted fix for this.
>> We're not particularly proud of this...
>>
>> At a high level, I don't think the RTF_LOCAL check was sufficient,
>> but I didn't investigate deeply enough and hopefully Calvin can say
>> why.
> 
> I honestly didn't spend much time at all finding the underlying cause
> because it appeared to be fixed upstream: on 3.19-rc5 you get all 3
> expected routes after the last step of my repro below.
Hi,
I do my test on 3.19.0-rc7 just now, it seems it still doesn't solve the
local-addr-expired problem.
 I just really
> needed to get this working at the time, and the gross disgusting
> horrible ugly awful [more negative adjectives] patch included below made
> it work.
> 
> FWIW, the explanation I wrote down in my notes was:
> 
> "The absence of RTF_NONEXTHOP is causing COWs to happen, which are
> always marked as RTF_CACHE. Somehow that's screwing things up in
> rt6_do_redirect()"
> 
> That could be BS though, I don't at all remember how I came to that
> conclusion. 
> 
> (/me resolves to write better notes in the future...)
> 
> Here's how to get the weird behavior on 3.10 (+stable):
> 
> $ sudo ip addr add local 4444::1 dev lo
> ### Now I have 2 routes in /proc/net/ipv6_route, a local and a non-local
> ### Both have the RTF_NONEXTHOP flag set (0x00200000)
> $ sudo ip route add local 4444::1 dev lo
> ### Now I have 3 routes in /proc/net/ipv6_route to 4444::1
> ### Notice the new route does NOT have the RTF_NONEXTHOP flag set
> $ sudo ip addr del local 4444::1 dev lo
> ### Now I just have the one route I created before
> $ sudo ip addr add local 4444::1 dev lo
> ### And now I have 3 routes again
> $ sudo ping6 4444::1
> [blah blah blah successful ping]
> $ sudo ip addr del local 4444::1 dev lo
> $ sudo ip addr add local 4444::1 dev lo
> ### Still have 3 routes
> $ sudo ip addr del local 4444::1 dev lo
> ### Now I just have my one route yet again
> ### Now, *without the address on lo*, talk to it (it works), then re-add it
> $ ping6 4444::1
> [blah blah blah successful ping]
> $ sudo ip addr add local 4444::1 dev lo
> ### Now I only have 2 routes... WAT!?
> ### Notice the LOCAL (0x80000000) route doesn't have the RTF_NONEXTHOP flag set
Looks like we meet different problems. Here is how I do my test (as well as on 3.10
+stable):
      Host only
PC <------------> Virtual Machine
create and send a packet using scapy:
-----------------------------------
| IPv6 (src=PC-addr, dst=VM-addr) |
|---------------------------------|
|     ICMPv6 (Packet Too Big)     |
|---------------------------------|
| IPv6 (src=VM-addr, dst=VM-addr) |
|---------------------------------|
| ICMPv6 (Neighbor Advertisement) |
-----------------------------------
Then the local-addr is set to expire. After expired, the VM is unreachable from
PC side.

thanks,
Sheng
> 
> Thanks,
> Calvin
> 
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index f14d49b..c607a42 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct
>> dst_entry *dst, struct sock *sk,
>>                 }
>>                 dst_metric_set(dst, RTAX_MTU, mtu);
>>
>> -               /* FACEBOOK HACK: We need to not expire local non-expiring
>> -                * routes so that we don't accidentally start blackholing
>> -                * ipvs traffic when we happen to use it locally for
>> -                * healthchecking (see ip_vs_xmit.c --
>> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
>> -                * associated with a socket)
>> -                * Alex Gartrell <agartrell@fb.com>
>> +               /*
>> +                * FACEBOOK HACK: Only expire routes that aren't destined for
>> +                * the loopback interface.
>> +                *
>> +                * This prevents the strange route coalescing that happens when
>> +                * you add an address to the loopback that had a route that had
>> +                * been used when the address didn't exist from getting expired
>> +                * and causing packet loss in shiv.
>>                  */
>> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
>> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
>> -                       rt6_update_expires(
>> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
>> +               if (!(dst->dev->flags & IFF_LOOPBACK))
>> +                       rt6_update_expires(rt6,
>> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>>         }
>>  }
>>
>>
>> Cheers,
>> -- 
>> Alex Gartrell <agartrell@fb.com>
> 
> .
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
@ 2015-02-03  3:21       ` shengyong
  0 siblings, 0 replies; 25+ messages in thread
From: shengyong @ 2015-02-03  3:21 UTC (permalink / raw)
  To: Calvin Owens, Alex Gartrell
  Cc: davem, netdev, yangyingling, steffen.klassert, hannes, lvs-devel,
	kernel-team



在 2015/2/3 10:10, Calvin Owens 写道:
> On Monday 02/02 at 16:52 -0800, Alex Gartrell wrote:
>> Hello Shengyong,
>>
>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index b2614b2..b80317a 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>>> @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct
>> dst_entry *dst, struct sock *sk,
>>>   {
>>>          struct rt6_info *rt6 = (struct rt6_info*)dst;
>>>
>>> +       if (rt6->rt6i_flags & RTF_LOCAL)
>>> +               return;
>>> +
>>>          dst_confirm(dst);
>>>          if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
>>>                  struct net *net = dev_net(dst->dev);
>>>
>>> So is this modification correct? Or how can we avoid such expiring? 
>>
>> FWIW, we encountered this problem with IPVS tunneling.  Here's a
>> patch done by Calvin (cc'ed) that fixes my attempted fix for this.
>> We're not particularly proud of this...
>>
>> At a high level, I don't think the RTF_LOCAL check was sufficient,
>> but I didn't investigate deeply enough and hopefully Calvin can say
>> why.
> 
> I honestly didn't spend much time at all finding the underlying cause
> because it appeared to be fixed upstream: on 3.19-rc5 you get all 3
> expected routes after the last step of my repro below.
Hi,
I do my test on 3.19.0-rc7 just now, it seems it still doesn't solve the
local-addr-expired problem.
 I just really
> needed to get this working at the time, and the gross disgusting
> horrible ugly awful [more negative adjectives] patch included below made
> it work.
> 
> FWIW, the explanation I wrote down in my notes was:
> 
> "The absence of RTF_NONEXTHOP is causing COWs to happen, which are
> always marked as RTF_CACHE. Somehow that's screwing things up in
> rt6_do_redirect()"
> 
> That could be BS though, I don't at all remember how I came to that
> conclusion. 
> 
> (/me resolves to write better notes in the future...)
> 
> Here's how to get the weird behavior on 3.10 (+stable):
> 
> $ sudo ip addr add local 4444::1 dev lo
> ### Now I have 2 routes in /proc/net/ipv6_route, a local and a non-local
> ### Both have the RTF_NONEXTHOP flag set (0x00200000)
> $ sudo ip route add local 4444::1 dev lo
> ### Now I have 3 routes in /proc/net/ipv6_route to 4444::1
> ### Notice the new route does NOT have the RTF_NONEXTHOP flag set
> $ sudo ip addr del local 4444::1 dev lo
> ### Now I just have the one route I created before
> $ sudo ip addr add local 4444::1 dev lo
> ### And now I have 3 routes again
> $ sudo ping6 4444::1
> [blah blah blah successful ping]
> $ sudo ip addr del local 4444::1 dev lo
> $ sudo ip addr add local 4444::1 dev lo
> ### Still have 3 routes
> $ sudo ip addr del local 4444::1 dev lo
> ### Now I just have my one route yet again
> ### Now, *without the address on lo*, talk to it (it works), then re-add it
> $ ping6 4444::1
> [blah blah blah successful ping]
> $ sudo ip addr add local 4444::1 dev lo
> ### Now I only have 2 routes... WAT!?
> ### Notice the LOCAL (0x80000000) route doesn't have the RTF_NONEXTHOP flag set
Looks like we meet different problems. Here is how I do my test (as well as on 3.10
+stable):
      Host only
PC <------------> Virtual Machine
create and send a packet using scapy:
-----------------------------------
| IPv6 (src=PC-addr, dst=VM-addr) |
|---------------------------------|
|     ICMPv6 (Packet Too Big)     |
|---------------------------------|
| IPv6 (src=VM-addr, dst=VM-addr) |
|---------------------------------|
| ICMPv6 (Neighbor Advertisement) |
-----------------------------------
Then the local-addr is set to expire. After expired, the VM is unreachable from
PC side.

thanks,
Sheng
> 
> Thanks,
> Calvin
> 
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index f14d49b..c607a42 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct
>> dst_entry *dst, struct sock *sk,
>>                 }
>>                 dst_metric_set(dst, RTAX_MTU, mtu);
>>
>> -               /* FACEBOOK HACK: We need to not expire local non-expiring
>> -                * routes so that we don't accidentally start blackholing
>> -                * ipvs traffic when we happen to use it locally for
>> -                * healthchecking (see ip_vs_xmit.c --
>> -                * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
>> -                * associated with a socket)
>> -                * Alex Gartrell <agartrell@fb.com>
>> +               /*
>> +                * FACEBOOK HACK: Only expire routes that aren't destined for
>> +                * the loopback interface.
>> +                *
>> +                * This prevents the strange route coalescing that happens when
>> +                * you add an address to the loopback that had a route that had
>> +                * been used when the address didn't exist from getting expired
>> +                * and causing packet loss in shiv.
>>                  */
>> -               if (!(rt6->rt6i_flags & RTF_LOCAL) ||
>> -                   (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
>> -                       rt6_update_expires(
>> -                               rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
>> +               if (!(dst->dev->flags & IFF_LOOPBACK))
>> +                       rt6_update_expires(rt6,
>> + net->ipv6.sysctl.ip6_rt_mtu_expires);
>>         }
>>  }
>>
>>
>> Cheers,
>> -- 
>> Alex Gartrell <agartrell@fb.com>
> 
> .
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-02  8:20 Question: should local address be expired when updating PMTU? shengyong
  2015-02-02 21:31 ` David Miller
  2015-02-03  0:52   ` Alex Gartrell
@ 2015-02-03  9:28 ` Steffen Klassert
  2015-02-03 10:54   ` shengyong
  2 siblings, 1 reply; 25+ messages in thread
From: Steffen Klassert @ 2015-02-03  9:28 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingling, hannes

On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
> Hi, David Miller
> Since commit 81aded246 (ipv6: Handle PMTU in ICMP error handlers), the entries
> in neigh table may get expired. But in the situation:
> 
>           Host only
>     PC <------------> Virtual Machine
> 
> a packet is sent from PC to VM, and the packet looks like:
> -----------------------------------
> | IPv6 (src=PC-addr, dst=VM-addr) |
> |---------------------------------|
> |     ICMPv6 (Packet Too Big)     |
> |---------------------------------|
> | IPv6 (src=VM-addr, dst=VM-addr) |
> |---------------------------------|
> | ICMPv6 (Neighbor Advertisement) |
> -----------------------------------
> 
> Then the local addr on VM will be updated with an expire value. After the
> lifetime of the local addr is expired, the VM is unreachable from PC.
> 
> 	# ip -6 route list table local
> 	local fe80::1 dev lo  metric 0 *expire 596*

We first need to find out why you receive this Packet Too Big message,
can you capture this packet somehow? Then we have to see why this loopback
route gets a pmtu update from that packet. Is the destination address
of the Packet Too Big message really fe80::1?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03  9:28 ` Steffen Klassert
@ 2015-02-03 10:54   ` shengyong
  2015-02-03 12:01     ` Steffen Klassert
  0 siblings, 1 reply; 25+ messages in thread
From: shengyong @ 2015-02-03 10:54 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev, yangyingliang, hannes



在 2015/2/3 17:28, Steffen Klassert 写道:
> On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
>> Hi, David Miller
>> Since commit 81aded246 (ipv6: Handle PMTU in ICMP error handlers), the entries
>> in neigh table may get expired. But in the situation:
>>
>>           Host only
>>     PC <------------> Virtual Machine
>>
>> a packet is sent from PC to VM, and the packet looks like:
>> -----------------------------------
>> | IPv6 (src=PC-addr, dst=VM-addr) |
>> |---------------------------------|
>> |     ICMPv6 (Packet Too Big)     |
>> |---------------------------------|
>> | IPv6 (src=VM-addr, dst=VM-addr) |
>> |---------------------------------|
>> | ICMPv6 (Neighbor Advertisement) |
>> -----------------------------------
>>
>> Then the local addr on VM will be updated with an expire value. After the
>> lifetime of the local addr is expired, the VM is unreachable from PC.
>>
>> 	# ip -6 route list table local
>> 	local fe80::1 dev lo  metric 0 *expire 596*
> 
> We first need to find out why you receive this Packet Too Big message,
The packet is sent by a commercial-off-the-shelf testcase, and I can reproduce the
situation by using scapy and creating a packet as the following:

	$ cat packet-too-big.py
	#!/usr/bin/python
	
	from scapy.all import *

	# fe80::800:27ff:fe00:0 is linklocal addr of PC
	# fe80::a00:27ff:fe1a:e2a0 is linklocal addr of VM
	base=IPv6(src='fe80::800:27ff:fe00:0',dst='fe80::a00:27ff:fe1a:e2a0')
	pkt_too_big=ICMPv6PacketTooBig(mtu=1024)
	ext_base=IPv6(src='fe80::a00:27ff:fe1a:e2a0',dst='fe80::a00:27ff:fe1a:e2a0',plen=24)
	ext_nd_na=ICMPv6ND_NA()
	
	packet=base/pkt_too_big/ext_base/ext_nd_na
	send(packet)

> can you capture this packet somehow? 
I captured the packet in wireshark, it is exact the packet created by the script.
> Then we have to see why this loopback
> route gets a pmtu update from that packet.
I tried to print info when the VM receives pkt-too-big packet. The calling stack is
icmpv6_rcv->icmpv6_notify->icmpv6_err->ip6_update_pmtu->ip6_rt_update_pmtu->rt6_update_expires.
In ip6_update_pmtu, ip6_route_output looks up the route table, and returns the dst_entry of the
linklocal addr. Then it is set to expire.
> Is the destination address
> of the Packet Too Big message really fe80::1?
In fact, if the dst of the above `ext_base' is the local addr of the VM, the local addr on VM
will be expired.

thx,
Sheng

> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03 10:54   ` shengyong
@ 2015-02-03 12:01     ` Steffen Klassert
  2015-02-04  1:59       ` shengyong
  0 siblings, 1 reply; 25+ messages in thread
From: Steffen Klassert @ 2015-02-03 12:01 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

On Tue, Feb 03, 2015 at 06:54:19PM +0800, shengyong wrote:
> 
> 
> 在 2015/2/3 17:28, Steffen Klassert 写道:
> > On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
> > 
> > We first need to find out why you receive this Packet Too Big message,
> The packet is sent by a commercial-off-the-shelf testcase, and I can reproduce the
> situation by using scapy and creating a packet as the following:
> 
> 	$ cat packet-too-big.py
> 	#!/usr/bin/python
> 	
> 	from scapy.all import *
> 
> 	# fe80::800:27ff:fe00:0 is linklocal addr of PC
> 	# fe80::a00:27ff:fe1a:e2a0 is linklocal addr of VM
> 	base=IPv6(src='fe80::800:27ff:fe00:0',dst='fe80::a00:27ff:fe1a:e2a0')
> 	pkt_too_big=ICMPv6PacketTooBig(mtu=1024)
> 	ext_base=IPv6(src='fe80::a00:27ff:fe1a:e2a0',dst='fe80::a00:27ff:fe1a:e2a0',plen=24)
> 	ext_nd_na=ICMPv6ND_NA()
> 	
> 	packet=base/pkt_too_big/ext_base/ext_nd_na
> 	send(packet)

So it is not a valid pmtu update, this make life easier.

Can you please test the patch below (compile tested only)?

This should fix your problem, and in combination with the two patches I sent
out last week, it should cure the whole 'expiring of uncached routes' problem.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49596535..4ccfb9c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1156,7 +1156,8 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	dst_confirm(dst);
-	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
+	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128 &&
+	    (rt6->rt6i_flags & RTF_CACHE)) {
 		struct net *net = dev_net(dst->dev);
 
 		rt6->rt6i_flags |= RTF_MODIFIED;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-03 12:01     ` Steffen Klassert
@ 2015-02-04  1:59       ` shengyong
  2015-02-05  7:21         ` Steffen Klassert
  0 siblings, 1 reply; 25+ messages in thread
From: shengyong @ 2015-02-04  1:59 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev, yangyingliang, hannes



在 2015/2/3 20:01, Steffen Klassert 写道:
> On Tue, Feb 03, 2015 at 06:54:19PM +0800, shengyong wrote:
>>
>>
>> 在 2015/2/3 17:28, Steffen Klassert 写道:
>>> On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
>>>
>>> We first need to find out why you receive this Packet Too Big message,
>> The packet is sent by a commercial-off-the-shelf testcase, and I can reproduce the
>> situation by using scapy and creating a packet as the following:
>>
>> 	$ cat packet-too-big.py
>> 	#!/usr/bin/python
>> 	
>> 	from scapy.all import *
>>
>> 	# fe80::800:27ff:fe00:0 is linklocal addr of PC
>> 	# fe80::a00:27ff:fe1a:e2a0 is linklocal addr of VM
>> 	base=IPv6(src='fe80::800:27ff:fe00:0',dst='fe80::a00:27ff:fe1a:e2a0')
>> 	pkt_too_big=ICMPv6PacketTooBig(mtu=1024)
>> 	ext_base=IPv6(src='fe80::a00:27ff:fe1a:e2a0',dst='fe80::a00:27ff:fe1a:e2a0',plen=24)
>> 	ext_nd_na=ICMPv6ND_NA()
>> 	
>> 	packet=base/pkt_too_big/ext_base/ext_nd_na
>> 	send(packet)
> 
> So it is not a valid pmtu update, this make life easier.
> 
> Can you please test the patch below (compile tested only)?
Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
these mails can fix the expire of local address. I'm confused about these flags:
* RTF_LOCAL: the entries of local address, like address binded to the native NIC
* RTF_CACHE: all cached entries
* IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL

Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
addresses to be not expired and we don't care other entries (I think if they get expired,
a neigh discovery could find them back).

thx,
Sheng
> 
> This should fix your problem, and in combination with the two patches I sent
> out last week, it should cure the whole 'expiring of uncached routes' problem.
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 49596535..4ccfb9c 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1156,7 +1156,8 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>  	struct rt6_info *rt6 = (struct rt6_info *)dst;
>  
>  	dst_confirm(dst);
> -	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
> +	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128 &&
> +	    (rt6->rt6i_flags & RTF_CACHE)) {
>  		struct net *net = dev_net(dst->dev);
>  
>  		rt6->rt6i_flags |= RTF_MODIFIED;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-04  1:59       ` shengyong
@ 2015-02-05  7:21         ` Steffen Klassert
  2015-02-27  2:37           ` shengyong
  0 siblings, 1 reply; 25+ messages in thread
From: Steffen Klassert @ 2015-02-05  7:21 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
> 
> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?

Yes, it's a bug.

> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
> these mails can fix the expire of local address. I'm confused about these flags:
> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
> * RTF_CACHE: all cached entries
> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
> 
> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
> addresses to be not expired and we don't care other entries (I think if they get expired,
> a neigh discovery could find them back).

It is not the address that expires, it is the learned PMTU value that
expires. If we delete an uncached route, nothing will bring it back
unless you readd it manually.

We need to ensure that all routes that can learn something what
expires are cached. This means that we clone the inserted route
when it is used for the first time. The learned values are stored
at this cloned route. If the learned value expires, the clone is
deleted. The original route remains in the fib tree and can be
still looked up.

The problem is, that we currently don't cache/clone host routes.
So if a host route learns something that expires, the original
route is removed from the fib tree and we loose connectivity
to that host. We don't cache host routes because some of them
(the local ones) are automatically added with metric 0.
If we try to cache such a route, the clone will be identical
to the original route and we fail to insert it to the fib tree.

So we need to adjust the caching to all routes that actually can
learn something and leave out only those that can not.

I'll send a patchset that should fix this at the beginning of the
next week.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-05  7:21         ` Steffen Klassert
@ 2015-02-27  2:37           ` shengyong
  2015-02-27 10:32             ` Steffen Klassert
  2015-03-30 10:32             ` Steffen Klassert
  0 siblings, 2 replies; 25+ messages in thread
From: shengyong @ 2015-02-27  2:37 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev, yangyingliang, hannes



在 2015/2/5 15:21, Steffen Klassert 写道:
> On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
>>
>> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
> 
> Yes, it's a bug.
> 
>> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
>> these mails can fix the expire of local address. I'm confused about these flags:
>> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
>> * RTF_CACHE: all cached entries
>> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
>>
>> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
>> addresses to be not expired and we don't care other entries (I think if they get expired,
>> a neigh discovery could find them back).
> 
> It is not the address that expires, it is the learned PMTU value that
> expires. If we delete an uncached route, nothing will bring it back
> unless you readd it manually.
> 
> We need to ensure that all routes that can learn something what
> expires are cached. This means that we clone the inserted route
> when it is used for the first time. The learned values are stored
> at this cloned route. If the learned value expires, the clone is
> deleted. The original route remains in the fib tree and can be
> still looked up.
> 
> The problem is, that we currently don't cache/clone host routes.
> So if a host route learns something that expires, the original
> route is removed from the fib tree and we loose connectivity
> to that host. We don't cache host routes because some of them
> (the local ones) are automatically added with metric 0.
> If we try to cache such a route, the clone will be identical
> to the original route and we fail to insert it to the fib tree.
> 
> So we need to adjust the caching to all routes that actually can
> learn something and leave out only those that can not.
> 
> I'll send a patchset that should fix this at the beginning of the
> next week.
Hi, Steffen
Is this patchset ready? It seems that I didn't find it in the mainling
list. If it is ready, I can test it to see if it solves the problem I
met :)

many thanks,
Sheng
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-27  2:37           ` shengyong
@ 2015-02-27 10:32             ` Steffen Klassert
  2015-03-30 10:32             ` Steffen Klassert
  1 sibling, 0 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-02-27 10:32 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
> Hi, Steffen
> Is this patchset ready? It seems that I didn't find it in the mainling
> list. If it is ready, I can test it to see if it solves the problem I
> met :)

Martin Lau pointed me to a problem when cached host routes are deleted.
We may delete the clone but not the original route. This has to be
resolved first. Unfortunately I had no time to look into this so far.
I hope to get to it next week.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-02-27  2:37           ` shengyong
  2015-02-27 10:32             ` Steffen Klassert
@ 2015-03-30 10:32             ` Steffen Klassert
  2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
                                 ` (3 more replies)
  1 sibling, 4 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-03-30 10:32 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
> 
> 
> 在 2015/2/5 15:21, Steffen Klassert 写道:
> > On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
> >>
> >> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
> > 
> > Yes, it's a bug.
> > 
> >> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
> >> these mails can fix the expire of local address. I'm confused about these flags:
> >> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
> >> * RTF_CACHE: all cached entries
> >> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
> >>
> >> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
> >> addresses to be not expired and we don't care other entries (I think if they get expired,
> >> a neigh discovery could find them back).
> > 
> > It is not the address that expires, it is the learned PMTU value that
> > expires. If we delete an uncached route, nothing will bring it back
> > unless you readd it manually.
> > 
> > We need to ensure that all routes that can learn something what
> > expires are cached. This means that we clone the inserted route
> > when it is used for the first time. The learned values are stored
> > at this cloned route. If the learned value expires, the clone is
> > deleted. The original route remains in the fib tree and can be
> > still looked up.
> > 
> > The problem is, that we currently don't cache/clone host routes.
> > So if a host route learns something that expires, the original
> > route is removed from the fib tree and we loose connectivity
> > to that host. We don't cache host routes because some of them
> > (the local ones) are automatically added with metric 0.
> > If we try to cache such a route, the clone will be identical
> > to the original route and we fail to insert it to the fib tree.
> > 
> > So we need to adjust the caching to all routes that actually can
> > learn something and leave out only those that can not.
> > 
> > I'll send a patchset that should fix this at the beginning of the
> > next week.
> Hi, Steffen
> Is this patchset ready? It seems that I didn't find it in the mainling
> list. If it is ready, I can test it to see if it solves the problem I
> met :)

I finally end up with a patchset that passed all your testcases.
I'll send it in reply to this mail, please test it.

Thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes
  2015-03-30 10:32             ` Steffen Klassert
@ 2015-03-30 10:33               ` Steffen Klassert
  2015-03-30 11:15                 ` Sheng Yong
  2015-03-30 18:24                 ` Martin Lau
  2015-03-30 10:33               ` [PATCH RFC 2/3] ipv6: Extend the route lookups to low priority metrics Steffen Klassert
                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-03-30 10:33 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

We currently don't clone host routes before we use them.
If a pmtu event is received on such a route, it gets
an expires value. As soon as the expiration time is
elapsed, the route is deleted. As a result, the host
es not reachable any more.

We fix this by cloning host routes if they are not local,
i.e. if pmtu events can happen. Also adjust the route
deletion because the cached and the original host route
are on the same fib node.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/route.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4688bd4..4318626 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -961,7 +961,7 @@ redo_rt6_select:
 
 	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
-	else if (!(rt->dst.flags & DST_HOST))
+	else if (!(rt->dst.flags & DST_HOST) || !(rt->rt6i_flags & RTF_LOCAL))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
 	else
 		goto out2;
@@ -1796,6 +1796,8 @@ static int ip6_route_del(struct fib6_config *cfg)
 				continue;
 			if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric)
 				continue;
+			if (!(cfg->fc_flags & RTF_CACHE) && (rt->rt6i_flags & RTF_CACHE))
+				continue;
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
@@ -2433,6 +2435,9 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_type == RTN_LOCAL)
 		cfg->fc_flags |= RTF_LOCAL;
 
+	if (rtm->rtm_flags & RTM_F_CLONED)
+		cfg->fc_flags |= RTF_CACHE;
+
 	cfg->fc_nlinfo.portid = NETLINK_CB(skb).portid;
 	cfg->fc_nlinfo.nlh = nlh;
 	cfg->fc_nlinfo.nl_net = sock_net(skb->sk);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 2/3] ipv6: Extend the route lookups to low priority metrics.
  2015-03-30 10:32             ` Steffen Klassert
  2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
@ 2015-03-30 10:33               ` Steffen Klassert
  2015-03-30 10:34               ` [PATCH RFC 3/3] ipv6: Don't update pmtu on uncached routes Steffen Klassert
  2015-03-30 11:13               ` Question: should local address be expired when updating PMTU? Sheng Yong
  3 siblings, 0 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-03-30 10:33 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

We search only for routes with highest prioriy metric in
find_rr_leaf(). However if one of these routes is marked
as invalid, we may fail to find a route even if there is
a appropriate route with lower priority. Then we loose
connectivity until the garbage collector deletes the
invalid route. This typically happens if a host route
expires afer a pmtu event. Fix this by searching also
for routes with a lower priority metric.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/route.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4318626..e4b24c8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -654,15 +654,33 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 				     u32 metric, int oif, int strict,
 				     bool *do_rr)
 {
-	struct rt6_info *rt, *match;
+	struct rt6_info *rt, *match, *cont;
 	int mpri = -1;
 
 	match = NULL;
-	for (rt = rr_head; rt && rt->rt6i_metric == metric;
-	     rt = rt->dst.rt6_next)
+	cont = NULL;
+	for (rt = rr_head; rt; rt = rt->dst.rt6_next) {
+		if (rt->rt6i_metric != metric) {
+			cont = rt;
+			break;
+		}
+
+		match = find_match(rt, oif, strict, &mpri, match, do_rr);
+	}
+
+	for (rt = fn->leaf; rt && rt != rr_head; rt = rt->dst.rt6_next) {
+		if (rt->rt6i_metric != metric) {
+			cont = rt;
+			break;
+		}
+
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
-	for (rt = fn->leaf; rt && rt != rr_head && rt->rt6i_metric == metric;
-	     rt = rt->dst.rt6_next)
+	}
+
+	if (match || !cont)
+		return match;
+
+	for (rt = cont; rt; rt = rt->dst.rt6_next)
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
 
 	return match;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC 3/3] ipv6: Don't update pmtu on uncached routes.
  2015-03-30 10:32             ` Steffen Klassert
  2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
  2015-03-30 10:33               ` [PATCH RFC 2/3] ipv6: Extend the route lookups to low priority metrics Steffen Klassert
@ 2015-03-30 10:34               ` Steffen Klassert
  2015-03-30 11:13               ` Question: should local address be expired when updating PMTU? Sheng Yong
  3 siblings, 0 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-03-30 10:34 UTC (permalink / raw)
  To: shengyong; +Cc: davem, netdev, yangyingliang, hannes

A pmtu update on an uncached route will delete the
original route after expiration time is elapsed.
As a result, the host is not reachable any more.

All routes where pmtu discovery can happen are
cached now, so it is ok to refuse pmtu updates
on uncached routes.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e4b24c8..e0c6ac9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1174,7 +1174,8 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	dst_confirm(dst);
-	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
+	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128 &&
+	    (rt6->rt6i_flags & RTF_CACHE)) {
 		struct net *net = dev_net(dst->dev);
 
 		rt6->rt6i_flags |= RTF_MODIFIED;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Question: should local address be expired when updating PMTU?
  2015-03-30 10:32             ` Steffen Klassert
                                 ` (2 preceding siblings ...)
  2015-03-30 10:34               ` [PATCH RFC 3/3] ipv6: Don't update pmtu on uncached routes Steffen Klassert
@ 2015-03-30 11:13               ` Sheng Yong
  3 siblings, 0 replies; 25+ messages in thread
From: Sheng Yong @ 2015-03-30 11:13 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev, yangyingliang, hannes



On 3/30/2015 6:32 PM, Steffen Klassert wrote:
> On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
>>
>>
>> 在 2015/2/5 15:21, Steffen Klassert 写道:
>>> On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
>>>>
>>>> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
>>>
>>> Yes, it's a bug.
>>>
>>>> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
>>>> these mails can fix the expire of local address. I'm confused about these flags:
>>>> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
>>>> * RTF_CACHE: all cached entries
>>>> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
>>>>
>>>> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
>>>> addresses to be not expired and we don't care other entries (I think if they get expired,
>>>> a neigh discovery could find them back).
>>>
>>> It is not the address that expires, it is the learned PMTU value that
>>> expires. If we delete an uncached route, nothing will bring it back
>>> unless you readd it manually.
>>>
>>> We need to ensure that all routes that can learn something what
>>> expires are cached. This means that we clone the inserted route
>>> when it is used for the first time. The learned values are stored
>>> at this cloned route. If the learned value expires, the clone is
>>> deleted. The original route remains in the fib tree and can be
>>> still looked up.
>>>
>>> The problem is, that we currently don't cache/clone host routes.
>>> So if a host route learns something that expires, the original
>>> route is removed from the fib tree and we loose connectivity
>>> to that host. We don't cache host routes because some of them
>>> (the local ones) are automatically added with metric 0.
>>> If we try to cache such a route, the clone will be identical
>>> to the original route and we fail to insert it to the fib tree.
>>>
>>> So we need to adjust the caching to all routes that actually can
>>> learn something and leave out only those that can not.
>>>
>>> I'll send a patchset that should fix this at the beginning of the
>>> next week.
>> Hi, Steffen
>> Is this patchset ready? It seems that I didn't find it in the mainling
>> list. If it is ready, I can test it to see if it solves the problem I
>> met :)
> 
> I finally end up with a patchset that passed all your testcases.
> I'll send it in reply to this mail, please test it.
> 
I just test the patchset, and it can solve the problem I met.
Thanks very much,
Sheng

> Thanks!
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes
  2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
@ 2015-03-30 11:15                 ` Sheng Yong
  2015-03-30 18:24                 ` Martin Lau
  1 sibling, 0 replies; 25+ messages in thread
From: Sheng Yong @ 2015-03-30 11:15 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev, yangyingliang, hannes



On 3/30/2015 6:33 PM, Steffen Klassert wrote:
> We currently don't clone host routes before we use them.
> If a pmtu event is received on such a route, it gets
> an expires value. As soon as the expiration time is
> elapsed, the route is deleted. As a result, the host
> es not reachable any more.
> 
> We fix this by cloning host routes if they are not local,
> i.e. if pmtu events can happen. Also adjust the route
> deletion because the cached and the original host route
> are on the same fib node.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv6/route.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4688bd4..4318626 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -961,7 +961,7 @@ redo_rt6_select:
>  
>  	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
>  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
> -	else if (!(rt->dst.flags & DST_HOST))
> +	else if (!(rt->dst.flags & DST_HOST) || !(rt->rt6i_flags & RTF_LOCAL))
>  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
>  	else
>  		goto out2;
> @@ -1796,6 +1796,8 @@ static int ip6_route_del(struct fib6_config *cfg)
>  				continue;
>  			if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric)
>  				continue;
> +			if (!(cfg->fc_flags & RTF_CACHE) && (rt->rt6i_flags & RTF_CACHE))
There is a trivial coding style problem, which makes checkpatch.pl unhappy. The line is
over 80 characters.
> +				continue;
>  			dst_hold(&rt->dst);
>  			read_unlock_bh(&table->tb6_lock);
>  
> @@ -2433,6 +2435,9 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	if (rtm->rtm_type == RTN_LOCAL)
>  		cfg->fc_flags |= RTF_LOCAL;
>  
> +	if (rtm->rtm_flags & RTM_F_CLONED)
> +		cfg->fc_flags |= RTF_CACHE;
> +
>  	cfg->fc_nlinfo.portid = NETLINK_CB(skb).portid;
>  	cfg->fc_nlinfo.nlh = nlh;
>  	cfg->fc_nlinfo.nl_net = sock_net(skb->sk);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes
  2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
  2015-03-30 11:15                 ` Sheng Yong
@ 2015-03-30 18:24                 ` Martin Lau
  2015-04-01  8:09                   ` Steffen Klassert
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Lau @ 2015-03-30 18:24 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: shengyong, davem, netdev, yangyingliang, hannes, Kernel Team

On Mon, Mar 30, 2015 at 12:33:13PM +0200, Steffen Klassert wrote:
> We currently don't clone host routes before we use them.
> If a pmtu event is received on such a route, it gets
> an expires value. As soon as the expiration time is
> elapsed, the route is deleted. As a result, the host
> es not reachable any more.
s/es/is

> We fix this by cloning host routes if they are not local,
> i.e. if pmtu events can happen. Also adjust the route
> deletion because the cached and the original host route
> are on the same fib node.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv6/route.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4688bd4..4318626 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -961,7 +961,7 @@ redo_rt6_select:
>  
>  	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
>  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
> -	else if (!(rt->dst.flags & DST_HOST))
> +	else if (!(rt->dst.flags & DST_HOST) || !(rt->rt6i_flags & RTF_LOCAL))
>  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
I suspect the nrt and rt will share the same inetpeer.  Hence, after the nrt is
removed, the MTU (obtained from PMTU update) will stay.

> @@ -1796,6 +1796,8 @@ static int ip6_route_del(struct fib6_config *cfg)
>  				continue;
>  			if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric)
>  				continue;
> +			if (!(cfg->fc_flags & RTF_CACHE) && (rt->rt6i_flags & RTF_CACHE))
> +				continue;
>  			dst_hold(&rt->dst);
>  			read_unlock_bh(&table->tb6_lock);
>  
> @@ -2433,6 +2435,9 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	if (rtm->rtm_type == RTN_LOCAL)
>  		cfg->fc_flags |= RTF_LOCAL;
>  
> +	if (rtm->rtm_flags & RTM_F_CLONED)
> +		cfg->fc_flags |= RTF_CACHE;
> +
>  	cfg->fc_nlinfo.portid = NETLINK_CB(skb).portid;
>  	cfg->fc_nlinfo.nlh = nlh;
>  	cfg->fc_nlinfo.nl_net = sock_net(skb->sk);
I had pulled your last patch to my local branch that I am cooking.  I also
added similar changes to ip6_route_del() and rtm_to_fib_config(), and confirmed
it can keep 'ip -6 r del' and 'ip -6 r flush table cache' to keep working
properly.

Thanks,
--Martin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes
  2015-03-30 18:24                 ` Martin Lau
@ 2015-04-01  8:09                   ` Steffen Klassert
  0 siblings, 0 replies; 25+ messages in thread
From: Steffen Klassert @ 2015-04-01  8:09 UTC (permalink / raw)
  To: Martin Lau; +Cc: shengyong, davem, netdev, yangyingliang, hannes, Kernel Team

On Mon, Mar 30, 2015 at 11:24:33AM -0700, Martin Lau wrote:
> On Mon, Mar 30, 2015 at 12:33:13PM +0200, Steffen Klassert wrote:
> > ---
> >  net/ipv6/route.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 4688bd4..4318626 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -961,7 +961,7 @@ redo_rt6_select:
> >  
> >  	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
> >  		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
> > -	else if (!(rt->dst.flags & DST_HOST))
> > +	else if (!(rt->dst.flags & DST_HOST) || !(rt->rt6i_flags & RTF_LOCAL))
> >  		nrt = rt6_alloc_clone(rt, &fl6->daddr);
> I suspect the nrt and rt will share the same inetpeer.  Hence, after the nrt is
> removed, the MTU (obtained from PMTU update) will stay.

I used a route with default metrics in my tests, therefore the
merics were copied for nrt. But if the route has a mtu set,
the metrics are not copied. So we overwrite the route mtu
with the pmtu value. Maybe we should make sure to always
copy the metrics if we do a pmtu update.

Something like this:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e0c6ac9..675bc42 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1182,6 +1182,7 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 		if (mtu < IPV6_MIN_MTU)
 			mtu = IPV6_MIN_MTU;
 
+		dst_cow_metrics_generic(dst, dst->_metrics);
 		dst_metric_set(dst, RTAX_MTU, mtu);
 		rt6_update_expires(rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
 	}

This could fix the pmtu case, but what are we going to do if
we have multiple routes with different mtu values to the same
host? Adding a second hostroute will overwrite the mtu value
of the first hostroute.

This reminds me at the times when we removed the ipv4 routing cache.
The first approach to handle pmtu updates was to cache the pmtu
values at the inetpeer, but we never got it to work. We swiched
then to use exceptional routes in this case.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-04-01  8:09 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-02  8:20 Question: should local address be expired when updating PMTU? shengyong
2015-02-02 21:31 ` David Miller
2015-02-03  0:52 ` Alex Gartrell
2015-02-03  0:52   ` Alex Gartrell
2015-02-03  1:28   ` shengyong
2015-02-03  1:28     ` shengyong
2015-02-03  2:10   ` Calvin Owens
2015-02-03  2:10     ` Calvin Owens
2015-02-03  3:21     ` shengyong
2015-02-03  3:21       ` shengyong
2015-02-03  9:28 ` Steffen Klassert
2015-02-03 10:54   ` shengyong
2015-02-03 12:01     ` Steffen Klassert
2015-02-04  1:59       ` shengyong
2015-02-05  7:21         ` Steffen Klassert
2015-02-27  2:37           ` shengyong
2015-02-27 10:32             ` Steffen Klassert
2015-03-30 10:32             ` Steffen Klassert
2015-03-30 10:33               ` [PATCH RFC 1/3] ipv6: Fix after pmtu events dissapearing host routes Steffen Klassert
2015-03-30 11:15                 ` Sheng Yong
2015-03-30 18:24                 ` Martin Lau
2015-04-01  8:09                   ` Steffen Klassert
2015-03-30 10:33               ` [PATCH RFC 2/3] ipv6: Extend the route lookups to low priority metrics Steffen Klassert
2015-03-30 10:34               ` [PATCH RFC 3/3] ipv6: Don't update pmtu on uncached routes Steffen Klassert
2015-03-30 11:13               ` Question: should local address be expired when updating PMTU? Sheng Yong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.