From: Calvin Owens <calvinowens@fb.com>
To: Alex Gartrell <agartrell@fb.com>
Cc: shengyong <shengyong1@huawei.com>, <davem@davemloft.net>,
	<netdev@vger.kernel.org>, <yangyingling@huawei.com>,
	<steffen.klassert@secunet.com>, <hannes@redhat.com>,
	<lvs-devel@vger.kernel.org>, <kernel-team@fb.com>
Subject: Re: Question: should local address be expired when updating PMTU?
Date: Mon, 2 Feb 2015 18:10:07 -0800
Message-ID: <20150203021007.GA1866582@mail.thefacebook.com>
In-Reply-To: <54D01BEA.2070501@fb.com>

On Monday 02/02 at 16:52 -0800, Alex Gartrell wrote:
> Hello Shengyong,
>
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index b2614b2..b80317a 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1136,6 +1136,9 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
> >  {
> >  	struct rt6_info *rt6 = (struct rt6_info*)dst;
> >
> > +	if (rt6->rt6i_flags & RTF_LOCAL)
> > +		return;
> > +
> >  	dst_confirm(dst);
> >  	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
> >  		struct net *net = dev_net(dst->dev);
> >
> > So is this modification correct? Or how can we avoid such expiring?
>
> FWIW, we encountered this problem with IPVS tunneling. Here's a
> patch done by Calvin (cc'ed) that fixes my attempted fix for this.
> We're not particularly proud of this...
>
> At a high level, I don't think the RTF_LOCAL check was sufficient,
> but I didn't investigate deeply enough and hopefully Calvin can say
> why.

I honestly didn't spend much time at all finding the underlying cause
because it appeared to be fixed upstream: on 3.19-rc5 you get all 3
expected routes after the last step of my repro below.

I just really needed to get this working at the time, and the gross
disgusting horrible ugly awful [more negative adjectives] patch included
below made it work.

FWIW, the explanation I wrote down in my notes was:

"The absence of RTF_NONEXTHOP is causing COWs to happen, which are
always marked as RTF_CACHE. Somehow that's screwing things up in
rt6_do_redirect()"

That could be BS though, I don't at all remember how I came to that
conclusion. (/me resolves to write better notes in the future...)

Here's how to get the weird behavior on 3.10 (+stable):

$ sudo ip addr add local 4444::1 dev lo
### Now I have 2 routes in /proc/net/ipv6_route, a local and a non-local
### Both have the RTF_NONEXTHOP flag set (0x00200000)

$ sudo ip route add local 4444::1 dev lo
### Now I have 3 routes in /proc/net/ipv6_route to 4444::1
### Notice the new route does NOT have the RTF_NONEXTHOP flag set

$ sudo ip addr del local 4444::1 dev lo
### Now I just have the one route I created before

$ sudo ip addr add local 4444::1 dev lo
### And now I have 3 routes again

$ sudo ping6 4444::1
[blah blah blah successful ping]

$ sudo ip addr del local 4444::1 dev lo
$ sudo ip addr add local 4444::1 dev lo
### Still have 3 routes

$ sudo ip addr del local 4444::1 dev lo
### Now I just have my one route yet again

### Now, *without the address on lo*, talk to it (it works), then re-add it
$ ping6 4444::1
[blah blah blah successful ping]

$ sudo ip addr add local 4444::1 dev lo
### Now I only have 2 routes... WAT!?
### Notice the LOCAL (0x80000000) route doesn't have the RTF_NONEXTHOP flag set

Thanks,
Calvin

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f14d49b..c607a42 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1159,18 +1159,18 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>  		}
>  		dst_metric_set(dst, RTAX_MTU, mtu);
>
> -		/* FACEBOOK HACK: We need to not expire local non-expiring
> -		 * routes so that we don't accidentally start blackholing
> -		 * ipvs traffic when we happen to use it locally for
> -		 * healthchecking (see ip_vs_xmit.c --
> -		 * __ip_vs_get_out_rt_v6 invokes update_pmtu if the rt is
> -		 * associated with a socket)
> -		 * Alex Gartrell <agartrell@fb.com>
> +		/*
> +		 * FACEBOOK HACK: Only expire routes that aren't destined for
> +		 * the loopback interface.
> +		 *
> +		 * This prevents the strange route coalescing that happens when
> +		 * you add an address to the loopback that had a route that had
> +		 * been used when the address didn't exist from getting expired
> +		 * and causing packet loss in shiv.
>  		 */
> -		if (!(rt6->rt6i_flags & RTF_LOCAL) ||
> -		    (rt6->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
> -			rt6_update_expires(
> -				rt6, net->ipv6.sysctl.ip6_rt_mtu_expires);
> +		if (!(dst->dev->flags & IFF_LOOPBACK))
> +			rt6_update_expires(rt6,
> +				net->ipv6.sysctl.ip6_rt_mtu_expires);
>  	}
>  }
>
>
> Cheers,
> --
> Alex Gartrell <agartrell@fb.com>
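
The repro above depends on reading the flags column of /proc/net/ipv6_route by eye at each step. As a rough aid, here is a minimal sketch of a helper that decodes that column. It is not part of the original message. The RTF_NONEXTHOP and RTF_LOCAL values are the ones quoted in the repro; the RTF_UP, RTF_EXPIRES and RTF_CACHE values are taken from the kernel uapi headers. The function names `decode_flags` and `show_routes` are hypothetical, and the ten-column layout of the proc file (flags in the ninth field, device in the tenth) is an assumption that should be checked against the running kernel.

```shell
#!/bin/sh
# Sketch: decode the flags column of /proc/net/ipv6_route.
# decode_flags and show_routes are hypothetical helper names.

decode_flags() (
    # $1: flags field as printed in the proc file (hex digits, no 0x prefix)
    flags=$((0x$1))
    out=""
    [ $((flags & 0x00000001)) -ne 0 ] && out="$out UP"         # RTF_UP
    [ $((flags & 0x00200000)) -ne 0 ] && out="$out NONEXTHOP"  # RTF_NONEXTHOP
    [ $((flags & 0x00400000)) -ne 0 ] && out="$out EXPIRES"    # RTF_EXPIRES
    [ $((flags & 0x01000000)) -ne 0 ] && out="$out CACHE"      # RTF_CACHE
    [ $((flags & 0x80000000)) -ne 0 ] && out="$out LOCAL"      # RTF_LOCAL
    echo "${out# }"
)

show_routes() (
    # $1: destination as 32 hex digits, $2: file to read (defaults to the proc file)
    # Assumed column order: dst plen src splen nexthop metric refcnt use flags dev
    want=$1
    file=${2:-/proc/net/ipv6_route}
    while read -r dst plen src splen nh metric refcnt use flags dev; do
        [ "$dst" = "$want" ] || continue
        echo "$dst/$((0x$plen)) dev $dev flags=$flags [$(decode_flags "$flags")]"
    done < "$file"
)

# 4444::1 spelled out the way the proc file prints addresses:
# "4444", then 24 zeros, then "0001".
addr=$(printf '4444%024d0001' 0)

if [ -r /proc/net/ipv6_route ]; then
    show_routes "$addr"
fi
```

Running this after each step of the repro should print one line per route to 4444::1, which makes it easier to see which entries carry NONEXTHOP, CACHE or EXPIRES without decoding the hex by hand.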