* IPv6 PMTU discovery fails with source-specific routing @ 2019-05-13 19:22 Mikael Magnusson 2019-05-14 3:35 ` David Ahern 0 siblings, 1 reply; 9+ messages in thread From: Mikael Magnusson @ 2019-05-13 19:22 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: text/plain, Size: 929 bytes --] Hello list, I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery doesn't seem to work with source-specific routing (AKA source-address dependent routing, SADR). I made a test script (see attachment). It sets up a test environment with three network namespaces (a, b and c) using SADR. The link between b and c is configured with MTU 1280. It then runs a ping test with large packets. I have tested a couple of kernels on Ubuntu 19.04 with the following results. mainline 4.14.117-0414117-generic SUCCESS ubuntu 4.15.0-1036-oem FAIL mainline 5.1.0-050100-generic FAIL The mainline kernels are from https://kernel.ubuntu.com/~kernel-ppa/mainline/, and the other from the Ubuntu 19.04 repository. I have attached a patch against 5.1 which seems to fix the problem in the test case. It's bug #1788623 in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788623 /Mikael [-- Attachment #2: 0001-net-ipv6-route-Fix-PMTU-for-source-specific-routes.patch --] [-- Type: text/x-patch, Size: 949 bytes --] From 4dd6d3e00663ec1749d8c6cf8139a2e255c7a797 Mon Sep 17 00:00:00 2001 From: Mikael Magnusson <mikma@users.sourceforge.net> Date: Wed, 8 May 2019 22:06:44 +0000 Subject: [PATCH] net/ipv6/route: Fix PMTU for source-specific routes --- net/ipv6/route.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0520aca3354b..63d678588ec9 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1197,13 +1197,13 @@ static struct rt6_info *ip6_rt_cache_alloc(struct fib6_info *ort, if (ort->fib6_dst.plen != 128 && ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) rt->rt6i_flags |= RTF_ANYCAST; + } #ifdef CONFIG_IPV6_SUBTREES - if (rt->rt6i_src.plen && saddr) { - rt->rt6i_src.addr = *saddr; - rt->rt6i_src.plen = 128; - } -#endif + if (rt->rt6i_src.plen && saddr) { + rt->rt6i_src.addr = *saddr; + rt->rt6i_src.plen = 128; } +#endif return rt; } -- 2.17.1 [-- Attachment #3: pmtu-ns.sh --] [-- Type: application/x-shellscript, Size: 1433 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-13 19:22 IPv6 PMTU discovery fails with source-specific routing Mikael Magnusson @ 2019-05-14 3:35 ` David Ahern 2019-05-14 6:12 ` Wei Wang 0 siblings, 1 reply; 9+ messages in thread From: David Ahern @ 2019-05-14 3:35 UTC (permalink / raw) To: Mikael Magnusson, netdev, Martin KaFai Lau, Wei Wang On 5/13/19 1:22 PM, Mikael Magnusson wrote: > Hello list, > > I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery > doesn't seem to work with source-specific routing (AKA source-address > dependent routing, SADR). > > I made a test script (see attachment). It sets up a test environment > with three network namespaces (a, b and c) using SADR. The link between > b and c is configured with MTU 1280. It then runs a ping test with large > packets. > > I have tested a couple of kernels on Ubuntu 19.04 with the following > results. > > mainline 4.14.117-0414117-generic SUCCESS > ubuntu 4.15.0-1036-oem FAIL > mainline 5.1.0-050100-generic FAIL > git bisect shows good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52 bad: 2b760fcf5cfb34e8610df56d83745b2b74ae1379 Those are back to back commits so 2b760fcf5cfb ipv6: hook up exception table to store dst cache has to be the bad commit. Your patch may work, but does not seem logical relative to code at the time of 4.15 and the commit that caused the failure. cc'ing authors of the changes referenced above. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-14 3:35 ` David Ahern @ 2019-05-14 6:12 ` Wei Wang 2019-05-14 14:33 ` Stefano Brivio 0 siblings, 1 reply; 9+ messages in thread From: Wei Wang @ 2019-05-14 6:12 UTC (permalink / raw) To: David Ahern, Mikael Magnusson Cc: Linux Kernel Network Developers, Martin KaFai Lau Thanks Mikael for reporting this issue. And thanks David for the bisection. Let me spend some time to reproduce it and see what is going on. From: David Ahern <dsahern@gmail.com> Date: Mon, May 13, 2019 at 8:35 PM To: Mikael Magnusson, <netdev@vger.kernel.org>, Martin KaFai Lau, Wei Wang > On 5/13/19 1:22 PM, Mikael Magnusson wrote: > > Hello list, > > > > I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery > > doesn't seem to work with source-specific routing (AKA source-address > > dependent routing, SADR). > > > > I made a test script (see attachment). It sets up a test environment > > with three network namespaces (a, b and c) using SADR. The link between > > b and c is configured with MTU 1280. It then runs a ping test with large > > packets. > > > > I have tested a couple of kernels on Ubuntu 19.04 with the following > > results. > > > > mainline 4.14.117-0414117-generic SUCCESS > > ubuntu 4.15.0-1036-oem FAIL > > mainline 5.1.0-050100-generic FAIL > > > > git bisect shows > > good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52 > bad: 2b760fcf5cfb34e8610df56d83745b2b74ae1379 > > Those are back to back commits so > 2b760fcf5cfb ipv6: hook up exception table to store dst cache > > has to be the bad commit. > > Your patch may work, but does not seem logical relative to code at the > time of 4.15 and the commit that caused the failure. cc'ing authors of > the changes referenced above. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-14 6:12 ` Wei Wang @ 2019-05-14 14:33 ` Stefano Brivio 2019-05-14 19:33 ` Wei Wang 0 siblings, 1 reply; 9+ messages in thread From: Stefano Brivio @ 2019-05-14 14:33 UTC (permalink / raw) To: Mikael Magnusson Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau On Mon, 13 May 2019 23:12:31 -0700 Wei Wang <weiwan@google.com> wrote: > Thanks Mikael for reporting this issue. And thanks David for the bisection. > Let me spend some time to reproduce it and see what is going on. Mikael, by the way, once this is sorted out, it would be nice if you could add your test as a case in tools/testing/selftests/net/pmtu.sh -- you could probably reuse all the setup parts that are already implemented there. -- Stefano ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-14 14:33 ` Stefano Brivio @ 2019-05-14 19:33 ` Wei Wang 2019-05-15 15:43 ` David Ahern 2019-05-15 18:06 ` Martin Lau 0 siblings, 2 replies; 9+ messages in thread From: Wei Wang @ 2019-05-14 19:33 UTC (permalink / raw) To: Stefano Brivio, David Ahern, Martin KaFai Lau Cc: Mikael Magnusson, Linux Kernel Network Developers I think the bug is because when creating exceptions, src_addr is not always set even though fib6_info is in the subtree. (because of rt6_is_gw_or_nonexthop() check) However, when looking up for exceptions, we always set src_addr to the passed in flow->src_addr if fib6_info is in the subtree. That causes the exception lookup to fail. I will make it consistent. However, I don't quite understand the following logic in ip6_rt_cache_alloc(): if (!rt6_is_gw_or_nonexthop(ort)) { if (ort->fib6_dst.plen != 128 && ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) rt->rt6i_flags |= RTF_ANYCAST; #ifdef CONFIG_IPV6_SUBTREES if (rt->rt6i_src.plen && saddr) { rt->rt6i_src.addr = *saddr; rt->rt6i_src.plen = 128; } #endif } Why do we need to check that the route is not gateway and has next hop for updating rt6i_src? I checked the git history and it seems this part was there from very early on (with some refactor in between)... From: Stefano Brivio <sbrivio@redhat.com> Date: Tue, May 14, 2019 at 7:33 AM To: Mikael Magnusson Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > On Mon, 13 May 2019 23:12:31 -0700 > Wei Wang <weiwan@google.com> wrote: > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > Let me spend some time to reproduce it and see what is going on. > > Mikael, by the way, once this is sorted out, it would be nice if you > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > you could probably reuse all the setup parts that are already > implemented there. > > -- > Stefano ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-14 19:33 ` Wei Wang @ 2019-05-15 15:43 ` David Ahern 2019-05-15 18:06 ` Martin Lau 1 sibling, 0 replies; 9+ messages in thread From: David Ahern @ 2019-05-15 15:43 UTC (permalink / raw) To: Wei Wang, Stefano Brivio, Martin KaFai Lau Cc: Mikael Magnusson, Linux Kernel Network Developers On 5/14/19 1:33 PM, Wei Wang wrote: > I think the bug is because when creating exceptions, src_addr is not > always set even though fib6_info is in the subtree. (because of > rt6_is_gw_or_nonexthop() check) > However, when looking up for exceptions, we always set src_addr to the > passed in flow->src_addr if fib6_info is in the subtree. That causes > the exception lookup to fail. > I will make it consistent. > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > if (!rt6_is_gw_or_nonexthop(ort)) { > if (ort->fib6_dst.plen != 128 && > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > rt->rt6i_flags |= RTF_ANYCAST; > #ifdef CONFIG_IPV6_SUBTREES > if (rt->rt6i_src.plen && saddr) { > rt->rt6i_src.addr = *saddr; > rt->rt6i_src.plen = 128; > } > #endif > } > Why do we need to check that the route is not gateway and has next hop > for updating rt6i_src? I checked the git history and it seems this > part was there from very early on (with some refactor in between)... I can not make sense of that check either. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-14 19:33 ` Wei Wang 2019-05-15 15:43 ` David Ahern @ 2019-05-15 18:06 ` Martin Lau 2019-05-15 18:31 ` Wei Wang 1 sibling, 1 reply; 9+ messages in thread From: Martin Lau @ 2019-05-15 18:06 UTC (permalink / raw) To: Wei Wang Cc: Stefano Brivio, David Ahern, Mikael Magnusson, Linux Kernel Network Developers On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > I think the bug is because when creating exceptions, src_addr is not > always set even though fib6_info is in the subtree. (because of > rt6_is_gw_or_nonexthop() check) > However, when looking up for exceptions, we always set src_addr to the > passed in flow->src_addr if fib6_info is in the subtree. That causes > the exception lookup to fail. > I will make it consistent. > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > if (!rt6_is_gw_or_nonexthop(ort)) { > if (ort->fib6_dst.plen != 128 && > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > rt->rt6i_flags |= RTF_ANYCAST; > #ifdef CONFIG_IPV6_SUBTREES > if (rt->rt6i_src.plen && saddr) { > rt->rt6i_src.addr = *saddr; > rt->rt6i_src.plen = 128; > } > #endif > } > Why do we need to check that the route is not gateway and has next hop > for updating rt6i_src? I checked the git history and it seems this > part was there from very early on (with some refactor in between)... I also failed to understand the RTF_GATEWAY check. The earliest related commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") How was it working when the exception route was in the tree? > > > From: Stefano Brivio <sbrivio@redhat.com> > Date: Tue, May 14, 2019 at 7:33 AM > To: Mikael Magnusson > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > On Mon, 13 May 2019 23:12:31 -0700 > > Wei Wang <weiwan@google.com> wrote: > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > Let me spend some time to reproduce it and see what is going on. > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > you could probably reuse all the setup parts that are already > > implemented there. > > > > -- > > Stefano ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-15 18:06 ` Martin Lau @ 2019-05-15 18:31 ` Wei Wang 2019-05-15 21:54 ` Martin Lau 0 siblings, 1 reply; 9+ messages in thread From: Wei Wang @ 2019-05-15 18:31 UTC (permalink / raw) To: Martin Lau Cc: Stefano Brivio, David Ahern, Mikael Magnusson, Linux Kernel Network Developers On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote: > > On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > > I think the bug is because when creating exceptions, src_addr is not > > always set even though fib6_info is in the subtree. (because of > > rt6_is_gw_or_nonexthop() check) > > However, when looking up for exceptions, we always set src_addr to the > > passed in flow->src_addr if fib6_info is in the subtree. That causes > > the exception lookup to fail. > > I will make it consistent. > > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > > if (!rt6_is_gw_or_nonexthop(ort)) { > > if (ort->fib6_dst.plen != 128 && > > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > > rt->rt6i_flags |= RTF_ANYCAST; > > #ifdef CONFIG_IPV6_SUBTREES > > if (rt->rt6i_src.plen && saddr) { > > rt->rt6i_src.addr = *saddr; > > rt->rt6i_src.plen = 128; > > } > > #endif > > } > > Why do we need to check that the route is not gateway and has next hop > > for updating rt6i_src? I checked the git history and it seems this > > part was there from very early on (with some refactor in between)... > I also failed to understand the RTF_GATEWAY check. The earliest related > commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") > > How was it working when the exception route was in the tree? > When adding all exception route to the main routing tree, because route cache has dest_addr as /128, the longest prefix match will always match the /128 route entry. > > > > > > From: Stefano Brivio <sbrivio@redhat.com> > > Date: Tue, May 14, 2019 at 7:33 AM > > To: Mikael Magnusson > > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > > > On Mon, 13 May 2019 23:12:31 -0700 > > > Wei Wang <weiwan@google.com> wrote: > > > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > > Let me spend some time to reproduce it and see what is going on. > > > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > > you could probably reuse all the setup parts that are already > > > implemented there. > > > > > > -- > > > Stefano ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IPv6 PMTU discovery fails with source-specific routing 2019-05-15 18:31 ` Wei Wang @ 2019-05-15 21:54 ` Martin Lau 0 siblings, 0 replies; 9+ messages in thread From: Martin Lau @ 2019-05-15 21:54 UTC (permalink / raw) To: Wei Wang Cc: Stefano Brivio, David Ahern, Mikael Magnusson, Linux Kernel Network Developers On Wed, May 15, 2019 at 11:31:44AM -0700, Wei Wang wrote: > On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote: > > > > On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > > > I think the bug is because when creating exceptions, src_addr is not > > > always set even though fib6_info is in the subtree. (because of > > > rt6_is_gw_or_nonexthop() check) > > > However, when looking up for exceptions, we always set src_addr to the > > > passed in flow->src_addr if fib6_info is in the subtree. That causes > > > the exception lookup to fail. > > > I will make it consistent. > > > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > > > if (!rt6_is_gw_or_nonexthop(ort)) { > > > if (ort->fib6_dst.plen != 128 && > > > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > > > rt->rt6i_flags |= RTF_ANYCAST; > > > #ifdef CONFIG_IPV6_SUBTREES > > > if (rt->rt6i_src.plen && saddr) { > > > rt->rt6i_src.addr = *saddr; > > > rt->rt6i_src.plen = 128; > > > } > > > #endif > > > } > > > Why do we need to check that the route is not gateway and has next hop > > > for updating rt6i_src? I checked the git history and it seems this > > > part was there from very early on (with some refactor in between)... > > I also failed to understand the RTF_GATEWAY check. The earliest related > > commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") > > > > How was it working when the exception route was in the tree? > > > When adding all exception route to the main routing tree, because > route cache has dest_addr as /128, the longest prefix match will > always match the /128 route entry. Got it. Thanks for the explanation. > > > > > > > > > > From: Stefano Brivio <sbrivio@redhat.com> > > > Date: Tue, May 14, 2019 at 7:33 AM > > > To: Mikael Magnusson > > > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > > > > > On Mon, 13 May 2019 23:12:31 -0700 > > > > Wei Wang <weiwan@google.com> wrote: > > > > > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > > > Let me spend some time to reproduce it and see what is going on. > > > > > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > > > you could probably reuse all the setup parts that are already > > > > implemented there. > > > > > > > > -- > > > > Stefano ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-05-15 21:55 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-13 19:22 IPv6 PMTU discovery fails with source-specific routing Mikael Magnusson 2019-05-14 3:35 ` David Ahern 2019-05-14 6:12 ` Wei Wang 2019-05-14 14:33 ` Stefano Brivio 2019-05-14 19:33 ` Wei Wang 2019-05-15 15:43 ` David Ahern 2019-05-15 18:06 ` Martin Lau 2019-05-15 18:31 ` Wei Wang 2019-05-15 21:54 ` Martin Lau
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.