All of lore.kernel.org
 help / color / mirror / Atom feed
* IPv6 PMTU discovery fails with source-specific routing
@ 2019-05-13 19:22 Mikael Magnusson
  2019-05-14  3:35 ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Mikael Magnusson @ 2019-05-13 19:22 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 929 bytes --]

Hello list,

I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery 
doesn't seem to work with source-specific routing (AKA source-address 
dependent routing, SADR).

I made a test script (see attachment). It sets up a test environment 
with three network namespaces (a, b and c) using SADR. The link between 
b and c is configured with MTU 1280. It then runs a ping test with large 
packets.

I have tested a couple of kernels on Ubuntu 19.04 with the following 
results.

mainline 4.14.117-0414117-generic SUCCESS
ubuntu   4.15.0-1036-oem          FAIL
mainline 5.1.0-050100-generic     FAIL

The mainline kernels are from 
https://kernel.ubuntu.com/~kernel-ppa/mainline/, and the other from the 
Ubuntu 19.04 repository.

I have attached a patch against 5.1 which seems to fix the problem in 
the test case.

It's bug #1788623 in Ubuntu: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788623

/Mikael

[-- Attachment #2: 0001-net-ipv6-route-Fix-PMTU-for-source-specific-routes.patch --]
[-- Type: text/x-patch, Size: 949 bytes --]

From 4dd6d3e00663ec1749d8c6cf8139a2e255c7a797 Mon Sep 17 00:00:00 2001
From: Mikael Magnusson <mikma@users.sourceforge.net>
Date: Wed, 8 May 2019 22:06:44 +0000
Subject: [PATCH] net/ipv6/route: Fix PMTU for source-specific routes

---
 net/ipv6/route.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0520aca3354b..63d678588ec9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1197,13 +1197,13 @@ static struct rt6_info *ip6_rt_cache_alloc(struct fib6_info *ort,
 		if (ort->fib6_dst.plen != 128 &&
 		    ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
 			rt->rt6i_flags |= RTF_ANYCAST;
+	}
 #ifdef CONFIG_IPV6_SUBTREES
-		if (rt->rt6i_src.plen && saddr) {
-			rt->rt6i_src.addr = *saddr;
-			rt->rt6i_src.plen = 128;
-		}
-#endif
+	if (rt->rt6i_src.plen && saddr) {
+		rt->rt6i_src.addr = *saddr;
+		rt->rt6i_src.plen = 128;
 	}
+#endif
 
 	return rt;
 }
-- 
2.17.1



[-- Attachment #3: pmtu-ns.sh --]
[-- Type: application/x-shellscript, Size: 1433 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-13 19:22 IPv6 PMTU discovery fails with source-specific routing Mikael Magnusson
@ 2019-05-14  3:35 ` David Ahern
  2019-05-14  6:12   ` Wei Wang
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2019-05-14  3:35 UTC (permalink / raw)
  To: Mikael Magnusson, netdev, Martin KaFai Lau, Wei Wang

On 5/13/19 1:22 PM, Mikael Magnusson wrote:
> Hello list,
> 
> I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery
> doesn't seem to work with source-specific routing (AKA source-address
> dependent routing, SADR).
> 
> I made a test script (see attachment). It sets up a test environment
> with three network namespaces (a, b and c) using SADR. The link between
> b and c is configured with MTU 1280. It then runs a ping test with large
> packets.
> 
> I have tested a couple of kernels on Ubuntu 19.04 with the following
> results.
> 
> mainline 4.14.117-0414117-generic SUCCESS
> ubuntu   4.15.0-1036-oem          FAIL
> mainline 5.1.0-050100-generic     FAIL
> 

git bisect shows

good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52
bad:  2b760fcf5cfb34e8610df56d83745b2b74ae1379

Those are back to back commits so
    2b760fcf5cfb ipv6: hook up exception table to store dst cache

has to be the bad commit.

Your patch may work, but does not seem logical relative to code at the
time of 4.15 and the commit that caused the failure. cc'ing authors of
the changes referenced above.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-14  3:35 ` David Ahern
@ 2019-05-14  6:12   ` Wei Wang
  2019-05-14 14:33     ` Stefano Brivio
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Wang @ 2019-05-14  6:12 UTC (permalink / raw)
  To: David Ahern, Mikael Magnusson
  Cc: Linux Kernel Network Developers, Martin KaFai Lau

Thanks Mikael for reporting this issue. And thanks David for the bisection.
Let me spend some time to reproduce it and see what is going on.

From: David Ahern <dsahern@gmail.com>
Date: Mon, May 13, 2019 at 8:35 PM
To: Mikael Magnusson, <netdev@vger.kernel.org>, Martin KaFai Lau, Wei Wang

> On 5/13/19 1:22 PM, Mikael Magnusson wrote:
> > Hello list,
> >
> > I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery
> > doesn't seem to work with source-specific routing (AKA source-address
> > dependent routing, SADR).
> >
> > I made a test script (see attachment). It sets up a test environment
> > with three network namespaces (a, b and c) using SADR. The link between
> > b and c is configured with MTU 1280. It then runs a ping test with large
> > packets.
> >
> > I have tested a couple of kernels on Ubuntu 19.04 with the following
> > results.
> >
> > mainline 4.14.117-0414117-generic SUCCESS
> > ubuntu   4.15.0-1036-oem          FAIL
> > mainline 5.1.0-050100-generic     FAIL
> >
>
> git bisect shows
>
> good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52
> bad:  2b760fcf5cfb34e8610df56d83745b2b74ae1379
>
> Those are back to back commits so
>     2b760fcf5cfb ipv6: hook up exception table to store dst cache
>
> has to be the bad commit.
>
> Your patch may work, but does not seem logical relative to code at the
> time of 4.15 and the commit that caused the failure. cc'ing authors of
> the changes referenced above.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-14  6:12   ` Wei Wang
@ 2019-05-14 14:33     ` Stefano Brivio
  2019-05-14 19:33       ` Wei Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2019-05-14 14:33 UTC (permalink / raw)
  To: Mikael Magnusson
  Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau

On Mon, 13 May 2019 23:12:31 -0700
Wei Wang <weiwan@google.com> wrote:

> Thanks Mikael for reporting this issue. And thanks David for the bisection.
> Let me spend some time to reproduce it and see what is going on.

Mikael, by the way, once this is sorted out, it would be nice if you
could add your test as a case in tools/testing/selftests/net/pmtu.sh --
you could probably reuse all the setup parts that are already
implemented there.

-- 
Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-14 14:33     ` Stefano Brivio
@ 2019-05-14 19:33       ` Wei Wang
  2019-05-15 15:43         ` David Ahern
  2019-05-15 18:06         ` Martin Lau
  0 siblings, 2 replies; 9+ messages in thread
From: Wei Wang @ 2019-05-14 19:33 UTC (permalink / raw)
  To: Stefano Brivio, David Ahern, Martin KaFai Lau
  Cc: Mikael Magnusson, Linux Kernel Network Developers

I think the bug is because when creating exceptions, src_addr is not
always set even though fib6_info is in the subtree. (because of
rt6_is_gw_or_nonexthop() check)
However, when looking up for exceptions, we always set src_addr to the
passed in flow->src_addr if fib6_info is in the subtree. That causes
the exception lookup to fail.
I will make it consistent.
However, I don't quite understand the following logic in ip6_rt_cache_alloc():
        if (!rt6_is_gw_or_nonexthop(ort)) {
                if (ort->fib6_dst.plen != 128 &&
                    ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
                        rt->rt6i_flags |= RTF_ANYCAST;
#ifdef CONFIG_IPV6_SUBTREES
                if (rt->rt6i_src.plen && saddr) {
                        rt->rt6i_src.addr = *saddr;
                        rt->rt6i_src.plen = 128;
                }
#endif
        }
Why do we need to check that the route is not gateway and has next hop
for updating rt6i_src? I checked the git history and it seems this
part was there from very early on (with some refactor in between)...


From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue, May 14, 2019 at 7:33 AM
To: Mikael Magnusson
Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau

> On Mon, 13 May 2019 23:12:31 -0700
> Wei Wang <weiwan@google.com> wrote:
>
> > Thanks Mikael for reporting this issue. And thanks David for the bisection.
> > Let me spend some time to reproduce it and see what is going on.
>
> Mikael, by the way, once this is sorted out, it would be nice if you
> could add your test as a case in tools/testing/selftests/net/pmtu.sh --
> you could probably reuse all the setup parts that are already
> implemented there.
>
> --
> Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-14 19:33       ` Wei Wang
@ 2019-05-15 15:43         ` David Ahern
  2019-05-15 18:06         ` Martin Lau
  1 sibling, 0 replies; 9+ messages in thread
From: David Ahern @ 2019-05-15 15:43 UTC (permalink / raw)
  To: Wei Wang, Stefano Brivio, Martin KaFai Lau
  Cc: Mikael Magnusson, Linux Kernel Network Developers

On 5/14/19 1:33 PM, Wei Wang wrote:
> I think the bug is because when creating exceptions, src_addr is not
> always set even though fib6_info is in the subtree. (because of
> rt6_is_gw_or_nonexthop() check)
> However, when looking up for exceptions, we always set src_addr to the
> passed in flow->src_addr if fib6_info is in the subtree. That causes
> the exception lookup to fail.
> I will make it consistent.
> However, I don't quite understand the following logic in ip6_rt_cache_alloc():
>         if (!rt6_is_gw_or_nonexthop(ort)) {
>                 if (ort->fib6_dst.plen != 128 &&
>                     ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
>                         rt->rt6i_flags |= RTF_ANYCAST;
> #ifdef CONFIG_IPV6_SUBTREES
>                 if (rt->rt6i_src.plen && saddr) {
>                         rt->rt6i_src.addr = *saddr;
>                         rt->rt6i_src.plen = 128;
>                 }
> #endif
>         }
> Why do we need to check that the route is not gateway and has next hop
> for updating rt6i_src? I checked the git history and it seems this
> part was there from very early on (with some refactor in between)...

I can not make sense of that check either.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-14 19:33       ` Wei Wang
  2019-05-15 15:43         ` David Ahern
@ 2019-05-15 18:06         ` Martin Lau
  2019-05-15 18:31           ` Wei Wang
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Lau @ 2019-05-15 18:06 UTC (permalink / raw)
  To: Wei Wang
  Cc: Stefano Brivio, David Ahern, Mikael Magnusson,
	Linux Kernel Network Developers

On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote:
> I think the bug is because when creating exceptions, src_addr is not
> always set even though fib6_info is in the subtree. (because of
> rt6_is_gw_or_nonexthop() check)
> However, when looking up for exceptions, we always set src_addr to the
> passed in flow->src_addr if fib6_info is in the subtree. That causes
> the exception lookup to fail.
> I will make it consistent.
> However, I don't quite understand the following logic in ip6_rt_cache_alloc():
>         if (!rt6_is_gw_or_nonexthop(ort)) {
>                 if (ort->fib6_dst.plen != 128 &&
>                     ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
>                         rt->rt6i_flags |= RTF_ANYCAST;
> #ifdef CONFIG_IPV6_SUBTREES
>                 if (rt->rt6i_src.plen && saddr) {
>                         rt->rt6i_src.addr = *saddr;
>                         rt->rt6i_src.plen = 128;
>                 }
> #endif
>         }
> Why do we need to check that the route is not gateway and has next hop
> for updating rt6i_src? I checked the git history and it seems this
> part was there from very early on (with some refactor in between)...
I also failed to understand the RTF_GATEWAY check.  The earliest related
commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().")

How was it working when the exception route was in the tree?

> 
> 
> From: Stefano Brivio <sbrivio@redhat.com>
> Date: Tue, May 14, 2019 at 7:33 AM
> To: Mikael Magnusson
> Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau
> 
> > On Mon, 13 May 2019 23:12:31 -0700
> > Wei Wang <weiwan@google.com> wrote:
> >
> > > Thanks Mikael for reporting this issue. And thanks David for the bisection.
> > > Let me spend some time to reproduce it and see what is going on.
> >
> > Mikael, by the way, once this is sorted out, it would be nice if you
> > could add your test as a case in tools/testing/selftests/net/pmtu.sh --
> > you could probably reuse all the setup parts that are already
> > implemented there.
> >
> > --
> > Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-15 18:06         ` Martin Lau
@ 2019-05-15 18:31           ` Wei Wang
  2019-05-15 21:54             ` Martin Lau
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Wang @ 2019-05-15 18:31 UTC (permalink / raw)
  To: Martin Lau
  Cc: Stefano Brivio, David Ahern, Mikael Magnusson,
	Linux Kernel Network Developers

On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote:
>
> On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote:
> > I think the bug is because when creating exceptions, src_addr is not
> > always set even though fib6_info is in the subtree. (because of
> > rt6_is_gw_or_nonexthop() check)
> > However, when looking up for exceptions, we always set src_addr to the
> > passed in flow->src_addr if fib6_info is in the subtree. That causes
> > the exception lookup to fail.
> > I will make it consistent.
> > However, I don't quite understand the following logic in ip6_rt_cache_alloc():
> >         if (!rt6_is_gw_or_nonexthop(ort)) {
> >                 if (ort->fib6_dst.plen != 128 &&
> >                     ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
> >                         rt->rt6i_flags |= RTF_ANYCAST;
> > #ifdef CONFIG_IPV6_SUBTREES
> >                 if (rt->rt6i_src.plen && saddr) {
> >                         rt->rt6i_src.addr = *saddr;
> >                         rt->rt6i_src.plen = 128;
> >                 }
> > #endif
> >         }
> > Why do we need to check that the route is not gateway and has next hop
> > for updating rt6i_src? I checked the git history and it seems this
> > part was there from very early on (with some refactor in between)...
> I also failed to understand the RTF_GATEWAY check.  The earliest related
> commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().")
>
> How was it working when the exception route was in the tree?
>
When adding all exception route to the main routing tree, because
route cache has dest_addr as /128, the longest prefix match will
always match the /128 route entry.

> >
> >
> > From: Stefano Brivio <sbrivio@redhat.com>
> > Date: Tue, May 14, 2019 at 7:33 AM
> > To: Mikael Magnusson
> > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau
> >
> > > On Mon, 13 May 2019 23:12:31 -0700
> > > Wei Wang <weiwan@google.com> wrote:
> > >
> > > > Thanks Mikael for reporting this issue. And thanks David for the bisection.
> > > > Let me spend some time to reproduce it and see what is going on.
> > >
> > > Mikael, by the way, once this is sorted out, it would be nice if you
> > > could add your test as a case in tools/testing/selftests/net/pmtu.sh --
> > > you could probably reuse all the setup parts that are already
> > > implemented there.
> > >
> > > --
> > > Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IPv6 PMTU discovery fails with source-specific routing
  2019-05-15 18:31           ` Wei Wang
@ 2019-05-15 21:54             ` Martin Lau
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Lau @ 2019-05-15 21:54 UTC (permalink / raw)
  To: Wei Wang
  Cc: Stefano Brivio, David Ahern, Mikael Magnusson,
	Linux Kernel Network Developers

On Wed, May 15, 2019 at 11:31:44AM -0700, Wei Wang wrote:
> On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote:
> >
> > On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote:
> > > I think the bug is because when creating exceptions, src_addr is not
> > > always set even though fib6_info is in the subtree. (because of
> > > rt6_is_gw_or_nonexthop() check)
> > > However, when looking up for exceptions, we always set src_addr to the
> > > passed in flow->src_addr if fib6_info is in the subtree. That causes
> > > the exception lookup to fail.
> > > I will make it consistent.
> > > However, I don't quite understand the following logic in ip6_rt_cache_alloc():
> > >         if (!rt6_is_gw_or_nonexthop(ort)) {
> > >                 if (ort->fib6_dst.plen != 128 &&
> > >                     ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
> > >                         rt->rt6i_flags |= RTF_ANYCAST;
> > > #ifdef CONFIG_IPV6_SUBTREES
> > >                 if (rt->rt6i_src.plen && saddr) {
> > >                         rt->rt6i_src.addr = *saddr;
> > >                         rt->rt6i_src.plen = 128;
> > >                 }
> > > #endif
> > >         }
> > > Why do we need to check that the route is not gateway and has next hop
> > > for updating rt6i_src? I checked the git history and it seems this
> > > part was there from very early on (with some refactor in between)...
> > I also failed to understand the RTF_GATEWAY check.  The earliest related
> > commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().")
> >
> > How was it working when the exception route was in the tree?
> >
> When adding all exception route to the main routing tree, because
> route cache has dest_addr as /128, the longest prefix match will
> always match the /128 route entry.
Got it.  Thanks for the explanation.

> 
> > >
> > >
> > > From: Stefano Brivio <sbrivio@redhat.com>
> > > Date: Tue, May 14, 2019 at 7:33 AM
> > > To: Mikael Magnusson
> > > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau
> > >
> > > > On Mon, 13 May 2019 23:12:31 -0700
> > > > Wei Wang <weiwan@google.com> wrote:
> > > >
> > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection.
> > > > > Let me spend some time to reproduce it and see what is going on.
> > > >
> > > > Mikael, by the way, once this is sorted out, it would be nice if you
> > > > could add your test as a case in tools/testing/selftests/net/pmtu.sh --
> > > > you could probably reuse all the setup parts that are already
> > > > implemented there.
> > > >
> > > > --
> > > > Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-05-15 21:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-13 19:22 IPv6 PMTU discovery fails with source-specific routing Mikael Magnusson
2019-05-14  3:35 ` David Ahern
2019-05-14  6:12   ` Wei Wang
2019-05-14 14:33     ` Stefano Brivio
2019-05-14 19:33       ` Wei Wang
2019-05-15 15:43         ` David Ahern
2019-05-15 18:06         ` Martin Lau
2019-05-15 18:31           ` Wei Wang
2019-05-15 21:54             ` Martin Lau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.