* [PATCH net] ipv6: fix src addr routing with the exception table
@ 2019-05-15  0:46 Wei Wang
  2019-05-15 15:56 ` David Ahern
  2019-05-15 21:50 ` Martin Lau
  0 siblings, 2 replies; 13+ messages in thread
From: Wei Wang @ 2019-05-15  0:46 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, David Ahern, Eric Dumazet

From: Wei Wang <weiwan@google.com>

When inserting route cache into the exception table, the key is
generated with both src_addr and dest_addr with src addr routing.
However, current logic always assumes the src_addr used to generate the
key is a /128 host address. This is not true in the following scenarios:
1. When the route is a gateway route or does not have next hop.
   (rt6_is_gw_or_nonexthop() == true)
2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
This means, when looking for a route cache in the exception table, we
have to do the lookup twice: first time with the passed in /128 host
address, second time with the src_addr stored in fib6_info.

This solves the pmtu discovery issue reported by Mikael Magnusson where
a route cache with a lower mtu info is created for a gateway route with
src addr. However, the lookup code is not able to find this route cache.

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
Bisected-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 23a20d62daac..c36900a07a78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const struct fib6_result *res,
 	struct rt6_exception *rt6_ex;
 	struct rt6_info *ret = NULL;
 
-	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
-
 #ifdef CONFIG_IPV6_SUBTREES
 	/* fib6i_src.plen != 0 indicates f6i is in subtree
 	 * and exception table is indexed by a hash of
 	 * both fib6_dst and fib6_src.
-	 * Otherwise, the exception table is indexed by
-	 * a hash of only fib6_dst.
+	 * However, the src addr used to create the hash
+	 * might not be exactly the passed in saddr which
+	 * is a /128 addr from the flow.
+	 * So we need to use f6i->fib6_src to redo lookup
+	 * if the passed in saddr does not find anything.
+	 * (See the logic in ip6_rt_cache_alloc() on how
+	 * rt->rt6i_src is updated.)
 	 */
 	if (res->f6i->fib6_src.plen)
 		src_key = saddr;
+find_ex:
 #endif
+	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
 	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
 
 	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
 		ret = rt6_ex->rt6i;
 
+#ifdef CONFIG_IPV6_SUBTREES
+	/* Use fib6_src as src_key and redo lookup */
+	if (!ret && src_key == saddr) {
+		src_key = &res->f6i->fib6_src.addr;
+		goto find_ex;
+	}
+#endif
+
 	return ret;
 }
 
@@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
 #ifdef CONFIG_IPV6_SUBTREES
 	if (f6i->fib6_src.plen)
 		src_key = saddr;
+find_ex:
 #endif
-
 	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
 	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
 	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
 		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
+#ifdef CONFIG_IPV6_SUBTREES
+	/* Similar logic as in rt6_find_cached_rt().
+	 * We need to use f6i->fib6_src to redo lookup in exception
+	 * table if saddr did not yield any result.
+	 */
+	else if (src_key == saddr) {
+		src_key = &f6i->fib6_src.addr;
+		goto find_ex;
+	}
+#endif
 
 	if (likely(!mtu)) {
 		struct net_device *dev = nh->fib_nh_dev;
-- 
2.21.0.1020.gf2820cf01a-goog
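
The two-pass lookup described in the commit message can be modelled in
userspace. This is only a minimal sketch under stated assumptions: the
toy_* names, the string keys and the linear table are hypothetical
stand-ins and not the kernel's rt6_exception structures or its hashed
buckets. It shows a lookup keyed by the flow's /128 source address
missing an entry that was stored under the route's own source address,
and the retry finding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one exception-table entry (not struct rt6_exception). */
struct toy_exception {
	const char *daddr;   /* destination the entry was created for */
	const char *src_key; /* source key stored at insert time */
	unsigned int mtu;    /* cached path MTU */
};

/* Exact match on (daddr, src_key), standing in for the hashed lookup. */
static const struct toy_exception *
toy_find(const struct toy_exception *tbl, size_t n,
	 const char *daddr, const char *src_key)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(tbl[i].daddr, daddr) &&
		    !strcmp(tbl[i].src_key, src_key))
			return &tbl[i];
	}
	return NULL;
}

/* First pass uses the flow's /128 saddr, second pass the route's own source. */
static const struct toy_exception *
toy_find_cached(const struct toy_exception *tbl, size_t n,
		const char *daddr, const char *saddr, const char *fib6_src)
{
	const struct toy_exception *ex;

	assert(daddr && saddr && fib6_src);
	ex = toy_find(tbl, n, daddr, saddr);
	if (!ex)
		ex = toy_find(tbl, n, daddr, fib6_src);
	return ex;
}
```

With an entry stored under ("fd01::c", "fd00::"), a single-pass lookup
for source fd00::a finds nothing, while toy_find_cached() returns the
entry on its second pass.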



* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15  0:46 [PATCH net] ipv6: fix src addr routing with the exception table Wei Wang
@ 2019-05-15 15:56 ` David Ahern
  2019-05-15 16:38   ` David Ahern
  2019-05-15 21:50 ` Martin Lau
  1 sibling, 1 reply; 13+ messages in thread
From: David Ahern @ 2019-05-15 15:56 UTC (permalink / raw)
  To: Wei Wang, David Miller, netdev
  Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet

On 5/14/19 6:46 PM, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
> 
> When inserting route cache into the exception table, the key is
> generated with both src_addr and dest_addr with src addr routing.
> However, current logic always assumes the src_addr used to generate the
> key is a /128 host address. This is not true in the following scenarios:
> 1. When the route is a gateway route or does not have next hop.
>    (rt6_is_gw_or_nonexthop() == false)
> 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
> This means, when looking for a route cache in the exception table, we
> have to do the lookup twice: first time with the passed in /128 host
> address, second time with the src_addr stored in fib6_info.
> 
> This solves the pmtu discovery issue reported by Mikael Magnusson where
> a route cache with a lower mtu info is created for a gateway route with
> src addr. However, the lookup code is not able to find this route cache.
> 
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
> Bisected-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 23a20d62daac..c36900a07a78 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const struct fib6_result *res,
>  	struct rt6_exception *rt6_ex;
>  	struct rt6_info *ret = NULL;
>  
> -	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> -
>  #ifdef CONFIG_IPV6_SUBTREES
>  	/* fib6i_src.plen != 0 indicates f6i is in subtree
>  	 * and exception table is indexed by a hash of
>  	 * both fib6_dst and fib6_src.
> -	 * Otherwise, the exception table is indexed by
> -	 * a hash of only fib6_dst.
> +	 * However, the src addr used to create the hash
> +	 * might not be exactly the passed in saddr which
> +	 * is a /128 addr from the flow.
> +	 * So we need to use f6i->fib6_src to redo lookup
> +	 * if the passed in saddr does not find anything.
> +	 * (See the logic in ip6_rt_cache_alloc() on how
> +	 * rt->rt6i_src is updated.)
>  	 */
>  	if (res->f6i->fib6_src.plen)
>  		src_key = saddr;
> +find_ex:
>  #endif
> +	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
>  	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
>  
>  	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
>  		ret = rt6_ex->rt6i;
>  
> +#ifdef CONFIG_IPV6_SUBTREES
> +	/* Use fib6_src as src_key and redo lookup */
> +	if (!ret && src_key == saddr) {
> +		src_key = &res->f6i->fib6_src.addr;
> +		goto find_ex;
> +	}
> +#endif
> +
>  	return ret;
>  }
>  
> @@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
>  #ifdef CONFIG_IPV6_SUBTREES
>  	if (f6i->fib6_src.plen)
>  		src_key = saddr;
> +find_ex:
>  #endif
> -
>  	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
>  	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
>  	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
>  		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
> +#ifdef CONFIG_IPV6_SUBTREES
> +	/* Similar logic as in rt6_find_cached_rt().
> +	 * We need to use f6i->fib6_src to redo lookup in exception
> +	 * table if saddr did not yield any result.
> +	 */
> +	else if (src_key == saddr) {
> +		src_key = &f6i->fib6_src.addr;
> +		goto find_ex;
> +	}
> +#endif
>  
>  	if (likely(!mtu)) {
>  		struct net_device *dev = nh->fib_nh_dev;
> 

What about rt6_remove_exception_rt?

You can add a 'cache' hook to ip/iproute.c to delete the cached routes
and verify that it works. I seem to have misplaced my patch to do it.


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 15:56 ` David Ahern
@ 2019-05-15 16:38   ` David Ahern
  2019-05-15 17:25     ` Wei Wang
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2019-05-15 16:38 UTC (permalink / raw)
  To: Wei Wang, David Miller, netdev
  Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 195 bytes --]

On 5/15/19 9:56 AM, David Ahern wrote:
> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> and verify that it works. I seem to have misplaced my patch to do it.

found it.

[-- Attachment #2: 0001-route-Add-cache-keyword-to-iproute_modify.patch --]
[-- Type: text/plain, Size: 1553 bytes --]

From 7a328753a93321a07a5228fb32ed881d82d7a537 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Mon, 6 May 2019 08:09:01 -0700
Subject: [PATCH iproute2-next] route: Add cache keyword to iproute_modify

Kernel supports deleting cached routes (e.g., exceptions). Add cache
keyword to iproute_modify to set RTM_F_CLONED in the request.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 ip/iproute.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 2b3dcc5dbd53..d7a812a39047 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -74,7 +74,7 @@ static void usage(void)
 		"       ip route { add | del | change | append | replace } ROUTE\n"
 		"SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]\n"
 		"            [ table TABLE_ID ] [ vrf NAME ] [ proto RTPROTO ]\n"
-		"            [ type TYPE ] [ scope SCOPE ]\n"
+		"            [ type TYPE ] [ scope SCOPE ] [ cache ]\n"
 		"ROUTE := NODE_SPEC [ INFO_SPEC ]\n"
 		"NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]\n"
 		"             [ table TABLE_ID ] [ proto RTPROTO ]\n"
@@ -1444,6 +1444,8 @@ static int iproute_modify(int cmd, unsigned int flags, int argc, char **argv)
 			if (fastopen_no_cookie != 1 && fastopen_no_cookie != 0)
 				invarg("\"fastopen_no_cookie\" value should be 0 or 1\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_FASTOPEN_NO_COOKIE, fastopen_no_cookie);
+		} else if (!strcmp(*argv, "cache")) {
+			req.r.rtm_flags |= RTM_F_CLONED;
 		} else {
 			int type;
 			inet_prefix dst;
-- 
2.11.0
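
What the 'cache' keyword does in the patch above can be sketched in
isolation. This is a hedged, self-contained model: toy_parse_flags() is
a hypothetical helper and not iproute2 code, and TOY_RTM_F_CLONED is a
local copy of the RTM_F_CLONED value (0x200) from <linux/rtnetlink.h>
so that the example builds without kernel headers.

```c
#include <assert.h>
#include <string.h>

/* Local copy of RTM_F_CLONED from <linux/rtnetlink.h>. */
#define TOY_RTM_F_CLONED 0x200

/* Scan the arguments and set the cloned-route flag when "cache" appears. */
static unsigned int toy_parse_flags(int argc, const char *const *argv)
{
	unsigned int flags = 0;

	assert(argc == 0 || argv);
	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "cache"))
			flags |= TOY_RTM_F_CLONED;
	}
	return flags;
}
```

A request built from "fd01::c from fd00::a cache" would then carry the
flag, while the same request without the keyword would not.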



* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 16:38   ` David Ahern
@ 2019-05-15 17:25     ` Wei Wang
  2019-05-15 17:28       ` Wei Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2019-05-15 17:25 UTC (permalink / raw)
  To: David Ahern
  Cc: Wei Wang, David Miller, Linux Kernel Network Developers,
	Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

>
> What about rt6_remove_exception_rt?
>
> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> and verify that it works. I seem to have misplaced my patch to do it.
I don't think rt6_remove_exception_rt() needs any change, because it
takes the cached route (rt6_info) as its input parameter, not a specific
saddr or daddr from a flow or a packet. The hash used in the exception
table is guaranteed to be generated from rt6_info->rt6i_dst and
rt6_info->rt6i_src.

When a user tries to delete a cached route, ip6_route_del() first calls
rt6_find_cached_rt() to find it. With this patch, rt6_find_cached_rt()
looks up the cached route using both the passed-in src addr and
f6i->fib6_src.
So I think we are good here.
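
The argument about removal can be illustrated with a toy table. This is
only a sketch under assumptions: the toy_* names, the string keys and
the hash function are hypothetical and not the kernel's. Insertion and
removal both derive the bucket from the fields stored in the entry
itself, so removal does not depend on which source address a flow used.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUCKETS 8

struct toy_rt {
	const char *dst;     /* stands in for rt6i_dst */
	const char *src;     /* stands in for rt6i_src */
	struct toy_rt *next; /* bucket chain */
};

static struct toy_rt *toy_table[TOY_BUCKETS];

/* Simple string hash over both keys; any deterministic hash would do. */
static unsigned int toy_hash(const char *dst, const char *src)
{
	unsigned int h = 5381;

	for (; *dst; dst++)
		h = h * 33 + (unsigned char)*dst;
	for (; *src; src++)
		h = h * 33 + (unsigned char)*src;
	return h % TOY_BUCKETS;
}

/* Insert into the bucket chosen by the entry's own stored keys. */
static void toy_insert(struct toy_rt *rt)
{
	unsigned int b;

	assert(rt);
	b = toy_hash(rt->dst, rt->src);
	rt->next = toy_table[b];
	toy_table[b] = rt;
}

/* Remove by entry: same stored keys, hence same bucket. Returns 1 if removed. */
static int toy_remove(struct toy_rt *rt)
{
	struct toy_rt **pp;

	assert(rt);
	for (pp = &toy_table[toy_hash(rt->dst, rt->src)]; *pp; pp = &(*pp)->next) {
		if (*pp == rt) {
			*pp = rt->next;
			rt->next = NULL;
			return 1;
		}
	}
	return 0;
}
```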

From: David Ahern <dsahern@gmail.com>
Date: Wed, May 15, 2019 at 9:38 AM
To: Wei Wang, David Miller, <netdev@vger.kernel.org>
Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet

> On 5/15/19 9:56 AM, David Ahern wrote:
> > You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> > and verify that it works. I seem to have misplaced my patch to do it.
>
> found it.


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 17:25     ` Wei Wang
@ 2019-05-15 17:28       ` Wei Wang
  2019-05-15 17:32         ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2019-05-15 17:28 UTC (permalink / raw)
  To: David Ahern
  Cc: Wei Wang, David Miller, Linux Kernel Network Developers,
	Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

From: Wei Wang <weiwan@google.com>
Date: Wed, May 15, 2019 at 10:25 AM
To: David Ahern
Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin
KaFai Lau, Mikael Magnusson, Eric Dumazet

> >
> > What about rt6_remove_exception_rt?
> >
> > You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> > and verify that it works. I seem to have misplaced my patch to do it.
> I don't think rt6_remove_exception_rt() needs any change.
> It is because it gets the route cache rt6_info as the input parameter,
> not specific saddr or daddr from a flow or a packet.
> It is guaranteed that the hash used in the exception table is
> generated from rt6_info->rt6i_dst and rt6_info->rt6i_src.
>
> For the case where user tries to delete a cache route, ip6_route_del()
> calls rt6_find_cached_rt() to find the cached route first. And
> rt6_find_cached_rt() is taken care of to find the cached route
> according to both passed in src addr and f6i->fib6_src.
> So I think we are good here.
>
> From: David Ahern <dsahern@gmail.com>
> Date: Wed, May 15, 2019 at 9:38 AM
> To: Wei Wang, David Miller, <netdev@vger.kernel.org>
> Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet
>
> > On 5/15/19 9:56 AM, David Ahern wrote:
> > > You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> > > and verify that it works. I seem to have misplaced my patch to do it.
> >
> > found it.

Thanks. I applied it to iproute2 and tried it.
The cached route is removed by running:
ip netns exec a ./ip -6 route del fd01::c from fd00::a cache


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 17:28       ` Wei Wang
@ 2019-05-15 17:32         ` David Ahern
  2019-05-15 17:45           ` Wei Wang
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2019-05-15 17:32 UTC (permalink / raw)
  To: Wei Wang
  Cc: Wei Wang, David Miller, Linux Kernel Network Developers,
	Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

On 5/15/19 11:28 AM, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
> Date: Wed, May 15, 2019 at 10:25 AM
> To: David Ahern
> Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin
> KaFai Lau, Mikael Magnusson, Eric Dumazet
> 
>>>
>>> What about rt6_remove_exception_rt?
>>>
>>> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
>>> and verify that it works. I seem to have misplaced my patch to do it.
>> I don't think rt6_remove_exception_rt() needs any change.
>> It is because it gets the route cache rt6_info as the input parameter,
>> not specific saddr or daddr from a flow or a packet.
>> It is guaranteed that the hash used in the exception table is
>> generated from rt6_info->rt6i_dst and rt6_info->rt6i_src.
>>
>> For the case where user tries to delete a cache route, ip6_route_del()
>> calls rt6_find_cached_rt() to find the cached route first. And
>> rt6_find_cached_rt() is taken care of to find the cached route
>> according to both passed in src addr and f6i->fib6_src.
>> So I think we are good here.
>>
>> From: David Ahern <dsahern@gmail.com>
>> Date: Wed, May 15, 2019 at 9:38 AM
>> To: Wei Wang, David Miller, <netdev@vger.kernel.org>
>> Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet
>>
>>> On 5/15/19 9:56 AM, David Ahern wrote:
>>>> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
>>>> and verify that it works. I seem to have misplaced my patch to do it.
>>>
>>> found it.
> 
> Thanks. I patched it to iproute2 and tried it.
> The route cache is removed by doing:
> ip netns exec a ./ip -6 route del fd01::c from fd00::a cache 
> 

you have to pass in a device. The first line in ip6_del_cached_rt:

if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
                goto out;

'ip route get' is one way to check if it has been deleted. We really
need to add support for dumping exception routes.


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 17:32         ` David Ahern
@ 2019-05-15 17:45           ` Wei Wang
  2019-05-15 17:49             ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2019-05-15 17:45 UTC (permalink / raw)
  To: David Ahern
  Cc: Wei Wang, David Miller, Linux Kernel Network Developers,
	Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

From: David Ahern <dsahern@gmail.com>
Date: Wed, May 15, 2019 at 10:33 AM
To: Wei Wang
Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin
KaFai Lau, Mikael Magnusson, Eric Dumazet

> On 5/15/19 11:28 AM, Wei Wang wrote:
> > From: Wei Wang <weiwan@google.com>
> > Date: Wed, May 15, 2019 at 10:25 AM
> > To: David Ahern
> > Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin
> > KaFai Lau, Mikael Magnusson, Eric Dumazet
> >
> >>>
> >>> What about rt6_remove_exception_rt?
> >>>
> >>> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> >>> and verify that it works. I seem to have misplaced my patch to do it.
> >> I don't think rt6_remove_exception_rt() needs any change.
> >> It is because it gets the route cache rt6_info as the input parameter,
> >> not specific saddr or daddr from a flow or a packet.
> >> It is guaranteed that the hash used in the exception table is
> >> generated from rt6_info->rt6i_dst and rt6_info->rt6i_src.
> >>
> >> For the case where user tries to delete a cache route, ip6_route_del()
> >> calls rt6_find_cached_rt() to find the cached route first. And
> >> rt6_find_cached_rt() is taken care of to find the cached route
> >> according to both passed in src addr and f6i->fib6_src.
> >> So I think we are good here.
> >>
> >> From: David Ahern <dsahern@gmail.com>
> >> Date: Wed, May 15, 2019 at 9:38 AM
> >> To: Wei Wang, David Miller, <netdev@vger.kernel.org>
> >> Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet
> >>
> >>> On 5/15/19 9:56 AM, David Ahern wrote:
> >>>> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> >>>> and verify that it works. I seem to have misplaced my patch to do it.
> >>>
> >>> found it.
> >
> > Thanks. I patched it to iproute2 and tried it.
> > The route cache is removed by doing:
> > ip netns exec a ./ip -6 route del fd01::c from fd00::a cache
> >
>
> you have to pass in a device. The first line in ip6_del_cached_rt:
>
> if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
>                 goto out;
>
> 'ip route get' is one way to check if it has been deleted. We really
> need to add support for dumping exception routes.

Without passing in dev, fc_ifindex is 0, so it won't goto out, right?
I checked whether the cached route is removed by running:
ip netns exec a cat /proc/net/rt6_stats
The 5th counter is the number of cached routes currently in the system.

The output I get after I run the reproducer:
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 000e 0006 0001 0005 0000
# ip netns exec a ./ip -6 route del fd01::c from fd00::/64 cache
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 0012 0006 0000 0004 0000

The same behavior if I pass in dev:
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 000c 0006 0001 0004 0000
# ip netns exec a ./ip -6 route del fd01::c from fd00::/64 dev vethab cache
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 0013 0006 0000 0003 0000
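
Reading the fifth counter from an rt6_stats line can be automated. A
minimal sketch, assuming the line has the space-separated hexadecimal
layout shown in the output above; toy_cached_routes() is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Return the fifth hexadecimal field of an rt6_stats line, or -1 on parse failure. */
static int toy_cached_routes(const char *line)
{
	unsigned int f[5];

	assert(line);
	if (sscanf(line, "%x %x %x %x %x",
		   &f[0], &f[1], &f[2], &f[3], &f[4]) != 5)
		return -1;
	return (int)f[4];
}
```

Applied to the two lines from the first run above, it gives 1 before the
delete and 0 after it.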


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 17:45           ` Wei Wang
@ 2019-05-15 17:49             ` David Ahern
  0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2019-05-15 17:49 UTC (permalink / raw)
  To: Wei Wang
  Cc: Wei Wang, David Miller, Linux Kernel Network Developers,
	Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

On 5/15/19 11:45 AM, Wei Wang wrote:
>>
>> you have to pass in a device. The first line in ip6_del_cached_rt:
>>
>> if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
>>                 goto out;
>>
>> 'ip route get' is one way to check if it has been deleted. We really
>> need to add support for dumping exception routes.
> 
> Without passing in dev, fc_ifindex = 0. So it won't goto out. Isn't it?

ugh, yes, blew right past that.
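
The quoted condition can be checked in isolation. A minimal sketch:
toy_skip_delete() is a hypothetical name that mirrors only the boolean
expression of the check, with plain integers in place of the cfg and rt
fields.

```c
#include <assert.h>

/* Returns 1 when the delete would "goto out", 0 when it proceeds. */
static int toy_skip_delete(int fc_ifindex, int rt_ifindex)
{
	assert(fc_ifindex >= 0 && rt_ifindex >= 0);
	return fc_ifindex && rt_ifindex != fc_ifindex;
}
```

With fc_ifindex == 0 the first operand is false, so the device
comparison is never the reason for skipping the delete.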

> The way I checked if the route cache is being removed is by doing:
> ip netns exec a cat /proc/net/rt6_stats
> The 5th counter is the number of cached routes right now in the system.
> 
> The output I get after I run the reproducer:
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 000e 0006 0001 0005 0000
> # ip netns exec a ./ip -6 route del fd01::c from fd00::/64 cache
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 0012 0006 0000 0004 0000
> 
> The same behavior if I pass in dev:
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 000c 0006 0001 0004 0000
> # ip netns exec a ./ip -6 route del fd01::c from fd00::/64 dev vethab cache
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 0013 0006 0000 0003 0000
> 

ok.

Reviewed-by: David Ahern <dsahern@gmail.com>


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15  0:46 [PATCH net] ipv6: fix src addr routing with the exception table Wei Wang
  2019-05-15 15:56 ` David Ahern
@ 2019-05-15 21:50 ` Martin Lau
  2019-05-16  0:03   ` Wei Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Martin Lau @ 2019-05-15 21:50 UTC (permalink / raw)
  To: Wei Wang
  Cc: David Miller, netdev, Wei Wang, Mikael Magnusson, David Ahern,
	Eric Dumazet

On Tue, May 14, 2019 at 05:46:10PM -0700, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
> 
> When inserting route cache into the exception table, the key is
> generated with both src_addr and dest_addr with src addr routing.
> However, current logic always assumes the src_addr used to generate the
> key is a /128 host address. This is not true in the following scenarios:
> 1. When the route is a gateway route or does not have next hop.
>    (rt6_is_gw_or_nonexthop() == false)
> 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
> This means, when looking for a route cache in the exception table, we
> have to do the lookup twice: first time with the passed in /128 host
> address, second time with the src_addr stored in fib6_info.
> 
> This solves the pmtu discovery issue reported by Mikael Magnusson where
> a route cache with a lower mtu info is created for a gateway route with
> src addr. However, the lookup code is not able to find this route cache.
> 
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
> Bisected-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 23a20d62daac..c36900a07a78 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const struct fib6_result *res,
>  	struct rt6_exception *rt6_ex;
>  	struct rt6_info *ret = NULL;
>  
> -	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> -
>  #ifdef CONFIG_IPV6_SUBTREES
>  	/* fib6i_src.plen != 0 indicates f6i is in subtree
>  	 * and exception table is indexed by a hash of
>  	 * both fib6_dst and fib6_src.
> -	 * Otherwise, the exception table is indexed by
> -	 * a hash of only fib6_dst.
> +	 * However, the src addr used to create the hash
> +	 * might not be exactly the passed in saddr which
> +	 * is a /128 addr from the flow.
> +	 * So we need to use f6i->fib6_src to redo lookup
> +	 * if the passed in saddr does not find anything.
> +	 * (See the logic in ip6_rt_cache_alloc() on how
> +	 * rt->rt6i_src is updated.)
>  	 */
>  	if (res->f6i->fib6_src.plen)
>  		src_key = saddr;
> +find_ex:
>  #endif
> +	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
>  	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
>  
>  	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
>  		ret = rt6_ex->rt6i;
>  
> +#ifdef CONFIG_IPV6_SUBTREES
> +	/* Use fib6_src as src_key and redo lookup */
> +	if (!ret && src_key == saddr) {
> +		src_key = &res->f6i->fib6_src.addr;
> +		goto find_ex;
> +	}
> +#endif
> +
>  	return ret;
>  }
>  
> @@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
>  #ifdef CONFIG_IPV6_SUBTREES
>  	if (f6i->fib6_src.plen)
>  		src_key = saddr;
> +find_ex:
>  #endif
> -
>  	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
>  	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
>  	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
>  		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
> +#ifdef CONFIG_IPV6_SUBTREES
> +	/* Similar logic as in rt6_find_cached_rt().
> +	 * We need to use f6i->fib6_src to redo lookup in exception
> +	 * table if saddr did not yield any result.
> +	 */
> +	else if (src_key == saddr) {
> +		src_key = &f6i->fib6_src.addr;
> +		goto find_ex;
> +	}
> +#endif
Nit.
Instead of repeating this retry logic,
can it be consolidated into __rt6_find_exception_xxx()
by passing fib6_src.addr as a secondary matching
saddr?

>  
>  	if (likely(!mtu)) {
>  		struct net_device *dev = nh->fib_nh_dev;
> -- 
> 2.21.0.1020.gf2820cf01a-goog
> 


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-15 21:50 ` Martin Lau
@ 2019-05-16  0:03   ` Wei Wang
  2019-05-16  0:06     ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Wang @ 2019-05-16  0:03 UTC (permalink / raw)
  To: Martin Lau
  Cc: Wei Wang, David Miller, netdev, Mikael Magnusson, David Ahern,
	Eric Dumazet

On Wed, May 15, 2019 at 2:51 PM Martin Lau <kafai@fb.com> wrote:
>
> On Tue, May 14, 2019 at 05:46:10PM -0700, Wei Wang wrote:
> > From: Wei Wang <weiwan@google.com>
> >
> > When inserting route cache into the exception table, the key is
> > generated with both src_addr and dest_addr with src addr routing.
> > However, current logic always assumes the src_addr used to generate the
> > key is a /128 host address. This is not true in the following scenarios:
> > 1. When the route is a gateway route or does not have next hop.
> >    (rt6_is_gw_or_nonexthop() == false)
> > 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
> > This means, when looking for a route cache in the exception table, we
> > have to do the lookup twice: first time with the passed in /128 host
> > address, second time with the src_addr stored in fib6_info.
> >
> > This solves the pmtu discovery issue reported by Mikael Magnusson where
> > a route cache with a lower mtu info is created for a gateway route with
> > src addr. However, the lookup code is not able to find this route cache.
> >
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
> > Bisected-by: David Ahern <dsahern@gmail.com>
> > Signed-off-by: Wei Wang <weiwan@google.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
> >  1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 23a20d62daac..c36900a07a78 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const struct fib6_result *res,
> >       struct rt6_exception *rt6_ex;
> >       struct rt6_info *ret = NULL;
> >
> > -     bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> > -
> >  #ifdef CONFIG_IPV6_SUBTREES
> >       /* fib6i_src.plen != 0 indicates f6i is in subtree
> >        * and exception table is indexed by a hash of
> >        * both fib6_dst and fib6_src.
> > -      * Otherwise, the exception table is indexed by
> > -      * a hash of only fib6_dst.
> > +      * However, the src addr used to create the hash
> > +      * might not be exactly the passed in saddr which
> > +      * is a /128 addr from the flow.
> > +      * So we need to use f6i->fib6_src to redo lookup
> > +      * if the passed in saddr does not find anything.
> > +      * (See the logic in ip6_rt_cache_alloc() on how
> > +      * rt->rt6i_src is updated.)
> >        */
> >       if (res->f6i->fib6_src.plen)
> >               src_key = saddr;
> > +find_ex:
> >  #endif
> > +     bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> >       rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
> >
> >       if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
> >               ret = rt6_ex->rt6i;
> >
> > +#ifdef CONFIG_IPV6_SUBTREES
> > +     /* Use fib6_src as src_key and redo lookup */
> > +     if (!ret && src_key == saddr) {
> > +             src_key = &res->f6i->fib6_src.addr;
> > +             goto find_ex;
> > +     }
> > +#endif
> > +
> >       return ret;
> >  }
> >
> > @@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
> >  #ifdef CONFIG_IPV6_SUBTREES
> >       if (f6i->fib6_src.plen)
> >               src_key = saddr;
> > +find_ex:
> >  #endif
> > -
> >       bucket = rcu_dereference(f6i->rt6i_exception_bucket);
> >       rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
> >       if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
> >               mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
> > +#ifdef CONFIG_IPV6_SUBTREES
> > +     /* Similar logic as in rt6_find_cached_rt().
> > +      * We need to use f6i->fib6_src to redo lookup in exception
> > +      * table if saddr did not yield any result.
> > +      */
> > +     else if (src_key == saddr) {
> > +             src_key = &f6i->fib6_src.addr;
> > +             goto find_ex;
> > +     }
> > +#endif
> Nit.
> Instead of repeating this retry logic,
> can it be consolidated into __rt6_find_exception_xxx()
> by passing fib6_src.addr as a secondary matching
> saddr?
>
Thanks Martin.
Changing __rt6_find_exception_xxx() might not be easy, because the other
callers of this function do not need to fall back to another saddr.
The validation of the result is also a bit different for different
callers.
What about adding a new helper for the above 2 cases and just calling
that from both places?
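
A shared helper of the kind proposed here might look roughly like the
sketch below. It is a userspace model under assumptions, not the
kernel change: every toy_* name, the string keys and the explicit
'expired' field are hypothetical. The helper does the fallback lookup
once, and each caller keeps its own handling of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_ex {
	const char *daddr;
	const char *src_key;
	unsigned int mtu;
	int expired; /* stands in for the expiry check */
};

/* Exact match on (daddr, src_key). */
static const struct toy_ex *
toy_lookup(const struct toy_ex *tbl, size_t n,
	   const char *daddr, const char *src_key)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(tbl[i].daddr, daddr) &&
		    !strcmp(tbl[i].src_key, src_key))
			return &tbl[i];
	}
	return NULL;
}

/* Hypothetical helper: retry with the route's own source if saddr gives nothing usable. */
static const struct toy_ex *
toy_find_with_fallback(const struct toy_ex *tbl, size_t n, const char *daddr,
		       const char *saddr, const char *fib6_src)
{
	const struct toy_ex *ex;

	assert(daddr && saddr && fib6_src);
	ex = toy_lookup(tbl, n, daddr, saddr);
	if (!ex || ex->expired)
		ex = toy_lookup(tbl, n, daddr, fib6_src);
	return (ex && !ex->expired) ? ex : NULL;
}

/* Caller 1: wants the cached entry itself. */
static const struct toy_ex *
toy_find_cached_rt(const struct toy_ex *tbl, size_t n, const char *daddr,
		   const char *saddr, const char *fib6_src)
{
	return toy_find_with_fallback(tbl, n, daddr, saddr, fib6_src);
}

/* Caller 2: wants only the MTU, with 0 meaning "no usable exception". */
static unsigned int
toy_mtu_from_fib6(const struct toy_ex *tbl, size_t n, const char *daddr,
		  const char *saddr, const char *fib6_src)
{
	const struct toy_ex *ex;

	ex = toy_find_with_fallback(tbl, n, daddr, saddr, fib6_src);
	return ex ? ex->mtu : 0;
}
```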

> >
> >       if (likely(!mtu)) {
> >               struct net_device *dev = nh->fib_nh_dev;
> > --
> > 2.21.0.1020.gf2820cf01a-goog
> >


* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-16  0:03   ` Wei Wang
@ 2019-05-16  0:06     ` David Ahern
  2019-05-16  4:18       ` Wei Wang
  2019-05-16  4:37       ` Martin Lau
  0 siblings, 2 replies; 13+ messages in thread
From: David Ahern @ 2019-05-16  0:06 UTC (permalink / raw)
  To: Wei Wang, Martin Lau
  Cc: Wei Wang, David Miller, netdev, Mikael Magnusson, Eric Dumazet

On 5/15/19 6:03 PM, Wei Wang wrote:
> Thanks Martin.
> Changing __rt6_find_exception_xxx() might not be easy, because the other
> callers of this function do not need to back off and retry with
> another saddr.
> The validation of the result is also a bit different for each caller.
> What about adding a new helper for the above 2 cases and just calling it
> from both places?

Since this needs to be backported to stable releases, I would say
simplest patch for that is best.

I have changes queued for this area once net-next opens; I can look at
consolidating as part of that.

* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-16  0:06     ` David Ahern
@ 2019-05-16  4:18       ` Wei Wang
  2019-05-16  4:37       ` Martin Lau
  1 sibling, 0 replies; 13+ messages in thread
From: Wei Wang @ 2019-05-16  4:18 UTC (permalink / raw)
  To: David Ahern
  Cc: Martin Lau, Wei Wang, David Miller, netdev, Mikael Magnusson,
	Eric Dumazet

On Wed, May 15, 2019 at 5:07 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 5/15/19 6:03 PM, Wei Wang wrote:
> > Thanks Martin.
> > Changing __rt6_find_exception_xxx() might not be easy, because the other
> > callers of this function do not need to back off and retry with
> > another saddr.
> > The validation of the result is also a bit different for each caller.
> > What about adding a new helper for the above 2 cases and just calling it
> > from both places?
>
> Since this needs to be backported to stable releases, I would say
> simplest patch for that is best.
>
> I have changes queued for this area once net-next opens; I can look at
> consolidating as part of that.

Thanks David... In that case, I would prefer to stick with the current version.
Martin, what do you think?

* Re: [PATCH net] ipv6: fix src addr routing with the exception table
  2019-05-16  0:06     ` David Ahern
  2019-05-16  4:18       ` Wei Wang
@ 2019-05-16  4:37       ` Martin Lau
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Lau @ 2019-05-16  4:37 UTC (permalink / raw)
  To: David Ahern, Wei Wang, Wei Wang
  Cc: David Miller, netdev, Mikael Magnusson, Eric Dumazet

On Wed, May 15, 2019 at 06:06:58PM -0600, David Ahern wrote:
> On 5/15/19 6:03 PM, Wei Wang wrote:
> > Thanks Martin.
> > Changing __rt6_find_exception_xxx() might not be easy, because the other
> > callers of this function do not need to back off and retry with
> > another saddr.
> > The validation of the result is also a bit different for each caller.
I was thinking other callers can pass NULL for the new arg "saddr2".

> > What about adding a new helper for the above 2 cases and just calling it
> > from both places?
That will also do.  I think it may even have less code churn
on the existing functions.  I guess the new helper may just call
__rt6_find_exception_rcu() inside with "saddr" first and
then again with "saddr2" (if needed)?

> 
> Since this needs to be backported to stable releases, I would say
> simplest patch for that is best.
> 
> I have changes queued for this area once net-next opens; I can look at
> consolidating as part of that.
Some of these functions have had multiple changes since then, so I suspect
less code churn on the existing functions will also make the backport
to stable easier.
This bug has been there since 4.15.  I think we can take a moment
to do it now rather than later.

