netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] ipv4: fix race condition between route lookup and invalidation
@ 2019-10-16 19:03 Wei Wang
  2019-10-17  6:24 ` Ido Schimmel
  2019-10-18  0:01 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Wei Wang @ 2019-10-16 19:03 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Wei Wang, Ido Schimmel, Jesse Hathaway, Martin KaFai Lau, David Ahern

Jesse and Ido reported the following race condition:
<CPU A, t0> - Received packet A is forwarded and cached dst entry is
taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()

<t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
from multiple ISPs"), route is added / deleted and rt_cache_flush() is
called

<CPU B, t2> - Received packet B tries to use the same cached dst entry
from t0, but rt_cache_valid() is no longer true and it is replaced in
rt_cache_route() by the newer one. This calls dst_dev_put() on the
original dst entry which assigns the blackhole netdev to 'dst->dev'

<CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
to 'dst->dev' being the blackhole netdev

There are 2 issues in the v4 routing code:
1. A per-netns counter is used to do the validation of the route. That
means whenever a route is changed in the netns, users of all routes in
the netns needs to redo lookup. v6 has an implementation of only
updating fn_sernum for routes that are affected.
2. When rt_cache_valid() returns false, rt_cache_route() is called to
throw away the current cache, and create a new one. This seems
unnecessary because as long as this route does not change, the route
cache does not need to be recreated.

To fully solve the above 2 issues, it probably needs quite some code
changes and requires careful testing, and does not suite for net branch.

So this patch only tries to add the deleted cached rt into the uncached
list, so user could still be able to use it to receive packets until
it's done.

Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly")
Signed-off-by: Wei Wang <weiwan@google.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>

---
 net/ipv4/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 14654876127e..9e0c8dff2cd6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1482,7 +1482,7 @@ static bool rt_cache_route(struct fib_nh_common *nhc, struct rtable *rt)
 	prev = cmpxchg(p, orig, rt);
 	if (prev == orig) {
 		if (orig) {
-			dst_dev_put(&orig->dst);
+			rt_add_uncached_list(orig);
 			dst_release(&orig->dst);
 		}
 	} else {
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] ipv4: fix race condition between route lookup and invalidation
  2019-10-16 19:03 [PATCH net] ipv4: fix race condition between route lookup and invalidation Wei Wang
@ 2019-10-17  6:24 ` Ido Schimmel
  2019-10-18  0:01 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: Ido Schimmel @ 2019-10-17  6:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: David Miller, netdev, Jesse Hathaway, Martin KaFai Lau, David Ahern

On Wed, Oct 16, 2019 at 12:03:15PM -0700, Wei Wang wrote:
> Jesse and Ido reported the following race condition:
> <CPU A, t0> - Received packet A is forwarded and cached dst entry is
> taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()
> 
> <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
> from multiple ISPs"), route is added / deleted and rt_cache_flush() is
> called
> 
> <CPU B, t2> - Received packet B tries to use the same cached dst entry
> from t0, but rt_cache_valid() is no longer true and it is replaced in
> rt_cache_route() by the newer one. This calls dst_dev_put() on the
> original dst entry which assigns the blackhole netdev to 'dst->dev'
> 
> <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
> to 'dst->dev' being the blackhole netdev
> 
> There are 2 issues in the v4 routing code:
> 1. A per-netns counter is used to do the validation of the route. That
> means whenever a route is changed in the netns, users of all routes in
> the netns needs to redo lookup. v6 has an implementation of only
> updating fn_sernum for routes that are affected.
> 2. When rt_cache_valid() returns false, rt_cache_route() is called to
> throw away the current cache, and create a new one. This seems
> unnecessary because as long as this route does not change, the route
> cache does not need to be recreated.
> 
> To fully solve the above 2 issues, it probably needs quite some code
> changes and requires careful testing, and does not suite for net branch.
> 
> So this patch only tries to add the deleted cached rt into the uncached
> list, so user could still be able to use it to receive packets until
> it's done.
> 
> Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly")
> Signed-off-by: Wei Wang <weiwan@google.com>
> Reported-by: Ido Schimmel <idosch@idosch.org>
> Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
> Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Cc: David Ahern <dsahern@gmail.com>

Reviewed-by: Ido Schimmel <idosch@mellanox.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] ipv4: fix race condition between route lookup and invalidation
  2019-10-16 19:03 [PATCH net] ipv4: fix race condition between route lookup and invalidation Wei Wang
  2019-10-17  6:24 ` Ido Schimmel
@ 2019-10-18  0:01 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2019-10-18  0:01 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, idosch, jesse, kafai, dsahern

From: Wei Wang <weiwan@google.com>
Date: Wed, 16 Oct 2019 12:03:15 -0700

> Jesse and Ido reported the following race condition:
> <CPU A, t0> - Received packet A is forwarded and cached dst entry is
> taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()
> 
> <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
> from multiple ISPs"), route is added / deleted and rt_cache_flush() is
> called
> 
> <CPU B, t2> - Received packet B tries to use the same cached dst entry
> from t0, but rt_cache_valid() is no longer true and it is replaced in
> rt_cache_route() by the newer one. This calls dst_dev_put() on the
> original dst entry which assigns the blackhole netdev to 'dst->dev'
> 
> <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
> to 'dst->dev' being the blackhole netdev
> 
> There are 2 issues in the v4 routing code:
> 1. A per-netns counter is used to do the validation of the route. That
> means whenever a route is changed in the netns, users of all routes in
> the netns needs to redo lookup. v6 has an implementation of only
> updating fn_sernum for routes that are affected.
> 2. When rt_cache_valid() returns false, rt_cache_route() is called to
> throw away the current cache, and create a new one. This seems
> unnecessary because as long as this route does not change, the route
> cache does not need to be recreated.
> 
> To fully solve the above 2 issues, it probably needs quite some code
> changes and requires careful testing, and does not suite for net branch.
> 
> So this patch only tries to add the deleted cached rt into the uncached
> list, so user could still be able to use it to receive packets until
> it's done.
> 
> Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly")
> Signed-off-by: Wei Wang <weiwan@google.com>
> Reported-by: Ido Schimmel <idosch@idosch.org>
> Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
> Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
> Acked-by: Martin KaFai Lau <kafai@fb.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-10-18  0:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-16 19:03 [PATCH net] ipv4: fix race condition between route lookup and invalidation Wei Wang
2019-10-17  6:24 ` Ido Schimmel
2019-10-18  0:01 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).