netdev.vger.kernel.org archive mirror
* [net-next] ipv6: fix routing cache overflow for raw sockets
@ 2022-12-18 23:48 Jon Maxwell
  2022-12-20 12:35 ` Paolo Abeni
  0 siblings, 1 reply; 18+ messages in thread
From: Jon Maxwell @ 2022-12-18 23:48 UTC (permalink / raw)
  To: davem
  Cc: edumazet, kuba, pabeni, yoshfuji, dsahern, netdev, linux-kernel,
	Jon Maxwell

Sending IPv6 packets in a loop via a raw socket triggers an issue where a
route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
exhausts the IPv6 max_size threshold, which defaults to 4096, resulting in
these warnings:

[1]   99.187805] dst_alloc: 7728 callbacks suppressed
[2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
.
.
[300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

When this happens the packet is dropped and sendto() returns a "network is
unreachable" error (ENETUNREACH):

# ./a.out -s 

remaining pkt 200557 errno 101
remaining pkt 196462 errno 101
.
.
remaining pkt 126821 errno 101

Fix this by adding a flag that prevents the cloning of routes for raw sockets.
The IPv6 routing code then uses per-cpu routes instead, which prevents
packet drops due to max_size overflow.

IPv4 is not affected because it has a very large default max_size.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 include/net/flow.h | 1 +
 net/ipv6/raw.c     | 2 +-
 net/ipv6/route.c   | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 2f0da4f0318b..30b8973ffb4b 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -37,6 +37,7 @@ struct flowi_common {
 	__u8	flowic_flags;
 #define FLOWI_FLAG_ANYSRC		0x01
 #define FLOWI_FLAG_KNOWN_NH		0x02
+#define FLOWI_FLAG_SKIP_RAW		0x04
 	__u32	flowic_secid;
 	kuid_t  flowic_uid;
 	struct flowi_tunnel flowic_tun_key;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index a06a9f847db5..0b89a7e66d09 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -884,7 +884,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6));
 
 	if (hdrincl)
-		fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH;
+		fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH | FLOWI_FLAG_SKIP_RAW;
 
 	if (ipc6.tclass < 0)
 		ipc6.tclass = np->tclass;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e74e0361fd92..beae0bd61738 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2226,6 +2226,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	if (rt) {
 		goto out;
 	} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
+			    !(fl6->flowi6_flags & FLOWI_FLAG_SKIP_RAW) &&
 			    !res.nh->fib_nh_gw_family)) {
 		/* Create a RTF_CACHE clone which will not be
 		 * owned by the fib6 tree.  It is for the special case where
-- 
2.31.1



* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-18 23:48 [net-next] ipv6: fix routing cache overflow for raw sockets Jon Maxwell
@ 2022-12-20 12:35 ` Paolo Abeni
  2022-12-20 15:10   ` David Ahern
                     ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Paolo Abeni @ 2022-12-20 12:35 UTC (permalink / raw)
  To: Jon Maxwell, davem
  Cc: edumazet, kuba, yoshfuji, dsahern, netdev, linux-kernel

On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a 
> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly 
> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in 
> these warnings:
> 
> [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> .
> .
> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

If I read correctly, the maximum number of dst that the raw socket can
use this way is limited by the number of packets it allows via the
sndbuf limit, right?

Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
ipvs, seg6?

@DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?

Thanks,

Paolo



* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 12:35 ` Paolo Abeni
@ 2022-12-20 15:10   ` David Ahern
  2022-12-20 21:55     ` Jonathan Maxwell
  2022-12-20 15:17   ` Julian Anastasov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-12-20 15:10 UTC (permalink / raw)
  To: Paolo Abeni, Jon Maxwell, davem
  Cc: edumazet, kuba, yoshfuji, netdev, linux-kernel

On 12/20/22 5:35 AM, Paolo Abeni wrote:
> On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
>> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a 
>> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly 
>> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in 
>> these warnings:
>>
>> [1]   99.187805] dst_alloc: 7728 callbacks suppressed
>> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>> .
>> .
>> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> 
> If I read correctly, the maximum number of dst that the raw socket can
> use this way is limited by the number of packets it allows via the
> sndbuf limit, right?
> 
> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?
> 
> @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
> 
> Thanks,
> 
> Paolo
> 

If I recall the details correctly: that sysctl limit was added back when
IPv6 routes were managed as dst_entries and there was a desire to allow
an admin to limit the memory consumed. At this point in time, IPv6 is
more in line with IPv4 - a separate struct for fib entries, distinct from
dst entries. That "Route cache is full" message is now out of date, since
these are dst_entries, which have a gc mechanism.

IPv4 does not limit the number of dst_entries that can be allocated
(ip_rt_max_size is the sysctl variable behind the IPv4 version of
max_size, and it is a no-op). IPv6 can probably do the same here?
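
For comparison, as I read net/ipv4/route.c, IPv4's dst_ops installs no
->gc hook at all these days, so the gc branch in dst_alloc() is never
taken for IPv4 (a condensed sketch, most fields elided):

static struct dst_ops ipv4_dst_ops = {
	.family		= AF_INET,
	.check		= ipv4_dst_check,
	.default_advmss	= ipv4_default_advmss,
	.mtu		= ipv4_mtu,
	/* no .gc hook: dst_alloc()'s gc/max_size path never runs */
};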

I do not believe the suggested flag is the right change.


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 12:35 ` Paolo Abeni
  2022-12-20 15:10   ` David Ahern
@ 2022-12-20 15:17   ` Julian Anastasov
  2022-12-20 15:41   ` Julian Anastasov
  2022-12-20 21:48   ` Jonathan Maxwell
  3 siblings, 0 replies; 18+ messages in thread
From: Julian Anastasov @ 2022-12-20 15:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jon Maxwell, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel


	Hello,

On Tue, 20 Dec 2022, Paolo Abeni wrote:

> On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a 
> > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly 
> > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in 
> > these warnings:
> > 
> > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > .
> > .
> > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> 
> If I read correctly, the maximum number of dst that the raw socket can
> use this way is limited by the number of packets it allows via the
> sndbuf limit, right?
> 
> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?

	For IPVS there is no sndbuf limit. IPVS uses this flag
when receiving packets from the world (destined to some local Virtual
IP) and then diverts/redirects the packets (without changing daddr)
to one of its backend servers on the LAN (no RTF_GATEWAY on such routes).
So, for each packet, IPVS requests an output route with the
FLOWI_FLAG_KNOWN_NH flag and then sends the packet to the backend
server (nexthop) using this route attached to the skb. The packet rate
is usually high. The goal is for the nexthop to be used from the route,
not from the IP header. KNOWN_NH means "the nexthop is provided in the
route, not in daddr". As for the implementation details in IPv6, I
cannot comment. But all users that set the flag want this: to send
packets where daddr can be != nexthop.
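
	In code terms, the request looks roughly like this (a simplified
sketch, not the exact IPVS code; "backend_addr" is a placeholder for the
backend server address):

	/* Route towards the backend (nexthop) rather than the daddr in
	 * the IPv6 header; FLOWI_FLAG_KNOWN_NH tells the routing code
	 * that the nexthop comes from the route. */
	struct dst_entry *dst;
	struct flowi6 fl6 = {};

	fl6.daddr = *backend_addr;              /* may be != iph->daddr */
	fl6.flowi6_flags = FLOWI_FLAG_KNOWN_NH; /* nexthop from the route */

	dst = ip6_route_output(net, NULL, &fl6);
	if (dst->error) {
		dst_release(dst);
		/* no route to the backend */
	}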

> @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?

Regards

--
Julian Anastasov <ja@ssi.bg>



* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 12:35 ` Paolo Abeni
  2022-12-20 15:10   ` David Ahern
  2022-12-20 15:17   ` Julian Anastasov
@ 2022-12-20 15:41   ` Julian Anastasov
  2022-12-20 21:48   ` Jonathan Maxwell
  3 siblings, 0 replies; 18+ messages in thread
From: Julian Anastasov @ 2022-12-20 15:41 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jon Maxwell, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel


	Hello,

On Tue, 20 Dec 2022, Paolo Abeni wrote:

> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?

	I forgot to mention one thing: IPVS can cache such routes in
its own storage, one per backend server; it still calls dst->ops->check
for them. So, such a route can live for a long time, which is why they
were created as uncached. IPVS requests one route, remembers it, and can
then attach it to multiple packets for this backend server with
skb_dst_set_noref. So, IPVS would have to use 4096 backend servers to
hit this limit.
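
	Roughly, the reuse looks like this (a sketch only; "dest_dst",
"cookie" and the refresh helper are placeholders for per-backend state,
and the real code has locking and refcounting details):

	/* Revalidate the long-lived cached route via dst_check(), which
	 * invokes dst->ops->check(), then attach it to the packet
	 * without taking a per-packet reference. */
	struct dst_entry *dst = dst_check(dest_dst, cookie);

	if (!dst) {
		/* stale: request a fresh route (with FLOWI_FLAG_KNOWN_NH,
		 * as above) and cache it for this backend */
		dst = refresh_backend_route(dest);
	}
	skb_dst_set_noref(skb, dst);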

	It does not look correct in this patch to invalidate the
FLOWI_FLAG_KNOWN_NH flag with a FLOWI_FLAG_SKIP_RAW flag. That would be
equivalent to not setting FLOWI_FLAG_KNOWN_NH at all, which is wrong
for the hdrincl case.

Regards

--
Julian Anastasov <ja@ssi.bg>



* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 12:35 ` Paolo Abeni
                     ` (2 preceding siblings ...)
  2022-12-20 15:41   ` Julian Anastasov
@ 2022-12-20 21:48   ` Jonathan Maxwell
  2022-12-23 20:28     ` Andrea Mayer
  3 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-20 21:48 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, kuba, yoshfuji, dsahern, netdev, linux-kernel

On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > these warnings:
> >
> > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > .
> > .
> > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>
> If I read correctly, the maximum number of dst that the raw socket can
> use this way is limited by the number of packets it allows via the
> sndbuf limit, right?
>

Yes, but in my test the sndbuf limit is never hit, so it clones a route
for every packet.

e.g.:

output from a C program sending 5000000 packets via a raw socket:

ip raw: total num pkts 5000000

# bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
Attaching 1 probe...

@count[a.out]: 5000009

> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?
>

Any call to ip6_pol_route() with FLOWI_FLAG_KNOWN_NH set where
res.nh->fib_nh_gw_family is 0 can do it. But we have only seen this for
raw sockets so far.
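
For reference, the core of the test program is roughly the following (a
trimmed-down sketch: option handling and counters are omitted, and the
destination address is a placeholder; it must be covered by an on-link
route, i.e. one without a gateway, so that fib_nh_gw_family is 0):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <sys/socket.h>

int main(void)
{
	/* IPPROTO_RAW implies hdrincl, so rawv6_sendmsg() takes the
	 * FLOWI_FLAG_KNOWN_NH path shown in the patch above */
	int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
	struct sockaddr_in6 dst = { .sin6_family = AF_INET6 };
	struct ip6_hdr hdr = { 0 };
	long pkts = 5000000;

	if (fd < 0)
		return 1;

	/* placeholder destination, reachable via an on-link route */
	inet_pton(AF_INET6, "2001:db8::2", &dst.sin6_addr);

	hdr.ip6_vfc  = 6 << 4;		/* version 6 */
	hdr.ip6_nxt  = IPPROTO_NONE;	/* header-only packet */
	hdr.ip6_hops = 64;
	hdr.ip6_dst  = dst.sin6_addr;

	while (pkts-- > 0) {
		/* each send clones a route via ip6_rt_cache_alloc() */
		if (sendto(fd, &hdr, sizeof(hdr), 0,
			   (struct sockaddr *)&dst, sizeof(dst)) < 0)
			fprintf(stderr, "remaining pkt %ld errno %d\n",
				pkts, errno);
	}
	close(fd);
	return 0;
}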

Regards

Jon

> @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
>
> Thanks,
>
> Paolo
>


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 15:10   ` David Ahern
@ 2022-12-20 21:55     ` Jonathan Maxwell
  2022-12-21  4:31       ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-20 21:55 UTC (permalink / raw)
  To: David Ahern
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, netdev, linux-kernel

On Wed, Dec 21, 2022 at 2:10 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 12/20/22 5:35 AM, Paolo Abeni wrote:
> > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> >> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> >> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> >> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> >> these warnings:
> >>
> >> [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> >> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> >> .
> >> .
> >> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> >
> > If I read correctly, the maximum number of dst that the raw socket can
> > use this way is limited by the number of packets it allows via the
> > sndbuf limit, right?
> >
> > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > ipvs, seg6?
> >
> > @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
> >
> > Thanks,
> >
> > Paolo
> >
>
> If I recall the details correctly: that sysctl limit was added back when
> ipv6 routes were managed as dst_entries and there was a desire to allow
> an admin to limit the memory consumed. At this point in time, IPv6 is
> more inline with IPv4 - a separate struct for fib entries from dst
> entries. That "Route cache is full" message is now out of date since
> this is dst_entries which have a gc mechanism.
>
> IPv4 does not limit the number of dst_entries that can be allocated
> (ip_rt_max_size is the sysctl variable behind the ipv4 version of
> max_size and it is a no-op). IPv6 can probably do the same here?
>

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index dbc224023977..701aba7feaf5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6470,7 +6470,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 #endif

        net->ipv6.sysctl.flush_delay = 0;
-       net->ipv6.sysctl.ip6_rt_max_size = 4096;
+       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
        net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
        net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
        net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;

The above patch resolved it for the IPv6 reproducer.

Would that be sufficient?

> I do not believe the suggested flag is the right change.

Regards

Jon


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 21:55     ` Jonathan Maxwell
@ 2022-12-21  4:31       ` Jonathan Maxwell
  2022-12-22  5:39         ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-21  4:31 UTC (permalink / raw)
  To: David Ahern
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, netdev, linux-kernel

On Wed, Dec 21, 2022 at 8:55 AM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
>
> On Wed, Dec 21, 2022 at 2:10 AM David Ahern <dsahern@kernel.org> wrote:
> >
> > On 12/20/22 5:35 AM, Paolo Abeni wrote:
> > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > >> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > >> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > >> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > >> these warnings:
> > >>
> > >> [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > >> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > >> .
> > >> .
> > >> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > >
> > > If I read correctly, the maximum number of dst that the raw socket can
> > > use this way is limited by the number of packets it allows via the
> > > sndbuf limit, right?
> > >
> > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > ipvs, seg6?
> > >
> > > @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
> > >
> > > Thanks,
> > >
> > > Paolo
> > >
> >
> > If I recall the details correctly: that sysctl limit was added back when
> > ipv6 routes were managed as dst_entries and there was a desire to allow
> > an admin to limit the memory consumed. At this point in time, IPv6 is
> > more inline with IPv4 - a separate struct for fib entries from dst
> > entries. That "Route cache is full" message is now out of date since
> > this is dst_entries which have a gc mechanism.
> >
> > IPv4 does not limit the number of dst_entries that can be allocated
> > (ip_rt_max_size is the sysctl variable behind the ipv4 version of
> > max_size and it is a no-op). IPv6 can probably do the same here?
> >
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index dbc224023977..701aba7feaf5 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -6470,7 +6470,7 @@ static int __net_init ip6_route_net_init(struct net *net)
>  #endif
>
>         net->ipv6.sysctl.flush_delay = 0;
> -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
>         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
>         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
>         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
>
> The above patch resolved it for the Ipv6 reproducer.
>
> Would that be sufficient?
>

Otherwise, if you prefer to make IPv6 behaviour similar to IPv4 rather
than upping max_size, here is a prototype patch that removes the
max_size check for IPv6:

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 88ff7bb2bb9b..632086b2f644 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -16,7 +16,7 @@ struct dst_ops {
        unsigned short          family;
        unsigned int            gc_thresh;

-       int                     (*gc)(struct dst_ops *ops);
+       void                    (*gc)(struct dst_ops *ops);
        struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
        unsigned int            (*default_advmss)(const struct dst_entry *);
        unsigned int            (*mtu)(const struct dst_entry *);
diff --git a/net/core/dst.c b/net/core/dst.c
index 497ef9b3fc6a..dcb85267bc4c 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,

        if (ops->gc &&
            !(flags & DST_NOCOUNT) &&
-           dst_entries_get_fast(ops) > ops->gc_thresh) {
-               if (ops->gc(ops)) {
-                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
-                       return NULL;
-               }
-       }
+           dst_entries_get_fast(ops) > ops->gc_thresh)
+               ops->gc(ops);

        dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
        if (!dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index dbc224023977..8db7c5436da4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
 static void            ip6_dst_destroy(struct dst_entry *);
 static void            ip6_dst_ifdown(struct dst_entry *,
                                       struct net_device *dev, int how);
-static int              ip6_dst_gc(struct dst_ops *ops);
+static void             ip6_dst_gc(struct dst_ops *ops);

 static int             ip6_pkt_discard(struct sk_buff *skb);
 static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
@@ -3295,32 +3295,21 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
        return dst;
 }

-static int ip6_dst_gc(struct dst_ops *ops)
+static void ip6_dst_gc(struct dst_ops *ops)
 {
        struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
-       int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
-       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
        int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
        int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
-       unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
        int entries;

        entries = dst_entries_get_fast(ops);
-       if (entries > rt_max_size)
-               entries = dst_entries_get_slow(ops);
-
-       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
-           entries <= rt_max_size)
-               goto out;

        net->ipv6.ip6_rt_gc_expire++;
        fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
        entries = dst_entries_get_slow(ops);
        if (entries < ops->gc_thresh)
                net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1;
-out:
        net->ipv6.ip6_rt_gc_expire -= net->ipv6.ip6_rt_gc_expire>>rt_elasticity;
-       return entries > rt_max_size;
 }

 static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,

> > I do not believe the suggested flag is the right change.
>
> Regards
>
> Jon


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-21  4:31       ` Jonathan Maxwell
@ 2022-12-22  5:39         ` Jonathan Maxwell
  2022-12-22 16:17           ` David Ahern
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-22  5:39 UTC (permalink / raw)
  To: David Ahern
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, netdev, linux-kernel

On Wed, Dec 21, 2022 at 3:31 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
>
> On Wed, Dec 21, 2022 at 8:55 AM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> >
> > On Wed, Dec 21, 2022 at 2:10 AM David Ahern <dsahern@kernel.org> wrote:
> > >
> > > On 12/20/22 5:35 AM, Paolo Abeni wrote:
> > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > >> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > >> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > >> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > >> these warnings:
> > > >>
> > > >> [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > >> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > >> .
> > > >> .
> > > >> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > >
> > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > use this way is limited by the number of packets it allows via the
> > > > sndbuf limit, right?
> > > >
> > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > ipvs, seg6?
> > > >
> > > > @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
> > > >
> > > > Thanks,
> > > >
> > > > Paolo
> > > >
> > >
> > > If I recall the details correctly: that sysctl limit was added back when
> > > ipv6 routes were managed as dst_entries and there was a desire to allow
> > > an admin to limit the memory consumed. At this point in time, IPv6 is
> > > more inline with IPv4 - a separate struct for fib entries from dst
> > > entries. That "Route cache is full" message is now out of date since
> > > this is dst_entries which have a gc mechanism.
> > >
> > > IPv4 does not limit the number of dst_entries that can be allocated
> > > (ip_rt_max_size is the sysctl variable behind the ipv4 version of
> > > max_size and it is a no-op). IPv6 can probably do the same here?
> > >
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index dbc224023977..701aba7feaf5 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -6470,7 +6470,7 @@ static int __net_init ip6_route_net_init(struct net *net)
> >  #endif
> >
> >         net->ipv6.sysctl.flush_delay = 0;
> > -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> > +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
> >         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
> >         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
> >         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
> >
> > The above patch resolved it for the Ipv6 reproducer.
> >
> > Would that be sufficient?
> >
>
> Otherwise if you prefer to make Ipv6 behaviour similar to IPv4.
> Rather than upping max_size.
>
> Here is prototype patch that removes the max_size check for Ipv6:
>

There are some mistakes in this prototype patch. Let me come up with a
better one. I'll submit a new patch in the new year for review. Thanks
for the suggestion, DavidA.

Regards

Jon

> diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> index 88ff7bb2bb9b..632086b2f644 100644
> --- a/include/net/dst_ops.h
> +++ b/include/net/dst_ops.h
> @@ -16,7 +16,7 @@ struct dst_ops {
>         unsigned short          family;
>         unsigned int            gc_thresh;
>
> -       int                     (*gc)(struct dst_ops *ops);
> +       void                    (*gc)(struct dst_ops *ops);
>         struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
>         unsigned int            (*default_advmss)(const struct dst_entry *);
>         unsigned int            (*mtu)(const struct dst_entry *);
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 497ef9b3fc6a..dcb85267bc4c 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
>
>         if (ops->gc &&
>             !(flags & DST_NOCOUNT) &&
> -           dst_entries_get_fast(ops) > ops->gc_thresh) {
> -               if (ops->gc(ops)) {
> -                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
> -                       return NULL;
> -               }
> -       }
> +           dst_entries_get_fast(ops) > ops->gc_thresh)
> +               ops->gc(ops);
>
>         dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
>         if (!dst)
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index dbc224023977..8db7c5436da4 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
>  static void            ip6_dst_destroy(struct dst_entry *);
>  static void            ip6_dst_ifdown(struct dst_entry *,
>                                        struct net_device *dev, int how);
> -static int              ip6_dst_gc(struct dst_ops *ops);
> +static void             ip6_dst_gc(struct dst_ops *ops);
>
>  static int             ip6_pkt_discard(struct sk_buff *skb);
>  static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
> @@ -3295,32 +3295,21 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
>         return dst;
>  }
>
> -static int ip6_dst_gc(struct dst_ops *ops)
> +static void ip6_dst_gc(struct dst_ops *ops)
>  {
>         struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
> -       int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
> -       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
>         int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
>         int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
> -       unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
>         int entries;
>
>         entries = dst_entries_get_fast(ops);
> -       if (entries > rt_max_size)
> -               entries = dst_entries_get_slow(ops);
> -
> -       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
> -           entries <= rt_max_size)
> -               goto out;
>
>         net->ipv6.ip6_rt_gc_expire++;
>         fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
>         entries = dst_entries_get_slow(ops);
>         if (entries < ops->gc_thresh)
>                 net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1;
> -out:
>         net->ipv6.ip6_rt_gc_expire -= net->ipv6.ip6_rt_gc_expire>>rt_elasticity;
> -       return entries > rt_max_size;
>  }
>
>  static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
>
> > > I do not believe the suggested flag is the right change.
> >
> > Regards
> >
> > Jon


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-22  5:39         ` Jonathan Maxwell
@ 2022-12-22 16:17           ` David Ahern
  2022-12-22 22:36             ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-12-22 16:17 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, netdev, linux-kernel

On 12/21/22 10:39 PM, Jonathan Maxwell wrote:
> There are some mistakes in this prototype patch. Let me come up with a
> better one. I'll submit a new patch in the new year for review. Thanks for
> the suggestion DavidA.

You are welcome. When you submit the next one, it would be helpful to
show the change in memory consumption and a comparison to IPv4 for a
similar raw socket program.


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-22 16:17           ` David Ahern
@ 2022-12-22 22:36             ` Jonathan Maxwell
  0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-22 22:36 UTC (permalink / raw)
  To: David Ahern
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, netdev, linux-kernel

On Fri, Dec 23, 2022 at 3:17 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 12/21/22 10:39 PM, Jonathan Maxwell wrote:
> > There are some mistakes in this prototype patch. Let me come up with a
> > better one. I'll submit a new patch in the new year for review. Thanks for
> > the suggestion DavidA.
>
> you are welcome. When you submit the next one, it would be helpful to
> show change in memory consumption and a comparison to IPv4 for a similar
> raw socket program.

Sure, will do. I have an IPv4 version of the test program and a better patch.
Initial results are quite similar for resource usage. The IPv6 GC does a good
job of managing memory resources with the max_size threshold removed. I'll
include some comparison stats after I test this extensively when I submit
the new patch.


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-20 21:48   ` Jonathan Maxwell
@ 2022-12-23 20:28     ` Andrea Mayer
  2022-12-24  7:38       ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Mayer @ 2022-12-23 20:28 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam,
	Andrea Mayer

Hi Jon,
please see below, thanks.

On Wed, 21 Dec 2022 08:48:11 +1100
Jonathan Maxwell <jmaxwell37@gmail.com> wrote:

> On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > these warnings:
> > >
> > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > .
> > > .
> > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> >
> > If I read correctly, the maximum number of dst that the raw socket can
> > use this way is limited by the number of packets it allows via the
> > sndbuf limit, right?
> >
> 
> Yes, but in my test sndbuf limit is never hit so it clones a route for
> every packet.
> 
> e.g:
> 
> output from C program sending 5000000 packets via a raw socket.
> 
> ip raw: total num pkts 5000000
> 
> # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> Attaching 1 probe...
> 
> @count[a.out]: 5000009
> 
> > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > ipvs, seg6?
> >
> 
> Any call to ip6_pol_route(s) where no res.nh->fib_nh_gw_family is 0 can do it.
> But we have only seen this for raw sockets so far.
> 

In the SRv6 subsystem, seg6_lookup_nexthop() is used by some
cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
different from the one carried by the IPv6 header. For this purpose,
seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH flag.
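
For reference, the lookup is roughly the following (a condensed sketch of
the seg6 nexthop lookup, with "nhaddr" as the behavior-supplied nexthop;
table selection and error handling elided):

	/* When the behavior supplies an explicit nexthop (nhaddr), route
	 * towards it instead of the header's daddr, and mark the flow
	 * with FLOWI_FLAG_KNOWN_NH. */
	struct ipv6hdr *hdr = ipv6_hdr(skb);
	struct dst_entry *dst;
	struct flowi6 fl6 = {};

	fl6.flowi6_iif = skb->dev->ifindex;
	fl6.daddr = nhaddr ? *nhaddr : hdr->daddr;
	fl6.saddr = hdr->saddr;
	fl6.flowi6_proto = hdr->nexthdr;

	if (nhaddr)
		fl6.flowi6_flags = FLOWI_FLAG_KNOWN_NH;

	dst = ip6_route_input_lookup(net, skb->dev, &fl6, skb,
				     RT6_LOOKUP_F_HAS_SADDR);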

> > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > .
> > > .
> > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

I can reproduce the same warning messages reported by you by instantiating an
End.X behavior whose nexthop is handled by a route for which there is no "via".
In this configuration, ip6_pol_route() (called by seg6_lookup_nexthop())
triggers ip6_rt_cache_alloc(), because i) FLOWI_FLAG_KNOWN_NH is present
and ii) res.nh->fib_nh_gw_family is 0 (as already pointed out).

> Regards
> 
> Jon

Ciao,
Andrea


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-23 20:28     ` Andrea Mayer
@ 2022-12-24  7:38       ` Jonathan Maxwell
  2023-01-02 23:59         ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2022-12-24  7:38 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam

On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
>
> Hi Jon,
> please see below, thanks.
>
> On Wed, 21 Dec 2022 08:48:11 +1100
> Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
>
> > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > these warnings:
> > > >
> > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > .
> > > > .
> > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > >
> > > If I read correctly, the maximum number of dst that the raw socket can
> > > use this way is limited by the number of packets it allows via the
> > > sndbuf limit, right?
> > >
> >
> > Yes, but in my test sndbuf limit is never hit so it clones a route for
> > every packet.
> >
> > e.g:
> >
> > output from C program sending 5000000 packets via a raw socket.
> >
> > ip raw: total num pkts 5000000
> >
> > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > Attaching 1 probe...
> >
> > @count[a.out]: 5000009
> >
> > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > ipvs, seg6?
> > >
> >
> > Any call to ip6_pol_route(s) where no res.nh->fib_nh_gw_family is 0 can do it.
> > But we have only seen this for raw sockets so far.
> >
>
> In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> different from the one carried by the IPv6 header. For this purpose,
> seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
>
Hi Andrea,

Thanks for pointing that datapath out. The more generic approach we are
taking, bringing IPv6 closer to IPv4 in this regard, should fix all
instances of this.

> > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > .
> > > > .
> > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>
> I can reproduce the same warning messages reported by you, by instantiating an
> End.X behavior whose nexthop is handled by a route for which there is no "via".
> In this configuration, the ip6_pol_route() (called by seg6_lookup_nexthop())
> triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH is present ii)
> and the res.nh->fib_nh_gw_family is 0 (as already pointed out).
>

Nice. When I get back after the holiday break I'll submit the next patch. It
would be great if you could test the new patch and let me know how it works
in your tests. I'll keep you posted.

Regards

Jon

> > Regards
> >
> > Jon
>
> Ciao,
> Andrea


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2022-12-24  7:38       ` Jonathan Maxwell
@ 2023-01-02 23:59         ` Jonathan Maxwell
  2023-01-03 16:07           ` Andrea Mayer
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2023-01-02 23:59 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam

Hi Andrea,

Happy New Year.

Any chance you could test this patch based on the latest net-next
kernel and let me know the result?

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 88ff7bb2bb9b..632086b2f644 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -16,7 +16,7 @@ struct dst_ops {
        unsigned short          family;
        unsigned int            gc_thresh;

-       int                     (*gc)(struct dst_ops *ops);
+       void                    (*gc)(struct dst_ops *ops);
        struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
        unsigned int            (*default_advmss)(const struct dst_entry *);
        unsigned int            (*mtu)(const struct dst_entry *);
diff --git a/net/core/dst.c b/net/core/dst.c
index 6d2dd03dafa8..31c08a3386d3 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,

        if (ops->gc &&
            !(flags & DST_NOCOUNT) &&
-           dst_entries_get_fast(ops) > ops->gc_thresh) {
-               if (ops->gc(ops)) {
-                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
-                       return NULL;
-               }
-       }
+           dst_entries_get_fast(ops) > ops->gc_thresh)
+               ops->gc(ops);

        dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
        if (!dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e74e0361fd92..b643dda68d31 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
 static void            ip6_dst_destroy(struct dst_entry *);
 static void            ip6_dst_ifdown(struct dst_entry *,
                                       struct net_device *dev, int how);
-static int              ip6_dst_gc(struct dst_ops *ops);
+static void             ip6_dst_gc(struct dst_ops *ops);

 static int             ip6_pkt_discard(struct sk_buff *skb);
 static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
@@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
        return dst;
 }

-static int ip6_dst_gc(struct dst_ops *ops)
+static void ip6_dst_gc(struct dst_ops *ops)
 {
        struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
        int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
-       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
        int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
        int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
        unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
@@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
        int entries;

        entries = dst_entries_get_fast(ops);
-       if (entries > rt_max_size)
+       if (entries > ops->gc_thresh)
                entries = dst_entries_get_slow(ops);

-       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
-           entries <= rt_max_size)
+       if (time_after(rt_last_gc + rt_min_interval, jiffies))
                goto out;

        fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
@@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
 out:
        val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
        atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
-       return entries > rt_max_size;
 }

 static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
@@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 #endif

        net->ipv6.sysctl.flush_delay = 0;
-       net->ipv6.sysctl.ip6_rt_max_size = 4096;
+       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
        net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
        net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
        net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;

On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
>
> On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> >
> > Hi Jon,
> > please see below, thanks.
> >
> > On Wed, 21 Dec 2022 08:48:11 +1100
> > Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> >
> > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > these warnings:
> > > > >
> > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > .
> > > > > .
> > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > >
> > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > use this way is limited by the number of packets it allows via the
> > > > sndbuf limit, right?
> > > >
> > >
> > > Yes, but in my test sndbuf limit is never hit so it clones a route for
> > > every packet.
> > >
> > > e.g:
> > >
> > > output from C program sending 5000000 packets via a raw socket.
> > >
> > > ip raw: total num pkts 5000000
> > >
> > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > Attaching 1 probe...
> > >
> > > @count[a.out]: 5000009
> > >
> > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > ipvs, seg6?
> > > >
> > >
> > > Any call to ip6_pol_route(s) where no res.nh->fib_nh_gw_family is 0 can do it.
> > > But we have only seen this for raw sockets so far.
> > >
> >
> > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > different from the one carried by the IPv6 header. For this purpose,
> > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> >
> Hi Andrea,
>
> Thanks for pointing that datapath out. The more generic approach we are
> taking bringing Ipv6 closer to Ipv4 in this regard should fix all instances
> of this.
>
> > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > .
> > > > > .
> > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> >
> > I can reproduce the same warning messages reported by you, by instantiating an
> > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > In this configuration, the ip6_pol_route() (called by seg6_lookup_nexthop())
> > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH is present ii)
> > and the res.nh->fib_nh_gw_family is 0 (as already pointed out).
> >
>
> Nice, when I get back after the holiday break I'll submit the next patch. It
> would be great if you could test the new patch and let me know how it works in
> your tests at that juncture. I'll keep you posted.
>
> Regards
>
> Jon
>
> > > Regards
> > >
> > > Jon
> >
> > Ciao,
> > Andrea


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2023-01-02 23:59         ` Jonathan Maxwell
@ 2023-01-03 16:07           ` Andrea Mayer
  2023-01-06 23:26             ` Andrea Mayer
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Mayer @ 2023-01-03 16:07 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam,
	Andrea Mayer

Hi Jon,
please see below, thanks.

On Tue, 3 Jan 2023 10:59:50 +1100
Jonathan Maxwell <jmaxwell37@gmail.com> wrote:

> Hi Andrea,
> 
> Happy New Year.
> 

Thank you, Happy New Year to you too and everybody on the mailing list as well.

> Any chance you could test this patch based on the latest net-next
> kernel and let me know the result?
> 
> diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> index 88ff7bb2bb9b..632086b2f644 100644
> --- a/include/net/dst_ops.h
> +++ b/include/net/dst_ops.h
> @@ -16,7 +16,7 @@ struct dst_ops {
>         unsigned short          family;
>         unsigned int            gc_thresh;
> 
> -       int                     (*gc)(struct dst_ops *ops);
> +       void                    (*gc)(struct dst_ops *ops);
>         struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
>         unsigned int            (*default_advmss)(const struct dst_entry *);
>         unsigned int            (*mtu)(const struct dst_entry *);
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 6d2dd03dafa8..31c08a3386d3 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
> 
>         if (ops->gc &&
>             !(flags & DST_NOCOUNT) &&
> -           dst_entries_get_fast(ops) > ops->gc_thresh) {
> -               if (ops->gc(ops)) {
> -                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
> -                       return NULL;
> -               }
> -       }
> +           dst_entries_get_fast(ops) > ops->gc_thresh)
> +               ops->gc(ops);
> 
>         dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
>         if (!dst)
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index e74e0361fd92..b643dda68d31 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
>  static void            ip6_dst_destroy(struct dst_entry *);
>  static void            ip6_dst_ifdown(struct dst_entry *,
>                                        struct net_device *dev, int how);
> -static int              ip6_dst_gc(struct dst_ops *ops);
> +static void             ip6_dst_gc(struct dst_ops *ops);
> 
>  static int             ip6_pkt_discard(struct sk_buff *skb);
>  static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
> @@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
>         return dst;
>  }
> 
> -static int ip6_dst_gc(struct dst_ops *ops)
> +static void ip6_dst_gc(struct dst_ops *ops)
>  {
>         struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
>         int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
> -       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
>         int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
>         int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
>         unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
> @@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
>         int entries;
> 
>         entries = dst_entries_get_fast(ops);
> -       if (entries > rt_max_size)
> +       if (entries > ops->gc_thresh)
>                 entries = dst_entries_get_slow(ops);
> 
> -       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
> -           entries <= rt_max_size)
> +       if (time_after(rt_last_gc + rt_min_interval, jiffies))
>                 goto out;
> 
>         fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
> @@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
>  out:
>         val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
>         atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
> -       return entries > rt_max_size;
>  }
> 
>  static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
> @@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
>  #endif
> 
>         net->ipv6.sysctl.flush_delay = 0;
> -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
>         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
>         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
>         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
> 

Yes, I will apply this patch in the next few days and check how it deals
with the seg6 subsystem. I will keep you posted.

Ciao,
Andrea

> On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> >
> > On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> > >
> > > Hi Jon,
> > > please see below, thanks.
> > >
> > > On Wed, 21 Dec 2022 08:48:11 +1100
> > > Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > >
> > > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > > these warnings:
> > > > > >
> > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > .
> > > > > > .
> > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > >
> > > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > > use this way is limited by the number of packets it allows via the
> > > > > sndbuf limit, right?
> > > > >
> > > >
> > > > Yes, but in my test sndbuf limit is never hit so it clones a route for
> > > > every packet.
> > > >
> > > > e.g:
> > > >
> > > > output from C program sending 5000000 packets via a raw socket.
> > > >
> > > > ip raw: total num pkts 5000000
> > > >
> > > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > > Attaching 1 probe...
> > > >
> > > > @count[a.out]: 5000009
> > > >
> > > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > > ipvs, seg6?
> > > > >
> > > >
> > > > Any call to ip6_pol_route(s) where no res.nh->fib_nh_gw_family is 0 can do it.
> > > > But we have only seen this for raw sockets so far.
> > > >
> > >
> > > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > > different from the one carried by the IPv6 header. For this purpose,
> > > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> > >
> > Hi Andrea,
> >
> > Thanks for pointing that datapath out. The more generic approach we are
> > taking bringing Ipv6 closer to Ipv4 in this regard should fix all instances
> > of this.
> >
> > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > .
> > > > > > .
> > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > >
> > > I can reproduce the same warning messages reported by you, by instantiating an
> > > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > > In this configuration, the ip6_pol_route() (called by seg6_lookup_nexthop())
> > > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH is present ii)
> > > and the res.nh->fib_nh_gw_family is 0 (as already pointed out).
> > >
> >
> > Nice, when I get back after the holiday break I'll submit the next patch. It
> > would be great if you could test the new patch and let me know how it works in
> > your tests at that juncture. I'll keep you posted.
> >
> > Regards
> >
> > Jon
> >
> > > > Regards
> > > >
> > > > Jon
> > >
> > > Ciao,
> > > Andrea


-- 
Andrea Mayer <andrea.mayer@uniroma2.it>


* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2023-01-03 16:07           ` Andrea Mayer
@ 2023-01-06 23:26             ` Andrea Mayer
  2023-01-07 23:46               ` Jonathan Maxwell
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Mayer @ 2023-01-06 23:26 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam,
	Andrea Mayer

Hi Jon,
please see below, thanks.

> 
> > Any chance you could test this patch based on the latest net-next
> > kernel and let me know the result?
> > 
> > diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> > index 88ff7bb2bb9b..632086b2f644 100644
> > --- a/include/net/dst_ops.h
> > +++ b/include/net/dst_ops.h
> > @@ -16,7 +16,7 @@ struct dst_ops {
> >         unsigned short          family;
> >         unsigned int            gc_thresh;
> > 
> > -       int                     (*gc)(struct dst_ops *ops);
> > +       void                    (*gc)(struct dst_ops *ops);
> >         struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
> >         unsigned int            (*default_advmss)(const struct dst_entry *);
> >         unsigned int            (*mtu)(const struct dst_entry *);
> > diff --git a/net/core/dst.c b/net/core/dst.c
> > index 6d2dd03dafa8..31c08a3386d3 100644
> > --- a/net/core/dst.c
> > +++ b/net/core/dst.c
> > @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
> > 
> >         if (ops->gc &&
> >             !(flags & DST_NOCOUNT) &&
> > -           dst_entries_get_fast(ops) > ops->gc_thresh) {
> > -               if (ops->gc(ops)) {
> > -                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
> > -                       return NULL;
> > -               }
> > -       }
> > +           dst_entries_get_fast(ops) > ops->gc_thresh)
> > +               ops->gc(ops);
> > 
> >         dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
> >         if (!dst)
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index e74e0361fd92..b643dda68d31 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
> >  static void            ip6_dst_destroy(struct dst_entry *);
> >  static void            ip6_dst_ifdown(struct dst_entry *,
> >                                        struct net_device *dev, int how);
> > -static int              ip6_dst_gc(struct dst_ops *ops);
> > +static void             ip6_dst_gc(struct dst_ops *ops);
> > 
> >  static int             ip6_pkt_discard(struct sk_buff *skb);
> >  static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
> > @@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
> >         return dst;
> >  }
> > 
> > -static int ip6_dst_gc(struct dst_ops *ops)
> > +static void ip6_dst_gc(struct dst_ops *ops)
> >  {
> >         struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
> >         int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
> > -       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
> >         int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
> >         int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
> >         unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
> > @@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
> >         int entries;
> > 
> >         entries = dst_entries_get_fast(ops);
> > -       if (entries > rt_max_size)
> > +       if (entries > ops->gc_thresh)
> >                 entries = dst_entries_get_slow(ops);
> > 
> > -       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
> > -           entries <= rt_max_size)
> > +       if (time_after(rt_last_gc + rt_min_interval, jiffies))
> >                 goto out;
> > 
> >         fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
> > @@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
> >  out:
> >         val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
> >         atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
> > -       return entries > rt_max_size;
> >  }
> > 
> >  static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
> > @@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
> >  #endif
> > 
> >         net->ipv6.sysctl.flush_delay = 0;
> > -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> > +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
> >         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
> >         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
> >         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
> > 
> 
> > Yes, I will apply this patch in the next few days and check how it deals
> > with the seg6 subsystem. I will keep you posted.
> 

I applied the patch* to net-next (HEAD 6bd4755c7c49) and did some tests on
the seg6 subsystem, specifically running the End.X/DX6 behaviors. They seem to
work fine.

(*) I had to slightly edit the patch because of the code formatting, e.g.
    some incorrect line breaks, spaces, etc.
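
For anyone wanting to repeat this kind of check, a minimal End.X setup
along the lines of the one used here might look like the sketch below
(interface names, prefixes and SIDs are made up for illustration; the
exact topology used in these tests is not shown):

  # enable SRv6 processing on the node under test
  sysctl -w net.ipv6.conf.all.seg6_enabled=1
  sysctl -w net.ipv6.conf.eth0.seg6_enabled=1

  # connected route with no "via": resolving a nexthop through it leaves
  # fib_nh_gw_family == 0, which is what used to drive per-packet
  # ip6_rt_cache_alloc() clones
  ip -6 route add fc00:12::/64 dev eth0

  # End.X: cross-connect traffic for the SID to the specified IPv6 nexthop
  ip -6 route add fc00:1::100/128 encap seg6local action End.X \
      nh6 fc00:12::2 dev eth0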

Ciao,
Andrea

> 
> > On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > >
> > > On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> > > >
> > > > Hi Jon,
> > > > please see below, thanks.
> > > >
> > > > On Wed, 21 Dec 2022 08:48:11 +1100
> > > > Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > > >
> > > > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > > > these warnings:
> > > > > > >
> > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > .
> > > > > > > .
> > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > >
> > > > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > > > use this way is limited by the number of packets it allows via the
> > > > > > sndbuf limit, right?
> > > > > >
> > > > >
> > > > > Yes, but in my test the sndbuf limit is never hit, so it clones a
> > > > > route for every packet.
> > > > >
> > > > > e.g:
> > > > >
> > > > > output from C program sending 5000000 packets via a raw socket.
> > > > >
> > > > > ip raw: total num pkts 5000000
> > > > >
> > > > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > > > Attaching 1 probe...
> > > > >
> > > > > @count[a.out]: 5000009
> > > > >
> > > > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > > > ipvs, seg6?
> > > > > >
> > > > >
> > > > > Any call to ip6_pol_route(s) where res.nh->fib_nh_gw_family is 0 can do it.
> > > > > But we have only seen this for raw sockets so far.
> > > > >
> > > >
> > > > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > > > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > > > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > > > different from the one carried by the IPv6 header. For this purpose,
> > > > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> > > >
> > > Hi Andrea,
> > >
> > > Thanks for pointing that datapath out. The more generic approach we are
> > > taking, bringing IPv6 closer to IPv4 in this regard, should fix all
> > > instances of this.
> > >
> > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > .
> > > > > > > .
> > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > >
> > > > I can reproduce the same warning messages reported by you, by instantiating an
> > > > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > > > In this configuration, ip6_pol_route() (called by seg6_lookup_nexthop())
> > > > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH flag is
> > > > present and ii) res.nh->fib_nh_gw_family is 0 (as already pointed out).
> > > >
> > >
> > > Nice, when I get back after the holiday break I'll submit the next patch. It
> > > would be great if you could test the new patch and let me know how it works in
> > > your tests at that juncture. I'll keep you posted.
> > >
> > > Regards
> > >
> > > Jon
> > >
> > > > > Regards
> > > > >
> > > > > Jon
> > > >
> > > > Ciao,
> > > > Andrea
> 
> 
> -- 
> Andrea Mayer <andrea.mayer@uniroma2.it>


-- 
Andrea Mayer <andrea.mayer@uniroma2.it>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2023-01-06 23:26             ` Andrea Mayer
@ 2023-01-07 23:46               ` Jonathan Maxwell
  2023-01-08 17:34                 ` Andrea Mayer
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Maxwell @ 2023-01-07 23:46 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam

On Sat, Jan 7, 2023 at 10:27 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
>
> Hi Jon,
> please see after, thanks.
>
> >
> > > Any chance you could test this patch based on the latest net-next
> > > kernel and let me know the result?
> > >
> > > diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> > > index 88ff7bb2bb9b..632086b2f644 100644
> > > --- a/include/net/dst_ops.h
> > > +++ b/include/net/dst_ops.h
> > > @@ -16,7 +16,7 @@ struct dst_ops {
> > >         unsigned short          family;
> > >         unsigned int            gc_thresh;
> > >
> > > -       int                     (*gc)(struct dst_ops *ops);
> > > +       void                    (*gc)(struct dst_ops *ops);
> > >         struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
> > >         unsigned int            (*default_advmss)(const struct dst_entry *);
> > >         unsigned int            (*mtu)(const struct dst_entry *);
> > > diff --git a/net/core/dst.c b/net/core/dst.c
> > > index 6d2dd03dafa8..31c08a3386d3 100644
> > > --- a/net/core/dst.c
> > > +++ b/net/core/dst.c
> > > @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
> > >
> > >         if (ops->gc &&
> > >             !(flags & DST_NOCOUNT) &&
> > > -           dst_entries_get_fast(ops) > ops->gc_thresh) {
> > > -               if (ops->gc(ops)) {
> > > -                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
> > > -                       return NULL;
> > > -               }
> > > -       }
> > > +           dst_entries_get_fast(ops) > ops->gc_thresh)
> > > +               ops->gc(ops);
> > >
> > >         dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
> > >         if (!dst)
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > index e74e0361fd92..b643dda68d31 100644
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
> > >  static void            ip6_dst_destroy(struct dst_entry *);
> > >  static void            ip6_dst_ifdown(struct dst_entry *,
> > >                                        struct net_device *dev, int how);
> > > -static int              ip6_dst_gc(struct dst_ops *ops);
> > > +static void             ip6_dst_gc(struct dst_ops *ops);
> > >
> > >  static int             ip6_pkt_discard(struct sk_buff *skb);
> > >  static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
> > > @@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
> > >         return dst;
> > >  }
> > >
> > > -static int ip6_dst_gc(struct dst_ops *ops)
> > > +static void ip6_dst_gc(struct dst_ops *ops)
> > >  {
> > >         struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
> > >         int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
> > > -       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
> > >         int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
> > >         int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
> > >         unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
> > > @@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
> > >         int entries;
> > >
> > >         entries = dst_entries_get_fast(ops);
> > > -       if (entries > rt_max_size)
> > > +       if (entries > ops->gc_thresh)
> > >                 entries = dst_entries_get_slow(ops);
> > >
> > > -       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
> > > -           entries <= rt_max_size)
> > > +       if (time_after(rt_last_gc + rt_min_interval, jiffies))
> > >                 goto out;
> > >
> > >         fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
> > > @@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
> > >  out:
> > >         val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
> > >         atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
> > > -       return entries > rt_max_size;
> > >  }
> > >
> > >  static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
> > > @@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
> > >  #endif
> > >
> > >         net->ipv6.sysctl.flush_delay = 0;
> > > -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> > > +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
> > >         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
> > >         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
> > >         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
> > >
> >
> > > Yes, I will apply this patch in the next few days and check how it
> > > deals with the seg6 subsystem. I will keep you posted.
> >
>
> I applied the patch* to net-next (HEAD 6bd4755c7c49) and did some tests on
> the seg6 subsystem, specifically running the End.X/DX6 behaviors. They seem to
> work fine.

Thanks Andrea, much appreciated. It worked fine in my raw socket tests as well.
I'll look at submitting it soon.
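
For completeness, the raw socket test is essentially a tight sendto() loop
over a hdrincl socket. A minimal sketch of that kind of sender (addresses
are placeholders, error handling trimmed, needs CAP_NET_RAW):

  #include <arpa/inet.h>
  #include <errno.h>
  #include <netinet/ip6.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          struct sockaddr_in6 dst = { .sin6_family = AF_INET6 };
          struct ip6_hdr hdr;
          int fd, i;

          /* IPPROTO_RAW on an AF_INET6 socket implies HDRINCL, the path
           * that sets FLOWI_FLAG_KNOWN_NH in rawv6_sendmsg() */
          fd = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
          if (fd < 0) {
                  perror("socket");
                  return 1;
          }

          memset(&hdr, 0, sizeof(hdr));
          hdr.ip6_vfc  = 6 << 4;              /* version 6 */
          hdr.ip6_nxt  = IPPROTO_NONE;        /* header-only packet */
          hdr.ip6_hlim = 64;
          inet_pton(AF_INET6, "2001:db8::1", &hdr.ip6_src);
          inet_pton(AF_INET6, "2001:db8::2", &hdr.ip6_dst);
          dst.sin6_addr = hdr.ip6_dst;

          for (i = 0; i < 5000000; i++)
                  if (sendto(fd, &hdr, sizeof(hdr), 0,
                             (struct sockaddr *)&dst, sizeof(dst)) < 0)
                          fprintf(stderr, "remaining pkt %d errno %d\n",
                                  5000000 - i, errno);

          close(fd);
          return 0;
  }

Before the fix, every sendto() above cloned a route via ip6_rt_cache_alloc()
until the max_size threshold tripped and the calls started failing with
errno 101; with the gc-based approach dst_alloc() no longer fails, it
reclaims entries instead.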

>
> (*) I had to slightly edit the patch because of the code formatting, e.g.
>     some incorrect line breaks, spaces, etc.
>

Sorry about that; I should have sent it straight from git to avoid the formatting issues.

Regards

Jon

> Ciao,
> Andrea
>
> >
> > > On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > > >
> > > > On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> > > > >
> > > > > Hi Jon,
> > > > > please see below, thanks.
> > > > >
> > > > > On Wed, 21 Dec 2022 08:48:11 +1100
> > > > > Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > > > >
> > > > > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > > > > these warnings:
> > > > > > > >
> > > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > > .
> > > > > > > > .
> > > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > >
> > > > > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > > > > use this way is limited by the number of packets it allows via the
> > > > > > > sndbuf limit, right?
> > > > > > >
> > > > > >
> > > > > > Yes, but in my test the sndbuf limit is never hit, so it clones a
> > > > > > route for every packet.
> > > > > >
> > > > > > e.g:
> > > > > >
> > > > > > output from C program sending 5000000 packets via a raw socket.
> > > > > >
> > > > > > ip raw: total num pkts 5000000
> > > > > >
> > > > > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > > > > Attaching 1 probe...
> > > > > >
> > > > > > @count[a.out]: 5000009
> > > > > >
> > > > > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > > > > ipvs, seg6?
> > > > > > >
> > > > > >
> > > > > > Any call to ip6_pol_route(s) where res.nh->fib_nh_gw_family is 0 can do it.
> > > > > > But we have only seen this for raw sockets so far.
> > > > > >
> > > > >
> > > > > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > > > > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > > > > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > > > > different from the one carried by the IPv6 header. For this purpose,
> > > > > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> > > > >
> > > > Hi Andrea,
> > > >
> > > > Thanks for pointing that datapath out. The more generic approach we are
> > > > taking, bringing IPv6 closer to IPv4 in this regard, should fix all
> > > > instances of this.
> > > >
> > > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > > .
> > > > > > > > .
> > > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > >
> > > > > I can reproduce the same warning messages reported by you, by instantiating an
> > > > > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > > > > In this configuration, ip6_pol_route() (called by seg6_lookup_nexthop())
> > > > > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH flag is
> > > > > present and ii) res.nh->fib_nh_gw_family is 0 (as already pointed out).
> > > > >
> > > >
> > > > Nice, when I get back after the holiday break I'll submit the next patch. It
> > > > would be great if you could test the new patch and let me know how it works in
> > > > your tests at that juncture. I'll keep you posted.
> > > >
> > > > Regards
> > > >
> > > > Jon
> > > >
> > > > > > Regards
> > > > > >
> > > > > > Jon
> > > > >
> > > > > Ciao,
> > > > > Andrea
> >
> >
> > --
> > Andrea Mayer <andrea.mayer@uniroma2.it>
>
>
> --
> Andrea Mayer <andrea.mayer@uniroma2.it>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
  2023-01-07 23:46               ` Jonathan Maxwell
@ 2023-01-08 17:34                 ` Andrea Mayer
  0 siblings, 0 replies; 18+ messages in thread
From: Andrea Mayer @ 2023-01-08 17:34 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Paolo Abeni, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
	linux-kernel, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam,
	Andrea Mayer

On Sun, 8 Jan 2023 10:46:09 +1100
Jonathan Maxwell <jmaxwell37@gmail.com> wrote:

> On Sat, Jan 7, 2023 at 10:27 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> >
> > Hi Jon,
> > please see after, thanks.
> >
> > >
> > > > Any chance you could test this patch based on the latest net-next
> > > > kernel and let me know the result?
> > > >
> > > > diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> > > > index 88ff7bb2bb9b..632086b2f644 100644
> > > > --- a/include/net/dst_ops.h
> > > > +++ b/include/net/dst_ops.h
> > > > @@ -16,7 +16,7 @@ struct dst_ops {
> > > >         unsigned short          family;
> > > >         unsigned int            gc_thresh;
> > > >
> > > > -       int                     (*gc)(struct dst_ops *ops);
> > > > +       void                    (*gc)(struct dst_ops *ops);
> > > >         struct dst_entry *      (*check)(struct dst_entry *, __u32 cookie);
> > > >         unsigned int            (*default_advmss)(const struct dst_entry *);
> > > >         unsigned int            (*mtu)(const struct dst_entry *);
> > > > diff --git a/net/core/dst.c b/net/core/dst.c
> > > > index 6d2dd03dafa8..31c08a3386d3 100644
> > > > --- a/net/core/dst.c
> > > > +++ b/net/core/dst.c
> > > > @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
> > > >
> > > >         if (ops->gc &&
> > > >             !(flags & DST_NOCOUNT) &&
> > > > -           dst_entries_get_fast(ops) > ops->gc_thresh) {
> > > > -               if (ops->gc(ops)) {
> > > > -                       pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
> > > > -                       return NULL;
> > > > -               }
> > > > -       }
> > > > +           dst_entries_get_fast(ops) > ops->gc_thresh)
> > > > +               ops->gc(ops);
> > > >
> > > >         dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
> > > >         if (!dst)
> > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > > index e74e0361fd92..b643dda68d31 100644
> > > > --- a/net/ipv6/route.c
> > > > +++ b/net/ipv6/route.c
> > > > @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
> > > >  static void            ip6_dst_destroy(struct dst_entry *);
> > > >  static void            ip6_dst_ifdown(struct dst_entry *,
> > > >                                        struct net_device *dev, int how);
> > > > -static int              ip6_dst_gc(struct dst_ops *ops);
> > > > +static void             ip6_dst_gc(struct dst_ops *ops);
> > > >
> > > >  static int             ip6_pkt_discard(struct sk_buff *skb);
> > > >  static int             ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
> > > > @@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
> > > >         return dst;
> > > >  }
> > > >
> > > > -static int ip6_dst_gc(struct dst_ops *ops)
> > > > +static void ip6_dst_gc(struct dst_ops *ops)
> > > >  {
> > > >         struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
> > > >         int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
> > > > -       int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
> > > >         int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
> > > >         int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
> > > >         unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
> > > > @@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
> > > >         int entries;
> > > >
> > > >         entries = dst_entries_get_fast(ops);
> > > > -       if (entries > rt_max_size)
> > > > +       if (entries > ops->gc_thresh)
> > > >                 entries = dst_entries_get_slow(ops);
> > > >
> > > > -       if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
> > > > -           entries <= rt_max_size)
> > > > +       if (time_after(rt_last_gc + rt_min_interval, jiffies))
> > > >                 goto out;
> > > >
> > > >         fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
> > > > @@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
> > > >  out:
> > > >         val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
> > > >         atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
> > > > -       return entries > rt_max_size;
> > > >  }
> > > >
> > > >  static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
> > > > @@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
> > > >  #endif
> > > >
> > > >         net->ipv6.sysctl.flush_delay = 0;
> > > > -       net->ipv6.sysctl.ip6_rt_max_size = 4096;
> > > > +       net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
> > > >         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
> > > >         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
> > > >         net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
> > > >
> > >
> > > Yes, I will apply this patch in the next few days and check how it
> > > deals with the seg6 subsystem. I will keep you posted.
> > >
> >
> > I applied the patch* to net-next (HEAD 6bd4755c7c49) and did some tests on
> > the seg6 subsystem, specifically running the End.X/DX6 behaviors. They seem to
> > work fine.
> 
> Thanks Andrea, much appreciated. It worked fine in my raw socket tests as well.

You're welcome. Good!

> I'll look at submitting it soon.

Please let me know (keep me in cc).

> 
> >
> > (*) I had to slightly edit the patch because of the code formatting, e.g.
> >     some incorrect line breaks, spaces, etc.
> >
> 
> Sorry about that; I should have sent it straight from git to avoid the formatting issues.
> 

Don't worry.

> Regards
> 
> Jon
> 

Ciao,
Andrea

> > >
> > > > On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > > > >
> > > > > On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer <andrea.mayer@uniroma2.it> wrote:
> > > > > >
> > > > > > Hi Jon,
> > > > > > please see below, thanks.
> > > > > >
> > > > > > On Wed, 21 Dec 2022 08:48:11 +1100
> > > > > > Jonathan Maxwell <jmaxwell37@gmail.com> wrote:
> > > > > >
> > > > > > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > > > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > > > > > these warnings:
> > > > > > > > >
> > > > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > >
> > > > > > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > > > > > use this way is limited by the number of packets it allows via the
> > > > > > > > sndbuf limit, right?
> > > > > > > >
> > > > > > >
> > > > > > > Yes, but in my test the sndbuf limit is never hit, so it clones a
> > > > > > > route for every packet.
> > > > > > >
> > > > > > > e.g:
> > > > > > >
> > > > > > > output from C program sending 5000000 packets via a raw socket.
> > > > > > >
> > > > > > > ip raw: total num pkts 5000000
> > > > > > >
> > > > > > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > > > > > Attaching 1 probe...
> > > > > > >
> > > > > > > @count[a.out]: 5000009
> > > > > > >
> > > > > > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > > > > > ipvs, seg6?
> > > > > > > >
> > > > > > >
> > > > > > > Any call to ip6_pol_route(s) where res.nh->fib_nh_gw_family is 0 can do it.
> > > > > > > But we have only seen this for raw sockets so far.
> > > > > > >
> > > > > >
> > > > > > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > > > > > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > > > > > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > > > > > different from the one carried by the IPv6 header. For this purpose,
> > > > > > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> > > > > >
> > > > > Hi Andrea,
> > > > >
> > > > > Thanks for pointing that datapath out. The more generic approach we are
> > > > > taking, bringing IPv6 closer to IPv4 in this regard, should fix all
> > > > > instances of this.
> > > > >
> > > > > > > > > [1]   99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > > > > > .
> > > > > > > > > .
> > > > > > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > >
> > > > > > I can reproduce the same warning messages reported by you, by instantiating an
> > > > > > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > > > > > In this configuration, ip6_pol_route() (called by seg6_lookup_nexthop())
> > > > > > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH flag is
> > > > > > present and ii) res.nh->fib_nh_gw_family is 0 (as already pointed out).
> > > > > >
> > > > >
> > > > > Nice, when I get back after the holiday break I'll submit the next patch. It
> > > > > would be great if you could test the new patch and let me know how it works in
> > > > > your tests at that juncture. I'll keep you posted.
> > > > >
> > > > > Regards
> > > > >
> > > > > Jon
> > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Jon
> > > > > >
> > > > > > Ciao,
> > > > > > Andrea
> > >
> > >
> > > --
> > > Andrea Mayer <andrea.mayer@uniroma2.it>
> >
> >
> > --
> > Andrea Mayer <andrea.mayer@uniroma2.it>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-01-08 17:34 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-18 23:48 [net-next] ipv6: fix routing cache overflow for raw sockets Jon Maxwell
2022-12-20 12:35 ` Paolo Abeni
2022-12-20 15:10   ` David Ahern
2022-12-20 21:55     ` Jonathan Maxwell
2022-12-21  4:31       ` Jonathan Maxwell
2022-12-22  5:39         ` Jonathan Maxwell
2022-12-22 16:17           ` David Ahern
2022-12-22 22:36             ` Jonathan Maxwell
2022-12-20 15:17   ` Julian Anastasov
2022-12-20 15:41   ` Julian Anastasov
2022-12-20 21:48   ` Jonathan Maxwell
2022-12-23 20:28     ` Andrea Mayer
2022-12-24  7:38       ` Jonathan Maxwell
2023-01-02 23:59         ` Jonathan Maxwell
2023-01-03 16:07           ` Andrea Mayer
2023-01-06 23:26             ` Andrea Mayer
2023-01-07 23:46               ` Jonathan Maxwell
2023-01-08 17:34                 ` Andrea Mayer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).