* [PATCH net-next v2 0/2] net: introduce and use route hint
@ 2019-11-18 11:01 Paolo Abeni
  2019-11-18 11:01 ` [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input Paolo Abeni
  2019-11-18 11:01 ` [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive Paolo Abeni
  0 siblings, 2 replies; 11+ messages in thread
From: Paolo Abeni @ 2019-11-18 11:01 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

This series leverages the listification infrastructure to avoid
unnecessary route lookups on ingress packets. In the absence of policy
routing, packets with equal daddr will usually land on the same dst.

When processing packet bursts (lists) we can easily reference the previous
dst entry. When we hit the 'same destination' condition we can avoid the
route lookup, copying the already available dst.
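
A minimal userspace sketch of the idea, assuming no policy routing so that the destination address alone selects the route. `struct pkt`, `route_lookup()` and the integer route ids are hypothetical stand-ins for the skb, the FIB lookup and the dst entry. Like the patches, it only refreshes the cached hint when the selected route changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ingress packet. */
struct pkt {
	uint32_t daddr;   /* destination address, the only lookup key here */
	int dst;          /* route selected for this packet */
};

/* Cached result of the last lookup: which daddr produced which route. */
struct route_hint {
	uint32_t daddr;
	int dst;
};

static int lookups;   /* number of slow-path lookups performed */

/* Hypothetical slow path standing in for the FIB lookup. */
static int route_lookup(uint32_t daddr)
{
	lookups++;
	return (int)(daddr % 16);
}

static void rcv_list(struct pkt *pkts, size_t n)
{
	struct route_hint _hint, *hint = NULL;

	for (size_t i = 0; i < n; i++) {
		struct pkt *p = &pkts[i];

		if (hint && hint->daddr == p->daddr)
			p->dst = hint->dst;             /* same destination: copy */
		else
			p->dst = route_lookup(p->daddr);

		/* Refresh the hint whenever the selected route changes. */
		if (!hint || hint->dst != p->dst) {
			_hint.daddr = p->daddr;
			_hint.dst = p->dst;
			hint = &_hint;
		}
	}
}
```

For a burst with destinations 1, 1, 1, 2, 2, 1 only three lookups are performed instead of six.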

Detailed performance numbers are available in the individual commit
messages.

v1 -> v2
 - fix build issue with !CONFIG_IP*_MULTIPLE_TABLES
 - fix potential race in ip6_list_rcv_finish()

Paolo Abeni (2):
  ipv6: introduce and uses route look hints for list input
  ipv4: use dst hint for ipv4 list receive

 include/net/route.h  | 11 +++++++++++
 net/ipv4/ip_input.c  | 38 +++++++++++++++++++++++++++++++++-----
 net/ipv4/route.c     | 38 ++++++++++++++++++++++++++++++++++++++
 net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 118 insertions(+), 9 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
  2019-11-18 11:01 [PATCH net-next v2 0/2] net: introduce and use route hint Paolo Abeni
@ 2019-11-18 11:01 ` Paolo Abeni
  2019-11-18 20:29   ` Willem de Bruijn
  2019-11-18 11:01 ` [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive Paolo Abeni
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2019-11-18 11:01 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

When doing RX batch packet processing, we currently always repeat
the route lookup for each ingress packet. If policy routing is
not configured, and IPV6_SUBTREES is disabled at build time, we
know that packets with the same destination address will use
the same dst.

This change tries to avoid the per-packet route lookup by caching
the destination address of the latest successful lookup, and
reusing it for the next packet when the above conditions are
in place. Ingress traffic for most servers should fit.
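
The per-packet decision in ip6_rcv_finish_core() can be modeled in userspace roughly as below. This is an illustrative sketch, not the kernel code: refdst is reduced to an opaque `unsigned long`, `route_input()` is a hypothetical counter-backed stand-in for ip6_route_input(), and a nonzero `cur_dst` plays the role of skb_valid_dst().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Modeled on struct ip6_route_input_hint; refdst is an opaque handle. */
struct ip6_hint {
	unsigned long refdst;
	struct in6_addr daddr;
};

static int slow_lookups;

/* Hypothetical stand-in for ip6_route_input(). */
static unsigned long route_input(const struct in6_addr *daddr)
{
	slow_lookups++;
	return 0x1000UL + daddr->s6_addr[15];
}

/* Same decision order as the patched ip6_rcv_finish_core():
 * 1) keep a dst already attached (e.g. by early demux),
 * 2) reuse the hint on an exact 128-bit daddr match,
 * 3) otherwise do the full lookup. */
static unsigned long rcv_finish_core(unsigned long cur_dst,
				     const struct in6_addr *daddr,
				     const struct ip6_hint *hint)
{
	if (cur_dst)
		return cur_dst;
	if (hint && memcmp(&hint->daddr, daddr, sizeof(*daddr)) == 0)
		return hint->refdst;
	return route_input(daddr);
}
```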

The measured performance delta under UDP flood vs a recvmmsg
receiver is as follows:

vanilla		patched		delta
Kpps		Kpps		%
1431		1664		+14

In the worst-case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v1 -> v2:
 - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
 - fix potential race when fib6_has_custom_rules is set
   while processing a packet batch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index ef7f707d9ae3..f559ad6b09ef 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -44,10 +44,16 @@
 #include <net/inet_ecn.h>
 #include <net/dst_metadata.h>
 
+struct ip6_route_input_hint {
+	unsigned long	refdst;
+	struct in6_addr daddr;
+};
+
 INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
 static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
-				struct sk_buff *skb)
+				struct sk_buff *skb,
+				struct ip6_route_input_hint *hint)
 {
 	void (*edemux)(struct sk_buff *skb);
 
@@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
 			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
 					udp_v6_early_demux, skb);
 	}
-	if (!skb_valid_dst(skb))
+
+	if (skb_valid_dst(skb))
+		return;
+
+	if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
+		__skb_dst_copy(skb, hint->refdst);
+	else
 		ip6_route_input(skb);
 }
 
@@ -71,7 +83,7 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	skb = l3mdev_ip6_rcv(skb);
 	if (!skb)
 		return NET_RX_SUCCESS;
-	ip6_rcv_finish_core(net, sk, skb);
+	ip6_rcv_finish_core(net, sk, skb, NULL);
 
 	return dst_input(skb);
 }
@@ -86,9 +98,20 @@ static void ip6_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip6_can_cache_route_hint(struct net *net)
+{
+	return !IS_ENABLED(CONFIG_IPV6_SUBTREES) &&
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+	       !net->ipv6.fib6_has_custom_rules;
+#else
+	       1;
+#endif
+}
+
 static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 				struct list_head *head)
 {
+	struct ip6_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip6_rcv(skb);
 		if (!skb)
 			continue;
-		ip6_rcv_finish_core(net, sk, skb);
+		ip6_rcv_finish_core(net, sk, skb, hint);
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			if (ip6_can_cache_route_hint(net)) {
+				_hint.refdst = skb->_skb_refdst;
+				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
+				       sizeof(_hint.daddr));
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip6_sublist_rcv_finish(&sublist);
-- 
2.21.0
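
ip6_can_cache_route_hint() relies on IS_ENABLED() to resolve the IPv6 subtree check at compile time. That macro only yields 1 when it is given the full Kconfig symbol (CONFIG_IPV6_SUBTREES) and that symbol is defined to 1; a bare IPV6_SUBTREES is never defined, so IS_ENABLED(IPV6_SUBTREES) always evaluates to 0. The sketch below copies the helper macros from include/linux/kconfig.h into userspace to show the difference; the CONFIG_ define is an assumption standing in for the generated autoconf.h.

```c
#include <assert.h>

/* Helper macros as found in include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val

#define __or(x, y)			___or(x, y)
#define ___or(x, y)			____or(__ARG_PLACEHOLDER_##x, y)
#define ____or(arg1_or_junk, y)		__take_second_arg(arg1_or_junk 1, y)

#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option)	__is_defined(option)
#define IS_MODULE(option)	__is_defined(option##_MODULE)
#define IS_ENABLED(option)	__or(IS_BUILTIN(option), IS_MODULE(option))

/* Pretend autoconf.h was generated with CONFIG_IPV6_SUBTREES=y. */
#define CONFIG_IPV6_SUBTREES 1

static const int with_prefix = IS_ENABLED(CONFIG_IPV6_SUBTREES);   /* 1 */
static const int without_prefix = IS_ENABLED(IPV6_SUBTREES);       /* 0 */
```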


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
  2019-11-18 11:01 [PATCH net-next v2 0/2] net: introduce and use route hint Paolo Abeni
  2019-11-18 11:01 ` [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input Paolo Abeni
@ 2019-11-18 11:01 ` Paolo Abeni
  2019-11-18 14:11     ` kbuild test robot
  2019-11-18 16:07   ` David Ahern
  1 sibling, 2 replies; 11+ messages in thread
From: Paolo Abeni @ 2019-11-18 11:01 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

This is similar to the previous change, with some additional IPv4-specific
quirks. Even when using the route hint, we still have to perform
additional per-packet checks on source address validity: a new
helper is added to wrap them.
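
The source-address checks that still run on the hint path can be sketched in userspace as follows. The predicates reproduce the address ranges tested by the kernel's ipv4_is_multicast(), ipv4_is_lbcast(), ipv4_is_zeronet() and ipv4_is_loopback(); `route_localnet` is an assumed boolean standing in for IN_DEV_NET_ROUTE_LOCALNET(), and fib_validate_source() (run only for local routes) is left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Addresses are in network byte order, as in struct iphdr. */
static bool is_multicast(uint32_t a) { return (a & htonl(0xf0000000)) == htonl(0xe0000000); }
static bool is_lbcast(uint32_t a)    { return a == htonl(0xffffffff); }
static bool is_zeronet(uint32_t a)   { return (a & htonl(0xff000000)) == 0; }
static bool is_loopback(uint32_t a)  { return (a & htonl(0xff000000)) == htonl(0x7f000000); }

/* Same order of checks as ip_route_use_hint(); true means "martian". */
static bool martian_saddr(uint32_t saddr, bool route_localnet)
{
	if (is_multicast(saddr) || is_lbcast(saddr))
		return true;
	if (is_zeronet(saddr))
		return true;
	if (is_loopback(saddr) && !route_localnet)
		return true;
	return false;
}

/* Small helper to parse dotted-quad strings in the examples. */
static uint32_t ip(const char *s)
{
	struct in_addr a;
	int ok = inet_pton(AF_INET, s, &a);

	assert(ok == 1);
	(void)ok;
	return a.s_addr;
}
```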

Moreover, the IPv4 route lookup, even in the absence of policy routing,
may depend on the packet's ToS, so we cache that value, too.

Explicitly avoid hints for local broadcast: this simplifies the code,
and broadcasts are a slower path anyway.
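
Putting the last two points together: a hint is only seeded from non-broadcast routes, and only reused when both daddr and ToS match. A hypothetical userspace model follows; the enum values stand in for RTN_UNICAST/RTN_LOCAL/RTN_BROADCAST, `has_custom_rules` for the per-netns policy-routing flag, and the struct is modeled on (not identical to) struct ip_route_input_hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route types standing in for the kernel's RTN_* values. */
enum rt_type { RT_UNICAST, RT_LOCAL, RT_BROADCAST };

/* Modeled on struct ip_route_input_hint. */
struct ip_hint {
	unsigned long refdst;
	uint32_t daddr;
	uint8_t tos;
	bool local;       /* set for RTN_LOCAL: saddr still needs validation */
};

/* May the route just computed seed a hint? Never for broadcasts, and
 * never when policy routing is in use. */
static bool can_cache(enum rt_type type, bool has_custom_rules)
{
	return type != RT_BROADCAST && !has_custom_rules;
}

/* The IPv4 lookup key includes the ToS, so both fields must match. */
static bool hint_matches(const struct ip_hint *h, uint32_t daddr, uint8_t tos)
{
	return h && h->daddr == daddr && h->tos == tos;
}
```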

UDP flood performance vs a recvmmsg() receiver:

vanilla		patched		delta
Kpps		Kpps		%
1683		1833		+8
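
For reference, the delta columns can be recomputed from the Kpps figures alone (plain arithmetic on the reported numbers, nothing re-measured). Relative to the vanilla rate the gains come out at about +16% (IPv6, 1431 -> 1664) and +9% (IPv4, 1683 -> 1833); the +14 and +8 figures match the same differences expressed relative to the patched rate.

```c
#include <assert.h>

/* Relative change in percent, measured against `base`. */
static double pct_delta(double before, double after, double base)
{
	return (after - before) / base * 100.0;
}
```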

In the worst-case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v1 -> v2:
 - fix build issue with !CONFIG_IP_MULTIPLE_TABLES

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/route.h | 11 +++++++++++
 net/ipv4/ip_input.c | 38 +++++++++++++++++++++++++++++++++-----
 net/ipv4/route.c    | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6c516840380d..f7a8a52318cd 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -185,6 +185,17 @@ int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
 		       u8 tos, struct net_device *devin,
 		       struct fib_result *res);
 
+struct ip_route_input_hint {
+	unsigned long	refdst;
+	__be32		daddr;
+	char		tos;
+	bool		local;
+};
+
+int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
+		      u8 tos, struct net_device *devin,
+		      struct ip_route_input_hint *hint);
+
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
 {
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 24a95126e698..25f6fcc65380 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -305,7 +305,8 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 INDIRECT_CALLABLE_DECLARE(int udp_v4_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(int tcp_v4_early_demux(struct sk_buff *));
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev)
+			      struct sk_buff *skb, struct net_device *dev,
+			      struct ip_route_input_hint *hint)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	int (*edemux)(struct sk_buff *skb);
@@ -335,8 +336,12 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (!skb_valid_dst(skb)) {
-		err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					   iph->tos, dev);
+		if (hint && hint->daddr == iph->daddr && hint->tos == iph->tos)
+			err = ip_route_use_hint(skb, iph->daddr, iph->saddr,
+						iph->tos, dev, hint);
+		else
+			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+						   iph->tos, dev);
 		if (unlikely(err))
 			goto drop_error;
 	}
@@ -408,7 +413,7 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev);
+	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
+{
+	return rt->rt_type != RTN_BROADCAST &&
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+	       !net->ipv6.fib6_has_custom_rules;
+#else
+	       1;
+#endif
+}
+
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
+	struct ip_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -554,11 +570,23 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			struct rtable *rt = (struct rtable *)dst;
+
+			if (ip_can_cache_route_hint(net, rt)) {
+				_hint.refdst = skb->_skb_refdst;
+				_hint.daddr = ip_hdr(skb)->daddr;
+				_hint.tos = ip_hdr(skb)->tos;
+				_hint.local = rt->rt_type == RTN_LOCAL;
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dcc4fa10138d..b0ddff17db80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2019,6 +2019,44 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+/* Implements all the saddr-related checks as ip_route_input_slow(),
+ * assuming daddr is valid and this is not a local broadcast.
+ * Uses the provided hint instead of performing a route lookup.
+ */
+int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		      u8 tos, struct net_device *dev,
+		      struct ip_route_input_hint *hint)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	int err = -EINVAL;
+	u32 itag = 0;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	if (ipv4_is_zeronet(saddr))
+		goto martian_source;
+
+	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_source;
+
+	if (hint->local) {
+		err = fib_validate_source(skb, saddr, daddr, tos, 0, dev,
+					  in_dev, &itag);
+		if (err < 0)
+			goto martian_source;
+	}
+
+	err = 0;
+	__skb_dst_copy(skb, hint->refdst);
+	return err;
+
+martian_source:
+	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+	return err;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
  2019-11-18 11:01 ` [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive Paolo Abeni
@ 2019-11-18 14:11     ` kbuild test robot
  2019-11-18 16:07   ` David Ahern
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2019-11-18 14:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: kbuild-all, netdev, David S. Miller, Willem de Bruijn, Edward Cree

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

Hi Paolo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v5.4-rc8 next-20191115]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-introduce-and-use-route-hint/20191118-195936
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 19b7e21c55c81713c4011278143006af9f232504
config: mips-malta_kvm_defconfig (attached as .config)
compiler: mipsel-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net//ipv4/ip_input.c: In function 'ip_can_cache_route_hint':
>> net//ipv4/ip_input.c:547:19: error: 'struct netns_ipv6' has no member named 'fib6_has_custom_rules'
            !net->ipv6.fib6_has_custom_rules;
                      ^
   net//ipv4/ip_input.c:551:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +547 net//ipv4/ip_input.c

   542	
   543	static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
   544	{
   545		return rt->rt_type != RTN_BROADCAST &&
   546	#ifdef CONFIG_IP_MULTIPLE_TABLES
 > 547		       !net->ipv6.fib6_has_custom_rules;
   548	#else
   549		       1;
   550	#endif
   551	}
   552	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20609 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
  2019-11-18 11:01 ` [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive Paolo Abeni
  2019-11-18 14:11     ` kbuild test robot
@ 2019-11-18 16:07   ` David Ahern
  2019-11-18 16:31     ` Paolo Abeni
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2019-11-18 16:07 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

On 11/18/19 4:01 AM, Paolo Abeni wrote:
> @@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
>  	}
>  }
>  
> +static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
> +{
> +	return rt->rt_type != RTN_BROADCAST &&
> +#ifdef CONFIG_IP_MULTIPLE_TABLES
> +	       !net->ipv6.fib6_has_custom_rules;

that should be ipv4, not ipv6, right?

Also, for readability it would be better to have 2 helpers in
include/net/fib_rules.h that return true/false and manage the net
namespace issue.

> +#else
> +	       1;
> +#endif
> +}
> +



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
  2019-11-18 16:07   ` David Ahern
@ 2019-11-18 16:31     ` Paolo Abeni
  2019-11-18 16:40       ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2019-11-18 16:31 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

Hi,

Thank you for the feedback.

On Mon, 2019-11-18 at 09:07 -0700, David Ahern wrote:
> On 11/18/19 4:01 AM, Paolo Abeni wrote:
> > @@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
> >  	}
> >  }
> >  
> > +static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
> > +{
> > +	return rt->rt_type != RTN_BROADCAST &&
> > +#ifdef CONFIG_IP_MULTIPLE_TABLES
> > +	       !net->ipv6.fib6_has_custom_rules;
> 
> that should be ipv4, not ipv6, right?

Indeed. More coffee needed here, sorry.

> Also, for readability it would be better to have 2 helpers in
> include//net/fib_rules.h that return true false and manage the net
> namespace issue.

Double checking I parsed the above correctly. Do you mean something
like the following - I think net/ip_fib.h fits more, as it already
deals with CONFIG_IP_MULTIPLE_TABLES?

---
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 52b2406a5dfc..b6c5cd544402 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -272,6 +272,11 @@ void fib_free_table(struct fib_table *tb);
 #define TABLE_LOCAL_INDEX      (RT_TABLE_LOCAL & (FIB_TABLE_HASHSZ - 1))
 #define TABLE_MAIN_INDEX       (RT_TABLE_MAIN  & (FIB_TABLE_HASHSZ - 1))
 
+static inline bool fib4_has_custom_rules(struct net *net)
+{
+       return false;
+}
+
 static inline struct fib_table *fib_get_table(struct net *net, u32 id)
 {
        struct hlist_node *tb_hlist;
@@ -341,6 +346,11 @@ void __net_exit fib4_rules_exit(struct net *net);
 struct fib_table *fib_new_table(struct net *net, u32 id);
 struct fib_table *fib_get_table(struct net *net, u32 id);
 
+static inline bool fib4_has_custom_rules(struct net *net)
+{
+       return net->ipv4.fib_has_custom_rules;
+}
+
 int __fib_lookup(struct net *net, struct flowi4 *flp,
                 struct fib_result *res, unsigned int flags);
---
plus something similar for the previous patch, in include/net/ip6_fib.h
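
In userspace terms, the pattern being discussed looks roughly like this: the config-dependent test lives in a single static inline helper, so ip_can_cache_route_hint() needs no #ifdef of its own and cannot end up reading the other address family's field. The structs are trimmed-down hypothetical stand-ins, and the CONFIG_ define models the Kconfig option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down struct net: only the flag the helper reads. */
struct netns_ipv4 { bool fib_has_custom_rules; };
struct net { struct netns_ipv4 ipv4; };

/* Models CONFIG_IP_MULTIPLE_TABLES=y; remove to model the =n build. */
#define CONFIG_IP_MULTIPLE_TABLES 1

#ifdef CONFIG_IP_MULTIPLE_TABLES
static inline bool fib4_has_custom_rules(const struct net *net)
{
	return net->ipv4.fib_has_custom_rules;
}
#else
static inline bool fib4_has_custom_rules(const struct net *net)
{
	(void)net;
	return false;   /* no policy routing support: no custom rules */
}
#endif

/* The caller stays free of preprocessor conditionals. */
static bool ip_can_cache_route_hint(const struct net *net, bool is_broadcast)
{
	return !is_broadcast && !fib4_has_custom_rules(net);
}
```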

Thank you,

Paolo


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
  2019-11-18 16:31     ` Paolo Abeni
@ 2019-11-18 16:40       ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2019-11-18 16:40 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

On 11/18/19 9:31 AM, Paolo Abeni wrote:
>> Also, for readability it would be better to have 2 helpers in
>> include//net/fib_rules.h that return true false and manage the net
>> namespace issue.
> 
> Double checking I parsed the above correctly. Do you mean something
> like the following - I think net/ip_fib.h fits more, as it already
> deals with CONFIG_IP_MULTIPLE_TABLES?

sure.

And it looks like they already exist in net/ipv4/fib_frontend.c, so
those can be moved to ip_fib.h.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
  2019-11-18 11:01 ` [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input Paolo Abeni
@ 2019-11-18 20:29   ` Willem de Bruijn
  2019-11-18 21:58     ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2019-11-18 20:29 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Willem de Bruijn, Edward Cree

On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When doing RX batch packet processing, we currently always repeat
> the route lookup for each ingress packet. If policy routing is
> configured, and IPV6_SUBTREES is disabled at build time, we
> know that packets with the same destination address will use
> the same dst.
>
> This change tries to avoid per packet route lookup caching
> the destination address of the latest successful lookup, and
> reusing it for the next packet when the above conditions are
> in place. Ingress traffic for most servers should fit.
>
> The measured performance delta under UDP flood vs a recvmmsg
> receiver is as follow:
>
> vanilla         patched         delta
> Kpps            Kpps            %
> 1431            1664            +14

Since the IPv4 speed-up is almost half and the code considerably more
complex, maybe only do IPv6?

>
> In the worst-case scenario - each packet has a different
> destination address - the performance delta is within noise
> range.
>
> v1 -> v2:
>  - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
>  - fix potential race when fib6_has_custom_rules is set
>    while processing a packet batch
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index ef7f707d9ae3..f559ad6b09ef 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -44,10 +44,16 @@
>  #include <net/inet_ecn.h>
>  #include <net/dst_metadata.h>
>
> +struct ip6_route_input_hint {
> +       unsigned long   refdst;
> +       struct in6_addr daddr;
> +};
> +
>  INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
>  INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
>  static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> -                               struct sk_buff *skb)
> +                               struct sk_buff *skb,
> +                               struct ip6_route_input_hint *hint)
>  {
>         void (*edemux)(struct sk_buff *skb);
>
> @@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
>                         INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
>                                         udp_v6_early_demux, skb);
>         }
> -       if (!skb_valid_dst(skb))
> +
> +       if (skb_valid_dst(skb))
> +               return;
> +
> +       if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
> +               __skb_dst_copy(skb, hint->refdst);
> +       else
>                 ip6_route_input(skb);

Is it possible to do the address comparison in ip6_list_rcv_finish
itself and pass a pointer to refdst if safe? That would avoid the new
struct definition and the memcpy, and keep all the logic in one place.
It would need to keep a pointer to the previous skb instead, then.
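
A rough userspace sketch of that alternative, assuming the previous packet is still owned by the list walker (not yet delivered) when the next one is examined. The walker compares addresses itself and hands down only the route to reuse, so no hint struct or address copy is needed; `struct pkt`, `route_input()` and the 0-means-no-hint convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet: a 16-byte IPv6 destination and an opaque route. */
struct pkt {
	unsigned char daddr[16];
	unsigned long refdst;
};

static int lookups;

/* Hypothetical full lookup; never returns 0, which means "no hint". */
static unsigned long route_input(const struct pkt *p)
{
	lookups++;
	return 0x100UL + p->daddr[15];
}

static void list_rcv_finish(struct pkt *pkts, size_t n, int can_cache)
{
	const struct pkt *prev = NULL;

	for (size_t i = 0; i < n; i++) {
		unsigned long hint = 0;

		/* The comparison happens here, against the previous packet. */
		if (can_cache && prev &&
		    memcmp(prev->daddr, pkts[i].daddr, sizeof(pkts[i].daddr)) == 0)
			hint = prev->refdst;

		pkts[i].refdst = hint ? hint : route_input(&pkts[i]);
		prev = &pkts[i];
	}
}
```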

>  static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
>                                 struct list_head *head)
>  {
> +       struct ip6_route_input_hint _hint, *hint = NULL;
>         struct dst_entry *curr_dst = NULL;
>         struct sk_buff *skb, *next;
>         struct list_head sublist;
> @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
>                 skb = l3mdev_ip6_rcv(skb);
>                 if (!skb)
>                         continue;
> -               ip6_rcv_finish_core(net, sk, skb);
> +               ip6_rcv_finish_core(net, sk, skb, hint);
>                 dst = skb_dst(skb);
>                 if (curr_dst != dst) {
> +                       if (ip6_can_cache_route_hint(net)) {
> +                               _hint.refdst = skb->_skb_refdst;
> +                               memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> +                                      sizeof(_hint.daddr));
> +                               hint = &_hint;
> +                       } else {
> +                               hint = NULL;
> +                       }

not needed. ip6_can_cache_route_hint is the same for all iterations of
the loop (indeed, compile-time static), so if false, hint is never
set.




> +
>                         /* dispatch old sublist */
>                         if (!list_empty(&sublist))
>                                 ip6_sublist_rcv_finish(&sublist);
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
  2019-11-18 20:29   ` Willem de Bruijn
@ 2019-11-18 21:58     ` Paolo Abeni
  2019-11-19 14:10       ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2019-11-18 21:58 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David S. Miller, Edward Cree

On Mon, 2019-11-18 at 15:29 -0500, Willem de Bruijn wrote:
> On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > When doing RX batch packet processing, we currently always repeat
> > the route lookup for each ingress packet. If policy routing is
> > configured, and IPV6_SUBTREES is disabled at build time, we
> > know that packets with the same destination address will use
> > the same dst.
> > 
> > This change tries to avoid per packet route lookup caching
> > the destination address of the latest successful lookup, and
> > reusing it for the next packet when the above conditions are
> > in place. Ingress traffic for most servers should fit.
> > 
> > The measured performance delta under UDP flood vs a recvmmsg
> > receiver is as follow:
> > 
> > vanilla         patched         delta
> > Kpps            Kpps            %
> > 1431            1664            +14
> 
> Since IPv4 speed-up is almost half and code considerably more complex,
> maybe only do IPv6?

uhmm... I would avoid that kind of asymmetry, and I would not look
down on an 8% speedup, if possible.

> > In the worst-case scenario - each packet has a different
> > destination address - the performance delta is within noise
> > range.
> > 
> > v1 -> v2:
> >  - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
> >  - fix potential race when fib6_has_custom_rules is set
> >    while processing a packet batch
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 36 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> > index ef7f707d9ae3..f559ad6b09ef 100644
> > --- a/net/ipv6/ip6_input.c
> > +++ b/net/ipv6/ip6_input.c
> > @@ -44,10 +44,16 @@
> >  #include <net/inet_ecn.h>
> >  #include <net/dst_metadata.h>
> > 
> > +struct ip6_route_input_hint {
> > +       unsigned long   refdst;
> > +       struct in6_addr daddr;
> > +};
> > +
> >  INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
> >  INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
> >  static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> > -                               struct sk_buff *skb)
> > +                               struct sk_buff *skb,
> > +                               struct ip6_route_input_hint *hint)
> >  {
> >         void (*edemux)(struct sk_buff *skb);
> > 
> > @@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> >                         INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
> >                                         udp_v6_early_demux, skb);
> >         }
> > -       if (!skb_valid_dst(skb))
> > +
> > +       if (skb_valid_dst(skb))
> > +               return;
> > +
> > +       if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
> > +               __skb_dst_copy(skb, hint->refdst);
> > +       else
> >                 ip6_route_input(skb);
> 
> Is it possible to do the address comparison in ip6_list_rcv_finish
> itself and pass a pointer to refdst if safe? To avoid new struct
> definition, memcpy and to have all logic in one place. Need to
> keep a pointer to the prev skb, then, instead.

I hadn't thought about that. Sounds promising. I'll try, thanks.

> >  static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> >                                 struct list_head *head)
> >  {
> > +       struct ip6_route_input_hint _hint, *hint = NULL;
> >         struct dst_entry *curr_dst = NULL;
> >         struct sk_buff *skb, *next;
> >         struct list_head sublist;
> > @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> >                 skb = l3mdev_ip6_rcv(skb);
> >                 if (!skb)
> >                         continue;
> > -               ip6_rcv_finish_core(net, sk, skb);
> > +               ip6_rcv_finish_core(net, sk, skb, hint);
> >                 dst = skb_dst(skb);
> >                 if (curr_dst != dst) {
> > +                       if (ip6_can_cache_route_hint(net)) {
> > +                               _hint.refdst = skb->_skb_refdst;
> > +                               memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> > +                                      sizeof(_hint.daddr));
> > +                               hint = &_hint;
> > +                       } else {
> > +                               hint = NULL;
> > +                       }
> 
> not needed. ip6_can_cache_route_hit is the same for all iterations of
> the loop (indeed, compile time static), so if false, hint is never
> set.

I think this is needed instead: if CONFIG_IPV6_MULTIPLE_TABLES=y,
fib6_has_custom_rules can change at runtime, from 'false' to 'true'.
If we don't reset 'hint', we could end up with a use-after-free.
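
The effect of resetting the hint can be shown with a small userspace model (hypothetical names; the per-packet `custom_rules` field simulates policy rules being added while a batch is in flight). Eligibility is re-checked whenever the route changes, and the hint is cleared once caching is no longer allowed, so packets after the flip always take the full lookup. Without the `else` branch, the last packet below would reuse a hint seeded before the rules appeared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hint { uint32_t daddr; int dst; };

/* Hypothetical per-packet input: destination plus a snapshot of the
 * runtime "custom rules" flag at the time the packet is processed. */
struct pkt { uint32_t daddr; bool custom_rules; int dst; bool used_hint; };

/* Hypothetical full lookup: route id equals the destination. */
static int route_lookup(uint32_t daddr) { return (int)daddr; }

static void rcv_list(struct pkt *p, size_t n)
{
	struct hint _hint, *hint = NULL;
	int curr_dst = -1;

	for (size_t i = 0; i < n; i++) {
		p[i].used_hint = hint && hint->daddr == p[i].daddr;
		p[i].dst = p[i].used_hint ? hint->dst : route_lookup(p[i].daddr);

		if (p[i].dst != curr_dst) {
			/* The flag is a runtime value: re-check it and drop
			 * any stale hint when caching is not allowed. */
			if (!p[i].custom_rules) {
				_hint.daddr = p[i].daddr;
				_hint.dst = p[i].dst;
				hint = &_hint;
			} else {
				hint = NULL;
			}
			curr_dst = p[i].dst;
		}
	}
}
```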

Cheers,

Paolo




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
  2019-11-18 21:58     ` Paolo Abeni
@ 2019-11-19 14:10       ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2019-11-19 14:10 UTC
  To: Paolo Abeni
  Cc: Willem de Bruijn, Network Development, David S. Miller, Edward Cree

On Mon, Nov 18, 2019 at 4:59 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2019-11-18 at 15:29 -0500, Willem de Bruijn wrote:
> > On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > When doing RX batch packet processing, we currently always repeat
> > > the route lookup for each ingress packet. If no policy routing is
> > > configured, and IPV6_SUBTREES is disabled at build time, we
> > > know that packets with the same destination address will use
> > > the same dst.
> > >
> > > This change tries to avoid the per-packet route lookup by caching
> > > the destination address of the latest successful lookup and
> > > reusing it for the next packet when the above conditions are
> > > in place. Ingress traffic on most servers should meet them.
> > >
> > > The measured performance delta under UDP flood vs a recvmmsg
> > > receiver is as follows:
> > >
> > > vanilla         patched         delta
> > > Kpps            Kpps            %
> > > 1431            1664            +14
> >
> > Since the IPv4 speed-up is almost half and the code is considerably
> > more complex, maybe only do IPv6?
>
> Uhmm... I would avoid that kind of asymmetry, and I would not look
> down on an 8% speedup, if possible.

Okay, that's fair.
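For reference, the +14 in the quoted IPv6 table matches the delta taken relative to the patched rate, (1664 - 1431) / 1664, about 14.0%; relative to the vanilla baseline the same numbers give (1664 - 1431) / 1431, about 16.3%. A small C helper computing both conventions (the function names are illustrative):

```c
#include <assert.h>

/* Speed-up relative to the vanilla (unpatched) rate, in percent. */
static double gain_vs_vanilla(double vanilla, double patched)
{
	assert(vanilla > 0.0);
	return (patched - vanilla) / vanilla * 100.0;
}

/* Same delta expressed as a fraction of the patched rate, in percent. */
static double gain_vs_patched(double vanilla, double patched)
{
	assert(patched > 0.0);
	return (patched - vanilla) / patched * 100.0;
}
```

`gain_vs_vanilla(1431, 1664)` is about 16.28 and `gain_vs_patched(1431, 1664)` about 14.00; either way the IPv6 gain is in the mid-teens.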

> > > @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> > >                 skb = l3mdev_ip6_rcv(skb);
> > >                 if (!skb)
> > >                         continue;
> > > -               ip6_rcv_finish_core(net, sk, skb);
> > > +               ip6_rcv_finish_core(net, sk, skb, hint);
> > >                 dst = skb_dst(skb);
> > >                 if (curr_dst != dst) {
> > > +                       if (ip6_can_cache_route_hint(net)) {
> > > +                               _hint.refdst = skb->_skb_refdst;
> > > +                               memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> > > +                                      sizeof(_hint.daddr));
> > > +                               hint = &_hint;
> > > +                       } else {
> > > +                               hint = NULL;
> > > +                       }
> >
> > not needed. ip6_can_cache_route_hint is the same for all iterations of
> > the loop (indeed, compile time static), so if false, hint is never
> > set.
>
> I think this is actually needed: with CONFIG_MULTIPLE_TABLES=y,
> fib6_has_custom_rules can change at runtime, from 'false' to 'true'.
> If we don't reset 'hint', we could end up with a use-after-free.

Uhm, of course, this is not compile-time static at all. I clearly
missed that part.

But does such a config change need to take effect instantaneously on
packets already in flight, like those in the receive RCU critical
section? If not, it should be safe to treat all skbs in the list the same.

I would need to read that code more closely to be certain, and the
current solution errs on the side of caution, so it is definitely fine
as is, of course.
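The two policies discussed here can be compared with a userspace sketch: one samples the caching condition once for the whole list (as Willem suggests, e.g. on entry to the receive RCU section), the other re-checks it for every packet. All names are hypothetical stand-ins, not kernel API. The sketch only counts which packets would take the hinted fast path; it does not settle whether the snapshot is safe with respect to dst lifetimes, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb. */
struct mock_skb {
	unsigned char daddr[16];
	int dst;
};

static int lookups;

static int mock_route_lookup(const struct mock_skb *skb)
{
	lookups++;
	return 100 + skb->daddr[15];
}

/*
 * can_cache[i]: caching condition as it would read at packet i.
 * snapshot = true: sample it once, before the loop.
 * snapshot = false: re-read it for every packet.
 * Returns the number of packets that reused the previous route.
 */
static int mock_list_rcv(struct mock_skb *skbs, size_t n,
			 const bool *can_cache, bool snapshot)
{
	const bool at_entry = n > 0 && can_cache[0];
	const struct mock_skb *prev = NULL;
	int reused = 0;

	assert(skbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct mock_skb *skb = &skbs[i];
		bool ok = snapshot ? at_entry : can_cache[i];

		if (ok && prev && memcmp(prev->daddr, skb->daddr,
					 sizeof(skb->daddr)) == 0) {
			skb->dst = prev->dst;
			reused++;
		} else {
			skb->dst = mock_route_lookup(skb);
		}
		prev = ok ? skb : NULL;
	}
	return reused;
}
```

If a rule is added after the first packet of a three-packet list to one destination, the snapshot policy keeps using the hint for the rest of the list (one lookup), while the per-packet policy falls back to full lookups (three).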


Thread overview: 11+ messages
2019-11-18 11:01 [PATCH net-next v2 0/2] net: introduce and use route hint Paolo Abeni
2019-11-18 11:01 ` [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input Paolo Abeni
2019-11-18 20:29   ` Willem de Bruijn
2019-11-18 21:58     ` Paolo Abeni
2019-11-19 14:10       ` Willem de Bruijn
2019-11-18 11:01 ` [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive Paolo Abeni
2019-11-18 14:11   ` kbuild test robot
2019-11-18 14:11     ` kbuild test robot
2019-11-18 16:07   ` David Ahern
2019-11-18 16:31     ` Paolo Abeni
2019-11-18 16:40       ` David Ahern