netdev.vger.kernel.org archive mirror
* [RFC PATCH 0/4] net: mitigate retpoline overhead
@ 2018-11-29 23:00 Paolo Abeni
From: Paolo Abeni @ 2018-11-29 23:00 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

The spectre v2 counter-measures, aka retpolines, are a source of measurable
overhead[1]. When the function pointer refers to a builtin symbol, we can
partially address that by testing the pointer against a list of well-known
builtin functions and issuing a direct call on a match.
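As a minimal userspace sketch of the idea above (the handler names and counters are hypothetical, not kernel symbols), a dispatcher compares the pointer against each known target and only falls back to the indirect call when nothing matches. In a retpoline-enabled kernel build only that last call would go through the retpoline thunk; in userspace the code merely illustrates the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type and handlers standing in for builtin callbacks. */
typedef int (*rx_handler_t)(int len);

int tcp_rx(int len) { return len + 1; }
int udp_rx(int len) { return len + 2; }
int module_rx(int len) { return len + 100; } /* not in the known list */

unsigned long direct_calls;   /* calls resolved to a direct call */
unsigned long indirect_calls; /* calls that fell back to f(len) */

int dispatch(rx_handler_t f, int len)
{
	assert(f != NULL);

	/* Known builtin targets: each call below is a direct call. */
	if (f == tcp_rx) {
		direct_calls++;
		return tcp_rx(len);
	}
	if (f == udp_rx) {
		direct_calls++;
		return udp_rx(len);
	}

	/* Unknown target (e.g. code living in a module): indirect call. */
	indirect_calls++;
	return f(len);
}
```

Calling `dispatch(tcp_rx, 10)` takes the direct path, while `dispatch(module_rx, 10)` still works but goes through the generic indirect call.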

This may lead to some uglification around the indirect calls. At netconf 2018
Eric Dumazet described a technique to hide the bulk of the needed boilerplate
behind a few helper macros.
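The macro trick can be sketched as follows, assuming (as the naming scheme used later in this series does) that the candidate builtins are reachable as `<name>1`, `<name>2`, and so on; token pasting then generates the comparison chain. `SKETCH_CALL_1`, `SKETCH_CALL_2` and the `rx*` handlers are hypothetical, and plain `?:` is used instead of the GNU statement expressions the kernel macros rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Compare f against name##N, highest N first, then fall back to f(...). */
#define SKETCH_CALL_1(f, name, ...) \
	((f) == name##1 ? name##1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define SKETCH_CALL_2(f, name, ...) \
	((f) == name##2 ? name##2(__VA_ARGS__) : \
			  SKETCH_CALL_1(f, name, __VA_ARGS__))

/* Hypothetical candidates following the name##N convention. */
int rx1(int len) { return len * 2; }
int rx2(int len) { return len * 3; }
int other_rx(int len) { return -len; } /* only reachable indirectly */

int call_rx(int (*f)(int), int len)
{
	assert(f != NULL);
	/* Expands to: f == rx2 ? rx2(len) : (f == rx1 ? rx1(len) : f(len)) */
	return SKETCH_CALL_2(f, rx, len);
}
```

The call site stays a one-liner, which is the point of hiding the boilerplate in macros.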

This series is a [re-]implementation of that idea, exposing the introduced
helpers in a new header file. They are then leveraged to avoid the indirect
call overhead in the GRO path, when possible.

Overall this gives a > 10% performance improvement in a UDP GRO benchmark, and
a smaller but measurable improvement under TCP SYN flood.

The added infra can be used in follow-up patches to cope with retpoline overhead
in other points of the networking stack (e.g. at the qdisc layer) and possibly
even in other subsystems.

Paolo Abeni (4):
  indirect call wrappers: helpers to speed-up indirect calls of builtin
  net: use indirect call wrappers at GRO network layer
  net: use indirect call wrapper at GRO transport layer
  udp: use indirect call wrapper for GRO socket lookup

 include/linux/indirect_call_wrapper.h | 77 +++++++++++++++++++++++++++
 include/net/inet_common.h             |  9 ++++
 net/core/dev.c                        | 10 +++-
 net/ipv4/af_inet.c                    | 15 +++++-
 net/ipv4/tcp_offload.c                |  5 ++
 net/ipv4/udp.c                        |  2 +
 net/ipv4/udp_offload.c                | 11 +++-
 net/ipv6/ip6_offload.c                | 14 ++++-
 net/ipv6/tcpv6_offload.c              |  5 ++
 net/ipv6/udp.c                        |  2 +
 net/ipv6/udp_offload.c                |  5 ++
 11 files changed, 147 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/indirect_call_wrapper.h

-- 
2.19.2

* [RFC PATCH 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin
@ 2018-11-29 23:00 Paolo Abeni
From: Paolo Abeni @ 2018-11-29 23:00 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

This header defines a bunch of helpers that avoid the retpoline
overhead when calling builtin functions via function pointers.
It boils down to explicitly comparing the function pointer against
known builtin functions and, on a match, invoking the latter directly.

The macros defined here implement the boilerplate for the above scheme
and will be used by the next patches.
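The `INDIRECT_CALLABLE()` half of the scheme relies on a symbol alias (the kernel's `__alias()` wraps the compiler's alias attribute), so that a builtin with an arbitrary name also becomes reachable under the `<name><nr>` identifier that the call-site macro compares against. A rough userspace sketch, assuming GCC or Clang targeting ELF (alias attributes are not available on every target, e.g. Mach-O); `SKETCH_CALLABLE`, `rx_handler` and the handlers are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Roughly what INDIRECT_CALLABLE() does, spelled with the raw attribute. */
#define SKETCH_CALLABLE(f, nr, type, name, ...) \
	__attribute__((alias(#f))) type name##nr(__VA_ARGS__)

/* Builtins with unrelated names, as in the networking code. */
int tcp_rx(int len) { return len + 1; }
int udp_rx(int len) { return len + 2; }

/* Export them as rx_handler2 and rx_handler1 (higher number checked first). */
SKETCH_CALLABLE(tcp_rx, 2, int, rx_handler, int);
SKETCH_CALLABLE(udp_rx, 1, int, rx_handler, int);

int call_rx(int (*f)(int), int len)
{
	assert(f != NULL);
	/* An alias shares its target's address, so these compares can match. */
	if (f == rx_handler2)
		return rx_handler2(len);   /* direct call */
	if (f == rx_handler1)
		return rx_handler1(len);   /* direct call */
	return f(len);                     /* indirect fallback */
}
```

The caller only ever sees the `rx_handler<nr>` names, while the callbacks keep their original names and can still be stored in ops structures as before.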

Suggested-by: Eric Dumazet <Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/indirect_call_wrapper.h | 77 +++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 include/linux/indirect_call_wrapper.h

diff --git a/include/linux/indirect_call_wrapper.h b/include/linux/indirect_call_wrapper.h
new file mode 100644
index 000000000000..57e82b4a166d
--- /dev/null
+++ b/include/linux/indirect_call_wrapper.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_INDIRECT_CALL_WRAPPER_H
+#define _LINUX_INDIRECT_CALL_WRAPPER_H
+
+#ifdef CONFIG_RETPOLINE
+
+/*
+ * INDIRECT_CALL_$NR - wrapper for indirect calls with $NR known builtin
+ *  @f: function pointer
+ *  @name: base name for builtin functions, see INDIRECT_CALLABLE_DECLARE_$NR
+ *  @__VA_ARGS__: arguments for @f
+ *
+ * Avoid retpoline overhead for known builtin, checking @f vs each of them and
+ * eventually invoking directly the builtin function. Fallback to the indirect
+ * call
+ */
+#define INDIRECT_CALL_1(f, name, ...)					\
+	({								\
+		f == name ## 1 ? name ## 1(__VA_ARGS__) :		\
+				 f(__VA_ARGS__);			\
+	})
+#define INDIRECT_CALL_2(f, name, ...)					\
+	({								\
+		f == name ## 2 ? name ## 2(__VA_ARGS__) :		\
+				 INDIRECT_CALL_1(f, name, __VA_ARGS__);	\
+	})
+
+/*
+ * INDIRECT_CALLABLE_DECLARE_$NR - declare $NR known builtin for
+ * INDIRECT_CALL_$NR usage
+ *  @type: return type for the builtin function
+ *  @name: base name for builtin functions, the full list is generated appending
+ *	   the numbers in the 1..@NR range
+ *  @__VA_ARGS__: arguments type list for the builtin function
+ *
+ * Builtin with higher $NR will be checked first by INDIRECT_CALL_$NR
+ */
+#define INDIRECT_CALLABLE_DECLARE_1(type, name, ...)			\
+	extern type name ## 1(__VA_ARGS__)
+#define INDIRECT_CALLABLE_DECLARE_2(type, name, ...)			\
+	extern type name ## 2(__VA_ARGS__);				\
+	INDIRECT_CALLABLE_DECLARE_1(type, name, __VA_ARGS__)
+
+/*
+ * INDIRECT_CALLABLE - allow usage of a builtin function from INDIRECT_CALL_$NR
+ *  @f: builtin function name
+ *  @nr: id associated with this builtin, higher values will be checked first by
+ *	 INDIRECT_CALL_$NR
+ *  @type: function return type
+ *  @name: base name used by INDIRECT_CALL_ to access the builtin list
+ *  @__VA_ARGS__: arguments type list for @f
+ */
+#define INDIRECT_CALLABLE(f, nr, type, name, ...)		\
+	__alias(f) type name ## nr(__VA_ARGS__)
+
+#else
+#define INDIRECT_CALL_1(f, name, ...) f(__VA_ARGS__)
+#define INDIRECT_CALL_2(f, name, ...) f(__VA_ARGS__)
+#define INDIRECT_CALLABLE_DECLARE_1(type, name, ...)
+#define INDIRECT_CALLABLE_DECLARE_2(type, name, ...)
+#define INDIRECT_CALLABLE(f, nr, type, name, ...)
+#endif
+
+/*
+ * We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is
+ * builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6
+ * alternatives
+ */
+#if IS_BUILTIN(CONFIG_IPV6)
+#define INDIRECT_CALL_INET INDIRECT_CALL_2
+#elif IS_ENABLED(CONFIG_INET)
+#define INDIRECT_CALL_INET INDIRECT_CALL_1
+#else
+#define INDIRECT_CALL_INET(f, name, ...) f(__VA_ARGS__)
+#endif
+
+#endif
-- 
2.19.2

* [RFC PATCH 2/4] net: use indirect call wrappers at GRO network layer
@ 2018-11-29 23:00 Paolo Abeni
From: Paolo Abeni @ 2018-11-29 23:00 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

This avoids an indirect call in the L3 GRO receive path, for
both IPv4 and IPv6, provided the latter is not compiled as a module.

Note that when IPv6 is builtin, it will be checked first,
so we have a single additional compare for the more common path.
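To make the cost argument concrete: the candidates are tried in order, so reaching a target costs the number of pointer comparisons made before it. The hypothetical sketch below (the `ipv4_rx`/`ipv6_rx` stand-ins and the `known_l3` table are illustrative, not kernel code) counts those comparisons for the order used here, IPv6 first:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*l3_rx_t)(int len);

/* Hypothetical stand-ins for the L3 GRO receive callbacks. */
int ipv4_rx(int len) { return len + 4; }
int ipv6_rx(int len) { return len + 6; }
int tunnel_rx(int len) { return len + 41; } /* not a known candidate */

/* Candidates in the order INDIRECT_CALL_2 tests them: nr 2, then nr 1. */
static const l3_rx_t known_l3[] = { ipv6_rx, ipv4_rx };

/* Pointer comparisons executed before the call is finally issued. */
size_t compares_before_call(l3_rx_t f)
{
	size_t i;

	assert(f != NULL);
	for (i = 0; i < sizeof(known_l3) / sizeof(known_l3[0]); i++)
		if (f == known_l3[i])
			return i + 1;   /* match: direct call after i + 1 compares */
	return i;                   /* no match: indirect call after all compares */
}
```

The first candidate pays one compare, the second pays two, and an unknown target pays every compare plus the retpolined call, which is why the ordering matters.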

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/inet_common.h |  2 ++
 net/core/dev.c            | 10 ++++++++--
 net/ipv4/af_inet.c        |  4 ++++
 net/ipv6/ip6_offload.c    |  4 ++++
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 3ca969cbd161..56e7592811ea 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -2,6 +2,8 @@
 #ifndef _INET_COMMON_H
 #define _INET_COMMON_H
 
+#include <linux/indirect_call_wrapper.h>
+
 extern const struct proto_ops inet_stream_ops;
 extern const struct proto_ops inet_dgram_ops;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f69b2fcdee40..619f5902600f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -145,6 +145,7 @@
 #include <linux/sctp.h>
 #include <net/udp_tunnel.h>
 #include <linux/net_namespace.h>
+#include <linux/indirect_call_wrapper.h>
 
 #include "net-sysfs.h"
 
@@ -5306,6 +5307,7 @@ static void flush_all_backlogs(void)
 	put_online_cpus();
 }
 
+INDIRECT_CALLABLE_DECLARE_2(int, network_gro_complete, struct sk_buff *, int);
 static int napi_gro_complete(struct sk_buff *skb)
 {
 	struct packet_offload *ptype;
@@ -5325,7 +5327,8 @@ static int napi_gro_complete(struct sk_buff *skb)
 		if (ptype->type != type || !ptype->callbacks.gro_complete)
 			continue;
 
-		err = ptype->callbacks.gro_complete(skb, 0);
+		err = INDIRECT_CALL_INET(ptype->callbacks.gro_complete,
+					 network_gro_complete, skb, 0);
 		break;
 	}
 	rcu_read_unlock();
@@ -5472,6 +5475,8 @@ static void gro_flush_oldest(struct list_head *head)
 	napi_gro_complete(oldest);
 }
 
+INDIRECT_CALLABLE_DECLARE_2(struct sk_buff *, network_gro_receive,
+			    struct list_head *, struct sk_buff *);
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
@@ -5521,7 +5526,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 			NAPI_GRO_CB(skb)->csum_valid = 0;
 		}
 
-		pp = ptype->callbacks.gro_receive(gro_head, skb);
+		pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
+					network_gro_receive, gro_head, skb);
 		break;
 	}
 	rcu_read_unlock();
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 326c422c22f8..04ab7ebd6e9b 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1505,6 +1505,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	return pp;
 }
 EXPORT_SYMBOL(inet_gro_receive);
+INDIRECT_CALLABLE(inet_gro_receive, 1, struct sk_buff *, network_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static struct sk_buff *ipip_gro_receive(struct list_head *head,
 					struct sk_buff *skb)
@@ -1589,6 +1591,8 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	return err;
 }
 EXPORT_SYMBOL(inet_gro_complete);
+INDIRECT_CALLABLE(inet_gro_complete, 1, int, network_gro_complete,
+		  struct sk_buff *, int);
 
 static int ipip_gro_complete(struct sk_buff *skb, int nhoff)
 {
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 70f525c33cb6..a1c2bfb2ce0d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -270,6 +270,8 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 
 	return pp;
 }
+INDIRECT_CALLABLE(ipv6_gro_receive, 2, struct sk_buff *, network_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
 					      struct sk_buff *skb)
@@ -327,6 +329,8 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 
 	return err;
 }
+INDIRECT_CALLABLE(ipv6_gro_complete, 2, int, network_gro_complete,
+		  struct sk_buff *, int);
 
 static int sit_gro_complete(struct sk_buff *skb, int nhoff)
 {
-- 
2.19.2

* [RFC PATCH 3/4] net: use indirect call wrapper at GRO transport layer
@ 2018-11-29 23:00 Paolo Abeni
From: Paolo Abeni @ 2018-11-29 23:00 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

This avoids an indirect call in the receive path for TCP and UDP
packets. TCP takes precedence over UDP, so that we have a single
additional conditional in the common case.
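The transport-layer wrapper added to `inet_common.h` combines the existing GRO recursion guard with a two-candidate check. A self-contained sketch of that shape follows; `struct pkt`, `SKETCH_RECURSION_LIMIT` and the handler names are hypothetical stand-ins for the skb, the GRO control block counter and the real TCP/UDP callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RECURSION_LIMIT 15 /* hypothetical nesting limit */

struct pkt {
	int recursion; /* nesting depth, like the GRO control block counter */
	int flush;
};

typedef struct pkt *(*gro_rx_t)(struct pkt *p);

/* Hypothetical transport handlers; TCP is the expected common case. */
struct pkt *tcp_gro_rx(struct pkt *p) { return p; }
struct pkt *udp_gro_rx(struct pkt *p) { (void)p; return NULL; }

static int recursion_inc_test(struct pkt *p)
{
	return ++p->recursion == SKETCH_RECURSION_LIMIT;
}

struct pkt *guarded_gro_rx(gro_rx_t cb, struct pkt *p)
{
	assert(cb != NULL && p != NULL);
	if (recursion_inc_test(p)) {   /* too deeply nested: flush, no call */
		p->flush |= 1;
		return NULL;
	}
	/* TCP is compared first, so the common case costs one compare. */
	if (cb == tcp_gro_rx)
		return tcp_gro_rx(p);
	if (cb == udp_gro_rx)
		return udp_gro_rx(p);
	return cb(p);                  /* indirect fallback */
}
```

Keeping the guard in the same helper means every call site gets both the recursion check and the devirtualized call from a single macro invocation.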

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/inet_common.h |  7 +++++++
 net/ipv4/af_inet.c        | 11 +++++++++--
 net/ipv4/tcp_offload.c    |  5 +++++
 net/ipv4/udp_offload.c    |  5 +++++
 net/ipv6/ip6_offload.c    | 10 ++++++++--
 net/ipv6/tcpv6_offload.c  |  5 +++++
 net/ipv6/udp_offload.c    |  5 +++++
 7 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 56e7592811ea..667bb8247f9a 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -56,4 +56,11 @@ static inline void inet_ctl_sock_destroy(struct sock *sk)
 		sock_release(sk->sk_socket);
 }
 
+#define indirect_call_gro_receive(name, cb, head, skb)	\
+({							\
+	unlikely(gro_recursion_inc_test(skb)) ?		\
+		NAPI_GRO_CB(skb)->flush |= 1, NULL :	\
+		INDIRECT_CALL_2(cb, name, head, skb);	\
+})
+
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 04ab7ebd6e9b..774f183f56e3 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1385,6 +1385,8 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(inet_gso_segment);
 
+INDIRECT_CALLABLE_DECLARE_2(struct sk_buff *, transport4_gro_receive,
+			    struct list_head *head, struct sk_buff *skb);
 struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	const struct net_offload *ops;
@@ -1494,7 +1496,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	skb_gro_pull(skb, sizeof(*iph));
 	skb_set_transport_header(skb, skb_gro_offset(skb));
 
-	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+	pp = indirect_call_gro_receive(transport4_gro_receive,
+				       ops->callbacks.gro_receive, head, skb);
 
 out_unlock:
 	rcu_read_unlock();
@@ -1558,6 +1561,8 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	return -EINVAL;
 }
 
+INDIRECT_CALLABLE_DECLARE_2(int, transport4_gro_complete, struct sk_buff *skb,
+			    int);
 int inet_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	__be16 newlen = htons(skb->len - nhoff);
@@ -1583,7 +1588,9 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	 * because any hdr with option will have been flushed in
 	 * inet_gro_receive().
 	 */
-	err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+	err = INDIRECT_CALL_2(ops->callbacks.gro_complete,
+			      transport4_gro_complete, skb,
+			      nhoff + sizeof(*iph));
 
 out_unlock:
 	rcu_read_unlock();
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 870b0a335061..3d5dfac4cd1b 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -10,6 +10,7 @@
  *	TCPv4 GSO/GRO support
  */
 
+#include <linux/indirect_call_wrapper.h>
 #include <linux/skbuff.h>
 #include <net/tcp.h>
 #include <net/protocol.h>
@@ -317,6 +318,8 @@ static struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *
 
 	return tcp_gro_receive(head, skb);
 }
+INDIRECT_CALLABLE(tcp4_gro_receive, 2, struct sk_buff *, transport4_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 {
@@ -332,6 +335,8 @@ static int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 
 	return tcp_gro_complete(skb);
 }
+INDIRECT_CALLABLE(tcp4_gro_complete, 2, int, transport4_gro_complete,
+		  struct sk_buff *, int);
 
 static const struct net_offload tcpv4_offload = {
 	.callbacks = {
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0646d61f4fa8..c3c5b237c8e0 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -13,6 +13,7 @@
 #include <linux/skbuff.h>
 #include <net/udp.h>
 #include <net/protocol.h>
+#include <net/inet_common.h>
 
 static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	netdev_features_t features,
@@ -477,6 +478,8 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 	NAPI_GRO_CB(skb)->flush = 1;
 	return NULL;
 }
+INDIRECT_CALLABLE(udp4_gro_receive, 1, struct sk_buff *, transport4_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static int udp_gro_complete_segment(struct sk_buff *skb)
 {
@@ -536,6 +539,8 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
+INDIRECT_CALLABLE(udp4_gro_complete, 1, int, transport4_gro_complete,
+		  struct sk_buff *, int);
 
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a1c2bfb2ce0d..eeca4164a155 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -164,6 +164,8 @@ static int ipv6_exthdrs_len(struct ipv6hdr *iph,
 	return len;
 }
 
+INDIRECT_CALLABLE_DECLARE_2(struct sk_buff *, transport6_gro_receive,
+			    struct list_head *head, struct sk_buff *skb);
 static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 					struct sk_buff *skb)
 {
@@ -260,7 +262,8 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 
 	skb_gro_postpull_rcsum(skb, iph, nlen);
 
-	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+	pp = indirect_call_gro_receive(transport6_gro_receive,
+				       ops->callbacks.gro_receive, head, skb);
 
 out_unlock:
 	rcu_read_unlock();
@@ -303,6 +306,8 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 	return inet_gro_receive(head, skb);
 }
 
+INDIRECT_CALLABLE_DECLARE_2(int, transport6_gro_complete, struct sk_buff *skb,
+			    int);
 static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
@@ -322,7 +327,8 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
 		goto out_unlock;
 
-	err = ops->callbacks.gro_complete(skb, nhoff);
+	err = INDIRECT_CALL_2(ops->callbacks.gro_complete,
+			      transport6_gro_complete, skb, nhoff);
 
 out_unlock:
 	rcu_read_unlock();
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index e72947c99454..3c85afc4cf43 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -9,6 +9,7 @@
  *
  *      TCPv6 GSO/GRO support
  */
+#include <linux/indirect_call_wrapper.h>
 #include <linux/skbuff.h>
 #include <net/protocol.h>
 #include <net/tcp.h>
@@ -28,6 +29,8 @@ static struct sk_buff *tcp6_gro_receive(struct list_head *head,
 
 	return tcp_gro_receive(head, skb);
 }
+INDIRECT_CALLABLE(tcp6_gro_receive, 2, struct sk_buff *, transport6_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 {
@@ -40,6 +43,8 @@ static int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 
 	return tcp_gro_complete(skb);
 }
+INDIRECT_CALLABLE(tcp6_gro_complete, 2, int, transport6_gro_complete,
+		  struct sk_buff *, int);
 
 static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 828b2457f97b..ce4d491c583c 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -11,6 +11,7 @@
  */
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
+#include <linux/indirect_call_wrapper.h>
 #include <net/protocol.h>
 #include <net/ipv6.h>
 #include <net/udp.h>
@@ -141,6 +142,8 @@ static struct sk_buff *udp6_gro_receive(struct list_head *head,
 	NAPI_GRO_CB(skb)->flush = 1;
 	return NULL;
 }
+INDIRECT_CALLABLE(udp6_gro_receive, 1, struct sk_buff *, transport6_gro_receive,
+		  struct list_head *, struct sk_buff *);
 
 static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 {
@@ -153,6 +156,8 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
+INDIRECT_CALLABLE(udp6_gro_complete, 1, int, transport6_gro_complete,
+		  struct sk_buff *, int);
 
 static const struct net_offload udpv6_offload = {
 	.callbacks = {
-- 
2.19.2

* [RFC PATCH 4/4] udp: use indirect call wrapper for GRO socket lookup
@ 2018-11-29 23:00 Paolo Abeni
From: Paolo Abeni @ 2018-11-29 23:00 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

This avoids another indirect call for UDP GRO. Again, the test
for the IPv6 variant is performed first.
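This patch routes the UDP socket lookup through `INDIRECT_CALL_INET`, whose expansion depends on the build configuration: two candidates when IPv6 is builtin, one when only IPv4 is available, and a plain call otherwise. A hypothetical sketch of that selection, where `SKETCH_IPV6_BUILTIN` and `SKETCH_INET` stand in for the Kconfig tests and the lookup functions are illustrative stubs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switches standing in for the Kconfig checks. */
#ifndef SKETCH_IPV6_BUILTIN
#define SKETCH_IPV6_BUILTIN 1
#endif
#ifndef SKETCH_INET
#define SKETCH_INET 1
#endif

#define SKETCH_CALL_1(f, name, ...) \
	((f) == name##1 ? name##1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define SKETCH_CALL_2(f, name, ...) \
	((f) == name##2 ? name##2(__VA_ARGS__) : \
			  SKETCH_CALL_1(f, name, __VA_ARGS__))

#if SKETCH_IPV6_BUILTIN
#define SKETCH_CALL_INET SKETCH_CALL_2
#elif SKETCH_INET
#define SKETCH_CALL_INET SKETCH_CALL_1
#else
/* Must still expand to a call, otherwise "x = SKETCH_CALL_INET(...);" breaks. */
#define SKETCH_CALL_INET(f, name, ...) (f)(__VA_ARGS__)
#endif

struct sock_stub { int family; }; /* hypothetical socket stand-in */
static struct sock_stub v4_sk = { 4 }, v6_sk = { 6 };

typedef struct sock_stub *(*lookup_t)(unsigned short sport,
				      unsigned short dport);

/* Illustrative lookups named after the udp_lookup<nr> convention. */
struct sock_stub *udp_lookup1(unsigned short sport, unsigned short dport)
{
	return (sport || dport) ? &v4_sk : NULL;
}

struct sock_stub *udp_lookup2(unsigned short sport, unsigned short dport)
{
	return (sport || dport) ? &v6_sk : NULL;
}

struct sock_stub *gro_lookup(lookup_t lookup, unsigned short sport,
			     unsigned short dport)
{
	assert(lookup != NULL);
	return SKETCH_CALL_INET(lookup, udp_lookup, sport, dport);
}
```

With the default switches the IPv6 stub is compared first, matching the ordering described above.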

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c         | 2 ++
 net/ipv4/udp_offload.c | 6 ++++--
 net/ipv6/udp.c         | 2 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aff2a8e99e01..9ea851f47598 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -544,6 +544,8 @@ struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
 	return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table);
 }
 EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
+INDIRECT_CALLABLE(udp4_lib_lookup_skb, 1, struct sock *, udp_lookup,
+		  struct sk_buff *skb, __be16 sport, __be16 dport);
 
 /* Must be called under rcu_read_lock().
  * Does increment socket refcount.
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index c3c5b237c8e0..0ccd2aa1ab98 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -392,6 +392,8 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	return NULL;
 }
 
+INDIRECT_CALLABLE_DECLARE_2(struct sock *, udp_lookup, struct sk_buff *skb,
+			    __be16 sport, __be16 dport);
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 				struct udphdr *uh, udp_lookup_t lookup)
 {
@@ -403,7 +405,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	struct sock *sk;
 
 	rcu_read_lock();
-	sk = (*lookup)(skb, uh->source, uh->dest);
+	sk = INDIRECT_CALL_INET(lookup, udp_lookup, skb, uh->source, uh->dest);
 	if (!sk)
 		goto out_unlock;
 
@@ -505,7 +507,7 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 	uh->len = newlen;
 
 	rcu_read_lock();
-	sk = (*lookup)(skb, uh->source, uh->dest);
+	sk = INDIRECT_CALL_INET(lookup, udp_lookup, skb, uh->source, uh->dest);
 	if (sk && udp_sk(sk)->gro_enabled) {
 		err = udp_gro_complete_segment(skb);
 	} else if (sk && udp_sk(sk)->gro_complete) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 09cba4cfe31f..616f374760d1 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -282,6 +282,8 @@ struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
 				 inet6_sdif(skb), &udp_table, skb);
 }
 EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
+INDIRECT_CALLABLE(udp6_lib_lookup_skb, 2, struct sock *, udp_lookup,
+		  struct sk_buff *skb, __be16 sport, __be16 dport);
 
 /* Must be called under rcu_read_lock().
  * Does increment socket refcount.
-- 
2.19.2

* Re: [RFC PATCH 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin
@ 2018-11-29 23:25 Eric Dumazet
From: Eric Dumazet @ 2018-11-29 23:25 UTC
  To: Paolo Abeni, netdev; +Cc: David S. Miller



On 11/29/2018 03:00 PM, Paolo Abeni wrote:
> This header define a bunch of helpers that allow avoiding the
> retpoline overhead when calling builtin functions via function pointers.
> It boils down to explicitly comparing the function pointers to
> known builtin functions and eventually invoke directly the latter.
> 
> The macro defined here implement the boilerplate for the above schema
> and will be used by the next patches.
> 
> Suggested-by: Eric Dumazet <Eric Dumazet <edumazet@google.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/indirect_call_wrapper.h | 77 +++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
>  create mode 100644 include/linux/indirect_call_wrapper.h
> 
> diff --git a/include/linux/indirect_call_wrapper.h b/include/linux/indirect_call_wrapper.h
> new file mode 100644
> index 000000000000..57e82b4a166d
> --- /dev/null
> +++ b/include/linux/indirect_call_wrapper.h
> @@ -0,0 +1,77 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_INDIRECT_CALL_WRAPPER_H
> +#define _LINUX_INDIRECT_CALL_WRAPPER_H
> +
> +#ifdef CONFIG_RETPOLINE
> +
> +/*
> + * INDIRECT_CALL_$NR - wrapper for indirect calls with $NR known builtin
> + *  @f: function pointer
> + *  @name: base name for builtin functions, see INDIRECT_CALLABLE_DECLARE_$NR
> + *  @__VA_ARGS__: arguments for @f
> + *
> + * Avoid retpoline overhead for known builtin, checking @f vs each of them and
> + * eventually invoking directly the builtin function. Fallback to the indirect
> + * call
> + */
> +#define INDIRECT_CALL_1(f, name, ...)					\
> +	({								\
> +		f == name ## 1 ? name ## 1(__VA_ARGS__) :		\

              likely(f == name ## 1) ? ...

> +				 f(__VA_ARGS__);			\
> +	})
> +#define INDIRECT_CALL_2(f, name, ...)					\
> +	({								\
> +		f == name ## 2 ? name ## 2(__VA_ARGS__) :		\

             likely(f == name ## 2) ? ...


> +				 INDIRECT_CALL_1(f, name, __VA_ARGS__);	\
> +	})
> +
>

* Re: [RFC PATCH 1/4] indirect call wrappers: helpers to speed-up indirect calls of builtin
@ 2018-11-30  8:29 Paolo Abeni
From: Paolo Abeni @ 2018-11-30  8:29 UTC
  To: Eric Dumazet, netdev; +Cc: David S. Miller

Hi,

On Thu, 2018-11-29 at 15:25 -0800, Eric Dumazet wrote:
> 
> On 11/29/2018 03:00 PM, Paolo Abeni wrote:
> > This header define a bunch of helpers that allow avoiding the
> > retpoline overhead when calling builtin functions via function pointers.
> > It boils down to explicitly comparing the function pointers to
> > known builtin functions and eventually invoke directly the latter.
> > 
> > The macro defined here implement the boilerplate for the above schema
> > and will be used by the next patches.
> > 
> > Suggested-by: Eric Dumazet <Eric Dumazet <edumazet@google.com>

Oops... typo here. For some reason checkpatch did not catch it.

> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/linux/indirect_call_wrapper.h | 77 +++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> >  create mode 100644 include/linux/indirect_call_wrapper.h
> > 
> > diff --git a/include/linux/indirect_call_wrapper.h b/include/linux/indirect_call_wrapper.h
> > new file mode 100644
> > index 000000000000..57e82b4a166d
> > --- /dev/null
> > +++ b/include/linux/indirect_call_wrapper.h
> > @@ -0,0 +1,77 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_INDIRECT_CALL_WRAPPER_H
> > +#define _LINUX_INDIRECT_CALL_WRAPPER_H
> > +
> > +#ifdef CONFIG_RETPOLINE
> > +
> > +/*
> > + * INDIRECT_CALL_$NR - wrapper for indirect calls with $NR known builtin
> > + *  @f: function pointer
> > + *  @name: base name for builtin functions, see INDIRECT_CALLABLE_DECLARE_$NR
> > + *  @__VA_ARGS__: arguments for @f
> > + *
> > + * Avoid retpoline overhead for known builtin, checking @f vs each of them and
> > + * eventually invoking directly the builtin function. Fallback to the indirect
> > + * call
> > + */
> > +#define INDIRECT_CALL_1(f, name, ...)					\
> > +	({								\
> > +		f == name ## 1 ? name ## 1(__VA_ARGS__) :		\
> 
>               likely(f == name ## 1) ? ...

Thank you for the feedback!

I thought about the above, and then I avoided it, because I was not
100% sure it would fit cases (if any) where we have 2 or more built-ins
equally likely.

I guess we can address such cases if and when they pop up. I'll do
some more benchmarks with the branch prediction hints, and then, if
there are no surprises, I'll add them in v1.
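Eric's suggestion amounts to wrapping each comparison in a branch-prediction hint. In userspace the kernel's `likely()` can be approximated with GCC/Clang's `__builtin_expect`; the sketch below (hypothetical macro and handler names) shows the hinted chain. With two candidates both comparisons are marked as likely to succeed, which is exactly the case questioned above when two builtins are equally common.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's likely() hint. */
#define sketch_likely(x) __builtin_expect(!!(x), 1)

#define HINTED_CALL_1(f, name, ...)					\
	(sketch_likely((f) == name##1) ? name##1(__VA_ARGS__) :	\
					 (f)(__VA_ARGS__))
#define HINTED_CALL_2(f, name, ...)					\
	(sketch_likely((f) == name##2) ? name##2(__VA_ARGS__) :	\
					 HINTED_CALL_1(f, name, __VA_ARGS__))

int handler1(int x) { return x + 10; } /* hypothetical builtins */
int handler2(int x) { return x + 20; }
int handler_mod(int x) { return x + 30; } /* not a known candidate */

int hinted_call(int (*f)(int), int x)
{
	assert(f != NULL);
	/* The hint only biases code layout; results are unchanged. */
	return HINTED_CALL_2(f, handler, x);
}
```

Whether the hint pays off depends on the real traffic mix, hence the need for the benchmarks mentioned above.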

BTW I would like to give the correct attribution here. Does 'Suggested-by'
fit? Should I list someone else @google?

Thanks,

Paolo
