All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 00/10] udp: implement GRO support
@ 2018-10-19 14:25 Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed Paolo Abeni
                   ` (11 more replies)
  0 siblings, 12 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

This series implements GRO support for UDP sockets, as the RX counterpart
of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
The core functionality is implemented by the second patch, introducing a new
sockopt to enable UDP_GRO, while patch 3 implements support for passing the
segment size to the user space via a new cmsg.
UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
directed to UDP GRO enabled sockets with constant l4 tuple.

UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
rules, and that could potentially confuse existing applications.

The solution adopted here is to de-segment the GRO packet before enqueuing
as needed. Since we must cope with packet reinsertion after de-segmentation,
the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
to UDP usage.

While the current code can probably be improved, this safeguard ,implemented in
the patches 4-7, allows future enachements to enable UDP GSO offload on more
virtual devices eventually even on forwarded packets.

The last 4 for patches implement some performance and functional self-tests,
re-using the existing udpgso infrastructure. The problematic scenario described
above is explicitly tested.

v1 - v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup
 - cope with UDP GRO misdirection
 - add self-tests

Paolo Abeni (10):
  udp: implement complete book-keeping for encap_needed
  udp: implement GRO for plain UDP sockets.
  udp: add support for UDP_GRO cmsg
  ip: factor out protocol delivery helper
  ipv6: factor out protocol delivery helper
  udp: cope with UDP GRO packet misdirection
  selftests: add GRO support to udp bench rx program
  selftests: conditionally enable XDP support in udpgso_bench_rx
  selftests: add some benchmark for UDP GRO
  selftests: add functionals test for UDP GRO

 include/linux/udp.h                           |  42 +++-
 include/net/udp_tunnel.h                      |   6 +
 include/uapi/linux/udp.h                      |   1 +
 net/ipv4/ip_input.c                           |  73 ++++---
 net/ipv4/udp.c                                |  54 ++++-
 net/ipv4/udp_offload.c                        | 109 ++++++++--
 net/ipv6/ip6_input.c                          |  28 +--
 net/ipv6/udp.c                                |  44 +++-
 net/ipv6/udp_offload.c                        |   6 +-
 tools/testing/selftests/net/Makefile          |  70 +++++++
 tools/testing/selftests/net/udpgro.sh         | 144 +++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |  94 +++++++++
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 195 ++++++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 +-
 tools/testing/selftests/net/xdp_dummy.c       |  13 ++
 16 files changed, 790 insertions(+), 113 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh
 create mode 100644 tools/testing/selftests/net/xdp_dummy.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-22 16:06   ` Willem de Bruijn
  2018-10-19 14:25 ` [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets Paolo Abeni
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

The *encap_needed static keys are enabled by UDP tunnels
and several UDP encapsulations type, but they are never
turned off. This can cause unneeded overall performance
degradation for systems where such features are used
transiently.

This patch introduces complete book-keeping for such keys,
decreasing the usage at socket destruction time, if needed,
and avoiding that the same socket could increase the key
usage multiple times.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h      |  7 ++++++-
 include/net/udp_tunnel.h |  6 ++++++
 net/ipv4/udp.c           | 18 ++++++++++++------
 net/ipv6/udp.c           | 14 +++++++++-----
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 320d49d85484..a4dafff407fb 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -49,7 +49,12 @@ struct udp_sock {
 	unsigned int	 corkflag;	/* Cork is required */
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 	unsigned char	 no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
-			 no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
+			 no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
+			 encap_enabled:1; /* This socket enabled encap
+					   * processing; UDP tunnels and
+					   * different encapsulation layer set
+					   * this
+					   */
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index fe680ab6b15a..3fbe56430e3b 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -165,6 +165,12 @@ static inline int udp_tunnel_handle_offloads(struct sk_buff *skb, bool udp_csum)
 
 static inline void udp_tunnel_encap_enable(struct socket *sock)
 {
+	struct udp_sock *up = udp_sk(sock->sk);
+
+	if (up->encap_enabled)
+		return;
+
+	up->encap_enabled = 1;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (sock->sk->sk_family == PF_INET6)
 		ipv6_stub->udpv6_encap_enable();
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf8252d05a01..9fcb5374e166 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2382,11 +2382,15 @@ void udp_destroy_sock(struct sock *sk)
 	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
 	unlock_sock_fast(sk, slow);
-	if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
-		void (*encap_destroy)(struct sock *sk);
-		encap_destroy = READ_ONCE(up->encap_destroy);
-		if (encap_destroy)
-			encap_destroy(sk);
+	if (static_branch_unlikely(&udp_encap_needed_key)) {
+		if (up->encap_type) {
+			void (*encap_destroy)(struct sock *sk);
+			encap_destroy = READ_ONCE(up->encap_destroy);
+			if (encap_destroy)
+				encap_destroy(sk);
+		}
+		if (up->encap_enabled)
+			static_branch_disable(&udp_encap_needed_key);
 	}
 }
 
@@ -2431,7 +2435,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 			/* FALLTHROUGH */
 		case UDP_ENCAP_L2TPINUDP:
 			up->encap_type = val;
-			udp_encap_enable();
+			if (!up->encap_enabled)
+				udp_encap_enable();
+			up->encap_enabled = 1;
 			break;
 		default:
 			err = -ENOPROTOOPT;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 374e7d302f26..8bb50ba32a6f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1460,11 +1460,15 @@ void udpv6_destroy_sock(struct sock *sk)
 	udp_v6_flush_pending_frames(sk);
 	release_sock(sk);
 
-	if (static_branch_unlikely(&udpv6_encap_needed_key) && up->encap_type) {
-		void (*encap_destroy)(struct sock *sk);
-		encap_destroy = READ_ONCE(up->encap_destroy);
-		if (encap_destroy)
-			encap_destroy(sk);
+	if (static_branch_unlikely(&udpv6_encap_needed_key)) {
+		if (up->encap_type) {
+			void (*encap_destroy)(struct sock *sk);
+			encap_destroy = READ_ONCE(up->encap_destroy);
+			if (encap_destroy)
+				encap_destroy(sk);
+		}
+		if (up->encap_enabled)
+			static_branch_disable(&udpv6_encap_needed_key);
 	}
 
 	inet6_destroy_sock(sk);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:06   ` Willem de Bruijn
  2018-10-22 11:24   ` Steffen Klassert
  2018-10-19 14:25 ` [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg Paolo Abeni
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
eligible for GRO in the rx path: UDP segments directed to such socket
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled with setsockopt(UDP_GRO).

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s

This change introduces a side effect in respect to UDP tunnels:
after a UDP tunnel creation, now the kernel performs a lookup per ingress
UDP packet, while before such lookup happened only if the ingress packet
carried a valid internal header csum.

v1 -> v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h      |   3 +-
 include/uapi/linux/udp.h |   1 +
 net/ipv4/udp.c           |   7 +++
 net/ipv4/udp_offload.c   | 109 +++++++++++++++++++++++++++++++--------
 net/ipv6/udp_offload.c   |   6 +--
 5 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index a4dafff407fb..f613b329852e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,11 +50,12 @@ struct udp_sock {
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 	unsigned char	 no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
 			 no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
-			 encap_enabled:1; /* This socket enabled encap
+			 encap_enabled:1, /* This socket enabled encap
 					   * processing; UDP tunnels and
 					   * different encapsulation layer set
 					   * this
 					   */
+			 gro_enabled:1;	/* Can accept GRO packets */
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 09502de447f5..30baccb6c9c4 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -33,6 +33,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_TX 101	/* Disable sending checksum for UDP6X */
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
+#define UDP_GRO		104	/* This socket can receive UDP GRO packets */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9fcb5374e166..3c277378814f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -115,6 +115,7 @@
 #include "udp_impl.h"
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
+#include <net/udp_tunnel.h>
 
 struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
@@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->gso_size = val;
 		break;
 
+	case UDP_GRO:
+		if (valbool)
+			udp_tunnel_encap_enable(sk->sk_socket);
+		up->gro_enabled = valbool;
+		break;
+
 	/*
 	 * 	UDP-Lite's partial checksum coverage (RFC 3828).
 	 */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 802f2bc00d69..d93c1e8097ba 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -343,6 +343,54 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	return segs;
 }
 
+#define UDO_GRO_CNT_MAX 64
+static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
+					       struct sk_buff *skb)
+{
+	struct udphdr *uh = udp_hdr(skb);
+	struct sk_buff *pp = NULL;
+	struct udphdr *uh2;
+	struct sk_buff *p;
+
+	/* requires non zero csum, for simmetry with GSO */
+	if (!uh->check) {
+		NAPI_GRO_CB(skb)->flush = 1;
+		return NULL;
+	}
+
+	/* pull encapsulating udp header */
+	skb_gro_pull(skb, sizeof(struct udphdr));
+	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+
+	list_for_each_entry(p, head, list) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		uh2 = udp_hdr(p);
+
+		/* Match ports only, as csum is always non zero */
+		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		/* Terminate the flow on len mismatch or if it grow "too much".
+		 * Under small packet flood GRO count could elsewhere grow a lot
+		 * leading to execessive truesize values
+		 */
+		if (!skb_gro_receive(p, skb) &&
+		    NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)
+			pp = p;
+		else if (uh->len != uh2->len)
+			pp = p;
+
+		return pp;
+	}
+
+	/* mismatch, but we never need to flush */
+	return NULL;
+}
+
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 				struct udphdr *uh, udp_lookup_t lookup)
 {
@@ -353,23 +401,27 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	int flush = 1;
 	struct sock *sk;
 
+	rcu_read_lock();
+	sk = (*lookup)(skb, uh->source, uh->dest);
+	if (!sk)
+		goto out_unlock;
+
+	if (udp_sk(sk)->gro_enabled) {
+		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		rcu_read_unlock();
+		return pp;
+	}
+
 	if (NAPI_GRO_CB(skb)->encap_mark ||
 	    (skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
-	     !NAPI_GRO_CB(skb)->csum_valid))
-		goto out;
+	     !NAPI_GRO_CB(skb)->csum_valid) ||
+	    !udp_sk(sk)->gro_receive)
+		goto out_unlock;
 
 	/* mark that this skb passed once through the tunnel gro layer */
 	NAPI_GRO_CB(skb)->encap_mark = 1;
 
-	rcu_read_lock();
-	sk = (*lookup)(skb, uh->source, uh->dest);
-
-	if (sk && udp_sk(sk)->gro_receive)
-		goto unflush;
-	goto out_unlock;
-
-unflush:
 	flush = 0;
 
 	list_for_each_entry(p, head, list) {
@@ -394,7 +446,6 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 
 out_unlock:
 	rcu_read_unlock();
-out:
 	skb_gro_flush_final(skb, pp, flush);
 	return pp;
 }
@@ -427,6 +478,19 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 	return NULL;
 }
 
+static int udp_gro_complete_segment(struct sk_buff *skb)
+{
+	struct udphdr *uh = udp_hdr(skb);
+
+	skb->csum_start = (unsigned char *)uh - skb->head;
+	skb->csum_offset = offsetof(struct udphdr, check);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+
+	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+	skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_L4;
+	return 0;
+}
+
 int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		     udp_lookup_t lookup)
 {
@@ -437,16 +501,21 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 
 	uh->len = newlen;
 
-	/* Set encapsulation before calling into inner gro_complete() functions
-	 * to make them set up the inner offsets.
-	 */
-	skb->encapsulation = 1;
-
 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
-	if (sk && udp_sk(sk)->gro_complete)
+	if (sk && udp_sk(sk)->gro_enabled) {
+		err = udp_gro_complete_segment(skb);
+	} else if (sk && udp_sk(sk)->gro_complete) {
+		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
+					: SKB_GSO_UDP_TUNNEL;
+
+		/* Set encapsulation before calling into inner gro_complete()
+		 * functions to make them set up the inner offsets.
+		 */
+		skb->encapsulation = 1;
 		err = udp_sk(sk)->gro_complete(sk, skb,
 				nhoff + sizeof(struct udphdr));
+	}
 	rcu_read_unlock();
 
 	if (skb->remcsum_offload)
@@ -461,13 +530,9 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (uh->check) {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	if (uh->check)
 		uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
 					  iph->daddr, 0);
-	} else {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
-	}
 
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 1b8e161ac527..828b2457f97b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -147,13 +147,9 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (uh->check) {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	if (uh->check)
 		uh->check = ~udp_v6_check(skb->len - nhoff, &ipv6h->saddr,
 					  &ipv6h->daddr, 0);
-	} else {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
-	}
 
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:07   ` Willem de Bruijn
  2018-10-19 14:25 ` [RFC PATCH v2 04/10] ip: factor out protocol delivery helper Paolo Abeni
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
datagram size. User-space can use such info to compute the original
packets layout.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
CHECK: should we use a separate setsockopt to explicitly enable
gso_size cmsg reception? So that user space can enable UDP_GRO and
fetch cmsg without forcefully receiving GRO related info.
---
 include/linux/udp.h | 11 +++++++++++
 net/ipv4/udp.c      |  4 ++++
 net/ipv6/udp.c      |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index f613b329852e..e23d5024f42f 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -121,6 +121,17 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
 	return udp_sk(sk)->no_check6_rx;
 }
 
+static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
+				 struct sk_buff *skb)
+{
+	int gso_size;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
+		gso_size = skb_shinfo(skb)->gso_size;
+		put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size);
+	}
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3c277378814f..2331ac9de954 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 		memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
 		*addr_len = sizeof(*sin);
 	}
+
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (inet->cmsg_flags)
 		ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8bb50ba32a6f..05f723b98cab 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -421,6 +421,9 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		*addr_len = sizeof(*sin6);
 	}
 
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (np->rxopt.all)
 		ip6_datagram_recv_common_ctl(sk, msg, skb);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 04/10] ip: factor out protocol delivery helper
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (2 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 05/10] ipv6: " Paolo Abeni
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

So that we can re-use it at the UDP lavel in a later patch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/ip_input.c | 73 ++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 35a786c0aaa0..72250b4e466d 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -188,51 +188,50 @@ bool ip_call_ra_chain(struct sk_buff *skb)
 	return false;
 }
 
-static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int protocol)
 {
-	__skb_pull(skb, skb_network_header_len(skb));
-
-	rcu_read_lock();
-	{
-		int protocol = ip_hdr(skb)->protocol;
-		const struct net_protocol *ipprot;
-		int raw;
+	const struct net_protocol *ipprot;
+	int raw, ret;
 
-	resubmit:
-		raw = raw_local_deliver(skb, protocol);
+resubmit:
+	raw = raw_local_deliver(skb, protocol);
 
-		ipprot = rcu_dereference(inet_protos[protocol]);
-		if (ipprot) {
-			int ret;
-
-			if (!ipprot->no_policy) {
-				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					kfree_skb(skb);
-					goto out;
-				}
-				nf_reset(skb);
+	ipprot = rcu_dereference(inet_protos[protocol]);
+	if (ipprot) {
+		if (!ipprot->no_policy) {
+			if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				kfree_skb(skb);
+				return;
 			}
-			ret = ipprot->handler(skb);
-			if (ret < 0) {
-				protocol = -ret;
-				goto resubmit;
+			nf_reset(skb);
+		}
+		ret = ipprot->handler(skb);
+		if (ret < 0) {
+			protocol = -ret;
+			goto resubmit;
+		}
+		__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+	} else {
+		if (!raw) {
+			if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
+				icmp_send(skb, ICMP_DEST_UNREACH,
+					  ICMP_PROT_UNREACH, 0);
 			}
-			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			kfree_skb(skb);
 		} else {
-			if (!raw) {
-				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
-					icmp_send(skb, ICMP_DEST_UNREACH,
-						  ICMP_PROT_UNREACH, 0);
-				}
-				kfree_skb(skb);
-			} else {
-				__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
-				consume_skb(skb);
-			}
+			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			consume_skb(skb);
 		}
 	}
- out:
+}
+
+static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	__skb_pull(skb, skb_network_header_len(skb));
+
+	rcu_read_lock();
+	ip_protocol_deliver_rcu(net, skb, ip_hdr(skb)->protocol);
 	rcu_read_unlock();
 
 	return 0;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 05/10] ipv6: factor out protocol delivery helper
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (3 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 04/10] ip: factor out protocol delivery helper Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

So that we can re-use it at the UDP lavel in the next patch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/ip6_input.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 96577e742afd..3065226bdc57 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -319,28 +319,26 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 /*
  *	Deliver the packet to the host
  */
-
-
-static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final)
 {
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
 	unsigned int nhoff;
-	int nexthdr;
 	bool raw;
-	bool have_final = false;
 
 	/*
 	 *	Parse extension headers
 	 */
 
-	rcu_read_lock();
 resubmit:
 	idev = ip6_dst_idev(skb_dst(skb));
-	if (!pskb_pull(skb, skb_transport_offset(skb)))
-		goto discard;
 	nhoff = IP6CB(skb)->nhoff;
-	nexthdr = skb_network_header(skb)[nhoff];
+	if (!have_final) {
+		if (!pskb_pull(skb, skb_transport_offset(skb)))
+			goto discard;
+		nexthdr = skb_network_header(skb)[nhoff];
+	}
 
 resubmit_final:
 	raw = raw6_local_deliver(skb, nexthdr);
@@ -411,13 +409,19 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 			consume_skb(skb);
 		}
 	}
-	rcu_read_unlock();
-	return 0;
+	return;
 
 discard:
 	__IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
-	rcu_read_unlock();
 	kfree_skb(skb);
+}
+
+static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	rcu_read_lock();
+	ip6_protocol_deliver_rcu(net, skb, 0, false);
+	rcu_read_unlock();
+
 	return 0;
 }
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (4 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 05/10] ipv6: " Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:08   ` Willem de Bruijn
                     ` (2 more replies)
  2018-10-19 14:25 ` [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program Paolo Abeni
                   ` (5 subsequent siblings)
  11 siblings, 3 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

In some scenarios, the GRO engine can assemble an UDP GRO packet
that ultimately lands on a non GRO-enabled socket.
This patch tries to address the issue explicitly checking for the UDP
socket features before enqueuing the packet, and eventually segmenting
the unexpected GRO packet, as needed.

We must also cope with re-insertion requests: after segmentation the
UDP code calls the helper introduced by the previous patches, as needed.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h | 23 +++++++++++++++++++++++
 net/ipv4/udp.c      | 25 ++++++++++++++++++++++++-
 net/ipv6/udp.c      | 27 ++++++++++++++++++++++++++-
 3 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index e23d5024f42f..19bcb396cd1b 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -132,6 +132,29 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
 	}
 }
 
+static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
+{
+	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
+	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
+}
+
+static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
+					      struct sk_buff *skb)
+{
+	struct sk_buff *segs;
+
+	/* the GSO CB lays after the UDP one, no need to save and restore any
+	 * CB fragment, just initialize it
+	 */
+	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
+	if (unlikely(IS_ERR(segs)))
+		kfree_skb(skb);
+	else if (segs)
+		consume_skb(skb);
+	return segs;
+}
+
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2331ac9de954..0d55145ce9f5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1909,7 +1909,7 @@ EXPORT_SYMBOL(udp_encap_enable);
  * Note that in the success and error cases, the skb is assumed to
  * have either been requeued or freed.
  */
-static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -2012,6 +2012,29 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
+
+static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udp_queue_rcv_one_skb(sk, skb);
+
+	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+		ret = udp_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb, -ret);
+	}
+	return 0;
+}
+
 /* For TCP sockets, sk_rx_dst is protected by socket lock
  * For UDP, we use xchg() to guard against concurrent changes.
  */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 05f723b98cab..d892c064657c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -558,7 +558,7 @@ void udpv6_encap_enable(void)
 }
 EXPORT_SYMBOL(udpv6_encap_enable);
 
-static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -641,6 +641,31 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final);
+
+static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udpv6_queue_rcv_one_skb(sk, skb);
+
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+
+		ret = udpv6_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
+						 true);
+	}
+	return 0;
+}
+
 static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 				   __be16 loc_port, const struct in6_addr *loc_addr,
 				   __be16 rmt_port, const struct in6_addr *rmt_addr,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (5 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:08   ` Willem de Bruijn
  2018-10-19 14:25 ` [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx Paolo Abeni
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

And fix a couple of buglets (port option processing,
clean termination on SIGINT). This is preparatory work
for GRO tests.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/udpgso_bench_rx.c | 37 +++++++++++++++----
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 727cf67a3f75..c55acdd1e27b 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,9 +31,15 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#ifndef UDP_GRO
+#define UDP_GRO		104
+#endif
+
 static int  cfg_port		= 8000;
 static bool cfg_tcp;
 static bool cfg_verify;
+static bool cfg_read_all;
+static bool cfg_gro_segment;
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -63,6 +69,8 @@ static void do_poll(int fd)
 
 	do {
 		ret = poll(&pfd, 1, 10);
+		if (interrupted)
+			break;
 		if (ret == -1)
 			error(1, errno, "poll");
 		if (ret == 0)
@@ -70,7 +78,7 @@ static void do_poll(int fd)
 		if (pfd.revents != POLLIN)
 			error(1, errno, "poll: 0x%x expected 0x%x\n",
 					pfd.revents, POLLIN);
-	} while (!ret && !interrupted);
+	} while (!ret);
 }
 
 static int do_socket(bool do_tcp)
@@ -102,6 +110,8 @@ static int do_socket(bool do_tcp)
 			error(1, errno, "listen");
 
 		do_poll(accept_fd);
+		if (interrupted)
+			exit(0);
 
 		fd = accept(accept_fd, NULL, NULL);
 		if (fd == -1)
@@ -167,10 +177,10 @@ static void do_verify_udp(const char *data, int len)
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
-	static char rbuf[ETH_DATA_LEN];
+	static char rbuf[65535];
 	int ret, len, budget = 256;
 
-	len = cfg_verify ? sizeof(rbuf) : 0;
+	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
 		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
@@ -178,7 +188,7 @@ static void do_flush_udp(int fd)
 			return;
 		if (ret == -1)
 			error(1, errno, "recv");
-		if (len) {
+		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
@@ -192,23 +202,30 @@ static void do_flush_udp(int fd)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-tv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "ptv")) != -1) {
+	while ((c = getopt(argc, argv, "Gp:rtvx:")) != -1) {
 		switch (c) {
+		case 'G':
+			cfg_gro_segment = true;
+			break;
 		case 'p':
-			cfg_port = htons(strtoul(optarg, NULL, 0));
+			cfg_port = strtoul(optarg, NULL, 0);
+			break;
+		case 'r':
+			cfg_read_all = true;
 			break;
 		case 't':
 			cfg_tcp = true;
 			break;
 		case 'v':
 			cfg_verify = true;
+			cfg_read_all = true;
 			break;
 		}
 	}
@@ -227,6 +244,12 @@ static void do_recv(void)
 
 	fd = do_socket(cfg_tcp);
 
+	if (cfg_gro_segment && !cfg_tcp) {
+		int val = 1;
+		if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &val, sizeof(val)))
+			error(1, errno, "setsockopt UDP_GRO");
+	}
+
 	treport = gettimeofday_ms() + 1000;
 	do {
 		do_poll(fd);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (6 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:09   ` Willem de Bruijn
  2018-10-19 14:25 ` [RFC PATCH v2 09/10] selftests: add some benchmark for UDP GRO Paolo Abeni
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

XDP support will be used by a later patch to test the GRO path
in a net namespace, leveraging the veth XDP implementation.
To avoid breaking existing setup, XDP support is conditionally
enabled and build only if llc is locally available.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          | 69 +++++++++++++++++++
 tools/testing/selftests/net/udpgso_bench_rx.c | 37 ++++++++++
 tools/testing/selftests/net/xdp_dummy.c       | 13 ++++
 3 files changed, 119 insertions(+)
 create mode 100644 tools/testing/selftests/net/xdp_dummy.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 256d82d5fa87..176459b7c4d6 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -16,8 +16,77 @@ TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
 KSFT_KHDR_INSTALL := 1
+
+# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
+#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+LLC ?= llc
+CLANG ?= clang
+LLVM_OBJCOPY ?= llvm-objcopy
+BTF_PAHOLE ?= pahole
+HAS_LLC := $(shell which $(LLC) 2>/dev/null)
+
+# conditional enable testes requiring llc
+ifneq (, $(HAS_LLC))
+TEST_GEN_FILES += xdp_dummy.o
+endif
+
 include ../lib.mk
 
+ifneq (, $(HAS_LLC))
+
+# Detect that we're cross compiling and use the cross compiler
+ifdef CROSS_COMPILE
+CLANG_ARCH_ARGS = -target $(ARCH)
+endif
+
+PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
+
+# Let newer LLVM versions transparently probe the kernel for availability
+# of full BPF instruction set.
+ifeq ($(PROBE),)
+  CPU ?= probe
+else
+  CPU ?= generic
+endif
+
+SRC_PATH := $(abspath ../../../..)
+LIB_PATH := $(SRC_PATH)/tools/lib
+XDP_CFLAGS := -D SUPPORT_XDP=1 -I$(LIB_PATH)
+LIBBPF = $(LIB_PATH)/bpf/libbpf.a
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
+CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
+        | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
+CLANG_FLAGS = -I. -I$(SRC_PATH)/include -I../bpf/ \
+	      $(CLANG_SYS_INCLUDES) -Wno-compare-distinct-pointer-types
+
+ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
+	CLANG_CFLAGS += -g
+	LLC_FLAGS += -mattr=dwarfris
+	DWARF2BTF = y
+endif
+
+$(LIBBPF): FORCE
+# Fix up variables inherited from Kbuild that tools/ build system won't like
+	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(SRC_PATH) O= $(nodir $@)
+
+$(OUTPUT)/udpgso_bench_rx: $(OUTPUT)/udpgso_bench_rx.c $(LIBBPF)
+	$(CC) -o $@ $(XDP_CFLAGS) $(CFLAGS) $(LOADLIBES) $(LDLIBS) $^ -lelf
+
+FORCE:
+
+# bpf program[s] generation
+$(OUTPUT)/%.o: %.c
+	$(CLANG) $(CLANG_FLAGS) \
+		 -O2 -target bpf -emit-llvm -c $< -o - |      \
+	$(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+	$(BTF_PAHOLE) -J $@
+endif
+
+endif
+
 $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
 $(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
 $(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index c55acdd1e27b..84f101852805 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,6 +31,10 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#ifdef SUPPORT_XDP
+#include "bpf/libbpf.h"
+#endif
+
 #ifndef UDP_GRO
 #define UDP_GRO		104
 #endif
@@ -40,6 +44,9 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+#ifdef SUPPORT_XDP
+static int cfg_xdp_iface;
+#endif
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -227,6 +234,13 @@ static void parse_opts(int argc, char **argv)
 			cfg_verify = true;
 			cfg_read_all = true;
 			break;
+#ifdef SUPPORT_XDP
+		case 'x':
+			cfg_xdp_iface = if_nametoindex(optarg);
+			if (!cfg_xdp_iface)
+				error(1, errno, "unknown interface %s", optarg);
+			break;
+#endif
 		}
 	}
 
@@ -240,6 +254,9 @@ static void parse_opts(int argc, char **argv)
 static void do_recv(void)
 {
 	unsigned long tnow, treport;
+#ifdef SUPPORT_XDP
+	int prog_fd = -1;
+#endif
 	int fd;
 
 	fd = do_socket(cfg_tcp);
@@ -250,6 +267,22 @@ static void do_recv(void)
 			error(1, errno, "setsockopt UDP_GRO");
 	}
 
+#ifdef SUPPORT_XDP
+	if (cfg_xdp_iface) {
+		struct bpf_prog_load_attr prog_load_attr = {
+			.prog_type	= BPF_PROG_TYPE_XDP,
+			.file 		= "xdp_dummy.o",
+		};
+		struct bpf_object *obj;
+
+		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+			error(1, errno, "xdp program load failed\n");
+
+		if (bpf_set_link_xdp_fd(cfg_xdp_iface, prog_fd, 0) < 0)
+			error(1, errno, "link set xdp fd failed\n");
+	}
+#endif
+
 	treport = gettimeofday_ms() + 1000;
 	do {
 		do_poll(fd);
@@ -274,6 +307,10 @@ static void do_recv(void)
 
 	if (close(fd))
 		error(1, errno, "close");
+#ifdef SUPPORT_XDP
+	if (cfg_xdp_iface && bpf_set_link_xdp_fd(cfg_xdp_iface, -1, 0))
+		error(1, errno, "removing xdp program");
+#endif
 }
 
 int main(int argc, char **argv)
diff --git a/tools/testing/selftests/net/xdp_dummy.c b/tools/testing/selftests/net/xdp_dummy.c
new file mode 100644
index 000000000000..1a64cf5099ed
--- /dev/null
+++ b/tools/testing/selftests/net/xdp_dummy.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define KBUILD_MODNAME "xdp_dummy"
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("xdp_dummy")
+int xdp_dummy_prog(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 09/10] selftests: add some benchmark for UDP GRO
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (7 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-19 14:25 ` [RFC PATCH v2 10/10] selftests: add functionals test " Paolo Abeni
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

Run on top of veth pair, using a dummy XDP program to enable the GRO.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile        |  1 +
 tools/testing/selftests/net/udpgro_bench.sh | 92 +++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 176459b7c4d6..ac999354af54 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,6 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
+TEST_PROGS += udpgro_bench.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
new file mode 100755
index 000000000000..03d37e5e7424
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro benchmarks
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -INT ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r -x veth1 &
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -t ${rx_args} -r &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+}
+
+run_in_netns() {
+	local -r args=$@
+
+	./in_netns.sh $0 __subprocess ${args}
+}
+
+run_udp() {
+	local -r args=$@
+
+	echo "udp gso - over veth touching data"
+	run_in_netns ${args} -S rx
+
+	echo "udp gso and gro - over veth touching data"
+	run_in_netns ${args} -S rx -G
+}
+
+run_tcp() {
+	local -r args=$@
+
+	echo "tcp - over veth touching data"
+	run_in_netns ${args} -t rx
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv4_args}"
+
+	echo "ipv6"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv6_args}"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO benchmarks - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+else
+	run_in_netns $@
+fi
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v2 10/10] selftests: add functionals test for UDP GRO
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (8 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 09/10] selftests: add some benchmark for UDP GRO Paolo Abeni
@ 2018-10-19 14:25 ` Paolo Abeni
  2018-10-21 20:09   ` Willem de Bruijn
  2018-10-21 20:05 ` [RFC PATCH v2 00/10] udp: implement GRO support Willem de Bruijn
  2018-10-23 12:10 ` Steffen Klassert
  11 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert

Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/udpgro.sh         | 144 ++++++++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 125 +++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
 6 files changed, 281 insertions(+), 22 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index ac999354af54..a8a0d256aafb 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,7 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
-TEST_PROGS += udpgro_bench.sh
+TEST_PROGS += udpgro_bench.sh udpgro.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh
new file mode 100755
index 000000000000..eb380c7babf0
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro.sh
@@ -0,0 +1,144 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro functional tests.
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+cfg_veth() {
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+}
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	cfg_veth
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} && \
+		echo "ok" || \
+		echo "failed" &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	wait $(jobs -p)
+}
+
+run_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess $2 rx -G -r -x veth1 $3
+}
+
+run_one_nat() {
+	# use 'rx' as separator between sender args and receiver args
+	local addr1 addr2 pid family="" ipt_cmd=ip6tables
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	if [[ ${tx_args} = *-4* ]]; then
+		ipt_cmd=iptables
+		family=-4
+		addr1=192.168.1.1
+		addr2=192.168.1.3/24
+	else
+		addr1=2001:db8::1
+		addr2="2001:db8::3/64 nodad"
+	fi
+
+	cfg_veth
+	ip -netns "${PEER_NS}" addr add dev veth1 ${addr2}
+
+	# fool the GRO engine changing the destination address ...
+	ip netns exec "${PEER_NS}" $ipt_cmd -t nat -I PREROUTING -d ${addr1} -j DNAT --to-destination ${addr2%/*}
+
+	# ... so that GRO will match the UDP_GRO enabled socket, but packets
+	# will land on the 'plain' one
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -x veth1 -b ${addr1} -n 0 &
+	pid=$!
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${family} -b ${addr2%/*} ${rx_args} && \
+		echo "ok" || \
+		echo "failed"&
+
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	kill -INT $pid
+	wait $(jobs -p)
+}
+
+run_nat_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess_nat $2 rx -r $3
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_test "no GRO" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
+
+	# the GSO packets are aggregated because:
+	# * veth schedule napi after each xmit
+	# * segmentation happens in BH context, veth napi poll is delayed after
+	#   the transmission of the last segment
+	run_test "GRO" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720"
+	run_test "GRO chk cmsg" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720 -S 1472"
+	run_test "GRO with custom segment size" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720"
+	run_test "GRO with custom segment size cmsg" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv4_args} -M 1 -s 14720 -S 0" "-n 10 -l 1472"
+
+	echo "ipv6"
+	run_test "no GRO" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"
+	run_test "GRO" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520"
+	run_test "GRO chk cmsg" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520 -S 1452"
+	run_test "GRO with custom segment size" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520"
+	run_test "GRO with custom segment size cmsg" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 10 -l 1452"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO benchmarks - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+elif [[ $1 == "__subprocess_nat" ]]; then
+	shift
+	run_one_nat $@
+fi
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
index 03d37e5e7424..77a1fb0ae0bc 100755
--- a/tools/testing/selftests/net/udpgro_bench.sh
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -18,7 +18,9 @@ run_one() {
 	# use 'rx' as separator between sender args and receiver args
 	local -r all="$@"
 	local -r tx_args=${all%rx*}
-	local -r rx_args=${all#*rx}
+	local rx_args=${all#*rx}
+
+	[[ "${tx_args}" == *"-4"* ]] && rx_args="${rx_args} -4"
 
 	ip netns add "${PEER_NS}"
 	ip -netns "${PEER_NS}" link set lo up
@@ -50,10 +52,10 @@ run_udp() {
 	local -r args=$@
 
 	echo "udp gso - over veth touching data"
-	run_in_netns ${args} -S rx
+	run_in_netns ${args} -S 0 rx
 
 	echo "udp gso and gro - over veth touching data"
-	run_in_netns ${args} -S rx -G
+	run_in_netns ${args} -S 0 rx -G
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh
index 99e537ab5ad9..0f0628613f81 100755
--- a/tools/testing/selftests/net/udpgso_bench.sh
+++ b/tools/testing/selftests/net/udpgso_bench.sh
@@ -34,7 +34,7 @@ run_udp() {
 	run_in_netns ${args}
 
 	echo "udp gso"
-	run_in_netns ${args} -S
+	run_in_netns ${args} -S 0
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 84f101852805..5fc7c4636753 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -44,6 +44,12 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+static int  cfg_family		= PF_INET6;
+static int  cfg_alen 		= sizeof(struct sockaddr_in6);
+static int  cfg_expected_pkt_nr;
+static int  cfg_expected_pkt_len;
+static int  cfg_expected_gso_size;
+static struct sockaddr_storage cfg_bind_addr;
 #ifdef SUPPORT_XDP
 static int cfg_xdp_iface;
 #endif
@@ -57,6 +63,29 @@ static void sigint_handler(int signum)
 		interrupted = true;
 }
 
+static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
@@ -90,10 +119,9 @@ static void do_poll(int fd)
 
 static int do_socket(bool do_tcp)
 {
-	struct sockaddr_in6 addr = {0};
 	int fd, val;
 
-	fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+	fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
 	if (fd == -1)
 		error(1, errno, "socket");
 
@@ -104,10 +132,7 @@ static int do_socket(bool do_tcp)
 	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
 		error(1, errno, "setsockopt reuseport");
 
-	addr.sin6_family =	PF_INET6;
-	addr.sin6_port =	htons(cfg_port);
-	addr.sin6_addr =	in6addr_any;
-	if (bind(fd, (void *) &addr, sizeof(addr)))
+	if (bind(fd, (void *)&cfg_bind_addr, cfg_alen))
 		error(1, errno, "bind");
 
 	if (do_tcp) {
@@ -181,52 +206,130 @@ static void do_verify_udp(const char *data, int len)
 	}
 }
 
+static int recv_msg(int fd, char *buf, int len, int *gso_size)
+{
+	char control[CMSG_SPACE(sizeof(uint16_t))] = {0};
+	struct msghdr msg = {0};
+	struct iovec iov = {0};
+	struct cmsghdr *cmsg;
+	uint16_t *gsosizeptr;
+	int ret;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+
+	msg.msg_control = control;
+	msg.msg_controllen = sizeof(control);
+
+	*gso_size = -1;
+	ret = recvmsg(fd, &msg, MSG_TRUNC | MSG_DONTWAIT);
+	if (ret != -1) {
+		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
+		     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+			if (cmsg->cmsg_level == SOL_UDP
+			    && cmsg->cmsg_type == UDP_GRO) {
+				gsosizeptr = (uint16_t *) CMSG_DATA(cmsg);
+				*gso_size = *gsosizeptr;
+				break;
+			}
+		}
+	}
+	return ret;
+}
+
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
 	static char rbuf[65535];
-	int ret, len, budget = 256;
+	int ret, len, gso_size, budget = 256;
 
 	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
-		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (!cfg_expected_gso_size)
+			ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		else
+			ret = recv_msg(fd, rbuf, len, &gso_size);
 		if (ret == -1 && errno == EAGAIN)
-			return;
+			break;
 		if (ret == -1)
 			error(1, errno, "recv");
+		if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
+			error(1, 0, "recv: bad packet len, got %d,"
+			      " expected %d\n", ret, cfg_expected_pkt_len);
 		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
 			do_verify_udp(rbuf, ret);
 		}
+		if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+			error(1, 0, "recv: bad gso size, got %d, expected %d "
+			      "(-1 == no gso cmsg))\n", gso_size,
+			      cfg_expected_gso_size);
 
 		packets++;
 		bytes += ret;
+		if (cfg_expected_pkt_nr && packets >= cfg_expected_pkt_nr)
+			break;
+	}
+
+	if (cfg_expected_pkt_nr) {
+		/* stop polling and check the received pkts nr */
+		interrupted = true;
+
+		if (packets != cfg_expected_pkt_nr)
+			error(1, 0, "recv: missing packets! got %ld packets, "
+			      "expected %d\n", packets, cfg_expected_pkt_nr);
+		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (ret == 0)
+			error(1, 0, "recv: unexpected packets after %d\n",
+			      cfg_expected_pkt_nr);
 	}
 }
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-b addr] [-p port] [-l pktlen] [-n packetnr] [-S gsosize]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "Gp:rtvx:")) != -1) {
+	/* bind to any by default */
+	setup_sockaddr(PF_INET6, "::", &cfg_bind_addr);
+	while ((c = getopt(argc, argv, "4b:Gl:n:p:rS:tvx:")) != -1) {
 		switch (c) {
+		case '4':
+			cfg_family = PF_INET;
+			cfg_alen = sizeof(struct sockaddr_in);
+			setup_sockaddr(PF_INET, "0.0.0.0", &cfg_bind_addr);
+			break;
+		case 'b':
+			setup_sockaddr(cfg_family, optarg, &cfg_bind_addr);
+			break;
 		case 'G':
 			cfg_gro_segment = true;
 			break;
+		case 'l':
+			cfg_expected_pkt_len = strtoul(optarg, NULL, 0);
+			break;
+		case 'n':
+			cfg_expected_pkt_nr = strtoul(optarg, NULL, 0);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
 		case 'r':
 			cfg_read_all = true;
 			break;
+		case 'S':
+			cfg_expected_gso_size = strtol(optarg, NULL, 0);
+			break;
 		case 't':
 			cfg_tcp = true;
 			break;
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
index e821564053cf..4074538b5df5 100644
--- a/tools/testing/selftests/net/udpgso_bench_tx.c
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -52,6 +52,8 @@ static bool	cfg_segment;
 static bool	cfg_sendmmsg;
 static bool	cfg_tcp;
 static bool	cfg_zerocopy;
+static int	cfg_msg_nr;
+static uint16_t	cfg_gso_size;
 
 static socklen_t cfg_alen;
 static struct sockaddr_storage cfg_dst_addr;
@@ -205,14 +207,14 @@ static void send_udp_segment_cmsg(struct cmsghdr *cm)
 
 	cm->cmsg_level = SOL_UDP;
 	cm->cmsg_type = UDP_SEGMENT;
-	cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss));
+	cm->cmsg_len = CMSG_LEN(sizeof(cfg_gso_size));
 	valp = (void *)CMSG_DATA(cm);
-	*valp = cfg_mss;
+	*valp = cfg_gso_size;
 }
 
 static int send_udp_segment(int fd, char *data)
 {
-	char control[CMSG_SPACE(sizeof(cfg_mss))] = {0};
+	char control[CMSG_SPACE(sizeof(cfg_gso_size))] = {0};
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
 	int ret;
@@ -241,7 +243,7 @@ static int send_udp_segment(int fd, char *data)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
+	error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-m messagenr] [-p port] [-s sendsize] [-S gsosize]",
 		    filepath);
 }
 
@@ -250,7 +252,7 @@ static void parse_opts(int argc, char **argv)
 	int max_len, hdrlen;
 	int c;
 
-	while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) {
+	while ((c = getopt(argc, argv, "46cC:D:l:mM:p:s:S:tuz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -279,6 +281,9 @@ static void parse_opts(int argc, char **argv)
 		case 'm':
 			cfg_sendmmsg = true;
 			break;
+		case 'M':
+			cfg_msg_nr = strtoul(optarg, NULL, 10);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
@@ -286,6 +291,7 @@ static void parse_opts(int argc, char **argv)
 			cfg_payload_len = strtoul(optarg, NULL, 0);
 			break;
 		case 'S':
+			cfg_gso_size = strtoul(optarg, NULL, 0);
 			cfg_segment = true;
 			break;
 		case 't':
@@ -317,6 +323,8 @@ static void parse_opts(int argc, char **argv)
 
 	cfg_mss = ETH_DATA_LEN - hdrlen;
 	max_len = ETH_MAX_MTU - hdrlen;
+	if (!cfg_gso_size)
+		cfg_gso_size = cfg_mss;
 
 	if (cfg_payload_len > max_len)
 		error(1, 0, "payload length %u exceeds max %u",
@@ -392,10 +400,12 @@ int main(int argc, char **argv)
 		else
 			num_sends += send_udp(fd, buf[i]);
 		num_msgs++;
-
 		if (cfg_zerocopy && ((num_msgs & 0xF) == 0))
 			flush_zerocopy(fd);
 
+		if (cfg_msg_nr && num_msgs >= cfg_msg_nr)
+			break;
+
 		tnow = gettimeofday_ms();
 		if (tnow > treport) {
 			fprintf(stderr,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 00/10] udp: implement GRO support
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (9 preceding siblings ...)
  2018-10-19 14:25 ` [RFC PATCH v2 10/10] selftests: add functionals test " Paolo Abeni
@ 2018-10-21 20:05 ` Willem de Bruijn
  2018-10-22  9:41   ` Paolo Abeni
  2018-10-23 12:10 ` Steffen Klassert
  11 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:05 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> This series implements GRO support for UDP sockets, as the RX counterpart
> of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> The core functionality is implemented by the second patch, introducing a new
> sockopt to enable UDP_GRO, while patch 3 implements support for passing the
> segment size to the user space via a new cmsg.
> UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
> directed to UDP GRO enabled sockets with constant l4 tuple.
>
> UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
> rules, and that could potentially confuse existing applications.

Good catch.

> The solution adopted here is to de-segment the GRO packet before enqueuing
> as needed. Since we must cope with packet reinsertion after de-segmentation,
> the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
> to UDP usage.
>
> While the current code can probably be improved, this safeguard ,implemented in
> the patches 4-7, allows future enachements to enable UDP GSO offload on more
> virtual devices eventually even on forwarded packets.
>
> The last 4 for patches implement some performance and functional self-tests,
> re-using the existing udpgso infrastructure. The problematic scenario described
> above is explicitly tested.

This looks awesome! Impressive testing, too.

A few comments in the individual patches, mostly minor.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-19 14:25 ` [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets Paolo Abeni
@ 2018-10-21 20:06   ` Willem de Bruijn
  2018-10-22 10:13     ` Paolo Abeni
  2018-10-22 11:24   ` Steffen Klassert
  1 sibling, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:06 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> eligible for GRO in the rx path: UDP segments directed to such socket
> are assembled into a larger GSO_UDP_L4 packet.
>
> The core UDP GRO support is enabled with setsockopt(UDP_GRO).
>
> Initial benchmark numbers:
>
> Before:
> udp rx:   1079 MB/s   769065 calls/s
>
> After:
> udp rx:   1466 MB/s    24877 calls/s
>
>
> This change introduces a side effect in respect to UDP tunnels:
> after a UDP tunnel creation, now the kernel performs a lookup per ingress
> UDP packet, while before such lookup happened only if the ingress packet
> carried a valid internal header csum.
>
> v1 -> v2:
>  - use a new option to enable UDP GRO
>  - use static keys to protect the UDP GRO socket lookup
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/udp.h      |   3 +-
>  include/uapi/linux/udp.h |   1 +
>  net/ipv4/udp.c           |   7 +++
>  net/ipv4/udp_offload.c   | 109 +++++++++++++++++++++++++++++++--------
>  net/ipv6/udp_offload.c   |   6 +--
>  5 files changed, 98 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index a4dafff407fb..f613b329852e 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -50,11 +50,12 @@ struct udp_sock {
>         __u8             encap_type;    /* Is this an Encapsulation socket? */
>         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
>                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> -                        encap_enabled:1; /* This socket enabled encap
> +                        encap_enabled:1, /* This socket enabled encap
>                                            * processing; UDP tunnels and
>                                            * different encapsulation layer set
>                                            * this
>                                            */
> +                        gro_enabled:1; /* Can accept GRO packets */
>
>         /*
>          * Following member retains the information to create a UDP header
>          * when the socket is uncorked.
> diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> index 09502de447f5..30baccb6c9c4 100644
> --- a/include/uapi/linux/udp.h
> +++ b/include/uapi/linux/udp.h
> @@ -33,6 +33,7 @@ struct udphdr {
>  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
>  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
>  #define UDP_SEGMENT    103     /* Set GSO segmentation size */
> +#define UDP_GRO                104     /* This socket can receive UDP GRO packets */
>
>  /* UDP encapsulation types */
>  #define UDP_ENCAP_ESPINUDP_NON_IKE     1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 9fcb5374e166..3c277378814f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -115,6 +115,7 @@
>  #include "udp_impl.h"
>  #include <net/sock_reuseport.h>
>  #include <net/addrconf.h>
> +#include <net/udp_tunnel.h>
>
>  struct udp_table udp_table __read_mostly;
>  EXPORT_SYMBOL(udp_table);
> @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
>                 up->gso_size = val;
>                 break;
>
> +       case UDP_GRO:
> +               if (valbool)
> +                       udp_tunnel_encap_enable(sk->sk_socket);
> +               up->gro_enabled = valbool;

The socket lock is not held here, so multiple updates to
up->gro_enabled and the up->encap_enabled and the static branch can
race. Syzkaller is adept at generating those.

> +#define UDO_GRO_CNT_MAX 64
> +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> +                                              struct sk_buff *skb)
> +{
> +       struct udphdr *uh = udp_hdr(skb);
> +       struct sk_buff *pp = NULL;
> +       struct udphdr *uh2;
> +       struct sk_buff *p;
> +
> +       /* requires non zero csum, for simmetry with GSO */

symmetry

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg
  2018-10-19 14:25 ` [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg Paolo Abeni
@ 2018-10-21 20:07   ` Willem de Bruijn
  2018-10-22 15:44     ` Paolo Abeni
  0 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:07 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
> datagram size. User-space can use such info to compute the original
> packets layout.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> CHECK: should we use a separate setsockopt to explicitly enable
> gso_size cmsg reception? So that user space can enable UDP_GRO and
> fetch cmsg without forcefully receiving GRO related info.

A user can avoid the message by not passing control data. Though in
most practical cases it seems unsafe to do so, anyway, as the path MTU
can be lower than the expected device MTU.

> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 3c277378814f..2331ac9de954 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
>                 memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
>                 *addr_len = sizeof(*sin);
>         }
> +
> +       if (udp_sk(sk)->gro_enabled)
> +               udp_cmsg_recv(msg, sk, skb);
> +

Perhaps we can avoid adding a branch by setting a bit in
inet->cmsg_flags for gso_size to let the below branch handle the cmsg
processing.

I'd still set that as part of the UDP_GRO setsockopt. Though if you
insist it could be a value 2 instead of 1, effectively allowing the
above mentioned opt-out.

>         if (inet->cmsg_flags)
>                 ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
@ 2018-10-21 20:08   ` Willem de Bruijn
  2018-10-22 10:29     ` Paolo Abeni
  2018-10-22 11:43   ` Steffen Klassert
  2018-10-22 19:04   ` Subash Abhinov Kasiviswanathan
  2 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:08 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> In some scenarios, the GRO engine can assemble an UDP GRO packet
> that ultimately lands on a non GRO-enabled socket.
> This patch tries to address the issue explicitly checking for the UDP
> socket features before enqueuing the packet, and eventually segmenting
> the unexpected GRO packet, as needed.
>
> We must also cope with re-insertion requests: after segmentation the
> UDP code calls the helper introduced by the previous patches, as needed.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

> +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> +                                             struct sk_buff *skb)
> +{
> +       struct sk_buff *segs;
> +
> +       /* the GSO CB lays after the UDP one, no need to save and restore any
> +        * CB fragment, just initialize it
> +        */
> +       segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> +       if (unlikely(IS_ERR(segs)))
> +               kfree_skb(skb);
> +       else if (segs)
> +               consume_skb(skb);
> +       return segs;
> +}
> +
> +
> +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
> +
> +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +       struct sk_buff *next, *segs;
> +       int ret;
> +
> +       if (likely(!udp_unexpected_gso(sk, skb)))
> +               return udp_queue_rcv_one_skb(sk, skb);
> +
> +       BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> +       __skb_push(skb, -skb_mac_offset(skb));
> +       segs = udp_rcv_segment(sk, skb);
> +       for (skb = segs; skb; skb = next) {

need to check IS_ERR(segs) again?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program
  2018-10-19 14:25 ` [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program Paolo Abeni
@ 2018-10-21 20:08   ` Willem de Bruijn
  2018-10-22 10:31     ` Paolo Abeni
  0 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:08 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> And fix a couple of buglets (port option processing,
> clean termination on SIGINT). This is preparatory work
> for GRO tests.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  tools/testing/selftests/net/udpgso_bench_rx.c | 37 +++++++++++++++----
>  1 file changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c

> @@ -167,10 +177,10 @@ static void do_verify_udp(const char *data, int len)
>  /* Flush all outstanding datagrams. Verify first few bytes of each. */
>  static void do_flush_udp(int fd)
>  {
> -       static char rbuf[ETH_DATA_LEN];
> +       static char rbuf[65535];

we can use ETH_MAX_MTU.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx
  2018-10-19 14:25 ` [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx Paolo Abeni
@ 2018-10-21 20:09   ` Willem de Bruijn
  2018-10-22 10:37     ` Paolo Abeni
  0 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:09 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> XDP support will be used by a later patch to test the GRO path
> in a net namespace, leveraging the veth XDP implementation.
> To avoid breaking existing setup, XDP support is conditionally
> enabled and build only if llc is locally available.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 256d82d5fa87..176459b7c4d6 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -16,8 +16,77 @@ TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
>  TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
>
>  KSFT_KHDR_INSTALL := 1
> +
> +# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
> +#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> +LLC ?= llc
> +CLANG ?= clang
> +LLVM_OBJCOPY ?= llvm-objcopy
> +BTF_PAHOLE ?= pahole
> +HAS_LLC := $(shell which $(LLC) 2>/dev/null)
> +
> +# conditional enable testes requiring llc
> +ifneq (, $(HAS_LLC))
> +TEST_GEN_FILES += xdp_dummy.o
> +endif
> +
>  include ../lib.mk
>
> +ifneq (, $(HAS_LLC))
> +
> +# Detect that we're cross compiling and use the cross compiler
> +ifdef CROSS_COMPILE
> +CLANG_ARCH_ARGS = -target $(ARCH)
> +endif
> +
> +PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
> +
> +# Let newer LLVM versions transparently probe the kernel for availability
> +# of full BPF instruction set.
> +ifeq ($(PROBE),)
> +  CPU ?= probe
> +else
> +  CPU ?= generic
> +endif
> +
> +SRC_PATH := $(abspath ../../../..)
> +LIB_PATH := $(SRC_PATH)/tools/lib
> +XDP_CFLAGS := -D SUPPORT_XDP=1 -I$(LIB_PATH)
> +LIBBPF = $(LIB_PATH)/bpf/libbpf.a
> +BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
> +BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
> +BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
> +CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
> +        | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
> +CLANG_FLAGS = -I. -I$(SRC_PATH)/include -I../bpf/ \
> +             $(CLANG_SYS_INCLUDES) -Wno-compare-distinct-pointer-types
> +
> +ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
> +       CLANG_CFLAGS += -g
> +       LLC_FLAGS += -mattr=dwarfris
> +       DWARF2BTF = y
> +endif
> +
> +$(LIBBPF): FORCE
> +# Fix up variables inherited from Kbuild that tools/ build system won't like
> +       $(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(SRC_PATH) O= $(nodir $@)
> +

This is a lot of XDP specific code. Not for this patchset, per se, but
would be nice if we can reuse the logic in selftests/bpf for all this.

> --- a/tools/testing/selftests/net/udpgso_bench_rx.c
> @@ -227,6 +234,13 @@ static void parse_opts(int argc, char **argv)
>                         cfg_verify = true;
>                         cfg_read_all = true;
>                         break;
> +#ifdef SUPPORT_XDP
> +               case 'x':
> +                       cfg_xdp_iface = if_nametoindex(optarg);
> +                       if (!cfg_xdp_iface)
> +                               error(1, errno, "unknown interface %s", optarg);
> +                       break;
> +#endif

nit: needs to be added to getopt string in this patch.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 10/10] selftests: add functionals test for UDP GRO
  2018-10-19 14:25 ` [RFC PATCH v2 10/10] selftests: add functionals test " Paolo Abeni
@ 2018-10-21 20:09   ` Willem de Bruijn
  2018-10-22 10:46     ` Paolo Abeni
  0 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-21 20:09 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Extends the existing udp programs to allow checking for proper
> GRO aggregation/GSO size, and run the tests via a shell script, using
> a veth pair with XDP program attached to trigger the GRO code path.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  tools/testing/selftests/net/Makefile          |   2 +-
>  tools/testing/selftests/net/udpgro.sh         | 144 ++++++++++++++++++
>  tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
>  tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
>  tools/testing/selftests/net/udpgso_bench_rx.c | 125 +++++++++++++--
>  tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
>  6 files changed, 281 insertions(+), 22 deletions(-)
>  create mode 100755 tools/testing/selftests/net/udpgro.sh
>

> diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh

> +       run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
> +       run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"

why expected segment size -1 in these two?

> diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
>  static void usage(const char *filepath)
>  {
> -       error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
> +       error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-m messagenr] [-p port] [-s sendsize] [-S gsosize]",
>                     filepath);

missing -M

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 00/10] udp: implement GRO support
  2018-10-21 20:05 ` [RFC PATCH v2 00/10] udp: implement GRO support Willem de Bruijn
@ 2018-10-22  9:41   ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22  9:41 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

Hi all,

On Sun, 2018-10-21 at 16:05 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > This series implements GRO support for UDP sockets, as the RX counterpart
> > of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > The core functionality is implemented by the second patch, introducing a new
> > sockopt to enable UDP_GRO, while patch 3 implements support for passing the
> > segment size to the user space via a new cmsg.
> > UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
> > directed to UDP GRO enabled sockets with constant l4 tuple.
> > 
> > UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
> > rules, and that could potentially confuse existing applications.
> 
> Good catch.
> 
> > The solution adopted here is to de-segment the GRO packet before enqueuing
> > as needed. Since we must cope with packet reinsertion after de-segmentation,
> > the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
> > to UDP usage.
> > 
> > While the current code can probably be improved, this safeguard ,implemented in
> > the patches 4-7, allows future enachements to enable UDP GSO offload on more
> > virtual devices eventually even on forwarded packets.
> > 
> > The last 4 for patches implement some performance and functional self-tests,
> > re-using the existing udpgso infrastructure. The problematic scenario described
> > above is explicitly tested.
> 
> This looks awesome! Impressive testing, too.
> 
> A few comments in the individual patches, mostly minor.

Thank you for the in-depth review! (in the WE ;)

I'll try to address the comments on each patch individually.

In the end I rushed a bit this RFC, because the misdirection issue (and
the tentative fix) bothered me more than a bit: I wanted to check I was
not completely out-of-track.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-21 20:06   ` Willem de Bruijn
@ 2018-10-22 10:13     ` Paolo Abeni
  2018-10-22 15:15       ` Willem de Bruijn
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 10:13 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:06 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> > with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> > eligible for GRO in the rx path: UDP segments directed to such socket
> > are assembled into a larger GSO_UDP_L4 packet.
> > 
> > The core UDP GRO support is enabled with setsockopt(UDP_GRO).
> > 
> > Initial benchmark numbers:
> > 
> > Before:
> > udp rx:   1079 MB/s   769065 calls/s
> > 
> > After:
> > udp rx:   1466 MB/s    24877 calls/s
> > 
> > 
> > This change introduces a side effect in respect to UDP tunnels:
> > after a UDP tunnel creation, now the kernel performs a lookup per ingress
> > UDP packet, while before such lookup happened only if the ingress packet
> > carried a valid internal header csum.
> > 
> > v1 -> v2:
> >  - use a new option to enable UDP GRO
> >  - use static keys to protect the UDP GRO socket lookup
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/linux/udp.h      |   3 +-
> >  include/uapi/linux/udp.h |   1 +
> >  net/ipv4/udp.c           |   7 +++
> >  net/ipv4/udp_offload.c   | 109 +++++++++++++++++++++++++++++++--------
> >  net/ipv6/udp_offload.c   |   6 +--
> >  5 files changed, 98 insertions(+), 28 deletions(-)
> > 
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index a4dafff407fb..f613b329852e 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -50,11 +50,12 @@ struct udp_sock {
> >         __u8             encap_type;    /* Is this an Encapsulation socket? */
> >         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
> >                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> > -                        encap_enabled:1; /* This socket enabled encap
> > +                        encap_enabled:1, /* This socket enabled encap
> >                                            * processing; UDP tunnels and
> >                                            * different encapsulation layer set
> >                                            * this
> >                                            */
> > +                        gro_enabled:1; /* Can accept GRO packets */
> > 
> >         /*
> >          * Following member retains the information to create a UDP header
> >          * when the socket is uncorked.
> > diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> > index 09502de447f5..30baccb6c9c4 100644
> > --- a/include/uapi/linux/udp.h
> > +++ b/include/uapi/linux/udp.h
> > @@ -33,6 +33,7 @@ struct udphdr {
> >  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
> >  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
> >  #define UDP_SEGMENT    103     /* Set GSO segmentation size */
> > +#define UDP_GRO                104     /* This socket can receive UDP GRO packets */
> > 
> >  /* UDP encapsulation types */
> >  #define UDP_ENCAP_ESPINUDP_NON_IKE     1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 9fcb5374e166..3c277378814f 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -115,6 +115,7 @@
> >  #include "udp_impl.h"
> >  #include <net/sock_reuseport.h>
> >  #include <net/addrconf.h>
> > +#include <net/udp_tunnel.h>
> > 
> >  struct udp_table udp_table __read_mostly;
> >  EXPORT_SYMBOL(udp_table);
> > @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> >                 up->gso_size = val;
> >                 break;
> > 
> > +       case UDP_GRO:
> > +               if (valbool)
> > +                       udp_tunnel_encap_enable(sk->sk_socket);
> > +               up->gro_enabled = valbool;
> 
> The socket lock is not held here, so multiple updates to
> up->gro_enabled and the up->encap_enabled and the static branch can
> race. Syzkaller is adept at generating those.

Good catch. I was fooled by the current existing code. I think there
are potentially similar issues for UDP_ENCAP, UDPLITE_SEND_CSCOV, ...

Since the rx path don't take it anymore and we don't risk starving, I
think we should could/always acquire the socket lock on setsockopt,
wdyt?

> > +#define UDO_GRO_CNT_MAX 64
> > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > +                                              struct sk_buff *skb)
> > +{
> > +       struct udphdr *uh = udp_hdr(skb);
> > +       struct sk_buff *pp = NULL;
> > +       struct udphdr *uh2;
> > +       struct sk_buff *p;
> > +
> > +       /* requires non zero csum, for simmetry with GSO */
> 
> symmetry

Thanks ;)

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-21 20:08   ` Willem de Bruijn
@ 2018-10-22 10:29     ` Paolo Abeni
  2018-10-22 16:00       ` Willem de Bruijn
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 10:29 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:08 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > In some scenarios, the GRO engine can assemble an UDP GRO packet
> > that ultimately lands on a non GRO-enabled socket.
> > This patch tries to address the issue explicitly checking for the UDP
> > socket features before enqueuing the packet, and eventually segmenting
> > the unexpected GRO packet, as needed.
> > 
> > We must also cope with re-insertion requests: after segmentation the
> > UDP code calls the helper introduced by the previous patches, as needed.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > +                                             struct sk_buff *skb)
> > +{
> > +       struct sk_buff *segs;
> > +
> > +       /* the GSO CB lays after the UDP one, no need to save and restore any
> > +        * CB fragment, just initialize it
> > +        */
> > +       segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> > +       if (unlikely(IS_ERR(segs)))
> > +               kfree_skb(skb);
> > +       else if (segs)
> > +               consume_skb(skb);
> > +       return segs;
> > +}
> > +
> > +
> > +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
> > +
> > +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +       struct sk_buff *next, *segs;
> > +       int ret;
> > +
> > +       if (likely(!udp_unexpected_gso(sk, skb)))
> > +               return udp_queue_rcv_one_skb(sk, skb);
> > +
> > +       BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> > +       __skb_push(skb, -skb_mac_offset(skb));
> > +       segs = udp_rcv_segment(sk, skb);
> > +       for (skb = segs; skb; skb = next) {
> 
> need to check IS_ERR(segs) again?

whooops ... yes, I think so, thanks for catching it.

Since the error code is always discarded, perhpas udp_rcv_segment() can
simply return 0 when IS_ERR(segs) is true, so we can save a conditional
here. This is currently a slower/exceptional path, but if we will
enable UDP GRO for forwaded packets, it will be hit often.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program
  2018-10-21 20:08   ` Willem de Bruijn
@ 2018-10-22 10:31     ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 10:31 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:08 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > And fix a couple of buglets (port option processing,
> > clean termination on SIGINT). This is preparatory work
> > for GRO tests.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  tools/testing/selftests/net/udpgso_bench_rx.c | 37 +++++++++++++++----
> >  1 file changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
> > @@ -167,10 +177,10 @@ static void do_verify_udp(const char *data, int len)
> >  /* Flush all outstanding datagrams. Verify first few bytes of each. */
> >  static void do_flush_udp(int fd)
> >  {
> > -       static char rbuf[ETH_DATA_LEN];
> > +       static char rbuf[65535];
> 
> we can use ETH_MAX_MTU.

Thanks, will do in next iteration.

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx
  2018-10-21 20:09   ` Willem de Bruijn
@ 2018-10-22 10:37     ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 10:37 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:09 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > XDP support will be used by a later patch to test the GRO path
> > in a net namespace, leveraging the veth XDP implementation.
> > To avoid breaking existing setup, XDP support is conditionally
> > enabled and build only if llc is locally available.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > index 256d82d5fa87..176459b7c4d6 100644
> > --- a/tools/testing/selftests/net/Makefile
> > +++ b/tools/testing/selftests/net/Makefile
> > @@ -16,8 +16,77 @@ TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
> >  TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
> > 
> >  KSFT_KHDR_INSTALL := 1
> > +
> > +# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
> > +#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> > +LLC ?= llc
> > +CLANG ?= clang
> > +LLVM_OBJCOPY ?= llvm-objcopy
> > +BTF_PAHOLE ?= pahole
> > +HAS_LLC := $(shell which $(LLC) 2>/dev/null)
> > +
> > +# conditional enable testes requiring llc
> > +ifneq (, $(HAS_LLC))
> > +TEST_GEN_FILES += xdp_dummy.o
> > +endif
> > +
> >  include ../lib.mk
> > 
> > +ifneq (, $(HAS_LLC))
> > +
> > +# Detect that we're cross compiling and use the cross compiler
> > +ifdef CROSS_COMPILE
> > +CLANG_ARCH_ARGS = -target $(ARCH)
> > +endif
> > +
> > +PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
> > +
> > +# Let newer LLVM versions transparently probe the kernel for availability
> > +# of full BPF instruction set.
> > +ifeq ($(PROBE),)
> > +  CPU ?= probe
> > +else
> > +  CPU ?= generic
> > +endif
> > +
> > +SRC_PATH := $(abspath ../../../..)
> > +LIB_PATH := $(SRC_PATH)/tools/lib
> > +XDP_CFLAGS := -D SUPPORT_XDP=1 -I$(LIB_PATH)
> > +LIBBPF = $(LIB_PATH)/bpf/libbpf.a
> > +BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
> > +BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
> > +BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
> > +CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
> > +        | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
> > +CLANG_FLAGS = -I. -I$(SRC_PATH)/include -I../bpf/ \
> > +             $(CLANG_SYS_INCLUDES) -Wno-compare-distinct-pointer-types
> > +
> > +ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
> > +       CLANG_CFLAGS += -g
> > +       LLC_FLAGS += -mattr=dwarfris
> > +       DWARF2BTF = y
> > +endif
> > +
> > +$(LIBBPF): FORCE
> > +# Fix up variables inherited from Kbuild that tools/ build system won't like
> > +       $(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(SRC_PATH) O= $(nodir $@)
> > +
> 
> This is a lot of XDP specific code. Not for this patchset, per se, but
> would be nice if we can reuse the logic in selftests/bpf for all this.

Agreed. A very similar code is already present almost duplicated in 3
different places (samples/bpf/Makefile, tools/testing/selftests/tc-
testing/bpf/Makefile and tools/testing/selftests/bpf/Makefile). A
bfp_lib.mk or the like would be nice ;). But I felt it a bit out of
scope for this patch, and I'm new to XDP/ebpf, so I preferred avoid
additional issues.

> > --- a/tools/testing/selftests/net/udpgso_bench_rx.c
> > @@ -227,6 +234,13 @@ static void parse_opts(int argc, char **argv)
> >                         cfg_verify = true;
> >                         cfg_read_all = true;
> >                         break;
> > +#ifdef SUPPORT_XDP
> > +               case 'x':
> > +                       cfg_xdp_iface = if_nametoindex(optarg);
> > +                       if (!cfg_xdp_iface)
> > +                               error(1, errno, "unknown interface %s", optarg);
> > +                       break;
> > +#endif
> 
> nit: needs to be added to getopt string in this patch.

Thanks, will do in next iteration.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 10/10] selftests: add functionals test for UDP GRO
  2018-10-21 20:09   ` Willem de Bruijn
@ 2018-10-22 10:46     ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 10:46 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:09 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Extends the existing udp programs to allow checking for proper
> > GRO aggregation/GSO size, and run the tests via a shell script, using
> > a veth pair with XDP program attached to trigger the GRO code path.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  tools/testing/selftests/net/Makefile          |   2 +-
> >  tools/testing/selftests/net/udpgro.sh         | 144 ++++++++++++++++++
> >  tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
> >  tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
> >  tools/testing/selftests/net/udpgso_bench_rx.c | 125 +++++++++++++--
> >  tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
> >  6 files changed, 281 insertions(+), 22 deletions(-)
> >  create mode 100755 tools/testing/selftests/net/udpgro.sh
> > 
> > diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh
> > +       run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
> > +       run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"
> 
> why expected segment size -1 in these two?

I was unable to come up with a self-explaining option name/syntax. '-1' 
really means 'no UDP_SEGMENT cmsg'. Since the receiver did not enable
UDP_GRO, should not receive such cmsg.
> 
> > diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
> >  static void usage(const char *filepath)
> >  {
> > -       error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
> > +       error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-m messagenr] [-p port] [-s sendsize] [-S gsosize]",
> >                     filepath);
> 
> missing -M

Will add in next iteration.

Additional node: in the current test implementation, 'no GRO chk cmsg'
sometimes wrongly returns a failure. I'll try to address it in the next
iteration (it's a test issue in the code I added, not a kernel one).

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-19 14:25 ` [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets Paolo Abeni
  2018-10-21 20:06   ` Willem de Bruijn
@ 2018-10-22 11:24   ` Steffen Klassert
  2018-10-22 13:41     ` Paolo Abeni
  1 sibling, 1 reply; 41+ messages in thread
From: Steffen Klassert @ 2018-10-22 11:24 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn

On Fri, Oct 19, 2018 at 04:25:12PM +0200, Paolo Abeni wrote:
>  
> +#define UDO_GRO_CNT_MAX 64

Maybe better UDP_GRO_CNT_MAX?

Btw. do we really need this explicit limit?
We should not get more than 64 packets during
one napi poll cycle.

> +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> +					       struct sk_buff *skb)
> +{
> +	struct udphdr *uh = udp_hdr(skb);
> +	struct sk_buff *pp = NULL;
> +	struct udphdr *uh2;
> +	struct sk_buff *p;
> +
> +	/* requires non zero csum, for simmetry with GSO */
> +	if (!uh->check) {
> +		NAPI_GRO_CB(skb)->flush = 1;
> +		return NULL;
> +	}

Why is the requirement of checksums different than in 
udp_gro_receive? It's not that I care much about UDP
packets without a checksum, but you would not need
to implement your own loop if the requirement could
be the same as in udp_gro_receive.

> +
> +	/* pull encapsulating udp header */
> +	skb_gro_pull(skb, sizeof(struct udphdr));
> +	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
> +
> +	list_for_each_entry(p, head, list) {
> +		if (!NAPI_GRO_CB(p)->same_flow)
> +			continue;
> +
> +		uh2 = udp_hdr(p);
> +
> +		/* Match ports only, as csum is always non zero */
> +		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> +			NAPI_GRO_CB(p)->same_flow = 0;
> +			continue;
> +		}
> +
> +		/* Terminate the flow on len mismatch or if it grow "too much".
> +		 * Under small packet flood GRO count could elsewhere grow a lot
> +		 * leading to execessive truesize values
> +		 */
> +		if (!skb_gro_receive(p, skb) &&
> +		    NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)

This allows to merge UDO_GRO_CNT_MAX + 1 packets.

> +			pp = p;
> +		else if (uh->len != uh2->len)
> +			pp = p;
> +
> +		return pp;
> +	}
> +
> +	/* mismatch, but we never need to flush */
> +	return NULL;
> +}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
  2018-10-21 20:08   ` Willem de Bruijn
@ 2018-10-22 11:43   ` Steffen Klassert
  2018-10-22 12:51     ` Paolo Abeni
  2018-10-22 19:04   ` Subash Abhinov Kasiviswanathan
  2 siblings, 1 reply; 41+ messages in thread
From: Steffen Klassert @ 2018-10-22 11:43 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn

On Fri, Oct 19, 2018 at 04:25:16PM +0200, Paolo Abeni wrote:
> +
> +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> +					      struct sk_buff *skb)
> +{
> +	struct sk_buff *segs;
> +
> +	/* the GSO CB lays after the UDP one, no need to save and restore any
> +	 * CB fragment, just initialize it
> +	 */
> +	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> +	if (unlikely(IS_ERR(segs)))
> +		kfree_skb(skb);
> +	else if (segs)
> +		consume_skb(skb);
> +	return segs;
> +}
> +
> +

One empty line too much.

>  #define udp_portaddr_for_each_entry(__sk, list) \
>  	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
>  
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 2331ac9de954..0d55145ce9f5 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1909,7 +1909,7 @@ EXPORT_SYMBOL(udp_encap_enable);
>   * Note that in the success and error cases, the skb is assumed to
>   * have either been requeued or freed.
>   */
> -static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> +static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct udp_sock *up = udp_sk(sk);
>  	int is_udplite = IS_UDPLITE(sk);
> @@ -2012,6 +2012,29 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  	return -1;
>  }
>  
> +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
> +
> +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct sk_buff *next, *segs;
> +	int ret;
> +
> +	if (likely(!udp_unexpected_gso(sk, skb)))
> +		return udp_queue_rcv_one_skb(sk, skb);
> +
> +	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> +	__skb_push(skb, -skb_mac_offset(skb));
> +	segs = udp_rcv_segment(sk, skb);
> +	for (skb = segs; skb; skb = next) {
> +		next = skb->next;
> +		__skb_pull(skb, skb_transport_offset(skb));
> +		ret = udp_queue_rcv_one_skb(sk, skb);

udp_queue_rcv_one_skb() starts with doing a xfrm4_policy_check().
Maybe we can do this on the GSO packet instead of the segments.
So far this code is just for handling a corner case, but this might
change.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-22 11:43   ` Steffen Klassert
@ 2018-10-22 12:51     ` Paolo Abeni
  2018-10-23 10:29       ` Steffen Klassert
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 12:51 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Willem de Bruijn

Hi,

On Mon, 2018-10-22 at 13:43 +0200, Steffen Klassert wrote:
> On Fri, Oct 19, 2018 at 04:25:16PM +0200, Paolo Abeni wrote:
> > +
> > +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > +					      struct sk_buff *skb)
> > +{
> > +	struct sk_buff *segs;
> > +
> > +	/* the GSO CB lays after the UDP one, no need to save and restore any
> > +	 * CB fragment, just initialize it
> > +	 */
> > +	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> > +	if (unlikely(IS_ERR(segs)))
> > +		kfree_skb(skb);
> > +	else if (segs)
> > +		consume_skb(skb);
> > +	return segs;
> > +}
> > +
> > +
> 
> One empty line too much.

Thank you, will handle in the next iteration.

> >  #define udp_portaddr_for_each_entry(__sk, list) \
> >  	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
> >  
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 2331ac9de954..0d55145ce9f5 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1909,7 +1909,7 @@ EXPORT_SYMBOL(udp_encap_enable);
> >   * Note that in the success and error cases, the skb is assumed to
> >   * have either been requeued or freed.
> >   */
> > -static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > +static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> >  	struct udp_sock *up = udp_sk(sk);
> >  	int is_udplite = IS_UDPLITE(sk);
> > @@ -2012,6 +2012,29 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> >  	return -1;
> >  }
> >  
> > +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
> > +
> > +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	struct sk_buff *next, *segs;
> > +	int ret;
> > +
> > +	if (likely(!udp_unexpected_gso(sk, skb)))
> > +		return udp_queue_rcv_one_skb(sk, skb);
> > +
> > +	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> > +	__skb_push(skb, -skb_mac_offset(skb));
> > +	segs = udp_rcv_segment(sk, skb);
> > +	for (skb = segs; skb; skb = next) {
> > +		next = skb->next;
> > +		__skb_pull(skb, skb_transport_offset(skb));
> > +		ret = udp_queue_rcv_one_skb(sk, skb);
> 
> udp_queue_rcv_one_skb() starts with doing a xfrm4_policy_check().
> Maybe we can do this on the GSO packet instead of the segments.
> So far this code is just for handling a corner case, but this might
> change.

I thought about keeping the policy check here, but then I preferred
what looked the safest option. Perhaps we can improve with a follow-up?

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-22 11:24   ` Steffen Klassert
@ 2018-10-22 13:41     ` Paolo Abeni
  2018-10-22 15:51       ` Willem de Bruijn
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 13:41 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Willem de Bruijn

On Mon, 2018-10-22 at 13:24 +0200, Steffen Klassert wrote:
> On Fri, Oct 19, 2018 at 04:25:12PM +0200, Paolo Abeni wrote:
> >  
> > +#define UDO_GRO_CNT_MAX 64
> 
> Maybe better UDP_GRO_CNT_MAX?

Oops, typo. Yes, sure, will address in the next iteration.

> Btw. do we really need this explicit limit?
> We should not get more than 64 packets during
> one napi poll cycle.

With HZ >= 1000, gro_flush happens at most once per jiffies: we can
have much more than 64 packets per segment, with appropriate pkt len.

> 
> > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > +					       struct sk_buff *skb)
> > +{
> > +	struct udphdr *uh = udp_hdr(skb);
> > +	struct sk_buff *pp = NULL;
> > +	struct udphdr *uh2;
> > +	struct sk_buff *p;
> > +
> > +	/* requires non zero csum, for simmetry with GSO */
> > +	if (!uh->check) {
> > +		NAPI_GRO_CB(skb)->flush = 1;
> > +		return NULL;
> > +	}
> 
> Why is the requirement of checksums different than in 
> udp_gro_receive? It's not that I care much about UDP
> packets without a checksum, but you would not need
> to implement your own loop if the requirement could
> be the same as in udp_gro_receive.

uhm.... 
AFAIU, we need to generated aggregated packets that UDP GSO is able to
process/segment. I was unable to get a nocsum packet segment (possibly
PEBKAC) so I enforced that condition on the rx path.

@Willem: did I see ghost here? is UDP_SEGMENT fine with no checksum
segment?

> > +
> > +	/* pull encapsulating udp header */
> > +	skb_gro_pull(skb, sizeof(struct udphdr));
> > +	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
> > +
> > +	list_for_each_entry(p, head, list) {
> > +		if (!NAPI_GRO_CB(p)->same_flow)
> > +			continue;
> > +
> > +		uh2 = udp_hdr(p);
> > +
> > +		/* Match ports only, as csum is always non zero */
> > +		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> > +			NAPI_GRO_CB(p)->same_flow = 0;
> > +			continue;
> > +		}
> > +
> > +		/* Terminate the flow on len mismatch or if it grow "too much".
> > +		 * Under small packet flood GRO count could elsewhere grow a lot
> > +		 * leading to execessive truesize values
> > +		 */
> > +		if (!skb_gro_receive(p, skb) &&
> > +		    NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)
> 
> This allows to merge UDO_GRO_CNT_MAX + 1 packets.

Thanks, will address in the next iteration.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-22 10:13     ` Paolo Abeni
@ 2018-10-22 15:15       ` Willem de Bruijn
  0 siblings, 0 replies; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-22 15:15 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Mon, Oct 22, 2018 at 6:13 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2018-10-21 at 16:06 -0400, Willem de Bruijn wrote:
> > On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> > > with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> > > eligible for GRO in the rx path: UDP segments directed to such socket
> > > are assembled into a larger GSO_UDP_L4 packet.
> > >
> > > The core UDP GRO support is enabled with setsockopt(UDP_GRO).
> > >
> > > Initial benchmark numbers:
> > >
> > > Before:
> > > udp rx:   1079 MB/s   769065 calls/s
> > >
> > > After:
> > > udp rx:   1466 MB/s    24877 calls/s
> > >
> > >
> > > This change introduces a side effect in respect to UDP tunnels:
> > > after a UDP tunnel creation, now the kernel performs a lookup per ingress
> > > UDP packet, while before such lookup happened only if the ingress packet
> > > carried a valid internal header csum.
> > >
> > > v1 -> v2:
> > >  - use a new option to enable UDP GRO
> > >  - use static keys to protect the UDP GRO socket lookup
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/udp.h      |   3 +-
> > >  include/uapi/linux/udp.h |   1 +
> > >  net/ipv4/udp.c           |   7 +++
> > >  net/ipv4/udp_offload.c   | 109 +++++++++++++++++++++++++++++++--------
> > >  net/ipv6/udp_offload.c   |   6 +--
> > >  5 files changed, 98 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > index a4dafff407fb..f613b329852e 100644
> > > --- a/include/linux/udp.h
> > > +++ b/include/linux/udp.h
> > > @@ -50,11 +50,12 @@ struct udp_sock {
> > >         __u8             encap_type;    /* Is this an Encapsulation socket? */
> > >         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
> > >                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> > > -                        encap_enabled:1; /* This socket enabled encap
> > > +                        encap_enabled:1, /* This socket enabled encap
> > >                                            * processing; UDP tunnels and
> > >                                            * different encapsulation layer set
> > >                                            * this
> > >                                            */
> > > +                        gro_enabled:1; /* Can accept GRO packets */
> > >
> > >         /*
> > >          * Following member retains the information to create a UDP header
> > >          * when the socket is uncorked.
> > > diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> > > index 09502de447f5..30baccb6c9c4 100644
> > > --- a/include/uapi/linux/udp.h
> > > +++ b/include/uapi/linux/udp.h
> > > @@ -33,6 +33,7 @@ struct udphdr {
> > >  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
> > >  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
> > >  #define UDP_SEGMENT    103     /* Set GSO segmentation size */
> > > +#define UDP_GRO                104     /* This socket can receive UDP GRO packets */
> > >
> > >  /* UDP encapsulation types */
> > >  #define UDP_ENCAP_ESPINUDP_NON_IKE     1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 9fcb5374e166..3c277378814f 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -115,6 +115,7 @@
> > >  #include "udp_impl.h"
> > >  #include <net/sock_reuseport.h>
> > >  #include <net/addrconf.h>
> > > +#include <net/udp_tunnel.h>
> > >
> > >  struct udp_table udp_table __read_mostly;
> > >  EXPORT_SYMBOL(udp_table);
> > > @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> > >                 up->gso_size = val;
> > >                 break;
> > >
> > > +       case UDP_GRO:
> > > +               if (valbool)
> > > +                       udp_tunnel_encap_enable(sk->sk_socket);
> > > +               up->gro_enabled = valbool;
> >
> > The socket lock is not held here, so multiple updates to
> > up->gro_enabled and the up->encap_enabled and the static branch can
> > race. Syzkaller is adept at generating those.
>
> Good catch. I was fooled by the current existing code. I think there
> are potentially similar issues for UDP_ENCAP, UDPLITE_SEND_CSCOV, ...
>
> Since the rx path don't take it anymore and we don't risk starving, I
> think we should could/always acquire the socket lock on setsockopt,
> wdyt?

Agreed. We had to add a lot of those in packet_setsockopt for the same reason.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg
  2018-10-21 20:07   ` Willem de Bruijn
@ 2018-10-22 15:44     ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-22 15:44 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Sun, 2018-10-21 at 16:07 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
> > datagram size. User-space can use such info to compute the original
> > packets layout.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > CHECK: should we use a separate setsockopt to explicitly enable
> > gso_size cmsg reception? So that user space can enable UDP_GRO and
> > fetch cmsg without forcefully receiving GRO related info.
> 
> A user can avoid the message by not passing control data. Though in
> most practical cases it seems unsafe to do so, anyway, as the path MTU
> can be lower than the expected device MTU.
> 
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 3c277378814f..2331ac9de954 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
> >                 memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
> >                 *addr_len = sizeof(*sin);
> >         }
> > +
> > +       if (udp_sk(sk)->gro_enabled)
> > +               udp_cmsg_recv(msg, sk, skb);
> > +
> 
> Perhaps we can avoid adding a branch by setting a bit in
> inet->cmsg_flags for gso_size to let the below branch handle the cmsg
> processing.

Uhmm... I think that for ipv6 sockets we need to set a bit in rxopt
instead (and we already have some conditionals we could for ipv6 socket
recv cmsg processing).

> I'd still set that as part of the UDP_GRO setsockopt. Though if you
> insist it could be a value 2 instead of 1, effectively allowing the
> above mentioned opt-out.

I'm ok with the current impl (no additional value to opt-in the UDP GRO
cmsg).

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.
  2018-10-22 13:41     ` Paolo Abeni
@ 2018-10-22 15:51       ` Willem de Bruijn
  0 siblings, 0 replies; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-22 15:51 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: steffen.klassert, Network Development, Willem de Bruijn

> >
> > > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > > +                                          struct sk_buff *skb)
> > > +{
> > > +   struct udphdr *uh = udp_hdr(skb);
> > > +   struct sk_buff *pp = NULL;
> > > +   struct udphdr *uh2;
> > > +   struct sk_buff *p;
> > > +
> > > +   /* requires non zero csum, for simmetry with GSO */
> > > +   if (!uh->check) {
> > > +           NAPI_GRO_CB(skb)->flush = 1;
> > > +           return NULL;
> > > +   }
> >
> > Why is the requirement of checksums different than in
> > udp_gro_receive? It's not that I care much about UDP
> > packets without a checksum, but you would not need
> > to implement your own loop if the requirement could
> > be the same as in udp_gro_receive.

It would be nice if we could deduplicate the loops, but even without
the checksum difference they look to me a bit too different for it to be
practical, also with the constraints on segment length and max aggregation.

> uhm....
> AFAIU, we need to generated aggregated packets that UDP GSO is able to
> process/segment. I was unable to get a nocsum packet segment (possibly
> PEBKAC) so I enforced that condition on the rx path.
>
> @Willem: did I see ghost here? is UDP_SEGMENT fine with no checksum
> segment?

udp_send_skb fails with EIO if ip_summed is anything but CHECKSUM_PARTIAL.

but that's not in the forwarding path. Still, __udp_gso_segment as is
depends on that invariant and will not handle packets with zero
checksum correctly. It unconditionally adjusts uh->check. That could
be changed, of course.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-22 10:29     ` Paolo Abeni
@ 2018-10-22 16:00       ` Willem de Bruijn
  0 siblings, 0 replies; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-22 16:00 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Mon, Oct 22, 2018 at 6:29 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2018-10-21 at 16:08 -0400, Willem de Bruijn wrote:
> > On Fri, Oct 19, 2018 at 10:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > In some scenarios, the GRO engine can assemble an UDP GRO packet
> > > that ultimately lands on a non GRO-enabled socket.
> > > This patch tries to address the issue explicitly checking for the UDP
> > > socket features before enqueuing the packet, and eventually segmenting
> > > the unexpected GRO packet, as needed.
> > >
> > > We must also cope with re-insertion requests: after segmentation the
> > > UDP code calls the helper introduced by the previous patches, as needed.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > > +                                             struct sk_buff *skb)
> > > +{
> > > +       struct sk_buff *segs;
> > > +
> > > +       /* the GSO CB lays after the UDP one, no need to save and restore any
> > > +        * CB fragment, just initialize it
> > > +        */
> > > +       segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> > > +       if (unlikely(IS_ERR(segs)))
> > > +               kfree_skb(skb);
> > > +       else if (segs)
> > > +               consume_skb(skb);
> > > +       return segs;
> > > +}
> > > +
> > > +
> > > +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
> > > +
> > > +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > +{
> > > +       struct sk_buff *next, *segs;
> > > +       int ret;
> > > +
> > > +       if (likely(!udp_unexpected_gso(sk, skb)))
> > > +               return udp_queue_rcv_one_skb(sk, skb);
> > > +
> > > +       BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> > > +       __skb_push(skb, -skb_mac_offset(skb));
> > > +       segs = udp_rcv_segment(sk, skb);
> > > +       for (skb = segs; skb; skb = next) {
> >
> > need to check IS_ERR(segs) again?
>
> whooops ... yes, I think so, thanks for catching it.
>
> Since the error code is always discarded, perhpas udp_rcv_segment() can
> simply return 0 when IS_ERR(segs) is true, so we can save a conditional
> here. This is currently a slower/exceptional path, but if we will
> enable UDP GRO for forwaded packets, it will be hit often.

That sounds fine.

In udp_rcv_segment, we should probably account the dropped segments
to UDP_MIB_INERRORS and sk_drops.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed
  2018-10-19 14:25 ` [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed Paolo Abeni
@ 2018-10-22 16:06   ` Willem de Bruijn
  2018-10-25 13:00     ` Paolo Abeni
  0 siblings, 1 reply; 41+ messages in thread
From: Willem de Bruijn @ 2018-10-22 16:06 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn, steffen.klassert

On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The *encap_needed static keys are enabled by UDP tunnels
> and several UDP encapsulations type, but they are never
> turned off. This can cause unneeded overall performance
> degradation for systems where such features are used
> transiently.
>
> This patch introduces complete book-keeping for such keys,
> decreasing the usage at socket destruction time, if needed,
> and avoiding that the same socket could increase the key
> usage multiple times.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/udp.h      |  7 ++++++-
>  include/net/udp_tunnel.h |  6 ++++++
>  net/ipv4/udp.c           | 18 ++++++++++++------
>  net/ipv6/udp.c           | 14 +++++++++-----
>  4 files changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index 320d49d85484..a4dafff407fb 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -49,7 +49,12 @@ struct udp_sock {
>         unsigned int     corkflag;      /* Cork is required */
>         __u8             encap_type;    /* Is this an Encapsulation socket? */
>         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
> -                        no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
> +                        no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> +                        encap_enabled:1; /* This socket enabled encap
> +                                          * processing; UDP tunnels and
> +                                          * different encapsulation layer set
> +                                          * this
> +                                          */
>         /*
>          * Following member retains the information to create a UDP header
>          * when the socket is uncorked.
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index fe680ab6b15a..3fbe56430e3b 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -165,6 +165,12 @@ static inline int udp_tunnel_handle_offloads(struct sk_buff *skb, bool udp_csum)
>
>  static inline void udp_tunnel_encap_enable(struct socket *sock)
>  {
> +       struct udp_sock *up = udp_sk(sock->sk);
> +
> +       if (up->encap_enabled)
> +               return;
> +
> +       up->encap_enabled = 1;
>  #if IS_ENABLED(CONFIG_IPV6)
>         if (sock->sk->sk_family == PF_INET6)
>                 ipv6_stub->udpv6_encap_enable();
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index cf8252d05a01..9fcb5374e166 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2382,11 +2382,15 @@ void udp_destroy_sock(struct sock *sk)
>         bool slow = lock_sock_fast(sk);
>         udp_flush_pending_frames(sk);
>         unlock_sock_fast(sk, slow);
> -       if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
> -               void (*encap_destroy)(struct sock *sk);
> -               encap_destroy = READ_ONCE(up->encap_destroy);
> -               if (encap_destroy)
> -                       encap_destroy(sk);
> +       if (static_branch_unlikely(&udp_encap_needed_key)) {
> +               if (up->encap_type) {
> +                       void (*encap_destroy)(struct sock *sk);
> +                       encap_destroy = READ_ONCE(up->encap_destroy);
> +                       if (encap_destroy)
> +                               encap_destroy(sk);
> +               }
> +               if (up->encap_enabled)
> +                       static_branch_disable(&udp_encap_needed_key);
>         }
>  }
>
> @@ -2431,7 +2435,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
>                         /* FALLTHROUGH */
>                 case UDP_ENCAP_L2TPINUDP:
>                         up->encap_type = val;
> -                       udp_encap_enable();
> +                       if (!up->encap_enabled)
> +                               udp_encap_enable();
> +                       up->encap_enabled = 1;

nit: no need for the branch: udp_encap_enable already has a branch and
is static inline.

Perhaps it makes sense to convert that to take the udp_sock and handle
the state change within, to avoid having to open code at multiple
locations.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
  2018-10-21 20:08   ` Willem de Bruijn
  2018-10-22 11:43   ` Steffen Klassert
@ 2018-10-22 19:04   ` Subash Abhinov Kasiviswanathan
  2018-10-23  7:59     ` Paolo Abeni
  2 siblings, 1 reply; 41+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2018-10-22 19:04 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn, Steffen Klassert

On 2018-10-19 08:25, Paolo Abeni wrote:
> In some scenarios, the GRO engine can assemble an UDP GRO packet
> that ultimately lands on a non GRO-enabled socket.
> This patch tries to address the issue explicitly checking for the UDP
> socket features before enqueuing the packet, and eventually segmenting
> the unexpected GRO packet, as needed.
> 
> We must also cope with re-insertion requests: after segmentation the
> UDP code calls the helper introduced by the previous patches, as 
> needed.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> +static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff
> *skb)
> +{
> +	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
> +	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
> +}
> +
> +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> +					      struct sk_buff *skb)
> +{
> +	struct sk_buff *segs;
> +
> +	/* the GSO CB lays after the UDP one, no need to save and restore
> any
> +	 * CB fragment, just initialize it
> +	 */
> +	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> +	if (unlikely(IS_ERR(segs)))
> +		kfree_skb(skb);
> +	else if (segs)
> +		consume_skb(skb);
> +	return segs;
> +}
> +
> +

Hi Paolo

Do we need to check for IS_ERR_OR_NULL(segs)

> 
> +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int
> proto);
> +
> +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct sk_buff *next, *segs;
> +	int ret;
> +
> +	if (likely(!udp_unexpected_gso(sk, skb)))
> +		return udp_queue_rcv_one_skb(sk, skb);

> +static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct sk_buff *next, *segs;
> +	int ret;
> +
> +	if (likely(!udp_unexpected_gso(sk, skb)))
> +		return udpv6_queue_rcv_one_skb(sk, skb);
> +

Is the "likely" required here?
HW can coalesce all incoming streams of UDP and may not know the socket 
state.
In that case, a socket not having UDP GRO option might see a penalty 
here.


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-22 19:04   ` Subash Abhinov Kasiviswanathan
@ 2018-10-23  7:59     ` Paolo Abeni
  2018-10-24  0:55       ` Subash Abhinov Kasiviswanathan
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-23  7:59 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan; +Cc: netdev, Willem de Bruijn, Steffen Klassert

Hi,

On Mon, 2018-10-22 at 13:04 -0600, Subash Abhinov Kasiviswanathan
wrote:
> On 2018-10-19 08:25, Paolo Abeni wrote:
> > In some scenarios, the GRO engine can assemble an UDP GRO packet
> > that ultimately lands on a non GRO-enabled socket.
> > This patch tries to address the issue explicitly checking for the UDP
> > socket features before enqueuing the packet, and eventually segmenting
> > the unexpected GRO packet, as needed.
> > 
> > We must also cope with re-insertion requests: after segmentation the
> > UDP code calls the helper introduced by the previous patches, as 
> > needed.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > +static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff
> > *skb)
> > +{
> > +	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
> > +	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
> > +}
> > +
> > +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > +					      struct sk_buff *skb)
> > +{
> > +	struct sk_buff *segs;
> > +
> > +	/* the GSO CB lays after the UDP one, no need to save and restore
> > any
> > +	 * CB fragment, just initialize it
> > +	 */
> > +	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
> > +	if (unlikely(IS_ERR(segs)))
> > +		kfree_skb(skb);
> > +	else if (segs)
> > +		consume_skb(skb);
> > +	return segs;
> > +}
> > +
> > +
> 
> Hi Paolo
> 
> Do we need to check for IS_ERR_OR_NULL(segs)

Yes, thanks.

(also Williem already noted the above)

> > 
> > +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int
> > proto);
> > +
> > +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	struct sk_buff *next, *segs;
> > +	int ret;
> > +
> > +	if (likely(!udp_unexpected_gso(sk, skb)))
> > +		return udp_queue_rcv_one_skb(sk, skb);
> > +static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	struct sk_buff *next, *segs;
> > +	int ret;
> > +
> > +	if (likely(!udp_unexpected_gso(sk, skb)))
> > +		return udpv6_queue_rcv_one_skb(sk, skb);
> > +
> 
> Is the "likely" required here?

Not required, but currently helpful IMHO, as we should hit the above
only on unlikey and really unwonted configuration.

Note that only SKB_GSO_UDP_L4 GSO packets will not match the above
likely condition.

> HW can coalesce all incoming streams of UDP and may not know the socket 
> state.
> In that case, a socket not having UDP GRO option might see a penalty 
> here.

Really? Is there any HW creating SKB_GSO_UDP_L4 packets on RX? if the
HW is doing that, without this patch, I think it's breaking existing
applications (which may expext that the read UDP frame length
implicitly describe the application level message length).

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-22 12:51     ` Paolo Abeni
@ 2018-10-23 10:29       ` Steffen Klassert
  0 siblings, 0 replies; 41+ messages in thread
From: Steffen Klassert @ 2018-10-23 10:29 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn

On Mon, Oct 22, 2018 at 02:51:56PM +0200, Paolo Abeni wrote:
> > > +
> > > +static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > +{
> > > +	struct sk_buff *next, *segs;
> > > +	int ret;
> > > +
> > > +	if (likely(!udp_unexpected_gso(sk, skb)))
> > > +		return udp_queue_rcv_one_skb(sk, skb);
> > > +
> > > +	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
> > > +	__skb_push(skb, -skb_mac_offset(skb));
> > > +	segs = udp_rcv_segment(sk, skb);
> > > +	for (skb = segs; skb; skb = next) {
> > > +		next = skb->next;
> > > +		__skb_pull(skb, skb_transport_offset(skb));
> > > +		ret = udp_queue_rcv_one_skb(sk, skb);
> > 
> > udp_queue_rcv_one_skb() starts with doing a xfrm4_policy_check().
> > Maybe we can do this on the GSO packet instead of the segments.
> > So far this code is just for handling a corner case, but this might
> > change.
> 
> I thought about keeping the policy check here, but then I preferred
> what looked the safest option. Perhaps we can improve with a follow-up?

Fair enough. Let's keep it in mind and do it later.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 00/10] udp: implement GRO support
  2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
                   ` (10 preceding siblings ...)
  2018-10-21 20:05 ` [RFC PATCH v2 00/10] udp: implement GRO support Willem de Bruijn
@ 2018-10-23 12:10 ` Steffen Klassert
  2018-10-23 12:22   ` Paolo Abeni
  11 siblings, 1 reply; 41+ messages in thread
From: Steffen Klassert @ 2018-10-23 12:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn

On Fri, Oct 19, 2018 at 04:25:10PM +0200, Paolo Abeni wrote:
> This series implements GRO support for UDP sockets, as the RX counterpart
> of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> The core functionality is implemented by the second patch, introducing a new
> sockopt to enable UDP_GRO, while patch 3 implements support for passing the
> segment size to the user space via a new cmsg.
> UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
> directed to UDP GRO enabled sockets with constant l4 tuple.
> 
> UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
> rules, and that could potentially confuse existing applications.
> 
> The solution adopted here is to de-segment the GRO packet before enqueuing
> as needed. Since we must cope with packet reinsertion after de-segmentation,
> the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
> to UDP usage.
> 
> While the current code can probably be improved, this safeguard ,implemented in
> the patches 4-7, allows future enachements to enable UDP GSO offload on more
> virtual devices eventually even on forwarded packets.

I was curious what I get with this when doing packet forwarding.
So I added forwarding support with the patch below (on top of
your patchset). While the packet processing could work the way
I do it in this patch, I'm not sure about the way I enable
UDP GRO on forwarding. Maybe there it a better way than just
do UDP GRO if forwarding is enabled on the receiving device.

Some quick benchmark numbers with UDP packet forwarding
(1460 byte packets) through two gateways:

net-next: 16.4 Gbps

net-next + UDP GRO: 20.3 Gbps

Subject: [PATCH RFC] udp: Allow gro for the forwarding path.

This patch adds a early route lookup to inet_gro_receive()
in case forwarding is enabled on the receiving device.
To be forwarded packets are allowed to enter the UDP
GRO handlers then.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |  2 +-
 include/net/dst.h         |  1 +
 net/core/dev.c            |  6 ++++--
 net/ipv4/af_inet.c        | 12 ++++++++++++
 net/ipv4/route.c          |  1 +
 net/ipv4/udp_offload.c    | 11 +++++++++--
 6 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc1d9ed33b31..2eb3fa960ad4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2304,7 +2304,7 @@ struct napi_gro_cb {
 	/* Number of gro_receive callbacks this packet already went through */
 	u8 recursion_counter:4;
 
-	/* 1 bit hole */
+	u8	is_fwd:1;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
diff --git a/include/net/dst.h b/include/net/dst.h
index 6cf0870414c7..01bdf825a6f9 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -54,6 +54,7 @@ struct dst_entry {
 #define DST_XFRM_TUNNEL		0x0020
 #define DST_XFRM_QUEUE		0x0040
 #define DST_METADATA		0x0080
+#define DST_FWD			0x0100
 
 	/* A non-zero value of dst->obsolete forces by-hand validation
 	 * of the route entry.  Positive values are set by the generic
diff --git a/net/core/dev.c b/net/core/dev.c
index 022ad73d6253..c6aaffb99456 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5387,8 +5387,10 @@ static struct list_head *gro_list_prepare(struct napi_struct *napi,
 
 		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
 		diffs |= p->vlan_tci ^ skb->vlan_tci;
-		diffs |= skb_metadata_dst_cmp(p, skb);
-		diffs |= skb_metadata_differs(p, skb);
+		if (!NAPI_GRO_CB(p)->is_fwd) {
+			diffs |= skb_metadata_dst_cmp(p, skb);
+			diffs |= skb_metadata_differs(p, skb);
+		}
 		if (maclen == ETH_HLEN)
 			diffs |= compare_ether_header(skb_mac_header(p),
 						      skb_mac_header(skb));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1fbe2f815474..6e4e8588c0b1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1479,8 +1479,20 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 			NAPI_GRO_CB(p)->flush_id = flush_id;
 		else
 			NAPI_GRO_CB(p)->flush_id |= flush_id;
+
+		NAPI_GRO_CB(skb)->is_fwd = NAPI_GRO_CB(p)->is_fwd;
+
+		goto found;
 	}
 
+	if (IN_DEV_FORWARD(__in_dev_get_rcu(skb->dev))) {
+		if (!ip_route_input_noref(skb, iph->daddr, iph->saddr, iph->tos,
+					  skb->dev)) {
+			if ((skb_dst(skb)->flags & DST_FWD))
+				NAPI_GRO_CB(skb)->is_fwd = 1;
+		}
+	}
+found:
 	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
 	skb_set_network_header(skb, off);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c0a9d26c06ce..3e5b3f0be0f0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1757,6 +1757,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_is_input = 1;
 	RT_CACHE_STAT_INC(in_slow_tot);
 
+	rth->dst.flags |= DST_FWD;
 	rth->dst.input = ip_forward;
 
 	rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index d93c1e8097ba..c99f117bc44e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -402,6 +402,12 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	struct sock *sk;
 
 	rcu_read_lock();
+	if (NAPI_GRO_CB(skb)->is_fwd) {
+		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		rcu_read_unlock();
+		return pp;
+	}
+
 	sk = (*lookup)(skb, uh->source, uh->dest);
 	if (!sk)
 		goto out_unlock;
@@ -456,7 +462,8 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
+	if (unlikely(!uh) || (!static_branch_unlikely(&udp_encap_needed_key) &&
+	    !NAPI_GRO_CB(skb)->is_fwd))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
@@ -503,7 +510,7 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 
 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
-	if (sk && udp_sk(sk)->gro_enabled) {
+	if ((sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_fwd) {
 		err = udp_gro_complete_segment(skb);
 	} else if (sk && udp_sk(sk)->gro_complete) {
 		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 00/10] udp: implement GRO support
  2018-10-23 12:10 ` Steffen Klassert
@ 2018-10-23 12:22   ` Paolo Abeni
  2018-10-24 10:55     ` Steffen Klassert
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Abeni @ 2018-10-23 12:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Willem de Bruijn

Hi,

On Tue, 2018-10-23 at 14:10 +0200, Steffen Klassert wrote:
> On Fri, Oct 19, 2018 at 04:25:10PM +0200, Paolo Abeni wrote:
> > This series implements GRO support for UDP sockets, as the RX counterpart
> > of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > The core functionality is implemented by the second patch, introducing a new
> > sockopt to enable UDP_GRO, while patch 3 implements support for passing the
> > segment size to the user space via a new cmsg.
> > UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
> > directed to UDP GRO enabled sockets with constant l4 tuple.
> > 
> > UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
> > rules, and that could potentially confuse existing applications.
> > 
> > The solution adopted here is to de-segment the GRO packet before enqueuing
> > as needed. Since we must cope with packet reinsertion after de-segmentation,
> > the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
> > to UDP usage.
> > 
> > While the current code can probably be improved, this safeguard ,implemented in
> > the patches 4-7, allows future enachements to enable UDP GSO offload on more
> > virtual devices eventually even on forwarded packets.
> 
> I was curious what I get with this when doing packet forwarding.
> So I added forwarding support with the patch below (on top of
> your patchset). While the packet processing could work the way
> I do it in this patch, I'm not sure about the way I enable
> UDP GRO on forwarding. Maybe there it a better way than just
> do UDP GRO if forwarding is enabled on the receiving device.

My idea/hope is slightly different: ensure that UDP GRO + UDP input
path + UDP desegmentation on socket enqueue is safe and as fast as
plain UDP input path and then enable UDP GRO without socket lookup, for
each incoming frame (!!!).

If we land on the input path on a non UDP GRO enabled socket, the
packet is de-segment before enqueuing.

Socket lookup would be needed only if tunnels are enabled, to check if
the ingress frame match a tunnel. We will use UDP tunnel GRO in that
case.

> Some quick benchmark numbers with UDP packet forwarding
> (1460 byte packets) through two gateways:
> 
> net-next: 16.4 Gbps
> 
> net-next + UDP GRO: 20.3 Gbps

uhmmm... what do you think about this speed-up ?

I hoped that without the socket lookup we can get measurably better
figures.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection
  2018-10-23  7:59     ` Paolo Abeni
@ 2018-10-24  0:55       ` Subash Abhinov Kasiviswanathan
  0 siblings, 0 replies; 41+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2018-10-24  0:55 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn, Steffen Klassert

>> Is the "likely" required here?
> 
> Not required, but currently helpful IMHO, as we should hit the above
> only on unlikey and really unwonted configuration.
> 
> Note that only SKB_GSO_UDP_L4 GSO packets will not match the above
> likely condition.
> 
>> HW can coalesce all incoming streams of UDP and may not know the 
>> socket
>> state.
>> In that case, a socket not having UDP GRO option might see a penalty
>> here.
> 
> Really? Is there any HW creating SKB_GSO_UDP_L4 packets on RX? if the
> HW is doing that, without this patch, I think it's breaking existing
> applications (which may expext that the read UDP frame length
> implicitly describe the application level message length).
> 

Hi

Yes, I agree that existing HW would not work without the patch.
My question was based on how UDP GRO packets from any future HW would 
interact
with this code path and if they might be potentially have any side 
effects
due to socket option not being set.

We do not have control over the application and the socket options being 
used
on systems like Android. The packet count reduction due to UDP GRO would 
help
when there are multiple firewall rules present even if we do not take
advantage of the reduced recvmsg() calls.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 00/10] udp: implement GRO support
  2018-10-23 12:22   ` Paolo Abeni
@ 2018-10-24 10:55     ` Steffen Klassert
  0 siblings, 0 replies; 41+ messages in thread
From: Steffen Klassert @ 2018-10-24 10:55 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn

On Tue, Oct 23, 2018 at 02:22:12PM +0200, Paolo Abeni wrote:
> On Tue, 2018-10-23 at 14:10 +0200, Steffen Klassert wrote:
> 
> > Some quick benchmark numbers with UDP packet forwarding
> > (1460 byte packets) through two gateways:
> > 
> > net-next: 16.4 Gbps
> > 
> > net-next + UDP GRO: 20.3 Gbps
> 
> uhmmm... what do you think about this speed-up ?

skb_segment() burns a lot of cycles. If I do the same test with
TCP and disable HW TSO, throughput drops also down to similar
values.

In case of software segmentation, the skb chain appropach
is likely faster because packets are not mangled. So no need
to allocate skbs, no new checksum calculations, less memcpy etc.

If we have an early route lookup in GRO, we could have a good
guess on the offload capabilities of the outgoing device.
So in case that software segmentation is likely, use the
skb chaining method. If HW segmentation is likely, merge
IP packets.

The chaining method might be also faster on non UDP GRO enabled
sockets.

I'll try to implement the skb chaining method on top of this
to see what we get from that.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed
  2018-10-22 16:06   ` Willem de Bruijn
@ 2018-10-25 13:00     ` Paolo Abeni
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Abeni @ 2018-10-25 13:00 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Willem de Bruijn, steffen.klassert

Hi,

I'm sorry for lagging behind, this one felt outside my radar.

On Mon, 2018-10-22 at 12:06 -0400, Willem de Bruijn wrote:
> @@ -2431,7 +2435,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> >                         /* FALLTHROUGH */
> >                 case UDP_ENCAP_L2TPINUDP:
> >                         up->encap_type = val;
> > -                       udp_encap_enable();
> > +                       if (!up->encap_enabled)
> > +                               udp_encap_enable();
> > +                       up->encap_enabled = 1;
> 
> nit: no need for the branch: udp_encap_enable already has a branch and
> is static inline.

Uhm... I think it's needed, so that we call udp_encap_enable() at most
once per socket. If up->encap_enabled we also call static_key_disable
at socket destruction time (once per socket, again) and hopefully we
don't "leak" static_key_enable invocation.

> Perhaps it makes sense to convert that to take the udp_sock and handle
> the state change within, to avoid having to open code at multiple
> locations.

Possibly calling directly udp_tunnel_encap_enable()? that additionally
cope with ipv6, which is not needed here, but should not hurt.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-10-25 21:32 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-19 14:25 [RFC PATCH v2 00/10] udp: implement GRO support Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed Paolo Abeni
2018-10-22 16:06   ` Willem de Bruijn
2018-10-25 13:00     ` Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets Paolo Abeni
2018-10-21 20:06   ` Willem de Bruijn
2018-10-22 10:13     ` Paolo Abeni
2018-10-22 15:15       ` Willem de Bruijn
2018-10-22 11:24   ` Steffen Klassert
2018-10-22 13:41     ` Paolo Abeni
2018-10-22 15:51       ` Willem de Bruijn
2018-10-19 14:25 ` [RFC PATCH v2 03/10] udp: add support for UDP_GRO cmsg Paolo Abeni
2018-10-21 20:07   ` Willem de Bruijn
2018-10-22 15:44     ` Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 04/10] ip: factor out protocol delivery helper Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 05/10] ipv6: " Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 06/10] udp: cope with UDP GRO packet misdirection Paolo Abeni
2018-10-21 20:08   ` Willem de Bruijn
2018-10-22 10:29     ` Paolo Abeni
2018-10-22 16:00       ` Willem de Bruijn
2018-10-22 11:43   ` Steffen Klassert
2018-10-22 12:51     ` Paolo Abeni
2018-10-23 10:29       ` Steffen Klassert
2018-10-22 19:04   ` Subash Abhinov Kasiviswanathan
2018-10-23  7:59     ` Paolo Abeni
2018-10-24  0:55       ` Subash Abhinov Kasiviswanathan
2018-10-19 14:25 ` [RFC PATCH v2 07/10] selftests: add GRO support to udp bench rx program Paolo Abeni
2018-10-21 20:08   ` Willem de Bruijn
2018-10-22 10:31     ` Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 08/10] selftests: conditionally enable XDP support in udpgso_bench_rx Paolo Abeni
2018-10-21 20:09   ` Willem de Bruijn
2018-10-22 10:37     ` Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 09/10] selftests: add some benchmark for UDP GRO Paolo Abeni
2018-10-19 14:25 ` [RFC PATCH v2 10/10] selftests: add functionals test " Paolo Abeni
2018-10-21 20:09   ` Willem de Bruijn
2018-10-22 10:46     ` Paolo Abeni
2018-10-21 20:05 ` [RFC PATCH v2 00/10] udp: implement GRO support Willem de Bruijn
2018-10-22  9:41   ` Paolo Abeni
2018-10-23 12:10 ` Steffen Klassert
2018-10-23 12:22   ` Paolo Abeni
2018-10-24 10:55     ` Steffen Klassert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.