* [PATCH RFC 0/3] Support fraglist GRO/GSO
@ 2018-12-21  7:53 Steffen Klassert
  2018-12-21  7:53 ` [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO Steffen Klassert
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Steffen Klassert @ 2018-12-21  7:53 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

This patchset adds support for doing GRO/GSO by chaining packets
of the same flow at the SKB frag_list pointer. This avoids the
overhead of merging payloads into one big packet and, on the other
end, if GSO is needed, it avoids the overhead of splitting the big
packet back into its original form.

Patch 1 prepares GSO to handle fraglist GSO packets.
Patch 2 adds the core infrastructure to do fraglist
GRO/GSO. Patch 3 enables IPv4 UDP to use fraglist
GRO/GSO if no GRO-capable socket is found.
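
To illustrate the idea (a sketch only, not code from this series):
the head skb keeps the headers, and the original packets stay intact,
chained at skb_shinfo(head)->frag_list, so GSO later only has to
unlink them again instead of rebuilding them from a merged payload:

	/* Sketch only: walking a fraglist GRO packet. */
	static void fraglist_walk(struct sk_buff *head)
	{
		struct sk_buff *p;

		for (p = skb_shinfo(head)->frag_list; p; p = p->next) {
			/* each p is an unmodified packet of the flow */
		}
	}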

I have only forwarding performance measurements so far:

I used my IPsec forwarding test setup for this:

           ------------         ------------
        -->| router 1 |-------->| router 2 |--
        |  ------------         ------------  |
        |                                     |
        |       --------------------          |
        --------|Spirent Testcenter|<----------
                --------------------

net-next (December 10th):

Single stream UDP frame size 1460 Bytes: 1.341.700 fps (15.67 Gbps).

----------------------------------------------------------------------

net-next (December 10th) + hack to enable forwarding for standard UDP GRO:

Single stream UDP frame size 1460 Bytes: 1.651.200 fps (19.28 Gbps).

----------------------------------------------------------------------

net-next (December 10th) + fraglist UDP GRO/GSO:

Single stream UDP frame size 1460 Bytes: 2.742.500 fps (32.03 Gbps).

-----------------------------------------------------------------------


* [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO.
  2018-12-21  7:53 [PATCH RFC 0/3] Support fraglist GRO/GSO Steffen Klassert
@ 2018-12-21  7:53 ` Steffen Klassert
  2019-01-08 13:53   ` Paolo Abeni
  2018-12-21  7:53 ` [PATCH RFC 2/3] net: Support GRO/GSO fraglist chaining Steffen Klassert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2018-12-21  7:53 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

With fraglist GSO, we don't need to clone the original
skb, so there is nothing extra to return for the caller
to free. Prepare GSO so that it frees the original skb
only if the returned pointer actually changed. On error,
fraglist GSO frees the original skb itself and returns
-EREMOTE in this case.
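
To illustrate the resulting calling convention, a caller would now
look roughly like this (sketch only, closely following the
validate_xmit_skb() hunk below):

	segs = skb_gso_segment(skb, features);
	if (IS_ERR(segs)) {
		/* -EREMOTE: fraglist GSO already freed skb itself */
		if (PTR_ERR(segs) != -EREMOTE)
			kfree_skb(skb);
	} else if (segs && segs != skb) {
		/* a real segment list was built, drop the original */
		consume_skb(skb);
		skb = segs;
	}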

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/udp.h      |  8 ++++++--
 net/core/dev.c         | 11 +++++++----
 net/ipv4/ip_output.c   |  3 ++-
 net/xfrm/xfrm_output.c |  3 ++-
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index fd6d948755c8..f89b95c3f91e 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -482,11 +482,15 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 
 		atomic_add(segs_nr, &sk->sk_drops);
 		SNMP_ADD_STATS(__UDPX_MIB(sk, ipv4), UDP_MIB_INERRORS, segs_nr);
-		kfree_skb(skb);
+
+		if (PTR_ERR(segs) != -EREMOTE)
+			kfree_skb(skb);
 		return NULL;
 	}
 
-	consume_skb(skb);
+	if (segs != skb)
+		consume_skb(skb);
+
 	return segs;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 754284873355..53df5ac7c9b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3030,7 +3030,8 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 	}
 	rcu_read_unlock();
 
-	__skb_push(skb, skb->data - skb_mac_header(skb));
+	if (segs != skb)
+		__skb_push(skb, skb->data - skb_mac_header(skb));
 
 	return segs;
 }
@@ -3099,7 +3100,7 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 
 	segs = skb_mac_gso_segment(skb, features);
 
-	if (unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
+	if (segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
 		skb_warn_bad_offload(skb);
 
 	return segs;
@@ -3345,8 +3346,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 
 		segs = skb_gso_segment(skb, features);
 		if (IS_ERR(segs)) {
-			goto out_kfree_skb;
-		} else if (segs) {
+			if (PTR_ERR(segs) != -EREMOTE)
+				goto out_kfree_skb;
+			goto out_null;
+		} else if (segs && segs != skb) {
 			consume_skb(skb);
 			skb = segs;
 		}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index ab6618036afe..f4cecda6c1e8 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -272,7 +272,8 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 		return -ENOMEM;
 	}
 
-	consume_skb(skb);
+	if (segs != skb)
+		consume_skb(skb);
 
 	do {
 		struct sk_buff *nskb = segs->next;
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 4ae87c5ce2e3..1941dc2a80a0 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -183,7 +183,8 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
 	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
 	BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = skb_gso_segment(skb, 0);
-	kfree_skb(skb);
+	if (segs != skb)
+		kfree_skb(skb);
 	if (IS_ERR(segs))
 		return PTR_ERR(segs);
 	if (segs == NULL)
-- 
2.17.1


* [PATCH RFC 2/3] net: Support GRO/GSO fraglist chaining.
  2018-12-21  7:53 [PATCH RFC 0/3] Support fraglist GRO/GSO Steffen Klassert
  2018-12-21  7:53 ` [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO Steffen Klassert
@ 2018-12-21  7:53 ` Steffen Klassert
  2018-12-21  7:53 ` [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO Steffen Klassert
  2018-12-24  1:15 ` [PATCH RFC 0/3] Support " Willem de Bruijn
  3 siblings, 0 replies; 18+ messages in thread
From: Steffen Klassert @ 2018-12-21  7:53 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. It also adds
a new GSO type, SKB_GSO_FRAGLIST, and an is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.
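
A protocol GSO handler is then expected to dispatch on the new type,
roughly like this (sketch only; patch 3 does exactly this for UDP):

	if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)
		return skb_segment_list(skb, features,
					skb_mac_header_len(skb));

	return skb_segment(skb, features);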

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |   4 +-
 include/linux/skbuff.h    |   4 ++
 net/core/skbuff.c         | 103 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fc6ba71513be..ae907cae6461 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2294,7 +2294,8 @@ struct napi_gro_cb {
 	/* Number of gro_receive callbacks this packet already went through */
 	u8 recursion_counter:4;
 
-	/* 1 bit hole */
+	/* GRO is done by frag_list pointer chaining. */
+	u8	is_flist:1;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
@@ -2648,6 +2649,7 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 int netdev_get_name(struct net *net, char *name, int ifindex);
 int dev_restart(struct net_device *dev);
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb);
 
 static inline unsigned int skb_gro_offset(const struct sk_buff *skb)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b1831a5ca173..6a496c0dd0f1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -578,6 +578,8 @@ enum {
 	SKB_GSO_UDP = 1 << 16,
 
 	SKB_GSO_UDP_L4 = 1 << 17,
+
+	SKB_GSO_FRAGLIST = 1 << 18,
 };
 
 #if BITS_PER_LONG > 32
@@ -3366,6 +3368,8 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
+struct sk_buff *skb_segment_list(struct sk_buff *skb, netdev_features_t features,
+				 unsigned int offset);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
 int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 40552547c69a..9ff44a3a2625 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3464,6 +3464,109 @@ static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb)
 	return head_frag;
 }
 
+struct sk_buff *skb_segment_list(struct sk_buff *skb,
+				 netdev_features_t features,
+				 unsigned int offset)
+{
+	struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
+	unsigned int tnl_hlen = skb_tnl_header_len(skb);
+	unsigned int delta_truesize = 0;
+	unsigned int delta_len = 0;
+	struct sk_buff *tail = NULL;
+	struct sk_buff *nskb;
+
+	skb_push(skb, -skb_network_offset(skb) + offset);
+
+	skb_shinfo(skb)->frag_list = NULL;
+
+	do {
+		nskb = list_skb;
+		list_skb = list_skb->next;
+
+		if (!tail)
+			skb->next = nskb;
+		else
+			tail->next = nskb;
+
+		tail = nskb;
+
+		delta_len += nskb->len;
+		delta_truesize += nskb->truesize;
+
+		skb_push(nskb, -skb_network_offset(nskb) + offset);
+
+		if (!secpath_exists(nskb))
+			nskb->sp = secpath_get(skb->sp);
+
+		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+
+		nskb->ip_summed = CHECKSUM_UNNECESSARY;
+		nskb->tstamp = skb->tstamp;
+		nskb->dev = skb->dev;
+		nskb->queue_mapping = skb->queue_mapping;
+
+		nskb->mac_len = skb->mac_len;
+		nskb->mac_header = skb->mac_header;
+		nskb->transport_header = skb->transport_header;
+		nskb->network_header = skb->network_header;
+		skb_dst_copy(nskb, skb);
+
+		skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
+		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
+						 nskb->data - tnl_hlen,
+						 offset + tnl_hlen);
+
+		if (skb_needs_linearize(nskb, features) &&
+		    __skb_linearize(nskb)) {
+			kfree_skb_list(skb);
+			return ERR_PTR(-EREMOTE);
+		}
+	} while (list_skb);
+
+	skb->truesize = skb->truesize - delta_truesize;
+	skb->data_len = skb->data_len - delta_len;
+	skb->len = skb->len - delta_len;
+
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	skb_gso_reset(skb);
+
+	skb->prev = tail;
+
+	if (skb_needs_linearize(skb, features) &&
+	    __skb_linearize(skb)) {
+		kfree_skb_list(skb);
+		return ERR_PTR(-EREMOTE);
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL_GPL(skb_segment_list);
+
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
+{
+	if (unlikely(p->len + skb->len >= 65536))
+		return -E2BIG;
+
+	if (NAPI_GRO_CB(p)->last == p)
+		skb_shinfo(p)->frag_list = skb;
+	else
+		NAPI_GRO_CB(p)->last->next = skb;
+
+	skb_pull(skb, skb_gro_offset(skb));
+
+	NAPI_GRO_CB(p)->last = skb;
+	NAPI_GRO_CB(p)->count++;
+	p->data_len += skb->len;
+	p->truesize += skb->truesize;
+	p->len += skb->len;
+
+	NAPI_GRO_CB(skb)->same_flow = 1;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(skb_gro_receive_list);
+
 /**
  *	skb_segment - Perform protocol segmentation on skb.
  *	@head_skb: buffer to segment
-- 
2.17.1


* [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO.
  2018-12-21  7:53 [PATCH RFC 0/3] Support fraglist GRO/GSO Steffen Klassert
  2018-12-21  7:53 ` [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO Steffen Klassert
  2018-12-21  7:53 ` [PATCH RFC 2/3] net: Support GRO/GSO fraglist chaining Steffen Klassert
@ 2018-12-21  7:53 ` Steffen Klassert
  2019-01-08 15:00   ` Paolo Abeni
  2018-12-24  1:15 ` [PATCH RFC 0/3] Support " Willem de Bruijn
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2018-12-21  7:53 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
All UDP packets that are not targeted at a GRO-capable
UDP socket now go through fraglist GRO (local input
and forwarding).
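
A "GRO capable" socket here is one that set the UDP_GRO socket
option (udp_sk(sk)->gro_enabled). A minimal userspace sketch,
assuming UDP_GRO is available from linux/udp.h:

	#include <stdio.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <linux/udp.h>		/* UDP_GRO */

	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int on = 1;

	/* such sockets keep getting regular UDP GRO packets,
	 * everything else now goes through fraglist GRO */
	if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &on, sizeof(on)) < 0)
		perror("setsockopt(UDP_GRO)");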

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/udp_offload.c | 57 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0646d61f4fa8..9d77cc44da6b 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -187,6 +187,20 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_udp_tunnel_segment);
 
+static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
+					      netdev_features_t features)
+{
+	unsigned int mss = skb_shinfo(skb)->gso_size;
+
+	skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
+	if (IS_ERR(skb))
+		return skb;
+
+	udp_hdr(skb)->len = htons(sizeof(struct udphdr) + mss);
+
+	return skb;
+}
+
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 				  netdev_features_t features)
 {
@@ -199,6 +213,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	__sum16 check;
 	__be16 newlen;
 
+	if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
+		return __udp_gso_segment_list(gso_skb, features);
+
 	mss = skb_shinfo(gso_skb)->gso_size;
 	if (gso_skb->len <= sizeof(*uh) + mss)
 		return ERR_PTR(-EINVAL);
@@ -351,16 +368,15 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	struct sk_buff *pp = NULL;
 	struct udphdr *uh2;
 	struct sk_buff *p;
+	int ret;
 
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
 		NAPI_GRO_CB(skb)->flush = 1;
 		return NULL;
 	}
-
 	/* pull encapsulating udp header */
 	skb_gro_pull(skb, sizeof(struct udphdr));
-	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
 
 	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
@@ -378,8 +394,17 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 		 * Under small packet flood GRO count could elsewhere grow a lot
 		 * leading to execessive truesize values
 		 */
-		if (!skb_gro_receive(p, skb) &&
-		    NAPI_GRO_CB(p)->count >= UDP_GRO_CNT_MAX)
+		if (NAPI_GRO_CB(skb)->is_flist) {
+			if (!pskb_may_pull(skb, skb_gro_offset(skb)))
+				return NULL;
+			ret = skb_gro_receive_list(p, skb);
+		} else {
+			skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+
+			ret = skb_gro_receive(p, skb);
+		}
+
+		if (!ret && NAPI_GRO_CB(p)->count > UDP_GRO_CNT_MAX)
 			pp = p;
 		else if (uh->len != uh2->len)
 			pp = p;
@@ -403,10 +428,17 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 
 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
-	if (!sk)
-		goto out_unlock;
+	if (!sk) {
+		NAPI_GRO_CB(skb)->is_flist = 1;
+		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		rcu_read_unlock();
+		return pp;
+	}
+
+	if (!udp_sk(sk)->gro_receive) {
+		if (!udp_sk(sk)->gro_enabled)
+			NAPI_GRO_CB(skb)->is_flist = 1;
 
-	if (udp_sk(sk)->gro_enabled) {
 		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
 		rcu_read_unlock();
 		return pp;
@@ -456,7 +488,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
+	if (unlikely(!uh))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
@@ -530,6 +562,15 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
+	if (NAPI_GRO_CB(skb)->is_flist) {
+		uh->len = htons(skb->len - nhoff);
+
+		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
+		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+		return 0;
+	}
+
 	if (uh->check)
 		uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
 					  iph->daddr, 0);
-- 
2.17.1


* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2018-12-21  7:53 [PATCH RFC 0/3] Support fraglist GRO/GSO Steffen Klassert
                   ` (2 preceding siblings ...)
  2018-12-21  7:53 ` [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO Steffen Klassert
@ 2018-12-24  1:15 ` Willem de Bruijn
  2018-12-26 13:09   ` Marcelo Ricardo Leitner
  2019-01-14 12:50   ` Steffen Klassert
  3 siblings, 2 replies; 18+ messages in thread
From: Willem de Bruijn @ 2018-12-24  1:15 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Fri, Dec 21, 2018 at 10:23 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> This patchset adds support to do GRO/GSO by chaining packets
> of the same flow at the SKB frag_list pointer. This avoids
> the overhead to merge payloads into one big packet, and
> on the other end, if GSO is needed it avoids the overhead
> of splitting the big packet back to the native form.
>
> Patch 1 prepares GSO to handle fraglist GSO packets.
> Patch 2 adds the core infrastructure to do fraglist
> GRO/GSO. Patch 3 enables IPv4 UDP to use fraglist
> GRO/GSO if no GRO supported socket is found.
>
> I have only forwarding performance measurements so far:
>
> I used used my IPsec forwarding test setup for this:
>
>            ------------         ------------
>         -->| router 1 |-------->| router 2 |--
>         |  ------------         ------------  |
>         |                                     |
>         |       --------------------          |
>         --------|Spirent Testcenter|<----------
>                 --------------------
>
> net-next (December 10th):
>
> Single stream UDP frame size 1460 Bytes: 1.341.700 fps (15.67 Gbps).
>
> ----------------------------------------------------------------------
>
> net-next (December 10th) + hack to enable forwarding for standard UDP GRO:
>
> Single stream UDP frame size 1460 Bytes: 1.651.200 fps (19.28 Gbps).
>
> ----------------------------------------------------------------------
>
> net-next (December 10th) + fraglist UDP GRO/GSO:
>
> Single stream UDP frame size 1460 Bytes: 2.742.500 fps (32.03 Gbps).

That's an impressive speed-up over regular UDP GRO. Definitely worth
looking into more, then.

Sorry for the delay. I still haven't parsed everything yet, but a few
high level questions and comments.

This sounds similar to GSO_BY_FRAGS as used by SCTP. Can perhaps reuse
that or deduplicate a bit. It is nice that this uses a separate
skb_segment_list function; skb_segment is arguably too complex as is
already.

This requires UDP GSO to always be enabled, similar to TCP GSO (as of
commit "tcp: switch to GSO being always on").

I would prefer to split the patch that adds UDP GRO on the forwarding
path into one that enables it for existing GRO (the hack you refer to
above) and a second one to optionally convert to listified processing.

Ideally, we can use existing segmentation on paths where hardware UDP
LSO is supported. I'm not quite sure how to decide between the two
yet. Worst case, perhaps globally use listified forwarding unless any
device is registered with hardware offload, then use regular
segmentation.

For both existing UDP GRO and listified, we should verify that this is
not a potential DoS vector before enabling by default.

A few smaller questions, not necessarily exhaustive (or all sensible ;)
- 1/3
  - do gso handlers never return the original skb currently?
- 2/3
  - did you mean CHECKSUM_PARTIAL?
  - are all those assignments really needed, given that nskb was
    already a fully formed udp packet with just its skb->data moved?
  - calls skb_needs_linearize on the first of the segments in the list only?
- 3/3
   - after pskb_may_pull must reload all ptrs into the data (uh)
   - there are some differences in preparation before the skb is
     passed to skb_gro_receive_list vs skb_gro_receive. Is this
     really needed? They should be interchangeable?


* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2018-12-24  1:15 ` [PATCH RFC 0/3] Support " Willem de Bruijn
@ 2018-12-26 13:09   ` Marcelo Ricardo Leitner
  2019-01-14 12:50   ` Steffen Klassert
  1 sibling, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-12-26 13:09 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Steffen Klassert, Network Development, Willem de Bruijn,
	Paolo Abeni, Jason A. Donenfeld

On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> On Fri, Dec 21, 2018 at 10:23 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > This patchset adds support to do GRO/GSO by chaining packets
> > of the same flow at the SKB frag_list pointer. This avoids
> > the overhead to merge payloads into one big packet, and
> > on the other end, if GSO is needed it avoids the overhead
> > of splitting the big packet back to the native form.
> >
> > Patch 1 prepares GSO to handle fraglist GSO packets.
> > Patch 2 adds the core infrastructure to do fraglist
> > GRO/GSO. Patch 3 enables IPv4 UDP to use fraglist
> > GRO/GSO if no GRO supported socket is found.
> >
> > I have only forwarding performance measurements so far:
> >
> > I used used my IPsec forwarding test setup for this:
> >
> >            ------------         ------------
> >         -->| router 1 |-------->| router 2 |--
> >         |  ------------         ------------  |
> >         |                                     |
> >         |       --------------------          |
> >         --------|Spirent Testcenter|<----------
> >                 --------------------
> >
> > net-next (December 10th):
> >
> > Single stream UDP frame size 1460 Bytes: 1.341.700 fps (15.67 Gbps).
> >
> > ----------------------------------------------------------------------
> >
> > net-next (December 10th) + hack to enable forwarding for standard UDP GRO:
> >
> > Single stream UDP frame size 1460 Bytes: 1.651.200 fps (19.28 Gbps).
> >
> > ----------------------------------------------------------------------
> >
> > net-next (December 10th) + fraglist UDP GRO/GSO:
> >
> > Single stream UDP frame size 1460 Bytes: 2.742.500 fps (32.03 Gbps).
> 
> That's an impressive speed-up over regular UDP GRO. Definitely worth
> looking into more, then.
> 
> Sorry for the delay. I still haven't parsed everything yet, but a few
> high level questions and comments.
> 
> This sounds similar to GSO_BY_FRAGS as used by SCTP. Can perhaps reuse
> that or deduplicate a bit. It is nice that this uses a separate
> skb_segment_list function; skb_segment is arguably too complex as is
> already.

This patchset looks like a more complete version of what GSO_BY_FRAGS
is, and I like it.  It should be easy to adapt SCTP to use these
patches instead. I can do it if needed.

  Marcelo


* Re: [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO.
  2018-12-21  7:53 ` [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO Steffen Klassert
@ 2019-01-08 13:53   ` Paolo Abeni
  2019-01-14 12:53     ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2019-01-08 13:53 UTC (permalink / raw)
  To: Steffen Klassert, netdev; +Cc: Willem de Bruijn, Jason A. Donenfeld

On Fri, 2018-12-21 at 08:53 +0100, Steffen Klassert wrote:
> On fraglist GSO, we don't need to clone the original
> skb. So we don't have anything to return to free.
> Prepare GSO that it frees the original skb only
> if the return pointer really changed. Fraglist
> GSO frees the original skb itself on error and
> returns -EREMOTE in this case.

I think it would be nicer to preserve the same semantics with the GRO
list, so that we don't have to add this special handling.

e.g. calling skb_get(skb) in skb_segment_list() on success would
avoid the special handling for the no-error case (at the cost of 2
atomic ops per gso_list packet)
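
i.e. roughly (sketch only), at the end of skb_segment_list():

	/* keep the original skb alive across the return; the caller's
	 * unconditional consume_skb(skb) then just drops this extra
	 * reference instead of freeing the head of the list */
	skb_get(skb);

	return skb;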

> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 4ae87c5ce2e3..1941dc2a80a0 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -183,7 +183,8 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
>  	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
>  	BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
>  	segs = skb_gso_segment(skb, 0);
> -	kfree_skb(skb);
> +	if (segs != skb)
> +		kfree_skb(skb);
>  	if (IS_ERR(segs))

what if IS_ERR(segs) == -EREMOTE here?

>  		return PTR_ERR(segs);
>  	if (segs == NULL)


* Re: [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO.
  2018-12-21  7:53 ` [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO Steffen Klassert
@ 2019-01-08 15:00   ` Paolo Abeni
  2019-01-25  7:58     ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2019-01-08 15:00 UTC (permalink / raw)
  To: Steffen Klassert, netdev; +Cc: Willem de Bruijn, Jason A. Donenfeld

On Fri, 2018-12-21 at 08:53 +0100, Steffen Klassert wrote:
> @@ -403,10 +428,17 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>  
>  	rcu_read_lock();
>  	sk = (*lookup)(skb, uh->source, uh->dest);
> -	if (!sk)
> -		goto out_unlock;
> +	if (!sk) {
> +		NAPI_GRO_CB(skb)->is_flist = 1;
> +		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> +		rcu_read_unlock();
> +		return pp;
> +	}
> +
> +	if (!udp_sk(sk)->gro_receive) {
> +		if (!udp_sk(sk)->gro_enabled)
> +			NAPI_GRO_CB(skb)->is_flist = 1;
>  
> -	if (udp_sk(sk)->gro_enabled) {
>  		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
>  		rcu_read_unlock();
>  		return pp;

I think we could still avoid the lookup when no vxlan/GRO sockets are
present by moving the lookup into udp{4,6}_gro_receive. Very roughly,
something like:

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f79f1b5b2f9e..b0c0983eac6b 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -420,20 +420,16 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 INDIRECT_CALLABLE_DECLARE(struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
                                                   __be16 sport, __be16 dport));
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
-                               struct udphdr *uh, udp_lookup_t lookup)
+                               struct udphdr *uh, struct sock *sk)
 {
        struct sk_buff *pp = NULL;
        struct sk_buff *p;
        struct udphdr *uh2;
        unsigned int off = skb_gro_offset(skb);
        int flush = 1;
-       struct sock *sk;
 
-       rcu_read_lock();
-       sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
-                               udp4_lib_lookup_skb, skb, uh->source, uh->dest);
-       if (!sk) {
-               NAPI_GRO_CB(skb)->is_flist = 1;
+       if (!sk || !udp_sk(sk)->gro_receive) {
+               NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled: 1;
                pp = call_gro_receive(udp_gro_receive_segment, head, skb);
                rcu_read_unlock();
                return pp;
@@ -506,7 +502,12 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
                                             inet_gro_compute_pseudo);
 skip:
        NAPI_GRO_CB(skb)->is_ipv6 = 0;
-       return udp_gro_receive(head, skb, uh, udp4_lib_lookup_skb);
+       rcu_read_lock();
+       sk = static_branch_unlikely(&udp_encap_needed_key) ?
+                       udp4_lib_lookup_skb(skb, uh->source, uh->dest) : NULL;
+       pp = udp_gro_receive(head, skb, uh, sk);
+       rcu_read_unlock();
+       return pp;
 
 flush:
        NAPI_GRO_CB(skb)->flush = 1;
---

Regardless of the above, I think we should drop the later check for
gro_receive:

--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -450,8 +450,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
        if (NAPI_GRO_CB(skb)->encap_mark ||
            (skb->ip_summed != CHECKSUM_PARTIAL &&
             NAPI_GRO_CB(skb)->csum_cnt == 0 &&
-            !NAPI_GRO_CB(skb)->csum_valid) ||
-           !udp_sk(sk)->gro_receive)
+            !NAPI_GRO_CB(skb)->csum_valid))
                goto out_unlock;
 
        /* mark that this skb passed once through the tunnel gro layer */
---

Finally, this will cause GRO/GSO for local delivery of UDP packets to
non GSO_SEGMENT sockets. That could possibly be a win or a regression:
we save on netfilter/IP stack traversal, but we add additional work;
some performance figures would probably help.

Cheers,

Paolo


* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2018-12-24  1:15 ` [PATCH RFC 0/3] Support " Willem de Bruijn
  2018-12-26 13:09   ` Marcelo Ricardo Leitner
@ 2019-01-14 12:50   ` Steffen Klassert
  2019-01-14 17:09     ` Willem de Bruijn
  1 sibling, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-01-14 12:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> On Fri, Dec 21, 2018 at 10:23 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > This patchset adds support to do GRO/GSO by chaining packets
> > of the same flow at the SKB frag_list pointer. This avoids
> > the overhead to merge payloads into one big packet, and
> > on the other end, if GSO is needed it avoids the overhead
> > of splitting the big packet back to the native form.
> >
> > Patch 1 prepares GSO to handle fraglist GSO packets.
> > Patch 2 adds the core infrastructure to do fraglist
> > GRO/GSO. Patch 3 enables IPv4 UDP to use fraglist
> > GRO/GSO if no GRO supported socket is found.
> >
> > I have only forwarding performance measurements so far:
> >
> > I used used my IPsec forwarding test setup for this:
> >
> >            ------------         ------------
> >         -->| router 1 |-------->| router 2 |--
> >         |  ------------         ------------  |
> >         |                                     |
> >         |       --------------------          |
> >         --------|Spirent Testcenter|<----------
> >                 --------------------
> >
> > net-next (December 10th):
> >
> > Single stream UDP frame size 1460 Bytes: 1.341.700 fps (15.67 Gbps).
> >
> > ----------------------------------------------------------------------
> >
> > net-next (December 10th) + hack to enable forwarding for standard UDP GRO:
> >
> > Single stream UDP frame size 1460 Bytes: 1.651.200 fps (19.28 Gbps).
> >
> > ----------------------------------------------------------------------
> >
> > net-next (December 10th) + fraglist UDP GRO/GSO:
> >
> > Single stream UDP frame size 1460 Bytes: 2.742.500 fps (32.03 Gbps).
> 
> That's an impressive speed-up over regular UDP GRO. Definitely worth
> looking into more, then.
> 
> Sorry for the delay. I still haven't parsed everything yet, but a few
> high level questions and comments.

Sorry for the huge delay on my side. I was off for quite some time
(vacation directly followed by illness).

> 
> This sounds similar to GSO_BY_FRAGS as used by SCTP. Can perhaps reuse
> that or deduplicate a bit. It is nice that this uses a separate
> skb_segment_list function; skb_segment is arguably too complex as is
> already.
> 
> This requires UDP GSO to always be enabled, similar to TCP GSO (as of
> commit "tcp: switch to GSO being always on").
> 
> I would prefer to split the patch that adds UDP GRO on the forwarding
> path into one that enables it for existing GRO (the hack you refer to
> above) and a second one to optionally convert to listified processing.

The hack was to skip the socket lookup. While that was ok for a
forwarding test, it will affect the local receive path of course.

Currently, I check if there is a GRO capable socket. If yes,
do standard GRO. If no, do listified GRO regardless of whether
the packets are going to be forwarded or locally received.
So UDP GRO is always on with this.

> 
> Ideally, we can use existing segmentation on paths where hardware UDP
> LSO is supported. I'm not quite sure how to decide between the two
> yet. Worst case, perhaps globally use listified forwarding unless any
> device is registered with hardware offload, then use regular
> segmentation.

We would need to do an early route lookup to check if the packets
are going to be forwarded or locally received. The current patchset
does not implement this, but it could be done, maybe by doing a route
lookup gated by some static key that is enabled when forwarding
on the receiving device is enabled.

But even if the route lookup tells us that the packet should go down
the forwarding path, netfilter (bpfilter?) could reroute the packet.
If we do an early route lookup, it would be good to have some early
netfilter (bpfilter?) too, so that we know which path the packets
take. In this case we could do listified GRO even for TCP, if we
know that we have to do software segmentation later on.
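
A purely hypothetical sketch of such an early lookup (the function
and its placement are invented here; dst lifetime and error handling
are ignored):

	static bool early_lookup_says_forward(struct sk_buff *skb)
	{
		const struct iphdr *iph = ip_hdr(skb);

		/* hypothetical: resolve the route at GRO time so we
		 * know whether software segmentation is needed later */
		if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
					 iph->tos, skb->dev))
			return false;

		return skb_rtable(skb)->rt_type != RTN_LOCAL;
	}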

Btw. do we already have hardware that can do UDP LSO? If not,
then doing listified GRO when no GRO-capable socket is present
would be a not too intrusive patchset to start with
(given that we want always-on UDP GRO).

> 
> For both existing UDP GRO and listified, we should verify that this is
> not a potential DoS vector before enabling by default.

Yes, but shouldn't this be the same as with TCP GRO?

> 
> A few smaller questions, not necessarily exhaustive (or all sensible ;)
> - 1/3
>   - do gso handlers never return the original skb currently?

As far as I know, yes. But with the idea from Paolo to just
take a refcount on the skb, we might be able to handle the
return without changes to standard GSO.

> - 2/3
>   - did you mean CHECKSUM_PARTIAL?

The checksum should be ok as is, so should not be calculated again.

>   - are all those assignments really needed, given that nskb was
>     already a fully formed udp packet with just its skb->data moved?

I've already minimized these assignments compared to standard GSO.
Which of the assignments do you think are not needed?

>   - calls skb_needs_linearize on the first of the segments in the list only?
> - 3/3
>    - after pskb_may_pull must reload all ptrs into the data (uh)

Yes, will fix this.

>    - there are some differences in preparation before the skb is
>      passed to skb_gro_receive_list vs skb_gro_receive. Is this
>      really needed? They should be interchangeable?

Maybe, I'll look into this.

Thanks for your review!


* Re: [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO.
  2019-01-08 13:53   ` Paolo Abeni
@ 2019-01-14 12:53     ` Steffen Klassert
  0 siblings, 0 replies; 18+ messages in thread
From: Steffen Klassert @ 2019-01-14 12:53 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn, Jason A. Donenfeld

On Tue, Jan 08, 2019 at 02:53:52PM +0100, Paolo Abeni wrote:
> On Fri, 2018-12-21 at 08:53 +0100, Steffen Klassert wrote:
> > On fraglist GSO, we don't need to clone the original
> > skb. So we don't have anything to return to free.
> > Prepare GSO that it frees the original skb only
> > if the return pointer really changed. Fraglist
> > GSO frees the original skb itself on error and
> > returns -EREMOTE in this case.
> 
> I think it would be nicer preseving the same sematic with gro list, so
> that we don't have to add this special handling.
> 
> e.g. calling skb_get(skb) in skb_segment_list() when successful, would
> avoid the special handling for the no error case (at the cost of 2
> atomic ops per gso_list packet)

That's a good idea, I'll do this in the next version of the patchset.


* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2019-01-14 12:50   ` Steffen Klassert
@ 2019-01-14 17:09     ` Willem de Bruijn
  2019-01-25  8:14       ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2019-01-14 17:09 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Mon, Jan 14, 2019 at 7:50 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> > On Fri, Dec 21, 2018 at 10:23 AM Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > >
> > > This patchset adds support to do GRO/GSO by chaining packets
> > > of the same flow at the SKB frag_list pointer. This avoids
> > > the overhead to merge payloads into one big packet, and
> > > on the other end, if GSO is needed it avoids the overhead
> > > of splitting the big packet back to the native form.
> > >
> > > Patch 1 prepares GSO to handle fraglist GSO packets.
> > > Patch 2 adds the core infrastructure to do fraglist
> > > GRO/GSO. Patch 3 enables IPv4 UDP to use fraglist
> > > GRO/GSO if no GRO supported socket is found.
> > >
> > > I have only forwarding performance measurements so far:
> > >
> > > I used used my IPsec forwarding test setup for this:
> > >
> > >            ------------         ------------
> > >         -->| router 1 |-------->| router 2 |--
> > >         |  ------------         ------------  |
> > >         |                                     |
> > >         |       --------------------          |
> > >         --------|Spirent Testcenter|<----------
> > >                 --------------------
> > >
> > > net-next (December 10th):
> > >
> > > Single stream UDP frame size 1460 Bytes: 1.341.700 fps (15.67 Gbps).
> > >
> > > ----------------------------------------------------------------------
> > >
> > > net-next (December 10th) + hack to enable forwarding for standard UDP GRO:
> > >
> > > Single stream UDP frame size 1460 Bytes: 1.651.200 fps (19.28 Gbps).
> > >
> > > ----------------------------------------------------------------------
> > >
> > > net-next (December 10th) + fraglist UDP GRO/GSO:
> > >
> > > Single stream UDP frame size 1460 Bytes: 2.742.500 fps (32.03 Gbps).
> >
> > That's an impressive speed-up over regular UDP GRO. Definitely worth
> > looking into more, then.
> >
> > Sorry for the delay. I still haven't parsed everything yet, but a few
> > high level questions and comments.
>
> Sorry for the huge delay on my side. I was off for quite some time
> (vacation directly followed by illness).
>
> >
> > This sounds similar to GSO_BY_FRAGS as used by SCTP. Can perhaps reuse
> > that or deduplicate a bit. It is nice that this uses a separate
> > skb_segment_list function; skb_segment is arguably too complex as is
> > already.
> >
> > This requires UDP GSO to always be enabled, similar to TCP GSO (as of
> > commit "tcp: switch to GSO being always on").
> >
> > I would prefer to split the patch that adds UDP GRO on the forwarding
> > path into one that enables it for existing GRO (the hack you refer to
> > above) and a second one to optionally convert to listified processing.
>
> The hack was to skip the socket lookup. While that was ok for a
> forwarding test, it will affect the local receive path of course.
>
> Currently, I check if there is a GRO capable socket. If yes,
> do standard GRO. If no, do listified GRO regardless if packets
> are going to be forwarded or locally received. So UDP GRO is
> always on with this.

I understand. What I suggest is to split into two: enable GRO on the
forwarding path independently from converting the method to listified
GRO.

> > Ideally, we can use existing segmentation on paths where hardware UDP
> > LSO is supported. I'm not quite sure how to decide between the two
> > yet. Worst case, perhaps globally use listified forwarding unless any
> > device is registered with hardware offload, then use regular
> > segmentation.
>
> We would need to do an early route lookup to check if the packets
> are going to be forwarded or locally received. The current patchset
> does not implement this, but could be done. Maybe doing a route
> lookup based on some static key that will be enabled when forwarding
> on the receiving device is enabled.
>
> But even if the route lookup tell us that the packet should go the
> forwarding path, netfilter (bpfilter?) could reroute the packet.
> If we do an early route lookup, it would be good to have some early
> netfilter (bpfilter?) too, so that we can know which path the packets
> go. In this case we could do listified GRO even for TCP, if we can
> know that we have to do software segmentation later on.
>
> Btw. do we already have hardware that can do UDP LSO?

Yes, mlx5

I don't think that the route lookup is needed. If listified is cheaper
for local delivery, too, then we can make that the default unless a
device is active with h/w offload and ip forwarding is enabled. If it
isn't, then use it iff ip forwarding is enabled. I think it's fine to
mispredict between the two in edge cases with netfilter mangling, as
long as all paths can correctly handle both types of GRO packets.

> If not,
> the do listified GRO if no GRO capable socket is present would
> be a not too intrusive patchset with what we could start
> (given that we want always on UDP GRO).
>
> >
> > For both existing UDP GRO and listified, we should verify that this is
> > not a potential DoS vector before enabling by default.
>
> Yes, but should'nt this be the same as with TCP GRO?

That is by now well-tested. I don't think we can simply assume
equivalence for UDP, also because that is easier to spoof.

>
> >
> > A few smaller questions, not necessarily exhaustive (or all sensible ;)
> > - 1/3
> >   - do gso handlers never return the original skb currently?
>
> As far as I know, yes. But with the idea from Paolo to just
> take a refcount on the skb, we might be able to handle the
> return without changes to standard GSO.

That would be great.

> > - 2/3
> >   - did you mean CHECKSUM_PARTIAL?
>
> The checksum should be ok as is, so should not be calculated again.

But CHECKSUM_UNNECESSARY is not valid on the egress path? As per the
comments at the top of skbuff.h. If the checksum field is correct and
validated, it should be CHECKSUM_NONE.

It is expected as can be seen by the netif_needs_gso() check and
commit cdbee74ce74c ("net: do not do gso for CHECKSUM_UNNECESSARY in
netif_needs_gso"). But that seems like an uncommon protocol edge case
that does not apply to UDP.

>
> >   - are all those assignments really needed, given that nskb was
> >     already a fully formed udp packet with just its skb->data moved?
>
> I've already minimized these assignments compared to standard GSO.
> Which of the assignments do you think are not needed?

I had expected that mac_len, mac_header, network_header, truesize are
all already correct based on their initial assignment in the ingress
path. But I may well be mistaken here.


* Re: [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO.
  2019-01-08 15:00   ` Paolo Abeni
@ 2019-01-25  7:58     ` Steffen Klassert
  2019-01-26  9:36       ` Paolo Abeni
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-01-25  7:58 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn, Jason A. Donenfeld

On Tue, Jan 08, 2019 at 04:00:01PM +0100, Paolo Abeni wrote:
> 
> I think we could still avoid the lookup when no vxlan/GRO sockets are
> present moving the lookup into udp{4,6}_gro_receive. Very roughly
> something alike:
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index f79f1b5b2f9e..b0c0983eac6b 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -420,20 +420,16 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>  INDIRECT_CALLABLE_DECLARE(struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
>                                                    __be16 sport, __be16 dport));
>  struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
> -                               struct udphdr *uh, udp_lookup_t lookup)
> +                               struct udphdr *uh, struct sock *sk)
>  {
>         struct sk_buff *pp = NULL;
>         struct sk_buff *p;
>         struct udphdr *uh2;
>         unsigned int off = skb_gro_offset(skb);
>         int flush = 1;
> -       struct sock *sk;
>  
> -       rcu_read_lock();
> -       sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
> -                               udp4_lib_lookup_skb, skb, uh->source, uh->dest);
> -       if (!sk) {
> -               NAPI_GRO_CB(skb)->is_flist = 1;
> +       if (!sk || !udp_sk(sk)->gro_receive) {
> +               NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled: 1;
>                 pp = call_gro_receive(udp_gro_receive_segment, head, skb);
>                 rcu_read_unlock();
>                 return pp;
> @@ -506,7 +502,12 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
>                                              inet_gro_compute_pseudo);
>  skip:
>         NAPI_GRO_CB(skb)->is_ipv6 = 0;
> -       return udp_gro_receive(head, skb, uh, udp4_lib_lookup_skb);
> +       rcu_read_lock();
> +       sk = static_branch_unlikely(&udp_encap_needed_key) ?
> +                       udp4_lib_lookup_skb(skb, uh->source, uh->dest) : NULL;
> +       pp = udp_gro_receive(head, skb, uh, sk);
> +       rcu_read_unlock();
> +       return pp;
>  
>  flush:
>         NAPI_GRO_CB(skb)->flush = 1;
> ---
> 
> Regardless of the above, I think we should drop the later check for
> gro_receive:
> 
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -450,8 +450,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>         if (NAPI_GRO_CB(skb)->encap_mark ||
>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
> -            !NAPI_GRO_CB(skb)->csum_valid) ||
> -           !udp_sk(sk)->gro_receive)
> +            !NAPI_GRO_CB(skb)->csum_valid))
>                 goto out_unlock;
>  
>         /* mark that this skb passed once through the tunnel gro layer */
> ---

I've incorporated the above into the next RFC patchset.

> Finally this will cause GRO/GSO for local UDP packets delivery to non
> GSO_SEGMENT sockets. That could be possibly a win or a regression: we
> save on netfilter/IP stack traversal, but we add additional work, some
> performances figures would probably help.

I did some tests for the local receive path with netperf and iperf, but
in this case the sender that generates the packets is the bottleneck.
So the benchmarks are not that meaningful for the receive path.

Do you have some performance tests for UDP GRO receive?
If so, I'd be glad if you could test this, or better yet
the next patchset I'll send in the coming days.

Thanks!


* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2019-01-14 17:09     ` Willem de Bruijn
@ 2019-01-25  8:14       ` Steffen Klassert
  2019-01-25 13:57         ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-01-25  8:14 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Mon, Jan 14, 2019 at 12:09:22PM -0500, Willem de Bruijn wrote:
> On Mon, Jan 14, 2019 at 7:50 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> > >
> > > I would prefer to split the patch that adds UDP GRO on the forwarding
> > > path into one that enables it for existing GRO (the hack you refer to
> > > above) and a second one to optionally convert to listified processing.
> >
> > The hack was to skip the socket lookup. While that was ok for a
> > forwarding test, it will affect the local receive path of course.
> >
> > Currently, I check if there is a GRO capable socket. If yes,
> > do standard GRO. If no, do listified GRO regardless if packets
> > are going to be forwarded or locally received. So UDP GRO is
> > always on with this.
> 
> I understand. What I suggest is to split into two: enable GRO on the
> forwarding path independently from converting the method to listified
> GRO.

Ok, will do that in the next patchset.

> > We would need to do an early route lookup to check if the packets
> > are going to be forwarded or locally received. The current patchset
> > does not implement this, but could be done. Maybe doing a route
> > lookup based on some static key that will be enabled when forwarding
> > on the receiving device is enabled.
> >
> > But even if the route lookup tell us that the packet should go the
> > forwarding path, netfilter (bpfilter?) could reroute the packet.
> > If we do an early route lookup, it would be good to have some early
> > netfilter (bpfilter?) too, so that we can know which path the packets
> > go. In this case we could do listified GRO even for TCP, if we can
> > know that we have to do software segmentation later on.
> >
> > Btw. do we already have hardware that can do UDP LSO?
> 
> Yes, mlx5
> 
> I don't think that the route lookup is needed. If listified is cheaper
> for local delivery, too, then we can make that the default unless a
> device is active with h/w offload and ip forwarding is enabled. If it
> isn't, then use it iff ip forwarding is enabled. I think it's fine to
> mispredict between the two in edge cases with netfilter mangling, as
> long as all paths can correctly handle both types of GRO packets.

I'd need at least a route lookup for my usecase, because listified
GRO is always cheaper when an xfrm transformation is needed (even for
TCP); in this case software GSO is needed. So I'd need to either have
an early route lookup or maybe some early ingress hook where a route
lookup could be implemented.

> > >
> > > For both existing UDP GRO and listified, we should verify that this is
> > > not a potential DoS vector before enabling by default.
> >
> > Yes, but should'nt this be the same as with TCP GRO?
> 
> That is by now well-tested. I don't think we can simply assume
> equivalence for UDP, also because that is easier to spoof.

What would be a good test for this? I played with masscan
and hping3, but did not notice any differences between
net-next and net-next + UDP GRO patches.



* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2019-01-25  8:14       ` Steffen Klassert
@ 2019-01-25 13:57         ` Willem de Bruijn
  2019-01-28  7:51           ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2019-01-25 13:57 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Fri, Jan 25, 2019 at 3:14 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Mon, Jan 14, 2019 at 12:09:22PM -0500, Willem de Bruijn wrote:
> > On Mon, Jan 14, 2019 at 7:50 AM Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > > On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> > > >
> > > > I would prefer to split the patch that adds UDP GRO on the forwarding
> > > > path into one that enables it for existing GRO (the hack you refer to
> > > > above) and a second one to optionally convert to listified processing.
> > >
> > > The hack was to skip the socket lookup. While that was ok for a
> > > forwarding test, it will affect the local receive path of course.
> > >
> > > Currently, I check if there is a GRO capable socket. If yes,
> > > do standard GRO. If no, do listified GRO regardless if packets
> > > are going to be forwarded or locally received. So UDP GRO is
> > > always on with this.
> >
> > I understand. What I suggest is to split into two: enable GRO on the
> > forwarding path independently from converting the method to listified
> > GRO.
>
> Ok, will do that in the next patchset.
>
> > > We would need to do an early route lookup to check if the packets
> > > are going to be forwarded or locally received. The current patchset
> > > does not implement this, but could be done. Maybe doing a route
> > > lookup based on some static key that will be enabled when forwarding
> > > on the receiving device is enabled.
> > >
> > > But even if the route lookup tell us that the packet should go the
> > > forwarding path, netfilter (bpfilter?) could reroute the packet.
> > > If we do an early route lookup, it would be good to have some early
> > > netfilter (bpfilter?) too, so that we can know which path the packets
> > > go. In this case we could do listified GRO even for TCP, if we can
> > > know that we have to do software segmentation later on.
> > >
> > > Btw. do we already have hardware that can do UDP LSO?
> >
> > Yes, mlx5
> >
> > I don't think that the route lookup is needed. If listified is cheaper
> > for local delivery, too, then we can make that the default unless a
> > device is active with h/w offload and ip forwarding is enabled. If it
> > isn't, then use it iff ip forwarding is enabled. I think it's fine to
> > mispredict between the two in edge cases with netfilter mangling, as
> > long as all paths can correctly handle both types of GRO packets.
>
> I'd need at least a route lookup for my usecase, because listified
> GRO is always cheaper when a xfrm transformation is needed (even for
> TCP). In this case is software GSO needed. So I'd need to either have
> an early route lookup or maybe some early ingress hook where a route
> lookup could be imlemented in.

Could you use a similar system-wide approach to what we discussed
previously wrt hardware offload? Use listified only if (forwarding is
enabled and no device is registered that implements h/w segmentation
offload) or any path requires xfrm transformation (?).

> > > >
> > > > For both existing UDP GRO and listified, we should verify that this is
> > > > not a potential DoS vector before enabling by default.
> > >
> > > Yes, but should'nt this be the same as with TCP GRO?
> >
> > That is by now well-tested. I don't think we can simply assume
> > equivalence for UDP, also because that is easier to spoof.
>
> What would be a good test for this? I played with masscan
> and hping3, but did not notice any differences between
> net-next and net-next + UDP GRO patches.

That's a good indication. Anything that can send Mpps that will
get coalesced will do, so depending on pkt rate those may very
well be sufficient. Else pktgen.ko is interesting, as it explicitly
has an option to repeatedly send the same prepared skb.


* Re: [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO.
  2019-01-25  7:58     ` Steffen Klassert
@ 2019-01-26  9:36       ` Paolo Abeni
  2019-01-28  8:09         ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2019-01-26  9:36 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Willem de Bruijn, Jason A. Donenfeld

On Fri, 2019-01-25 at 08:58 +0100, Steffen Klassert wrote:
> > Finally this will cause GRO/GSO for local UDP packets delivery to non
> > GSO_SEGMENT sockets. That could be possibly a win or a regression: we
> > save on netfilter/IP stack traversal, but we add additional work, some
> > performances figures would probably help.
> 
> I did some tests for the local receive path with netperf and iperf, but
> in this case the sender that generates the packets is the bottleneck.
> So the benchmarks are not that meaningful for the receive path.

I think we can use GSO on the sender if we add some additional code on
the rx side - for testing purposes only - limiting the GRO aggregation
to a value controlled by the user via sysfs.
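
For the sender side, a minimal sketch using the existing UDP_SEGMENT
option (assuming a UDP GSO capable kernel and UDP_SEGMENT from
linux/udp.h):

	#include <stdio.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <linux/udp.h>		/* UDP_SEGMENT */

	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int gso_size = 1400;	/* payload bytes per segment on the wire */

	/* large sendmsg() calls are split by GSO, so the sender no
	 * longer limits the packet rate seen by the GRO receive side */
	if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
		       &gso_size, sizeof(gso_size)) < 0)
		perror("setsockopt(UDP_SEGMENT)");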

Beyond that, other options would be using multiple sender threads and
a single rx queue and/or asymmetric CPUs.

> Do you have some performance tests for UDP GRO receive?

I have a bunch of ansible(!!!) scripts I can share, if you dare. They
have a lot of hard-coded settings, so I'm not sure how much can be
re-used outside my testbed.

I also hope/wish/think I can allocate some time for benchmarking this
on my own in the next week[s], so I'll try to post some results for
the next iteration.

Cheers,

Paolo



* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2019-01-25 13:57         ` Willem de Bruijn
@ 2019-01-28  7:51           ` Steffen Klassert
  2019-01-28 16:46             ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-01-28  7:51 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Fri, Jan 25, 2019 at 08:57:00AM -0500, Willem de Bruijn wrote:
> On Fri, Jan 25, 2019 at 3:14 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > On Mon, Jan 14, 2019 at 12:09:22PM -0500, Willem de Bruijn wrote:
> > > On Mon, Jan 14, 2019 at 7:50 AM Steffen Klassert
> > > <steffen.klassert@secunet.com> wrote:
> > > > On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> > >
> > > I don't think that the route lookup is needed. If listified is cheaper
> > > for local delivery, too, then we can make that the default unless a
> > > device is active with h/w offload and ip forwarding is enabled. If it
> > > isn't, then use it iff ip forwarding is enabled. I think it's fine to
> > > mispredict between the two in edge cases with netfilter mangling, as
> > > long as all paths can correctly handle both types of GRO packets.
> >
> > I'd need at least a route lookup for my usecase, because listified
> > GRO is always cheaper when an xfrm transformation is needed (even for
> > TCP). In this case software GSO is needed. So I'd need either an
> > early route lookup or maybe some early ingress hook where a route
> > lookup could be implemented.
> 
> Could you use a similar system-wide approach to what we discussed
> previously wrt hardware offload? Use listified only if (forwarding is enabled
> and no device is registered that implements h/w segmentation offload) or
> any path requires xfrm transformation (?).

The xfrm transformation has to happen on the segments. So if we need to
do the xfrm transformation in software, we need to do segmentation in
software too. I think that forwarding being enabled, combined with the
presence of a device that can do hardware segmentation offload, is not
a good indicator. The more devices support hardware segmentation
offload, the more likely it is that xfrm takes a suboptimal path.

We have to do a route lookup anyway, why not just do it early
in case forwarding is enabled?
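
To make this concrete, roughly the following untested sketch (the
function name is made up; dst refcounting and error handling are left
out):

/* Untested sketch: do the IPv4 input route lookup early and prefer
 * fraglist GRO only when the packet would be forwarded.
 */
static bool early_lookup_wants_fraglist(struct sk_buff *skb)
{
	const struct iphdr *iph = ip_hdr(skb);
	struct rtable *rt;

	if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
				 iph->tos, skb->dev))
		return false;

	rt = skb_rtable(skb);
	return rt && rt->rt_type != RTN_LOCAL;
}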


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO.
  2019-01-26  9:36       ` Paolo Abeni
@ 2019-01-28  8:09         ` Steffen Klassert
  0 siblings, 0 replies; 18+ messages in thread
From: Steffen Klassert @ 2019-01-28  8:09 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, Willem de Bruijn, Jason A. Donenfeld

On Sat, Jan 26, 2019 at 10:36:40AM +0100, Paolo Abeni wrote:
> On Fri, 2019-01-25 at 08:58 +0100, Steffen Klassert wrote:
> > Do you have some performance tests for UDP GRO receive?
> 
> I have a bunch of ansible(!!!) scripts I can share, if you dare. They
> have a lot of hard-coded settings, so I'm not sure how much can be re-
> used outside my testbed.
> 
> I also hope/wish/think I can allocate some time for benchmarking this
> on my own in the next week[s], so I'll try to post some results for the
> next iteration.

That would be great, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 0/3] Support fraglist GRO/GSO
  2019-01-28  7:51           ` Steffen Klassert
@ 2019-01-28 16:46             ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2019-01-28 16:46 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Network Development, Willem de Bruijn, Paolo Abeni, Jason A. Donenfeld

On Mon, Jan 28, 2019 at 2:51 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Fri, Jan 25, 2019 at 08:57:00AM -0500, Willem de Bruijn wrote:
> > On Fri, Jan 25, 2019 at 3:14 AM Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > >
> > > On Mon, Jan 14, 2019 at 12:09:22PM -0500, Willem de Bruijn wrote:
> > > > On Mon, Jan 14, 2019 at 7:50 AM Steffen Klassert
> > > > <steffen.klassert@secunet.com> wrote:
> > > > > On Sun, Dec 23, 2018 at 08:15:40PM -0500, Willem de Bruijn wrote:
> > > >
> > > > I don't think that the route lookup is needed. If listified is cheaper
> > > > for local delivery, too, then we can make that the default unless a
> > > > device is active with h/w offload and ip forwarding is enabled. If it
> > > > isn't, then use it iff ip forwarding is enabled. I think it's fine to
> > > > mispredict between the two in edge cases with netfilter mangling, as
> > > > long as all paths can correctly handle both types of GRO packets.
> > >
> > > I'd need at least a route lookup for my usecase, because listified
> > > GRO is always cheaper when an xfrm transformation is needed (even for
> > > TCP). In this case software GSO is needed. So I'd need either an
> > > early route lookup or maybe some early ingress hook where a route
> > > lookup could be implemented.
> >
> > Could you use a similar system-wide approach to what we discussed
> > previously wrt hardware offload? Use listified only if (forwarding is enabled
> > and no device is registered that implements h/w segmentation offload) or
> > any path requires xfrm transformation (?).
>
> The xfrm transformation has to happen on the segments. So if we need to
> do the xfrm transformation in software, we need to do segmentation in
> software too. I think that forwarding being enabled, combined with the
> presence of a device that can do hardware segmentation offload, is not
> a good indicator. The more devices support hardware segmentation
> offload, the more likely it is that xfrm takes a suboptimal path.

That's why I suggested OR any path requires xfrm.

> We have to do a route lookup anyway, why not just do it early
> in case forwarding is enabled?

But actually, yes, that's true, so fine, too.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-28 16:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-21  7:53 [PATCH RFC 0/3] Support fraglist GRO/GSO Steffen Klassert
2018-12-21  7:53 ` [PATCH RFC 1/3] net: Prepare GSO return values for fraglist GSO Steffen Klassert
2019-01-08 13:53   ` Paolo Abeni
2019-01-14 12:53     ` Steffen Klassert
2018-12-21  7:53 ` [PATCH RFC 2/3] net: Support GRO/GSO fraglist chaining Steffen Klassert
2018-12-21  7:53 ` [PATCH RFC 3/3] udp: Support UDP fraglist GRO/GSO Steffen Klassert
2019-01-08 15:00   ` Paolo Abeni
2019-01-25  7:58     ` Steffen Klassert
2019-01-26  9:36       ` Paolo Abeni
2019-01-28  8:09         ` Steffen Klassert
2018-12-24  1:15 ` [PATCH RFC 0/3] Support " Willem de Bruijn
2018-12-26 13:09   ` Marcelo Ricardo Leitner
2019-01-14 12:50   ` Steffen Klassert
2019-01-14 17:09     ` Willem de Bruijn
2019-01-25  8:14       ` Steffen Klassert
2019-01-25 13:57         ` Willem de Bruijn
2019-01-28  7:51           ` Steffen Klassert
2019-01-28 16:46             ` Willem de Bruijn
