All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support
@ 2024-04-27 18:22 Felix Fietkau
  2024-04-27 18:22 ` [PATCH v4 net-next v4 1/6] net: move skb_gro_receive_list from udp to core Felix Fietkau
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:22 UTC (permalink / raw)
  To: netdev; +Cc: willemdebruijn.kernel, pabeni, edumazet

When forwarding TCP after GRO, software segmentation is very expensive,
especially when the checksum needs to be recalculated.
One case where that's currently unavoidable is when routing packets over
PPPoE. Performance improves significantly when using fraglist GRO
implemented in the same way as for UDP.

When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
socket in the same netns as the receiving device. While this may not
cover all relevant use cases in multi-netns configurations, it should be
good enough for most configurations that need this.

Here's a measurement of running 2 TCP streams through a MediaTek MT7622
device (2-core Cortex-A53), which runs NAT with flow offload enabled from
one ethernet port to PPPoE on another ethernet port + cake qdisc set to
1Gbps.

rx-gro-list off: 630 Mbit/s, CPU 35% idle
rx-gro-list on:  770 Mbit/s, CPU 40% idle

Changes since v3:
 - optimize __tcpv4_gso_segment_csum
 - add unlikely()
 - reorder dev_net/skb_gro_network_header calls after NETIF_F_GRO_FRAGLIST
   check
 - add support for ipv6 nat
 - drop redundant pskb_may_pull check

Changes since v2:
 - create tcp_gro_header_pull helper function to pull tcp header only once
 - optimize __tcpv4_gso_segment_list_csum, drop obsolete flags check

Changes since v1:
 - revert bogus tcp flags overwrite on segmentation
 - fix kbuild issue with !CONFIG_IPV6
 - only perform socket lookup for the first skb in the GRO train

Changes since RFC:
 - split up patches
 - handle TCP flags mutations

Felix Fietkau (6):
  net: move skb_gro_receive_list from udp to core
  net: add support for segmenting TCP fraglist GSO packets
  net: add code for TCP fraglist GRO
  net: create tcp_gro_lookup helper function
  net: create tcp_gro_header_pull helper function
  net: add heuristic for enabling TCP fraglist GRO

 include/net/gro.h        |   1 +
 include/net/tcp.h        |   5 +-
 net/core/gro.c           |  27 +++++
 net/ipv4/tcp_offload.c   | 215 ++++++++++++++++++++++++++++++++-------
 net/ipv4/udp_offload.c   |  27 -----
 net/ipv6/tcpv6_offload.c | 120 +++++++++++++++++++++-
 6 files changed, 326 insertions(+), 69 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 1/6] net: move skb_gro_receive_list from udp to core
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
@ 2024-04-27 18:22 ` Felix Fietkau
  2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:22 UTC (permalink / raw)
  To: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Ahern
  Cc: willemdebruijn.kernel, linux-kernel

This helper function will be used for TCP fraglist GRO support

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 include/net/gro.h      |  1 +
 net/core/gro.c         | 27 +++++++++++++++++++++++++++
 net/ipv4/udp_offload.c | 27 ---------------------------
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 50f1e403dbbb..ca8e4b3de044 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -429,6 +429,7 @@ static inline __wsum ip6_gro_compute_pseudo(const struct sk_buff *skb,
 }
 
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb);
 
 /* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
 static inline void gro_normal_list(struct napi_struct *napi)
diff --git a/net/core/gro.c b/net/core/gro.c
index 2459ab697f7f..268c6c826d09 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -231,6 +231,33 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	return 0;
 }
 
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
+{
+	if (unlikely(p->len + skb->len >= 65536))
+		return -E2BIG;
+
+	if (NAPI_GRO_CB(p)->last == p)
+		skb_shinfo(p)->frag_list = skb;
+	else
+		NAPI_GRO_CB(p)->last->next = skb;
+
+	skb_pull(skb, skb_gro_offset(skb));
+
+	NAPI_GRO_CB(p)->last = skb;
+	NAPI_GRO_CB(p)->count++;
+	p->data_len += skb->len;
+
+	/* sk ownership - if any - completely transferred to the aggregated packet */
+	skb->destructor = NULL;
+	skb->sk = NULL;
+	p->truesize += skb->truesize;
+	p->len += skb->len;
+
+	NAPI_GRO_CB(skb)->same_flow = 1;
+
+	return 0;
+}
+
 
 static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 {
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 3498dd1d0694..a3cd546a1aea 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -433,33 +433,6 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	return segs;
 }
 
-static int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
-{
-	if (unlikely(p->len + skb->len >= 65536))
-		return -E2BIG;
-
-	if (NAPI_GRO_CB(p)->last == p)
-		skb_shinfo(p)->frag_list = skb;
-	else
-		NAPI_GRO_CB(p)->last->next = skb;
-
-	skb_pull(skb, skb_gro_offset(skb));
-
-	NAPI_GRO_CB(p)->last = skb;
-	NAPI_GRO_CB(p)->count++;
-	p->data_len += skb->len;
-
-	/* sk ownership - if any - completely transferred to the aggregated packet */
-	skb->destructor = NULL;
-	skb->sk = NULL;
-	p->truesize += skb->truesize;
-	p->len += skb->len;
-
-	NAPI_GRO_CB(skb)->same_flow = 1;
-
-	return 0;
-}
-
 
 #define UDP_GRO_CNT_MAX 64
 static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
  2024-04-27 18:22 ` [PATCH v4 net-next v4 1/6] net: move skb_gro_receive_list from udp to core Felix Fietkau
@ 2024-04-27 18:22 ` Felix Fietkau
  2024-04-30 10:19   ` Paolo Abeni
                     ` (2 more replies)
  2024-04-27 18:22 ` [PATCH v4 net-next v4 3/6] net: add code for TCP fraglist GRO Felix Fietkau
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:22 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni
  Cc: willemdebruijn.kernel, linux-kernel

Preparation for adding TCP fraglist GRO support. It expects packets to be
combined in a similar way as UDP fraglist GSO packets.
For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 net/ipv4/tcp_offload.c   | 67 ++++++++++++++++++++++++++++++++++++++++
 net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index fab0973f995b..affd4ed28cfe 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -28,6 +28,70 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
 	}
 }
 
+static void __tcpv4_gso_segment_csum(struct sk_buff *seg,
+				     __be32 *oldip, __be32 newip,
+				     __be16 *oldport, __be16 newport)
+{
+	struct tcphdr *th;
+	struct iphdr *iph;
+
+	if (*oldip == newip && *oldport == newport)
+		return;
+
+	th = tcp_hdr(seg);
+	iph = ip_hdr(seg);
+
+	inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true);
+	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
+	*oldport = newport;
+
+	csum_replace4(&iph->check, *oldip, newip);
+	*oldip = newip;
+}
+
+static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs)
+{
+	const struct tcphdr *th;
+	const struct iphdr *iph;
+	struct sk_buff *seg;
+	struct tcphdr *th2;
+	struct iphdr *iph2;
+
+	seg = segs;
+	th = tcp_hdr(seg);
+	iph = ip_hdr(seg);
+	th2 = tcp_hdr(seg->next);
+	iph2 = ip_hdr(seg->next);
+
+	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
+	    iph->daddr == iph2->daddr && iph->saddr == iph2->saddr)
+		return segs;
+
+	while ((seg = seg->next)) {
+		th2 = tcp_hdr(seg);
+		iph2 = ip_hdr(seg);
+
+		__tcpv4_gso_segment_csum(seg,
+					 &iph2->saddr, iph->saddr,
+					 &th2->source, th->source);
+		__tcpv4_gso_segment_csum(seg,
+					 &iph2->daddr, iph->daddr,
+					 &th2->dest, th->dest);
+	}
+
+	return segs;
+}
+
+static struct sk_buff *__tcp4_gso_segment_list(struct sk_buff *skb,
+					      netdev_features_t features)
+{
+	skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
+	if (IS_ERR(skb))
+		return skb;
+
+	return __tcpv4_gso_segment_list_csum(skb);
+}
+
 static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
@@ -37,6 +101,9 @@ static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		return ERR_PTR(-EINVAL);
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)
+		return __tcp4_gso_segment_list(skb, features);
+
 	if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) {
 		const struct iphdr *iph = ip_hdr(skb);
 		struct tcphdr *th = tcp_hdr(skb);
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 4b07d1e6c952..7180c30dbbef 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -40,6 +40,61 @@ INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 	return 0;
 }
 
+static void __tcpv6_gso_segment_csum(struct sk_buff *seg,
+				     __be16 *oldport, __be16 newport)
+{
+	struct tcphdr *th;
+
+	if (*oldport == newport)
+		return;
+
+	th = tcp_hdr(seg);
+	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
+	*oldport = newport;
+}
+
+static struct sk_buff *__tcpv6_gso_segment_list_csum(struct sk_buff *segs)
+{
+	const struct tcphdr *th;
+	const struct ipv6hdr *iph;
+	struct sk_buff *seg;
+	struct tcphdr *th2;
+	struct ipv6hdr *iph2;
+
+	seg = segs;
+	th = tcp_hdr(seg);
+	iph = ipv6_hdr(seg);
+	th2 = tcp_hdr(seg->next);
+	iph2 = ipv6_hdr(seg->next);
+
+	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
+	    ipv6_addr_equal(&iph->saddr, &iph2->saddr) &&
+	    ipv6_addr_equal(&iph->daddr, &iph2->daddr))
+		return segs;
+
+	while ((seg = seg->next)) {
+		th2 = tcp_hdr(seg);
+		iph2 = ipv6_hdr(seg);
+
+		iph2->saddr = iph->saddr;
+		iph2->daddr = iph->daddr;
+		__tcpv6_gso_segment_csum(seg, &th2->source, th->source);
+		__tcpv6_gso_segment_csum(seg, &th2->dest, th->dest);
+	}
+
+	return segs;
+}
+
+static struct sk_buff *__tcp6_gso_segment_list(struct sk_buff *skb,
+					      netdev_features_t features)
+{
+	skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
+	if (IS_ERR(skb))
+		return skb;
+
+	return __tcpv6_gso_segment_list_csum(skb);
+}
+
 static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
@@ -51,6 +106,9 @@ static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		return ERR_PTR(-EINVAL);
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)
+		return __tcp6_gso_segment_list(skb, features);
+
 	if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) {
 		const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 		struct tcphdr *th = tcp_hdr(skb);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 3/6] net: add code for TCP fraglist GRO
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
  2024-04-27 18:22 ` [PATCH v4 net-next v4 1/6] net: move skb_gro_receive_list from udp to core Felix Fietkau
  2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
@ 2024-04-27 18:22 ` Felix Fietkau
  2024-04-27 18:23 ` [PATCH v4 net-next v4 4/6] net: create tcp_gro_lookup helper function Felix Fietkau
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:22 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni
  Cc: willemdebruijn.kernel, linux-kernel

This implements fraglist GRO similar to how it's handled in UDP, however
no functional changes are added yet. The next change adds a heuristic for
using fraglist GRO instead of regular GRO.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 net/ipv4/tcp_offload.c   | 21 +++++++++++++++++++++
 net/ipv6/tcpv6_offload.c |  9 +++++++++
 2 files changed, 30 insertions(+)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index affd4ed28cfe..aa7508676315 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -334,6 +334,18 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
 	flush |= skb_cmp_decrypted(p, skb);
 
+	if (unlikely(NAPI_GRO_CB(p)->is_flist)) {
+		flush |= (__force int)(flags ^ tcp_flag_word(th2));
+		flush |= skb->ip_summed != p->ip_summed;
+		flush |= skb->csum_level != p->csum_level;
+		flush |= NAPI_GRO_CB(p)->count >= 64;
+
+		if (flush || skb_gro_receive_list(p, skb))
+			mss = 1;
+
+		goto out_check_final;
+	}
+
 	if (flush || skb_gro_receive(p, skb)) {
 		mss = 1;
 		goto out_check_final;
@@ -400,6 +412,15 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct tcphdr *th = tcp_hdr(skb);
 
+	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
+		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
+		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+		__skb_incr_checksum_unnecessary(skb);
+
+		return 0;
+	}
+
 	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
 				  iph->daddr, 0);
 
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 7180c30dbbef..575e2743e331 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -32,6 +32,15 @@ INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	struct tcphdr *th = tcp_hdr(skb);
 
+	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
+		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV6;
+		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+		__skb_incr_checksum_unnecessary(skb);
+
+		return 0;
+	}
+
 	th->check = ~tcp_v6_check(skb->len - thoff, &iph->saddr,
 				  &iph->daddr, 0);
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV6;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 4/6] net: create tcp_gro_lookup helper function
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
                   ` (2 preceding siblings ...)
  2024-04-27 18:22 ` [PATCH v4 net-next v4 3/6] net: add code for TCP fraglist GRO Felix Fietkau
@ 2024-04-27 18:23 ` Felix Fietkau
  2024-04-27 18:23 ` [PATCH v4 net-next v4 5/6] net: create tcp_gro_header_pull " Felix Fietkau
  2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
  5 siblings, 0 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:23 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, David Ahern
  Cc: willemdebruijn.kernel, linux-kernel

This pulls the flow port matching out of tcp_gro_receive, so that it can be
reused for the next change, which adds the TCP fraglist GRO heuristic.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 include/net/tcp.h      |  1 +
 net/ipv4/tcp_offload.c | 41 +++++++++++++++++++++++++----------------
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a9eb21251195..9f08ecab26e0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2199,6 +2199,7 @@ void tcp_v4_destroy_sock(struct sock *sk);
 
 struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features);
+struct sk_buff *tcp_gro_lookup(struct list_head *head, struct tcphdr *th);
 struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb);
 INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index aa7508676315..4a194a9d36cd 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -245,6 +245,27 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	return segs;
 }
 
+struct sk_buff *tcp_gro_lookup(struct list_head *head, struct tcphdr *th)
+{
+	struct tcphdr *th2;
+	struct sk_buff *p;
+
+	list_for_each_entry(p, head, list) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		th2 = tcp_hdr(p);
+		if (*(u32 *)&th->source ^ *(u32 *)&th2->source) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		return p;
+	}
+
+	return NULL;
+}
+
 struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	struct sk_buff *pp = NULL;
@@ -282,24 +303,12 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	len = skb_gro_len(skb);
 	flags = tcp_flag_word(th);
 
-	list_for_each_entry(p, head, list) {
-		if (!NAPI_GRO_CB(p)->same_flow)
-			continue;
-
-		th2 = tcp_hdr(p);
-
-		if (*(u32 *)&th->source ^ *(u32 *)&th2->source) {
-			NAPI_GRO_CB(p)->same_flow = 0;
-			continue;
-		}
-
-		goto found;
-	}
-	p = NULL;
-	goto out_check_final;
+	p = tcp_gro_lookup(head, th);
+	if (!p)
+		goto out_check_final;
 
-found:
 	/* Include the IP ID check below from the inner most IP hdr */
+	th2 = tcp_hdr(p);
 	flush = NAPI_GRO_CB(p)->flush;
 	flush |= (__force int)(flags & TCP_FLAG_CWR);
 	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 5/6] net: create tcp_gro_header_pull helper function
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
                   ` (3 preceding siblings ...)
  2024-04-27 18:23 ` [PATCH v4 net-next v4 4/6] net: create tcp_gro_lookup helper function Felix Fietkau
@ 2024-04-27 18:23 ` Felix Fietkau
  2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
  5 siblings, 0 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:23 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, David Ahern
  Cc: willemdebruijn.kernel, linux-kernel

Pull the code out of tcp_gro_receive in order to access the tcp header
from tcp4/6_gro_receive.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 include/net/tcp.h        |  4 ++-
 net/ipv4/tcp_offload.c   | 55 +++++++++++++++++++++++++---------------
 net/ipv6/tcpv6_offload.c | 18 +++++++++----
 3 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9f08ecab26e0..bf392c1fbadc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2199,8 +2199,10 @@ void tcp_v4_destroy_sock(struct sock *sk);
 
 struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features);
+struct tcphdr *tcp_gro_pull_header(struct sk_buff *skb);
 struct sk_buff *tcp_gro_lookup(struct list_head *head, struct tcphdr *th);
-struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb);
+struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
+				struct tcphdr *th);
 INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
 INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff));
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 4a194a9d36cd..87ae9808e260 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -266,40 +266,46 @@ struct sk_buff *tcp_gro_lookup(struct list_head *head, struct tcphdr *th)
 	return NULL;
 }
 
-struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
+struct tcphdr *tcp_gro_pull_header(struct sk_buff *skb)
 {
-	struct sk_buff *pp = NULL;
-	struct sk_buff *p;
+	unsigned int thlen, hlen, off;
 	struct tcphdr *th;
-	struct tcphdr *th2;
-	unsigned int len;
-	unsigned int thlen;
-	__be32 flags;
-	unsigned int mss = 1;
-	unsigned int hlen;
-	unsigned int off;
-	int flush = 1;
-	int i;
 
 	off = skb_gro_offset(skb);
 	hlen = off + sizeof(*th);
 	th = skb_gro_header(skb, hlen, off);
 	if (unlikely(!th))
-		goto out;
+		return NULL;
 
 	thlen = th->doff * 4;
 	if (thlen < sizeof(*th))
-		goto out;
+		return NULL;
 
 	hlen = off + thlen;
 	if (!skb_gro_may_pull(skb, hlen)) {
 		th = skb_gro_header_slow(skb, hlen, off);
 		if (unlikely(!th))
-			goto out;
+			return NULL;
 	}
 
 	skb_gro_pull(skb, thlen);
 
+	return th;
+}
+
+struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
+				struct tcphdr *th)
+{
+	unsigned int thlen = th->doff * 4;
+	struct sk_buff *pp = NULL;
+	struct sk_buff *p;
+	struct tcphdr *th2;
+	unsigned int len;
+	__be32 flags;
+	unsigned int mss = 1;
+	int flush = 1;
+	int i;
+
 	len = skb_gro_len(skb);
 	flags = tcp_flag_word(th);
 
@@ -376,7 +382,6 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
 		pp = p;
 
-out:
 	NAPI_GRO_CB(skb)->flush |= (flush != 0);
 
 	return pp;
@@ -405,15 +410,23 @@ EXPORT_SYMBOL(tcp_gro_complete);
 INDIRECT_CALLABLE_SCOPE
 struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
+	struct tcphdr *th;
+
 	/* Don't bother verifying checksum if we're going to flush anyway. */
 	if (!NAPI_GRO_CB(skb)->flush &&
 	    skb_gro_checksum_validate(skb, IPPROTO_TCP,
-				      inet_gro_compute_pseudo)) {
-		NAPI_GRO_CB(skb)->flush = 1;
-		return NULL;
-	}
+				      inet_gro_compute_pseudo))
+		goto flush;
+
+	th = tcp_gro_pull_header(skb);
+	if (!th)
+		goto flush;
 
-	return tcp_gro_receive(head, skb);
+	return tcp_gro_receive(head, skb, th);
+
+flush:
+	NAPI_GRO_CB(skb)->flush = 1;
+	return NULL;
 }
 
 INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 575e2743e331..e73a4f74fd96 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -16,15 +16,23 @@
 INDIRECT_CALLABLE_SCOPE
 struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
+	struct tcphdr *th;
+
 	/* Don't bother verifying checksum if we're going to flush anyway. */
 	if (!NAPI_GRO_CB(skb)->flush &&
 	    skb_gro_checksum_validate(skb, IPPROTO_TCP,
-				      ip6_gro_compute_pseudo)) {
-		NAPI_GRO_CB(skb)->flush = 1;
-		return NULL;
-	}
+				      ip6_gro_compute_pseudo))
+		goto flush;
+
+	th = tcp_gro_pull_header(skb);
+	if (!th)
+		goto flush;
+
+	return tcp_gro_receive(head, skb, th);
 
-	return tcp_gro_receive(head, skb);
+flush:
+	NAPI_GRO_CB(skb)->flush = 1;
+	return NULL;
 }
 
 INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
                   ` (4 preceding siblings ...)
  2024-04-27 18:23 ` [PATCH v4 net-next v4 5/6] net: create tcp_gro_header_pull " Felix Fietkau
@ 2024-04-27 18:23 ` Felix Fietkau
  2024-04-30  3:25   ` Jakub Kicinski
                     ` (2 more replies)
  5 siblings, 3 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-27 18:23 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni
  Cc: willemdebruijn.kernel, linux-kernel

When forwarding TCP after GRO, software segmentation is very expensive,
especially when the checksum needs to be recalculated.
One case where that's currently unavoidable is when routing packets over
PPPoE. Performance improves significantly when using fraglist GRO
implemented in the same way as for UDP.

When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
socket in the same netns as the receiving device. While this may not
cover all relevant use cases in multi-netns configurations, it should be
good enough for most configurations that need this.

Here's a measurement of running 2 TCP streams through a MediaTek MT7622
device (2-core Cortex-A53), which runs NAT with flow offload enabled from
one ethernet port to PPPoE on another ethernet port + cake qdisc set to
1Gbps.

rx-gro-list off: 630 Mbit/s, CPU 35% idle
rx-gro-list on:  770 Mbit/s, CPU 40% idle

Signe-off-by: Felix Fietkau <nbd@nbd.name>
---
 net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
 net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 87ae9808e260..3e9b8c6f9c8c 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(tcp_gro_complete);
 
+static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
+				    struct tcphdr *th)
+{
+	const struct iphdr *iph;
+	struct sk_buff *p;
+	struct sock *sk;
+	struct net *net;
+	int iif, sdif;
+
+	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
+		return;
+
+	p = tcp_gro_lookup(head, th);
+	if (p) {
+		NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist;
+		return;
+	}
+
+	inet_get_iif_sdif(skb, &iif, &sdif);
+	iph = skb_gro_network_header(skb);
+	net = dev_net(skb->dev);
+	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
+				       iph->saddr, th->source,
+				       iph->daddr, ntohs(th->dest),
+				       iif, sdif);
+	NAPI_GRO_CB(skb)->is_flist = !sk;
+	if (sk)
+		sock_put(sk);
+}
+
 INDIRECT_CALLABLE_SCOPE
 struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
@@ -422,6 +452,8 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 	if (!th)
 		goto flush;
 
+	tcp4_check_fraglist_gro(head, skb, th);
+
 	return tcp_gro_receive(head, skb, th);
 
 flush:
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index e73a4f74fd96..ba7b0b3cb9f2 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -7,12 +7,45 @@
  */
 #include <linux/indirect_call_wrapper.h>
 #include <linux/skbuff.h>
+#include <net/inet6_hashtables.h>
 #include <net/gro.h>
 #include <net/protocol.h>
 #include <net/tcp.h>
 #include <net/ip6_checksum.h>
 #include "ip6_offload.h"
 
+static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
+				    struct tcphdr *th)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	const struct ipv6hdr *hdr;
+	struct sk_buff *p;
+	struct sock *sk;
+	struct net *net;
+	int iif, sdif;
+
+	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
+		return;
+
+	p = tcp_gro_lookup(head, th);
+	if (p) {
+		NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist;
+		return;
+	}
+
+	inet6_get_iif_sdif(skb, &iif, &sdif);
+	hdr = skb_gro_network_header(skb);
+	net = dev_net(skb->dev);
+	sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
+					&hdr->saddr, th->source,
+					&hdr->daddr, ntohs(th->dest),
+					iif, sdif);
+	NAPI_GRO_CB(skb)->is_flist = !sk;
+	if (sk)
+		sock_put(sk);
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+}
+
 INDIRECT_CALLABLE_SCOPE
 struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
@@ -28,6 +61,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 	if (!th)
 		goto flush;
 
+	tcp6_check_fraglist_gro(head, skb, th);
+
 	return tcp_gro_receive(head, skb, th);
 
 flush:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
@ 2024-04-30  3:25   ` Jakub Kicinski
  2024-04-30 10:12   ` Paolo Abeni
  2024-04-30 10:33   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2024-04-30  3:25 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: netdev, Eric Dumazet, David S. Miller, David Ahern, Paolo Abeni,
	willemdebruijn.kernel, linux-kernel

On Sat, 27 Apr 2024 20:23:02 +0200 Felix Fietkau wrote:
> Signe-off-by: Felix Fietkau <nbd@nbd.name>
       ^ 
       d

If you have to respin - please update.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
  2024-04-30  3:25   ` Jakub Kicinski
@ 2024-04-30 10:12   ` Paolo Abeni
  2024-04-30 10:23     ` Felix Fietkau
  2024-04-30 10:33   ` Eric Dumazet
  2 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 10:12 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote:
> When forwarding TCP after GRO, software segmentation is very expensive,
> especially when the checksum needs to be recalculated.
> One case where that's currently unavoidable is when routing packets over
> PPPoE. Performance improves significantly when using fraglist GRO
> implemented in the same way as for UDP.
> 
> When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
> socket in the same netns as the receiving device. While this may not
> cover all relevant use cases in multi-netns configurations, it should be
> good enough for most configurations that need this.
> 
> Here's a measurement of running 2 TCP streams through a MediaTek MT7622
> device (2-core Cortex-A53), which runs NAT with flow offload enabled from
> one ethernet port to PPPoE on another ethernet port + cake qdisc set to
> 1Gbps.
> 
> rx-gro-list off: 630 Mbit/s, CPU 35% idle
> rx-gro-list on:  770 Mbit/s, CPU 40% idle
> 
> Signe-off-by: Felix Fietkau <nbd@nbd.name>
> ---
>  net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
>  net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 87ae9808e260..3e9b8c6f9c8c 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(tcp_gro_complete);
>  
> +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
> +				    struct tcphdr *th)
> +{
> +	const struct iphdr *iph;
> +	struct sk_buff *p;
> +	struct sock *sk;
> +	struct net *net;
> +	int iif, sdif;
> +
> +	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))

Should we add an 'unlikely()' here to pair with unlikely(is_flist) in
*gro_receive / *gro_complete?

Should this test be moved into the caller, to avoid an unconditional
function call in the ipv6 code?

(Also waiting for explicit ack from Eric)

Thank,

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
@ 2024-04-30 10:19   ` Paolo Abeni
  2024-04-30 10:27     ` Felix Fietkau
  2024-04-30 10:23   ` Paolo Abeni
  2024-04-30 11:31   ` Eric Dumazet
  2 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 10:19 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Sat, 2024-04-27 at 20:22 +0200, Felix Fietkau wrote:
> Preparation for adding TCP fraglist GRO support. It expects packets to be
> combined in a similar way as UDP fraglist GSO packets.
> For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
> 
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> ---
>  net/ipv4/tcp_offload.c   | 67 ++++++++++++++++++++++++++++++++++++++++
>  net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++
>  2 files changed, 125 insertions(+)
> 
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index fab0973f995b..affd4ed28cfe 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -28,6 +28,70 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
>  	}
>  }
>  
> +static void __tcpv4_gso_segment_csum(struct sk_buff *seg,
> +				     __be32 *oldip, __be32 newip,
> +				     __be16 *oldport, __be16 newport)
> +{
> +	struct tcphdr *th;
> +	struct iphdr *iph;
> +
> +	if (*oldip == newip && *oldport == newport)
> +		return;
> +
> +	th = tcp_hdr(seg);
> +	iph = ip_hdr(seg);
> +
> +	inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true);
> +	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
> +	*oldport = newport;
> +
> +	csum_replace4(&iph->check, *oldip, newip);
> +	*oldip = newip;
> +}
> +
> +static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs)
> +{
> +	const struct tcphdr *th;
> +	const struct iphdr *iph;
> +	struct sk_buff *seg;
> +	struct tcphdr *th2;
> +	struct iphdr *iph2;
> +
> +	seg = segs;
> +	th = tcp_hdr(seg);
> +	iph = ip_hdr(seg);
> +	th2 = tcp_hdr(seg->next);
> +	iph2 = ip_hdr(seg->next);
> +
> +	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
> +	    iph->daddr == iph2->daddr && iph->saddr == iph2->saddr)
> +		return segs;
> +
> +	while ((seg = seg->next)) {
> +		th2 = tcp_hdr(seg);
> +		iph2 = ip_hdr(seg);
> +
> +		__tcpv4_gso_segment_csum(seg,
> +					 &iph2->saddr, iph->saddr,
> +					 &th2->source, th->source);
> +		__tcpv4_gso_segment_csum(seg,
> +					 &iph2->daddr, iph->daddr,
> +					 &th2->dest, th->dest);
> +	}
> +
> +	return segs;
> +}

AFAICS, all the above is really alike the UDP side, except for the
transport header zero csum.

What about renaming the udp version of this helpers as 'tcpudpv4_...',
move them in common code, add an explicit argument for
'zerocsum_allowed' and reuse such helper for both tcp and udp?

The same for the ipv6 variant. 

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-30 10:12   ` Paolo Abeni
@ 2024-04-30 10:23     ` Felix Fietkau
  2024-04-30 10:31       ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Fietkau @ 2024-04-30 10:23 UTC (permalink / raw)
  To: Paolo Abeni, netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On 30.04.24 12:12, Paolo Abeni wrote:
> On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote:
>> When forwarding TCP after GRO, software segmentation is very expensive,
>> especially when the checksum needs to be recalculated.
>> One case where that's currently unavoidable is when routing packets over
>> PPPoE. Performance improves significantly when using fraglist GRO
>> implemented in the same way as for UDP.
>> 
>> When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
>> socket in the same netns as the receiving device. While this may not
>> cover all relevant use cases in multi-netns configurations, it should be
>> good enough for most configurations that need this.
>> 
>> Here's a measurement of running 2 TCP streams through a MediaTek MT7622
>> device (2-core Cortex-A53), which runs NAT with flow offload enabled from
>> one ethernet port to PPPoE on another ethernet port + cake qdisc set to
>> 1Gbps.
>> 
>> rx-gro-list off: 630 Mbit/s, CPU 35% idle
>> rx-gro-list on:  770 Mbit/s, CPU 40% idle
>> 
>> Signe-off-by: Felix Fietkau <nbd@nbd.name>
>> ---
>>  net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
>>  net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>> 
>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> index 87ae9808e260..3e9b8c6f9c8c 100644
>> --- a/net/ipv4/tcp_offload.c
>> +++ b/net/ipv4/tcp_offload.c
>> @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
>>  }
>>  EXPORT_SYMBOL(tcp_gro_complete);
>>  
>> +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
>> +				    struct tcphdr *th)
>> +{
>> +	const struct iphdr *iph;
>> +	struct sk_buff *p;
>> +	struct sock *sk;
>> +	struct net *net;
>> +	int iif, sdif;
>> +
>> +	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
> 
> Should we add an 'unlikely()' here to pair with unlikely(is_flist) in
> *gro_receive / *gro_complete?
Not sure if unlikely() will make any difference here. I think it makes 
more sense in the other places than here.

> Should this test be moved into the caller, to avoid an unconditional
> function call in the ipv6 code?

The function is already called from tcp4_gro_receive, which is only 
called by IPv4 code. Also, since it's a static function called in only 
one place, it gets inlined by the compiler (at least in my builds).
Not sure what unconditional function call you're referring to.

- Felix

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
  2024-04-30 10:19   ` Paolo Abeni
@ 2024-04-30 10:23   ` Paolo Abeni
  2024-04-30 11:31   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 10:23 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Sat, 2024-04-27 at 20:22 +0200, Felix Fietkau wrote:
> @@ -37,6 +101,9 @@ static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
>  	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
>  		return ERR_PTR(-EINVAL);
>  
> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)
> +		return __tcp4_gso_segment_list(skb, features);

I'm sorry for the incremental feedback, I almost forgot.

Possibly the above condition could deserve an unlikely() annotation?
less relevant than for GRO case, but at least we have consistent
handling of such flag.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-30 10:19   ` Paolo Abeni
@ 2024-04-30 10:27     ` Felix Fietkau
  2024-04-30 10:40       ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Fietkau @ 2024-04-30 10:27 UTC (permalink / raw)
  To: Paolo Abeni, netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On 30.04.24 12:19, Paolo Abeni wrote:
> On Sat, 2024-04-27 at 20:22 +0200, Felix Fietkau wrote:
>> Preparation for adding TCP fraglist GRO support. It expects packets to be
>> combined in a similar way as UDP fraglist GSO packets.
>> For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
>> 
>> Signed-off-by: Felix Fietkau <nbd@nbd.name>
>> ---
>>  net/ipv4/tcp_offload.c   | 67 ++++++++++++++++++++++++++++++++++++++++
>>  net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++
>>  2 files changed, 125 insertions(+)
>> 
>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> index fab0973f995b..affd4ed28cfe 100644
>> --- a/net/ipv4/tcp_offload.c
>> +++ b/net/ipv4/tcp_offload.c
>> @@ -28,6 +28,70 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
>>  	}
>>  }
>>  
>> +static void __tcpv4_gso_segment_csum(struct sk_buff *seg,
>> +				     __be32 *oldip, __be32 newip,
>> +				     __be16 *oldport, __be16 newport)
>> +{
>> +	struct tcphdr *th;
>> +	struct iphdr *iph;
>> +
>> +	if (*oldip == newip && *oldport == newport)
>> +		return;
>> +
>> +	th = tcp_hdr(seg);
>> +	iph = ip_hdr(seg);
>> +
>> +	inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true);
>> +	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
>> +	*oldport = newport;
>> +
>> +	csum_replace4(&iph->check, *oldip, newip);
>> +	*oldip = newip;
>> +}
>> +
>> +static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs)
>> +{
>> +	const struct tcphdr *th;
>> +	const struct iphdr *iph;
>> +	struct sk_buff *seg;
>> +	struct tcphdr *th2;
>> +	struct iphdr *iph2;
>> +
>> +	seg = segs;
>> +	th = tcp_hdr(seg);
>> +	iph = ip_hdr(seg);
>> +	th2 = tcp_hdr(seg->next);
>> +	iph2 = ip_hdr(seg->next);
>> +
>> +	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
>> +	    iph->daddr == iph2->daddr && iph->saddr == iph2->saddr)
>> +		return segs;
>> +
>> +	while ((seg = seg->next)) {
>> +		th2 = tcp_hdr(seg);
>> +		iph2 = ip_hdr(seg);
>> +
>> +		__tcpv4_gso_segment_csum(seg,
>> +					 &iph2->saddr, iph->saddr,
>> +					 &th2->source, th->source);
>> +		__tcpv4_gso_segment_csum(seg,
>> +					 &iph2->daddr, iph->daddr,
>> +					 &th2->dest, th->dest);
>> +	}
>> +
>> +	return segs;
>> +}
> 
> AFAICS, all the above is really alike the UDP side, except for the
> transport header zero csum.
> 
> What about renaming the udp version of this helpers as 'tcpudpv4_...',
> move them in common code, add an explicit argument for
> 'zerocsum_allowed' and reuse such helper for both tcp and udp?
> 
> The same for the ipv6 variant.

Wouldn't that make it more convoluted when taking into account that the 
checksum field offset is different for tcp vs udp?
How would you handle that?

- Felix

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-30 10:23     ` Felix Fietkau
@ 2024-04-30 10:31       ` Paolo Abeni
  2024-04-30 10:55         ` Felix Fietkau
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 10:31 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote:
> On 30.04.24 12:12, Paolo Abeni wrote:
> > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote:
> > > When forwarding TCP after GRO, software segmentation is very expensive,
> > > especially when the checksum needs to be recalculated.
> > > One case where that's currently unavoidable is when routing packets over
> > > PPPoE. Performance improves significantly when using fraglist GRO
> > > implemented in the same way as for UDP.
> > > 
> > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
> > > socket in the same netns as the receiving device. While this may not
> > > cover all relevant use cases in multi-netns configurations, it should be
> > > good enough for most configurations that need this.
> > > 
> > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622
> > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from
> > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to
> > > 1Gbps.
> > > 
> > > rx-gro-list off: 630 Mbit/s, CPU 35% idle
> > > rx-gro-list on:  770 Mbit/s, CPU 40% idle
> > > 
> > > Signe-off-by: Felix Fietkau <nbd@nbd.name>
> > > ---
> > >  net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
> > >  net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 67 insertions(+)
> > > 
> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > > index 87ae9808e260..3e9b8c6f9c8c 100644
> > > --- a/net/ipv4/tcp_offload.c
> > > +++ b/net/ipv4/tcp_offload.c
> > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
> > >  }
> > >  EXPORT_SYMBOL(tcp_gro_complete);
> > >  
> > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
> > > +				    struct tcphdr *th)
> > > +{
> > > +	const struct iphdr *iph;
> > > +	struct sk_buff *p;
> > > +	struct sock *sk;
> > > +	struct net *net;
> > > +	int iif, sdif;
> > > +
> > > +	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
> > 
> > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in
> > *gro_receive / *gro_complete?
> Not sure if unlikely() will make any difference here. I think it makes 
> more sense in the other places than here.

Why? AFAICS this will be called for every packet on the wire, exactly
as the code getting this annotation in patch 3/6.

> > Should this test be moved into the caller, to avoid an unconditional
> > function call in the ipv6 code?
> 
> The function is already called from tcp4_gro_receive, which is only 
> called by IPv4 code. Also, since it's a static function called in only 
> one place, it gets inlined by the compiler (at least in my builds).
> Not sure what unconditional function call you're referring to.

Right you are. I just got fooled by my hope to reuse the same function
for ipv4 and v6. Please just ignore this last part.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
  2024-04-30  3:25   ` Jakub Kicinski
  2024-04-30 10:12   ` Paolo Abeni
@ 2024-04-30 10:33   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-04-30 10:33 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski,
	Paolo Abeni, willemdebruijn.kernel, linux-kernel

On Sat, Apr 27, 2024 at 8:23 PM Felix Fietkau <nbd@nbd.name> wrote:
>
> When forwarding TCP after GRO, software segmentation is very expensive,
> especially when the checksum needs to be recalculated.
> One case where that's currently unavoidable is when routing packets over
> PPPoE. Performance improves significantly when using fraglist GRO
> implemented in the same way as for UDP.
>
> When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
> socket in the same netns as the receiving device. While this may not
> cover all relevant use cases in multi-netns configurations, it should be
> good enough for most configurations that need this.
>
> Here's a measurement of running 2 TCP streams through a MediaTek MT7622
> device (2-core Cortex-A53), which runs NAT with flow offload enabled from
> one ethernet port to PPPoE on another ethernet port + cake qdisc set to
> 1Gbps.
>
> rx-gro-list off: 630 Mbit/s, CPU 35% idle
> rx-gro-list on:  770 Mbit/s, CPU 40% idle
>
> Signe-off-by: Felix Fietkau <nbd@nbd.name>


Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-30 10:27     ` Felix Fietkau
@ 2024-04-30 10:40       ` Paolo Abeni
  2024-04-30 10:57         ` Felix Fietkau
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 10:40 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Tue, 2024-04-30 at 12:27 +0200, Felix Fietkau wrote:
> On 30.04.24 12:19, Paolo Abeni wrote:
> > On Sat, 2024-04-27 at 20:22 +0200, Felix Fietkau wrote:
> > > Preparation for adding TCP fraglist GRO support. It expects packets to be
> > > combined in a similar way as UDP fraglist GSO packets.
> > > For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
> > > 
> > > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > > ---
> > >  net/ipv4/tcp_offload.c   | 67 ++++++++++++++++++++++++++++++++++++++++
> > >  net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++
> > >  2 files changed, 125 insertions(+)
> > > 
> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > > index fab0973f995b..affd4ed28cfe 100644
> > > --- a/net/ipv4/tcp_offload.c
> > > +++ b/net/ipv4/tcp_offload.c
> > > @@ -28,6 +28,70 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
> > >  	}
> > >  }
> > >  
> > > +static void __tcpv4_gso_segment_csum(struct sk_buff *seg,
> > > +				     __be32 *oldip, __be32 newip,
> > > +				     __be16 *oldport, __be16 newport)
> > > +{
> > > +	struct tcphdr *th;
> > > +	struct iphdr *iph;
> > > +
> > > +	if (*oldip == newip && *oldport == newport)
> > > +		return;
> > > +
> > > +	th = tcp_hdr(seg);
> > > +	iph = ip_hdr(seg);
> > > +
> > > +	inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true);
> > > +	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
> > > +	*oldport = newport;
> > > +
> > > +	csum_replace4(&iph->check, *oldip, newip);
> > > +	*oldip = newip;
> > > +}
> > > +
> > > +static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs)
> > > +{
> > > +	const struct tcphdr *th;
> > > +	const struct iphdr *iph;
> > > +	struct sk_buff *seg;
> > > +	struct tcphdr *th2;
> > > +	struct iphdr *iph2;
> > > +
> > > +	seg = segs;
> > > +	th = tcp_hdr(seg);
> > > +	iph = ip_hdr(seg);
> > > +	th2 = tcp_hdr(seg->next);
> > > +	iph2 = ip_hdr(seg->next);
> > > +
> > > +	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
> > > +	    iph->daddr == iph2->daddr && iph->saddr == iph2->saddr)
> > > +		return segs;
> > > +
> > > +	while ((seg = seg->next)) {
> > > +		th2 = tcp_hdr(seg);
> > > +		iph2 = ip_hdr(seg);
> > > +
> > > +		__tcpv4_gso_segment_csum(seg,
> > > +					 &iph2->saddr, iph->saddr,
> > > +					 &th2->source, th->source);
> > > +		__tcpv4_gso_segment_csum(seg,
> > > +					 &iph2->daddr, iph->daddr,
> > > +					 &th2->dest, th->dest);
> > > +	}
> > > +
> > > +	return segs;
> > > +}
> > 
> > AFAICS, all the above is really alike the UDP side, except for the
> > transport header zero csum.
> > 
> > What about renaming the udp version of this helpers as 'tcpudpv4_...',
> > move them in common code, add an explicit argument for
> > 'zerocsum_allowed' and reuse such helper for both tcp and udp?
> > 
> > The same for the ipv6 variant.
> 
> Wouldn't that make it more convoluted when taking into account that the 
> checksum field offset is different for tcp vs udp?
> How would you handle that?

Probably making a common helper just for
__tcpudpv{4,6}_gso_segment_csum and pass it the target l4 csum pointer
as an additional argument. It would not be spectacularly nice, so no
strong opinion either way.

Cheers,

Paolo






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-30 10:31       ` Paolo Abeni
@ 2024-04-30 10:55         ` Felix Fietkau
  2024-04-30 11:14           ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Fietkau @ 2024-04-30 10:55 UTC (permalink / raw)
  To: Paolo Abeni, netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On 30.04.24 12:31, Paolo Abeni wrote:
> On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote:
>> On 30.04.24 12:12, Paolo Abeni wrote:
>> > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote:
>> > > When forwarding TCP after GRO, software segmentation is very expensive,
>> > > especially when the checksum needs to be recalculated.
>> > > One case where that's currently unavoidable is when routing packets over
>> > > PPPoE. Performance improves significantly when using fraglist GRO
>> > > implemented in the same way as for UDP.
>> > > 
>> > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
>> > > socket in the same netns as the receiving device. While this may not
>> > > cover all relevant use cases in multi-netns configurations, it should be
>> > > good enough for most configurations that need this.
>> > > 
>> > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622
>> > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from
>> > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to
>> > > 1Gbps.
>> > > 
>> > > rx-gro-list off: 630 Mbit/s, CPU 35% idle
>> > > rx-gro-list on:  770 Mbit/s, CPU 40% idle
>> > > 
>> > > Signe-off-by: Felix Fietkau <nbd@nbd.name>
>> > > ---
>> > >  net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
>> > >  net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
>> > >  2 files changed, 67 insertions(+)
>> > > 
>> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> > > index 87ae9808e260..3e9b8c6f9c8c 100644
>> > > --- a/net/ipv4/tcp_offload.c
>> > > +++ b/net/ipv4/tcp_offload.c
>> > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
>> > >  }
>> > >  EXPORT_SYMBOL(tcp_gro_complete);
>> > >  
>> > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
>> > > +				    struct tcphdr *th)
>> > > +{
>> > > +	const struct iphdr *iph;
>> > > +	struct sk_buff *p;
>> > > +	struct sock *sk;
>> > > +	struct net *net;
>> > > +	int iif, sdif;
>> > > +
>> > > +	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
>> > 
>> > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in
>> > *gro_receive / *gro_complete?
>> Not sure if unlikely() will make any difference here. I think it makes 
>> more sense in the other places than here.
> 
> Why? AFAICS this will be called for every packet on the wire, exactly
> as the code getting this annotation in patch 3/6.

I had compared assembly after adding an annotation and didn't see a 
difference. However, my annotation was wrong.
When I add: if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST)))
the generated code is different, and I probably should use that.

- Felix


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-30 10:40       ` Paolo Abeni
@ 2024-04-30 10:57         ` Felix Fietkau
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Fietkau @ 2024-04-30 10:57 UTC (permalink / raw)
  To: Paolo Abeni, netdev, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On 30.04.24 12:40, Paolo Abeni wrote:
> On Tue, 2024-04-30 at 12:27 +0200, Felix Fietkau wrote:
>> On 30.04.24 12:19, Paolo Abeni wrote:
>> > On Sat, 2024-04-27 at 20:22 +0200, Felix Fietkau wrote:
>> > > Preparation for adding TCP fraglist GRO support. It expects packets to be
>> > > combined in a similar way as UDP fraglist GSO packets.
>> > > For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
>> > > 
>> > > Signed-off-by: Felix Fietkau <nbd@nbd.name>
>> > > ---
>> > >  net/ipv4/tcp_offload.c   | 67 ++++++++++++++++++++++++++++++++++++++++
>> > >  net/ipv6/tcpv6_offload.c | 58 ++++++++++++++++++++++++++++++++++
>> > >  2 files changed, 125 insertions(+)
>> > > 
>> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> > > index fab0973f995b..affd4ed28cfe 100644
>> > > --- a/net/ipv4/tcp_offload.c
>> > > +++ b/net/ipv4/tcp_offload.c
>> > > @@ -28,6 +28,70 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
>> > >  	}
>> > >  }
>> > >  
>> > > +static void __tcpv4_gso_segment_csum(struct sk_buff *seg,
>> > > +				     __be32 *oldip, __be32 newip,
>> > > +				     __be16 *oldport, __be16 newport)
>> > > +{
>> > > +	struct tcphdr *th;
>> > > +	struct iphdr *iph;
>> > > +
>> > > +	if (*oldip == newip && *oldport == newport)
>> > > +		return;
>> > > +
>> > > +	th = tcp_hdr(seg);
>> > > +	iph = ip_hdr(seg);
>> > > +
>> > > +	inet_proto_csum_replace4(&th->check, seg, *oldip, newip, true);
>> > > +	inet_proto_csum_replace2(&th->check, seg, *oldport, newport, false);
>> > > +	*oldport = newport;
>> > > +
>> > > +	csum_replace4(&iph->check, *oldip, newip);
>> > > +	*oldip = newip;
>> > > +}
>> > > +
>> > > +static struct sk_buff *__tcpv4_gso_segment_list_csum(struct sk_buff *segs)
>> > > +{
>> > > +	const struct tcphdr *th;
>> > > +	const struct iphdr *iph;
>> > > +	struct sk_buff *seg;
>> > > +	struct tcphdr *th2;
>> > > +	struct iphdr *iph2;
>> > > +
>> > > +	seg = segs;
>> > > +	th = tcp_hdr(seg);
>> > > +	iph = ip_hdr(seg);
>> > > +	th2 = tcp_hdr(seg->next);
>> > > +	iph2 = ip_hdr(seg->next);
>> > > +
>> > > +	if (!(*(const u32 *)&th->source ^ *(const u32 *)&th2->source) &&
>> > > +	    iph->daddr == iph2->daddr && iph->saddr == iph2->saddr)
>> > > +		return segs;
>> > > +
>> > > +	while ((seg = seg->next)) {
>> > > +		th2 = tcp_hdr(seg);
>> > > +		iph2 = ip_hdr(seg);
>> > > +
>> > > +		__tcpv4_gso_segment_csum(seg,
>> > > +					 &iph2->saddr, iph->saddr,
>> > > +					 &th2->source, th->source);
>> > > +		__tcpv4_gso_segment_csum(seg,
>> > > +					 &iph2->daddr, iph->daddr,
>> > > +					 &th2->dest, th->dest);
>> > > +	}
>> > > +
>> > > +	return segs;
>> > > +}
>> > 
>> > AFAICS, all the above is really alike the UDP side, except for the
>> > transport header zero csum.
>> > 
>> > What about renaming the udp version of this helpers as 'tcpudpv4_...',
>> > move them in common code, add an explicit argument for
>> > 'zerocsum_allowed' and reuse such helper for both tcp and udp?
>> > 
>> > The same for the ipv6 variant.
>> 
>> Wouldn't that make it more convoluted when taking into account that the 
>> checksum field offset is different for tcp vs udp?
>> How would you handle that?
> 
> Probably making a common helper just for
> __tcpudpv{4,6}_gso_segment_csum and pass it the target l4 csum pointer
> as an additional argument. It would not be spectacularly nice, so no
> strong opinion either way.

I'd rather keep it duplicated but more straightforward and easier to read.

- Felix

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO
  2024-04-30 10:55         ` Felix Fietkau
@ 2024-04-30 11:14           ` Paolo Abeni
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Abeni @ 2024-04-30 11:14 UTC (permalink / raw)
  To: Felix Fietkau, netdev, Eric Dumazet, David S. Miller,
	David Ahern, Jakub Kicinski
  Cc: willemdebruijn.kernel, linux-kernel

On Tue, 2024-04-30 at 12:55 +0200, Felix Fietkau wrote:
> On 30.04.24 12:31, Paolo Abeni wrote:
> > On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote:
> > > On 30.04.24 12:12, Paolo Abeni wrote:
> > > > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote:
> > > > > When forwarding TCP after GRO, software segmentation is very expensive,
> > > > > especially when the checksum needs to be recalculated.
> > > > > One case where that's currently unavoidable is when routing packets over
> > > > > PPPoE. Performance improves significantly when using fraglist GRO
> > > > > implemented in the same way as for UDP.
> > > > > 
> > > > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
> > > > > socket in the same netns as the receiving device. While this may not
> > > > > cover all relevant use cases in multi-netns configurations, it should be
> > > > > good enough for most configurations that need this.
> > > > > 
> > > > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622
> > > > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from
> > > > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to
> > > > > 1Gbps.
> > > > > 
> > > > > rx-gro-list off: 630 Mbit/s, CPU 35% idle
> > > > > rx-gro-list on:  770 Mbit/s, CPU 40% idle
> > > > > 
> > > > > Signe-off-by: Felix Fietkau <nbd@nbd.name>
> > > > > ---
> > > > >  net/ipv4/tcp_offload.c   | 32 ++++++++++++++++++++++++++++++++
> > > > >  net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 67 insertions(+)
> > > > > 
> > > > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > > > > index 87ae9808e260..3e9b8c6f9c8c 100644
> > > > > --- a/net/ipv4/tcp_offload.c
> > > > > +++ b/net/ipv4/tcp_offload.c
> > > > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb)
> > > > >  }
> > > > >  EXPORT_SYMBOL(tcp_gro_complete);
> > > > >  
> > > > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
> > > > > +				    struct tcphdr *th)
> > > > > +{
> > > > > +	const struct iphdr *iph;
> > > > > +	struct sk_buff *p;
> > > > > +	struct sock *sk;
> > > > > +	struct net *net;
> > > > > +	int iif, sdif;
> > > > > +
> > > > > +	if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST))
> > > > 
> > > > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in
> > > > *gro_receive / *gro_complete?
> > > Not sure if unlikely() will make any difference here. I think it makes 
> > > more sense in the other places than here.
> > 
> > Why? AFAICS this will be called for every packet on the wire, exactly
> > as the code getting this annotation in patch 3/6.
> 
> I had compared assembly after adding an annotation and didn't see a 
> difference. However, my annotation was wrong.
> When I add: if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST)))
> the generated code is different, and I probably should use that.

I read the above as you intend to send a new revision. If so, feel free
to add 

Acked-by: Paolo Abeni <pabeni@redhat.com>

to the whole series.

Otherwise, please LMK, I think we can merge it as-is and ev. follow-up.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets
  2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
  2024-04-30 10:19   ` Paolo Abeni
  2024-04-30 10:23   ` Paolo Abeni
@ 2024-04-30 11:31   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-04-30 11:31 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski,
	Paolo Abeni, willemdebruijn.kernel, linux-kernel

On Sat, Apr 27, 2024 at 8:23 PM Felix Fietkau <nbd@nbd.name> wrote:
>
> Preparation for adding TCP fraglist GRO support. It expects packets to be
> combined in a similar way as UDP fraglist GSO packets.
> For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
>
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> ---
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-04-30 11:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-27 18:22 [PATCH v4 net-next v4 0/6] Add TCP fraglist GRO support Felix Fietkau
2024-04-27 18:22 ` [PATCH v4 net-next v4 1/6] net: move skb_gro_receive_list from udp to core Felix Fietkau
2024-04-27 18:22 ` [PATCH v4 net-next v4 2/6] net: add support for segmenting TCP fraglist GSO packets Felix Fietkau
2024-04-30 10:19   ` Paolo Abeni
2024-04-30 10:27     ` Felix Fietkau
2024-04-30 10:40       ` Paolo Abeni
2024-04-30 10:57         ` Felix Fietkau
2024-04-30 10:23   ` Paolo Abeni
2024-04-30 11:31   ` Eric Dumazet
2024-04-27 18:22 ` [PATCH v4 net-next v4 3/6] net: add code for TCP fraglist GRO Felix Fietkau
2024-04-27 18:23 ` [PATCH v4 net-next v4 4/6] net: create tcp_gro_lookup helper function Felix Fietkau
2024-04-27 18:23 ` [PATCH v4 net-next v4 5/6] net: create tcp_gro_header_pull " Felix Fietkau
2024-04-27 18:23 ` [PATCH v4 net-next v4 6/6] net: add heuristic for enabling TCP fraglist GRO Felix Fietkau
2024-04-30  3:25   ` Jakub Kicinski
2024-04-30 10:12   ` Paolo Abeni
2024-04-30 10:23     ` Felix Fietkau
2024-04-30 10:31       ` Paolo Abeni
2024-04-30 10:55         ` Felix Fietkau
2024-04-30 11:14           ` Paolo Abeni
2024-04-30 10:33   ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.